AI news
January 27, 2024

Can A Language Model Verify Its Answer?

One of the barriers to the mainstream adoption of LLMs is their tendency to “hallucinate”.

by Roshni Ramnani

One of the main barriers to the mainstream adoption of Large Language Models (LLMs) is their tendency to “hallucinate”: simply put, they provide seemingly plausible but factually incorrect information in response to user queries. A potential remedy is to refrain from using the LLM itself as a knowledge store and instead ground its responses in external knowledge sources. This approach, known as Retrieval-Augmented Generation (RAG), is especially useful in an enterprise context, where the data is company-specific or subject to frequent change (consider the query “who is the CEO of X?”) [1].
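To make the RAG idea concrete, here is a minimal sketch. The retriever, corpus, and prompt wording are all illustrative stand-ins (a real system would use a vector store and an actual model API, not word overlap):

```python
# Minimal RAG sketch: retrieve relevant passages, then ground the
# prompt in them. `retrieve` is a toy stand-in for a real retriever.

def retrieve(query: str, corpus: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical company data; the LLM no longer needs to "know" the CEO.
corpus = [
    "Jane Doe was appointed CEO of X Corp in 2023.",
    "X Corp is headquartered in Austin.",
]
query = "Who is the CEO of X Corp?"
prompt = build_rag_prompt(query, retrieve(query, corpus))
```

The grounded prompt is then sent to the LLM in place of the bare question, so a change in CEO only requires updating the corpus, not the model.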

However, what if we do need to use the LLM without external knowledge, or want to retrieve information that lies within the scope of general knowledge and is unlikely to change over time? Until the research community finds a mechanism to remove hallucination at inference time altogether, we must consider other options, one of which is evaluating the LLM’s responses for potential hallucinations.

Several approaches have been developed to construct evaluation metrics that check for hallucination. One popular and well-validated line of work designs metrics that check the consistency between multiple responses of the LLM to the same question: inconsistency or contradiction between these responses is usually a sign of hallucination [2][3]. This consistency can be verified either by an external language model or by the very same model that generated the responses.
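The sampling-and-compare idea can be sketched in a few lines. This is only in the spirit of SelfCheckGPT [2]: `agreement` here is plain token overlap rather than the paper's actual scoring, and in practice each answer would come from a real model sampled with temperature > 0:

```python
# Consistency check sketch: sample several answers to one question and
# treat low pairwise agreement as a possible hallucination signal.

def agreement(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two answers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def consistency_score(answers: list[str]) -> float:
    """Mean pairwise agreement across all sampled answers."""
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)

# Stable answers score high; contradictory samples score low.
consistent = ["Paris is the capital of France."] * 3
mixed = [
    "Paris is the capital of France.",
    "Lyon is the capital of France.",
    "The capital of France is Marseille.",
]
```

A score below some threshold would flag the question for review or for retrieval-backed answering.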

Recent research on language models has focused on increasing their reasoning ability by enabling them to generate intermediate steps and critiques, and to revise their responses accordingly. In line with this trend, an interesting approach for detecting hallucination uses a technique called Chain-of-Verification (CoVe) [4]. As the name suggests, this involves prompting the language model to deliberate on its initial response, check whether it contains hallucinations, and output a final, more accurate response. The model starts with a draft response, plans a set of questions to verify that draft, and then systematically answers those verification questions to produce an improved response. This turns out to be a very effective, self-contained method for reducing hallucinations.
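The four stages above can be sketched as a small pipeline. Here `llm` is a stand-in for any chat-completion call (one request per prompt), and the prompt wording is illustrative rather than the paper's exact templates:

```python
# CoVe pipeline sketch [4]: draft, plan verification questions,
# answer them independently, then revise the draft.

def cove(llm, query: str) -> str:
    # 1. Baseline: draft an initial answer.
    draft = llm(f"Answer the question: {query}")
    # 2. Plan: derive verification questions for the draft's claims.
    plan = llm(f"List questions, one per line, that verify each fact in: {draft}")
    questions = [q for q in plan.splitlines() if q.strip()]
    # 3. Execute: answer each question on its own, without showing the
    #    draft, so the model cannot simply repeat its own mistakes.
    checks = [(q, llm(q)) for q in questions]
    # 4. Revise: produce a final answer consistent with the checks.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(
        f"Question: {query}\nDraft: {draft}\n"
        f"Verification:\n{evidence}\nWrite a corrected final answer:"
    )
```

Answering the verification questions in isolation (step 3) is the key design choice: it is what gives the model a chance to contradict, and then correct, its own draft.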

The figure below demonstrates CoVe in action. When a query is given to the LLM (in this case ChatGPT), the first (baseline) response actually contains some hallucinations (see Hillary Clinton and Michael Bloomberg). The steps of generating and answering verification questions highlight this discrepancy, enabling the model to reason and provide a more accurate answer in its final response.

Chain-of-Verification (CoVe) method. [4]

As the use cases and uptake of language models grow, solving practical issues like hallucination becomes extremely important. Research is underway to mitigate the issue at at least two levels: during the training phase and during the generation (or inference) phase. We discussed one promising method that applies at the inference phase, CoVe, which at its core uses a prompting technique to help the language model reason better and spot its own hallucinations.

References:

  1. https://www.promptingguide.ai/techniques/rag
  2. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models: https://arxiv.org/abs/2303.08896
  3. Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation: https://arxiv.org/abs/2305.15852
  4. Chain-of-Verification Reduces Hallucination in Large Language Models: https://arxiv.org/abs/2309.11495