LLM Hallucination: What It Is and Why It Happens
LLMs are a powerful tool. But they hallucinate, i.e., they make things up. The fastest and most effective way to minimize chatbot hallucination is to use Gleen AI.
Hallucinations in large language models are the equivalent of a drifting mind in the generative AI field. While they might seem harmless and even humorous, they can also have serious implications.
In this article, we’ll explore what an LLM hallucination is. We’ll also discuss why it happens to help you address the problem.
What Are LLM Hallucinations?
A large language model or LLM is a type of artificial intelligence (AI) algorithm that recognizes, decodes, predicts, and generates content.
While the model derives some knowledge from its training data, it is prone to “hallucinate.” A hallucination in an LLM is a response that contains nonsensical or factually inaccurate text.
What Is an Example of an LLM Hallucination?
A recent Tidio survey found that 72% of users trust AI to provide factual and reliable information. Yet 75% of those respondents reported that AI had misled them at least once.
So, be mindful of the following examples of what LLMs hallucinate:
Source conflation involves the model generating factual contradictions. This LLM hallucination problem occurs because the model tries to combine extracted details from various sources.
Sometimes, an LLM can even make up sources. (See examples below.)
Language models cannot differentiate between a truth and a lie. As a result, LLMs can generate content with no factual foundations.
Hallucinations with factual errors are less common when the model is trained on accurate data.
Remember, however, that pre-trained models like GPT-3.5 and GPT-4 are trained on vast amounts of internet text, and the internet contains many factual errors. As such, it’s still good practice to fact-check everything an LLM generates.
LLMs simply predict the next most probable word in a sentence.
Most of the time, the content they generate makes sense. However, they can also produce grammatically correct text that doesn’t make sense. Worse yet, LLMs can produce responses that sound convincing and authoritative, when in fact the response has no factual basis whatsoever.
Generally, examples of hallucinations in LLMs are harmless and even humorous. Yet, in an interview with Datanami, Got It AI co-founder Peter Relan said that ChatGPT “makes up stuff” 20% of the time.
Here are some cases wherein hallucinations in LLMs could have serious implications:
Lawyers Who Cited Fake Cases
According to a Forbes article, two lawyers might face sanctions for citing six non-existent cases. Steven Schwartz, one of the lawyers involved, said that he sourced the fake court cases from ChatGPT.
Falsely Accused Professor
The Washington Post reported that a ChatGPT response accused a law professor of sexual harassment. The LLM even cited a non-existent Washington Post article as the information source.
Inaccurate Summarization of a Court Case
When asked for a summary of the Second Amendment Foundation v. Ferguson case, ChatGPT responded with factually inaccurate information. The response mentioned that SAF founder Alan Gottlieb sued Georgia radio host Mark Walters.
Moreover, the hallucinated response claimed that Walters had defrauded the foundation and embezzled its funds. Consequently, Walters filed a lawsuit against OpenAI LLC, claiming that every detail in the summary was false.
What Causes a Hallucination in an LLM?
Research from a new start-up found that ChatGPT hallucinates about 3 percent of the time. That’s not surprising, especially since deep learning models can exhibit unpredictable behavior.
In response to LLM hallucination problems, stakeholders must take the initiative in implementing safe, secure, and trustworthy AI. Moreover, it doesn’t hurt to understand why these instances occur.
So, what causes an LLM to hallucinate?
- Unverified training data – By itself, a large language model cannot distinguish between fact and fiction. So, when you feed it with diverse and large training data without properly verifying the sources, it can pick up factual inaccuracies.
- Inadequate/inaccurate prompt context – LLMs may behave erratically when you use inadequate or inaccurate prompts. Moreover, they may generate an incorrect or irrelevant response when your objectives are vague.
- Misaligned objectives – Most public LLMs were trained for general natural language processing tasks. These models need extra help when inferring responses for domain-specific subjects like law, medicine, and finance.
- Most importantly, it’s just probabilities – LLMs simply predict the next most probable word in a conversation, not the most accurate word. LLMs have no idea if the generated response is accurate or not.
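To make the last point concrete, here is a minimal sketch of "most probable, not most accurate." The prompt, the candidate words, and the probabilities are all made up for illustration; a real model computes its probabilities from training data and works on tokens rather than whole words.

```python
# Toy next-word distribution for the prompt "The capital of Australia is".
# These probabilities are invented for illustration only.
next_word_probs = {
    "Sydney": 0.55,    # often appears near "Australia" in text, but is wrong
    "Canberra": 0.35,  # the factually correct answer
    "Melbourne": 0.10,
}


def pick_next_word(probs):
    """Return the most probable next word (greedy decoding)."""
    return max(probs, key=probs.get)


print(pick_next_word(next_word_probs))  # prints "Sydney"
```

Nothing in this selection step checks facts. The word with the highest probability wins, even when a less probable word is the accurate one.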
Can LLM Hallucination Be Prevented?
You might wonder if you can eliminate hallucinations in LLMs.
Well, the short answer is “no.”
Hallucinations naturally occur in LLMs because they are a critical part of how these models operate. However, you might notice that some LLMs hallucinate more than others.
A 2023 study fed LLMs 1,000 documents, one at a time. The researchers asked LLMs like GPT, Llama, Cohere, Google PaLM, and Mistral to summarize each of the documents. The study found that these LLMs hallucinated 3% to 27% of the time.
So, you can consider hallucination a built-in feature in LLMs.
Is Hallucination a Bad Thing?
While hallucinations in LLMs can have serious implications, they are not always a bad thing. For instance, they can be useful when you want to give more creative freedom to the model.
Because hallucination lets an LLM go beyond what its training data contains, it can help the model generate unique scenes, characters, and storylines for novels.
Another example where LLM hallucination can be beneficial is when you’re looking for diversity.
The training data will provide the model with the ideas you need. However, allowing it to hallucinate to some extent can let you explore different possibilities.
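One common way to control how far a model strays from its most likely output is a sampling temperature. The sketch below assumes a temperature setting is available for the model you use; the scores and the function name `softmax_with_temperature` are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities; higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate story openings.
scores = [2.0, 1.0, 0.5]

focused = softmax_with_temperature(scores, temperature=0.5)
exploratory = softmax_with_temperature(scores, temperature=2.0)

print([round(p, 2) for p in focused])
print([round(p, 2) for p in exploratory])
```

At the lower temperature, the top candidate dominates. At the higher temperature, the less likely candidates get a larger share of the probability, which is what gives the model room to explore different possibilities.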
On the other hand, hallucinations in LLMs can be a bad thing in use cases where accuracy is extremely important. They can propagate false information, amplify societal prejudices, and reduce the credibility of deep-learning models.
Moreover, LLM hallucination can be a problem for fully automated services. For example, chatbot hallucinations can cause poor customer experience.
Pro Tip: Follow our guide on how to select an AI chatbot for customer service.
How to Minimize Hallucinations in Large Language Models
While you cannot prevent hallucinations in LLMs, there are ways to minimize them. You can use the following techniques:
- Highly descriptive prompts – You provide the model with additional context by feeding it highly descriptive prompts. By giving it clarifying details, you’re grounding it in truth. Consequently, it is less likely to hallucinate.
- LLM fine-tuning – You can feed the LLM with domain-specific training data. Doing so will fine-tune the model, allowing it to generate more relevant and accurate responses.
- Retrieval-augmented generation (RAG) – This framework focuses on providing LLMs with both the end user question and context around the question, i.e., the most accurate and relevant information in the knowledge base. The LLM uses both the question and the context to generate a more accurate and relevant response.
- Solve hallucination outside the LLM – Instead of trying to minimize hallucination at the LLM layer, you can deploy a generative AI solution like Gleen AI. Gleen AI solves hallucination at the layer of the chatbot by carefully selecting inputs to the LLM and detecting hallucination in the LLM output.
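The RAG and output-checking ideas above can be sketched in a few lines. This is a simplified illustration, not how Gleen AI or any particular product works. The knowledge base, the function names, the word-overlap ranking, and the 0.6 threshold are all assumptions; a real system would typically use embedding search and a much stronger output check.

```python
import re

# Hypothetical knowledge base; in practice this would be your own documents.
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Shipping to Canada takes 5 to 7 business days.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question.

    Word overlap is a stand-in for the embedding search a real
    RAG system would typically use.
    """
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context):
    """Combine the retrieved context and the question into one prompt."""
    joined = "\n".join(context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


def is_grounded(answer, context, threshold=0.6):
    """Naive output check: share of answer words that appear in the context."""
    answer_words = tokenize(answer)
    if not answer_words:
        return False
    context_words = tokenize(" ".join(context))
    return len(answer_words & context_words) / len(answer_words) >= threshold


question = "Are refunds available after purchase?"
context = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, context)  # this is what you would send to the LLM
print(prompt)
```

The word-overlap check in `is_grounded` is deliberately crude. It would flag an answer that adds unsupported claims, but it could miss an answer that changes a single number, which is one reason detecting hallucination in LLM output takes more than a simple filter.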
Gleen AI – The Practical and Commercially Available Solution
Note that fine-tuning an LLM requires ongoing training. As such, it can come with significant cost and time implications.
Moreover, RAG and prompt engineering can consume most of your resources, and neither will eliminate hallucination.
Fortunately, you can deploy a custom generative AI chatbot like Gleen AI. As a commercially available solution, Gleen AI proactively prevents hallucination.
In fact, 80% of Gleen AI’s tech stack focuses on preventing hallucinations. Moreover, unlike LLM fine-tuning, this tool can be ready within a few hours.
Here's a video of Gleen AI versus an OpenAI GPT trained on the same knowledge. Gleen AI doesn't hallucinate. The GPT does:
Gleen AI has built a proprietary RAG technology to ensure that it constantly generates highly relevant and accurate responses with minimal hallucination.