Evaluation & Safety

Hallucination Detection and Mitigation

Understanding why LLMs hallucinate, how to detect fabricated information, and techniques to reduce hallucination rates in production systems

Published: 2026-04-14 · Last updated: 2026-04-13

Hallucination Detection and Mitigation

Hallucination is when an LLM confidently produces information that is incorrect, fabricated, or unsupported by evidence. It's the most significant barrier to using LLMs in high-stakes applications.

Types of Hallucination

Type	Description	Example
Intrinsic	Contradicts the provided context	Context says "CEO: Jane", model says "CEO: John"
Extrinsic	Adds information not in context	Context doesn't mention revenue, model invents "$5B revenue"
Factual	Contradicts real-world facts	"The Eiffel Tower is in London"
Faithfulness	Contradicts itself	"Founded in 2010... established 15 years ago" (in 2026)

Why LLMs Hallucinate

Training objective: Models are trained to predict plausible text, not true text
No ground truth access: Models don't "know" facts — they predict likely token sequences
Pressure to respond: Models are trained to always produce an answer, even when uncertain
Amplification by temperature: Higher temperature increases creativity but also fabrication
Prompt-induced: Leading prompts can cause the model to "play along" with false premises

# Prompt-induced hallucination
prompt = "Tell me about Apple's revolutionary iPhone 17, released in 2024."
# The iPhone 17 hasn't been released. The model may still generate plausible-sounding
# details about a non-existent product because the prompt implies it exists.

Detection Techniques

1. Self-Consistency Check

Generate multiple responses and check for contradictions:

def detect_hallucination_by_consistency(model, prompt, n=5) -> dict:
    """If the model gives different answers, flag as uncertain."""
    responses = [model.generate(prompt, temperature=0.7) for _ in range(n)]
    
    # Extract key claims from each response
    claims_per_response = [extract_claims(r) for r in responses]
    
    # Check for contradictions
    contradictions = find_contradictions(claims_per_response)
    
    return {
        "hallucination_risk": "high" if contradictions else "low",
        "consistency_score": 1 - (contradictions / max_possible_contradictions),
        "contradictions": contradictions,
    }

2. NLI-Based Detection

Use Natural Language Inference to check if claims are supported by context:

from transformers import pipeline

nli_model = pipeline("text-classification", model="roberta-large-mnli")

def check_claim_against_context(claim: str, context: str) -> dict:
    """Check if a claim is entailed by, neutral to, or contradicts the context."""
    result = nli_model({"text": context, "text_pair": claim})
    
    return {
        "claim": claim,
        "label": result["label"],  # ENTAILMENT, NEUTRAL, CONTRADICTION
        "confidence": result["score"],
        "supported": result["label"] == "ENTAILMENT" and result["score"] > 0.7,
        "contradicted": result["label"] == "CONTRADICTION",
    }

3. Fact-Checking Against Knowledge Base

def fact_check_claims(response: str, knowledge_base: dict) -> list[dict]:
    """Check extracted claims against a knowledge base."""
    claims = extract_claims(response)
    results = []
    
    for claim in claims:
        if claim["entity"] in knowledge_base:
            kb_fact = knowledge_base[claim["entity"]]
            if kb_fact != claim["value"]:
                results.append({
                    "claim": claim,
                    "kb_fact": kb_fact,
                    "status": "contradiction",
                })
            else:
                results.append({
                    "claim": claim,
                    "status": "verified",
                })
        else:
            results.append({
                "claim": claim,
                "status": "unknown",
            })
    
    return results

4. Confidence-Based Detection

def get_model_confidence(model, prompt: str) -> float:
    """Estimate model confidence from output probabilities."""
    outputs = model.generate(prompt, output_scores=True, return_dict_in_generate=True)
    scores = outputs.scores
    
    # Average the top token probability across all generated tokens
    confidences = [torch.softmax(s, dim=-1).max(dim=-1).values for s in scores]
    avg_confidence = torch.stack(confidences).mean().item()
    
    return avg_confidence

# Low confidence often correlates with hallucination risk
if get_model_confidence(model, prompt) < 0.6:
    response += "\n\n⚠️ I'm not entirely confident in this answer. Please verify."

Mitigation Strategies

1. Retrieval-Augmented Generation (RAG)

The single most effective mitigation strategy:

def grounded_response(query: str, knowledge_base) -> str:
    """Ground response in retrieved evidence."""
    context = retrieve_relevant(query, knowledge_base, k=5)
    
    prompt = f"""Based ONLY on the following context, answer the question.
If the context doesn't contain enough information, say so clearly.

Context:
{context}

Question: {query}"""
    
    return model.generate(prompt, temperature=0.1)

2. Constrained Generation

Force the model to only use provided information:

prompt = """Answer based on the context provided. Use ONLY information from the context.
If you cannot answer from context alone, say "I cannot answer this from the provided information."

Context: {context}
Question: {query}
Answer:"""

3. Temperature Reduction

# For factual queries, use low temperature
factual_response = model.generate(prompt, temperature=0.1, top_p=0.9)

# Reserve higher temperature for creative tasks
creative_response = model.generate(prompt, temperature=0.7, top_p=0.95)

4. Citation Requirements

prompt = f"""Answer the question and cite which part of the context supports each claim.
Format: [Claim] (Source: relevant context excerpt)

Context: {context}
Question: {query}"""

5. Self-Verification

def self_verify(model, prompt: str, initial_response: str) -> str:
    """Ask the model to verify its own answer."""
    verification_prompt = f"""
I previously answered: "{initial_response}"

Now, critically review this answer:
1. Are all claims supported?
2. Is anything potentially fabricated?
3. What's your confidence level?

If issues are found, provide a corrected version.
"""
    verification = model.generate(verification_prompt, temperature=0.1)
    return verification

Production Hallucination Tracking

class HallucinationTracker:
    def __init__(self):
        self.total_responses = 0
        self.flagged_responses = 0
        self.confirmed_hallucinations = 0
    
    def track(self, response: str, context: str, user_feedback: str = None):
        self.total_responses += 1
        
        # Automated check
        nli_result = check_claim_against_context(response, context)
        if nli_result["contradicted"]:
            self.flagged_responses += 1
        
        # User feedback
        if user_feedback == "incorrect":
            self.confirmed_hallucinations += 1
    
    @property
    def hallucination_rate(self) -> float:
        return self.confirmed_hallucinations / max(self.total_responses, 1)

Key Takeaways

Hallucination is a fundamental LLM limitation, not a bug
RAG is the most effective mitigation strategy — ground responses in evidence
Self-consistency and NLI-based detection catch most hallucinations
Low temperature + constrained prompts reduce but don't eliminate fabrication
Track hallucination rate as a key production metric
Always flag uncertain responses to users

RAG Systems — Grounding responses in evidence
Safety and Red-teaming — Testing for hallucination vulnerabilities
Evaluation Metrics — Measuring factuality

Related docs

Deployment Strategies for Production

Serving LLMs in production — API design, autoscaling, load balancing, monitoring, and reliability patterns for high-availability model serving

Prompt Engineering Architecture Patterns

Reference patterns, tradeoffs, and building blocks for designing prompt engineering systems.

Prompt Engineering Cost and Performance

How to trade off latency, throughput, quality, and spend when operating prompt engineering.