Hallucination
Hallucination is when an LLM produces confident output that is factually wrong, fabricated, or inconsistent with its source material. The classic example is citing a paper that doesn't exist or calling a library function with a plausible-but-imaginary signature. In coding, hallucinations often show up as invented APIs, wrong argument orders, or references to packages that aren't installed.
Why it matters
Hallucination is the primary reliability failure mode of AI tools. For agentic coding, a hallucinated import or API call usually breaks immediately when the agent runs the tests — which is actually the saving grace: tight feedback loops catch hallucinations fast. Tools like Claude Code, Codex CLI, Qwen Code, and Kimi CLI all self-correct when a test or compiler rejects the hallucinated output.
The dangerous cases are hallucinations the system doesn't catch: plausible wrong docs, invented facts, or silently incorrect logic that happens to pass the tests you wrote. This is where human review habits still matter.
How it works
Hallucination is a natural consequence of how LLMs work: they predict the next token from training-data patterns, without a built-in mechanism to distinguish "I know this" from "this sounds like the kind of thing that would go here." The model has no ground-truth database — only weights.
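A toy illustration of that mechanism (the vocabulary and probabilities here are invented, not from any real model): generation always emits the statistically most plausible continuation, and nothing in the loop checks whether the result refers to something real.

```python
# Toy "language model": a lookup table of next-token probabilities.
# Invented for illustration -- real LLMs use learned weights, not tables.
toy_lm = {
    ("import",): {"numpy": 0.6, "pandas": 0.3, "numpyro_utils": 0.1},
    ("import", "numpy"): {"as": 0.9, "\n": 0.1},
}

def next_token(context):
    """Return the most probable continuation -- plausibility, not truth.

    Nothing here asks "does this package exist?"; the model only knows
    which token tends to follow the context in its training data.
    """
    dist = toy_lm[tuple(context)]
    return max(dist, key=dist.get)

print(next_token(["import"]))
```

The point of the sketch: a low-probability but syntactically plausible token like `numpyro_utils` can still be sampled, and the model has no mechanism to flag it as fabricated.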
Factors that increase hallucination:
- Long context windows filled with loosely related material
- Prompts that ask for specifics (names, dates, exact APIs) the model doesn't know
- Low-quality or contradictory training data in the domain
- Very small or heavily quantized models
Factors that reduce it:
- RAG with authoritative sources
- Tool use so the model can check facts instead of guessing
- Strong system prompts telling the model to say "I don't know"
- Fast verification loops (run tests, lint, type-check)
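The last item, fast verification loops, can be sketched as a generate-verify-retry cycle. Everything below is illustrative: `ask_model` and `run_tests` are hypothetical callables standing in for a real LLM call and a real test runner (e.g. pytest invoked via subprocess); no specific agent's internals are implied.

```python
def generate_with_verification(ask_model, run_tests, task, max_attempts=3):
    """Ask the model for a patch, run the tests, and feed failures back.

    ask_model(prompt) -> patch string (hypothetical LLM call)
    run_tests(patch)  -> (passed: bool, output: str) (hypothetical runner)

    Hallucinated imports or API calls fail the tests immediately, so they
    get caught and corrected instead of shipping.
    """
    feedback = ""
    for _ in range(max_attempts):
        patch = ask_model(task + feedback)        # model proposes a change
        ok, output = run_tests(patch)             # verify instead of trusting
        if ok:
            return patch                          # tests accept the patch
        feedback = "\nTests failed:\n" + output   # evidence for the next try
    raise RuntimeError(f"no passing patch after {max_attempts} attempts")
```

The design choice worth noting: the failure output goes back into the prompt, so each retry is grounded in concrete evidence rather than another unverified guess.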
How it's used (managing it)
Practical mitigation in agentic coding:
- Let the agent run tests and iterate — hallucinations fail compilation
- Use `read_file` and `grep` tools aggressively so the model cites real code
- Require citations — "quote the exact line" forces the model to verify
- In plan mode, have the model state what it will check before editing
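A cheap complement to the habits above is to verify that a model-suggested API actually exists before it ever reaches your code. This is a minimal sketch using Python's standard `importlib`; the helper name `api_exists` is made up for illustration.

```python
import importlib

def api_exists(module_name, attr_path):
    """Check that a dotted API path (e.g. "json.dumps") really exists.

    Catches hallucinated imports and functions without running the full
    test suite: if the module or attribute is missing, return False.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for attr in attr_path.split("."):
        if not hasattr(obj, attr):
            return False
        obj = getattr(obj, attr)
    return True

print(api_exists("json", "dumps"))      # a real stdlib function
print(api_exists("json", "serialize"))  # plausible-sounding but invented
```

A check like this only confirms existence, not correct usage (argument order, types), so it complements rather than replaces the test loop.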
See /blog/catching-llm-hallucinations-in-code.
Related terms
- LLM — where hallucinations come from
- RAG — a standard mitigation
- Tool use — lets the model look things up
- Context window — long windows can both help and hurt
- Prompt engineering — skill to reduce hallucination
FAQ
Do frontier models still hallucinate?
Yes, just less. Modern Claude, GPT, and Qwen models hallucinate much less on well-known topics but still confidently invent details in long-tail domains.
Is hallucination fixable?
Not completely — it's rooted in how LLMs generate. But it's manageable. Combining retrieval, tools, and test-driven feedback pushes observable hallucination rates to low single-digit percentages on many coding workloads.