RAG
Retrieval-Augmented Generation (RAG) is a pattern where an LLM retrieves relevant content from an external data store at query time and conditions its response on that content. Instead of hoping the model remembers a fact from training, you fetch the fact, stuff it into the context window, and ask the model to answer using it.
Why it matters
RAG solves two problems: stale knowledge (training cutoffs are months to years old) and context limits (you can't fit a 10M-line codebase in any window). For agentic coding specifically, RAG lets you build a "talk to your codebase" experience where the agent fetches only the files or functions relevant to a query instead of reading everything.
Many AI coding tools use lightweight RAG under the hood. Claude Code, Codex CLI, and similar agents default to direct file reads but can pair with MCP servers that implement RAG over a repo, a wiki, or external docs. Running them in SpaceSpider doesn't change the retrieval story — it just hosts the CLI.
How it works
A typical RAG pipeline:
- Index — split source documents into chunks, compute an embedding for each, store in a vector database
- Query — embed the user's question, find the top-K nearest chunks
- Augment — construct a prompt that includes the retrieved chunks alongside the question
- Generate — call the LLM with that prompt
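The four steps above can be sketched end-to-end in a few dozen lines. This is a minimal illustration, not a production pipeline: a bag-of-words counter stands in for a real embedding model, a sorted list stands in for a vector database, and the final LLM call is left as a comment. All names here (`embed`, `index`, `retrieve`, `augment`) are illustrative, not from any library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase bag-of-words counts (a real system
    # would call an embedding model here).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def index(docs: list[str], chunk_size: int = 50) -> list[tuple[Counter, str]]:
    # Index: split each document into fixed-size word chunks, embed each,
    # and keep (embedding, chunk) pairs — the "vector database".
    store = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((embed(chunk), chunk))
    return store

def retrieve(store, question: str, k: int = 2) -> list[str]:
    # Query: embed the question and take the top-k nearest chunks.
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(item[0], q), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def augment(question: str, chunks: list[str]) -> str:
    # Augment: put the retrieved chunks ahead of the question in one prompt.
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

store = index([
    "The deploy script lives in scripts/deploy.sh and reads DEPLOY_ENV.",
    "Unit tests run with pytest; integration tests need Docker.",
])
question = "Where is the deploy script?"
prompt = augment(question, retrieve(store, question))
# Generate: pass `prompt` to the LLM of your choice.
```

The shape is the same whatever you swap in for the toy parts: only `embed`, the store, and the final model call change.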
Variations include hybrid search (combining keyword BM25 with vector similarity), reranking (a second model reorders retrieved chunks), and multi-hop retrieval (the model requests more chunks based on intermediate reasoning).
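To make the hybrid-search idea concrete, here is a hedged sketch that blends a toy term-frequency score with a toy set-overlap score, standing in for BM25 and vector similarity respectively. The `alpha` mixing knob and every function name here are assumptions for illustration, not any library's API.

```python
def keyword_score(chunk: str, query: str) -> float:
    # Stand-in for BM25: fraction of chunk words that are query terms.
    terms = set(query.lower().split())
    words = chunk.lower().split()
    return sum(words.count(t) for t in terms) / (len(words) or 1)

def jaccard(chunk: str, query: str) -> float:
    # Stand-in for vector similarity: word-set overlap.
    a, b = set(chunk.lower().split()), set(query.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def hybrid_rank(chunks, query, vector_score, alpha=0.5):
    # alpha balances keyword vs. vector relevance (an assumed knob that
    # real systems tune, or replace with reciprocal-rank fusion).
    scored = [(alpha * keyword_score(c, query)
               + (1 - alpha) * vector_score(c, query), c)
              for c in chunks]
    return [c for _, c in sorted(scored, reverse=True)]

ranked = hybrid_rank(
    ["install with pip install foo", "the cat sat on the mat"],
    "pip install",
    vector_score=jaccard,
)
```

A reranker would fit after `hybrid_rank`: take its top results and let a second model reorder them before they reach the prompt.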
Production RAG pipelines are surprisingly hard to tune. Chunk size, overlap, retrieval count, embedding model, and rerank strategy all affect quality.
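Chunk size and overlap, for instance, interact like this. A toy word-level chunker; the numbers are illustrative, not recommendations, and real splitters usually work on tokens or syntax rather than whitespace:

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    # Slide a window of `size` words, advancing by `size - overlap` each
    # step so consecutive chunks share `overlap` words of context.
    assert 0 <= overlap < size
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

# Small demo: size 4 with overlap 2 means each chunk repeats the last
# two words of the previous one; the tail chunk may be shorter.
pieces = chunk("a b c d e f", size=4, overlap=2)
```

Bigger chunks give the model more context per hit but dilute the embedding; more overlap reduces facts being split across a boundary at the cost of a larger index.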
How it's used
RAG use cases in developer tooling:
- "Ask my docs" — retrieve from a wiki, answer with citations
- Code search — semantic grep over a monorepo
- Agent tools that retrieve from company knowledge before editing
- Long-term memory — store past conversations as embeddings, retrieve relevant bits next time
For simpler cases, modern long-context models make naive RAG unnecessary — just pass the whole file. See /blog/rag-vs-long-context for the tradeoff.
Related terms
- Embedding — the vector form retrieval uses
- Vector database — where embeddings live
- Context window — what RAG populates
- MCP — a common way to expose RAG to coding agents
- LLM — what consumes the retrieved context
FAQ
Is RAG dead now that context windows are huge?
For some use cases, yes — you can just dump a 100k-token file and skip retrieval. But RAG still wins on cost, latency, and very large corpora (millions of tokens) that don't fit any window.
Do I need RAG for Claude Code?
Not for most work. Claude Code's built-in file tools do directed retrieval on demand. RAG matters when your target is outside the local filesystem — external docs, another system's data, long-term memory.