Context window
The context window is the maximum number of tokens an LLM can consider at once — the hard limit on how much conversation and code it can see.
The context window is the maximum number of tokens an LLM can consider in a single forward pass. It's a hard limit on everything the model sees at once — system prompt, conversation history, tool definitions, tool observations, and any retrieved snippets. When you exceed it, the client must truncate, summarize, or fail.
Why it matters
The context window is the single biggest practical constraint on agentic coding. Every file read, every bash output, every diff, and every human instruction consumes tokens from a budget that's often 200k or 1M tokens. On long tasks the window fills, and the agent either forgets earlier steps or wastes turns recompacting.
Models with larger windows — Kimi (via Kimi CLI), Claude (via Claude Code), Gemini — handle bigger codebases without losing state. Running several agents in a SpaceSpider grid layout, each with its own window, is a practical way to sidestep the limit: split the task, give each agent a bounded scope.
How it works
Every input character is first tokenized (see token). The model computes attention over all tokens in the window, so compute scales roughly quadratically with window length — which is why "cheap frontier model + long context" often isn't as cheap as the per-token price suggests.
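To make the quadratic scaling concrete, here is a back-of-the-envelope sketch. The model dimensions and the FLOP formula are illustrative assumptions, not any specific provider's numbers:

```python
def attention_flops(seq_len: int, d_model: int = 4096) -> float:
    """Rough FLOPs for one attention layer's score matrix:
    every token attends to every other token (seq_len**2 pairs),
    each pair costing ~2 * d_model multiply-adds."""
    return 2 * d_model * seq_len ** 2

short = attention_flops(40_000)
long = attention_flops(400_000)
# A 10x longer window costs ~100x the attention compute.
print(long / short)  # 100.0
```

This is why "cheap per-token pricing + very long context" can still be expensive: the per-token cost hides the fact that each token is attending to every other token in the window.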
The client manages what fits:
- Rolling truncation — drop the oldest messages
- Summarization — replace old turns with a compact summary
- RAG — retrieve only the most relevant chunks instead of dumping everything
- Caching — providers like Anthropic let you mark prefixes as cacheable so long system prompts don't recompute
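The first two strategies can be sketched in a few lines. This is a minimal illustration, assuming a crude chars/4 token heuristic; real clients use the provider's actual tokenizer and usually summarize dropped turns rather than discarding them outright:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def fit_window(messages: list[str], budget: int) -> list[str]:
    """Rolling truncation: always keep the system prompt (first message),
    then keep the newest turns and drop the oldest until we fit the budget."""
    system, turns = messages[0], messages[1:]
    remaining = budget - estimate_tokens(system)
    kept: list[str] = []
    for msg in reversed(turns):          # walk newest-first
        cost = estimate_tokens(msg)
        if cost > remaining:
            break                        # oldest remaining turns are dropped
        kept.append(msg)
        remaining -= cost
    return [system] + kept[::-1]         # restore chronological order
```

The design choice worth noting: the system prompt is pinned, so truncation only ever eats history. A summarizing client would replace the dropped prefix with one synthetic "summary" message instead of deleting it.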
How it's used
Practical context-window techniques:
- Keep system prompts tight so more room is left for the task
- Use subagents for scoped investigation so the parent's window stays clean
- Prefer targeted read_file calls over dumping whole directories via cat
- For very large repos, pair with embeddings + retrieval
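The "targeted reads" advice above can be sketched as a helper that returns only a line range rather than a whole file, so each tool observation costs a bounded number of tokens. The helper name and shape are hypothetical, not any particular CLI's tool:

```python
def read_slice(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) of a file,
    keeping the context cost bounded instead of paying for the
    entire file on every read."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return "".join(lines[start - 1:end])
```

An agent that reads 20 relevant lines per turn instead of 2,000-line files can run an order of magnitude longer before hitting the window.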
See /blog/managing-context-in-claude-code for deeper strategies.
Related terms
- Token — the atomic unit of the window
- LLM — where the window lives
- RAG — how to cheat the window size
- Subagent — the standard context-hygiene tool
- Hallucination — what overflowed context often causes
FAQ
Is a bigger context window always better?
No. Larger windows cost more (compute is quadratic) and performance can degrade on the "lost in the middle" problem — models over-weight the beginning and end and under-attend the middle. A compact, well-organized 40k window often beats a sprawling 400k one.
How do I check token usage during a session?
Most CLIs expose a status line or command showing the current token count (Claude Code has /context; Codex CLI offers something similar). Watch it on long sessions to decide when to compact or restart.