Token
A token is the atomic unit an LLM processes. It's a short piece of text — sometimes a whole word, often a subword fragment, sometimes a single character or symbol — produced by a tokenizer that splits input according to a learned vocabulary. The model sees tokens, not characters, and pricing, context limits, and rate limits are all measured in them.
Why it matters
When a provider says "200k context" they mean tokens, not characters. When they charge "$3 per million input tokens," that's the unit. A rough rule for English: 1 token ≈ 4 characters ≈ 0.75 words. Code varies: runs of operators and unusual identifiers can split into many tokens, while common keywords and idioms compress well. Non-English languages often tokenize worse — a paragraph of Chinese or Arabic may take 2-3× more tokens than the equivalent English.
For agentic coding, tokens are the currency you spend. Every file read, every tool output, every reasoning trace uses tokens from the context window and bills against the API. Efficient prompting and tight tool outputs save real money.
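The ~4-characters-per-token rule above is enough for back-of-envelope budgeting before you ever call an API. A minimal sketch, assuming the article's example rate of $3 per million input tokens (the helper names and constants here are illustrative, not any provider's API):

```python
# Rough English heuristic from the rule of thumb above: ~4 chars per token.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Ballpark token count; real tokenizers vary by model and language."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def estimate_input_cost(text: str, usd_per_million_tokens: float = 3.0) -> float:
    """Estimate input cost at a given per-million-token rate."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

# A 4,000-character file is roughly 1,000 tokens.
doc = "a" * 4000
print(estimate_tokens(doc))        # ~1000 tokens
print(estimate_input_cost(doc))    # ~$0.003 at $3/M input tokens
```

For anything billing-critical, use the provider's own token counter instead — the heuristic can be off by 2× for code or non-English text.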
How it works
Most modern LLMs use byte-pair encoding (BPE) or a variant like SentencePiece. The tokenizer is trained on a large corpus: frequent sequences ("ing", "the", "print") become single tokens; rare ones get split into more pieces. Every model has its own vocabulary — Claude's tokenizer differs from GPT's, which differs from Qwen's — so token counts aren't directly comparable across providers.
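The core BPE idea — frequent adjacent pairs get fused into single vocabulary entries — can be shown in a toy character-level trainer. This is a teaching sketch, not any provider's real tokenizer (production BPE works on bytes, with huge corpora and vocabularies, plus pre-tokenization rules):

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules: repeatedly fuse the most frequent adjacent pair."""
    tokens = list(corpus)  # start from characters (real BPE starts from bytes)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing worth merging
        merges.append((a, b))
        tokens = apply_merge(tokens, a, b)
    return merges

def apply_merge(tokens: list[str], a: str, b: str) -> list[str]:
    """Replace every adjacent (a, b) pair with the fused token a+b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def tokenize(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize by replaying the learned merges in training order."""
    tokens = list(text)
    for a, b in merges:
        tokens = apply_merge(tokens, a, b)
    return tokens

merges = train_bpe("aaabdaaabac", 3)
print(tokenize("aaab", merges))  # the frequent run collapses to one token
print(tokenize("abc", merges))   # rare text stays split into characters
```

The key property to notice: strings the trainer saw often become one token, while unfamiliar strings fall back to many small pieces — exactly why common words are cheap and rare words are expensive.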
Example (GPT-style tokenizer):
- "hello" — 1 token
- "hello world" — 2 tokens
- "antidisestablishmentarianism" — 5-6 tokens
- "SpaceSpider" — 3 tokens (
Space,Sp,ider)
Providers usually ship a count_tokens endpoint or SDK helper so you can estimate cost before sending.
How it's used
Practical token awareness:
- Reading 10k lines of verbose log probably costs more than reading the 500 lines that matter
- Minified code often takes more tokens per character than formatted code, because the tokenizer was trained on normally formatted source and its vocabulary expects ordinary whitespace
- Emoji and exotic Unicode can explode token counts — avoid in prompts unless necessary
- Caching long system prompts saves input tokens on every subsequent call
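The first point above — send the 500 lines that matter, not the 10k-line log — is easy to operationalize. A minimal sketch (the function name and defaults are illustrative, not from any SDK):

```python
def trim_tool_output(log: str, keyword: str = "ERROR",
                     context: int = 2, max_lines: int = 500) -> str:
    """Keep only lines near keyword matches instead of the whole log."""
    lines = log.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if keyword in line:
            # include a few lines of surrounding context for each match
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    selected = [lines[i] for i in sorted(keep)][:max_lines]
    return "\n".join(selected)

# 100 noisy lines with one failure: only ~5 lines reach the model.
noisy = ["INFO: ok"] * 100
noisy[50] = "ERROR: connection refused"
trimmed = trim_tool_output("\n".join(noisy))
print(len(trimmed.splitlines()))  # 5 lines instead of 100
```

At a rough 4 characters per token, shrinking a tool result from 10,000 lines to 500 is a ~95% cut in that call's input cost.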
Related terms
- LLM — the consumer of tokens
- Context window — measured in tokens
- Embedding — a vector form, not tokens per se
- Hallucination — not token-specific, but another core LLM concept
- RAG — reduces token cost by retrieving only what's needed
FAQ
How many tokens is my repo?
Roughly (total source chars) / 4. A 200k-line Python project might be 1-3M tokens, far above most windows — which is why embeddings and RAG exist.
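The chars/4 estimate is simple to script across a repo. A minimal sketch using only the standard library (the extension filter and function name are illustrative choices):

```python
from pathlib import Path

def estimate_repo_tokens(root: str, exts=(".py", ".md", ".txt")) -> int:
    """Sum source characters under root and divide by ~4 chars per token."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            # ignore undecodable bytes rather than crash on binary-ish files
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // 4

# print(f"~{estimate_repo_tokens('.'):,} tokens")
```

Run it at a repo root to see instantly whether the codebase fits in a model's window or needs retrieval to narrow it down first.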
Does the model "understand" tokens or characters?
Tokens. The model has never seen raw characters during training; every input it processes has been tokenized. This is why models sometimes struggle with character-level tasks like counting letters in a word.