Definition

Token

A token is the atomic unit an LLM processes. It's a short piece of text — sometimes a whole word, often a subword fragment, sometimes a single character or symbol — produced by a tokenizer that splits input according to a learned vocabulary. The model sees tokens, not characters, and pricing, context limits, and rate limits are all measured in them.

Why it matters

When a provider says "200k context" they mean tokens, not characters. When they charge "$3 per million input tokens," that's the unit. A rough rule for English: 1 token ≈ 4 characters ≈ 0.75 words. Code can tokenize more densely (runs of operators and whitespace) or more sparsely (long identifiers that collapse into few tokens). Non-English languages often tokenize less efficiently — a paragraph of Chinese or Arabic may take 2-3× more tokens than the equivalent English.
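
The rule of thumb above can be turned into a quick estimator. This is a heuristic sketch only: the 4-chars-per-token divisor and the $3-per-million rate are the figures quoted above, not exact values for any particular model.

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count via the ~4 chars/token rule of thumb (English)."""
    return -(-len(text) // 4)  # ceiling division

def estimate_cost_usd(text: str, usd_per_million_tokens: float = 3.0) -> float:
    """Estimate input cost at a given per-million-token rate."""
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens

prompt = "Summarize the attached log file in three bullet points."
print(estimate_tokens(prompt), "tokens, approx $", estimate_cost_usd(prompt))
```

Real counts differ by model and language, so treat this as a budgeting aid, not a bill.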

For agentic coding, tokens are the currency you spend. Every file read, every tool output, every reasoning trace uses tokens from the context window and bills against the API. Efficient prompting and tight tool outputs save real money.

How it works

Most modern LLMs use byte-pair encoding (BPE) or a variant like SentencePiece. The tokenizer is trained on a large corpus: frequent sequences ("ing", "the", "print") become single tokens; rare ones get split into more pieces. Every model has its own vocabulary — Claude's tokenizer differs from GPT's, which differs from Qwen's — so token counts aren't directly comparable across providers.
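
A toy version of BPE training, under drastically simplified assumptions (character-level start, no byte fallback, no pre-tokenization — real tokenizers add all of these): repeatedly fuse the most frequent adjacent pair, exactly as described above.

```python
from collections import Counter

def bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules: repeatedly fuse the most frequent adjacent pair."""
    # Start with each word as a sequence of single characters.
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the new merge rule everywhere.
        for w in words:
            i = 0
            while i < len(w) - 1:
                if (w[i], w[i + 1]) == best:
                    w[i:i + 2] = [merged]
                else:
                    i += 1
    return merges

print(bpe_merges(["printing", "printer", "print"], 4))
```

On this tiny corpus the frequent fragment "print" assembles itself out of merges within a few steps, which is how common strings end up as single tokens in a trained vocabulary.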

Example (GPT-style tokenizer):

  • "hello" — 1 token
  • "hello world" — 2 tokens
  • "antidisestablishmentarianism" — 5-6 tokens
  • "SpaceSpider" — 3 tokens (Space, Sp, ider)
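
Splits like these can be mimicked with a greedy longest-match sketch over a hypothetical toy vocabulary (real vocabularies hold tens of thousands of entries, and real BPE encoding applies merge rules rather than pure longest-match):

```python
def encode(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring first, fall back to a single character.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical toy vocabulary; real ones hold ~50k-200k entries.
vocab = {"hello", " world", "print", "ing", "Space", "Sp", "ider"}
print(encode("hello world", vocab))   # ['hello', ' world'] -> 2 tokens
print(encode("SpaceSpider", vocab))   # ['Space', 'Sp', 'ider'] -> 3 tokens
```

Anything not in the vocabulary degrades to single characters, which is why rare or invented words cost more tokens.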

Providers usually ship a count_tokens endpoint or SDK helper so you can estimate cost before sending.

How it's used

Practical token awareness:

  • Reading 10k lines of verbose log probably costs more than reading the 500 lines that matter
  • Minified code can use more tokens per character than formatted code, counterintuitively, because the tokenizer is trained mostly on conventionally spaced source
  • Emoji and exotic Unicode can explode token counts — avoid in prompts unless necessary
  • Caching long system prompts saves input tokens on every subsequent call

Related terms

  • LLM — the consumer of tokens
  • Context window — measured in tokens
  • Embedding — a vector representation derived from tokens, not tokens themselves
  • Hallucination — unrelated to tokenization, but another core LLM concept
  • RAG — reduces token cost by retrieving only what's needed

FAQ

How many tokens is my repo?

Roughly (total source chars) / 4. A 200k-line Python project might be 1-3M tokens, far above most windows — which is why embeddings and RAG exist.
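
That estimate can be sketched over a source tree in a few lines; the // 4 divisor is the heuristic from above, and the extension filter and error handling are illustrative assumptions.

```python
import os

def estimate_repo_tokens(root: str, exts: tuple[str, ...] = (".py",)) -> int:
    """Estimate total tokens in a source tree via the chars/4 rule of thumb."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # unreadable file: skip it
    return total_chars // 4

# Usage: estimate_repo_tokens("path/to/repo", exts=(".py", ".ts"))
```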

Does the model "understand" tokens or characters?

Tokens. The model operates on token IDs rather than raw characters; every input it processes has already been tokenized. That is why models often struggle with character-level tasks like counting the letters in a word.
