Fine-tuning
Fine-tuning adapts a pre-trained LLM to a specific task or domain by continuing training on a smaller, targeted dataset.
The base model already "knows" language from pretraining; fine-tuning continues training on a smaller, focused dataset so the model answers in a specific format, adopts a voice, or handles a narrow workload (like your company's ticketing system) better than generic prompting could.
Why it matters
Before fine-tuning, strong prompt engineering and RAG are cheaper, faster to iterate on, and don't lock you to a frozen model checkpoint. Most teams should exhaust those first. Fine-tuning wins when:
- The task has a consistent format you can't reliably prompt into
- Quality matters more than flexibility
- You need to shave tokens from each call by not re-specifying the task
- You're running a smaller self-hosted model and need it to follow instructions reliably
For agentic coding, most users never fine-tune — they use off-the-shelf Claude, GPT, or Qwen through CLIs like Claude Code, Codex CLI, and Qwen Code. Fine-tuning is usually the domain of platform teams building a specialized AI product.
How it works
Fine-tuning methods vary along two axes: how much of the model you update, and what objective you train on.
What you update, from most to least expensive:
- Full fine-tuning — update all model weights. Needs large GPUs and a lot of data. Rare outside labs.
- LoRA / QLoRA — train small low-rank adapter matrices on top of frozen base weights. Fast, cheap, runs on consumer hardware.
- Prefix / prompt tuning — learn soft prompt vectors rather than touching weights. Even lighter.
What you train on:
- Supervised fine-tuning (SFT) — standard next-token loss on input/output pairs.
- RLHF / DPO / ORPO — preference-based training on pairs of preferred and rejected responses.
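The LoRA idea above can be sketched in a few lines of NumPy: keep the pretrained weight frozen, train only two small low-rank matrices, and merge them back after training. Shapes, rank, and the single-matrix setup are toy illustrations; real adapters attach to attention and MLP projections throughout the transformer.

```python
import numpy as np

d_out, d_in, rank = 1024, 1024, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init so the update starts at 0

def forward(x):
    # Base path plus low-rank update; only A and B would receive gradients.
    return W @ x + B @ (A @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4f}")  # ~1.6% of full fine-tuning

# After training, the adapter can be merged for zero-overhead inference:
W_merged = W + B @ A
```

The trainable-parameter fraction is what makes LoRA cheap: at rank 8 here, you update roughly 1.6% of the weights you would touch with full fine-tuning, and the merged matrix behaves identically at inference time.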
After training, the adapter or full checkpoint is deployed. You call it like any other model.
How it's used
Typical fine-tuning projects:
- Domain-specific copilot — medical, legal, scientific text
- Structured output — consistent JSON shape without heavy schema prompts
- Style matching — corporate voice, documentation tone
- Small-model specialization — LoRA a 7B model into something useful for one narrow task
Most coding CLIs don't need fine-tuning — frontier base models already handle code well. If your team runs a private model, fine-tuning on internal code can be worthwhile.
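For a structured-output project like the ticketing example, the work is mostly dataset preparation. A minimal sketch of building SFT training data as JSONL, assuming a common prompt/completion schema (field names vary by training framework) and made-up ticket labels:

```python
import json

# Hypothetical labeled tickets; in a real project these labels are human-written.
tickets = [
    {"subject": "Password reset loop", "body": "User stuck on reset page.", "category": "auth"},
    {"subject": "Charged twice", "body": "Duplicate charge this month.", "category": "billing"},
]

def to_example(t):
    prompt = f"Classify this support ticket.\nSubject: {t['subject']}\nBody: {t['body']}\n"
    # The completion is the exact JSON shape you want the tuned model to emit.
    completion = json.dumps({"category": t["category"]})
    return {"prompt": prompt, "completion": completion}

# One JSON object per line — the JSONL format most training tooling expects.
lines = [json.dumps(to_example(t)) for t in tickets]
print(lines[0])
```

Because the target completions are always well-formed JSON in one fixed shape, the tuned model learns to emit that shape without the long schema description you would otherwise repeat in every prompt.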
Related terms
- LLM — the thing being tuned
- RAG — the cheaper-first alternative
- System prompt — another cheaper-first alternative
- Prompt engineering — always try this first
- Embedding — unrelated training, similar tooling
FAQ
Should I fine-tune for better code completion?
Rarely. Base frontier models already outperform most fine-tuned smaller ones for general coding. Fine-tune when you have a specific, repetitive format the base model won't respect even with strong prompting.
How much data do I need?
For LoRA on a narrow task, a few hundred high-quality examples often suffice. For broader behavior changes, thousands to tens of thousands. Data quality dominates quantity.
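Since quality dominates quantity, a hygiene pass over the training set matters more than collecting more examples. A sketch of a minimal cleanup step — the threshold here is an arbitrary illustration, not a recommendation:

```python
def clean(examples, min_completion_chars=10):
    """Drop exact duplicates and degenerate targets from SFT examples."""
    seen, kept = set(), []
    for ex in examples:
        key = (ex["prompt"], ex["completion"])
        if key in seen:
            continue  # exact duplicate
        if len(ex["completion"]) < min_completion_chars:
            continue  # near-empty target teaches the model nothing useful
        seen.add(key)
        kept.append(ex)
    return kept

raw = [
    {"prompt": "p1", "completion": "a real, useful answer"},
    {"prompt": "p1", "completion": "a real, useful answer"},  # duplicate
    {"prompt": "p2", "completion": "ok"},                     # too short
]
print(len(clean(raw)))  # 1
```

Counting "a few hundred examples" only after a pass like this gives a more honest picture of how much signal the dataset actually contains.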