Anthropic API Pricing by Model (April 2026)
All prices are per million tokens (MTok). Anthropic offers three current-generation models (Opus 4.6, Sonnet 4.6, Haiku 4.5), each at a different price-performance point; older models remain available at their original prices. Pricing verified against platform.claude.com on April 5, 2026.
| Model | Input (per MTok) | Output (per MTok) | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| Claude Haiku 3 | $0.25 | $1.25 | 200K |
Current-generation Opus pricing ($5/$25) is a 67% cost reduction from Opus 4.1, which cost $15/$75 per MTok. If you are still on Opus 4.1, switching to 4.6 cuts the Opus portion of your bill by two-thirds with equal or better performance.
Fast Mode: 6x Premium
Opus 4.6 offers a fast mode at $30/$150 per MTok (6x standard rates). Significantly faster output, but the cost adds up quickly for agent workloads. Fast mode is not available with the Batch API. For most coding agent use cases, standard Sonnet 4.6 at $3/$15 delivers better cost-per-task than Opus fast mode.
Why Coding Agents Cost More Than Chat
A chatbot sends a message, gets a response, and moves on. A coding agent reads files, calls tools, handles errors, retries, reads more files, and accumulates context across dozens of turns. The token consumption profile is fundamentally different.
Context Accumulation
Every file read, tool output, and error trace gets appended to the conversation. A 30-turn agent session on a medium codebase easily hits 200K-500K input tokens. At Opus 4.6 rates, 500K input tokens costs $2.50 per session before any output.
Search Overhead
Cognition measured that 60% of coding agent compute goes to search and context retrieval, not code generation. The agent reads 10 files to edit 1. You pay input token rates for all 10 files, but only the edit generates value.
Error Recovery Loops
When a coding agent generates code that fails a test, it re-reads the error output, the failing test, the original file, and tries again. Each retry resends the full conversation history. Three retries on a 100K-token context adds 300K tokens of pure overhead.
Extended Thinking Overhead
Extended thinking tokens are billed as output tokens. A complex reasoning step might generate 2,000-5,000 thinking tokens before producing 500 tokens of visible output. At Opus 4.6 output rates ($25/MTok), that puts total billable output at 2,500-5,500 tokens for 500 tokens of visible answer, multiplying the output cost of that response by 5-11x.
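A quick sketch of that multiplier, using the rates quoted in this article (token counts are illustrative):

```python
# Thinking tokens bill at output rates, so they inflate output cost directly.
OPUS_OUTPUT_PER_MTOK = 25.00  # $/MTok, Opus 4.6 output (quoted above)

def output_cost(visible_tokens: int, thinking_tokens: int = 0) -> float:
    """Total output cost: visible and thinking tokens bill at the same rate."""
    return (visible_tokens + thinking_tokens) / 1_000_000 * OPUS_OUTPUT_PER_MTOK

plain = output_cost(500)            # $0.0125, no extended thinking
reasoned = output_cost(500, 2_000)  # $0.0625, with 2,000 thinking tokens
print(f"{reasoned / plain:.0f}x")   # 5x
```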
Anthropic reports that the average Claude Code user spends about $6/day, with 90% of users under $12/day. But these are averages across all use cases. Developers running autonomous agents on large codebases regularly hit $20-50/day. The difference comes from context length: a 30-turn session at 300K tokens per turn costs 10x more than a 30-turn session at 30K tokens per turn.
Batch API: 50% Off Everything
The Batch API processes requests asynchronously within a 24-hour window and charges half price on both input and output tokens. Every Claude model is eligible. No code changes beyond switching to the batch endpoint.
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Haiku 3 | $0.25 | $0.125 | $1.25 | $0.625 |
For coding agents, batch processing works for non-interactive tasks: code review across a repository, bulk test generation, documentation generation, or offline analysis. You cannot use batch for interactive agent sessions where the developer is waiting for results.
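A minimal sketch of what a batch submission might look like with the Python SDK's Message Batches API. The model identifier and the review prompt are assumptions; the `custom_id`/`params` request shape follows Anthropic's documented batch format:

```python
# Sketch: building a Message Batches request list for non-interactive code review.
# Submitting requires an API key and the anthropic SDK, roughly:
#   client = anthropic.Anthropic()
#   client.messages.batches.create(requests=requests)
files_to_review = ["src/auth.py", "src/billing.py"]

requests = [
    {
        "custom_id": f"review-{i}",
        "params": {
            "model": "claude-sonnet-4-6",  # assumed model identifier
            "max_tokens": 2048,
            "messages": [
                {"role": "user", "content": f"Review {path} for bugs."}
            ],
        },
    }
    for i, path in enumerate(files_to_review)
]

print(len(requests), requests[0]["custom_id"])  # 2 review-0
```

Results come back asynchronously within the 24-hour window, keyed by `custom_id`, so the pipeline can match each review to its file.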
Batch + Caching Stack
The batch discount stacks with prompt caching. A batch request with a cache hit on Opus 4.6 costs $0.25 per MTok input, down from $5.00 standard. That is a 95% reduction.
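The stacking works multiplicatively. A sketch with the Opus 4.6 rates quoted above:

```python
# Discount stacking on Opus 4.6 input: caching first, then the batch discount.
base = 5.00                  # $/MTok, standard input
cache_hit = base * 0.10      # prompt caching: cache hits cost 10% of standard
batched = cache_hit * 0.50   # batch API: 50% off on top of that
print(batched)  # 0.25
```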
Prompt Caching: 90% Off Repeated Context
Prompt caching is the single most impactful cost lever for coding agents. System prompts, file contents, and conversation history that repeat across requests get cached. Subsequent cache hits cost 10% of the standard input price.
| Operation | Price Multiplier | Duration | Break-even |
|---|---|---|---|
| 5-minute cache write | 1.25x base input | 5 minutes | 1 cache hit |
| 1-hour cache write | 2x base input | 1 hour | 2 cache hits |
| Cache hit (read) | 0.1x base input | Same as write | Immediate savings |
Concrete numbers: a coding agent with a 50,000-token system prompt (instructions, repository map, coding conventions) on Sonnet 4.6. Standard cost per request for that prompt: $0.15. With 5-minute caching, the first request costs $0.1875 (write). Every subsequent request within 5 minutes costs $0.015 (hit). Over 20 requests in a session, total system prompt cost drops from $3.00 to $0.47.
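Those figures can be reproduced in a few lines (prices and cache multipliers as quoted in this article; the 20-request session is the worked example above):

```python
# Caching economics for a 50K-token system prompt on Sonnet 4.6.
SONNET_INPUT = 3.00  # $/MTok standard input
prompt_tokens = 50_000

standard = prompt_tokens / 1_000_000 * SONNET_INPUT  # $0.15 per request
write = standard * 1.25                              # first request: 5-min cache write
hit = standard * 0.10                                # each subsequent cache hit

uncached_total = 20 * standard      # 20 requests, no caching
cached_total = write + 19 * hit     # 1 write + 19 hits
print(f"${uncached_total:.2f} vs ${cached_total:.2f}")  # $3.00 vs $0.47
```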
| Model | Standard Input | Cache Hit (0.1x) | Savings |
|---|---|---|---|
| Opus 4.6 | $5.00 | $0.50 | 90% |
| Sonnet 4.6 | $3.00 | $0.30 | 90% |
| Haiku 4.5 | $1.00 | $0.10 | 90% |
| Haiku 3 | $0.25 | $0.025 | 90% |
Automatic Caching
Add a single cache_control field at the top level of your API request and Anthropic manages cache breakpoints automatically as conversations grow. For fine-grained control, place cache_control on individual content blocks.
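For the fine-grained form, a request body might look like the following sketch. The model identifier and prompt text are placeholders; marking a content block with `cache_control: {"type": "ephemeral"}` is the documented way to set a cache breakpoint:

```python
# Sketch: marking a large, stable system prompt as cacheable.
# Everything up to and including the block carrying cache_control is cached;
# the per-turn user message below it is not.
request_body = {
    "model": "claude-sonnet-4-6",  # assumed model identifier
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<repository map, coding conventions, ~50K tokens>",
            "cache_control": {"type": "ephemeral"},  # 5-minute cache by default
        }
    ],
    "messages": [
        {"role": "user", "content": "Rename get_user to fetch_user in auth.py."}
    ],
}

print(request_body["system"][0]["cache_control"]["type"])  # ephemeral
```

The 1-hour cache uses the same block with a longer TTL (`{"type": "ephemeral", "ttl": "1h"}` in Anthropic's current caching docs), at the 2x write premium from the table above.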
Extended Thinking Token Costs
Extended thinking lets Claude reason step-by-step before producing a response. The thinking tokens are billed as output tokens at the same rate as visible output. There is no separate pricing tier for thinking.
| Scenario | Visible Output | Thinking Tokens | Total Output Cost (Opus 4.6) |
|---|---|---|---|
| Simple code edit | 500 tokens | 0 | $0.0125 |
| Complex refactor | 500 tokens | 2,000 tokens | $0.0625 |
| Multi-file architecture | 1,000 tokens | 5,000 tokens | $0.15 |
| Debugging with reasoning | 800 tokens | 4,000 tokens | $0.12 |
For coding agents, extended thinking is valuable on complex problems but expensive on routine ones. The thinking budget is configurable (minimum 1,024 tokens). Setting an appropriate budget per task type prevents runaway thinking costs. Simple file edits do not need 5,000 tokens of reasoning. Architecture decisions do.
Reducing Context Size with Compaction
Prompt caching reduces the cost of repeated context. Compaction reduces the size of context itself. Different problem, complementary solution.
As a coding agent runs, its context window fills with file contents, tool outputs, error traces, and prior conversation turns. Much of this is redundant or no longer relevant. A file read from turn 3 that was edited in turn 15 is stale. An error trace from a failed attempt that was since fixed is noise.
Morph Compact compresses agent context at 33,000 tokens per second with 50-70% reduction. Every surviving line is reproduced character-for-character from the original input: no paraphrasing, no hallucinated details, no information synthesis. The model selects which lines to keep and discards the rest.
The cost math: a coding agent session with 400K tokens of accumulated context, running on Sonnet 4.6. Without compaction, every subsequent turn sends 400K input tokens at $3/MTok = $1.20 per turn. After compaction to 160K tokens (60% reduction), each turn costs $0.48. Over a 20-turn continuation, that saves $14.40 in input costs alone.
| Metric | No Compaction | With Morph Compact | Savings |
|---|---|---|---|
| Context size (after 30 turns) | 400K tokens | 160K tokens | 60% |
| Input cost per turn (Sonnet 4.6) | $1.20 | $0.48 | $0.72/turn |
| 20-turn continuation cost | $24.00 | $9.60 | $14.40 |
| Stays under 200K threshold | No (long-context premium) | Yes (standard pricing) | Avoids 2x surcharge |
Avoiding the Long-Context Premium
Anthropic charges 2x input and 1.5x output for requests exceeding 200K input tokens. Compacting a 400K context to 160K keeps you under the threshold, avoiding the premium entirely. That is an additional 50% savings on input on top of the reduction from sending fewer tokens.
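The compaction and long-context math in one place, including the 2x input premium described above (prices as quoted in this article; token counts illustrative):

```python
# Per-turn input cost on Sonnet 4.6, with the long-context input premium applied
# to requests that exceed the 200K threshold.
SONNET_INPUT = 3.00            # $/MTok standard input
LONG_CONTEXT_THRESHOLD = 200_000
LONG_CONTEXT_MULTIPLIER = 2.0  # 2x input premium above 200K, per the text

def per_turn_input_cost(context_tokens: int) -> float:
    rate = SONNET_INPUT
    if context_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    return context_tokens / 1_000_000 * rate

uncompacted = per_turn_input_cost(400_000)  # 400K at 2x rates
compacted = per_turn_input_cost(160_000)    # 160K at standard rates
print(f"${uncompacted:.2f} -> ${compacted:.2f} per turn")  # $2.40 -> $0.48 per turn
```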
Anthropic vs OpenAI Pricing for Coding Agents
Per-token, OpenAI is generally cheaper. But per-task cost depends on how many tokens each model needs to complete the work, and how effectively you can cache and compress context.
| Factor | Claude Sonnet 4.6 | GPT-5.4 | Claude Opus 4.6 | GPT-5.2 |
|---|---|---|---|---|
| Input price (per MTok) | $3.00 | $2.50 | $5.00 | $1.75 |
| Output price (per MTok) | $15.00 | $15.00 | $25.00 | $14.00 |
| Batch discount | 50% | 50% | 50% | 50% |
| Cache hit discount | 90% off input | ~50% off input | 90% off input | ~50% off input |
| Context window | 200K (1M beta) | 128K | 200K (1M beta) | 128K |
| Extended thinking | Yes | N/A (use o3) | Yes | N/A (use o3) |
Two factors shift the math in Claude's favor for coding agents. First, Claude's prompt caching discount (90% off cache hits) is deeper than OpenAI's cached input discount (~50% off). For agents that reuse large system prompts and file contexts across turns, this makes Claude's effective input cost lower despite higher sticker price. Second, Claude's 200K-1M context window avoids the chunking overhead that 128K-limited models require on large codebases.
The counterargument: OpenAI's GPT-5.4-mini at $0.15/$0.60 is dramatically cheaper for simple tasks (test generation, docstrings, formatting). A tiered approach using Claude for complex reasoning and a cheaper model for routine tasks gives the best cost profile.
Cost Optimization Playbook
Four techniques stack to cut Anthropic API costs for coding agents. Each one is independent; combining all four gives the maximum reduction.
1. Right-size Your Model
Use Haiku 4.5 ($1/$5) for classification, linting, and simple edits. Use Sonnet 4.6 ($3/$15) for code generation and multi-step reasoning. Reserve Opus 4.6 ($5/$25) for architecture decisions and complex debugging. Most coding tasks do not need Opus. Morph routes between models automatically based on task complexity.
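A tiered routing policy can be as simple as a lookup. This is a hypothetical sketch: the task categories, model identifiers, and fallback choice are assumptions, not Morph's actual routing logic:

```python
# Hypothetical task-to-model router following the right-sizing guidance above.
MODEL_FOR_TASK = {
    "classify": "claude-haiku-4-5",     # $1/$5: classification, linting, simple edits
    "lint": "claude-haiku-4-5",
    "generate": "claude-sonnet-4-6",    # $3/$15: code generation, multi-step reasoning
    "refactor": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-6",  # $5/$25: architecture, complex debugging
    "debug_complex": "claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    # Default unknown task types to the mid-tier model.
    return MODEL_FOR_TASK.get(task_type, "claude-sonnet-4-6")

print(pick_model("lint"))  # claude-haiku-4-5
```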
2. Enable Prompt Caching
System prompts, repository maps, and coding conventions repeat on every request. Caching these at 10% of input price saves 85-90% on the static portion of your context. Break-even is one cache hit for 5-minute caching. If your agent makes more than one request per 5 minutes (it does), caching pays for itself immediately.
3. Compact Context Between Turns
After 20-30 turns, agent context is 60-80% stale information: superseded file reads, resolved errors, abandoned approaches. Morph Compact compresses context by 50-70% at 33,000 tok/s, keeping only relevant lines. This cuts per-turn input cost proportionally and can keep you under the 200K long-context premium threshold.
4. Batch Non-Interactive Work
Code review, test generation, documentation, and bulk analysis do not require real-time responses. Route these through the Batch API for a flat 50% discount. The 24-hour processing window is fine for CI/CD pipeline tasks that run overnight.
| Configuration | Price per MTok | Savings vs Standard |
|---|---|---|
| Standard | $3.00 | 0% |
| + Prompt caching (hit) | $0.30 | 90% |
| + Context compaction (60%) | $0.12 effective | 96% |
| + Batch processing | $0.06 effective | 98% |
The compaction row works differently from the others. Caching and batch reduce the per-token price. Compaction reduces the number of tokens. If you compact 300K tokens to 120K tokens, you send 120K tokens at whatever per-token rate applies. The $0.12 effective rate reflects sending 120K tokens at the $0.30 cache-hit rate, divided by the original 300K to get an effective per-MTok cost on the original context.
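That distinction is easy to encode: caching and batch scale the price, compaction scales the token count, and the "effective" rate divides total spend by the original (pre-compaction) tokens. A sketch using the Sonnet 4.6 rates from the table:

```python
# Effective per-MTok cost on the ORIGINAL context, stacking all three levers.
base = 3.00                   # $/MTok, Sonnet 4.6 standard input
cache_hit_rate = base * 0.10  # $0.30/MTok after prompt caching

def effective_rate(rate: float, compaction: float = 0.0, batch: bool = False) -> float:
    """Per-MTok cost measured against the pre-compaction token count."""
    rate *= (1 - compaction)  # compaction sends proportionally fewer tokens
    if batch:
        rate *= 0.5           # batch API halves the per-token price
    return rate

print(f"${effective_rate(cache_hit_rate, compaction=0.60):.2f}")              # $0.12
print(f"${effective_rate(cache_hit_rate, compaction=0.60, batch=True):.2f}")  # $0.06
```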
Frequently Asked Questions
How much does the Anthropic API cost?
Per million tokens: Opus 4.6 costs $5 input / $25 output. Sonnet 4.6 costs $3 / $15. Haiku 4.5 costs $1 / $5. Haiku 3 (still available) costs $0.25 / $1.25. Batch processing gives 50% off all models. Prompt caching gives 90% off repeated input. These discounts stack.
How much does a Claude coding agent cost per day?
Anthropic reports an average of $6/day per developer for Claude Code, with 90% of users under $12/day. Heavy autonomous agent usage on large codebases can reach $20-50/day. The main variable is context length per turn: sessions that accumulate 300K+ tokens per request cost 10x more than sessions that stay under 30K through compaction and good context management.
What is the Batch API discount?
50% off both input and output tokens. Requests are processed asynchronously within 24 hours. Every Claude model is eligible. The discount stacks with prompt caching. Opus 4.6 batch + cache hit costs $0.25 per MTok input, down from $5.00 standard. That is 95% savings.
How does prompt caching work?
You mark portions of your prompt for caching. The first request pays a write premium (1.25x for 5-minute cache, 2x for 1-hour cache). Subsequent requests within the cache window read from cache at 10% of the standard input price. For coding agents, system prompts and file contexts are natural candidates. Break-even is one cache hit on 5-minute caching.
Is Claude cheaper than OpenAI for coding?
Per-token, OpenAI is cheaper at most tiers. GPT-5.4 costs $2.50/$15 vs Claude Sonnet 4.6 at $3/$15. But Claude's prompt caching (90% off) is deeper than OpenAI's (~50% off), and Claude's larger context window (200K-1M) avoids chunking overhead. Per-task cost depends on your workload. For long-context coding with heavy caching, Claude is often cheaper. For short, simple tasks, OpenAI wins. See our detailed comparison.
What are Anthropic's rate limit tiers?
Four tiers based on deposit amount. Tier 1 ($5 deposit): 50 RPM. Tier 2 ($40): 1,000 RPM. Tier 3 ($200): 2,000 RPM. Tier 4 ($400): 4,000 RPM and access to the 1M token context window. Cached tokens do not count against input token-per-minute limits, so caching effectively multiplies your throughput.
How does extended thinking affect cost?
Thinking tokens are billed as output tokens at output rates. On Opus 4.6 ($25/MTok output), a response with 2,000 thinking tokens and 500 visible output tokens costs $0.0625. Without thinking, the same visible output costs $0.0125. The budget is configurable (minimum 1,024 tokens). Set appropriate limits per task type to control cost.
How does Morph Compact reduce Claude API costs?
Morph Compact compresses agent context by 50-70% before sending it to Claude. Fewer input tokens = lower cost, proportionally. A 400K context compacted to 160K saves 60% on input costs every turn. It also keeps requests under the 200K long-context threshold, avoiding the 2x input premium. Compact processes at 33,000 tok/s with zero hallucination risk because it is extractive (every surviving line is verbatim from the original). See how Compact works.
Reduce your Claude API costs with Morph Compact
Morph Compact compresses coding agent context by 50-70%, cutting your Anthropic API bill proportionally. 33,000 tok/s. Extractive only. No hallucination risk.