Anthropic API Pricing: The Real Cost of Claude for Coding Agents (2026)

Claude Opus 4.6 costs $5/$25 per million tokens. Sonnet 4.6 is $3/$15. Haiku 4.5 is $1/$5. But coding agents burn through tokens 10-50x faster than chat. Batch API saves 50%, prompt caching saves 90%, and Morph Compact cuts context size by 50-70%. Full cost breakdown for agent builders.

April 5, 2026 · 1 min read

Anthropic API Pricing by Model (April 2026)

All prices per million tokens (MTok). Anthropic offers three current-generation models, each at a different price-performance point. Pricing verified against platform.claude.com on April 5, 2026.

- Opus 4.6: $5 / $25 (input / output)
- Sonnet 4.6: $3 / $15 (input / output)
- Haiku 4.5: $1 / $5 (input / output)
| Model | Input (per MTok) | Output (per MTok) | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| Claude Haiku 3 | $0.25 | $1.25 | 200K |

The current generation (Opus 4.6, Sonnet 4.6, Haiku 4.5) represents a 67% cost reduction from earlier Opus models. Opus 4.1 cost $15/$75 per MTok. If you are still on Opus 4.1, switching to 4.6 cuts your bill by two-thirds with equal or better performance.

Fast Mode: 6x Premium

Opus 4.6 offers a fast mode at $30/$150 per MTok (6x standard rates). Output is significantly faster, but the cost adds up quickly for agent workloads, and fast mode is not available with the Batch API. For most coding agent use cases, standard Sonnet 4.6 at $3/$15 delivers better cost-per-task than Opus fast mode.

Why Coding Agents Cost More Than Chat

A chatbot sends a message, gets a response, and moves on. A coding agent reads files, calls tools, handles errors, retries, reads more files, and accumulates context across dozens of turns. The token consumption profile is fundamentally different.

Context Accumulation

Every file read, tool output, and error trace gets appended to the conversation. A 30-turn agent session on a medium codebase easily hits 200K-500K input tokens. At Opus 4.6 rates, 500K input tokens costs $2.50 per session before any output.
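The arithmetic is simple enough to sketch. A minimal helper, using the input rates from the table above (token counts are illustrative):

```python
# Input-side cost of resending accumulated context.
# Rates are $/MTok from the pricing table; token counts are illustrative.
OPUS_INPUT = 5.00  # Opus 4.6 input rate

def input_cost(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a given $/MTok rate."""
    return tokens * rate_per_mtok / 1_000_000

# A session whose context has grown to 500K tokens pays this on every
# subsequent turn, before a single output token is billed:
print(input_cost(500_000, OPUS_INPUT))  # 2.5
```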

Search Overhead

Cognition measured that 60% of coding agent compute goes to search and context retrieval, not code generation. The agent reads 10 files to edit 1. You pay input token rates for all 10 files, but only the edit generates value.

Error Recovery Loops

When a coding agent generates code that fails a test, it re-reads the error output, the failing test, the original file, and tries again. Each retry resends the full conversation history. Three retries on a 100K-token context adds 300K tokens of pure overhead.
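A sketch of the retry overhead, at Sonnet 4.6 input rates (the context size and retry count are the illustrative numbers from above):

```python
# Retry overhead: each failed attempt resends the full conversation
# history, so the extra tokens scale linearly with retries.
SONNET_INPUT = 3.00  # $/MTok, Sonnet 4.6 input

def retry_overhead(context_tokens: int, retries: int) -> tuple[int, float]:
    """Extra input tokens and dollars spent purely on resent context."""
    extra_tokens = context_tokens * retries
    return extra_tokens, extra_tokens * SONNET_INPUT / 1_000_000

tokens, dollars = retry_overhead(100_000, 3)
print(tokens, dollars)  # 300000 0.9
```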

Extended Thinking Overhead

Extended thinking tokens are billed as output tokens. A complex reasoning step might generate 2,000-5,000 thinking tokens before producing 500 tokens of visible output. At Opus 4.6 output rates ($25/MTok), thinking tokens on complex problems can 3-5x your output cost.

Anthropic reports that the average Claude Code user spends about $6/day, with 90% of users under $12/day. But these are averages across all use cases. Developers running autonomous agents on large codebases regularly hit $20-50/day. The difference comes from context length: a 30-turn session at 300K tokens per turn costs 10x more than a 30-turn session at 30K tokens per turn.

Batch API: 50% Off Everything

The Batch API processes requests asynchronously within a 24-hour window and charges half price on both input and output tokens. Every Claude model is eligible. No code changes beyond switching to the batch endpoint.

| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Haiku 3 | $0.25 | $0.125 | $1.25 | $0.625 |

For coding agents, batch processing works for non-interactive tasks: code review across a repository, bulk test generation, documentation generation, or offline analysis. You cannot use batch for interactive agent sessions where the developer is waiting for results.

Batch + Caching Stack

The batch discount stacks with prompt caching. A batch request with a cache hit on Opus 4.6 costs $0.25 per MTok input, down from $5.00 standard. That is a 95% reduction.
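The stacking is multiplicative, which a few lines make concrete (multipliers per the discounts above):

```python
# Discount stacking: batch halves the rate, and a cache hit pays 10%
# of whatever rate applies. The multipliers compose.
def effective_input_rate(base: float, batch: bool = False,
                         cache_hit: bool = False) -> float:
    rate = base
    if batch:
        rate *= 0.5   # Batch API: 50% off
    if cache_hit:
        rate *= 0.1   # prompt cache hit: 10% of input price
    return rate

# Opus 4.6 batch request with a cache hit: $5.00 -> $0.25 per MTok
print(effective_input_rate(5.00, batch=True, cache_hit=True))  # 0.25
```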

Prompt Caching: 90% Off Repeated Context

Prompt caching is the single most impactful cost lever for coding agents. System prompts, file contents, and conversation history that repeat across requests get cached. Subsequent cache hits cost 10% of the standard input price.

| Operation | Price Multiplier | Duration | Break-even |
|---|---|---|---|
| 5-minute cache write | 1.25x base input | 5 minutes | 1 cache hit |
| 1-hour cache write | 2x base input | 1 hour | 2 cache hits |
| Cache hit (read) | 0.1x base input | Same as write | Immediate savings |

Concrete numbers: consider a coding agent with a 50,000-token system prompt (instructions, repository map, coding conventions) on Sonnet 4.6. Standard cost per request for that prompt: $0.15. With 5-minute caching, the first request costs $0.1875 (write). Every subsequent request within 5 minutes costs $0.015 (hit). Over 20 requests in a session, total system prompt cost drops from $3.00 to $0.47.
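The session math checks out line by line:

```python
# 50K-token system prompt on Sonnet 4.6, 20 requests in one session,
# using the cache multipliers from the table above.
PROMPT_TOKENS = 50_000
SONNET_INPUT = 3.00  # $/MTok

base = PROMPT_TOKENS * SONNET_INPUT / 1_000_000  # $0.15 per request
write = base * 1.25                              # first request (cache write)
hit = base * 0.10                                # each subsequent cache hit

uncached = 20 * base        # $3.00 without caching
cached = write + 19 * hit   # ~$0.47 with caching (1 write + 19 hits)
```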

| Model | Standard Input | Cache Hit (0.1x) | Savings |
|---|---|---|---|
| Opus 4.6 | $5.00 | $0.50 | 90% |
| Sonnet 4.6 | $3.00 | $0.30 | 90% |
| Haiku 4.5 | $1.00 | $0.10 | 90% |
| Haiku 3 | $0.25 | $0.025 | 90% |

Automatic Caching

Add a single cache_control field at the top level of your API request and Anthropic manages cache breakpoints automatically as conversations grow. For fine-grained control, place cache_control on individual content blocks.
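A sketch of the fine-grained form, using the `cache_control` content-block shape from the Anthropic Messages API. The model id and prompt text are placeholders; verify the exact field layout against the current API reference before relying on it:

```python
# Sketch: a Messages API request body with an explicit cache breakpoint
# on a large, stable system block. Everything before the breakpoint is
# eligible for caching on subsequent requests.
request = {
    "model": "claude-sonnet-4-6",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<repository map, coding conventions, ~50K tokens>",
            "cache_control": {"type": "ephemeral"},  # cache up to here
        }
    ],
    "messages": [
        {"role": "user", "content": "Fix the failing test in auth.py."}
    ],
}
```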

Extended Thinking Token Costs

Extended thinking lets Claude reason step-by-step before producing a response. The thinking tokens are billed as output tokens at the same rate as visible output. There is no separate pricing tier for thinking.

| Scenario | Visible Output | Thinking Tokens | Total Output Cost (Opus 4.6) |
|---|---|---|---|
| Simple code edit | 500 tokens | 0 | $0.0125 |
| Complex refactor | 500 tokens | 2,000 tokens | $0.0625 |
| Multi-file architecture | 1,000 tokens | 5,000 tokens | $0.15 |
| Debugging with reasoning | 800 tokens | 4,000 tokens | $0.12 |

For coding agents, extended thinking is valuable on complex problems but expensive on routine ones. The thinking budget is configurable (minimum 1,024 tokens). Setting an appropriate budget per task type prevents runaway thinking costs. Simple file edits do not need 5,000 tokens of reasoning. Architecture decisions do.
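One way to enforce per-task budgets is a small lookup, sketched below. The `thinking` parameter shape (`{"type": "enabled", "budget_tokens": N}`, minimum 1,024) follows the Anthropic Messages API; the task-to-budget mapping itself is illustrative:

```python
# Sketch: per-task thinking budgets to prevent runaway thinking costs.
MIN_BUDGET = 1_024  # API-enforced minimum per the text above

BUDGETS = {
    "simple_edit": 0,        # skip extended thinking entirely
    "refactor": 2_048,
    "debugging": 4_096,
    "architecture": 8_192,
}

def thinking_param(task_type: str):
    """Return a `thinking` dict for the request, or None to omit it."""
    budget = BUDGETS.get(task_type, MIN_BUDGET)
    if budget == 0:
        return None  # routine tasks: no thinking tokens billed
    return {"type": "enabled", "budget_tokens": max(budget, MIN_BUDGET)}
```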

Reducing Context Size with Compaction

Prompt caching reduces the cost of repeated context. Compaction reduces the size of context itself. Different problem, complementary solution.

As a coding agent runs, its context window fills with file contents, tool outputs, error traces, and prior conversation turns. Much of this is redundant or no longer relevant. A file read from turn 3 that was edited in turn 15 is stale. An error trace from a failed attempt that was since fixed is noise.

Morph Compact compresses agent context at 33,000 tokens per second with 50-70% reduction. Every surviving line is character-for-character from the original input. No paraphrasing, no hallucinated details, no information synthesis. The model selects which lines to keep and discards the rest.

- Context size reduction: 50-70%
- Throughput: 33,000 tokens per second
- Hallucination risk: zero (extractive only)

The cost math: a coding agent session with 400K tokens of accumulated context, running on Sonnet 4.6. Without compaction, every subsequent turn sends 400K input tokens at $3/MTok = $1.20 per turn. After compaction to 160K tokens (60% reduction), each turn costs $0.48. Over a 20-turn continuation, that saves $14.40 in input costs alone.
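The same numbers, as code (Sonnet 4.6 input rate; the 400K-to-160K compaction is the worked example above):

```python
# Compaction savings: fewer input tokens per turn at the same rate.
SONNET_INPUT = 3.00  # $/MTok

def per_turn_input_cost(context_tokens: int) -> float:
    return context_tokens * SONNET_INPUT / 1_000_000

full = per_turn_input_cost(400_000)            # $1.20 per turn
compacted = per_turn_input_cost(160_000)       # $0.48 per turn
saved_over_20_turns = 20 * (full - compacted)  # $14.40
```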

| Metric | No Compaction | With Morph Compact | Savings |
|---|---|---|---|
| Context size (after 30 turns) | 400K tokens | 160K tokens | 60% |
| Input cost per turn (Sonnet 4.6) | $1.20 | $0.48 | $0.72/turn |
| 20-turn continuation cost | $24.00 | $9.60 | $14.40 |
| Stays under 200K threshold | No (long-context premium) | Yes (standard pricing) | Avoids 2x surcharge |

Avoiding the Long-Context Premium

Anthropic charges 2x input and 1.5x output for requests exceeding 200K input tokens. Compacting a 400K context to 160K keeps you under the threshold, avoiding the premium entirely. That is an additional 50% savings on top of the reduction from fewer tokens.
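A sketch of the threshold effect on input cost. Per the article, the 2x input multiplier applies once a request exceeds 200K input tokens (the 1.5x output premium is omitted here for brevity; verify the exact surcharge mechanics against Anthropic's pricing docs):

```python
# Long-context premium: crossing 200K input tokens doubles the input rate.
THRESHOLD = 200_000

def input_cost(tokens: int, base_rate: float) -> float:
    rate = base_rate * 2.0 if tokens > THRESHOLD else base_rate
    return tokens * rate / 1_000_000

over = input_cost(400_000, 3.00)   # $2.40 per turn at the premium rate
under = input_cost(160_000, 3.00)  # $0.48 per turn at the standard rate
```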

Anthropic vs OpenAI Pricing for Coding Agents

Per-token, OpenAI is generally cheaper. But per-task cost depends on how many tokens each model needs to complete the work, and how effectively you can cache and compress context.

| Factor | Claude Sonnet 4.6 | GPT-5.4 | Claude Opus 4.6 | GPT-5.2 |
|---|---|---|---|---|
| Input price (per MTok) | $3.00 | $2.50 | $5.00 | $1.75 |
| Output price (per MTok) | $15.00 | $15.00 | $25.00 | $14.00 |
| Batch discount | 50% | 50% | 50% | 50% |
| Cache hit discount | 90% off input | ~50% off input | 90% off input | ~50% off input |
| Context window | 200K (1M beta) | 128K | 200K (1M beta) | 128K |
| Extended thinking | Yes | N/A (use o3) | Yes | N/A (use o3) |

Two factors shift the math in Claude's favor for coding agents. First, Claude's prompt caching discount (90% off cache hits) is deeper than OpenAI's cached input discount (~50% off). For agents that reuse large system prompts and file contexts across turns, this makes Claude's effective input cost lower despite higher sticker price. Second, Claude's 200K-1M context window avoids the chunking overhead that 128K-limited models require on large codebases.

The counterargument: OpenAI's GPT-5.4-mini at $0.15/$0.60 is dramatically cheaper for simple tasks (test generation, docstrings, formatting). A tiered approach using Claude for complex reasoning and a cheaper model for routine tasks gives the best cost profile.

Cost Optimization Playbook

Four techniques stack to cut Anthropic API costs for coding agents. Each one is independent; combining all four gives the maximum reduction.

1. Right-size Your Model

Use Haiku 4.5 ($1/$5) for classification, linting, and simple edits. Use Sonnet 4.6 ($3/$15) for code generation and multi-step reasoning. Reserve Opus 4.6 ($5/$25) for architecture decisions and complex debugging. Most coding tasks do not need Opus. Morph routes between models automatically based on task complexity.
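A tiered router can be as simple as a lookup table. This is an illustrative sketch of the guidance above; the task labels and model names are placeholders, not official identifiers:

```python
# Illustrative tiered router: cheap model for routine work, mid-tier
# for generation, top-tier reserved for hard problems.
ROUTES = {
    "classify": "haiku-4.5",
    "lint": "haiku-4.5",
    "simple_edit": "haiku-4.5",
    "generate": "sonnet-4.6",
    "multi_step": "sonnet-4.6",
    "architecture": "opus-4.6",
    "complex_debug": "opus-4.6",
}

def pick_model(task: str) -> str:
    # default to the mid-tier model when the task type is unknown
    return ROUTES.get(task, "sonnet-4.6")
```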

2. Enable Prompt Caching

System prompts, repository maps, and coding conventions repeat on every request. Caching these at 10% of input price saves 85-90% on the static portion of your context. Break-even is one cache hit for 5-minute caching. If your agent makes more than one request per 5 minutes (it does), caching pays for itself immediately.

3. Compact Context Between Turns

After 20-30 turns, agent context is 60-80% stale information: superseded file reads, resolved errors, abandoned approaches. Morph Compact compresses context by 50-70% at 33,000 tok/s, keeping only relevant lines. This cuts per-turn input cost proportionally and can keep you under the 200K long-context premium threshold.

4. Batch Non-Interactive Work

Code review, test generation, documentation, and bulk analysis do not require real-time responses. Route these through the Batch API for a flat 50% discount. The 24-hour processing window is fine for CI/CD pipeline tasks that run overnight.

| Configuration | Effective Input Price per MTok (Sonnet 4.6) | Savings vs Standard |
|---|---|---|
| Standard | $3.00 | 0% |
| + Prompt caching (hit) | $0.30 | 90% |
| + Context compaction (60%) | $0.12 effective | 96% |
| + Batch processing | $0.06 effective | 98% |

The compaction row works differently from the others. Caching and batch reduce the per-token price. Compaction reduces the number of tokens. If you compact 300K tokens to 120K tokens, you send 120K tokens at whatever per-token rate applies. The $0.12 effective rate reflects sending 120K tokens at the $0.30 cache-hit rate, divided by the original 300K to get an effective per-MTok cost on the original context.
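The stacked rates reproduce directly from the multipliers:

```python
# Reproduce the stacked-savings table (Sonnet 4.6 input, $/MTok).
STANDARD = 3.00

cache_hit = STANDARD * 0.10          # $0.30: pay 10% on cache hits
compacted = cache_hit * (1 - 0.60)   # $0.12 effective: send 40% of tokens
batched = compacted * 0.50           # $0.06 effective: batch halves the rate

savings = 1 - batched / STANDARD     # 0.98, i.e. 98% vs standard
```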

Frequently Asked Questions

How much does the Anthropic API cost?

Per million tokens: Opus 4.6 costs $5 input / $25 output. Sonnet 4.6 costs $3 / $15. Haiku 4.5 costs $1 / $5. Haiku 3 (still available) costs $0.25 / $1.25. Batch processing gives 50% off all models. Prompt caching gives 90% off repeated input. These discounts stack.

How much does a Claude coding agent cost per day?

Anthropic reports an average of $6/day per developer for Claude Code, with 90% of users under $12/day. Heavy autonomous agent usage on large codebases can reach $20-50/day. The main variable is context length per turn: sessions that accumulate 300K+ tokens per request cost 10x more than sessions that stay under 30K through compaction and good context management.

What is the Batch API discount?

50% off both input and output tokens. Requests are processed asynchronously within 24 hours. Every Claude model is eligible. The discount stacks with prompt caching. Opus 4.6 batch + cache hit costs $0.25 per MTok input, down from $5.00 standard. That is 95% savings.

How does prompt caching work?

You mark portions of your prompt for caching. The first request pays a write premium (1.25x for 5-minute cache, 2x for 1-hour cache). Subsequent requests within the cache window read from cache at 10% of the standard input price. For coding agents, system prompts and file contexts are natural candidates. Break-even is one cache hit on 5-minute caching.

Is Claude cheaper than OpenAI for coding?

Per-token, OpenAI is cheaper at most tiers. GPT-5.4 costs $2.50/$15 vs Claude Sonnet 4.6 at $3/$15. But Claude's prompt caching (90% off) is deeper than OpenAI's (~50% off), and Claude's larger context window (200K-1M) avoids chunking overhead. Per-task cost depends on your workload. For long-context coding with heavy caching, Claude is often cheaper. For short, simple tasks, OpenAI wins. See our detailed comparison.

What are Anthropic's rate limit tiers?

Four tiers based on deposit amount. Tier 1 ($5 deposit): 50 RPM. Tier 2 ($40): 1,000 RPM. Tier 3 ($200): 2,000 RPM. Tier 4 ($400): 4,000 RPM and access to the 1M token context window. Cached tokens do not count against input token-per-minute limits, so caching effectively multiplies your throughput.

How does extended thinking affect cost?

Thinking tokens are billed as output tokens at output rates. On Opus 4.6 ($25/MTok output), a response with 2,000 thinking tokens and 500 visible output tokens costs $0.0625. Without thinking, the same visible output costs $0.0125. The budget is configurable (minimum 1,024 tokens). Set appropriate limits per task type to control cost.

How does Morph Compact reduce Claude API costs?

Morph Compact compresses agent context by 50-70% before sending it to Claude. Fewer input tokens = lower cost, proportionally. A 400K context compacted to 160K saves 60% on input costs every turn. It also keeps requests under the 200K long-context threshold, avoiding the 2x input premium. Compact processes at 33,000 tok/s with zero hallucination risk because it is extractive (every surviving line is verbatim from the original). See how Compact works.

Reduce your Claude API costs with Morph Compact

Morph Compact compresses coding agent context by 50-70%, cutting your Anthropic API bill proportionally. 33,000 tok/s. Extractive only. No hallucination risk.