Anthropic API Pricing by Model (April 2026)
All prices are per million tokens (MTok). Anthropic offers three current-generation models (Opus 4.6, Sonnet 4.6, Haiku 4.5), each at a different price-performance point; older models remain available at their original prices. Pricing verified against platform.claude.com on April 5, 2026.
| Model | Input (per MTok) | Output (per MTok) | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K (1M beta) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| Claude Haiku 3 | $0.25 | $1.25 | 200K |
Current-generation Opus pricing ($5/$25) is a 67% cost reduction from Opus 4.1, which cost $15/$75 per MTok. If you are still on Opus 4.1, switching to 4.6 cuts the Opus portion of your bill by two-thirds with equal or better performance.
Fast Mode: 6x Premium
Opus 4.6 offers a fast mode at $30/$150 per MTok (6x standard rates). Significantly faster output, but the cost adds up quickly for agent workloads. Fast mode is not available with the Batch API. For most coding agent use cases, standard Sonnet 4.6 at $3/$15 delivers better cost-per-task than Opus fast mode.
Why Coding Agents Cost More Than Chat
A chatbot sends a message, gets a response, and moves on. A coding agent reads files, calls tools, handles errors, retries, reads more files, and accumulates context across dozens of turns. The token consumption profile is fundamentally different.
Context Accumulation
Every file read, tool output, and error trace gets appended to the conversation. A 30-turn agent session on a medium codebase easily hits 200K-500K input tokens. At Opus 4.6 rates, 500K input tokens costs $2.50 per session before any output.
Search Overhead
Cognition measured that 60% of coding agent compute goes to search and context retrieval, not code generation. The agent reads 10 files to edit 1. You pay input token rates for all 10 files, but only the edit generates value.
Error Recovery Loops
When a coding agent generates code that fails a test, it re-reads the error output, the failing test, the original file, and tries again. Each retry resends the full conversation history. Three retries on a 100K-token context adds 300K tokens of pure overhead.
Extended Thinking Overhead
Extended thinking tokens are billed as output tokens. A complex reasoning step might generate 2,000-5,000 thinking tokens before producing 500 tokens of visible output. At Opus 4.6 output rates ($25/MTok), that puts total billable output at 2,500-5,500 tokens for 500 tokens of visible answer, multiplying the output cost of that response by 5-11x.
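A quick sketch of that multiplier, using the rates quoted in this article (token counts are illustrative):

```python
# Thinking tokens bill at output rates, so they inflate output cost directly.
OPUS_OUTPUT_PER_MTOK = 25.00  # $/MTok, Opus 4.6 output (quoted above)

def output_cost(visible_tokens: int, thinking_tokens: int = 0) -> float:
    """Total output cost: visible and thinking tokens bill at the same rate."""
    return (visible_tokens + thinking_tokens) / 1_000_000 * OPUS_OUTPUT_PER_MTOK

plain = output_cost(500)            # $0.0125, no extended thinking
reasoned = output_cost(500, 2_000)  # $0.0625, with 2,000 thinking tokens
print(f"{reasoned / plain:.0f}x")   # 5x
```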
Anthropic reports that the average Claude Code user spends about $6/day, with 90% of users under $12/day. But these are averages across all use cases. Developers running autonomous agents on large codebases regularly hit $20-50/day. The difference comes from context length: a 30-turn session at 300K tokens per turn costs 10x more than a 30-turn session at 30K tokens per turn.
Batch API: 50% Off Everything
The Batch API processes requests asynchronously within a 24-hour window and charges half price on both input and output tokens. Every Claude model is eligible. No code changes beyond switching to the batch endpoint.
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Haiku 3 | $0.25 | $0.125 | $1.25 | $0.625 |
For coding agents, batch processing works for non-interactive tasks: code review across a repository, bulk test generation, documentation generation, or offline analysis. You cannot use batch for interactive agent sessions where the developer is waiting for results.
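A minimal sketch of what a batch submission might look like with the Python SDK's Message Batches API. The model identifier and the review prompt are assumptions; the `custom_id`/`params` request shape follows Anthropic's documented batch format:

```python
# Sketch: building a Message Batches request list for non-interactive code review.
# Submitting requires an API key and the anthropic SDK, roughly:
#   client = anthropic.Anthropic()
#   client.messages.batches.create(requests=requests)
files_to_review = ["src/auth.py", "src/billing.py"]

requests = [
    {
        "custom_id": f"review-{i}",
        "params": {
            "model": "claude-sonnet-4-6",  # assumed model identifier
            "max_tokens": 2048,
            "messages": [
                {"role": "user", "content": f"Review {path} for bugs."}
            ],
        },
    }
    for i, path in enumerate(files_to_review)
]

print(len(requests), requests[0]["custom_id"])  # 2 review-0
```

Results come back asynchronously within the 24-hour window, keyed by `custom_id`, so the pipeline can match each review to its file.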
Batch + Caching Stack
The batch discount stacks with prompt caching. A batch request with a cache hit on Opus 4.6 costs $0.25 per MTok input, down from $5.00 standard. That is a 95% reduction.
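The stacking works multiplicatively. A sketch with the Opus 4.6 rates quoted above:

```python
# Discount stacking on Opus 4.6 input: caching first, then the batch discount.
base = 5.00                  # $/MTok, standard input
cache_hit = base * 0.10      # prompt caching: cache hits cost 10% of standard
batched = cache_hit * 0.50   # batch API: 50% off on top of that
print(batched)  # 0.25
```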
Prompt Caching: 90% Off Repeated Context
Prompt caching is the single most impactful cost lever for coding agents. System prompts, file contents, and conversation history that repeat across requests get cached. Subsequent cache hits cost 10% of the standard input price.
| Operation | Price Multiplier | Duration | Break-even |
|---|---|---|---|
| 5-minute cache write | 1.25x base input | 5 minutes | 1 cache hit |
| 1-hour cache write | 2x base input | 1 hour | 2 cache hits |
| Cache hit (read) | 0.1x base input | Same as write | Immediate savings |
Concrete numbers: a coding agent with a 50,000-token system prompt (instructions, repository map, coding conventions) on Sonnet 4.6. Standard cost per request for that prompt: $0.15. With 5-minute caching, the first request costs $0.1875 (write). Every subsequent request within 5 minutes costs $0.015 (hit). Over 20 requests in a session, total system prompt cost drops from $3.00 to $0.47.
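Those figures can be reproduced in a few lines (prices and cache multipliers as quoted in this article; the 20-request session is the worked example above):

```python
# Caching economics for a 50K-token system prompt on Sonnet 4.6.
SONNET_INPUT = 3.00  # $/MTok standard input
prompt_tokens = 50_000

standard = prompt_tokens / 1_000_000 * SONNET_INPUT  # $0.15 per request
write = standard * 1.25                              # first request: 5-min cache write
hit = standard * 0.10                                # each subsequent cache hit

uncached_total = 20 * standard      # 20 requests, no caching
cached_total = write + 19 * hit     # 1 write + 19 hits
print(f"${uncached_total:.2f} vs ${cached_total:.2f}")  # $3.00 vs $0.47
```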
| Model | Standard Input | Cache Hit (0.1x) | Savings |
|---|---|---|---|
| Opus 4.6 | $5.00 | $0.50 | 90% |
| Sonnet 4.6 | $3.00 | $0.30 | 90% |
| Haiku 4.5 | $1.00 | $0.10 | 90% |
| Haiku 3 | $0.25 | $0.025 | 90% |
Automatic Caching
Add a single cache_control field at the top level of your API request and Anthropic manages cache breakpoints automatically as conversations grow. For fine-grained control, place cache_control on individual content blocks.
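For the fine-grained form, a request body might look like the following sketch. The model identifier and prompt text are placeholders; marking a content block with `cache_control: {"type": "ephemeral"}` is the documented way to set a cache breakpoint:

```python
# Sketch: marking a large, stable system prompt as cacheable.
# Everything up to and including the block carrying cache_control is cached;
# the per-turn user message below it is not.
request_body = {
    "model": "claude-sonnet-4-6",  # assumed model identifier
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<repository map, coding conventions, ~50K tokens>",
            "cache_control": {"type": "ephemeral"},  # 5-minute cache by default
        }
    ],
    "messages": [
        {"role": "user", "content": "Rename get_user to fetch_user in auth.py."}
    ],
}

print(request_body["system"][0]["cache_control"]["type"])  # ephemeral
```

The 1-hour cache uses the same block with a longer TTL (`{"type": "ephemeral", "ttl": "1h"}` in Anthropic's current caching docs), at the 2x write premium from the table above.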
Extended Thinking Token Costs
Extended thinking lets Claude reason step-by-step before producing a response. The thinking tokens are billed as output tokens at the same rate as visible output. There is no separate pricing tier for thinking.
| Scenario | Visible Output | Thinking Tokens | Total Output Cost (Opus 4.6) |
|---|---|---|---|
| Simple code edit | 500 tokens | 0 | $0.0125 |
| Complex refactor | 500 tokens | 2,000 tokens | $0.0625 |
| Multi-file architecture | 1,000 tokens | 5,000 tokens | $0.15 |
| Debugging with reasoning | 800 tokens | 4,000 tokens | $0.12 |
For coding agents, extended thinking is valuable on complex problems but expensive on routine ones. The thinking budget is configurable (minimum 1,024 tokens). Setting an appropriate budget per task type prevents runaway thinking costs. Simple file edits do not need 5,000 tokens of reasoning. Architecture decisions do.
Reducing Context Size with Compaction
Prompt caching reduces the cost of repeated context. Compaction reduces the size of context itself. Different problem, complementary solution.
As a coding agent runs, its context window fills with file contents, tool outputs, error traces, and prior conversation turns. Much of this is redundant or no longer relevant. A file read from turn 3 that was edited in turn 15 is stale. An error trace from a failed attempt that was since fixed is noise.
Morph Compact compresses agent context at 33,000 tokens per second with 50-70% reduction. Every surviving line is reproduced character-for-character from the original input: no paraphrasing, no hallucinated details, no information synthesis. The model selects which lines to keep and discards the rest.
The cost math: a coding agent session with 400K tokens of accumulated context, running on Sonnet 4.6. Without compaction, every subsequent turn sends 400K input tokens at $3/MTok = $1.20 per turn. After compaction to 160K tokens (60% reduction), each turn costs $0.48. Over a 20-turn continuation, that saves $14.40 in input costs alone.
| Metric | No Compaction | With Morph Compact | Savings |
|---|---|---|---|
| Context size (after 30 turns) | 400K tokens | 160K tokens | 60% |
| Input cost per turn (Sonnet 4.6) | $1.20 | $0.48 | $0.72/turn |
| 20-turn continuation cost | $24.00 | $9.60 | $14.40 |
| Stays under 200K threshold | No (long-context premium) | Yes (standard pricing) | Avoids 2x surcharge |
Avoiding the Long-Context Premium
Anthropic charges 2x input and 1.5x output for requests exceeding 200K input tokens. Compacting a 400K context to 160K keeps you under the threshold, avoiding the premium entirely. That is an additional 50% savings on input on top of the reduction from sending fewer tokens.
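The compaction and long-context math in one place, including the 2x input premium described above (prices as quoted in this article; token counts illustrative):

```python
# Per-turn input cost on Sonnet 4.6, with the long-context input premium applied
# to requests that exceed the 200K threshold.
SONNET_INPUT = 3.00            # $/MTok standard input
LONG_CONTEXT_THRESHOLD = 200_000
LONG_CONTEXT_MULTIPLIER = 2.0  # 2x input premium above 200K, per the text

def per_turn_input_cost(context_tokens: int) -> float:
    rate = SONNET_INPUT
    if context_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    return context_tokens / 1_000_000 * rate

uncompacted = per_turn_input_cost(400_000)  # 400K at 2x rates
compacted = per_turn_input_cost(160_000)    # 160K at standard rates
print(f"${uncompacted:.2f} -> ${compacted:.2f} per turn")  # $2.40 -> $0.48 per turn
```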
Anthropic vs OpenAI Pricing for Coding Agents
Per-token, OpenAI is generally cheaper. But per-task cost depends on how many tokens each model needs to complete the work, and how effectively you can cache and compress context.
| Factor | Claude Sonnet 4.6 | GPT-5.4 | Claude Opus 4.6 | GPT-5.2 |
|---|---|---|---|---|
| Input price (per MTok) | $3.00 | $2.50 | $5.00 | $1.75 |
| Output price (per MTok) | $15.00 | $15.00 | $25.00 | $14.00 |
| Batch discount | 50% | 50% | 50% | 50% |
| Cache hit discount | 90% off input | ~50% off input | 90% off input | ~50% off input |
| Context window | 200K (1M beta) | 128K | 200K (1M beta) | 128K |
| Extended thinking | Yes | N/A (use o3) | Yes | N/A (use o3) |
Two factors shift the math in Claude's favor for coding agents. First, Claude's prompt caching discount (90% off cache hits) is deeper than OpenAI's cached input discount (~50% off). For agents that reuse large system prompts and file contexts across turns, this makes Claude's effective input cost lower despite higher sticker price. Second, Claude's 200K-1M context window avoids the chunking overhead that 128K-limited models require on large codebases.
The counterargument: OpenAI's GPT-5.4-mini at $0.15/$0.60 is dramatically cheaper for simple tasks (test generation, docstrings, formatting). A tiered approach using Claude for complex reasoning and a cheaper model for routine tasks gives the best cost profile.
Cost Optimization Playbook
Four techniques stack to cut Anthropic API costs for coding agents. Each one is independent; combining all four gives the maximum reduction.
1. Right-size Your Model
Use Haiku 4.5 ($1/$5) for classification, linting, and simple edits. Use Sonnet 4.6 ($3/$15) for code generation and multi-step reasoning. Reserve Opus 4.6 ($5/$25) for architecture decisions and complex debugging. Most coding tasks do not need Opus. Morph routes between models automatically based on task complexity.
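A tiered routing policy can be as simple as a lookup. This is a hypothetical sketch: the task categories, model identifiers, and fallback choice are assumptions, not Morph's actual routing logic:

```python
# Hypothetical task-to-model router following the right-sizing guidance above.
MODEL_FOR_TASK = {
    "classify": "claude-haiku-4-5",     # $1/$5: classification, linting, simple edits
    "lint": "claude-haiku-4-5",
    "generate": "claude-sonnet-4-6",    # $3/$15: code generation, multi-step reasoning
    "refactor": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-6",  # $5/$25: architecture, complex debugging
    "debug_complex": "claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    # Default unknown task types to the mid-tier model.
    return MODEL_FOR_TASK.get(task_type, "claude-sonnet-4-6")

print(pick_model("lint"))  # claude-haiku-4-5
```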
2. Enable Prompt Caching
System prompts, repository maps, and coding conventions repeat on every request. Caching these at 10% of input price saves 85-90% on the static portion of your context. Break-even is one cache hit for 5-minute caching. If your agent makes more than one request per 5 minutes (it does), caching pays for itself immediately.
3. Compact Context Between Turns
After 20-30 turns, agent context is 60-80% stale information: superseded file reads, resolved errors, abandoned approaches. Morph Compact compresses context by 50-70% at 33,000 tok/s, keeping only relevant lines. This cuts per-turn input cost proportionally and can keep you under the 200K long-context premium threshold.
4. Batch Non-Interactive Work
Code review, test generation, documentation, and bulk analysis do not require real-time responses. Route these through the Batch API for a flat 50% discount. The 24-hour processing window is fine for CI/CD pipeline tasks that run overnight.
| Configuration | Price per MTok | Savings vs Standard |
|---|---|---|
| Standard | $3.00 | 0% |
| + Prompt caching (hit) | $0.30 | 90% |
| + Context compaction (60%) | $0.12 effective | 96% |
| + Batch processing | $0.06 effective | 98% |
The compaction row works differently from the others. Caching and batch reduce the per-token price. Compaction reduces the number of tokens. If you compact 300K tokens to 120K tokens, you send 120K tokens at whatever per-token rate applies. The $0.12 effective rate reflects sending 120K tokens at the $0.30 cache-hit rate, divided by the original 300K to get an effective per-MTok cost on the original context.
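That distinction is easy to encode: caching and batch scale the price, compaction scales the token count, and the "effective" rate divides total spend by the original (pre-compaction) tokens. A sketch using the Sonnet 4.6 rates from the table:

```python
# Effective per-MTok cost on the ORIGINAL context, stacking all three levers.
base = 3.00                   # $/MTok, Sonnet 4.6 standard input
cache_hit_rate = base * 0.10  # $0.30/MTok after prompt caching

def effective_rate(rate: float, compaction: float = 0.0, batch: bool = False) -> float:
    """Per-MTok cost measured against the pre-compaction token count."""
    rate *= (1 - compaction)  # compaction sends proportionally fewer tokens
    if batch:
        rate *= 0.5           # batch API halves the per-token price
    return rate

print(f"${effective_rate(cache_hit_rate, compaction=0.60):.2f}")              # $0.12
print(f"${effective_rate(cache_hit_rate, compaction=0.60, batch=True):.2f}")  # $0.06
```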
Frequently Asked Questions
How much does the Anthropic API cost?
Per million tokens: Opus 4.6 costs $5 input / $25 output. Sonnet 4.6 costs $3 / $15. Haiku 4.5 costs $1 / $5. Haiku 3 (still available) costs $0.25 / $1.25. Batch processing gives 50% off all models. Prompt caching gives 90% off repeated input. These discounts stack.
How much does a Claude coding agent cost per day?
Anthropic reports an average of $6/day per developer for Claude Code, with 90% of users under $12/day. Heavy autonomous agent usage on large codebases can reach $20-50/day. The main variable is context length per turn: sessions that accumulate 300K+ tokens per request cost 10x more than sessions that stay under 30K through compaction and good context management.
What is the Batch API discount?
50% off both input and output tokens. Requests are processed asynchronously within 24 hours. Every Claude model is eligible. The discount stacks with prompt caching. Opus 4.6 batch + cache hit costs $0.25 per MTok input, down from $5.00 standard. That is 95% savings.
How does prompt caching work?
You mark portions of your prompt for caching. The first request pays a write premium (1.25x for 5-minute cache, 2x for 1-hour cache). Subsequent requests within the cache window read from cache at 10% of the standard input price. For coding agents, system prompts and file contexts are natural candidates. Break-even is one cache hit on 5-minute caching.
Is Claude cheaper than OpenAI for coding?
Per-token, OpenAI is cheaper at most tiers. GPT-5.4 costs $2.50/$15 vs Claude Sonnet 4.6 at $3/$15. But Claude's prompt caching (90% off) is deeper than OpenAI's (~50% off), and Claude's larger context window (200K-1M) avoids chunking overhead. Per-task cost depends on your workload. For long-context coding with heavy caching, Claude is often cheaper. For short, simple tasks, OpenAI wins. See our detailed comparison.
What are Anthropic's rate limit tiers?
Four tiers based on deposit amount. Tier 1 ($5 deposit): 50 RPM. Tier 2 ($40): 1,000 RPM. Tier 3 ($200): 2,000 RPM. Tier 4 ($400): 4,000 RPM and access to the 1M token context window. Cached tokens do not count against input token-per-minute limits, so caching effectively multiplies your throughput.
How does extended thinking affect cost?
Thinking tokens are billed as output tokens at output rates. On Opus 4.6 ($25/MTok output), a response with 2,000 thinking tokens and 500 visible output tokens costs $0.0625. Without thinking, the same visible output costs $0.0125. The budget is configurable (minimum 1,024 tokens). Set appropriate limits per task type to control cost.
How does Morph Compact reduce Claude API costs?
Morph Compact compresses agent context by 50-70% before sending it to Claude. Fewer input tokens = lower cost, proportionally. A 400K context compacted to 160K saves 60% on input costs every turn. It also keeps requests under the 200K long-context threshold, avoiding the 2x input premium. Compact processes at 33,000 tok/s with zero hallucination risk because it is extractive (every surviving line is verbatim from the original). See how Compact works.
Reduce your Claude API costs with Morph Compact
Morph Compact compresses coding agent context by 50-70%, cutting your Anthropic API bill proportionally. 33,000 tok/s. Extractive only. No hallucination risk.