Claude Code API Cost (2026): Per-Token Math + How to Cut It

Claude Code bills by API token consumption. The rates are published: Sonnet 4.6 at $3/$15 per million input/output tokens, Opus 4.6 at $5/$25, Haiku 4.5 at $1/$5. The rate is not what surprises teams. The volume is. A coding agent re-sends the entire conversation on every turn, so a session can burn millions of input tokens from a few thousand characters of typing. Anthropic puts the average at $13 per developer per active day and $150-250 per month.

$0.34

Typical Claude Code session cost

$3 / $15

Sonnet 4.6 per M input / output

$150-250

Per developer per month (Anthropic avg)

0.1x

Cache read price vs base input (90% off)

Does Claude Code Cost Money?

Yes. There are two billing paths. On the API path, you supply an Anthropic API key and pay per token with no free tier for Claude Code itself. On the subscription path, Claude Code is included in Pro ($20/month), Max 5x ($100/month), Max 20x ($200/month), Team, and Enterprise plans. This page is about the API path: what each token costs, what a session costs, and what a team bill looks like.

The published averages from Anthropic's own cost documentation: around $13 per developer per active day and $150-250 per developer per month, with 90% of users staying below $30 per active day. Those are wide ranges because cost depends on model selection, codebase size, and usage patterns like running multiple instances or automation.

For a full plan-by-plan breakdown and the subscription-versus-API breakeven, see Claude Code pricing. For the raw Anthropic rate card across all models, see Anthropic API pricing.

Claude Code Token Cost: Per-Token Rates

Claude Code runs on the standard Claude API rate card. The cost of any call is (input tokens x input price) + (output tokens x output price), plus cache adjustments. Output tokens cost 5x input across the family, and extended-thinking tokens are billed as output, so reasoning-heavy turns are the expensive ones.

Claude API per-token cost (per 1M tokens, 2026)

Model	Input	Output	Cache write (1.25x)	Cache read (0.1x)
Haiku 4.5	$1.00	$5.00	$1.25	$0.10
Sonnet 4.6	$3.00	$15.00	$3.75	$0.30
Opus 4.6	$5.00	$25.00	$6.25	$0.50

Two discounts apply on top of the base rates. Prompt caching charges 1.25x the base input price to write a cache entry and 0.1x to read it, a 90% discount on repeated content. Batch processing takes 50% off both input and output for any request that tolerates a 24-hour turnaround. Caching applies inside Claude Code automatically. Batch does not apply to interactive Claude Code sessions, but it matters for headless and CI runs.

The number that drives your bill

It is not the per-token rate, it is the cache read price. Claude Code re-sends a stable system prompt and growing history on every turn. At a 70% cache hit rate, most of those re-sent tokens cost $0.30/M (Sonnet cache read) instead of $3/M (Sonnet input). The cache hit rate, shown in /cost, is the single best predictor of your monthly spend.

How Claude Code Consumes Tokens

Claude Code does not make one API call per task. It makes dozens. Every call sends four kinds of tokens, and three of them grow over the session:

System prompt and CLAUDE.md: loaded once per session and re-sent on every call. A 200-line CLAUDE.md plus the base system prompt is several thousand tokens that repeat on every turn.
Conversation history: every prior message, every tool call, every tool result. This is the part that grows. By turn 50 it can be 150K-200K tokens.
Tool outputs: file reads, grep results, test logs, command output. A single 10,000-line log file read can add tens of thousands of tokens to the context for the rest of the session.
Output and thinking tokens: the model's response plus extended-thinking tokens, both billed at the output rate (5x input).

The result: the total tokens consumed in a session is typically 50-100x the number of characters you actually typed. You type a 200-character request; Claude Code sends 45K input tokens because it carries the entire working context on every turn.

45K

Input tokens, typical session

13K

Output tokens, typical session

38K

Cache-read tokens, typical session

dozens

API calls per task

Cost of One Session: Worked Example

Take the representative session: 45K input tokens, 13K output tokens, 38K cache-read tokens, running on Sonnet 4.6. Walk the arithmetic line by line.

Input: 45,000 tokens x $3/M = $0.135
Output: 13,000 tokens x $15/M = $0.195
Cache reads: 38,000 tokens x $0.30/M = $0.011
Session total: ~$0.34

Output is the largest line item at $0.195, more than half the session, even though it is the smallest token count. That is the 5x output multiplier at work. The 38K cache-read tokens contribute almost nothing ($0.011) because caching put them at $0.30/M. Without caching, those 38K tokens would cost $0.114 as full-price input, tripling that portion of the bill.

Now run the same session on Opus 4.6 ($5/$25). Input becomes $0.225, output becomes $0.325, cache reads $0.019. Session total: ~$0.57, about 1.7x the Sonnet cost for the same token counts. Run it on Haiku 4.5 ($1/$5): input $0.045, output $0.065, total ~$0.11. Same work, 3x cheaper than Sonnet, 5x cheaper than Opus. This is the entire argument for model routing.

Same session (45K in / 13K out / 38K cache read), three models

Model	Input	Output	Cache reads	Session total
Haiku 4.5	$0.045	$0.065	$0.004	$0.11
Sonnet 4.6	$0.135	$0.195	$0.011	$0.34
Opus 4.6	$0.225	$0.325	$0.019	$0.57

Monthly Team Bill: Worked Example

The per-session number is small. The monthly number is not, because sessions multiply. Take a 20-developer team, each developer running an average of 15 active days per month.

Use Anthropic's published per-developer average of $13 per active day. That gives $13 x 15 days = $195 per developer per month, which lands inside the documented $150-250 range. For 20 developers: $3,900/month, or $46,800/year.

Heavy usage moves the number fast. A developer running long sessions on Opus with low cache hit rates can hit the $30/active-day ceiling that 10% of users exceed. At $30 x 15 days x 20 developers, the same team pays $9,000/month, or $108,000/year. The difference between the two scenarios is not headcount. It is model choice, session length, and cache hit rate.

Light: $13/active day

Anthropic's published average. 20 devs x 15 active days = $3,900/month. Mostly Sonnet, short sessions, /clear between tasks, high cache hit rate.

Heavy: $30/active day

The ceiling 10% of users exceed. 20 devs x 15 active days = $9,000/month. Long Opus sessions, low cache hit rate, broad scanning prompts.

Optimized: routing + compaction

Route easy edits to Haiku, compact context before each call. The same workload at 40-70% lower cost: $1,500-2,500/month for the light team.

The compounding problem

Every unnecessary token in a Claude Code conversation is paid for on every subsequent turn. 1,000 wasted tokens in turn 1 of a 30-turn session is 30,000 tokens of waste, paid at the input rate on every re-send. Across a 20-developer team running 50 sessions/day, persistent context bloat is the difference between the $3,900 bill and the $9,000 bill.

Why Cost Compounds With Context

The pricing page suggests a linear model: more tokens, more cost. Agent usage is not linear in the number of turns, because each turn re-sends everything before it. The cost of turn N includes the full weight of turns 1 through N-1.

A concrete shape: if the conversation grows by 4K tokens per turn, then turn 1 sends 4K, turn 10 sends 40K, turn 50 sends 200K. The cumulative input across 50 turns is not 50 x 4K = 200K. It is the sum 4K + 8K + ... + 200K, which is roughly 5.1 million input tokens. On Sonnet at $3/M that is $15.30 of input for a single long session, before output. This is why a 200K-token conversation costs about 10x what a 20K one costs: the late turns dominate, and they carry the entire history.

Two structural drivers make this worse than it needs to be. First, the system prompt and CLAUDE.md repeat on every call; a 3,000-token preamble across 200 calls is 600K tokens of pure repetition. Second, tool outputs never leave the context on their own. A log file read in turn 5 is still being re-sent in turn 80. Caching addresses the first driver. Compaction addresses the second.

The compounding cost of a growing conversation

// Conversation grows ~4K tokens per turn, Sonnet 4.6 ($3/M input)
// Turn N re-sends the entire history before it.

let tokensThisTurn = 0;
let cumulativeInput = 0;

for (let turn = 1; turn <= 50; turn++) {
  tokensThisTurn += 4000;            // history grows each turn
  cumulativeInput += tokensThisTurn; // every turn pays for all prior turns
}

// cumulativeInput ~= 5,100,000 tokens
// input cost = 5.1M * $3/M = $15.30  (one long session, input only)

// A 20K-token conversation, same 50 turns but no growth:
//   50 * 20K = 1M tokens * $3/M = $3.00
// The growing session costs ~5x more from the SAME number of turns.

The /cost Command and Cache Hit Rate

Claude Code ships two commands that make the bill visible. /cost prints total session cost, API duration, and a per-model breakdown into input, output, cache read, and cache write tokens. /usage shows the same session block plus, on subscription plans, how recent usage maps to skills, subagents, plugins, and MCP servers.

The dollar figure is computed locally from token counts. On Pro and Max plans it is an estimate of what you would have paid via the API; the authoritative number is in the Claude Console. On the API path it tracks your actual spend closely.

The metric to watch is the cache hit rate. Cache reads cost 0.1x the base input rate, roughly 10x less than uncached input. If your cache hit rate is high, most of your re-sent context is billing at $0.30/M (Sonnet) instead of $3/M. If it is low, because you /clear constantly or your prompt prefix keeps changing, you are paying full input price on every re-send. Cache hit rate, not raw token count, is what separates a $0.34 session from a $1+ session.

What /cost prints (illustrative)

/cost

Total cost:            $0.34
Total duration (API):  4m 12.3s
Total duration (wall): 1h 18m 40.1s

Per-model breakdown (claude-sonnet-4-6):
  Input tokens:        45,000   ($0.135)
  Output tokens:       13,000   ($0.195)
  Cache read tokens:   38,000   ($0.011)   <- 90% off input
  Cache write tokens:   2,500   ($0.009)

Cache hit rate:        ~70%     <- the leading cost metric

Built-in habits that cut cost first

Before reaching for external tooling, use what Claude Code gives you: /clear between unrelated tasks so stale context stops billing on every message, specific prompts ("add validation to login in auth.ts") instead of vague ones ("improve this codebase") that trigger broad scanning, and /model to drop to Sonnet or Haiku for routine work. Anthropic's docs list these as the first-line cost levers.

Lever 1: Model Routing (40-70% Savings)

Most requests in a Claude Code session do not need the most expensive model. Adding a comment, renaming a variable, formatting output, writing a boilerplate test. These run as well on Haiku 4.5 at $1/M as on Opus 4.6 at $5/M. Without routing, every request hits whichever model the session is configured to use, usually the most capable and most expensive one.

A model router reads the prompt, classifies the difficulty, and sends each request to the cheapest model that can handle it. Easy tasks go to Haiku, medium tasks to Sonnet, hard tasks (architecture, multi-file debugging) to Opus. The economics work because 60-80% of coding agent requests are routine: if 70% of requests route to a model that costs 5x less on input, the weighted average drops 40-70%.

Routing matrix

Each classified difficulty × ambiguity maps to a Claude model and reasoning effort. Domain overrides match first, then this grid.

Difficulty \ Ambiguity

Clear

Some ambiguity

Vague

Easy

Trivial edits, formatting, simple lookups

Haiku

Low

Haiku

Medium

Haiku

High

Medium

Typical feature work and multi-file edits

Sonnet

Low

Sonnet

Medium

Sonnet

High

Hard

Architecture, tricky debugging, large refactors

Opus

Low

Opus

Medium

Opus

High

Domain overrides

Summary

Haiku

Medium

Each classified difficulty × ambiguity maps to a Claude tier and reasoning effort. Routine turns drop to Haiku; only hard work reaches Opus.

$0.001

Per classification (Morph Router)

~430ms

Router classification latency

60-80%

Requests routable to cheaper models

40-70%

Typical cost savings

Morph Router classifies each request into easy, medium, hard, or needs_info at $0.001 per classification in about 430ms. Your application maps the tier to a model. The router does not change what the agent produces: easy tasks routed to Haiku give the same output as Opus would for those tasks, and genuinely hard prompts still go to the frontier model.

Route Claude Code requests by difficulty

import Morph from "morphllm";

const morph = new Morph({ apiKey: process.env.MORPH_API_KEY });

const MODEL_TIERS = {
  easy:       "claude-haiku-4-5",   // $1/M input
  medium:     "claude-sonnet-4-6",  // $3/M input
  hard:       "claude-opus-4-6",    // $5/M input
  needs_info: "claude-sonnet-4-6",
} as const;

async function routedCompletion(messages: Message[]) {
  const { difficulty } = await morph.router.classify({ messages }); // $0.001
  return morph.chat.completions.create({
    model: MODEL_TIERS[difficulty],
    messages,
  });
}

// 70% easy + 20% medium + 10% hard:
//   weighted input price ~$1.60/M vs $5/M all-Opus = ~68% lower

For multi-agent setups, routing also applies at the role level: a frontier model for the planner, a cheaper model for the executor. See the multi-agent model routing guide and the planner-executor benchmarks.

Lever 2: Prompt Caching (90% Off Repeated Content)

Claude Code re-sends a stable system prompt and CLAUDE.md on every turn. Prompt caching stores the processed representation of that repeated content so the model does not recompute it. The first call pays a 1.25x write premium; every subsequent call with the same prefix reads at 0.1x the base input rate, a 90% discount.

Run the math on a 3,000-token system prompt across a 200-call session on Sonnet 4.6. Without caching: 200 x 3,000 x $3/M = $1.80 for the repeated prompt alone. With caching: one write at $3.75/M ($0.011) plus 199 reads at $0.30/M ($0.179) = $0.19. That is 89% off the system-prompt portion of the bill, and Claude Code applies caching automatically.

Prompt caching on the Anthropic API (Python)

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,       # ~3,000 tokens, repeats every turn
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=conversation,
)

# First call: 3K tokens at write rate ($3.75/M) = $0.011
# Next 199 calls: 3K tokens at read rate ($0.30/M) = $0.0009 each
# Without caching: 200 * 3K * $3/M = $1.80
# With caching:    $0.011 + 199 * $0.0009 = $0.19  (~89% off)

The lever that compounds with this one is keeping your prefix stable. Every time the cached prefix changes (a CLAUDE.md edit mid-session, a reordered system block), the cache invalidates and the next call pays the write premium again. Stable prefix plus high cache hit rate is the cheapest configuration.

Lever 3: Context Compaction (50-70% Fewer Tokens)

Caching handles the repeated prefix. Compaction handles the part that grows: the conversation history and accumulated tool outputs. Claude Code auto-compacts when it nears the context limit, but by then the agent has already paid full price for 100+ turns of bloated context. The cheaper move is to keep context lean throughout.

Summarization rewrites history in fewer words, but it loses specifics: file paths become "a config file," error codes become "an error," and the agent then spends tokens re-acquiring what the summary discarded. Morph Compact takes a different approach: verbatim deletion. It removes low-signal tokens (redundant formatting, repeated boilerplate, verbose metadata) while keeping every surviving sentence character-for-character identical. File paths, error codes, and function signatures all survive intact.

33,000

Tokens per second (Morph Compact)

50-70%

Typical token reduction

Hallucination rate (verbatim deletion)

every turn

When to compact, not just at the cliff

The cost impact is direct and compounding. Compacting a 200K-token conversation to 80K (60% reduction) cuts the input cost of the next call by 60%, and the compacted history is what gets re-sent on every remaining turn. Over a long session, compacting early saves on every subsequent call, not just the next one.

Compact before each call, then route to a model

import Morph from "morphllm";

const morph = new Morph({ apiKey: process.env.MORPH_API_KEY });

async function compactAndSend(messages: Message[]) {
  const compacted = await morph.compact({
    model: "morph-compact-v1",
    messages,
    system: "Preserve file paths, error codes, function signatures, numbers.",
  });

  return morph.chat.completions.create({
    model: "claude-sonnet-4-6",
    messages: compacted.choices[0].message.content,
  });
}

// Before: 200K tokens/turn * $3/M (Sonnet) = $0.60/turn
// After:    80K tokens/turn * $3/M          = $0.24/turn
// Over 200 turns: $120 vs $48 = 60% lower input cost

Stack the three levers

Caching cuts the repeated prefix 90%. Compaction cuts the growing history 50-70%. Routing cuts the per-token rate 40-70% by sending easy work to cheaper models. They are independent and they stack. Compact first, then cache the compacted version, then route per-model. A session that costs $0.57 on all-Opus drops toward $0.10-0.15 with all three applied. See LLM cost optimization for the combined-savings math.

Claude Code Cost Calculator

To estimate your own bill, you need four inputs: tokens per session (input + output + cache), sessions per active day, active days per month, and your model mix. The formula:

Monthly cost = sessions/day x active days/month x (input x in-rate + output x out-rate + cache-read x 0.1 x in-rate)

Worked through with the representative session on Sonnet: $0.34/session x 6 sessions/day x 18 active days = ~$37/month per light developer. The same developer on Opus with longer sessions ($1.20/session) x 8 sessions/day x 20 days = ~$192/month, which is where most of Anthropic's $150-250 range comes from. The two variables that move your number most are model choice (Haiku-to-Opus is a 5x input swing) and cache hit rate (a 70% hit rate roughly halves the effective input cost of re-sent context).

Estimated monthly cost per developer by profile

Profile	Per session	Sessions/mo	Monthly cost
Light (Sonnet, /clear, high cache)	$0.34	~108	~$37
Average (mixed models)	$0.80	~200	~$160
Heavy (Opus, long sessions)	$1.20	~160	~$192
Optimized (routing + compaction)	$0.12	~200	~$24

The optimized row is the same workload as the average row, run through routing and compaction. The per-session cost drops from $0.80 to $0.12 because easy turns move to Haiku and the growing history is compacted before each call. Same code produced, 85% lower bill.

Frequently Asked Questions

Does Claude Code cost money?

Yes. On the API path you pay per token with no free tier for Claude Code; on the subscription path it is included in Pro ($20/mo), Max ($100-200/mo), Team, and Enterprise. Anthropic's published averages are about $13 per developer per active day and $150-250 per developer per month, with 90% of users under $30 per active day. See Claude Code pricing for the plan breakdown.

How much do Claude Code tokens cost?

Per million tokens: Sonnet 4.6 is $3 input / $15 output, Opus 4.6 is $5/$25, Haiku 4.5 is $1/$5. Cache writes cost 1.25x base input; cache reads cost 0.1x (90% off). Output is 5x input, and extended-thinking tokens bill as output. Full rate card: Anthropic API pricing.

What is the cost of a typical Claude Code session?

About $0.34 on Sonnet: roughly 45K input, 13K output, 38K cache-read tokens. The output line dominates ($0.195) because output costs 5x input. The same session is ~$0.11 on Haiku and ~$0.57 on Opus. Run /cost to see your exact figure with a per-model breakdown.

Why does Claude Code use so many tokens?

It re-sends the full conversation on every turn. Each tool call adds output, each response adds reasoning, and all of it is re-sent on the next call. A 200K-token conversation costs about 10x a 20K one per turn. The system prompt and CLAUDE.md also repeat on every call. Total tokens consumed run 50-100x the characters you typed.

How do I see Claude Code cost per token in my session?

Run /cost or /usage inside Claude Code. /cost prints total session cost, API duration, and a per-model breakdown of input, output, cache read, and cache write tokens. On Pro and Max the dollar figure is an estimate of API-equivalent cost; the authoritative number is in the Claude Console. Watch the cache hit rate, it predicts your spend.

How can I reduce Claude Code API costs?

Three levers cut the most: model routing (easy edits to Haiku, 40-70% savings), prompt caching (90% off repeated prompts, automatic in Claude Code), and context compaction (remove 50-70% of conversation tokens before each call). Built-in habits help too: /clear between tasks, specific prompts, and delegating verbose operations to subagents.

Is the Claude Code API cheaper than the subscription?

It depends on volume. The $100/mo Max plan is $3.33/day; if your daily API spend would stay under that, the subscription wins. API billing typically only beats Pro for light users (under ~50 sessions/month) or spiky workloads. Heavy daily use favors a subscription with API overflow for spikes. See the breakeven math.

What is the difference between cache read and cache write cost?

Cache writes cost 1.25x the base input rate; cache reads cost 0.1x, a 90% discount. The first call with a prefix pays the write premium; every subsequent call with the same prefix reads at the discount. Because Claude Code re-sends a stable system prompt every turn, a high cache hit rate is what keeps the bill low.

Related Resources

Cut Your Claude Code API Bill 40-70%

Start with Morph Router: classify each request at $0.001 in ~430ms and send easy edits to Haiku instead of Opus. Add Morph Compact for long sessions: 50-70% fewer conversation tokens at 33,000 tok/s with zero hallucination. Both integrate with one API call.

Get Started

View Docs

Kimi K3

GLM-5.2

Qwen

MiniMax

DeepSeek

Reflex

Fast Apply

WarpGrep

Compact

Model Router

Blog

Startup Credits

Contact Us

About

Careers

Claude Code API Cost (2026): Exact Per-Token Math, Session Cost, and How to Cut It

Does Claude Code Cost Money?

Claude Code Token Cost: Per-Token Rates

How Claude Code Consumes Tokens

Cost of One Session: Worked Example

Monthly Team Bill: Worked Example

Light: $13/active day

Heavy: $30/active day

Optimized: routing + compaction

Why Cost Compounds With Context

The compounding cost of a growing conversation

The /cost Command and Cache Hit Rate

What /cost prints (illustrative)

Lever 1: Model Routing (40-70% Savings)

Route Claude Code requests by difficulty

Lever 2: Prompt Caching (90% Off Repeated Content)

Prompt caching on the Anthropic API (Python)

Lever 3: Context Compaction (50-70% Fewer Tokens)

Compact before each call, then route to a model

Claude Code Cost Calculator

Frequently Asked Questions

Does Claude Code cost money?

How much do Claude Code tokens cost?

What is the cost of a typical Claude Code session?

Why does Claude Code use so many tokens?

How do I see Claude Code cost per token in my session?

How can I reduce Claude Code API costs?

Is the Claude Code API cheaper than the subscription?

What is the difference between cache read and cache write cost?

Related Resources

Cut Your Claude Code API Bill 40-70%