Cheapest LLM APIs in 2026: Per-Token Pricing From $0.01 to $75 per Million

Complete pricing table of every major LLM API sorted by cost per million tokens. GPT-5.4 Nano at $0.20/$1.25, DeepSeek V3.2 at $0.28/$0.42, Gemini 2.5 Flash-Lite at $0.10/$0.40, Llama 3.1 8B on Groq at $0.05/$0.08. Batch discounts, prompt caching, and quality-adjusted cost analysis. Morph Compact cuts tokens by 50-70%, making every model cheaper.

April 5, 2026

Every Major LLM API, Sorted by Input Cost (April 2026)

Prices per million tokens. Input is what you send. Output is what the model returns. Output typically costs 2-8x more than input. For coding agents and long-context workloads, input cost dominates because you send far more tokens than you receive.
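Every table in this post reduces to the same two-term formula; a minimal sketch in Python (the example prices match the $3.00/$15.00 tier below):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Cost of one API call: each side is billed per million tokens."""
    return (input_tokens / 1e6 * input_price_per_mtok
            + output_tokens / 1e6 * output_price_per_mtok)

# A typical coding-agent turn: 80K input, 10K output on a $3.00/$15.00 model.
request_cost_usd(80_000, 10_000, 3.00, 15.00)  # $0.24 + $0.15 = $0.39
```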

- 3,750x — price range, cheapest to most expensive
- ~80% — average price drop since early 2025
- $0.05 — cheapest input per MTok (Llama 3.1 8B on Groq)
| Model | Provider | Input/MTok | Output/MTok | Tier |
| --- | --- | --- | --- | --- |
| GPT-OSS 20B | OpenAI | $0.03 | $0.10 | Budget |
| Llama 3.1 8B Instant | Groq | $0.05 | $0.08 | Budget |
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | Budget |
| Amazon Nova Lite | AWS Bedrock | $0.06 | $0.24 | Budget |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | Budget |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | Budget |
| Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | Budget |
| GPT-OSS 120B | OpenAI | $0.15 | $0.60 | Budget |
| Mistral Small 3.1 | Mistral | $0.20 | $0.60 | Budget |
| GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | Budget |
| GPT-4.1 Mini | OpenAI | $0.20 | $0.80 | Budget |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | Budget |
| Qwen3 Coder 480B A35B | Alibaba | $0.22 | $0.90 | Budget |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | Budget |
| DeepSeek V3.2 (Chat) | DeepSeek | $0.28 | $0.42 | Budget |
| DeepSeek V3.2 (Reasoner) | DeepSeek | $0.28 | $0.42 | Budget |
| Qwen3 32B | Groq | $0.29 | $0.59 | Budget |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | Budget |
| Codestral 2508 | Mistral | $0.30 | $0.90 | Budget |
| DeepSeek V4 | DeepSeek | $0.30 | $0.50 | Budget |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Budget |
| o4-mini | OpenAI | $0.55 | $2.20 | Budget |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | Budget |
| Llama 3.3 70B Versatile | Groq | $0.59 | $0.79 | Budget |
| Qwen3 Coder Plus | Alibaba | $0.65 | $3.25 | Mid |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | Mid |
| Amazon Nova Pro | AWS Bedrock | $0.80 | $3.20 | Mid |
| Mistral Medium 3 | Mistral | $1.00 | $3.00 | Mid |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Mid |
| o3-mini | OpenAI | $1.10 | $4.40 | Mid |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | Mid |
| GPT-5 | OpenAI | $1.25 | $10.00 | Mid |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Mid |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | Mid |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | Mid |
| o3 | OpenAI | $2.00 | $8.00 | Mid |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | Mid |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | Premium |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Premium |
| Grok 4 | xAI | $3.00 | $15.00 | Premium |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | Premium |
| GPT-5.2 Pro | OpenAI | $10.50 | $84.00 | Premium |
| Claude Opus 4.1 (legacy) | Anthropic | $15.00 | $75.00 | Premium |
| o3 Pro | OpenAI | $20.00 | $80.00 | Premium |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | Premium |
| o1-pro (legacy) | OpenAI | $150.00 | $600.00 | Premium |

Reading this table

Input/MTok = price per million input tokens (your prompts, system messages, context). Output/MTok = price per million output tokens (model responses). A typical coding agent session sends 5-10x more input tokens than it receives in output. For chat applications, the ratio is closer to 1:1.

Budget Tier: Under $1 per Million Input Tokens

This is where most production workloads should start. Models in this range handle classification, extraction, simple code generation, summarization, and formatting. The quality gap between a $0.20/MTok model and a $3.00/MTok model is smaller than the 15x price difference suggests.

DeepSeek V3.2: $0.28/$0.42

The price-performance champion. Competitive with mid-tier models on coding and general benchmarks at roughly one-tenth the price of GPT-5.4. Cache hits drop input to $0.028/MTok. Thinking mode (reasoner) is the same price. 128K context. The main trade-off: DeepSeek's API routes through China, which matters for some compliance requirements.

Gemini 2.5 Flash-Lite: $0.10/$0.40

The cheapest option from a major US provider. Google offers a free tier for prototyping. Context caching at $0.01/MTok makes repeated prompts nearly free. Performance is solid for simple tasks but drops off on complex reasoning compared to Flash ($0.30/$2.50) or Pro ($1.25/$10.00).

GPT-4.1 Nano: $0.10/$0.40

OpenAI's cheapest current-gen model. Good enough for linting, formatting, classification, and short code completions. Not competitive on multi-step reasoning or complex generation. Cached input at $0.01/MTok. Batch at $0.05/$0.20. For high-volume simple tasks, this is the OpenAI pick.

Mistral Small 3.1: $0.20/$0.60

A strong small model from a European provider (data stays in EU if that matters). Competitive with GPT-4.1 Mini on most benchmarks. No prompt caching discount, but the base price is already low. Good default for teams that want a non-US, non-China option.

| Model | Input/MTok | Output/MTok | Cache Hit Input | Best For |
| --- | --- | --- | --- | --- |
| DeepSeek V3.2 | $0.28 | $0.42 | $0.028 | General coding, reasoning |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | Latest DeepSeek flagship |
| DeepSeek R1 | $0.55 | $2.19 | $0.14 | Budget reasoning (think + answer) |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.01 | High-volume simple tasks |
| GPT-5.4 Nano | $0.20 | $1.25 | $0.02 | Cheap OpenAI flagship-family |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.01 | Classification, formatting |
| Grok 4.1 Fast | $0.20 | $0.50 | N/A | Cheap xAI option |
| Mistral Small 3.1 | $0.20 | $0.60 | N/A | EU data residency |
| o4-mini | $0.55 | $2.20 | $0.055 | Budget reasoning model |
| Llama 3.1 8B (Groq) | $0.05 | $0.08 | N/A | Lowest possible cost |
| Amazon Nova Lite | $0.06 | $0.24 | N/A | AWS-native workloads |

The surprise in this tier: OpenAI's reasoning model o4-mini at $0.55/$2.20 is cheaper than many general-purpose mid-tier models. It replaced o3-mini and outperforms it on most benchmarks. For tasks that need structured reasoning on a budget, o4-mini undercuts everything except DeepSeek's reasoner mode.

Hidden cost: reasoning tokens

Reasoning models (o4-mini, o3, DeepSeek Reasoner) generate internal chain-of-thought tokens billed as output. A short visible answer might use 2,000 output tokens, but the model burned 10,000 reasoning tokens behind the scenes. Your actual output cost can be 5-10x what the visible response length implies. Factor this into any cost comparison.
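The multiplier is easy to compute once you know (or estimate) the hidden token count; a minimal sketch using the o4-mini output price from the table above:

```python
def output_cost_usd(visible_tokens: int, reasoning_tokens: int,
                    output_price_per_mtok: float) -> float:
    """Reasoning models bill hidden chain-of-thought tokens as output,
    so true output cost covers visible + reasoning tokens."""
    return (visible_tokens + reasoning_tokens) * output_price_per_mtok / 1e6

# The example from the text: 2,000 visible tokens, 10,000 hidden
# reasoning tokens, on o4-mini at $2.20/MTok output.
output_cost_usd(2_000, 0, 2.20)       # $0.0044 -- what the response implies
output_cost_usd(2_000, 10_000, 2.20)  # $0.0264 -- what you are billed (6x)
```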

Mid Tier: $1-5 per Million Input Tokens

The workhorses. These models handle complex code generation, multi-step reasoning, long-form writing, and most agentic workflows. The price difference between $1.25/MTok (Gemini 2.5 Pro) and $3.00/MTok (Claude Sonnet 4.6) matters at scale but both are viable for production.

| Model | Input/MTok | Output/MTok | Context Window | Notes |
| --- | --- | --- | --- | --- |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Cheapest Anthropic model worth using for code |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Best value frontier-class model |
| GPT-5 | $1.25 | $10.00 | 128K | OpenAI base flagship |
| GPT-5.2 | $1.75 | $14.00 | 128K | Retiring June 2026 |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Latest Google flagship |
| Mistral Large 3 | $2.00 | $6.00 | 128K | Cheapest output in mid-tier |
| o3 | $2.00 | $8.00 | 200K | Best reasoning per dollar |
| GPT-4.1 | $2.00 | $8.00 | 1M | Workhorse for code, instruction following |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Best coding model per many benchmarks |

Two models stand out. Gemini 2.5 Pro at $1.25/$10.00 offers frontier quality at nearly half the price of Claude Sonnet 4.6. Its 1M context window matches Claude's and costs less. The catch: Gemini is newer for coding agent use cases and the ecosystem (tool use, function calling) is less mature than Anthropic's or OpenAI's.

Mistral Large 3 at $2.00/$6.00 has the cheapest output in this tier. If your workload is output-heavy (long-form generation, detailed code), Mistral Large saves 60% on output versus Claude Sonnet ($6 vs $15) and 25% versus o3 ($6 vs $8).

Premium Tier: $5+ per Million Input Tokens

Reserve these for tasks where quality directly translates to value. Architecture decisions, complex debugging, security audits, production incident analysis. Running a premium model on routine code formatting is burning money.

| Model | Input/MTok | Output/MTok | Batch Input | Batch Output |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | $2.50 | $12.50 |
| GPT-5.4 | $2.50 | $15.00 | $1.25 | $7.50 |
| GPT-5.2 Pro | $10.50 | $84.00 | N/A | N/A |
| Claude Opus 4.1 (legacy) | $15.00 | $75.00 | $7.50 | $37.50 |
| o3 Pro | $20.00 | $80.00 | N/A | N/A |
| GPT-5.4 Pro | $30.00 | $180.00 | $15.00 | $90.00 |

GPT-5.4 at $2.50/$15.00 is technically in this tier but borderline mid-tier on input pricing. It is the cheapest way to access OpenAI's flagship-class quality. Claude Opus 4.6 at $5.00/$25.00 costs 2x the input and 1.67x the output, but includes a 1M token context window at standard pricing (GPT-5.4's context is 272K, with prices doubling past that).

The truly expensive models (GPT-5.4 Pro at $30/$180, o1-pro at $150/$600) exist for specialized research and evaluation. No production coding agent should run on these.

Legacy tax

Claude Opus 4.1 still costs $15/$75, three times more than Opus 4.6 at $5/$25. Anthropic improved quality AND cut price by 67% in one generation. If your codebase references claude-opus-4.1 or claude-3-opus, you are paying 3x for equal or worse performance. Update your model string.

Inference Providers: Same Model, Different Price

Open-source models (Llama, Qwen, Mixtral, DeepSeek) run on multiple providers. The same model at the same quality can cost 2-5x more depending on where you host it.

| Provider | Input/MTok | Output/MTok | Speed |
| --- | --- | --- | --- |
| Groq | $0.11 (Scout) | $0.34 | Fastest TTFB |
| Together AI | $0.27 | $0.85 | Fast |
| Fireworks AI | $0.27 | $0.85 | Fast |
| Cerebras | Contact | Contact | Highest throughput |

Groq: Lowest Latency

Custom LPU silicon built for fast inference. Llama 3.1 8B at $0.05/$0.08 is the cheapest hosted inference available. 241+ tok/s on 70B models. Trade-off: limited model selection compared to Together or Fireworks. Prompt caching available on select models at 50% off.

Together AI: Widest Selection

15+ open-source models including Llama, DeepSeek, Qwen, Mistral, and Gemma. Competitive pricing on mid-size models. Cheaper than Fireworks on 39 of 80 shared models. Best for teams that want one API for many open-source models.

Fireworks AI: Flexible Deployment

Serverless starting at $0.10/MTok for small models. On-demand GPUs from $2.90/hr (A100) to $9.00/hr (B200) for dedicated capacity. Good middle ground between pure serverless and self-hosting.

Cerebras: Throughput King

Wafer-scale engine delivers 1,800 tok/s on Llama 3.1 8B and 969 tok/s on 405B. Free tier available. Enterprise pricing on request. Best for batch workloads where throughput matters more than per-token cost.

For most teams: start with Groq or Together AI for open-source models. If you need dedicated capacity or fine-tuned models, look at Fireworks on-demand or self-hosting.

Batch Discounts and Prompt Caching

Raw per-token pricing is the sticker price. Actual cost depends on whether you use the discount stack every major provider offers.

| Provider | Batch Discount | Cache Hit Discount | Combined Max Savings |
| --- | --- | --- | --- |
| Anthropic | 50% off (24h async) | 90% off input | 95% |
| OpenAI | 50% off (24h async) | 90% off input | 95% |
| Google | 50% off (Flex mode) | 90% off input | 95% |
| DeepSeek | N/A | 90% off input | 90% |
| Mistral | N/A | N/A | Base price only |
| Groq | N/A | 50% off (select models) | 50% |

The math on stacking. Take Claude Sonnet 4.6 at $3.00/MTok input:

| Configuration | Price per MTok | Savings |
| --- | --- | --- |
| Standard | $3.00 | 0% |
| Prompt cache hit | $0.30 | 90% |
| Batch only | $1.50 | 50% |
| Batch + cache hit | $0.15 | 95% |

OpenAI's GPT-5.4 follows the same pattern. Standard at $2.50, cached at $0.25, batch at $1.25, batch + cached at $0.13. The effective price of a frontier model with full discount stack is cheaper than the sticker price of most budget models.
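The stacking arithmetic generalizes to partial cache hit rates, which is what real workloads see; a sketch (the 90%/50% discount figures are the Anthropic/OpenAI numbers from the table above):

```python
def effective_input_price(base_per_mtok: float,
                          cache_hit_rate: float = 0.0,
                          cache_discount: float = 0.90,
                          batch_discount: float = 0.0) -> float:
    """Blended input price per MTok. The cache discount applies only to
    the cached fraction of tokens; the batch discount applies to all."""
    cached_price = base_per_mtok * (1 - cache_discount)
    blended = (cache_hit_rate * cached_price
               + (1 - cache_hit_rate) * base_per_mtok)
    return blended * (1 - batch_discount)

# Claude Sonnet 4.6 at $3.00/MTok input, reproducing the table:
effective_input_price(3.00)                              # $3.00 standard
effective_input_price(3.00, cache_hit_rate=1.0)          # $0.30 full cache hit
effective_input_price(3.00, batch_discount=0.50)         # $1.50 batch only
effective_input_price(3.00, 1.0, batch_discount=0.50)    # $0.15 batch + cache
```

A 70% cache hit rate (typical for agent sessions with a stable system prompt) already lands Sonnet at $1.11/MTok without batching.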

DeepSeek's stealth advantage

DeepSeek V3.2 cache hits cost $0.028/MTok. DeepSeek V4 cache hits cost $0.03/MTok. Both are cheaper than any other provider's cached input price, including Gemini Flash-Lite. If your workload has high cache hit rates (agent sessions with repeated system prompts and file context), DeepSeek's effective input cost is essentially free.

Quality-Adjusted Cost: Cheap but Bad = Expensive

A $0.05/MTok model that fails 40% of tasks and needs 3 retries per success costs more than a $3.00/MTok model that succeeds on the first attempt. The metric that matters is cost per completed task, not cost per token.

| Model | Price/MTok (blended) | Avg Tokens/Task | Success Rate | Effective Cost/Task |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | $0.065 | 15K | ~35% | $0.28 (2.9 attempts) |
| GPT-4.1 Nano | $0.25 | 12K | ~55% | $0.55 (1.8 attempts) |
| DeepSeek V3.2 | $0.35 | 10K | ~75% | $0.47 (1.3 attempts) |
| Gemini 2.5 Flash | $1.40 | 8K | ~80% | $1.40 (1.25 attempts) |
| Claude Sonnet 4.6 | $9.00 | 6K | ~92% | $5.87 (1.09 attempts) |
| Claude Opus 4.6 | $15.00 | 5K | ~95% | $7.89 (1.05 attempts) |

DeepSeek V3.2 at $0.47 per completed task is the value sweet spot. It is 12x cheaper per completed task than Claude Sonnet, with a 75% single-attempt success rate that is good enough for most automated workflows with retry logic. Claude Sonnet's 92% success rate matters when retries are expensive (user-facing, time-sensitive, or when errors compound in multi-step agents).

The cheapest models (Llama 3.1 8B, GPT-4.1 Nano) look cheap per token but burn through tokens on retries. They are cost-effective only for tasks where they reliably succeed on the first attempt: classification, extraction, formatting, simple completions. Using them for complex code generation is a false economy.
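The retry arithmetic is simple: with independent attempts, the expected number of attempts per success is 1/success_rate, which is where the table's attempt counts come from. A simplified sketch (raw per-token math only; the table's dollar figures additionally blend input and output pricing, so they won't reproduce exactly):

```python
def expected_attempts(success_rate: float) -> float:
    """Geometric-distribution mean: expected tries until first success."""
    return 1.0 / success_rate

def cost_per_completed_task(price_per_mtok: float, tokens_per_task: int,
                            success_rate: float) -> float:
    """Expected cost per successful task, counting failed attempts."""
    return (price_per_mtok * tokens_per_task / 1e6
            * expected_attempts(success_rate))

expected_attempts(0.35)  # ~2.9 attempts, as in the table
expected_attempts(0.92)  # ~1.09 attempts

# The per-task gap between a cheap and a frontier model is narrower
# than the per-token gap once retries are counted:
cheap = cost_per_completed_task(0.065, 15_000, 0.35)
frontier = cost_per_completed_task(9.00, 6_000, 0.92)
```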

- $0.47 — DeepSeek V3.2: cost per completed coding task
- $5.87 — Claude Sonnet 4.6: cost per completed coding task
- 12x — price gap narrows from 30x (per token) to 12x (per task)

Local Inference via Ollama: $0 Per Token

Local inference eliminates per-token costs entirely. You pay for hardware upfront and electricity ongoing. Ollama, vLLM, and llama.cpp make self-hosting straightforward.

| Factor | Local (Mac Studio M4) | API (DeepSeek V3.2) | API (GPT-5.4) |
| --- | --- | --- | --- |
| Hardware cost | $4,000-8,000 | $0 | $0 |
| Monthly electricity | ~$15 | $0 | $0 |
| Cost at 10M tok/month | $15 + amortized HW | ~$4 | ~$35 |
| Cost at 100M tok/month | $15 + amortized HW | ~$40 | ~$350 |
| Cost at 1B tok/month | $15 + amortized HW | ~$400 | ~$3,500 |
| Best model available | Llama 3.1 70B | DeepSeek V3.2 | GPT-5.4 |
| Quality ceiling | 70-85% of frontier | ~85% of frontier | Frontier |

Break-even is roughly 50 million tokens per month, or about 1.7M tokens per day. Below that, APIs are cheaper. Above that, local inference wins on raw cost, though you lose access to frontier-quality models. A Mac Studio running Llama 3.1 70B under full GPU load draws about 60W, costing under $15/month in most US electricity markets.
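The break-even can be checked with a few lines; the $6,000 hardware price, 36-month amortization window, and ~$3.50/MTok blended GPT-5.4 price below are illustrative assumptions, not figures from the tables:

```python
def monthly_cost_local(hw_usd: float, amort_months: int,
                       electricity_usd: float = 15.0) -> float:
    """Local inference: amortized hardware plus power; tokens are free."""
    return hw_usd / amort_months + electricity_usd

def monthly_cost_api(tokens_per_month: float,
                     blended_price_per_mtok: float) -> float:
    return tokens_per_month / 1e6 * blended_price_per_mtok

# Assumed: $6,000 Mac Studio over 36 months -> ~$182/month, flat.
local = monthly_cost_local(6_000, 36)
# GPT-5.4 at an assumed $3.50/MTok blended rate crosses that line
# right around 50M tokens/month ($175), matching the estimate above.
api_at_50m = monthly_cost_api(50e6, 3.50)  # $175.00
```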

The practical limitation: local models top out at 70B parameters on consumer hardware (120B with quantization tricks). Tasks that require frontier reasoning (complex architecture, nuanced debugging) still need API calls to Claude Opus, GPT-5.4, or Gemini 2.5 Pro. The hybrid approach works best: route simple tasks to local models, escalate hard problems to API.

How Morph Compact Makes Every LLM Cheaper

Every optimization above reduces the price per token. Morph Compact reduces the number of tokens. These are complementary strategies, and they multiply.

Compact compresses LLM context by 50-70% at 33,000 tokens per second. The compression is extractive: every surviving line is character-for-character from the original input. No paraphrasing, no synthesis, no hallucination risk. The model decides which lines are relevant and discards the rest.

| Model | Standard Cost (100K input) | After 60% Compaction (40K input) | Savings |
| --- | --- | --- | --- |
| Llama 3.1 8B (Groq) | $0.005 | $0.002 | $0.003 |
| DeepSeek V3.2 | $0.028 | $0.011 | $0.017 |
| Gemini 2.5 Flash | $0.030 | $0.012 | $0.018 |
| GPT-5.4 Nano | $0.020 | $0.008 | $0.012 |
| GPT-5.4 | $0.250 | $0.100 | $0.150 |
| Claude Sonnet 4.6 | $0.300 | $0.120 | $0.180 |
| Claude Opus 4.6 | $0.500 | $0.200 | $0.300 |
| GPT-5.4 Pro | $3.000 | $1.200 | $1.800 |

The savings per request look modest at the cheap end. But coding agents send 20-100 requests per session, each carrying accumulated context. A 30-turn agent session on Claude Sonnet 4.6 with 200K tokens of context per turn:

- $18.00 — 30 turns x 200K tokens at $3/MTok (no compaction)
- $7.20 — same session with 60% compaction
- $10.80 — saved per session

At 10 sessions per day, that is $108/day saved, or $3,240/month. The savings scale linearly with token volume and model cost. Compaction is most valuable on expensive models (where each saved token is worth more) and on long-running agent sessions (where context accumulates the most).
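The session math above can be reproduced directly; a sketch assuming the agent resends its full context every turn:

```python
def session_input_cost(turns: int, context_tokens: int,
                       input_price_per_mtok: float,
                       compaction: float = 0.0) -> float:
    """Input cost for an agent session that resends full context each
    turn. `compaction` is the fraction of tokens removed (0.6 = 60%)."""
    total_tokens = turns * context_tokens * (1 - compaction)
    return total_tokens / 1e6 * input_price_per_mtok

# Claude Sonnet 4.6: 30 turns, 200K context/turn (figures from the text).
full = session_input_cost(30, 200_000, 3.00)             # $18.00
compacted = session_input_cost(30, 200_000, 3.00, 0.6)   # $7.20
saved = full - compacted                                 # $10.80 per session
monthly = saved * 10 * 30                                # $3,240 at 10/day
```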

| Strategy | Sonnet 4.6 Input | GPT-5.4 Input | DeepSeek V3.2 Input |
| --- | --- | --- | --- |
| Base price | $3.00 | $2.50 | $0.28 |
| + Prompt caching | $0.30 | $0.25 | $0.028 |
| + Batch API | $0.15 | $0.13 | N/A |
| + 60% compaction | $0.06 effective | $0.05 effective | $0.011 effective |

Claude Sonnet 4.6, a frontier coding model, at $0.06 effective per MTok of original context. That is cheaper than the sticker price of Llama 3.1 8B on Groq ($0.05/$0.08), while delivering 92% first-attempt success rate versus 35%. The discount stack inverts the pricing hierarchy.

Frequently Asked Questions

What is the cheapest LLM API in 2026?

The absolute cheapest is Llama 3.1 8B on Groq at $0.05/$0.08 per million tokens. Amazon Nova Lite at $0.06/$0.24 is close. Among first-party providers, DeepSeek V3.2 at $0.28/$0.42 offers the best quality-to-cost ratio. Gemini 2.5 Flash-Lite at $0.10/$0.40 is cheapest from a major US provider. GPT-4.1 Nano at $0.10/$0.40 is cheapest from OpenAI.

How much does GPT-5.4 cost?

GPT-5.4 costs $2.50 input / $15.00 output per million tokens. Cached input at $0.25. Batch at $1.25/$7.50. GPT-5.4 Mini is $0.75/$4.50. GPT-5.4 Nano is $0.20/$1.25. GPT-5.4 Pro (the expensive one) is $30.00/$180.00.

Is DeepSeek cheaper than ChatGPT?

Yes. DeepSeek V3.2 at $0.28/$0.42 is 9x cheaper on input and 36x cheaper on output compared to GPT-5.4. DeepSeek V4 at $0.30/$0.50 is the latest flagship and still 8x cheaper on input. Cache hits at $0.028-$0.03/MTok are the cheapest cached input from any provider. Quality is competitive for coding and general tasks but below frontier models on the hardest reasoning benchmarks. DeepSeek R1 at $0.55/$2.19 offers chain-of-thought reasoning at a fraction of o3's cost ($2.00/$8.00).

What is the cheapest LLM for coding?

Dedicated coding models: Qwen3 Coder 480B A35B at $0.22/$0.90. Codestral at $0.30/$0.90. DeepSeek V3.2 in reasoner mode at $0.28/$0.42. General models that code well at low cost: GPT-4.1 Mini at $0.20/$0.80, o4-mini at $0.55/$2.20. For the best coding quality, Claude Sonnet 4.6 at $3.00/$15.00 leads most benchmarks but costs 10x more.

How do batch API and prompt caching reduce costs?

Batch APIs (OpenAI, Anthropic, Google) process requests asynchronously within 24 hours at 50% off. Prompt caching stores repeated context so subsequent requests pay 10% of standard input price (Anthropic, OpenAI, Google). These stack: batch + cache hit gives 95% savings on input. DeepSeek's cache hit price of $0.028/MTok makes it the cheapest cached input from any provider.

Is running a local LLM with Ollama cheaper than an API?

Above 50 million tokens per month, yes. A Mac Studio M4 running Llama 3.1 70B costs ~$15/month in electricity. Below 50M tokens, APIs are cheaper because you avoid the $4,000-8,000 hardware investment. Quality ceiling is 70-85% of frontier models. Best used in a hybrid setup: local for simple tasks, API for complex reasoning.

How does Morph Compact make LLMs cheaper?

Compact compresses context by 50-70% before sending it to any LLM. A 100K prompt compacted to 40K costs 60% less. The compression is extractive (verbatim lines, no paraphrasing) with zero hallucination risk. Combined with prompt caching and batch processing, it brings Claude Sonnet 4.6's effective input cost to $0.06/MTok, cheaper than the sticker price of most budget models. See how Compact works.

Cut your LLM API costs by 50-70%

Morph Compact compresses context before it reaches any model. Fewer tokens, same quality, lower bill. Works with every provider: OpenAI, Anthropic, Google, DeepSeek, or your own self-hosted models.