Every Major LLM API, Sorted by Input Cost (April 2026)
Prices per million tokens. Input is what you send. Output is what the model returns. Output typically costs 2-8x more than input. For coding agents and long-context workloads, input cost dominates because you send far more tokens than you receive.
| Model | Provider | Input/MTok | Output/MTok | Tier |
|---|---|---|---|---|
| GPT-OSS 20B | OpenAI | $0.03 | $0.10 | Budget |
| Llama 3.1 8B Instant | Groq | $0.05 | $0.08 | Budget |
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | Budget |
| Amazon Nova Lite | AWS Bedrock | $0.06 | $0.24 | Budget |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | Budget |
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | Budget |
| Llama 4 Scout (17Bx16E) | Groq | $0.11 | $0.34 | Budget |
| GPT-OSS 120B | OpenAI | $0.15 | $0.60 | Budget |
| Mistral Small 3.1 | Mistral | $0.20 | $0.60 | Budget |
| GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | Budget |
| GPT-4.1 Mini | OpenAI | $0.20 | $0.80 | Budget |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | Budget |
| Qwen3 Coder 480B A35B | Alibaba | $0.22 | $0.90 | Budget |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.50 | Budget |
| DeepSeek V3.2 (Chat) | DeepSeek | $0.28 | $0.42 | Budget |
| DeepSeek V3.2 (Reasoner) | DeepSeek | $0.28 | $0.42 | Budget |
| Qwen3 32B | Groq | $0.29 | $0.59 | Budget |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | Budget |
| Codestral 2508 | Mistral | $0.30 | $0.90 | Budget |
| DeepSeek V4 | DeepSeek | $0.30 | $0.50 | Budget |
| Gemini 3 Flash | Google | $0.50 | $3.00 | Budget |
| o4-mini | OpenAI | $0.55 | $2.20 | Budget |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | Budget |
| Llama 3.3 70B Versatile | Groq | $0.59 | $0.79 | Budget |
| Qwen3 Coder Plus | Alibaba | $0.65 | $3.25 | Budget |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | Budget |
| Amazon Nova Pro | AWS Bedrock | $0.80 | $3.20 | Budget |
| Mistral Medium 3 | Mistral | $1.00 | $3.00 | Mid |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Mid |
| o3-mini | OpenAI | $1.10 | $4.40 | Mid |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | Mid |
| GPT-5 | OpenAI | $1.25 | $10.00 | Mid |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | Mid |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | Mid |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | Mid |
| o3 | OpenAI | $2.00 | $8.00 | Mid |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | Mid |
| GPT-5.4 | OpenAI | $2.50 | $15.00 | Mid |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Mid |
| Grok 4 | xAI | $3.00 | $15.00 | Mid |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | Premium |
| GPT-5.2 Pro | OpenAI | $10.50 | $84.00 | Premium |
| Claude Opus 4.1 (legacy) | Anthropic | $15.00 | $75.00 | Premium |
| o3 Pro | OpenAI | $20.00 | $80.00 | Premium |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | Premium |
| o1-pro (legacy) | OpenAI | $150.00 | $600.00 | Premium |
Reading this table
Input/MTok = price per million input tokens (your prompts, system messages, context). Output/MTok = price per million output tokens (model responses). A typical coding agent session sends 5-10x more input tokens than it receives in output. For chat applications, the ratio is closer to 1:1.
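The per-request arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a billing library; the token counts in the examples are assumptions chosen to match the agent-vs-chat ratios above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API request, given per-million-token prices."""
    return (input_tokens / 1e6) * input_per_mtok \
         + (output_tokens / 1e6) * output_per_mtok

# Agent-style turn: 80K input, 10K output at $3.00/$15.00 (Claude Sonnet 4.6)
agent_turn = request_cost(80_000, 10_000, 3.00, 15.00)  # $0.24 + $0.15 = $0.39
# Chat-style turn: 1K input, 1K output at the same prices
chat_turn = request_cost(1_000, 1_000, 3.00, 15.00)     # $0.003 + $0.015 = $0.018
```

Note how input dominates the agent turn ($0.24 of $0.39) despite output's 5x higher per-token rate, which is why the rest of this article sorts by input price.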
Budget Tier: Under $1 per Million Input Tokens
This is where most production workloads should start. Models in this range handle classification, extraction, simple code generation, summarization, and formatting. The quality gap between a $0.20/MTok model and a $3.00/MTok model is smaller than the 15x price difference suggests.
DeepSeek V3.2: $0.28/$0.42
The price-performance champion. Competitive with mid-tier models on coding and general benchmarks at roughly a ninth of GPT-5.4's input price. Cache hits drop input to $0.028/MTok. Thinking mode (reasoner) costs the same. 128K context. The main trade-off: DeepSeek's API routes through China, which matters for some compliance requirements.
Gemini 2.5 Flash-Lite: $0.10/$0.40
The cheapest option from a major US provider. Google offers a free tier for prototyping. Context caching at $0.01/MTok makes repeated prompts nearly free. Performance is solid for simple tasks but drops off on complex reasoning compared to Flash ($0.30/$2.50) or Pro ($1.25/$10.00).
GPT-4.1 Nano: $0.10/$0.40
OpenAI's cheapest current-gen model. Good enough for linting, formatting, classification, and short code completions. Not competitive on multi-step reasoning or complex generation. Cached input at $0.01/MTok. Batch at $0.05/$0.20. For high-volume simple tasks, this is the OpenAI pick.
Mistral Small 3.1: $0.20/$0.60
A strong small model from a European provider (data stays in EU if that matters). Competitive with GPT-4.1 Mini on most benchmarks. No prompt caching discount, but the base price is already low. Good default for teams that want a non-US, non-China option.
| Model | Input/MTok | Output/MTok | Cache Hit Input | Best For |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.28 | $0.42 | $0.028 | General coding, reasoning |
| DeepSeek V4 | $0.30 | $0.50 | $0.03 | Latest DeepSeek flagship |
| DeepSeek R1 | $0.55 | $2.19 | $0.14 | Budget reasoning (think + answer) |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.01 | High-volume simple tasks |
| GPT-5.4 Nano | $0.20 | $1.25 | $0.02 | Cheap OpenAI flagship-family |
| GPT-4.1 Nano | $0.10 | $0.40 | $0.01 | Classification, formatting |
| Grok 4.1 Fast | $0.20 | $0.50 | N/A | Cheap xAI option |
| Mistral Small 3.1 | $0.20 | $0.60 | N/A | EU data residency |
| o4-mini | $0.55 | $2.20 | $0.055 | Budget reasoning model |
| Llama 3.1 8B (Groq) | $0.05 | $0.08 | N/A | Lowest possible cost |
| Amazon Nova Lite | $0.06 | $0.24 | N/A | AWS-native workloads |
The surprise in this tier: OpenAI's reasoning model o4-mini at $0.55/$2.20 is cheaper than many general-purpose mid-tier models. It replaced o3-mini and outperforms it on most benchmarks. For tasks that need structured reasoning on a budget, o4-mini undercuts everything except DeepSeek's reasoner mode.
Hidden cost: reasoning tokens
Reasoning models (o4-mini, o3, DeepSeek Reasoner) generate internal chain-of-thought tokens billed as output. A short visible answer might use 2,000 output tokens, but the model burned 10,000 reasoning tokens behind the scenes. Your actual output cost can be 5-10x what the visible response length implies. Factor this into any cost comparison.
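The hidden-token effect is easy to quantify. A rough sketch, with illustrative token counts (the 2K/10K split is an assumption for demonstration, not a measured figure):

```python
def billed_output_cost(visible_tokens: int, hidden_reasoning_tokens: int,
                       output_per_mtok: float) -> float:
    """Reasoning models bill hidden chain-of-thought at the output rate."""
    return (visible_tokens + hidden_reasoning_tokens) / 1e6 * output_per_mtok

# o4-mini at $2.20/MTok output: a 2K-token visible answer that burned
# 10K reasoning tokens behind the scenes costs 6x the naive estimate.
actual = billed_output_cost(2_000, 10_000, 2.20)  # $0.0264
naive = billed_output_cost(2_000, 0, 2.20)        # $0.0044
```

When comparing a reasoning model against a general-purpose one, apply this multiplier to the reasoning model's output column before trusting the sticker price.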
Mid Tier: $1-5 per Million Input Tokens
The workhorses. These models handle complex code generation, multi-step reasoning, long-form writing, and most agentic workflows. The price difference between $1.25/MTok (Gemini 2.5 Pro) and $3.00/MTok (Claude Sonnet 4.6) matters at scale but both are viable for production.
| Model | Input/MTok | Output/MTok | Context Window | Notes |
|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Cheapest Anthropic model worth using for code |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Best value frontier-class model |
| GPT-5 | $1.25 | $10.00 | 128K | OpenAI base flagship |
| GPT-5.2 | $1.75 | $14.00 | 128K | Retiring June 2026 |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | Latest Google flagship |
| Mistral Large 3 | $2.00 | $6.00 | 128K | Cheapest output in mid-tier |
| o3 | $2.00 | $8.00 | 200K | Best reasoning per dollar |
| GPT-4.1 | $2.00 | $8.00 | 1M | Workhorse for code, instruction following |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Best coding model per many benchmarks |
Two models stand out. Gemini 2.5 Pro at $1.25/$10.00 offers frontier quality at nearly half the price of Claude Sonnet 4.6. Its 1M context window matches Claude's and costs less. The catch: Gemini is newer for coding agent use cases and the ecosystem (tool use, function calling) is less mature than Anthropic's or OpenAI's.
Mistral Large 3 at $2.00/$6.00 has the cheapest output in this tier. If your workload is output-heavy (long-form generation, detailed code), Mistral Large saves 60% on output versus Claude Sonnet ($6 vs $15) and 25% versus o3 ($6 vs $8).
Inference Providers: Same Model, Different Price
Open-source models (Llama, Qwen, Mixtral, DeepSeek) run on multiple providers. The same model at the same quality can cost 2-5x more depending on where you host it.
| Provider | Input/MTok (Llama 4 Scout) | Output/MTok | Speed |
|---|---|---|---|
| Groq | $0.11 | $0.34 | Fastest TTFB |
| Together AI | $0.27 | $0.85 | Fast |
| Fireworks AI | $0.27 | $0.85 | Fast |
| Cerebras | Contact | Contact | Highest throughput |
Groq: Lowest Latency
Custom LPU silicon built for fast inference. Llama 3.1 8B at $0.05/$0.08 is the cheapest hosted inference available. 241+ tok/s on 70B models. Trade-off: limited model selection compared to Together or Fireworks. Prompt caching available on select models at 50% off.
Together AI: Widest Selection
15+ open-source models including Llama, DeepSeek, Qwen, Mistral, and Gemma. Competitive pricing on mid-size models. Cheaper than Fireworks on 39 of 80 shared models. Best for teams that want one API for many open-source models.
Fireworks AI: Flexible Deployment
Serverless starting at $0.10/MTok for small models. On-demand GPUs from $2.90/hr (A100) to $9.00/hr (B200) for dedicated capacity. Good middle ground between pure serverless and self-hosting.
Cerebras: Throughput King
Wafer-scale engine delivers 1,800 tok/s on Llama 3.1 8B and 969 tok/s on 405B. Free tier available. Enterprise pricing on request. Best for batch workloads where throughput matters more than per-token cost.
For most teams: start with Groq or Together AI for open-source models. If you need dedicated capacity or fine-tuned models, look at Fireworks on-demand or self-hosting.
Batch Discounts and Prompt Caching
Raw per-token pricing is the sticker price. Actual cost depends on whether you use the discount stack every major provider offers.
| Provider | Batch Discount | Cache Hit Discount | Combined Max Savings |
|---|---|---|---|
| Anthropic | 50% off (24h async) | 90% off input | 95% |
| OpenAI | 50% off (24h async) | 90% off input | 95% |
| Google | 50% off (Flex mode) | 90% off input | 95% |
| DeepSeek | N/A | 90% off input | 90% |
| Mistral | N/A | N/A | Base price only |
| Groq | N/A | 50% off (select models) | 50% |
The math on stacking. Take Claude Sonnet 4.6 at $3.00/MTok input:
| Configuration | Price per MTok | Savings |
|---|---|---|
| Standard | $3.00 | 0% |
| Prompt cache hit | $0.30 | 90% |
| Batch only | $1.50 | 50% |
| Batch + cache hit | $0.15 | 95% |
OpenAI's GPT-5.4 follows the same pattern. Standard at $2.50, cached at $0.25, batch at $1.25, batch + cached at $0.13. The effective price of a frontier model with full discount stack is cheaper than the sticker price of most budget models.
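The stack is multiplicative on the input price. A minimal sketch using the 90%/50% discounts from the table above (availability of each discount varies by provider, per the table):

```python
def effective_input_price(base_per_mtok: float, cached: bool = False,
                          batched: bool = False) -> float:
    """Input price after stacking a 90%-off cache hit and a 50%-off batch job."""
    price = base_per_mtok
    if cached:
        price *= 0.10  # cache hit: pay 10% of the standard input price
    if batched:
        price *= 0.50  # batch API: 50% off
    return price

# Claude Sonnet 4.6 at $3.00/MTok input:
effective_input_price(3.00)                             # $3.00 standard
effective_input_price(3.00, cached=True)                # $0.30
effective_input_price(3.00, batched=True)               # $1.50
effective_input_price(3.00, cached=True, batched=True)  # $0.15
```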
DeepSeek's stealth advantage
DeepSeek V3.2 cache hits cost $0.028/MTok. DeepSeek V4 cache hits cost $0.03/MTok. Among flagship-class models, nothing else comes close; only Gemini 2.5 Flash-Lite's $0.01/MTok cached input is lower, and it is a far smaller model. If your workload has high cache hit rates (agent sessions with repeated system prompts and file context), DeepSeek's effective input cost is essentially free.
Quality-Adjusted Cost: Cheap but Bad = Expensive
A $0.05/MTok model that fails 40% of tasks and needs 3 retries per success costs more than a $3.00/MTok model that succeeds on the first attempt. The metric that matters is cost per completed task, not cost per token.
| Model | Price/MTok (blended) | Avg Tokens/Attempt | Success Rate | Effective Cost/Task |
|---|---|---|---|---|
| Llama 3.1 8B | $0.065 | 1.5M | ~35% | $0.28 (2.9 attempts) |
| GPT-4.1 Nano | $0.25 | 1.2M | ~55% | $0.55 (1.8 attempts) |
| DeepSeek V3.2 | $0.35 | 1M | ~75% | $0.47 (1.3 attempts) |
| Gemini 2.5 Flash | $1.40 | 800K | ~80% | $1.40 (1.25 attempts) |
| Claude Sonnet 4.6 | $9.00 | 600K | ~92% | $5.87 (1.09 attempts) |
| Claude Opus 4.6 | $15.00 | 500K | ~95% | $7.89 (1.05 attempts) |
DeepSeek V3.2 at $0.47 per completed task is the value sweet spot. It is 12x cheaper per completed task than Claude Sonnet, with a 75% single-attempt success rate that is good enough for most automated workflows with retry logic. Claude Sonnet's 92% success rate matters when retries are expensive (user-facing, time-sensitive, or when errors compound in multi-step agents).
The cheapest models (Llama 3.1 8B, GPT-4.1 Nano) look cheap per token but burn through tokens on retries. They are cost-effective only for tasks where they reliably succeed on the first attempt: classification, extraction, formatting, simple completions. Using them for complex code generation is a false economy.
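The retry math reduces to a short formula: with independent retries at success probability p, the expected number of attempts is 1/p. A sketch, where the per-attempt token counts are illustrative assumptions for agentic coding tasks, not measured values:

```python
def cost_per_completed_task(blended_per_mtok: float, tokens_per_attempt: float,
                            success_rate: float) -> float:
    """Expected cost to reach one success, assuming independent retries."""
    cost_per_attempt = tokens_per_attempt / 1e6 * blended_per_mtok
    return cost_per_attempt / success_rate  # expected attempts = 1 / success_rate

# DeepSeek V3.2: $0.35/MTok blended, ~1M tokens per attempt, 75% success
deepseek = cost_per_completed_task(0.35, 1_000_000, 0.75)  # ~= $0.47
# Claude Sonnet 4.6: $9.00/MTok blended, ~600K tokens per attempt, 92% success
sonnet = cost_per_completed_task(9.00, 600_000, 0.92)      # ~= $5.87
```

The formula also shows why retries punish weak models disproportionately: halving the success rate doubles the expected cost even before accounting for extra tokens per attempt.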
Local Inference via Ollama: $0 Per Token
Local inference eliminates per-token costs entirely. You pay for hardware upfront and electricity ongoing. Ollama, vLLM, and llama.cpp make self-hosting straightforward.
| Factor | Local (Mac Studio M4) | API (DeepSeek V3.2) | API (GPT-5.4) |
|---|---|---|---|
| Hardware cost | $4,000-8,000 | $0 | $0 |
| Monthly electricity | ~$15 | $0 | $0 |
| Cost at 10M tok/month | $15 + amortized HW | ~$4 | ~$35 |
| Cost at 100M tok/month | $15 + amortized HW | ~$40 | ~$350 |
| Cost at 1B tok/month | $15 + amortized HW | ~$400 | ~$3,500 |
| Best model available | Llama 3.1 70B | DeepSeek V3.2 | GPT-5.4 |
| Quality ceiling | 70-85% of frontier | ~85% of frontier | Frontier |
Break-even is roughly 50 million tokens per month, or about 1.7M tokens per day. Below that, APIs are cheaper. Above that, local inference wins on raw cost, though you lose access to frontier-quality models. A Mac Studio running Llama 3.1 70B under full GPU load draws about 60W, costing under $15/month in most US electricity markets.
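The break-even point can be sketched as follows. The 36-month hardware amortization period and the ~$3.50/MTok blended frontier-API price are assumptions for illustration (the blended price follows from the GPT-5.4 column in the table above):

```python
def local_monthly_cost(hardware_usd: float, electricity_usd: float = 15.0,
                       amortize_months: int = 36) -> float:
    """Monthly cost of self-hosting: amortized hardware plus electricity."""
    return hardware_usd / amortize_months + electricity_usd

def breakeven_tokens(hardware_usd: float, api_per_mtok: float) -> float:
    """Monthly token volume at which local cost equals API cost."""
    return local_monthly_cost(hardware_usd) / api_per_mtok * 1e6

# $6,000 Mac Studio vs a ~$3.50/MTok blended frontier API:
breakeven_tokens(6_000, 3.50)  # ~= 52M tokens/month, near the ~50M figure
```

Against a cheap API like DeepSeek (~$0.40/MTok blended), the break-even volume is roughly 9x higher, which is why local inference competes with frontier APIs long before it competes with budget ones.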
The practical limitation: local models top out at 70B parameters on consumer hardware (120B with quantization tricks). Tasks that require frontier reasoning (complex architecture, nuanced debugging) still need API calls to Claude Opus, GPT-5.4, or Gemini 2.5 Pro. The hybrid approach works best: route simple tasks to local models, escalate hard problems to API.
How Morph Compact Makes Every LLM Cheaper
Every optimization above reduces the price per token. Morph Compact reduces the number of tokens. These are complementary strategies, and they multiply.
Compact compresses LLM context by 50-70% at 33,000 tokens per second. The compression is extractive: every surviving line is character-for-character from the original input. No paraphrasing, no synthesis, no hallucination risk. The model decides which lines are relevant and discards the rest.
| Model | Standard Cost (100K input) | After 60% Compaction (40K input) | Savings |
|---|---|---|---|
| Llama 3.1 8B (Groq) | $0.005 | $0.002 | $0.003 |
| DeepSeek V3.2 | $0.028 | $0.011 | $0.017 |
| Gemini 2.5 Flash | $0.030 | $0.012 | $0.018 |
| GPT-5.4 Nano | $0.020 | $0.008 | $0.012 |
| GPT-5.4 | $0.250 | $0.100 | $0.150 |
| Claude Sonnet 4.6 | $0.300 | $0.120 | $0.180 |
| Claude Opus 4.6 | $0.500 | $0.200 | $0.300 |
| GPT-5.4 Pro | $3.000 | $1.200 | $1.800 |
The savings per request look modest at the cheap end. But coding agents send 20-100 requests per session, each carrying accumulated context. A 30-turn agent session on Claude Sonnet 4.6 with 200K tokens of context per turn sends 6M input tokens, or $18.00 at $3.00/MTok. Compacted by 60%, that drops to 2.4M tokens and $7.20, saving $10.80 per session.
At 10 sessions per day, that is $108/day saved, or $3,240/month. The savings scale linearly with token volume and model cost. Compaction is most valuable on expensive models (where each saved token is worth more) and on long-running agent sessions (where context accumulates the most).
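The session-level arithmetic reduces to one multiplication. A sketch, using the same 30-turn, 200K-per-turn session shape assumed above:

```python
def session_savings(turns: int, context_tokens_per_turn: int,
                    input_per_mtok: float, compaction: float = 0.60) -> float:
    """Dollars saved per agent session by compacting input context."""
    total_input_tokens = turns * context_tokens_per_turn
    return total_input_tokens / 1e6 * input_per_mtok * compaction

per_session = session_savings(30, 200_000, 3.00)  # $10.80 on Claude Sonnet 4.6
per_month = per_session * 10 * 30                 # $3,240 at 10 sessions/day
```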
| Strategy | Sonnet 4.6 Input | GPT-5.4 Input | DeepSeek V3.2 Input |
|---|---|---|---|
| Base price | $3.00 | $2.50 | $0.28 |
| + Prompt caching | $0.30 | $0.25 | $0.028 |
| + Batch API | $0.15 | $0.13 | N/A |
| + 60% compaction | $0.06 effective | $0.05 effective | $0.011 effective |
Claude Sonnet 4.6, a frontier coding model, at $0.06 effective per MTok of original context. That is cheaper than the sticker price of Llama 3.1 8B on Groq ($0.05/$0.08), while delivering 92% first-attempt success rate versus 35%. The discount stack inverts the pricing hierarchy.
Frequently Asked Questions
What is the cheapest LLM API in 2026?
The absolute cheapest is Llama 3.1 8B on Groq at $0.05/$0.08 per million tokens. Amazon Nova Lite at $0.06/$0.24 is close. Among first-party providers, DeepSeek V3.2 at $0.28/$0.42 offers the best quality-to-cost ratio. Gemini 2.5 Flash-Lite at $0.10/$0.40 is cheapest from a major US provider. GPT-4.1 Nano at $0.10/$0.40 is cheapest from OpenAI.
How much does GPT-5.4 cost?
GPT-5.4 costs $2.50 input / $15.00 output per million tokens. Cached input at $0.25. Batch at $1.25/$7.50. GPT-5.4 Mini is $0.75/$4.50. GPT-5.4 Nano is $0.20/$1.25. GPT-5.4 Pro (the expensive one) is $30.00/$180.00.
Is DeepSeek cheaper than ChatGPT?
Yes. DeepSeek V3.2 at $0.28/$0.42 is 9x cheaper on input and 36x cheaper on output compared to GPT-5.4. DeepSeek V4 at $0.30/$0.50 is the latest flagship and still 8x cheaper on input. Cache hits at $0.028-$0.03/MTok are the cheapest cached input among flagship-class models. Quality is competitive for coding and general tasks but below frontier models on the hardest reasoning benchmarks. DeepSeek R1 at $0.55/$2.19 offers chain-of-thought reasoning at a fraction of o3's cost ($2.00/$8.00).
What is the cheapest LLM for coding?
Dedicated coding models: Qwen3 Coder 480B A35B at $0.22/$0.90. Codestral at $0.30/$0.90. DeepSeek V3.2 in reasoner mode at $0.28/$0.42. General models that code well at low cost: GPT-4.1 Mini at $0.20/$0.80, o4-mini at $0.55/$2.20. For the best coding quality, Claude Sonnet 4.6 at $3.00/$15.00 leads most benchmarks but costs 10x more.
How do batch API and prompt caching reduce costs?
Batch APIs (OpenAI, Anthropic, Google) process requests asynchronously within 24 hours at 50% off. Prompt caching stores repeated context so subsequent requests pay 10% of the standard input price (Anthropic, OpenAI, Google). These stack: batch + cache hit gives 95% savings on input. DeepSeek's cache hit price of $0.028/MTok is the cheapest cached input among flagship-class models.
Is running a local LLM with Ollama cheaper than an API?
Above 50 million tokens per month, yes. A Mac Studio M4 running Llama 3.1 70B costs ~$15/month in electricity. Below 50M tokens, APIs are cheaper because you avoid the $4,000-8,000 hardware investment. Quality ceiling is 70-85% of frontier models. Best used in a hybrid setup: local for simple tasks, API for complex reasoning.
How does Morph Compact make LLMs cheaper?
Compact compresses context by 50-70% before sending it to any LLM. A 100K prompt compacted to 40K costs 60% less. The compression is extractive (verbatim lines, no paraphrasing) with zero hallucination risk. Combined with prompt caching and batch processing, it brings Claude Sonnet 4.6's effective input cost to $0.06/MTok, cheaper than the sticker price of most budget models. See how Compact works.
Cut your LLM API costs by 50-70%
Morph Compact compresses context before it reaches any model. Fewer tokens, same quality, lower bill. Works with every provider: OpenAI, Anthropic, Google, DeepSeek, or your own self-hosted models.