OpenAI bills the API per token. As of June 2026, the flagship GPT-5.5 costs $5.00/1M input, $0.50/1M cached input, and $30.00/1M output with a 1M context window. The cheapest model, GPT-5.4-nano, runs $0.20/1M input and $1.25/1M output. GPT-5 and GPT-4o are no longer listed, succeeded by this 5.4/5.5 family. Morph's router cuts spend 40-70% by routing each call to the cheapest sufficient model.
The Current OpenAI API Price Table
The table below is the current OpenAI API lineup as listed on the official pricing page, in USD per million tokens. Input is the cost to send tokens to the model. Cached input is the discounted rate for a repeated prompt prefix. Output is the cost of generated tokens. All figures as of June 2026; verify on openai.com.
| Model | Input | Cached input | Output | Context |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 | 1M |
| GPT-5.5-pro | $30.00 | n/a | $180.00 | 1M |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | 1M |
| GPT-5.4-mini | $0.75 | $0.075 | $4.50 | 1M |
| GPT-5.4-nano | $0.20 | $0.02 | $1.25 | 1M |
| GPT-5.4-pro | $30.00 | n/a | $180.00 | 1M |
Reading the table
GPT-5.4 is exactly half the cost of GPT-5.5 on both input and output. The two pro variants (GPT-5.5-pro and GPT-5.4-pro) share the same $30.00/$180.00 rate and target extended-reasoning workloads where output volume is the bottleneck. GPT-5.4-nano is the floor at $0.20/1M input, 25x cheaper than the flagship input rate.
The spread across the lineup is wide. GPT-5.4-nano output at $1.25/1M versus GPT-5.5-pro output at $180.00/1M is a 144x difference. That spread is the entire argument for matching the model to the task instead of defaulting every call to the flagship.
How OpenAI Token Pricing Works
OpenAI charges per token, not per request. A token is roughly 4 characters of English text, so 1,000 tokens is about 750 words. Every API call has an input token count (your prompt plus conversation history plus any tool schemas) and an output token count (the model's response). You pay for both at different rates.
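For budgeting before any traffic exists, the rule of thumb above can be turned into a rough estimator. This is a minimal sketch: `estimateTokens` and `estimateWords` are hypothetical helper names, and the 4-characters-per-token figure is only the heuristic quoted in the text, not the model's real tokenizer.

```typescript
// Rough heuristic from the text: about 4 characters of English per token.
// Real counts come from the model's tokenizer and vary by language and content.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// About 0.75 words per token, so 1,000 tokens is roughly 750 words.
function estimateWords(tokens: number): number {
  return Math.round(tokens * 0.75);
}

const promptTokens = estimateTokens("Rename fetchUsers to loadUsers across the file.");
```

Treat the result as an order-of-magnitude estimate only; the usage block returned by the API is the authoritative count.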
Output costs more than input on every model: 6x across most of the lineup and 6.25x on GPT-5.4-nano. GPT-5.5 charges $5.00/1M input and $30.00/1M output, a 6x ratio. GPT-5.4 charges $2.50/1M input and $15.00/1M output, also 6x. This asymmetry means a read-heavy workload (summarizing a large document, answering questions about code) is far cheaper than a generation-heavy workload (writing new files, producing long completions) at the same total token count.
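To make the asymmetry concrete, the sketch below prices two calls with the same 10,000-token total but opposite shapes. It assumes the GPT-5.5 rates from the table above; the token splits are illustrative, and `callCost` is a hypothetical helper.

```typescript
// GPT-5.5 rates from the table above, in USD per 1M tokens (June 2026).
const INPUT_RATE = 5.0;
const OUTPUT_RATE = 30.0;

// Cost in USD of one call, ignoring any cached-input discount.
function callCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_RATE + (outputTokens / 1_000_000) * OUTPUT_RATE;
}

// Same 10,000-token total, opposite shapes (illustrative numbers).
const readHeavy = callCost(9_000, 1_000);       // mostly prompt
const generationHeavy = callCost(1_000, 9_000); // mostly completion
```

Under these assumptions the read-heavy call comes to about $0.075 and the generation-heavy call to about $0.275, roughly 3.7x more for the same number of tokens.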
The context window is the maximum number of tokens the model can hold in a single request, input and output combined. Every model in the current OpenAI lineup carries a 1M token context window, large enough to hold an entire mid-sized codebase. The context window does not change the per-token price; it caps how much you can send in one call.
Input tokens
Everything you send: the prompt, conversation history, system prompt, tool schemas, and any pasted context. Charged at the input rate ($5.00/1M for GPT-5.5).
Cached input tokens
A repeated prompt prefix charged at roughly 10% of the input rate ($0.50/1M for GPT-5.5). Applies automatically when the same prefix recurs across calls.
Output tokens
The model's generated response. The most expensive component at roughly 6x the input rate ($30.00/1M for GPT-5.5). Trim output to control spend.
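The three components combine into a single bill per call. The sketch below is one way to compute it, assuming the GPT-5.5 rates listed here and assuming the cached token count is a subset of the total input count; the `Usage` shape and `billedCost` name are hypothetical, not SDK types.

```typescript
// USD per 1M tokens for GPT-5.5, from the table above (June 2026).
const RATES = { input: 5.0, cachedInput: 0.5, output: 30.0 };

interface Usage {
  inputTokens: number;       // all prompt tokens, cached or not
  cachedInputTokens: number; // assumed to be a subset of inputTokens
  outputTokens: number;
}

// Bills uncached input, cached input, and output at their separate rates.
function billedCost(u: Usage): number {
  const uncached = u.inputTokens - u.cachedInputTokens;
  const microDollars =
    uncached * RATES.input +
    u.cachedInputTokens * RATES.cachedInput +
    u.outputTokens * RATES.output;
  return microDollars / 1_000_000;
}

// 3,000 input tokens (2,000 of them cached) plus 1,000 output tokens.
const withCache = billedCost({ inputTokens: 3_000, cachedInputTokens: 2_000, outputTokens: 1_000 });
const withoutCache = billedCost({ inputTokens: 3_000, cachedInputTokens: 0, outputTokens: 1_000 });
```

With these illustrative numbers the call costs about $0.036 with the cache hit and $0.045 without it; output still dominates either way.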
What Happened to GPT-4o and GPT-5 Pricing
GPT-4o, GPT-5, GPT-5-mini, and GPT-5-nano are no longer listed on OpenAI's official pricing page. They were succeeded by the GPT-5.4 and GPT-5.5 family covered above. People still search for "gpt 4o pricing" and "gpt-5 pricing" out of habit, but OpenAI no longer publishes per-token rates for those models, so any figure quoted for them today is stale.
The mapping to current models is straightforward. If you were running GPT-4o or GPT-5 for general work, the direct equivalent is GPT-5.4 at $2.50/1M input and $15.00/1M output. If you need the top of the lineup, that is now GPT-5.5 at $5.00/1M input and $30.00/1M output. For the smaller GPT-5-mini and GPT-5-nano tier, the successors are GPT-5.4-mini and GPT-5.4-nano.
| Retired model | Current equivalent | Input | Output |
|---|---|---|---|
| GPT-5 (flagship) | GPT-5.5 | $5.00/1M | $30.00/1M |
| GPT-4o (general) | GPT-5.4 | $2.50/1M | $15.00/1M |
| GPT-5-mini | GPT-5.4-mini | $0.75/1M | $4.50/1M |
| GPT-5-nano | GPT-5.4-nano | $0.20/1M | $1.25/1M |
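If old model names are scattered through a codebase, the mapping in the table can live in one lookup. This is a sketch only: the identifier strings are assumptions (only "gpt-5.4" appears in this article's code), and `migrateModel` is a hypothetical helper, so confirm the exact model strings before relying on them.

```typescript
// Retired model -> current equivalent, following the table above.
// All identifier strings here are assumed spellings; verify them against
// the provider's model list before deploying.
const SUCCESSOR: Record<string, string> = {
  "gpt-5": "gpt-5.5",
  "gpt-4o": "gpt-5.4",
  "gpt-5-mini": "gpt-5.4-mini",
  "gpt-5-nano": "gpt-5.4-nano",
};

// Returns the successor for a retired model, or the input unchanged.
function migrateModel(model: string): string {
  return SUCCESSOR[model] ?? model;
}

const migrated = migrateModel("gpt-4o");
```

Teams that used GPT-5 for general work rather than as a flagship may prefer to map it to the GPT-5.4 tier instead, as the paragraph above suggests.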
A note on stale price quotes
Blog posts and aggregators still circulate GPT-4o and GPT-5 prices. Those models are gone from the official pricing page, and OpenAI does not guarantee the old rates apply to any successor. Price your workload against the current GPT-5.4 and GPT-5.5 table, and confirm on openai.com before committing to a budget.
Cached Input Discount
Cached input is the largest discount OpenAI exposes directly in the price table. When the same prompt prefix repeats across calls, OpenAI charges the cached portion at roughly 10% of the normal input rate. GPT-5.5 cached input is $0.50/1M versus $5.00/1M uncached, a 90% reduction on the repeated prefix.
The cache triggers automatically when a request shares a long identical prefix with a recent prior request. The common pattern: a fixed system prompt, a stable tool schema, and a constant document all live at the front of the prompt, and only the user's latest message changes. The stable front is cached; only the variable tail is billed at full input rate.
| Model | Uncached input | Cached input | Discount |
|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | 90% |
| GPT-5.4 | $2.50 | $0.25 | 90% |
| GPT-5.4-mini | $0.75 | $0.075 | 90% |
| GPT-5.4-nano | $0.20 | $0.02 | 90% |
To capture the discount, structure prompts so the stable content comes first and the variable content comes last. An agent that rebuilds its system prompt on every turn, or interleaves changing context with fixed context, defeats the cache and pays full input rate on everything. Order matters more than most teams realize.
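The ordering advice can be sketched as a message builder plus a savings estimate. Everything named here (`buildMessages`, `inputCostWithCache`, the placeholder strings) is hypothetical, and the estimate assumes every call after the first hits the cache; minimum prefix lengths and cache lifetimes are not modeled.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Stable content: built once and reused byte-for-byte on every call.
const SYSTEM_PROMPT = "You are a coding assistant.";
const REFERENCE_DOC = "Project conventions go here."; // placeholder text

function buildMessages(history: Message[], latestUserMessage: string): Message[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },   // stable prefix
    { role: "user", content: REFERENCE_DOC },     // stable prefix
    ...history,                                   // grows by appending only
    { role: "user", content: latestUserMessage }, // variable tail
  ];
}

// Input cost in USD on GPT-5.5 ($5.00 uncached, $0.50 cached per 1M tokens),
// assuming `stableTokens` of each prompt are cached on every call after the first.
function inputCostWithCache(calls: number, stableTokens: number, variableTokens: number): number {
  const firstCall = (stableTokens + variableTokens) * 5.0;
  const laterCalls = (calls - 1) * (stableTokens * 0.5 + variableTokens * 5.0);
  return (firstCall + laterCalls) / 1_000_000;
}

const cachedInputCost = inputCostWithCache(100, 2_500, 500);
```

For 100 calls with a 2,500-token stable prefix and a 500-token variable tail, this comes to about $0.39 of input spend, against $1.50 if nothing were cached.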
Worked Example: A Coding Session
Assume a coding agent session with 100 LLM calls, each consuming 4,000 tokens (3,000 input, 1,000 output on average). The cost depends entirely on which model handles the calls. The table below prices the same workload across the lineup, with no caching applied.
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| GPT-5.5 | $1.50 | $3.00 | $4.50 |
| GPT-5.4 | $0.75 | $1.50 | $2.25 |
| GPT-5.4-mini | $0.225 | $0.45 | $0.675 |
| GPT-5.4-nano | $0.06 | $0.125 | $0.185 |
The all-GPT-5.5 session costs $4.50. The all-nano session costs $0.185, a 24x difference. Neither extreme is right for a real workload: nano cannot handle the hard 15% of prompts, and GPT-5.5 is wasted on the easy 60%. The cheapest correct answer is to route each call to the model that matches its difficulty.
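The per-model totals above follow from one formula. A minimal sketch that reproduces them, assuming the June 2026 prices from the table and the 100-call session shape described here; `sessionCost` is a hypothetical helper and the keys are labels, not guaranteed API model strings.

```typescript
// USD per 1M tokens, from the price table above (June 2026).
const PRICES: Record<string, { input: number; output: number }> = {
  "GPT-5.5": { input: 5.0, output: 30.0 },
  "GPT-5.4": { input: 2.5, output: 15.0 },
  "GPT-5.4-mini": { input: 0.75, output: 4.5 },
  "GPT-5.4-nano": { input: 0.2, output: 1.25 },
};

// Session shape from the worked example: 3,000 input and 1,000 output tokens per call.
const INPUT_PER_CALL = 3_000;
const OUTPUT_PER_CALL = 1_000;

// Total cost in USD for `calls` identical calls on one model, no caching.
function sessionCost(model: string, calls: number = 100): number {
  const p = PRICES[model];
  const perCall = (INPUT_PER_CALL * p.input + OUTPUT_PER_CALL * p.output) / 1_000_000;
  return calls * perCall;
}

for (const model of Object.keys(PRICES)) {
  console.log(model, "$" + sessionCost(model).toFixed(3));
}
```

Changing `INPUT_PER_CALL` and `OUTPUT_PER_CALL` to match your own traffic is the quickest way to see how the comparison shifts for a different workload.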
| Tier | Model | Calls | Cost |
|---|---|---|---|
| Easy (60%) | GPT-5.4-nano | 60 | $0.111 |
| Medium (25%) | GPT-5.4-mini | 25 | $0.169 |
| Hard (15%) | GPT-5.5 | 15 | $0.675 |
| Routed total | Mixed | 100 | $0.955 |
The routed session costs $0.955 versus $4.50 for all-GPT-5.5, a 79% reduction, while keeping the flagship model on the hard 15% of prompts where quality matters. This is the same arithmetic that makes routing profitable across every provider, applied to OpenAI's current price table.
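The routed total can be checked the same way. The sketch below assumes the 60/25/15 difficulty mix and the per-call token counts from the worked example; `PER_CALL` and `MIX` are illustrative names.

```typescript
// Per-call cost in USD at 3,000 input + 1,000 output tokens, from the price table.
const PER_CALL = {
  "GPT-5.5": (3_000 * 5.0 + 1_000 * 30.0) / 1_000_000,      // 0.045
  "GPT-5.4-mini": (3_000 * 0.75 + 1_000 * 4.5) / 1_000_000, // 0.00675
  "GPT-5.4-nano": (3_000 * 0.2 + 1_000 * 1.25) / 1_000_000, // 0.00185
};

// Difficulty mix from the worked example.
const MIX: Array<{ model: keyof typeof PER_CALL; calls: number }> = [
  { model: "GPT-5.4-nano", calls: 60 },
  { model: "GPT-5.4-mini", calls: 25 },
  { model: "GPT-5.5", calls: 15 },
];

const routedTotal = MIX.reduce((sum, tier) => sum + tier.calls * PER_CALL[tier.model], 0);
const flagshipTotal = 100 * PER_CALL["GPT-5.5"];
const reduction = 1 - routedTotal / flagshipTotal;
```

The unrounded routed total is $0.95475, which the table shows as $0.955, and the reduction works out to about 79%. The result is sensitive to the mix: shifting calls from the easy tier to the hard tier erodes the savings quickly.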
Calling the API
The simplest way to call the OpenAI API is through the official OpenAI SDK. You set the model string, send messages, and receive a response with a usage block that reports exact input and output token counts. Use that usage block to verify your cost estimates against real traffic instead of guessing.
OpenAI SDK call with usage accounting
import OpenAI from "openai";
const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: "gpt-5.4", // $2.50/1M input, $15.00/1M output
  messages: [
    { role: "system", content: "You are a coding assistant." },
    { role: "user", content: "Rename fetchUsers to loadUsers across the file." },
  ],
});
// The usage block reports exact token counts for cost accounting
const { prompt_tokens, completion_tokens } = response.usage;
const inputCost = (prompt_tokens / 1_000_000) * 2.50;
const outputCost = (completion_tokens / 1_000_000) * 15.00;
console.log("Call cost: $" + (inputCost + outputCost).toFixed(6));
Morph exposes the same OpenAI-compatible surface at https://api.morphllm.com/v1. Point the OpenAI SDK's baseURL at it and your existing code runs unchanged, with the router selecting the cheapest sufficient model behind a single endpoint.
Same SDK, routed through Morph's single endpoint
import OpenAI from "openai";
// Point the OpenAI SDK at Morph's OpenAI-compatible endpoint
const client = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});
const response = await client.chat.completions.create({
  model: "auto", // router picks the cheapest model that clears the bar
  messages: [
    { role: "user", content: "Add a docstring to the parseConfig function." },
  ],
});
// Easy prompt routes to a small model; hard prompt routes to a frontier model.
// One endpoint, many providers, 40-70% lower spend.
How to Reduce OpenAI API Costs
Three levers move OpenAI API spend, in order of impact. The first, routing, also works across providers; the other two apply to each OpenAI call you make.
1. Route by difficulty
Send the easy 60% of prompts to GPT-5.4-nano at $1.25/1M output instead of GPT-5.5 at $30.00/1M, a 24x gap on those calls. Keep the flagship for the hard 15%.
2. Use cached input
Put the stable system prompt and tool schema first so the prefix caches at $0.50/1M instead of $5.00/1M on GPT-5.5, a 90% discount on the repeated portion.
3. Trim output tokens
Output costs roughly 6x input. Constrain max_tokens, ask for terse responses, and avoid re-emitting unchanged content. Output volume is the dominant cost driver.
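A token cap also acts as a ceiling on output spend, which makes the third lever easy to reason about. The sketch below assumes the GPT-5.5 output rate from the table; `maxOutputCost` is a hypothetical helper and the two cap values are illustrative.

```typescript
// Worst-case output cost in USD for one call, given a hard cap on output tokens.
// `outputRatePerMillion` is the model's output price in USD per 1M tokens.
function maxOutputCost(maxOutputTokens: number, outputRatePerMillion: number): number {
  return (maxOutputTokens / 1_000_000) * outputRatePerMillion;
}

// GPT-5.5 output is $30.00/1M (June 2026). Compare two caps over 100 calls.
const looseCap = 100 * maxOutputCost(4_000, 30.0); // cap of 4,000 tokens per call
const tightCap = 100 * maxOutputCost(1_000, 30.0); // cap of 1,000 tokens per call
```

Under these assumptions the worst case drops from $12.00 to $3.00 per 100 calls. A cap that is too tight truncates useful answers, so it works best alongside prompts that ask for terse output.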
The router lever is the one most teams skip because it sounds like rebuilding their stack. It is not. A router classifies each prompt and returns the cheapest model that clears the quality bar, so the easy work stops hitting the flagship. Morph's router classifies in ~430ms (~$0.001 per classification) and routes across many models through one OpenAI-compatible endpoint at api.morphllm.com, cutting spend 40-70% without touching the rest of your code. See how the LLM router works and the cost calculator.
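Classification is not free, so it is worth folding its cost into the worked example. This sketch assumes the ~$0.001 fee is charged once per call on top of model spend and that classification runs sequentially before each call; neither billing detail is confirmed by the text, and all constant names are illustrative.

```typescript
// Figures quoted in this article: ~$0.001 and ~430ms per classification,
// $4.50 for the all-GPT-5.5 session, $0.955 for the routed session.
const CLASSIFICATION_COST = 0.001; // USD per call (assumed additive)
const CLASSIFICATION_SECONDS = 0.43;
const CALLS = 100;
const FLAGSHIP_TOTAL = 4.5;
const ROUTED_MODEL_TOTAL = 0.955;

const routingOverhead = CALLS * CLASSIFICATION_COST;             // 0.10
const routedWithOverhead = ROUTED_MODEL_TOTAL + routingOverhead; // 1.055
const netReduction = 1 - routedWithOverhead / FLAGSHIP_TOTAL;
const addedLatencySeconds = CALLS * CLASSIFICATION_SECONDS;      // if fully sequential
```

With the overhead included, the routed session costs about $1.055 and the reduction is about 77% instead of 79%, at the price of roughly 43 seconds of added latency across the session if every classification runs in sequence.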
Same lever, every provider
The routing arithmetic is not OpenAI-specific. The same approach cuts spend on Anthropic and Google traffic, and a single router can span providers. Compare the Anthropic API pricing table to see the parallel structure: a flagship, a mid-tier at half the price, and a cheap small model for the bulk of the work.
Frequently Asked Questions
How much does the OpenAI API cost?
OpenAI bills per token. As of June 2026, GPT-5.5 (the flagship) costs $5.00/1M input, $0.50/1M cached input, and $30.00/1M output, with a 1M context window. The cheapest model, GPT-5.4-nano, runs $0.20/1M input and $1.25/1M output. A 100-call coding session at 4,000 tokens each costs roughly $4.50 on GPT-5.5 or $0.185 on nano. Verify current numbers on openai.com.
What is the difference between GPT-5.5 and GPT-5.4 pricing?
GPT-5.5 is the flagship at $5.00/1M input, $0.50/1M cached input, and $30.00/1M output. GPT-5.4 is the mid-tier at $2.50/1M input, $0.25/1M cached input, and $15.00/1M output, exactly half the cost on both input and output. For most non-frontier work, GPT-5.4 or GPT-5.4-mini offers a better cost-quality tradeoff than GPT-5.5.
What happened to GPT-4o and GPT-5 pricing?
GPT-4o, GPT-5, GPT-5-mini, and GPT-5-nano are no longer listed on OpenAI's official pricing page. They were succeeded by the GPT-5.4 and GPT-5.5 family. The current general-work equivalent is GPT-5.4 ($2.50/1M input, $15.00/1M output) and the flagship is GPT-5.5. OpenAI no longer publishes prices for the retired models, so any figure quoted for them is stale.
What is the OpenAI cached input discount?
When the same prompt prefix repeats across calls (a fixed system prompt, a tool schema, a constant document), OpenAI charges the cached portion at roughly 10% of the normal input rate. GPT-5.5 cached input is $0.50/1M versus $5.00/1M uncached, a 90% discount. GPT-5.4 cached input is $0.25/1M. The cache applies automatically to repeated prefixes, so put stable content first.
What is the price difference between GPT-5.4-mini and GPT-5.4-nano?
GPT-5.4-mini costs $0.75/1M input, $0.075/1M cached input, and $4.50/1M output. GPT-5.4-nano costs $0.20/1M input, $0.02/1M cached input, and $1.25/1M output. Nano is 3.75x cheaper on input and 3.6x cheaper on output. Nano fits high-volume classification and simple edits; mini handles light reasoning where nano falls short.
How do I reduce OpenAI API costs?
Three levers. Route easy prompts to cheaper models (GPT-5.4-nano at $1.25/1M output versus GPT-5.5 at $30.00/1M, a 24x gap). Use cached input for repeated prefixes, a 90% discount on that portion. Trim output tokens, which cost roughly 6x more than input. A router that auto-routes each call to the cheapest sufficient model captures the first lever without code changes.
Does OpenAI charge the same for input and output tokens?
No. Output costs roughly 6x more than input on every current OpenAI model (6.25x on GPT-5.4-nano). GPT-5.5 is $5.00/1M input and $30.00/1M output, a 6x ratio. When estimating spend, weight output tokens heavily: a generation-heavy workload costs far more than a read-heavy one at the same total token count.
Stop Paying Flagship Prices for Easy Prompts
Morph's router classifies each prompt in ~430ms and routes it to the cheapest model that clears the bar, across providers through one OpenAI-compatible endpoint at api.morphllm.com. 40-70% lower API spend, no code rewrite. Point the OpenAI SDK's baseURL at Morph and ship.
