Gemini API pricing is per million tokens, split into input and output, and tiered by context length. gemini-2.5-flash-lite is the cheapest at $0.10/1M input and $0.40/1M output. gemini-2.5-flash is $0.30/1M input and $2.50/1M output. gemini-3.1-pro is $2.00/1M input up to 200k context, jumping to $4.00/1M above it. All rates below are from Google's pricing page at ai.google.dev.
The Full Gemini API Price List
The table below lists every current Gemini model and its standard pay-as-you-go rate per million tokens. For tiered models, the rate shown is the base tier, which applies to requests up to 200k tokens of context. The over-200k surcharge, context-caching rates, and audio input rates (where they differ from text) are covered in the sections below.
| Model | Context tier | Input $/1M | Output $/1M |
|---|---|---|---|
| gemini-3.5-flash | flat | $1.50 | $9.00 |
| gemini-3.1-pro | up to 200k | $2.00 | $12.00 |
| gemini-3.1-flash-lite | flat | $0.25 | $1.50 |
| gemini-3-flash | up to 200k | $0.50 | $3.00 |
| gemini-2.5-pro | up to 200k | $1.25 | $10.00 |
| gemini-2.5-flash | flat | $0.30 | $2.50 |
| gemini-2.5-flash-lite | flat | $0.10 | $0.40 |
| gemini-2.0-flash (deprecating) | flat | $0.10 | $0.40 |
gemini-2.0-flash shuts down June 1, 2026
gemini-2.0-flash is deprecated. Its pricing ($0.10/1M input, $0.40/1M output) matches gemini-2.5-flash-lite, which is the supported replacement for cheap, high-volume work. Migrate before the shutdown date to avoid broken requests.
Two patterns stand out. First, output costs 4x to roughly 8x more than input across the lineup (4x on gemini-2.5-flash-lite, about 8x on gemini-2.5-pro and gemini-2.5-flash), so token-heavy generation tasks (long code, long documents) cost far more than read-heavy tasks (classification, extraction). Second, the gap between the cheapest and most expensive models is large: gemini-2.5-flash-lite output at $0.40/1M is 30x cheaper than gemini-3.1-pro output at $12.00/1M.
How Gemini Charges: Per-Token and Tiered
Every Gemini API request is billed on two counters: input tokens (your prompt, system instructions, files, and conversation history) and output tokens (the model's response). You are charged the model's per-million rate for each, prorated to the exact token count. A request that uses 12,000 input tokens and 800 output tokens on gemini-2.5-flash costs 12,000 / 1,000,000 multiplied by $0.30, plus 800 / 1,000,000 multiplied by $2.50, which is $0.0036 plus $0.002, for $0.0056 total.
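The arithmetic above can be written as a small helper. This is a minimal sketch for flat-rate models only; the `Rates` shape and the `requestCost` name are illustrative and not part of any Google SDK.

```typescript
// Rates are USD per 1,000,000 tokens, as listed in the price table.
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

const TOKENS_PER_MILLION = 1000000;

// Cost of one request on a flat-rate model, prorated to the exact token count.
function requestCost(inputTokens: number, outputTokens: number, rates: Rates): number {
  const inputCost = (inputTokens / TOKENS_PER_MILLION) * rates.inputPerMillion;
  const outputCost = (outputTokens / TOKENS_PER_MILLION) * rates.outputPerMillion;
  return inputCost + outputCost;
}

// gemini-2.5-flash base rates from the price list.
const flashRates: Rates = { inputPerMillion: 0.3, outputPerMillion: 2.5 };

// 12,000 input tokens and 800 output tokens, as in the example above.
const exampleCost = requestCost(12000, 800, flashRates);
console.log(exampleCost.toFixed(4)); // "0.0056"
```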
The wrinkle that breaks naive cost math is the context tier. For gemini-3.1-pro, gemini-3-flash, and gemini-2.5-pro, the per-token rate is not fixed. It depends on the total context length of the request. Stay under 200k tokens and you pay the base rate. Cross 200k and the higher rate applies to the whole request.
Models marked "flat" in the table (gemini-3.5-flash, the Flash-Lite models, gemini-2.5-flash) charge one rate regardless of context length. This makes them simpler to budget for, which is another reason high-volume workloads gravitate to the Flash and Flash-Lite tiers.
The Over-200k Context Surcharge
Three current models raise their price once a request crosses 200,000 tokens of context. The higher rate applies to the entire request, not only the tokens past 200k. A 201k-token request to gemini-3.1-pro is billed entirely at the above-200k rate.
| Model | Input up to 200k | Input above 200k | Output up to 200k | Output above 200k |
|---|---|---|---|---|
| gemini-3.1-pro | $2.00 | $4.00 | $12.00 | $18.00 |
| gemini-2.5-pro | $1.25 | $2.50 | $10.00 | $15.00 |
| gemini-3-flash | $0.50 | $1.00 | $3.00 | $3.00 |
For gemini-3.1-pro and gemini-2.5-pro, crossing 200k doubles the input rate and raises the output rate by 50%. For gemini-3-flash the input rate doubles while output stays at $3.00/1M. The models marked "flat" in the price list, including Flash-Lite and gemini-2.5-flash, have no over-200k tier.
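A short sketch of whole-request tiering, using the gemini-3.1-pro rates from the table. It assumes the tier is selected by the request's input token count and that exactly 200,000 tokens still counts as "up to 200k"; both are assumptions to confirm against Google's pricing page. The type and function names are illustrative.

```typescript
interface TierRates {
  input: number;  // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

interface TieredModel {
  thresholdTokens: number;
  base: TierRates;  // applies up to the threshold
  above: TierRates; // applies to the whole request past the threshold
}

const gemini31Pro: TieredModel = {
  thresholdTokens: 200000,
  base: { input: 2.0, output: 12.0 },
  above: { input: 4.0, output: 18.0 },
};

function tieredRequestCost(inputTokens: number, outputTokens: number, model: TieredModel): number {
  // Assumption: the tier is chosen by input tokens, and one tier prices the entire request.
  const tier = inputTokens > model.thresholdTokens ? model.above : model.base;
  return (inputTokens / 1e6) * tier.input + (outputTokens / 1e6) * tier.output;
}

// Same 1,000-token response, 6k tokens of context apart.
const underLine = tieredRequestCost(195000, 1000, gemini31Pro); // 0.39 + 0.012
const overLine = tieredRequestCost(201000, 1000, gemini31Pro);  // 0.804 + 0.018
console.log(underLine.toFixed(3), overLine.toFixed(3)); // "0.402" "0.822"
```

In this example, 6,000 extra context tokens roughly double the cost of the request, which is the cliff the table describes.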
The practical takeaway
If you can keep a request under 200k tokens, do it. Trimming a 210k-token request to 195k on gemini-3.1-pro halves the input rate on the whole request. Chunking, summarization, and retrieval that cap context length below 200k pay for themselves immediately on the Pro models.
Context Caching Pricing
Context caching lets you store a fixed block of input (a long system prompt, a reference document, a codebase snapshot) once and reuse it across many requests at a reduced per-token rate. Gemini bills caching two ways: a per-token rate to read from the cache, plus a per-hour storage fee for keeping the cache warm.
| Model | Cached input $/1M | Above 200k $/1M | Storage $/hr |
|---|---|---|---|
| gemini-3.5-flash | $0.15 | n/a | $1.00 |
| gemini-3.1-pro | $0.20 | $0.40 | $4.50 |
| gemini-3.1-flash-lite | $0.025 | $0.05 | $1.00 |
| gemini-3-flash | $0.05 | $0.10 | $1.00 |
| gemini-2.5-pro | $0.125 | $0.25 | $4.50 |
| gemini-2.5-flash | $0.03 | $0.10 | $1.00 |
| gemini-2.5-flash-lite | $0.01 | $0.03 | $1.00 |
Caching pays off when the same large context is reused enough times that the discount on input tokens outweighs the storage fee. For gemini-2.5-flash, cached input at $0.03/1M is 10x cheaper than the $0.30/1M standard input rate. If you send the same 100k-token document to 50 requests in an hour, caching it once and reading it cheaply beats paying full input price 50 times, even after the $1.00/hr storage fee.
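The break-even point can be estimated with a few lines of arithmetic. This sketch treats the storage fee as a flat hourly charge, as the table presents it, and ignores the cost of creating the cache and of any non-cached tokens in each request. Those simplifications, along with the function names, are assumptions for illustration.

```typescript
interface CacheScenario {
  contextTokens: number;    // size of the reused context
  requestsPerHour: number;
  inputPerMillion: number;  // standard input rate, USD per 1M tokens
  cachedPerMillion: number; // cached-read rate, USD per 1M tokens
  storagePerHour: number;   // storage fee, USD per hour (assumed flat)
}

function hourlyCostWithoutCache(s: CacheScenario): number {
  return (s.contextTokens / 1e6) * s.inputPerMillion * s.requestsPerHour;
}

function hourlyCostWithCache(s: CacheScenario): number {
  const reads = (s.contextTokens / 1e6) * s.cachedPerMillion * s.requestsPerHour;
  return reads + s.storagePerHour;
}

// Smallest whole number of requests per hour at which caching is no more expensive.
function breakEvenRequests(s: CacheScenario): number {
  const savingPerRequest = (s.contextTokens / 1e6) * (s.inputPerMillion - s.cachedPerMillion);
  return Math.ceil(s.storagePerHour / savingPerRequest);
}

// The gemini-2.5-flash example above: a 100k-token document, 50 requests in an hour.
const scenario: CacheScenario = {
  contextTokens: 100000,
  requestsPerHour: 50,
  inputPerMillion: 0.3,
  cachedPerMillion: 0.03,
  storagePerHour: 1.0,
};

console.log(hourlyCostWithoutCache(scenario).toFixed(2)); // "1.50"
console.log(hourlyCostWithCache(scenario).toFixed(2));    // "1.15"
console.log(breakEvenRequests(scenario));                 // 38
```

Under these assumptions, caching a 100k-token context on gemini-2.5-flash starts paying off at around 38 requests per hour. Smaller contexts need proportionally more reuse.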
Caching does not help one-off requests or workloads where the context changes every call. It is a fit for chat sessions over a fixed document, agents grounded in a static knowledge base, or batch jobs that share a long system prompt.
Audio Input Costs More
For the models that accept audio, audio input tokens are billed at a higher rate than text, image, or video input. This is a separate counter, so a multimodal request can mix rates within a single call.
| Model | Text/image/video input | Audio input |
|---|---|---|
| gemini-3.1-flash-lite | $0.25 | $0.50 |
| gemini-2.5-flash | $0.30 | $1.00 |
| gemini-2.5-flash-lite | $0.10 | $0.30 |
| gemini-2.0-flash (deprecating) | $0.10 | $0.70 |
On gemini-2.5-flash, audio input is $1.00/1M versus $0.30/1M for text, more than 3x. If your workload streams audio (transcription, voice agents), the audio rate, not the text rate, drives the bill. Budget against the audio column for those use cases.
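To see how the audio counter dominates a mixed request, the input side can be split by modality. This is a minimal sketch using the gemini-2.5-flash rates from the table; the token counts are made up for illustration, and the names are hypothetical.

```typescript
interface ModalityRates {
  textPerMillion: number;  // text, image, and video input, USD per 1M tokens
  audioPerMillion: number; // audio input, USD per 1M tokens
}

// Input cost of one request that mixes text and audio tokens.
function mixedInputCost(textTokens: number, audioTokens: number, rates: ModalityRates): number {
  const textCost = (textTokens / 1e6) * rates.textPerMillion;
  const audioCost = (audioTokens / 1e6) * rates.audioPerMillion;
  return textCost + audioCost;
}

const flashModalityRates: ModalityRates = { textPerMillion: 0.3, audioPerMillion: 1.0 };

// Hypothetical voice-agent turn: 10k tokens of text context, 50k tokens of audio.
const mixedCost = mixedInputCost(10000, 50000, flashModalityRates);
console.log(mixedCost.toFixed(3)); // "0.053"
```

Here audio accounts for $0.05 of the $0.053 input cost, so pricing the same request at the text rate would badly underestimate the bill.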
Worked Cost Example
Take a workload of 1,000,000 input tokens and 200,000 output tokens, all under 200k context per request. The table compares the cost on three tiers using base rates from the price list.
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| gemini-2.5-flash-lite | $0.10 | $0.08 | $0.18 |
| gemini-2.5-flash | $0.30 | $0.50 | $0.80 |
| gemini-3.1-pro | $2.00 | $2.40 | $4.40 |
The same workload costs $0.18 on Flash-Lite, $0.80 on Flash, and $4.40 on Pro. Flash-Lite is 24x cheaper than Pro here. If a meaningful fraction of these prompts are simple enough for Flash-Lite, sending all of them to Pro overpays by an order of magnitude. The whole pricing structure rewards matching the model to the task.
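The comparison table can be reproduced with a short script, which is handy for plugging in your own token volumes. This is a sketch that uses base-tier rates only, so it assumes every request stays under 200k context; the `ModelRates` shape is illustrative.

```typescript
interface ModelRates {
  name: string;
  input: number;  // USD per 1M input tokens (base tier)
  output: number; // USD per 1M output tokens (base tier)
}

const tiers: ModelRates[] = [
  { name: "gemini-2.5-flash-lite", input: 0.1, output: 0.4 },
  { name: "gemini-2.5-flash", input: 0.3, output: 2.5 },
  { name: "gemini-3.1-pro", input: 2.0, output: 12.0 },
];

// Total cost of a workload at a model's base-tier rates.
function workloadCost(inputTokens: number, outputTokens: number, model: ModelRates): number {
  return (inputTokens / 1e6) * model.input + (outputTokens / 1e6) * model.output;
}

// The workload from the table: 1,000,000 input tokens and 200,000 output tokens.
const totals = tiers.map((m) => workloadCost(1000000, 200000, m));

tiers.forEach((m, i) => {
  console.log(`${m.name}: $${totals[i].toFixed(2)}`);
});
// gemini-2.5-flash-lite: $0.18
// gemini-2.5-flash: $0.80
// gemini-3.1-pro: $4.40
```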
OpenAI-Compatible vs Native Calls
Gemini exposes two API surfaces at the same per-token prices. The native Gemini SDK (google-genai) uses Google's request and response shapes. The OpenAI-compatible endpoint accepts the standard OpenAI Chat Completions format, so existing OpenAI-SDK code can target Gemini by swapping the base URL and model name. Pricing is identical between the two; only the request schema differs.
Calling Gemini through the OpenAI-compatible endpoint
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GEMINI_API_KEY,
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

const response = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Summarize this changelog." }],
});

// Billed at gemini-2.5-flash rates: $0.30/1M input, $2.50/1M output.
// Same prices as the native google-genai SDK.
```

The OpenAI-compatible surface matters for cost control because it lets one codebase address Gemini, OpenAI, and other providers through the same client. That is the foundation routing builds on: if every model speaks the same request format, a router can pick the cheapest one that clears the quality bar without rewriting call sites.
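Because the OpenAI Chat Completions format reports token counts in a `usage` object (`prompt_tokens` and `completion_tokens`), per-request cost can be estimated from the response itself. The sketch below assumes the Gemini-compatible endpoint populates `usage` the same way, which is worth verifying against a real response. `costFromUsage` is a hypothetical helper, not part of the OpenAI SDK.

```typescript
// Subset of the OpenAI Chat Completions usage object.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface PerMillionRates {
  input: number;  // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

// Returns the estimated cost in USD, or null when the response carries no usage data.
function costFromUsage(usage: Usage | undefined, rates: PerMillionRates): number | null {
  if (!usage) return null;
  return (
    (usage.prompt_tokens / 1e6) * rates.input +
    (usage.completion_tokens / 1e6) * rates.output
  );
}

// gemini-2.5-flash rates from the price list.
const gemini25Flash: PerMillionRates = { input: 0.3, output: 2.5 };

// Stand-in for `response.usage` from the call above.
const sampleUsage: Usage = { prompt_tokens: 12000, completion_tokens: 800 };

const estimated = costFromUsage(sampleUsage, gemini25Flash);
console.log(estimated === null ? "no usage data" : estimated.toFixed(4)); // "0.0056"
```

Logging this value per call gives a running cost estimate that can be compared against the invoice, and it works unchanged for any provider behind the same client.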
Cutting the Bill with Routing
The 24x spread between Flash-Lite and Pro is the cost-savings opportunity. Most prompts in a real workload (classification, extraction, simple edits, short answers) clear the bar on Flash-Lite or Flash. A minority (multi-step reasoning, long-context synthesis) need Pro. Sending everything to Pro pays Pro prices for work a $0.10/1M model handles identically.
A model router classifies each prompt by difficulty, then sends it to the cheapest model that can handle it. Easy turns drop to Flash-Lite; hard turns reach Pro. Across a mixed workload this cuts 40-70% off the bill. Morph's router classifies a prompt in ~430ms at roughly $0.001 per classification, and it exposes many models, including Gemini-class tiers, through a single OpenAI-compatible endpoint at api.morphllm.com.
The router also keeps requests on the right side of the 200k context line where it can, since that boundary doubles the input rate on Pro models. Combined with caching for reused context, routing plus a model-tier policy is the practical way to keep a Gemini bill proportional to the actual difficulty of the work. See how the LLM router works, the broader cost-optimization guide, and the cost calculator.
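The routing idea can be sketched as a difficulty-to-model lookup plus a blended-cost estimate. Everything here is illustrative: the keyword heuristic is a stand-in for a real classifier (it is not how Morph's router works), the traffic mix is hypothetical, and the per-tier costs reuse the worked example's totals.

```typescript
type Difficulty = "easy" | "medium" | "hard";

const MODEL_FOR: Record<Difficulty, string> = {
  easy: "gemini-2.5-flash-lite",
  medium: "gemini-2.5-flash",
  hard: "gemini-3.1-pro",
};

// Toy heuristic standing in for a trained classifier (assumption, for illustration only).
function classify(prompt: string): Difficulty {
  const words = prompt.trim().split(/\s+/).length;
  if (/step by step|prove|refactor|architecture/i.test(prompt)) return "hard";
  if (words > 200) return "medium";
  return "easy";
}

// Cheapest model assumed able to handle the prompt.
function pickModel(prompt: string): string {
  return MODEL_FOR[classify(prompt)];
}

// Fraction saved versus sending all traffic to the "hard" tier.
function savingsVsAllPro(
  mix: Record<Difficulty, number>,        // share of traffic per difficulty, sums to 1
  costPerTier: Record<Difficulty, number> // cost of the full workload on each tier
): number {
  const blended =
    mix.easy * costPerTier.easy +
    mix.medium * costPerTier.medium +
    mix.hard * costPerTier.hard;
  return 1 - blended / costPerTier.hard;
}

// Worked-example totals, with a hypothetical 40/20/40 traffic mix.
const workedExampleCosts = { easy: 0.18, medium: 0.8, hard: 4.4 };
const saved = savingsVsAllPro({ easy: 0.4, medium: 0.2, hard: 0.4 }, workedExampleCosts);

console.log(pickModel("Classify this ticket as bug or feature.")); // "gemini-2.5-flash-lite"
console.log((saved * 100).toFixed(1) + "%"); // "54.7%"
```

The savings figure depends entirely on the traffic mix, so measuring your own distribution of easy and hard prompts is the first step before estimating what routing would save.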
Route easy turns to Flash-Lite
Classification, extraction, and short answers run on gemini-2.5-flash-lite at $0.10/1M input. Reserve Pro for the prompts that actually need reasoning.
Stay under 200k context
Crossing 200k doubles the input rate on Pro models for the whole request. Chunking and retrieval that cap context below 200k cut the rate in half.
One OpenAI-compatible endpoint
Morph exposes many model tiers through a single OpenAI-format API. Swap the model the router picks without touching call sites.
Frequently Asked Questions
How much does the Gemini API cost?
Gemini API pricing is per million tokens and varies by model. gemini-2.5-flash-lite is the cheapest at $0.10/1M input and $0.40/1M output. gemini-2.5-flash is $0.30/1M input and $2.50/1M output. gemini-3.1-pro is $2.00/1M input (up to 200k context) and $12.00/1M output. gemini-3.5-flash is $1.50/1M input and $9.00/1M output. These are the standard rates on ai.google.dev as of June 2026.
What is the difference between Gemini 2.5 Flash and Pro pricing?
gemini-2.5-flash costs $0.30/1M input and $2.50/1M output. gemini-2.5-pro costs $1.25/1M input and $10.00/1M output up to 200k context, rising to $2.50/1M input and $15.00/1M output above 200k. Pro is roughly 4x the cost of Flash in exchange for stronger reasoning. Most easy prompts do not need Pro.
How does Gemini context caching pricing work?
Caching is billed two ways: a per-token rate to read cached input, plus a per-hour storage fee. For gemini-2.5-flash, cached input is $0.03/1M (up to 200k) with $1.00/hr storage, 10x cheaper than the $0.30/1M standard input rate. For gemini-3.1-pro it is $0.20/1M (up to 200k) or $0.40/1M (above 200k) plus $4.50/hr storage. Caching pays off when you reuse a large fixed context across many calls.
Does Gemini charge more above 200k tokens?
Yes, for several models. gemini-3.1-pro input goes from $2.00/1M up to 200k to $4.00/1M above, and output from $12.00/1M to $18.00/1M. gemini-2.5-pro input goes from $1.25/1M to $2.50/1M and output from $10.00/1M to $15.00/1M. gemini-3-flash input goes from $0.50/1M to $1.00/1M. The higher rate applies to the entire request once it crosses 200k, not only the overflow tokens.
How much does Gemini Flash-Lite cost?
gemini-2.5-flash-lite costs $0.10/1M input (text, image, video) and $0.40/1M output. gemini-3.1-flash-lite costs $0.25/1M input and $1.50/1M output. Flash-Lite is the cheapest tier, built for high-volume, low-complexity work like classification, extraction, and routing.
Does the Gemini API have a free tier?
Google AI Studio offers a free tier with rate limits for testing, separate from the paid API rates here. The per-token prices on this page are the paid pay-as-you-go rates from ai.google.dev. Free-tier quotas change often, so confirm current limits on Google's pricing page before relying on them in production.
Is gemini-2.0-flash still available?
gemini-2.0-flash ($0.10/1M input, $0.40/1M output) is deprecated and scheduled to shut down June 1, 2026. New projects should target gemini-2.5-flash-lite, which matches its pricing and is actively supported, or gemini-2.5-flash ($0.30/1M input, $2.50/1M output) where the higher price is acceptable.
Pay Gemini Prices Only Where Pro Is Worth It
Morph's router classifies each prompt in ~430ms and routes it to the cheapest model that clears the bar, cutting 40-70% off the bill. Access many model tiers through one OpenAI-compatible endpoint at api.morphllm.com. $0.001 per classification.
