Grok API Pricing (2026): Per-Token Costs for Every xAI Model

Grok API pricing as of June 2026: grok-4.3 costs $1.25/M input and $2.50/M output with a 1M-token context and $0.20/M cached input. grok-build-0.1 (alias grok-code-fast-1) runs $1.00/M input and $2.00/M output on 256k context. A typical coding session of 100 calls at 4,000 tokens each costs roughly $0.63 on grok-4.3. Prices are from x.ai's official docs; verify before relying on them.

June 18, 2026

Grok API pricing is per-token. As of June 2026, xAI's official models page lists grok-4.3 at $1.25 per million input tokens and $2.50 per million output tokens on a 1M-token context, grok-build-0.1 (alias grok-code-fast-1) at $1.00 input and $2.00 output on 256k context, and grok-4.20 at $1.25 input and $2.50 output. Cached input is $0.20 per million. The older grok-4 and grok-3 are no longer listed.

grok-4.3 input tokens: $1.25/M
grok-4.3 output tokens: $2.50/M
Cached input (grok-4.3 and grok-code-fast-1): $0.20/M
grok-4.3 context window: 1M

Prices verified as of June 2026

Every price on this page is from xAI's official docs at docs.x.ai as of June 18, 2026. Model pricing changes frequently. Confirm the current rate on x.ai before relying on these numbers.

Grok API Pricing at a Glance

xAI bills the Grok API per token, with separate rates for input (the tokens you send) and output (the tokens the model generates). Output is more expensive than input on every model, and cached input is the cheapest tier. There is no separate per-request fee.

Three models are listed on xAI's official models page as of June 2026: grok-4.3 (primary chat and coding), grok-build-0.1, aliased as grok-code-fast-1 (fast agentic coding), and grok-4.20 (reasoning, non-reasoning, and multi-agent). grok-4.3 and grok-4.20 share the same price and a 1M-token context. grok-code-fast-1 is cheaper per token but has a 256k context.

The practical takeaway: grok-code-fast-1 costs about 20% less per token than grok-4.3 ($1.00 vs $1.25 input, $2.00 vs $2.50 output), so high-volume agentic loops that fit in 256k tokens are cheaper there. Large-context tasks that need the full 1M window run on grok-4.3.

Full Price Table by Model

All rates are per million tokens, in USD, from xAI's official docs as of June 2026. Cached input applies to context that the API has already seen and stored, billed at a discount versus fresh input.

| Model | Context | Input | Output | Cached input |
| --- | --- | --- | --- | --- |
| grok-4.3 | 1M | $1.25 | $2.50 | $0.20 |
| grok-code-fast-1 (grok-build-0.1) | 256k | $1.00 | $2.00 | $0.20 |
| grok-4.20 (grok-4.20-0309) | 1M | $1.25 | $2.50 | n/a |

grok-4.20-0309 lists $1.25/M input and $2.50/M output on a 1M-token context on xAI's models page; a cached-input rate is not separately published for it, so it is marked n/a above rather than assumed. grok-4.3 and grok-code-fast-1 both publish a $0.20/M cached input rate.
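The per-token billing described above reduces to a small formula. The sketch below is illustrative only: the rates are copied from the table as of June 2026, while the names `Rates`, `RATES`, and `request_cost` are hypothetical helpers, not part of any xAI SDK. Where no cached rate is published (grok-4.20), it assumes cached tokens are billed as fresh input, which is a conservative guess rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, copied from the price table (June 2026)."""
    input: float
    output: float
    cached_input: Optional[float]  # None where no cached rate is published


# Hypothetical lookup table; verify the numbers on x.ai before relying on them.
RATES = {
    "grok-4.3": Rates(1.25, 2.50, 0.20),
    "grok-code-fast-1": Rates(1.00, 2.00, 0.20),
    "grok-4.20": Rates(1.25, 2.50, None),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the list-price cost in USD of one request.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    rates = RATES[model]
    # Assumption: with no published cached rate, fall back to the fresh input rate.
    cached_rate = rates.cached_input if rates.cached_input is not None else rates.input
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * rates.input
             + cached_tokens * cached_rate
             + output_tokens * rates.output)
    return total / 1_000_000


# One call with 3,000 input tokens and 1,000 output tokens on each model.
for name in RATES:
    print(f"{name}: ${request_cost(name, 3_000, 1_000):.5f}")
```

Keeping the rates in one table means a price change on xAI's side is a one-line update rather than a search through the codebase.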

grok-code-fast-1 input: $1.00/M
grok-code-fast-1 output: $2.00/M
grok-code-fast-1 context: 256k
Cheaper per token vs grok-4.3: 20%

Per-Model Notes

The three listed models map to distinct jobs. Picking the right one is the first lever on cost, before any caching or routing.

grok-4.3

xAI's primary chat and coding model. $1.25/M input, $2.50/M output, $0.20/M cached input, 1M-token context. Default choice for general coding, reasoning, and chat through the API. xAI now recommends it over the retired grok-4.

grok-code-fast-1

Alias for grok-build-0.1, xAI's fast agentic-coding model. $1.00/M input, $2.00/M output, $0.20/M cached input, 256k context. Cheaper per token, built for high-volume agentic loops that fit inside 256k.

grok-4.20

Full id grok-4.20-0309. Covers reasoning, non-reasoning, and multi-agent modes. $1.25/M input, $2.50/M output, 1M-token context. Same price as grok-4.3.

Output dominates the bill

Output tokens cost 2x input on every Grok model ($2.50 vs $1.25 on grok-4.3, $2.00 vs $1.00 on grok-code-fast-1). In agentic coding, generated diffs, tool calls, and reasoning traces are the output. Per token removed, trimming verbose output saves twice as much as trimming input.
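To make the 2x ratio concrete, the sketch below compares the saving from removing the same number of tokens from each side of the bill at grok-4.3 list rates. The function name `savings_from_trimming` and the 100,000-token figure are illustrative assumptions, not measurements.

```python
INPUT_RATE = 1.25   # USD per million input tokens, grok-4.3
OUTPUT_RATE = 2.50  # USD per million output tokens, grok-4.3


def savings_from_trimming(tokens_removed: int, rate_per_million: float) -> float:
    """USD saved by not sending or generating tokens_removed tokens."""
    return tokens_removed * rate_per_million / 1_000_000


# Remove the same number of tokens from each side of the bill.
input_saving = savings_from_trimming(100_000, INPUT_RATE)
output_saving = savings_from_trimming(100_000, OUTPUT_RATE)

print(f"trim 100k input tokens:  ${input_saving:.3f}")
print(f"trim 100k output tokens: ${output_saving:.3f}")
print(f"ratio: {output_saving / input_saving:.1f}x")
```

Which side dominates a real bill still depends on the input-to-output mix of the workload, so it is worth measuring both before deciding where to trim.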

Retired Models: grok-4, grok-3, grok-4-fast

As of June 2026, grok-4, grok-3, and grok-4-fast are no longer listed on xAI's official models pricing page. They have been succeeded by the grok-4.3 generation. xAI now recommends grok-4.3 as the primary chat and coding model.

Because xAI no longer publishes rates for the retired models, this page does not quote prices for them. Any number you find for grok-4 or grok-3 today is from a cached or third-party source and may not match what xAI bills. For a new integration, use grok-4.3 for general work or grok-code-fast-1 for fast agentic coding.

Why retired prices are omitted

Quoting a price xAI no longer publishes would be guessing. The policy on this page is to state only what the official source confirms. If you have an existing integration pinned to grok-4 or grok-3, check your xAI console for the rate you are actually billed and plan a migration to grok-4.3.

Worked Cost Example: A Coding Session

Take a coding agent session with 100 API calls. Assume each call averages 3,000 input tokens and 1,000 output tokens, for 4,000 tokens per call and 400,000 tokens total (300k input, 100k output). Below is the cost on each model at list price, with no caching.

| Model | Input cost | Output cost | Total |
| --- | --- | --- | --- |
| grok-4.3 ($1.25 / $2.50) | $0.375 | $0.250 | $0.625 |
| grok-code-fast-1 ($1.00 / $2.00) | $0.300 | $0.200 | $0.500 |
| grok-4.20 ($1.25 / $2.50) | $0.375 | $0.250 | $0.625 |

grok-4.3 costs about $0.63 for the session; grok-code-fast-1 about $0.50, roughly 20% less. Now add caching. If 200k of the 300k input tokens are repeated context (system prompt, file contents, tool schemas) billed at $0.20/M instead of full input rate, the grok-4.3 input cost drops from $0.375 to about $0.165 (100k fresh at $1.25/M = $0.125, plus 200k cached at $0.20/M = $0.04). The session total falls to roughly $0.42, a 33% cut from caching alone.

grok-4.3, 100-call session: $0.63
grok-code-fast-1, same session: $0.50
grok-4.3 with cached context: $0.42
Saved by caching input: 33%
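The session arithmetic above can be reproduced in a few lines. This is a sketch under the same assumptions as the worked example (100 calls, 3,000 input and 1,000 output tokens per call, grok-4.3 list rates, 200k of the input served from cache); `session_cost` is a hypothetical helper name.

```python
CALLS = 100
INPUT_PER_CALL = 3_000
OUTPUT_PER_CALL = 1_000

# grok-4.3 rates in USD per million tokens (June 2026).
INPUT_RATE = 1.25
OUTPUT_RATE = 2.50
CACHED_RATE = 0.20


def session_cost(cached_input_tokens: int = 0) -> float:
    """Total session cost in USD, with part of the input billed as cached."""
    total_input = CALLS * INPUT_PER_CALL      # 300,000 tokens
    total_output = CALLS * OUTPUT_PER_CALL    # 100,000 tokens
    fresh_input = total_input - cached_input_tokens
    total = (fresh_input * INPUT_RATE
             + cached_input_tokens * CACHED_RATE
             + total_output * OUTPUT_RATE)
    return total / 1_000_000


list_price = session_cost()                              # no caching
with_cache = session_cost(cached_input_tokens=200_000)   # 200k repeated context
saving = 1 - with_cache / list_price

print(f"list price: ${list_price:.3f}")
print(f"with cache: ${with_cache:.3f}")
print(f"saving:     {saving:.1%}")
```

Changing `cached_input_tokens` shows how sensitive the total is to the cache hit rate, which is usually the least certain input in an estimate like this.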

Scale this to a team running 10,000 sessions per month and the gap is material: about $6,300/month on grok-4.3 at list price, versus about $4,200/month with caching, versus about $5,000/month if every session ran on grok-code-fast-1. Mixing the two by difficulty beats any single-model choice.
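The monthly figures are the per-session totals multiplied by the session count. A minimal sketch, assuming every session matches the worked example exactly (real sessions will vary); the dictionary keys and `monthly_cost` are illustrative names.

```python
SESSIONS_PER_MONTH = 10_000

# Per-session totals in USD from the worked example (unrounded).
SESSION_COST = {
    "grok-4.3, list price": 0.625,
    "grok-4.3, cached context": 0.415,
    "grok-code-fast-1, list price": 0.500,
}


def monthly_cost(per_session: float, sessions: int = SESSIONS_PER_MONTH) -> float:
    """Project a per-session cost to a monthly total, rounded to cents."""
    return round(per_session * sessions, 2)


monthly = {name: monthly_cost(cost) for name, cost in SESSION_COST.items()}
for name, total in monthly.items():
    print(f"{name}: ${total:,.0f}/month")
```

The unrounded totals are $6,250, $4,150, and $5,000 per month; the prose figures of about $6,300 and $4,200 come from rounding the per-session cost first.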

How to Call the Grok API

The xAI API is OpenAI-compatible. Point the OpenAI SDK at xAI's base URL, pass your xAI key, and set the model name. Chat-completions code written for OpenAI runs unchanged.

Calling Grok with the OpenAI SDK (Python)

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

# grok-4.3: primary chat/coding, 1M context, $1.25/$2.50 per M
resp = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Refactor this function to use async/await."},
    ],
)
print(resp.choices[0].message.content)

# grok-code-fast-1: fast agentic coding, 256k context, $1.00/$2.00 per M
fast = client.chat.completions.create(
    model="grok-code-fast-1",
    messages=[{"role": "user", "content": "Add error handling to parseConfig()"}],
)

The same pattern works from the TypeScript OpenAI SDK by setting baseURL to https://api.x.ai/v1. Because the surface matches OpenAI, you can swap a Grok call into existing code by changing the base URL, key, and model string only.

Calling Grok with the OpenAI SDK (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "grok-4.3",
  messages: [{ role: "user", content: "Write a unit test for sum()" }],
});
console.log(resp.choices[0].message.content);
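Once a call returns, the response's usage block can be turned into a cost estimate. The Python sketch below assumes the xAI response mirrors the OpenAI usage shape (`prompt_tokens`, `completion_tokens`, and an optional `prompt_tokens_details.cached_tokens`) and that `prompt_tokens` includes the cached portion; neither is confirmed by this page, so check a real response before trusting the numbers. `cost_from_usage` is a hypothetical helper, and the `SimpleNamespace` object stands in for `resp.usage`.

```python
from types import SimpleNamespace


def cost_from_usage(usage, input_rate: float, output_rate: float,
                    cached_rate: float) -> float:
    """Estimate USD cost from an OpenAI-style usage object.

    Rates are USD per million tokens. Assumes prompt_tokens includes any
    cached tokens, as in the OpenAI usage format.
    """
    details = getattr(usage, "prompt_tokens_details", None)
    # Treat a missing or null cached_tokens field as zero cached tokens.
    cached = getattr(details, "cached_tokens", 0) or 0
    fresh = usage.prompt_tokens - cached
    total = (fresh * input_rate
             + cached * cached_rate
             + usage.completion_tokens * output_rate)
    return total / 1_000_000


# Stand-in for resp.usage from a chat-completions call (values are made up).
example_usage = SimpleNamespace(
    prompt_tokens=3_000,
    completion_tokens=1_000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=2_000),
)

# grok-4.3 rates: $1.25 input, $2.50 output, $0.20 cached input per million.
estimate = cost_from_usage(example_usage, 1.25, 2.50, 0.20)
print(f"estimated cost: ${estimate:.5f}")
```

With a live client, the call would be `cost_from_usage(resp.usage, 1.25, 2.50, 0.20)`. Logging this per request gives a running estimate to reconcile against the xAI console.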

How to Reduce Grok API Costs

Four levers cut a Grok bill: cache repeated context, route by difficulty, trim output, and right-size the context window by using the cheaper model when 1M tokens are not needed.

| Lever | Mechanism | Typical impact |
| --- | --- | --- |
| Cache input | Reuse stored context at $0.20/M instead of $1.25/M fresh input | 20-40% off input |
| Route by difficulty | Easy turns to grok-code-fast-1, hard turns to grok-4.3 | 40-70% on mixed loads |
| Trim output | Output costs 2x input; cap verbose generations | Direct, per-token |
| Right-size context | Use 256k grok-code-fast-1 when 1M is not needed | 20% per token |

Routing is the largest lever on a mixed workload because most coding turns are easy. A router classifies each prompt and sends boilerplate, simple edits, and documentation to the cheaper model while reserving grok-4.3 for architecture and complex debugging. The savings come from the volume of easy turns, not from any single expensive call.
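The routing idea can be sketched with a toy classifier. Everything below is a placeholder for illustration: the keyword list, the four-characters-per-token estimate, and the function names are assumptions, and a production router would use a trained classifier rather than substring matching. The 256,000-token limit is a conservative reading of "256k".

```python
FAST_MODEL = "grok-code-fast-1"   # cheaper, 256k context
STRONG_MODEL = "grok-4.3"         # 1M context, higher per-token rate
FAST_CONTEXT_LIMIT = 256_000      # assumption: "256k" read as 256,000 tokens

# Hypothetical signals that a turn is hard; a real router would learn these.
HARD_HINTS = ("architecture", "design", "debug", "race condition", "deadlock")


def estimate_tokens(text: str) -> int:
    """Very rough token estimate, assuming about 4 characters per token."""
    return len(text) // 4 + 1


def is_hard(prompt: str) -> bool:
    """Toy difficulty check: does the prompt mention a hard-task keyword?"""
    lowered = prompt.lower()
    return any(hint in lowered for hint in HARD_HINTS)


def pick_model(prompt: str, context_tokens: int = 0) -> str:
    """Choose the cheaper model unless the turn is hard or will not fit."""
    total_tokens = context_tokens + estimate_tokens(prompt)
    if total_tokens > FAST_CONTEXT_LIMIT:
        return STRONG_MODEL  # only the 1M-context model can hold the input
    return STRONG_MODEL if is_hard(prompt) else FAST_MODEL


print(pick_model("Add a docstring to parse_config()"))
print(pick_model("Debug this deadlock in the worker pool"))
print(pick_model("Rename a variable", context_tokens=300_000))
```

The context check runs first because it is a hard constraint, while difficulty is a quality judgment; the returned string can be passed straight to the `model` argument of a chat-completions call.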

Morph's model router automates this. It classifies prompt difficulty in ~430ms into four tiers (easy, medium, hard, needs_info) and routes each call to the cheapest model that clears the quality bar, for 40-70% API cost savings at about $0.001 per classification. It exposes one OpenAI-compatible endpoint at api.morphllm.com across providers, so the same code can reach Grok, Claude, GPT, and Gemini models without per-provider plumbing. See LLM cost optimization for the full set of techniques, and the LLM cost calculator to model your own spend.

Routing beats single-model selection

On a mixed coding workload, no single Grok model is optimal: grok-code-fast-1 is cheapest but caps at 256k context, grok-4.3 carries 1M but costs 20% more per token. A difficulty-aware router gets the cheap rate on the 60% of turns that are easy and the large context on the hard turns that need it, beating any fixed choice.

Frequently Asked Questions

How much does the Grok API cost?

As of June 2026, grok-4.3 costs $1.25 per million input tokens and $2.50 per million output tokens, with $0.20 per million cached input on a 1M-token context. grok-build-0.1 (grok-code-fast-1) costs $1.00 input and $2.00 output per million on a 256k context. grok-4.20 matches grok-4.3 at $1.25 input and $2.50 output. Verify current numbers on x.ai.

What is the difference between grok-4.3 and grok-code-fast pricing?

grok-4.3 costs $1.25/M input and $2.50/M output on a 1M-token context and is xAI's primary chat and coding model. grok-code-fast-1 (grok-build-0.1) costs $1.00/M input and $2.00/M output on a 256k context. grok-code-fast-1 is about 20% cheaper per token but carries a quarter of the context window. Use it for high-volume agentic loops; use grok-4.3 for large-context tasks.

Does the Grok API have a free tier?

xAI's published pricing is per-token usage-based, with no free per-token allowance listed on the official models page as of June 2026. Promotional credits and trials have appeared and changed over time, so check x.ai for any current free credits. The numbers on this page are the standard usage-based rates.

Does Grok API pricing change by context window?

The per-token rate is flat per model regardless of how full the context window is, unlike some providers that charge a higher tier above 200k tokens. grok-4.3 and grok-4.20 bill $1.25/M input and $2.50/M output across their full 1M-token window. grok-code-fast-1 bills $1.00/M input and $2.00/M output across its 256k window. You pay for the tokens you send and receive.

Is the Grok API OpenAI-compatible?

Yes. The xAI API is OpenAI-compatible. Point the OpenAI SDK at https://api.x.ai/v1, supply your xAI API key, and set the model to grok-4.3 or grok-code-fast-1. Existing OpenAI-shaped code that uses chat completions works without a rewrite.

How do I reduce Grok API costs?

Cache repeated context at $0.20/M cached input instead of resending system prompts, files, and tool definitions at full price. Route easy turns to grok-code-fast-1 ($1.00/$2.00 per M) and reserve grok-4.3 for hard turns. Cut output tokens, which cost 2x input. A model router that classifies prompt difficulty automates the routing and can save 40-70% across mixed workloads.

What happened to grok-4 and grok-3 pricing?

As of June 2026, grok-4, grok-3, and grok-4-fast are no longer listed on xAI's official models pricing page. They have been succeeded by the grok-4.3 generation. Because xAI no longer publishes their rates, this page does not quote prices for the retired models. Use grok-4.3 or grok-code-fast-1 for new integrations.
