xAI Grok API Pricing (2026): Per-Token Costs for Every Grok Model

xAI bills the Grok API per token. As of July 2026, docs.x.ai lists grok-4.5 at $2.00/M input and $6.00/M output with a 500k context, grok-4.3 at $1.25/M input and $2.50/M output with a 1M context, grok-code-fast-1 (an alias for grok-build-0.1) at $1.00/M input and $2.00/M output with a 256k context, and grok-4.20-0309 at $1.25/M input and $2.50/M output with a 1M context. Cached input runs $0.20 to $0.50/M. Every rate doubles on prompts of 200k tokens or more. The older grok-4, grok-3, and grok-4-fast are no longer listed.

$2.00/M

grok-4.5 input tokens

$6.00/M

grok-4.5 output tokens

$1.25/M

grok-4.3 input tokens

2x

Rates at 200k prompt tokens or more

Prices verified as of July 2026

Every price on this page is from xAI's official docs at docs.x.ai as of July 16, 2026. Model pricing changes frequently. Confirm the current rate on docs.x.ai before relying on these numbers.

xAI Grok API Pricing at a Glance

xAI bills the Grok API per token, with separate rates for input (the tokens you send) and output (the tokens the model generates). Output is more expensive than input on every model, and cached input is the cheapest tier. There is no separate per-request fee, but server-side tools (web search, code execution) bill per call on top of tokens.

Four text models are listed on docs.x.ai as of July 2026: grok-4.5 (flagship, 500k context), grok-4.3 (primary chat and coding, 1M context), grok-build-0.1, aliased as grok-code-fast-1 (fast agentic coding, 256k context), and grok-4.20-0309 (reasoning, non-reasoning, and multi-agent modes, 1M context). grok-4.3 and grok-4.20 share the same price. grok-code-fast-1 is the cheapest per token; grok-4.5 is the most expensive.

The practical takeaway: grok-code-fast-1 costs about 20% less per token than grok-4.3 ($1.00 vs $1.25 input, $2.00 vs $2.50 output), so high-volume agentic loops that fit in 256k tokens are cheaper there. Reach for grok-4.5 only when its top-end reasoning earns the $6.00/M output. And whichever model you pick, keep prompts under 200k tokens or every rate doubles.

Full Price Table by Model

All rates are per million tokens, in USD, from docs.x.ai as of July 2026, for prompts under 200k tokens. Cached input applies to context that the API has already seen and stored, billed at a discount versus fresh input. See the next section for the doubled rates that apply at 200k tokens and above.

xAI Grok API pricing (per 1M tokens, under 200k prompt, July 2026)

Model	Context	Input	Output	Cached input
grok-4.5	500k	$2.00	$6.00	$0.50
grok-4.3	1M	$1.25	$2.50	$0.20
grok-code-fast-1 (grok-build-0.1)	256k	$1.00	$2.00	$0.20
grok-4.20 (grok-4.20-0309)	1M	$1.25	$2.50	$0.20

grok-4.5 is the flagship and the most expensive per token, with $6.00/M output. grok-4.3 and grok-4.20-0309 share $1.25/$2.50 per M and a 1M context. grok-code-fast-1 is the cheapest at $1.00/$2.00 per M, on a 256k context. All four publish a cached-input rate: $0.20/M on every model except grok-4.5, which charges $0.50/M.
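The table above can be turned into a small lookup for quick estimates. The following is a minimal Python sketch, not an official xAI client: `RATES` and `token_cost` are illustrative names chosen here, and the numbers are copied from the table, so they go stale whenever xAI changes its prices.

```python
# USD per 1M tokens, copied from the table above (prompts under 200k tokens).
RATES = {
    "grok-4.5":         {"input": 2.00, "output": 6.00, "cached": 0.50},
    "grok-4.3":         {"input": 1.25, "output": 2.50, "cached": 0.20},
    "grok-code-fast-1": {"input": 1.00, "output": 2.00, "cached": 0.20},
    "grok-4.20-0309":   {"input": 1.25, "output": 2.50, "cached": 0.20},
}


def token_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request at the standard rates.

    input_tokens counts fresh (uncached) input only; cached_tokens is
    billed at the cached-input rate. Tool calls are not included.
    """
    r = RATES[model]
    total = (
        input_tokens * r["input"]
        + cached_tokens * r["cached"]
        + output_tokens * r["output"]
    )
    return total / 1_000_000


# One call with 3,000 input tokens and 1,000 output tokens on each model.
for name in RATES:
    print(f"{name}: ${token_cost(name, 3_000, 1_000):.5f}")
```

Keeping the rates in one dictionary means a price change is a one-line edit rather than a hunt through the codebase.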

$1.00/M

grok-code-fast-1 input

$2.00/M

grok-code-fast-1 output

256k

grok-code-fast-1 context

20%

Cheaper per token vs grok-4.3

The 200k-Token Tier: Rates Double

xAI splits pricing at a 200k-token prompt threshold. Requests under 200k tokens pay the standard rate. At 200k tokens or more, input, cached input, and output all double. This is the single detail most Grok pricing pages miss, and it is the difference between a $0.625 bill and a $1.25 bill for the same token usage on the same model.

Doubled rates at 200k+ prompt tokens (per 1M tokens)

Model	Input	Cached input	Output
grok-4.5	$4.00	$1.00	$12.00
grok-4.3	$2.50	$0.40	$5.00
grok-code-fast-1	$2.00	$0.40	$4.00
grok-4.20-0309	$2.50	$0.40	$5.00

Watch the threshold on repo-scale context

An agent that pastes a large repository or a long document past 200k tokens pays double on the entire request, not just the tokens over the line. Trimming the prompt back under 200k, or splitting the task, can halve the bill. This is a stronger lever than model choice when your context is large.
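The threshold rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not xAI's billing logic: it uses the grok-4.3 rates from this page, treats the doubling as an exact 2x multiplier on the whole request, and assumes cached tokens count toward the 200k prompt total (the source does not say either way). `tiered_cost` and its parameters are hypothetical names.

```python
LONG_PROMPT_THRESHOLD = 200_000  # prompt tokens; at or above this, rates double

# grok-4.3 standard rates in USD per 1M tokens, from this page.
GROK_4_3 = {"input": 1.25, "cached": 0.20, "output": 2.50}


def tiered_cost(prompt_tokens, output_tokens, cached_tokens=0, rates=GROK_4_3):
    """Estimate one request's cost, doubling every rate at 200k+ prompt tokens.

    prompt_tokens is the full prompt size, including cached tokens
    (an assumption; check docs.x.ai for how the threshold is measured).
    """
    multiplier = 2 if prompt_tokens >= LONG_PROMPT_THRESHOLD else 1
    fresh_tokens = prompt_tokens - cached_tokens
    base = (
        fresh_tokens * rates["input"]
        + cached_tokens * rates["cached"]
        + output_tokens * rates["output"]
    ) / 1_000_000
    return multiplier * base


just_under = tiered_cost(199_999, 1_000)
at_threshold = tiered_cost(200_000, 1_000)
print(f"199,999-token prompt: ${just_under:.4f}")
print(f"200,000-token prompt: ${at_threshold:.4f}")
```

A single extra prompt token at the boundary roughly doubles the estimate, which is why trimming or splitting a large prompt matters more than the per-token rate.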

Server-Side Tool Pricing

xAI bills its server-side tools per call, on top of token usage. If a request invokes web search or runs code, that call is added to the token cost of the same request.

xAI server-side tool pricing

Tool	Price	Billing
Web search	$5.00	per 1,000 calls
X search	$5.00	per 1,000 calls
Code execution	$5.00	per 1,000 calls
Collections search	$2.50	per 1,000 calls
File attachments	$10.00	per 1,000 calls

Image and video understanding and remote MCP tools are billed on tokens rather than per call. On agentic workloads that hit web or X search on every turn, tool calls can rival the token cost, so count them into any budget estimate.
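To fold tool calls into a budget, the per-1,000-call prices above can be applied to expected call counts. The sketch below is an estimate helper only: the dictionary keys are labels invented here for readability and are not xAI API identifiers, and `tool_cost` is a hypothetical name.

```python
# USD per 1,000 calls, copied from the table above.
# Keys are illustrative labels, not official xAI tool names.
TOOL_PRICE_PER_1K = {
    "web_search": 5.00,
    "x_search": 5.00,
    "code_execution": 5.00,
    "collections_search": 2.50,
    "file_attachments": 10.00,
}


def tool_cost(calls):
    """Return the USD cost of tool usage given a {tool: call_count} mapping."""
    return sum(TOOL_PRICE_PER_1K[tool] * count / 1000 for tool, count in calls.items())


# A 100-turn agent session that runs one web search on every turn.
print(f"${tool_cost({'web_search': 100}):.2f}")
```

At $5.00 per 1,000 calls, 100 web searches add $0.50, which is the same order of magnitude as the token cost of a small session.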

Per-Model Notes

The four listed models map to distinct jobs. Picking the right one is the first lever on cost, before any caching or routing.

grok-4.5

xAI's flagship. $2.00/M input, $6.00/M output, $0.50/M cached input, 500k context. The top-end reasoning model, and the most expensive per token. Reach for it when the task justifies $6.00/M output; otherwise grok-4.3 does general coding for less.

grok-4.3

xAI's primary chat and coding model. $1.25/M input, $2.50/M output, $0.20/M cached input, 1M-token context. Default choice for general coding, reasoning, and chat, and tied with grok-4.20-0309 for the largest context in the lineup. Recommended over the retired grok-4.

grok-code-fast-1

Alias for grok-build-0.1, xAI's fast agentic-coding model. $1.00/M input, $2.00/M output, $0.20/M cached input, 256k context. Cheapest per token, built for high-volume agentic loops that fit inside 256k.

grok-4.20-0309

Covers reasoning, non-reasoning, and multi-agent modes. $1.25/M input, $2.50/M output, $0.20/M cached input, 1M-token context. Same price as grok-4.3, different serving modes.

Output dominates the bill

Output tokens cost 2x to 3x input on every Grok model ($2.50 vs $1.25 on grok-4.3, $6.00 vs $2.00 on grok-4.5, $2.00 vs $1.00 on grok-code-fast-1). In agentic coding, generated diffs, tool calls, and reasoning traces are the output. Trimming verbose output, not input, moves the bill the most.

Retired Models: grok-4, grok-3, grok-4-fast

As of July 2026, grok-4, grok-3, and grok-4-fast are no longer listed on xAI's official models pricing page. They have been succeeded by the grok-4.3 generation. xAI now recommends grok-4.3 as the primary chat and coding model.

Because xAI no longer publishes rates for the retired models, this page does not quote prices for them. Any number you find for grok-4 or grok-3 today is from a cached or third-party source and may not match what xAI bills. For a new integration, use grok-4.3 for general work or grok-code-fast-1 for fast agentic coding.

Why retired prices are omitted

Quoting a price xAI no longer publishes would be guessing. This page states only what the official source confirms. If you have an existing integration pinned to grok-4 or grok-3, check your xAI console for the rate you are actually billed and plan a migration to grok-4.3.

Worked Cost Example: A Coding Session

Take a coding agent session with 100 API calls. Assume each call averages 3,000 input tokens and 1,000 output tokens, for 4,000 tokens per call and 400,000 tokens total (300k input, 100k output). Below is the cost on each model at list price, with no caching.

Cost of a 100-call session (300k input, 100k output)

Model	Input cost	Output cost	Total
grok-4.5 ($2.00 / $6.00)	$0.600	$0.600	$1.200
grok-4.3 ($1.25 / $2.50)	$0.375	$0.250	$0.625
grok-code-fast-1 ($1.00 / $2.00)	$0.300	$0.200	$0.500
grok-4.20 ($1.25 / $2.50)	$0.375	$0.250	$0.625

grok-4.3 costs about $0.63 for the session; grok-code-fast-1 about $0.50, roughly 20% less. Now add caching. If 200k of the 300k input tokens are repeated context (system prompt, file contents, tool schemas) billed at $0.20/M instead of full input rate, the grok-4.3 input cost drops from $0.375 to about $0.165 (100k fresh at $1.25/M = $0.125, plus 200k cached at $0.20/M = $0.04). The session total falls to roughly $0.42, a 33% cut from caching alone.
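The arithmetic in this example can be reproduced with a short script. This is a minimal sketch using the grok-4.3 rates quoted on this page; `session_cost` is a hypothetical helper, and the 200k-cached / 100k-fresh split is the same assumption the paragraph above makes.

```python
def session_cost(fresh_in, cached_in, out, in_rate, cached_rate, out_rate):
    """Return USD cost for token totals, with rates given per 1M tokens."""
    return (fresh_in * in_rate + cached_in * cached_rate + out * out_rate) / 1_000_000


# grok-4.3: $1.25 input, $0.20 cached input, $2.50 output per 1M tokens.
no_cache = session_cost(300_000, 0, 100_000, 1.25, 0.20, 2.50)
with_cache = session_cost(100_000, 200_000, 100_000, 1.25, 0.20, 2.50)
saving = 1 - with_cache / no_cache

print(f"no caching:   ${no_cache:.3f}")
print(f"with caching: ${with_cache:.3f}")
print(f"saving:       {saving:.1%}")
```

Swapping in another model's three rates, or a different cached share, shows how sensitive the total is to each input.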

$0.63

grok-4.3, 100-call session

$0.50

grok-code-fast-1, same session

$0.42

grok-4.3 with cached context

33%

Saved by caching input

Scale this to a team running 10,000 sessions per month and the gap is material: about $6,250/month on grok-4.3 at list price, versus about $4,150/month with caching, versus about $5,000/month if every session ran on grok-code-fast-1. Mixing the two by difficulty beats any single-model choice.
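A blended monthly figure can be estimated by weighting the per-session costs from the table above. The sketch below is illustrative only: `monthly_cost` is a hypothetical helper, the 60% share in the loop is an arbitrary example value rather than a measured one, and caching and tool calls are left out.

```python
# Per-session list-price cost (300k input, 100k output), from the table above.
SESSION_COST = {"grok-4.3": 0.625, "grok-code-fast-1": 0.50}


def monthly_cost(sessions, fast_share):
    """Return monthly USD spend when fast_share of sessions run on grok-code-fast-1."""
    fast = sessions * fast_share * SESSION_COST["grok-code-fast-1"]
    large = sessions * (1 - fast_share) * SESSION_COST["grok-4.3"]
    return fast + large


# Hypothetical splits: all grok-4.3, a 60/40 mix, all grok-code-fast-1.
for share in (0.0, 0.6, 1.0):
    print(f"{share:.0%} on grok-code-fast-1: ${monthly_cost(10_000, share):,.0f}/month")
```

The mix lands between the two single-model totals on price; its advantage is keeping grok-4.3 available for the sessions that need it.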

How to Call the Grok API

The xAI API is OpenAI-compatible. Point the OpenAI SDK at xAI's base URL, pass your xAI key, and set the model name. Chat-completions code written for OpenAI otherwise runs unchanged.

Calling Grok with the OpenAI SDK (Python)

import os  # needed for os.environ below

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

# grok-4.3: primary chat/coding, 1M context, $1.25/$2.50 per M
resp = client.chat.completions.create(
    model="grok-4.3",
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Refactor this function to use async/await."},
    ],
)
print(resp.choices[0].message.content)

# grok-code-fast-1: fast agentic coding, 256k context, $1.00/$2.00 per M
fast = client.chat.completions.create(
    model="grok-code-fast-1",
    messages=[{"role": "user", "content": "Add error handling to parseConfig()"}],
)

The same pattern works from the TypeScript OpenAI SDK by setting baseURL to https://api.x.ai/v1. Because the surface matches OpenAI, you can swap a Grok call into existing code by changing the base URL, key, and model string only.

Calling Grok with the OpenAI SDK (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.x.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "grok-4.3",
  messages: [{ role: "user", content: "Write a unit test for sum()" }],
});
console.log(resp.choices[0].message.content);
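Because responses follow the OpenAI shape, the token counts needed for a cost estimate should be available on the response's `usage` object. The Python sketch below assumes xAI populates `prompt_tokens` and `completion_tokens` the way the OpenAI SDK exposes them; it ignores cached input, the 200k tier, and tool calls. `cost_from_usage` is a hypothetical helper, and a stand-in object replaces `resp.usage` so the example runs without an API key.

```python
from types import SimpleNamespace

# (input, output) in USD per 1M tokens, from this page's price table.
RATES = {
    "grok-4.3": (1.25, 2.50),
    "grok-code-fast-1": (1.00, 2.00),
}


def cost_from_usage(model, usage):
    """Estimate USD cost from an object with prompt_tokens and completion_tokens."""
    input_rate, output_rate = RATES[model]
    return (
        usage.prompt_tokens * input_rate
        + usage.completion_tokens * output_rate
    ) / 1_000_000


# Stand-in for resp.usage; with a live client, pass resp.usage instead.
fake_usage = SimpleNamespace(prompt_tokens=3_000, completion_tokens=1_000)
print(f"${cost_from_usage('grok-4.3', fake_usage):.5f}")
```

Logging this per call gives a running estimate to compare against the xAI console, which remains the authority on what was actually billed.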

How to Reduce Grok API Costs

Four levers cut a Grok bill, in rough order of impact: cache repeated context, route by difficulty, trim output, and pick the right model for each task.

Cost-reduction levers for the Grok API

Lever	Mechanism	Typical impact
Cache input	Reuse stored context at $0.20/M instead of $1.25/M fresh input	20-40% off input
Route by difficulty	Easy turns to grok-code-fast-1, hard turns to grok-4.3	40-70% on mixed loads
Trim output	Output costs 2x to 3x input; cap verbose generations	Direct, per-token
Right-size context	Use 256k grok-code-fast-1 when 1M is not needed	20% per token

Routing is the largest lever on a mixed workload because most coding turns are easy. A router classifies each prompt and sends boilerplate, simple edits, and documentation to the cheaper model while reserving grok-4.3 for architecture and complex debugging. The savings come from the volume of easy turns, not from any single expensive call.

Morph's model router automates this. It classifies prompt difficulty in ~430ms into four tiers (easy, medium, hard, needs_info) and routes each call to the cheapest model that clears the quality bar, for 40-70% API cost savings at about $0.001 per classification. It exposes one OpenAI-compatible endpoint at api.morphllm.com across providers, so the same code can reach Grok, Claude, GPT, and Gemini models without per-provider plumbing. See LLM cost optimization for the full set of techniques, and the LLM cost calculator to model your own spend.

Routing beats single-model selection

On a mixed coding workload, no single Grok model is optimal: grok-code-fast-1 is cheapest but caps at 256k context, while grok-4.3 carries 1M but costs 25% more per token. A difficulty-aware router gets the cheap rate on the easy turns, which make up most of a coding workload, and the large context on the hard turns that need it, beating any fixed choice.
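The routing idea can be illustrated with a toy rule-based router. This is a deliberately naive sketch, not how Morph's router or any production classifier works: the keyword list, the `pick_model` function, and the token check are all hypothetical, and a real router would use a trained difficulty classifier and leave headroom for output tokens.

```python
FAST_MODEL = "grok-code-fast-1"  # $1.00 / $2.00 per M, 256k context
LARGE_MODEL = "grok-4.3"         # $1.25 / $2.50 per M, 1M context
FAST_CONTEXT = 256_000           # context limit of the fast model, in tokens

# Hypothetical keyword hints standing in for a real difficulty classifier.
HARD_HINTS = ("architecture", "debug", "deadlock", "race condition")


def pick_model(prompt: str, prompt_tokens: int) -> str:
    """Choose a model from prompt size and a crude difficulty guess."""
    if prompt_tokens > FAST_CONTEXT:
        return LARGE_MODEL  # the prompt does not fit the 256k window
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return LARGE_MODEL  # looks like a hard turn
    return FAST_MODEL       # default to the cheaper model


print(pick_model("Add a docstring to parse_config()", 1_200))
print(pick_model("Debug this intermittent deadlock", 8_000))
print(pick_model("Summarize this repository", 400_000))
```

Note that the last prompt would also cross the 200k-token threshold, so trimming it first may save more than the model choice does.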

Frequently Asked Questions

How much does the Grok API cost?

As of July 2026, grok-4.3 costs $1.25 per million input tokens and $2.50 per million output tokens, with cached input at $0.20 per million, on a 1M-token context. grok-code-fast-1 (an alias for grok-build-0.1) costs $1.00 input and $2.00 output per million on a 256k context. grok-4.20 matches grok-4.3 at $1.25 input and $2.50 output. The flagship grok-4.5 costs $2.00 input and $6.00 output per million on a 500k context. All rates double on prompts of 200k tokens or more. Verify current numbers on docs.x.ai.

What is the difference between grok-4.3 and grok-code-fast pricing?

grok-4.3 costs $1.25/M input and $2.50/M output on a 1M-token context and is xAI's primary chat and coding model. grok-code-fast-1 (grok-build-0.1) costs $1.00/M input and $2.00/M output on a 256k context. grok-code-fast-1 is about 20% cheaper per token but carries a quarter of the context window. Use it for high-volume agentic loops; use grok-4.3 for large-context tasks.

Does the Grok API have a free tier?

xAI's published pricing is usage-based and per token, with no free per-token allowance listed on the official models page as of July 2026. Promotional credits and trials have appeared and changed over time, so check x.ai for any current free credits. The numbers on this page are the standard usage-based rates.

Does Grok API pricing change by context window?

Yes, at one threshold. The rate does not depend on how full the context window is, but xAI doubles every rate once a prompt reaches 200k tokens. Below that, grok-4.3 and grok-4.20 bill $1.25/M input and $2.50/M output, and grok-code-fast-1 bills $1.00/M input and $2.00/M output. At 200k tokens or more, input, cached input, and output all double for the entire request, for example $2.50/M input and $5.00/M output on grok-4.3. Keeping prompts under 200k tokens avoids the higher tier.

Is the Grok API OpenAI-compatible?

Yes. The xAI API is OpenAI-compatible. Point the OpenAI SDK at https://api.x.ai/v1, supply your xAI API key, and set the model to grok-4.3 or grok-code-fast-1. Existing OpenAI-shaped code that uses chat completions works without a rewrite.

How do I reduce Grok API costs?

Cache repeated context at $0.20/M cached input instead of resending system prompts, files, and tool definitions at full price. Route easy turns to grok-code-fast-1 ($1.00/$2.00 per M) and reserve grok-4.3 for hard turns. Cut output tokens, which cost 2x to 3x input. Keep prompts under 200k tokens so rates do not double. A model router that classifies prompt difficulty automates the routing and can save 40-70% across mixed workloads.

What happened to grok-4 and grok-3 pricing?

As of July 2026, grok-4, grok-3, and grok-4-fast are no longer listed on xAI's official models pricing page. They have been succeeded by the grok-4.3 generation. Because xAI no longer publishes their rates, this page does not quote prices for the retired models. Use grok-4.3 or grok-code-fast-1 for new integrations.
