MiniMax M2: 230B MoE Coding Model, Versions, API, Pricing, Benchmarks (2026)

Last updated July 2026.

~10B active

“MiniMax M2 activates about 10B of its 230B parameters per token. A small active footprint on a big expert pool is why it runs agentic coding cheaply, well below frontier models of similar ability.”

MiniMax M2 technical report, October 2025

MiniMax M2 is an open-weight mixture-of-experts model from MiniMax, released October 27, 2025 and built for agentic coding. It has roughly 230B total parameters, about 10B active per token, and a context window near 200K. It is not a single model but a version line: M2, then M2.1, M2.5, and M2.7, each a refresh of the same architecture. Morph serves the latest, M2.7, as morph-minimax27-230b at $0.279/M input and $1.20/M output with a 192K context.

What it is

A ~230B MoE with ~10B active per token, purpose-built for agentic coding and tool use rather than chat. Open weights on Hugging Face. Iterated through M2.1 / M2.5 / M2.7 checkpoints; M2.7 is current.

Why it matters

It set the open-weight agentic-coding price/performance bar at launch: frontier-adjacent coding ability at a fraction of the cost, because only ~10B parameters fire per token. Cheap enough to run inside a many-call agent loop.

What Is MiniMax M2?

MiniMax M2 is an open-weight mixture-of-experts (MoE) model from the Shanghai lab MiniMax, first released October 27, 2025. MiniMax built it for a specific job: agentic coding, terminal and tool-use loops, and the kind of multi-step work a coding agent does, rather than open-ended chat. It has about 230 billion total parameters with roughly 10 billion active per token, so despite the large weight count it is cheap and fast to serve.

morph-minimax27-230b: $0.279 / $1.20 input / output per 1M tokens, 192K context

The small active footprint is the whole point. Only ~10B of the 230B parameters fire on any given token, so per-token compute (and therefore price) lands well below dense models or larger MoEs of comparable coding ability. That is what let the M2 line top the open-weight agentic-coding tier cheaply on release. Morph serves the current M2.7 checkpoint as morph-minimax27-230b, one of its fast general coding models, on custom kernels via an OpenAI-compatible API. See pricing.

The M2 Version Line: M2 → M2.1 → M2.5 → M2.7

The single most confusing thing about "MiniMax M2" is that it names a family, not one model. Since the October 2025 launch, MiniMax has shipped several point checkpoints, each a refresh of the same 230B/10B-active MoE, keeping the parameter count, price band, and interleaved-thinking format constant while raising benchmark scores and tuning behavior. Knowing which checkpoint a provider serves matters more than the "M2" label.

MiniMax M2 checkpoints

Checkpoint	Released	What changed	Served by Morph
M2	Oct 27, 2025	Original agentic-coding release; topped the open-weight tier cheaply	—
M2.1	Late 2025	Coding-quality refresh (MiniMax reported ~74% SWE-bench Verified)	—
M2.5	Early 2026	Larger jump (~80% SWE-bench Verified reported)	—
M2.7	Mar 18, 2026	Latest M2-line checkpoint; served as morph-minimax27-230b	Yes

Two takeaways. First, when you read a "MiniMax M2" benchmark online, check which checkpoint it ran, since scores climbed meaningfully across the line. Second, M3 is a different model, not an M2 point release: it is a larger 428B multimodal MoE with a new attention design, covered on the MiniMax M3 page.

Checkpoint dates and scores

The M2 (Oct 27, 2025) and M2.7 (Mar 18, 2026) release dates are from MiniMax and third-party trackers. The M2.1 (~74%) and M2.5 (~80%) SWE-bench Verified figures are MiniMax-reported for those specific checkpoints; intermediate release dates are approximate. Treat per-checkpoint scores as version-specific rather than a single "M2" number.

MiniMax M2 Architecture: 230B MoE, ~10B Active

MiniMax M2 is a sparse mixture-of-experts model: about 230 billion total parameters spread across a large pool of experts, with a router that activates only a small subset per token, roughly 10 billion active parameters. The backbone is a standard decoder-only Transformer with grouped-query attention, tuned for the long tool-use trajectories that agentic coding produces rather than for one-shot chat.
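To put the sparsity in numbers, here is a rough sketch using the approximate parameter counts above. The rule of about 2 FLOPs per active parameter per token is a common back-of-envelope estimate for a forward pass, not a MiniMax-published figure, and it ignores attention cost. The dense 230B model is a hypothetical comparison point.

```python
# Back-of-envelope sparsity arithmetic for a ~230B-total / ~10B-active MoE.
# Parameter counts are the approximate figures quoted above; the FLOPs rule
# (about 2 FLOPs per active parameter per token) is an assumption used for
# illustration, not a MiniMax-published number.

TOTAL_PARAMS = 230e9   # approximate total parameters
ACTIVE_PARAMS = 10e9   # approximate active parameters per token


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that participate in one token's forward pass."""
    return active / total


def approx_forward_flops(active: float) -> float:
    """Rough forward-pass FLOPs per token, ignoring attention cost."""
    return 2.0 * active


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_forward_flops(ACTIVE_PARAMS)
dense_flops = approx_forward_flops(TOTAL_PARAMS)  # hypothetical dense 230B model

print(f"active fraction: {fraction:.1%}")
print(f"compute ratio vs hypothetical dense 230B: {dense_flops / moe_flops:.0f}x")
```

Sparsity cuts compute per token, not the weight footprint: the full expert pool still has to be resident in memory when serving.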

Total / active params (MoE): ~230B / ~10B

Context window (official): ~200K

Purpose-built target: Agentic coding

The design bet is that a small active footprint is worth more than raw size for agent work. An agent makes many model calls per task, so per-call cost and latency compound. A ~10B-active MoE keeps both low while the large expert pool preserves enough capacity to stay competitive on coding benchmarks. That is the same reasoning behind other cheap-to-serve MoEs, executed here specifically for tool-driven coding.
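A small sketch of how per-call cost compounds in an agent loop. The prices are the Morph rates quoted on this page; the step count and token sizes are hypothetical, and the model assumes every call re-sends the full history with no prompt caching.

```python
# Sketch: how per-call cost compounds in an agent loop.
# Prices are the Morph rates quoted on this page; the step count and token
# sizes are hypothetical, and prompt caching is ignored.

INPUT_PRICE_PER_M = 0.279   # USD per 1M input tokens (morph-minimax27-230b)
OUTPUT_PRICE_PER_M = 1.20   # USD per 1M output tokens


def agent_task_cost(steps: int, base_prompt: int, output_per_step: int,
                    tool_result_per_step: int) -> float:
    """Total USD for one task when every call re-sends the growing history."""
    history = base_prompt
    total_in = 0
    total_out = 0
    for _ in range(steps):
        total_in += history            # the whole history is billed as input
        total_out += output_per_step   # reasoning plus tool call for this step
        # The next call sees this step's output plus the tool result.
        history += output_per_step + tool_result_per_step
    return (total_in * INPUT_PRICE_PER_M + total_out * OUTPUT_PRICE_PER_M) / 1e6


# Hypothetical task: 30 steps, a 4K-token starting prompt, 500 output tokens
# and 1,500 tokens of tool results per step.
cost = agent_task_cost(steps=30, base_prompt=4_000,
                       output_per_step=500, tool_result_per_step=1_500)
print(f"estimated cost per task: ${cost:.4f}")
```

Under these assumptions the re-sent history dominates: 990,000 input tokens against 15,000 output tokens for a single task, which is why the input rate matters so much for agent workloads.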

MiniMax M2 architecture at a glance

Specification	MiniMax M2 (M2.7 checkpoint)
Total parameters	~230 billion
Active parameters per token	~10 billion
Architecture	Sparse MoE, decoder-only Transformer
Attention	Grouped-query attention (GQA)
Context window (official)	~200K tokens
Context window (Morph)	196,608 tokens (192K)
Modality	Text in, text out
Original release	October 27, 2025
Morph model id	morph-minimax27-230b

MiniMax M2 Benchmarks

MiniMax positions M2 as an agentic-coding model, and its benchmarks reflect that: the line is measured on SWE-bench Verified, terminal and tool-use tasks, and browsing, not on trivia. Read scores by checkpoint. At launch M2 led the open-weight agentic tier; MiniMax then reported roughly 74% SWE-bench Verified for M2.1 and about 80% for M2.5, with the M2.5 checkpoint also posting strong multi-repo and browsing numbers.

MiniMax M2 line: reported agentic-coding scores by checkpoint

Checkpoint	SWE-bench Verified	Other reported	Source
M2 (launch)	Topped open-weight tier	Led agentic coding at a fraction of frontier cost	MiniMax launch
M2.1	~74%	Coding-quality refresh	MiniMax-reported
M2.5	~80.2%	51.3% Multi-SWE-Bench, 76.3% BrowseComp	MiniMax-reported

The honest framing: these are vendor-reported figures for specific checkpoints, and harness/scaffold choices move SWE-bench numbers by several points, so treat them as directional. For a text-only coding default with the highest aggregate intelligence in the open-weight tier, GLM 5.2 leads; M2's claim is cost per useful agent step, not top of the leaderboard. Check the live Artificial Analysis board before committing volume.

The Interleaved-Thinking Tool-Calling Gotcha

MiniMax M2 uses interleaved thinking: it reasons between tool calls inside <think> tags. MiniMax's own docs require you to feed that reasoning back on every turn, and the footgun most people hit first is a harness that silently drops it. Strip the thinking content between tool calls and multi-turn tool use degrades. This is the single most common M2 integration bug, and it carries forward unchanged to M3.

The rule, from MiniMax's function-call docs

MiniMax's tool-use guide states the complete model response (the full assistant message) "must be append[ed] to the conversation history to maintain the continuity of the reasoning chain." For the OpenAI-format path it is explicit: do not modify the content field; preserve the model's thinking content completely, i.e. <think>reasoning_content</think>. Omitting the thinking "breaks the chain." Many agent frameworks strip reasoning blocks between turns by default to save tokens, which is exactly what silently hurts M2 tool-calling quality.

Practical fixes: on the OpenAI-compatible path, echo back the assistant message with its <think> content intact rather than dropping it; on an Anthropic-style path, append the full response content list (thinking, text, tool_use) to history. If your harness has a "strip reasoning" or "drop thinking" setting, turn it off for M2. This one change is often the difference between M2 looking broken and M2 matching its benchmark scores in a real Cline or Claude Code loop.
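The OpenAI-path fix can be sketched without any network call. This is a minimal illustration, assuming the reasoning arrives inline in the content field as <think>...</think>, as the docs quoted above describe. The helper names, the sample message, and the read_file tool are hypothetical.

```python
# Sketch: keep the assistant turn verbatim (including <think>...</think>)
# when building the next request. Message shapes follow the OpenAI chat
# format; the sample content and tool name are made up for illustration.
import copy
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def append_turn(history: list, assistant_msg: dict, tool_results: list) -> list:
    """Return a new history with the untouched assistant message and tool results."""
    return history + [copy.deepcopy(assistant_msg)] + tool_results


def strip_thinking(assistant_msg: dict) -> dict:
    """What some harnesses do by default; shown only as the thing to avoid."""
    stripped = copy.deepcopy(assistant_msg)
    stripped["content"] = THINK_RE.sub("", stripped.get("content") or "").strip()
    return stripped


history = [{"role": "user", "content": "Why is the build failing?"}]

# Hypothetical assistant turn: reasoning in <think>, then a tool call.
assistant_msg = {
    "role": "assistant",
    "content": "<think>I should read the CI log first.</think>Checking the log.",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "read_file", "arguments": "{\"path\": \"ci.log\"}"},
    }],
}
tool_results = [{"role": "tool", "tool_call_id": "call_1", "content": "error: missing dep"}]

# Correct: the next request carries the assistant message exactly as returned.
next_messages = append_turn(history, assistant_msg, tool_results)
```

Pass next_messages as the messages argument of the following request. If a harness routes assistant turns through something like strip_thinking before re-sending them, that is the setting to disable.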

Pricing and How to Run MiniMax M2 on Morph

MiniMax's official API lists M2 at $0.30/M input and $1.20/M output. Morph serves the current M2.7 checkpoint as morph-minimax27-230b at $0.279/M input and $1.20/M output with a 196,608-token context, on custom low-level kernels tuned for code generation.

Morph input / output per 1M tokens: $0.279 / $1.20

Context window on Morph: 192K

API: OpenAI-compatible, drop-in /chat/completions

Point any OpenAI-SDK client at https://api.morphllm.com/v1 and pass the model id. Because Morph is OpenAI-compatible, M2 drops into existing tooling without a proxy:

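import os

from openai import OpenAI

# Read the key from the environment; Python does not expand a literal "$MORPH_API_KEY" string.
client = OpenAI(
    base_url="https://api.morphllm.com/v1",
    api_key=os.environ["MORPH_API_KEY"],
)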

resp = client.chat.completions.create(
    model="morph-minimax27-230b",
    messages=[
        {"role": "user", "content": "Add a retry with backoff to this fetch helper."},
    ],
)
print(resp.choices[0].message.content)

For multi-turn tool calling, echo the full assistant message (including any <think> content) back into messages on the next turn, per the interleaved-thinking rule above. See Morph Open Source Models for the full lineup and pricing for every model's rate.

MiniMax M2 API: Pricing and How to Call It

The MiniMax M2 API is OpenAI-compatible, so any client that speaks the ChatCompletions format can call it by changing two things: the base URL and the model id. Morph serves the M2.7 checkpoint as morph-minimax27-230b at $0.279/M input and $1.20/M output with a 192K context. MiniMax's own platform lists M2 at $0.30/M input and $1.20/M output.

Morph input / output per 1M tokens: $0.279 / $1.20

Context window on Morph: 192K

Auth: Bearer token in the Authorization header, OpenAI-compatible

Call the MiniMax M2 API with the OpenAI SDK

Point any OpenAI-SDK client at https://api.morphllm.com/v1, pass your key as a Bearer token, and set the model to morph-minimax27-230b. No proxy or shim:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.morphllm.com/v1",
    api_key="YOUR_MORPH_API_KEY",
)

resp = client.chat.completions.create(
    model="morph-minimax27-230b",
    messages=[
        {"role": "user", "content": "Add a retry with backoff to this fetch helper."},
    ],
)
print(resp.choices[0].message.content)

The raw HTTP call is the same shape: POST to https://api.morphllm.com/v1/chat/completions with an Authorization: Bearer YOUR_MORPH_API_KEY header and a JSON body naming morph-minimax27-230b.
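The request shape described above can be sketched with only the standard library. The request is built but not sent here, the key is a placeholder, and build_request is a hypothetical helper name.

```python
# Sketch: the raw request shape using only the standard library.
# The request is built but not sent; pass it to urllib.request.urlopen
# to actually call the API. The key below is a placeholder.
import json
import urllib.request

API_URL = "https://api.morphllm.com/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request with Bearer auth and an OpenAI-style JSON body."""
    body = {
        "model": "morph-minimax27-230b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_MORPH_API_KEY", "Add a retry with backoff to this fetch helper.")
print(req.get_method(), req.full_url)
```

Any HTTP client works the same way; the only Morph-specific parts are the URL, the Bearer key, and the model id.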

MiniMax M2 API pricing compared

MiniMax M2 API pricing: Morph vs official MiniMax (July 2026)

Provider	Model id	Input / 1M	Output / 1M	Context	Notes
Morph	morph-minimax27-230b	$0.279	$1.20	192K	M2.7 checkpoint, codegen kernels
MiniMax (official)	MiniMax-M2	$0.30	$1.20	~200K	First-party pay-as-you-go
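To see what the rate difference means at volume, here is a short sketch using the two rows above. The monthly token volume is hypothetical, and the two rows list different checkpoints, so this compares price only, not quality.

```python
# Sketch: compare the two listed rates for a given token volume.
# Rates come from the table above; the monthly volume is hypothetical.

RATES = {  # USD per 1M tokens: (input, output)
    "Morph (morph-minimax27-230b)": (0.279, 1.20),
    "MiniMax official (MiniMax-M2)": (0.30, 1.20),
}


def monthly_cost(input_tokens: int, output_tokens: int, rate: tuple) -> float:
    """USD for the given token counts at an (input, output) per-1M rate."""
    in_rate, out_rate = rate
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6


# Hypothetical workload: 2B input tokens and 100M output tokens per month.
for name, rate in RATES.items():
    print(f"{name}: ${monthly_cost(2_000_000_000, 100_000_000, rate):,.2f}")
```

The gap is entirely on the input side, about 7% of the input rate, so it shows up most in input-heavy agent loops.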

Getting a MiniMax API key on Morph

Sign up at morphllm.com, create a key in the dashboard, and pass it as Authorization: Bearer YOUR_MORPH_API_KEY (the OpenAI SDK reads it from api_key). One key works for every model in the lineup, including GLM 5.2, MiniMax M3, and DeepSeek V4 Flash, so switching between them is a model-string change, not a new integration.

MiniMax M2 vs M3: Should You Upgrade?

M3 is a larger, multimodal successor, not an M2 point release. M2 is a ~230B MoE with ~10B active per token, text-only, purpose-built for agentic coding. M3 nearly doubles total parameters to 428B, more than doubles active parameters to ~23B, adds native image and video input, swaps in MiniMax Sparse Attention to make a 1M context affordable, and raises the text-intelligence ceiling. The trade is cost: M2 is the cheaper, faster model per token; M3 is the more capable one.

MiniMax M2 (M2.7) vs M3

Dimension	MiniMax M2 / M2.7	MiniMax M3
Total parameters	~230B	428B (1.9x larger)
Active parameters	~10B per token	~23B per token
Modality	Text only	Text + image + video
Attention	GQA	GQA + MiniMax Sparse Attention (MSA)
Context (official)	~200K	1M
Interleaved thinking	Yes	Yes
Morph model id	morph-minimax27-230b	morph-minimax3-428b
Morph input / output / 1M	$0.279 / $1.20	$0.60 / $2.40

Both share the interleaved-thinking format and the same tool-calling rule, so the "don't strip the reasoning" guidance applies to either, and migrating is mostly a model-id change (plus wiring image and video inputs if you want M3's multimodal path). Stay on M2 for cheap text-only agent loops; move to M3 when you need multimodality, the 1M context, or the higher intelligence ceiling.
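Because the switch is mostly a model-id change, the guidance above fits in a small helper. This is an illustrative rule of thumb, not an official routing policy. pick_model is a hypothetical name, and the check against the 196,608-token window is simplified: real code would also reserve room for the output.

```python
# Sketch: choose between the two Morph model ids from the comparison above.
# The decision rule illustrates the guidance in the text; it is not an
# official routing policy.

M2_ID = "morph-minimax27-230b"
M3_ID = "morph-minimax3-428b"
M2_CONTEXT_TOKENS = 192 * 1024  # 196,608 tokens (192K) on Morph


def pick_model(prompt_tokens: int, has_images_or_video: bool) -> str:
    """Default to the cheaper M2; move to M3 for multimodal or oversized prompts."""
    if has_images_or_video:
        return M3_ID  # M2 is text only
    if prompt_tokens > M2_CONTEXT_TOKENS:
        return M3_ID  # prompt does not fit M2's window on Morph
    return M2_ID


print(pick_model(prompt_tokens=20_000, has_images_or_video=False))
print(pick_model(prompt_tokens=20_000, has_images_or_video=True))
```

The returned string goes straight into the model field of the request; the rest of the call, including the rule about preserving thinking content, stays the same.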

When to Use MiniMax M2 (and When Not)

Strengths

Cheap per token: ~10B active params keep price and latency low for many-call agent loops
Purpose-built for agentic coding, terminal, and tool-use trajectories
Topped the open-weight agentic-coding tier at launch; M2.5 reported ~80% SWE-bench Verified
Open weights on Hugging Face; run it hosted or self-host
OpenAI-compatible; drops into existing agent tooling with a model-string change
On Morph, $0.279/M input undercuts the official $0.30/M rate; output is the same $1.20/M

Limitations

Text only: no image or video input (use MiniMax M3 for multimodal)
Not the top text-intelligence model: GLM 5.2 leads the open-weight tier on aggregate
Interleaved thinking breaks if your framework strips <think> between tool calls
Confusing version line (M2 / M2.1 / M2.5 / M2.7); check which checkpoint a provider serves
Smaller context than M3's 1M for very long-horizon tasks

Use M2 (as morph-minimax27-230b) when you want the cheapest MiniMax model for text-only agentic coding and tool use, and cost per step matters more than topping a leaderboard. Reach for GLM 5.2 for text-only planning and long-horizon reasoning, MiniMax M3 for multimodal or 1M-context work, or DeepSeek V4 Flash when raw cost per token is the deciding factor.

Limitations

Known limitations

Text only: M2 has no native image or video input. For screenshot-to-code, UI automation, or video, use MiniMax M3.
Mid-pack on aggregate intelligence: for text-only planning-heavy coding, GLM 5.2 is the stronger open-weight default. M2's advantage is price, not benchmark rank.
Interleaved-thinking fragility: multi-turn tool calling degrades if your harness strips <think> content between turns. MiniMax's docs require preserving it.
Version confusion: "MiniMax M2" spans M2, M2.1, M2.5, and M2.7. Benchmarks and behavior differ by checkpoint; confirm which one you are actually calling.
Vendor-reported scores: the ~74% and ~80% SWE-bench Verified figures are MiniMax-reported for specific checkpoints, not independent leaderboard entries; scaffolds move these numbers.

Frequently Asked Questions

What is MiniMax M2?

An open-weight mixture-of-experts model from MiniMax, released October 27, 2025 and built for agentic coding. About 230B total parameters, ~10B active per token, and a context window near 200K. It led the open-weight agentic-coding tier at launch at a fraction of frontier cost, and has been iterated through M2.1, M2.5, and M2.7. Morph serves M2.7 as morph-minimax27-230b.

What is the difference between M2, M2.1, M2.5, and M2.7?

They are successive checkpoints of the same 230B/10B-active architecture, not new models. Each refresh raised benchmark scores and tuned behavior while keeping the parameter count, price band, and interleaved-thinking format constant. M2.7 (March 2026) is the latest and the one Morph serves. M3 (428B) is a separate, larger, multimodal model.

What is MiniMax M2's context window and parameter count?

About 230B total / ~10B active, with an official context window near 200K tokens. On Morph, morph-minimax27-230b serves a 196,608-token (192K) context.

How much does MiniMax M2 cost?

MiniMax's official API lists M2 at $0.30/M input and $1.20/M output. Morph serves the M2.7 checkpoint at $0.279/M input and $1.20/M output with a 192K context.

Why does MiniMax M2 break in multi-turn tool calling?

Because it uses interleaved thinking and MiniMax's docs require you to append the model's full response, including <think> content, back to history every turn. Frameworks that strip reasoning between tool calls break the chain and degrade multi-turn tool use. Preserve the thinking blocks verbatim.

Is MiniMax M2 good for coding?

Yes. It was designed for agentic coding and tool use rather than chat, and topped the open-weight agentic-coding tier cheaply at launch. Later checkpoints raised SWE-bench Verified further. For text-only planning-heavy coding, GLM 5.2 is a stronger default; M2's edge is cost per agent step.

Can I use MiniMax M2 in Claude Code or Cline?

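Yes, with one caveat on the endpoint format. Cline accepts an OpenAI-compatible endpoint directly: point it at one serving M2 (for example https://api.morphllm.com/v1 with model morph-minimax27-230b) and select the model. Claude Code speaks the Anthropic Messages format, so it needs an Anthropic-compatible endpoint or a translating proxy in front of the model. Either way, the one thing to get right is preserving the model's <think> content between tool calls; disable any "strip reasoning" setting so interleaved thinking works.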

Private deployments

The fastest endpoints are private deployments

Morph's top speeds come from dedicated deployments, not shared public endpoints: speculators trained on your traffic, caching tuned to your workload, and volume discounts over public per-token rates. Over 100 billion tokens per day run this way.


Use WarpGrep with MiniMax M2 for Better Code Search Context

WarpGrep is an agentic code search tool that works as an MCP server. Connect it to any M2-powered agent so its context fills with the right code, not noise. Free for 100k requests, then $1 per 1M.

