MiniMax M2 is a 230B-total, 10B-active Mixture-of-Experts model from MiniMax, built for agentic coding. It scores 69.4 on SWE-bench Verified and 83 on LiveCodeBench, ships under a modified-MIT license with a ~196K-token context window, and is priced at $0.30/M input and $1.20/M output through an OpenAI-compatible API. Morph serves a MiniMax-class model on its own fleet at ~140 tok/s.
MiniMax M2 in One Paragraph
MiniMax M2 is an open-weight Mixture-of-Experts model built by MiniMax for agentic coding and tool use. It has 230 billion total parameters but activates only 10 billion per token, so it runs closer to a 10B model on cost while reasoning closer to a much larger dense model. MiniMax released it on October 27, 2025 under a modified-MIT license, per the MiniMax-M2 model card.
The model is tuned for multi-step agent loops: tool calls, terminal commands, and multi-file edits, rather than single-shot completion. Its strongest published scores are on agentic and coding benchmarks (SWE-bench Verified, LiveCodeBench, Terminal-Bench, tau2-bench), which is the workload the model is designed for.
Why 10B active out of 230B matters
In a Mixture-of-Experts model, only a subset of the weights (the routed experts) runs on each token. M2 uses 256 local experts with top-8 routing, so each token is routed to 8 of the 256 experts and about 10B of the 230B parameters activate per forward pass. The result: per-token compute and decode latency track the 10B active count, while model capacity tracks the larger total. This is what lets a 230B model serve at high throughput.
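To make the routing step concrete, here is a minimal sketch of top-k expert selection for a single token. The expert count and k come from the model card; the softmax over the selected logits is an assumed gating scheme chosen for illustration, and `routeToken` and the synthetic logits are hypothetical, not MiniMax's implementation.

```typescript
// Illustrative top-k routing for one token. Not MiniMax's actual router.
const NUM_EXPERTS = 256; // from the model card
const TOP_K = 8; // from the model card

interface RoutedExpert {
  index: number; // which expert was selected
  weight: number; // normalized gate weight for that expert
}

function routeToken(routerLogits: number[], topK: number = TOP_K): RoutedExpert[] {
  // Rank experts by router score and keep the k highest.
  const ranked = routerLogits
    .map((logit, index) => ({ index, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, topK);
  if (ranked.length === 0) return [];

  // Softmax over the selected logits only (an assumed gating scheme).
  const maxLogit = ranked[0].logit;
  const exps = ranked.map((e) => Math.exp(e.logit - maxLogit));
  const total = exps.reduce((sum, x) => sum + x, 0);
  return ranked.map((e, i) => ({ index: e.index, weight: exps[i] / total }));
}

// Deterministic synthetic logits so the example needs no random source.
const logits = Array.from({ length: NUM_EXPERTS }, (_, i) => Math.sin(i * 12.9898) * 4);
const selected = routeToken(logits);

// Share of experts that run for this token: 8 / 256.
const activeFraction = TOP_K / NUM_EXPERTS;
console.log(selected.length, activeFraction);
```

Only the selected experts' feed-forward weights are multiplied for the token; the other 248 stay idle for that step, which is where the compute saving comes from.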
Specs: Parameters and Architecture
The figures below come from the official MiniMax-M2 model card and the transformers MiniMaxM2Config.
| Attribute | Value |
|---|---|
| Total parameters | 230B (MoE) |
| Active parameters per token | 10B |
| Hidden layers | 62 |
| Attention heads | 48 query, 8 key/value |
| Experts | 256 local, top-8 routing |
| Vocabulary size | 200,064 |
| Context window | 196,608 tokens (~196K) |
| License | Modified-MIT |
| Released | October 27, 2025 |
Pairing 8 key/value heads with 48 query heads is grouped-query attention: each key/value head is shared by 6 query heads, which shrinks the KV cache to one sixth of its full multi-head size and lowers memory pressure during long-context decoding. Combined with the 10B active-parameter count, this is what keeps generation throughput high on a model with a 230B total weight set.
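The KV-cache saving can be estimated with simple arithmetic. The sketch below uses the layer and head counts from the table; the head dimension of 128 and 2-byte (bf16) cache entries are assumptions not stated in this article, so treat the absolute sizes as illustrative. The 6x ratio between the two layouts does not depend on those assumptions.

```typescript
interface AttentionShape {
  layers: number;
  kvHeads: number;
  headDim: number; // assumption: not stated in the article
  bytesPerValue: number; // assumption: 2 bytes for bf16/fp16 cache entries
}

function kvCacheBytes(shape: AttentionShape, tokens: number): number {
  // Factor of 2: one key vector and one value vector per KV head, per layer, per token.
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue * tokens;
}

const ASSUMED_HEAD_DIM = 128; // hypothetical value for illustration

const gqa: AttentionShape = { layers: 62, kvHeads: 8, headDim: ASSUMED_HEAD_DIM, bytesPerValue: 2 };
// Hypothetical baseline: the same model with one KV head per query head.
const mha: AttentionShape = { ...gqa, kvHeads: 48 };

const fullWindow = 196608; // max_position_embeddings
const gqaGiB = kvCacheBytes(gqa, fullWindow) / 2 ** 30;
const mhaGiB = kvCacheBytes(mha, fullWindow) / 2 ** 30;
const reduction = mhaGiB / gqaGiB; // 48 / 8 = 6

console.log(gqaGiB, mhaGiB, reduction);
```

Under these assumptions a single full-length sequence needs tens of GiB of cache with 8 KV heads and six times that with 48, which is why the head grouping matters most at long context.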
Context Window
MiniMax M2 has a context window of roughly 196K tokens. The official transformers config sets max_position_embeddings to 196,608. That is enough to hold a medium codebase, a long agent trajectory, and the tool outputs accumulated across a multi-step task in a single window.
For comparison, common open coding models sit at 128K (DeepSeek-R1, Llama-3.1) or 256K (Qwen3-Coder native). M2 lands between those, larger than a 128K window but below the 256K-and-up tier. For most agentic coding sessions the ~196K window is not the binding constraint; throughput and tool reliability usually are.
If your workload needs more than ~196K tokens in one call, see LLM context window for how context length, attention cost, and retrieval interact.
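A quick pre-flight check can tell you whether a prompt is likely to fit before you send it. This is a rough sketch: the 4-characters-per-token ratio is a common rule of thumb and an assumption here, and `estimateTokens` and `checkBudget` are hypothetical helpers. For real counts, use the model's own tokenizer.

```typescript
const CONTEXT_WINDOW = 196608; // max_position_embeddings from the config

// Assumed heuristic: about 4 characters per token. Code and non-English text can differ a lot.
function estimateTokens(text: string, charsPerToken: number = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

interface BudgetCheck {
  promptTokens: number; // estimated tokens across all prompt parts
  remaining: number; // tokens left after the prompt and the reserved output
  fits: boolean;
}

function checkBudget(parts: string[], reservedForOutput: number): BudgetCheck {
  const promptTokens = parts.reduce((sum, part) => sum + estimateTokens(part), 0);
  const remaining = CONTEXT_WINDOW - reservedForOutput - promptTokens;
  return { promptTokens, remaining, fits: remaining >= 0 };
}

// Example: a 400,000-character repo dump plus 200,000 characters of tool output,
// with 8,192 tokens held back for the model's reply.
const repoDump = "x".repeat(400000);
const toolOutput = "y".repeat(200000);
const budget = checkBudget([repoDump, toolOutput], 8192);
console.log(budget);
```

Reserving output tokens up front matters in agent loops, where the reply shares the same window as the accumulated trajectory.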
Coding and Agentic Benchmarks
The scores below are from the MiniMax-M2 model card. They are weighted toward agentic and coding evaluations rather than single-shot completion, matching the model's design target.
| Benchmark | Score | What it measures |
|---|---|---|
| SWE-bench Verified | 69.4 | Real GitHub issue resolution |
| LiveCodeBench | 83 | Competitive / live coding |
| Terminal-Bench | 46.3 | Terminal and shell task completion |
| tau2-bench | 77.2 | Tool-use agent reliability |
| Multi-SWE-Bench | 36.2 | Multi-language SWE tasks |
| GAIA (text only) | 75.7 | General assistant reasoning |
| BrowseComp | 44 | Web-browsing agent tasks |
| AIME25 | 78 | Competition math |
| GPQA-Diamond | 78 | Graduate-level science QA |
| Artificial Analysis Intelligence | 61 | Composite intelligence index |
The SWE-bench Verified score of 69.4 is the number to anchor on for coding agents. It measures whether the model can resolve a real GitHub issue end to end (read the repo, locate the fix, edit files, pass the tests). At 69.4, M2 sits ahead of DeepSeek-V3.2-Exp (67.8) and within two points of Kimi K2 Thinking (71.3), one of the strongest open coding models of its generation.
The tau2-bench score of 77.2 is the agentic-reliability signal. Tool-use benchmarks test whether a model calls the right tool with the right arguments across a long trajectory, which is where many otherwise-capable models fail in production agent loops. A high tau2-bench score is a better predictor of agent success than a single coding-completion number.
MiniMax M2 vs M1
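M2 is the successor to MiniMax M1. M1 was MiniMax's earlier model; M2 re-targets the lineup at agentic coding and multi-step tool calling, with a 230B-total / 10B-active MoE design and a ~196K context window. Because verified M1 figures are not available for a side-by-side, the table below places M2 against other open models of its generation instead.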
| Model | Total / Active | Context | SWE-bench Verified | License |
|---|---|---|---|---|
| MiniMax M2 | 230B / 10B | ~196K | 69.4 | Modified-MIT |
| DeepSeek-V3.2-Exp | 685B / - | 160K | 67.8 | MIT |
| Kimi K2 Thinking | 1T / 32B | 256K | 71.3 | Modified-MIT |
| Qwen3-Coder 480B | 480B / 35B | 256K | SOTA open (per Qwen) | Apache 2.0 |
| GLM-4.6 | 357B / - | 200K | - | MIT |
The standout property of M2 in this group is the active-parameter count. At 10B active it activates far fewer parameters per token than Kimi K2 Thinking (32B) or Qwen3-Coder 480B (35B), which is why it can serve at higher throughput for a given GPU budget while staying within a few SWE-bench points of those larger models.
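The per-token compute gap can be approximated from the active-parameter counts in the table. The sketch below assumes forward-pass compute scales linearly with active parameters, a standard first-order approximation that ignores attention cost and memory bandwidth; `relativeCompute` is a hypothetical helper.

```typescript
// Active parameters per token, in billions, taken from the comparison table.
const activeParamsB: Record<string, number> = {
  "MiniMax M2": 10,
  "Kimi K2 Thinking": 32,
  "Qwen3-Coder 480B": 35,
};

// Assumption: per-token forward compute is proportional to active parameters.
function relativeCompute(model: string, baseline: string): number {
  return activeParamsB[model] / activeParamsB[baseline];
}

const kimiVsM2 = relativeCompute("Kimi K2 Thinking", "MiniMax M2"); // 32 / 10
const qwenVsM2 = relativeCompute("Qwen3-Coder 480B", "MiniMax M2"); // 35 / 10
console.log(kimiVsM2, qwenVsM2);
```

Under that assumption the two larger models spend roughly 3.2x and 3.5x the per-token compute of M2, which is the basis for the throughput advantage described above.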
A note on M1, M2.5, and M2.7
This page covers MiniMax M2, the model with published specs on the official model card. Numbers for M1 and any M2.5 / M2.7 point releases could not be confirmed against an official source, so they are not stated here. Treat any specific M1 or M2.x figure you see elsewhere as unverified until it appears on the official MiniMax model card or release notes.
API Pricing
MiniMax prices M2 through its official API at $0.30 per million input tokens and $1.20 per million output tokens, per the MiniMax-M2 launch announcement. The 4x output-to-input ratio is typical for reasoning-tuned models, where output tokens carry the chain-of-thought and tool-call payload.
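At these rates, M2 is materially cheaper than frontier closed models for the same token volume. Output tokens cost 4x as much as input tokens, so for a coding agent that generates long reasoning traces and tool-call payloads across many turns, the $1.20/M output rate is the first number to model against. Input volume still adds up when the accumulated context is re-sent on every turn, so budget for both sides. For cost modeling across model tiers, see the LLM cost calculator.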
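The pricing arithmetic is easy to encode. This sketch uses the two published rates; the token volumes in the example are made up for illustration, and `requestCostUsd` and `taskCostUsd` are hypothetical helpers rather than part of any SDK.

```typescript
const INPUT_USD_PER_MILLION = 0.3; // published input rate
const OUTPUT_USD_PER_MILLION = 1.2; // published output rate

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

function requestCostUsd(usage: Usage): number {
  const inputCost = (usage.inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (usage.outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Sum the cost of every turn in a multi-turn agent task.
function taskCostUsd(turns: Usage[]): number {
  return turns.reduce((sum, turn) => sum + requestCostUsd(turn), 0);
}

// Illustrative volumes: 1M input tokens and 250K output tokens cost the same,
// because output is billed at 4x the input rate.
const inputOnly = requestCostUsd({ inputTokens: 1_000_000, outputTokens: 0 });
const outputOnly = requestCostUsd({ inputTokens: 0, outputTokens: 250_000 });
const combined = taskCostUsd([
  { inputTokens: 1_000_000, outputTokens: 0 },
  { inputTokens: 0, outputTokens: 250_000 },
]);
console.log(inputOnly, outputOnly, combined);
```

In a real agent you would feed this the `usage` counts returned with each response instead of fixed numbers.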
How to Call the MiniMax API
MiniMax M2 is served through an OpenAI-compatible chat-completions API. An existing OpenAI SDK client calls it by changing the base URL and the model name. No new SDK and no request-shape changes are required.
Calling MiniMax M2 with the OpenAI SDK
```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at MiniMax's OpenAI-compatible endpoint
const client = new OpenAI({
  apiKey: process.env.MINIMAX_API_KEY,
  baseURL: "https://api.minimax.io/v1", // MiniMax OpenAI-compatible base URL
});

const response = await client.chat.completions.create({
  model: "MiniMax-M2",
  messages: [
    { role: "system", content: "You are a coding agent." },
    { role: "user", content: "Fix the failing test in src/auth/session.ts" },
  ],
});

console.log(response.choices[0].message.content);
// Pricing: $0.30 / M input tokens, $1.20 / M output tokens
```

The same request shape works through Morph's router. Point the client at https://api.morphllm.com/v1 and you can reach a MiniMax-class model alongside frontier models through one OpenAI-compatible endpoint, with one API key. See LLM API for the request-format details.
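Since agent workloads revolve around tool calls, it can help to see the raw request an OpenAI-compatible client would send. The sketch below only builds the request and does not send it. It assumes the endpoint follows the standard `/chat/completions` path and accepts the OpenAI-style `tools` field, which is inferred from the compatibility claim rather than confirmed here. The `run_tests` tool, the placeholder key, and `buildChatRequest` are hypothetical.

```typescript
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema describing the arguments
  };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

function buildChatRequest(
  baseURL: string,
  apiKey: string,
  messages: ChatMessage[],
  tools: ToolDefinition[],
) {
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "MiniMax-M2", messages, tools }),
    },
  };
}

// Hypothetical tool the agent may call; your executor decides what it really does.
const runTests: ToolDefinition = {
  type: "function",
  function: {
    name: "run_tests",
    description: "Run the test suite for a given path and return the output.",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
};

const chatRequest = buildChatRequest(
  "https://api.minimax.io/v1",
  "sk-placeholder", // placeholder, not a real key
  [{ role: "user", content: "Fix the failing test in src/auth/session.ts" }],
  [runTests],
);
console.log(chatRequest.url);
```

To send it, pass the two parts to `fetch(chatRequest.url, chatRequest.init)`. In an OpenAI-style response, a tool call appears in `choices[0].message.tool_calls`, which your loop executes and returns as a `tool` message.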
Serving MiniMax-Class Models
Morph runs a MiniMax-class model (minimax27-230b) on its own production GPU fleet and measured roughly 140 tokens per second of generation throughput. That number is a first-hand serving measurement, not a vendor claim, and it reflects the practical effect of the 10B active-parameter design: a 230B-total model that decodes at the speed of a much smaller one.
Throughput at this level is what makes a model usable inside an agent loop. A coding agent that issues dozens of tool-call turns per task needs each turn to return quickly; ~140 tok/s keeps a multi-turn trajectory responsive rather than stalling between steps.
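To see what that throughput means for a whole task, here is a back-of-envelope estimate of decode time across a trajectory. The 140 tok/s figure is the measurement quoted above; the turn count, tokens per turn, and the slower comparison rate are illustrative assumptions, and the estimate deliberately ignores prefill, network latency, and tool execution time.

```typescript
const MEASURED_TOK_PER_SEC = 140; // serving measurement quoted in this article

// Decode-only time for a list of turns, each given as its output token count.
function trajectoryDecodeSeconds(
  outputTokensPerTurn: number[],
  tokPerSec: number = MEASURED_TOK_PER_SEC,
): number {
  const totalTokens = outputTokensPerTurn.reduce((sum, t) => sum + t, 0);
  return totalTokens / tokPerSec;
}

// Hypothetical task: 40 tool-call turns of 350 generated tokens each.
const turns: number[] = Array.from({ length: 40 }, () => 350);

const atMeasuredRate = trajectoryDecodeSeconds(turns); // 14,000 tokens at 140 tok/s
const atSlowerRate = trajectoryDecodeSeconds(turns, 35); // hypothetical slower deployment
console.log(atMeasuredRate, atSlowerRate);
```

For this made-up task the decode time is 100 seconds at 140 tok/s versus 400 seconds at 35 tok/s, so throughput differences compound quickly once an agent runs dozens of turns.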
Morph exposes minimax27-230b behind the same OpenAI-compatible model router that fronts its other served models (glm51-754b, qwen35-397b at ~120 tok/s, qwen36-27b, dsv4flash, and the WarpGrep search model). One endpoint, many models, with difficulty-based routing on top. For the broader serving and inference picture, see AI inference.
Tradeoffs
M2 is strong on agentic coding, but it is not the right pick for every workload. The honest limitations:
Narrower than frontier general models
M2 is tuned for coding and tool use. Its composite Artificial Analysis intelligence score is 61, below the top general-purpose frontier models. For broad non-coding reasoning, a general model may serve better.
Context below the 256K tier
At ~196K tokens, M2's window is larger than 128K models but smaller than the 256K-and-up open models (Qwen3-Coder, Kimi K2 Thinking). For very large single-call contexts, those are a better fit.
Self-hosting needs real GPUs
230B total parameters means the full weight set is large even though only 10B activate per token. You still need enough GPU memory to hold all experts. Use the hosted API if you don't want to manage that.
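A rough sizing of the weight set shows why. The sketch below multiplies the 230B parameter count by bytes per parameter at a few common precisions. It covers weights only and ignores KV cache, activations, and framework overhead, and the 64 GB of usable memory per GPU is a hypothetical figure, so treat the GPU counts as lower bounds.

```typescript
const TOTAL_PARAMS = 230e9; // total parameters from the model card

// Weight memory in GB (10^9 bytes) at a given precision.
function weightGB(bytesPerParam: number): number {
  return (TOTAL_PARAMS * bytesPerParam) / 1e9;
}

// Minimum GPUs needed to hold the weights alone, given usable memory per GPU.
function minGpusForWeights(weightsGb: number, usableGbPerGpu: number): number {
  return Math.ceil(weightsGb / usableGbPerGpu);
}

const bf16GB = weightGB(2); // 16-bit weights
const fp8GB = weightGB(1); // 8-bit weights
const int4GB = weightGB(0.5); // 4-bit weights

// Hypothetical: 80 GB cards with about 64 GB left for weights after other allocations.
const USABLE_GB_PER_GPU = 64;
const gpusBf16 = minGpusForWeights(bf16GB, USABLE_GB_PER_GPU);
const gpusFp8 = minGpusForWeights(fp8GB, USABLE_GB_PER_GPU);
console.log(bf16GB, fp8GB, int4GB, gpusBf16, gpusFp8);
```

Every expert has to be resident because the router can pick any of them for the next token, so the 10B active count reduces compute per token but not the memory needed to host the model.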
For most teams building coding agents, the practical decision is whether to self-host or call M2 through a hosted, OpenAI-compatible endpoint. If you want to compare M2 against other open coding models before committing, see best open-source coding model 2026.
Frequently Asked Questions
How many parameters is MiniMax M2?
MiniMax M2 is a Mixture-of-Experts model with 230 billion total parameters and 10 billion active parameters per token. It uses 256 local experts with top-8 routing across 62 hidden layers, so each forward pass activates only a small fraction of the full weight set.
What is the difference between MiniMax M2 and M1?
M2 is the successor to MiniMax M1. M1 was the earlier MiniMax model; M2 (released October 27, 2025) is a 230B-total / 10B-active MoE re-targeted at agentic coding and multi-step tool use, with a ~196K context window and benchmarks like 69.4 on SWE-bench Verified and 83 on LiveCodeBench.
What is the MiniMax M2 context window?
MiniMax M2 has a context window of about 196K tokens. The official transformers MiniMaxM2Config sets max_position_embeddings to 196,608 tokens.
How do I use the MiniMax API?
MiniMax M2 is served through an OpenAI-compatible API. Point an OpenAI SDK client at the MiniMax base URL, set the model to MiniMax-M2, and call chat completions as usual. Pricing is $0.30 per million input tokens and $1.20 per million output tokens.
How good is MiniMax M2 at coding?
MiniMax M2 scores 69.4 on SWE-bench Verified, 83 on LiveCodeBench, 46.3 on Terminal-Bench, and 77.2 on tau2-bench for tool use. It also scores 78 on AIME25 and 78 on GPQA-Diamond, with an Artificial Analysis composite intelligence score of 61.
Is MiniMax M2 open source?
MiniMax M2 is released under a modified-MIT license with open weights published on Hugging Face. The modified-MIT terms allow most commercial use; check the license text for any usage conditions before deploying.
What does MiniMax M2 cost to run?
Through MiniMax's official API, M2 is $0.30 per million input tokens and $1.20 per million output tokens. Because only 10B of the 230B parameters activate per token, self-hosting throughput is high relative to total parameter count. Morph measured roughly 140 tokens per second serving a MiniMax-class model on its own fleet.
Reach MiniMax-Class Models Through One API
Morph serves a MiniMax-class model (minimax27-230b) on its own fleet at ~140 tok/s and exposes it, alongside many other models, behind one OpenAI-compatible router at api.morphllm.com. One key, difficulty-based routing, frontier and open models through a single endpoint.
