GLM-4.6 is Z.ai's flagship language model: a 357B-parameter Mixture-of-Experts model with a 200K-token context window, released September 30, 2025 under the MIT license. Its smaller sibling, GLM-4.5-Air, is a 106B-A12B model with a 128K context. Both are MIT-licensed and self-hostable, and both expose an OpenAI-compatible API. Morph serves a GLM-family model on its own fleet.
What GLM-4.6 Is
GLM-4.6 is the flagship model from Z.ai, the model arm of Zhipu AI. GLM stands for General Language Model, the family name Zhipu has used across releases. GLM-4.6 was released September 30, 2025, per Z.ai's release notes, and its weights are published on Hugging Face under the zai-org organization.
The model uses a Mixture-of-Experts (MoE) architecture with 357B total parameters, per the Z.ai model card. In an MoE model, only a subset of the parameters activates for any given token, which keeps inference cheaper than a dense model of the same total size. Z.ai has not published the active-parameters-per-token figure for GLM-4.6, so this page states only the 357B total and does not guess at an active count.
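To make "only a subset of the parameters activates" concrete, here is a minimal sketch of generic top-k expert routing for a single token. It illustrates the MoE idea in general and is not GLM-4.6's actual router; the expert count, the gate scores, and `k` are all made-up values for illustration.

```typescript
// Generic top-k expert routing for one token.
// ASSUMPTION: this is a textbook-style illustration, not Z.ai's implementation.

/** Numerically stable softmax over a list of gate scores. */
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

/** Pick the k highest-scoring experts and renormalize their weights to sum to 1. */
function routeToken(
  gateScores: number[],
  k: number,
): { expert: number; weight: number }[] {
  const probs = softmax(gateScores);
  const chosen = probs
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const mass = chosen.reduce((acc, c) => acc + c.p, 0);
  return chosen.map((c) => ({ expert: c.expert, weight: c.p / mass }));
}

// Hypothetical gate scores for 8 experts; only 2 of the 8 run for this token.
const routed = routeToken([0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.3], 2);
console.log(routed);
```

The other six experts do no work for this token, which is where the inference saving over a dense model of the same total size comes from.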
GLM-4.6 targets coding, agentic workflows, and long-context reasoning. The most visible change from GLM-4.5 to GLM-4.6 is the context window, which widened from 128K to 200K tokens. That matters for multi-file code edits and long agent traces that overflow a smaller window.
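As a rough illustration of when the wider window matters, the sketch below estimates whether a set of inputs fits each model's context. It rests on several assumptions: about 4 characters per token is a crude heuristic and not GLM's tokenizer, "K" is taken as 1,000 tokens, and the window is treated as shared between input and output. The function names are hypothetical.

```typescript
// Rough check of whether a prompt fits a model's context window.
// ASSUMPTION: ~4 characters per token is a heuristic, not the real tokenizer.
const CHARS_PER_TOKEN = 4;

// Context sizes from this page, taking "K" as 1,000 tokens (an assumption).
const CONTEXT_TOKENS = {
  "glm-4.6": 200_000,
  "glm-4.5-air": 128_000,
} as const;

type GlmModel = keyof typeof CONTEXT_TOKENS;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

/** True if the inputs plus a reserved output budget fit in the model's window. */
function fitsContext(
  model: GlmModel,
  inputs: string[],
  reservedOutputTokens: number,
): boolean {
  const inputTokens = inputs.reduce((sum, t) => sum + estimateTokens(t), 0);
  return inputTokens + reservedOutputTokens <= CONTEXT_TOKENS[model];
}

// Hypothetical multi-file edit: 30 files of 20,000 characters each (~150K tokens).
const files = Array.from({ length: 30 }, () => "x".repeat(20_000));
console.log(fitsContext("glm-4.5-air", files, 8_000)); // false: ~158K > 128K
console.log(fitsContext("glm-4.6", files, 8_000)); // true: ~158K <= 200K
```

For real budgeting, count tokens with the model's own tokenizer instead of a character heuristic.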
GLM-4.6 Specs at a Glance
Every number below comes from Z.ai's model card on Hugging Face or its documentation. Where Z.ai has not published a value (such as GLM-4.6 active parameters), the row is omitted rather than estimated.
| Attribute | Value | Source |
|---|---|---|
| Total parameters | 357B (MoE) | Z.ai model card |
| Context window | 200K tokens | Z.ai docs |
| Max output length | 128K tokens | Z.ai docs |
| License | MIT | Z.ai model card |
| Release date | September 30, 2025 | Z.ai release notes |
| Maker | Z.ai / Zhipu AI | Z.ai |
A note on the parameter count
The 357B figure is the total parameter count from the GLM-4.6 model card. An active-parameters-per-token figure for GLM-4.6 was not confirmed from the sources used for this page, so none is stated. By contrast, GLM-4.5-Air publishes both: 106B total and 12B active (106B-A12B).
GLM-4.6 vs GLM-4.5-Air
GLM-4.5-Air is the lighter model in the GLM-4.5 series, released July 28, 2025. It is a hybrid-reasoning MoE model with 106B total parameters and 12B active per token, a 128K context window, and the same MIT license as the flagship. Z.ai shipped FP8 and base/reasoning variants of Air.
The choice between them is a cost-versus-capability tradeoff. GLM-4.6 has roughly 3.4x the total parameters and a wider 200K context, so it handles harder reasoning and longer inputs. On the Z.ai API, GLM-4.5-Air costs a third as much for input and half as much for output, and it serves faster, which suits high-volume or latency-sensitive work where the flagship is overkill.
| Attribute | GLM-4.6 | GLM-4.5-Air | Best use |
|---|---|---|---|
| Total parameters | 357B (MoE) | 106B (MoE) | Flagship for hard tasks |
| Active parameters | Not published | 12B | Air cheaper to serve |
| Context window | 200K | 128K | Flagship for long inputs |
| License | MIT | MIT | Both self-hostable |
| Input price (Z.ai API) | $0.60/1M | $0.20/1M | Air for high volume |
| Output price (Z.ai API) | $2.20/1M | $1.10/1M | Air for high volume |
A common pattern is to use both: GLM-4.5-Air for routine turns (boilerplate, simple edits, classification) and GLM-4.6 for the harder reasoning turns. That is the same difficulty-based split an LLM router automates across a model tier.
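A minimal sketch of that split, assuming a hand-written heuristic in place of a real difficulty classifier. The `Turn` shape, the keyword list, and the thresholds are all hypothetical; only the two model names come from this page.

```typescript
// Hypothetical per-turn description; not part of any Z.ai or Morph API.
type Turn = { prompt: string; filesTouched: number };

// ASSUMPTION: illustrative keywords and thresholds, not a tuned classifier.
const HARD_KEYWORDS = ["refactor", "debug", "architecture", "race condition"];

/** Send routine turns to the lighter model and harder turns to the flagship. */
function pickModel(turn: Turn): "glm-4.5-air" | "glm-4.6" {
  const text = turn.prompt.toLowerCase();
  const looksHard =
    turn.filesTouched > 1 ||
    turn.prompt.length > 2_000 ||
    HARD_KEYWORDS.some((kw) => text.includes(kw));
  return looksHard ? "glm-4.6" : "glm-4.5-air";
}

console.log(pickModel({ prompt: "Add a docstring to this function.", filesTouched: 1 }));
console.log(pickModel({ prompt: "Debug the race condition in the job queue.", filesTouched: 3 }));
```

A keyword heuristic like this misclassifies easily; it only shows where the routing decision sits in the request path.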
Verified Benchmarks
Z.ai reports benchmark scores per model on its model cards. For GLM-4.5-Air, it reports a GPQA Diamond score of 71.72 and an MMLU-Pro score of 81.4. These are the GLM-4.5-Air figures verified for this page.
| Benchmark | GLM-4.5-Air | Measures |
|---|---|---|
| GPQA Diamond | 71.72 | Graduate-level science reasoning |
| MMLU-Pro | 81.4 | Broad multi-domain knowledge |
Why GLM-4.6 coding scores are not tabled here
This page states only benchmark numbers that were verified for it. A directly comparable GLM-4.6 SWE-bench Verified or LiveCodeBench figure was not among them, so it is omitted rather than estimated. For an open-source coding-model comparison built on verified scores across models, see Best Open-Source Coding Model 2026.
License and Open Weights
GLM-4.6 is released under the MIT license, per the Z.ai model card. The MIT license is one of the most permissive open-source licenses: it allows commercial use, modification, fine-tuning, redistribution, and self-hosting, with no per-use royalty owed to Z.ai. GLM-4.5-Air carries the same MIT license and is explicitly usable commercially.
This puts GLM in the same open-weights bracket as DeepSeek (MIT) and Qwen (Apache 2.0), and apart from closed API-only frontier models like Claude. If you need to run the model inside your own infrastructure for data-residency, latency, or cost-control reasons, an MIT license removes the legal blocker. You still need the GPUs to serve a 357B MoE model, which is the practical constraint.
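To put a rough number on the GPU constraint, the sketch below estimates the memory needed for the weights alone. It assumes 2 bytes per parameter for BF16 and 1 byte for FP8, and a hypothetical 80 GB accelerator. It ignores KV cache, activations, and runtime overhead, so real deployments need more than this lower bound.

```typescript
// Back-of-the-envelope weight memory for a 357B-parameter model.
// ASSUMPTION: weights only; KV cache, activations, and overhead are ignored.
const TOTAL_PARAMS = 357e9;

/** Weight size in gigabytes (1 GB = 1e9 bytes) at a given precision. */
function weightGigabytes(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

/** Minimum GPU count needed just to hold the weights. */
function minGpusForWeights(
  params: number,
  bytesPerParam: number,
  gpuMemoryGb: number,
): number {
  return Math.ceil(weightGigabytes(params, bytesPerParam) / gpuMemoryGb);
}

const GPU_MEMORY_GB = 80; // hypothetical accelerator size

console.log(weightGigabytes(TOTAL_PARAMS, 2)); // BF16: 714 GB
console.log(weightGigabytes(TOTAL_PARAMS, 1)); // FP8: 357 GB
console.log(minGpusForWeights(TOTAL_PARAMS, 2, GPU_MEMORY_GB)); // 9
console.log(minGpusForWeights(TOTAL_PARAMS, 1, GPU_MEMORY_GB)); // 5
```

Note that an MoE model still has to keep all experts resident in memory; the sparse activation reduces compute per token, not the weight footprint.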
API Pricing
Z.ai prices the GLM models on its hosted API. The numbers below are from Z.ai's pricing page. Self-hosting the open weights changes the cost equation entirely (you pay for GPUs and operations instead of per-token), so these prices apply to the Z.ai-hosted API specifically.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| GLM-4.6 | $0.60 | $2.20 | Cached input 85% off |
| GLM-4.5-Air | $0.20 | $1.10 | Lighter, faster |
GLM-4.6 cached input is billed at an 85% discount versus standard input, per Z.ai's pricing page. For agent workloads that resend a large stable system prompt and context across turns, that cache discount is a material part of the real bill, not a rounding detail.
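A worked example shows how much the cache discount moves the bill. The prices and the 85% discount are the figures from this page; the token counts describe a hypothetical agent turn, and the function name is illustrative.

```typescript
// GLM-4.6 prices on the Z.ai API, in USD per 1M tokens (from this page).
const PRICE_PER_MILLION = { input: 0.6, output: 2.2 };
const CACHED_INPUT_DISCOUNT = 0.85; // cached input is billed 85% off

/** Cost in USD of one request, splitting input into fresh and cached tokens. */
function requestCostUsd(
  freshInputTokens: number,
  cachedInputTokens: number,
  outputTokens: number,
): number {
  const cachedRate = PRICE_PER_MILLION.input * (1 - CACHED_INPUT_DISCOUNT);
  return (
    (freshInputTokens * PRICE_PER_MILLION.input +
      cachedInputTokens * cachedRate +
      outputTokens * PRICE_PER_MILLION.output) /
    1e6
  );
}

// Hypothetical agent turn: 50,000 tokens of stable context, 2,000 new, 1,000 out.
const withoutCache = requestCostUsd(52_000, 0, 1_000);
const withCache = requestCostUsd(2_000, 50_000, 1_000);
console.log(withoutCache.toFixed(4)); // "0.0334"
console.log(withCache.toFixed(4)); // "0.0079"
```

In this example the cached turn costs less than a quarter of the uncached one, and output tokens become a much larger share of the bill.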
Calling the GLM API
Both GLM-4.6 and GLM-4.5-Air speak the OpenAI chat-completions protocol. You initialize the OpenAI SDK with Z.ai's base URL and your Z.ai key, then pass glm-4.6 or glm-4.5-air as the model. Everything else (messages, streaming, temperature) matches a standard OpenAI-SDK call.
Call GLM-4.6 with the OpenAI SDK (TypeScript)
import OpenAI from "openai";

// Z.ai exposes an OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.ZAI_API_KEY,
  baseURL: "https://api.z.ai/api/paas/v4",
});

const response = await client.chat.completions.create({
  model: "glm-4.6", // or "glm-4.5-air" for the lighter model
  messages: [
    { role: "system", content: "You are a senior backend engineer." },
    { role: "user", content: "Refactor this handler to use async/await." },
  ],
});

console.log(response.choices[0].message.content);

Because the protocol is OpenAI-compatible, swapping GLM-4.6 in for another model is a base-URL and model-name change, not a rewrite. The same property lets a router sit in front of multiple OpenAI-compatible backends and pick a model per request. Verify the exact base URL against Z.ai's current API docs before deploying.
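One way to keep that swap to a configuration change is a small table of backends, sketched below. The `Backend` type, the table keys, and `clientOptions` are hypothetical names. The base URL and model names are the ones used on this page, and the base URL should be checked against Z.ai's docs as noted.

```typescript
// Hypothetical backend table: swapping models becomes a lookup, not a rewrite.
type Backend = { baseURL: string; model: string; apiKeyEnvVar: string };

const BACKENDS: Record<string, Backend> = {
  "zai-flagship": {
    baseURL: "https://api.z.ai/api/paas/v4", // verify against Z.ai's API docs
    model: "glm-4.6",
    apiKeyEnvVar: "ZAI_API_KEY",
  },
  "zai-air": {
    baseURL: "https://api.z.ai/api/paas/v4",
    model: "glm-4.5-air",
    apiKeyEnvVar: "ZAI_API_KEY",
  },
};

/** Build the options object an OpenAI-SDK client constructor expects. */
function clientOptions(
  backendName: string,
  env: Record<string, string | undefined>,
): { apiKey: string; baseURL: string } {
  const backend = BACKENDS[backendName];
  if (!backend) throw new Error(`Unknown backend: ${backendName}`);
  const apiKey = env[backend.apiKeyEnvVar];
  if (!apiKey) throw new Error(`Missing env var: ${backend.apiKeyEnvVar}`);
  return { apiKey, baseURL: backend.baseURL };
}

// Usage with the SDK (not executed here):
//   const client = new OpenAI(clientOptions("zai-air", process.env));
//   client.chat.completions.create({ model: BACKENDS["zai-air"].model, messages });
console.log(clientOptions("zai-air", { ZAI_API_KEY: "example-key" }));
```

Passing the environment in as an argument keeps the function easy to test without touching real credentials.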
Morph Runs a GLM-Family Model
Morph builds inference infrastructure for AI coding agents. On its production fleet, Morph serves glm51-754b, a member of the GLM family, alongside qwen35-397b (~120 tok/s), minimax27-230b (~140 tok/s), and others. This is a first-hand datapoint: the GLM line is capable enough that Morph runs a member of it directly for coding-agent workloads.
Morph exposes its fleet through an OpenAI-compatible router at api.morphllm.com/v1. The router classifies prompt difficulty in ~430ms and sends each request to the right model tier, which is the same difficulty-based split described above for pairing GLM-4.5-Air with GLM-4.6. Cheap models handle easy turns, the larger model handles hard ones, and the classification costs about $0.001 per request.
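The economics of that split can be sketched with simple arithmetic. Morph's per-model prices are not given on this page, so the example below uses the Z.ai list prices for GLM-4.6 and GLM-4.5-Air as stand-ins, plus the roughly $0.001 classification cost stated above. The share of easy turns and the token counts are hypothetical.

```typescript
// USD per 1M tokens. ASSUMPTION: Z.ai list prices stand in for the two tiers.
type Price = { inputPerM: number; outputPerM: number };
const LARGE: Price = { inputPerM: 0.6, outputPerM: 2.2 }; // GLM-4.6
const SMALL: Price = { inputPerM: 0.2, outputPerM: 1.1 }; // GLM-4.5-Air
const CLASSIFY_COST_USD = 0.001; // per-request classification cost

function turnCost(p: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1e6;
}

/** Fractional saving of routing versus sending every turn to the large model. */
function routedSavings(
  easyShare: number,
  inputTokens: number,
  outputTokens: number,
): number {
  const baseline = turnCost(LARGE, inputTokens, outputTokens);
  const routedCost =
    easyShare * turnCost(SMALL, inputTokens, outputTokens) +
    (1 - easyShare) * baseline +
    CLASSIFY_COST_USD;
  return 1 - routedCost / baseline;
}

// Hypothetical workload: 70% easy turns, 50,000 input and 2,000 output tokens each.
const bigTurnSavings = routedSavings(0.7, 50_000, 2_000);
// Same mix, but tiny turns: 1,000 input and 200 output tokens each.
const smallTurnSavings = routedSavings(0.7, 1_000, 200);
console.log(`${(bigTurnSavings * 100).toFixed(1)}% saved on large turns`);
console.log(`${(smallTurnSavings * 100).toFixed(1)}% saved on small turns`);
```

Under these assumptions the saving is substantial for long agent turns, while for very small turns the flat classification cost can outweigh it, so the benefit depends on the workload's turn size and difficulty mix.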
GLM-family on the fleet
Morph serves glm51-754b, a GLM-family model, on its own production GPU fleet for coding-agent workloads.
OpenAI-compatible router
One endpoint at api.morphllm.com classifies prompt difficulty in ~430ms and routes to the right model tier.
Cost-aware routing
40-70% API cost savings by reserving the largest model for hard turns and routing routine turns to cheaper models.
GLM-4.6 vs Claude and DeepSeek for Coding
The cleanest distinction is licensing. GLM-4.6 is MIT-licensed and self-hostable. Claude is closed and API-only: you cannot run the weights yourself at any price. If self-hosting or weight access is a hard requirement, GLM-4.6 and Claude are not in the same category, regardless of benchmark scores.
Against DeepSeek, the comparison is open-weights to open-weights. DeepSeek-V3.2-Exp is a 685B MoE model, MIT-licensed, with a 160K context, and it scores 67.8 on SWE-bench Verified and 74.1 on LiveCodeBench per its model card. GLM-4.6 brings a wider 200K context window and a smaller 357B total parameter count. A directly comparable GLM-4.6 SWE-bench figure was not verified for this page, so the honest comparison rests on the attributes that are verified, not on a claimed coding-score winner.
| Model | Total params | Context | License | SWE-bench Verified |
|---|---|---|---|---|
| GLM-4.6 | 357B (MoE) | 200K | MIT | Not verified here |
| GLM-4.5-Air | 106B (MoE) | 128K | MIT | Not verified here |
| DeepSeek-V3.2-Exp | 685B (MoE) | 160K | MIT | 67.8 |
For an open-source coding-model comparison built entirely on verified per-model scores, see Best Open-Source Coding Model 2026. For routing across these models through one API, see LLM Router.
Frequently Asked Questions
How many parameters does GLM-4.6 have?
GLM-4.6 has 357B total parameters and uses a Mixture-of-Experts architecture, per Z.ai's model card. Z.ai has not published an active-parameters-per-token figure for GLM-4.6, so only the 357B total is stated. The smaller GLM-4.5-Air is a 106B-A12B model: 106B total with 12B active per token.
What is the context window of GLM-4.6?
GLM-4.6 has a 200K-token context window, up from 128K in GLM-4.5. Its maximum output length is 128K tokens. GLM-4.5-Air has a 128K-token context window.
Is GLM-4.6 open source? What license is it under?
Yes. GLM-4.6 is released under the MIT license, per the Z.ai model card. MIT permits commercial use, fine-tuning, and self-hosting with no per-use fee to Z.ai. GLM-4.5-Air is also MIT-licensed and usable commercially.
What is the difference between GLM-4.6 and GLM-4.5-Air?
GLM-4.6 is the 357B flagship with a 200K context, released September 30, 2025. GLM-4.5-Air is a 106B-A12B model with a 128K context, part of the GLM-4.5 series from July 28, 2025. The flagship handles harder reasoning and longer inputs; Air is cheaper and faster. On the Z.ai API, GLM-4.6 is $0.60/$2.20 per 1M input/output tokens versus $0.20/$1.10 for Air.
How do I use the GLM API?
Both models are OpenAI-compatible. Point the OpenAI SDK at Z.ai's base URL, set the model to glm-4.6 or glm-4.5-air, and send a standard chat-completions request. The message shape is unchanged, so existing OpenAI-SDK integrations can swap GLM in without a rewrite.
How does GLM-4.6 compare to Claude and DeepSeek for coding?
GLM-4.6 is MIT-licensed and self-hostable; Claude is API-only and closed. DeepSeek-V3.2-Exp (MIT, 160K context, 67.8 on SWE-bench Verified) is also open-weights, so against it GLM-4.6's verified advantages are a larger 200K context and a smaller 357B total parameter count. A directly comparable GLM-4.6 SWE-bench figure was not verified for this page, so compare verified per-model scores rather than assuming a winner.
Who makes GLM-4.6?
GLM-4.6 is made by Z.ai, the model arm of Zhipu AI, a Chinese AI company. The GLM (General Language Model) series includes GLM-4.5, GLM-4.5-Air, and GLM-4.6. Weights are published on Hugging Face under the zai-org organization.
Run GLM-Family Models Through One OpenAI-Compatible API
Morph serves a GLM-family model on its production fleet and exposes an OpenAI-compatible router at api.morphllm.com. Classify prompt difficulty in ~430ms, route easy turns to cheap models and hard turns to the large one, and cut API costs 40-70%.
