GLM-5 vs Qwen 3.5: China's Two AI Giants Go Head-to-Head (2026)

GLM-5 (744B, Zhipu AI) vs Qwen 3.5 (397B, Alibaba) compared on benchmarks, coding, pricing, and open-source licensing. Two Chinese labs, two different bets on the future of open-weight AI.

March 2, 2026 · 1 min read

February 2026 was a statement month for Chinese AI. Zhipu AI released GLM-5 on February 11. Alibaba followed with Qwen 3.5 on February 16. Both are Mixture-of-Experts models. Both claim parity with GPT-5.2 and Gemini 3 Pro on key benchmarks. Both are open-weight under permissive licenses.

But they represent fundamentally different bets. GLM-5 is an agentic engineering model, trained entirely on domestic Chinese hardware. Qwen 3.5 is a natively multimodal model with a 1M context window and a full family of sizes for every deployment scenario. The numbers diverge in interesting ways.

TL;DR

  • GLM-5 wins on: SWE-bench Verified (77.8%), Chatbot Arena Elo (#1 at 1451), Humanity's Last Exam (tool-augmented: 50.4), agentic coding workflows, hallucination reduction
  • Qwen 3.5 wins on: LiveCodeBench v6 (83.6), MMLU (88.5), GPQA Diamond (88.4), multimodal capabilities, 1M context window, throughput (19x faster than predecessor), price (as low as $0.10/M tokens with Flash)
  • Choose GLM-5 if you need the strongest open-weight model for autonomous software engineering and agent workflows
  • Choose Qwen 3.5 if you need multimodal reasoning, massive context, a full family of model sizes, or cost-efficient production deployment

Specs at a Glance

| Spec | GLM-5 | Qwen 3.5 (397B) |
| --- | --- | --- |
| Developer | Zhipu AI | Alibaba / Qwen Team |
| Release Date | Feb 11, 2026 | Feb 16, 2026 |
| Total Parameters | 744B | 397B |
| Active Parameters | 40-44B | 17B |
| Architecture | MoE (256 experts, 8 active) | MoE (256 experts, 8 routed + 1 shared) |
| Context Window | 200K | 262K (extendable to 1M) |
| Modalities | Text only | Text + Vision (native multimodal) |
| Training Hardware | 100K Huawei Ascend chips | Not disclosed (likely NVIDIA) |
| License | MIT | Apache 2.0 |
| Weights Available | Yes (Hugging Face) | Yes (Hugging Face) |

Benchmark Breakdown

Both models claim frontier-level performance. Here are the numbers side by side.

| Benchmark | GLM-5 | Qwen 3.5 |
| --- | --- | --- |
| MMLU | 85.0% | 88.5% |
| MMLU-Pro | 70.4% | 87.8% |
| GPQA Diamond | Not reported | 88.4% |
| AIME 2026 I | 92.7 | 91.3 |
| Humanity's Last Exam | 30.5 (tools: 50.4) | Not reported |
| MathVista | Not reported | 90.3 |

Qwen 3.5 leads on knowledge benchmarks. On MMLU-Pro the gap is 17 points, which is substantial. GLM-5 takes the edge on competition math (AIME 2026) and is the only one of the two with a reported Humanity's Last Exam score, where its tool-augmented 50.4 beats GPT-5.2's 45.5.

| Benchmark | GLM-5 | Qwen 3.5 |
| --- | --- | --- |
| SWE-bench Verified | 77.8% | 76.4% |
| LiveCodeBench v6 | 52.0 | 83.6 |
| HumanEval | 90% | ~85% |
| Terminal-Bench 2 | Not reported | 52.5 |
| BFCL v4 (Tool Use) | Not reported | 72.9 |

The coding story is split. GLM-5 is the #1 open-weight model on SWE-bench Verified, the benchmark that best approximates real software engineering work. But its LiveCodeBench score (52.0) is dramatically lower than Qwen 3.5's 83.6. LiveCodeBench tests isolated code generation problems, while SWE-bench tests end-to-end repo-level bug fixes. GLM-5 was explicitly optimized for agentic engineering: planning, implementing, debugging, and iterating. Qwen 3.5 was optimized for broad coding fluency across 358 languages.

Chatbot Arena

GLM-5 holds the #1 Chatbot Arena rating at 1451 Elo, making it the top-ranked model by human preference. The top three are all Chinese open-weight models: GLM-5 (1451), Kimi K2.5 (1449), and GLM-4.7 (1445). Qwen 3.5's Arena Elo has not been widely reported yet, though early community rankings place it around 1401.

Coding Performance Deep Dive

GLM-5: Built for Agentic Engineering

Zhipu AI trained GLM-5 specifically for autonomous coding agents. Their paper title says it: "From Vibe Coding to Agentic Engineering." The model excels at multi-step software tasks where it needs to read a codebase, form a plan, write code, run tests, and iterate on failures. That's exactly what SWE-bench measures, and 77.8% is the highest score among all open-weight models.

The tradeoff: GLM-5's LiveCodeBench score cratered to 52.0, down from GLM-4.7's 84.9. This is not a regression in raw coding ability. It reflects a deliberate optimization toward multi-step agent workflows at the expense of one-shot code generation. If you are building coding agents, this tradeoff makes sense. If you need a model that quickly solves isolated coding problems, it does not.

Qwen 3.5: Broad Coding Fluency

Qwen 3.5 takes the opposite approach. LiveCodeBench v6 at 83.6 means it generates correct code on the first pass more reliably than almost any model. It supports 358 programming languages, compared to GLM-5's more limited language coverage. The 1M context window lets it ingest entire codebases in one shot without chunking strategies.

SWE-bench at 76.4% is still strong. The gap with GLM-5 is only 1.4 percentage points. For most developers, this difference will not be noticeable. Where Qwen 3.5 pulls ahead is throughput: at 256K context, it decodes 19x faster than Qwen3-Max, making it practical for interactive coding sessions where latency matters.

| Use Case | Better Model | Why |
| --- | --- | --- |
| Autonomous coding agents | GLM-5 | Highest SWE-bench, agent-optimized RL training |
| One-shot code generation | Qwen 3.5 | 83.6 LiveCodeBench, broad language support |
| Large codebase analysis | Qwen 3.5 | 1M context window vs 200K |
| Tool calling / function use | Qwen 3.5 | 72.9 BFCL v4, built-in adaptive tool use |
| Multi-step debugging | GLM-5 | Agent workflow optimization, Slime RL |
| Polyglot development | Qwen 3.5 | 358 languages vs limited coverage |

Architecture: Two Approaches to MoE

Both models use Mixture-of-Experts architectures, but the design philosophies diverge significantly.

GLM-5: MLA + Sparse Attention

GLM-5 combines three key innovations. Multi-head Latent Attention (MLA), borrowed from DeepSeek-V2, compresses key-value pairs into a latent space, cutting memory overhead by 33% during inference. DeepSeek Sparse Attention (DSA) dynamically selects which tokens to attend to across the 200K context window. Multi-token Prediction (MTP) uses three additional prediction layers, achieving an average acceptance length of 2.76 tokens per step for faster decoding.
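The idea behind MLA's memory saving can be shown with a toy sketch: cache one small latent vector per token instead of full per-head keys and values, and re-expand at attention time. This is an illustrative simplification, not Zhipu's implementation; all dimensions here are made up, and the actual savings depend on the chosen latent size.

```python
import numpy as np

# Toy Multi-head Latent Attention (MLA) cache sketch.
# Dimensions are illustrative, not GLM-5's real sizes.
d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02          # compress hidden state
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand latent to values

def cache_token(h):
    """Standard attention caches full K and V; MLA caches only this latent."""
    return h @ W_down                                # shape (d_latent,)

def expand(latent):
    """At attention time, reconstruct per-head K and V from the latent."""
    k = (latent @ W_up_k).reshape(n_heads, d_head)
    v = (latent @ W_up_v).reshape(n_heads, d_head)
    return k, v

h = rng.standard_normal(d_model)
latent = cache_token(h)
k, v = expand(latent)

# Cache cost per token, in floats: full KV vs MLA latent.
standard_floats = 2 * n_heads * d_head   # K and V
mla_floats = d_latent                    # latent only
print(standard_floats, mla_floats)
```

The tradeoff is extra matrix multiplies at attention time in exchange for a much smaller cache, which is what makes long contexts cheaper to serve.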

With 744B total parameters and 40-44B active per forward pass, GLM-5 is the larger model by a wide margin. But the MLA compression keeps its inference footprint manageable. The 256 experts with 8 activated per token give it deep specialization across domains.

Qwen 3.5: Gated DeltaNet Hybrid

Qwen 3.5 takes a more radical approach to attention. It uses a 3:1 hybrid layout where most layers use Gated DeltaNet (linear attention) and the remaining layers use Gated Attention (full softmax). This hybrid design enables near-linear compute scaling with context length, which is how it supports 1M tokens without prohibitive cost.

At 397B total with only 17B active, Qwen 3.5 activates less than half the parameters per token that GLM-5 does. This makes it dramatically cheaper to serve. The shared expert (1 per layer in addition to 8 routed experts) provides a knowledge baseline that prevents quality degradation when routing happens to miss the ideal expert.
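The routed-plus-shared expert layout described above can be sketched in a few lines: a router picks the top-8 experts per token, and one shared expert always contributes. This is a minimal toy illustration with made-up dimensions, not Qwen's actual implementation.

```python
import numpy as np

# Toy MoE layer: top-8 routed experts out of 256, plus one always-on
# shared expert (Qwen 3.5-style layout). Sizes are illustrative only.
n_experts, top_k, d = 256, 8, 64
rng = np.random.default_rng(0)

router_w = rng.standard_normal((d, n_experts)) * 0.02
# Experts 0..255 are routed; the extra expert at index -1 is the shared one.
experts = rng.standard_normal((n_experts + 1, d, d)) * 0.02

def moe_forward(x):
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]          # indices of the 8 routed experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts
    out = sum(w * (x @ experts[i]) for i, w in zip(top, weights))
    out += x @ experts[-1]                     # shared expert always contributes
    return out, top

x = rng.standard_normal(d)
y, chosen = moe_forward(x)
print(len(chosen))   # 8 routed experts active for this token
```

The shared expert's unconditional contribution is what provides the quality floor when routing misses the ideal specialist.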

Hardware Independence

GLM-5 is a milestone for the Chinese semiconductor ecosystem. Trained entirely on 100,000 Huawei Ascend chips using the MindSpore framework, it proves frontier model training is possible without NVIDIA hardware. This matters geopolitically: US export controls have not stopped China from producing competitive models. Qwen 3.5's training infrastructure has not been publicly disclosed.

Open Source and Licensing

Both models are released under permissive open-source licenses. This is the biggest story here: two frontier-class models, fully open-weight, available for commercial use.

| Aspect | GLM-5 | Qwen 3.5 |
| --- | --- | --- |
| License | MIT | Apache 2.0 |
| Commercial Use | Unrestricted | Unrestricted |
| Modification | Unrestricted | Unrestricted |
| Patent Grant | No explicit grant | Explicit patent grant |
| Weights on Hugging Face | Yes | Yes |
| Weights on ModelScope | Yes | Yes |
| Fine-tuning Allowed | Yes | Yes |

MIT (GLM-5) is shorter and simpler. Apache 2.0 (Qwen 3.5) includes an explicit patent grant, which provides additional legal protection if you are deploying in enterprise environments. For practical purposes, both licenses let you do whatever you want with the weights. Neither requires attribution in your product itself, though MIT requires the copyright and license notice to accompany redistributed copies.

The real open-source story: these are 744B and 397B parameter models released for free commercial use. A year ago, models at this performance level were locked behind API-only access at premium prices.

API Pricing

Qwen 3.5 is significantly cheaper across the board, especially if you use the Flash variant.

| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
| --- | --- | --- | --- |
| GLM-5 (Reasoning) | $1.00 | $3.20 | Official Z.AI pricing |
| Qwen 3.5-Plus | ~$0.48 | ~$1.20 | Alibaba Cloud, tiered by context |
| Qwen 3.5-Flash | $0.10 | ~$0.30 | Budget variant, near-frontier quality |

GLM-5 costs roughly 2x more than Qwen 3.5-Plus and 10x more than Qwen 3.5-Flash. The price gap widens further with third-party providers. Alibaba claims Qwen 3.5 runs at 1/18th the cost of Gemini 3 Pro at comparable performance. If cost matters more than the last percentage point on SWE-bench, Qwen 3.5 wins handily.
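To make the gap concrete, here is a back-of-the-envelope monthly cost comparison using the list prices from the table above. The workload (500M input, 100M output tokens per month) is an arbitrary example, not a claim about typical usage.

```python
# Monthly API cost estimate from the article's list prices ($ per 1M tokens).
PRICES = {
    "GLM-5":          {"in": 1.00, "out": 3.20},
    "Qwen 3.5-Plus":  {"in": 0.48, "out": 1.20},
    "Qwen 3.5-Flash": {"in": 0.10, "out": 0.30},
}

def monthly_cost(model, input_mtok, output_mtok):
    """Cost in dollars for a workload measured in millions of tokens."""
    p = PRICES[model]
    return input_mtok * p["in"] + output_mtok * p["out"]

# Example workload: 500M input tokens, 100M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500, 100):,.2f}")
```

On this example workload, GLM-5 comes out to $820/month versus $360 for Qwen 3.5-Plus and $80 for Flash, which matches the rough 2x and 10x multiples above.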

Third-party providers like DeepInfra, Together.ai, and Fireworks often undercut official pricing. GLM-5 is available through 8+ providers with varying quantization levels (FP8, FP4). Qwen 3.5 has even wider availability. Both models are accessible through OpenRouter for easy integration.
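Because OpenRouter exposes an OpenAI-compatible endpoint, switching between the two models is just a change of model slug. The sketch below only builds the request; the model slugs shown are illustrative guesses, not confirmed identifiers, so check OpenRouter's model list for the exact names.

```python
import json

# Minimal OpenAI-compatible chat request for OpenRouter.
# Model slugs below are placeholders -- verify the real IDs on openrouter.ai.
def build_request(model, prompt, api_key):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,   # e.g. a GLM-5 or Qwen 3.5 slug
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(payload)

headers, body = build_request("zhipu/glm-5", "Explain MoE routing.", "sk-...")
# POST headers and body to https://openrouter.ai/api/v1/chat/completions
```

Swapping providers (DeepInfra, Together.ai, Fireworks) generally means changing only the base URL and key, since they expose the same chat-completions shape.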

Global Availability

Chinese AI models used to have a distribution problem. That has largely been solved through third-party inference providers and open-weight releases.

| Channel | GLM-5 | Qwen 3.5 |
| --- | --- | --- |
| Official API | Z.AI | Alibaba Cloud (US, Singapore, Beijing) |
| OpenRouter | Yes | Yes |
| DeepInfra | Yes (FP8) | Yes |
| Together.ai | Yes (FP4) | Yes |
| Fireworks | Yes | Yes |
| Self-hosted | Possible (multi-GPU) | Practical (medium models on single GPU) |
| US Regional Endpoint | Via third-party only | Alibaba Cloud Virginia |

Qwen 3.5 has a distribution advantage. Alibaba Cloud offers regional endpoints in Virginia (US), Singapore, and Beijing, with data residency guarantees. GLM-5's official API runs through Z.AI, and global access is primarily through third-party providers. For enterprise deployments with compliance requirements, Qwen 3.5's multi-region setup is more mature.

Model Families: The Qwen Advantage

This is where Qwen 3.5 pulls significantly ahead. Alibaba released a full family of models at different sizes, giving developers options for every deployment scenario. GLM-5 is a single model.

| Model | Total Params | Active Params | Best For |
| --- | --- | --- | --- |
| Qwen3.5-397B-A17B | 397B | 17B | Flagship, highest quality |
| Qwen3.5-122B-A10B | 122B | 10B | Enterprise on-prem, strong tool use |
| Qwen3.5-35B-A3B | 35B | 3B | Local deployment, single GPU |
| Qwen3.5-27B (dense) | 27B | 27B | Maximum reasoning density per token |
| Qwen3.5-Flash | Not disclosed | Small | Budget API, near-frontier quality |

The 27B dense model is particularly impressive: it hits 72.4% on SWE-bench Verified, matching GPT-5 mini, in a model you can run on consumer hardware. The 35B-A3B runs at 60-100+ tokens/sec on an RTX 4090. If you need local inference, GLM-5's 744B architecture is simply not an option without a GPU cluster.

GLM-5 stands alone as a single model. Zhipu AI's previous generations (GLM-4.7, GLM-4.5) are available at smaller sizes, but the GLM-5 generation has no official distilled variants yet.

When to Use Which

| Your Priority | Pick This | Why |
| --- | --- | --- |
| Autonomous coding agents | GLM-5 | #1 SWE-bench, agent-optimized training |
| Lowest API cost | Qwen 3.5-Flash | $0.10/M input tokens, near-frontier |
| Multimodal (text + vision) | Qwen 3.5 | Native multimodal, GLM-5 is text-only |
| Largest context window | Qwen 3.5 | 1M tokens vs 200K |
| Human preference quality | GLM-5 | #1 Chatbot Arena (1451 Elo) |
| Local / edge deployment | Qwen 3.5 | 27B/35B models run on consumer GPUs |
| Enterprise compliance (US) | Qwen 3.5 | US regional endpoint on Alibaba Cloud |
| Hallucination-sensitive tasks | GLM-5 | 34% hallucination rate (down from 90%) |
| Polyglot coding | Qwen 3.5 | 358 programming languages |
| Open-source ecosystem | Both | MIT vs Apache 2.0, both permissive |

For most developers, Qwen 3.5 is the more practical choice. It costs less, comes in more sizes, handles more modalities, and has better global infrastructure. GLM-5 is the better choice specifically for agentic coding workflows where SWE-bench performance translates directly to your use case, or when you need the best conversational quality (Arena Elo).

The meta-story matters too. GLM-5 proves that frontier AI training can happen on non-NVIDIA hardware. Whether that matters to you depends on your interest in supply chain independence. For pure performance-per-dollar, Qwen 3.5 is hard to beat.

Frequently Asked Questions

Which is better for coding, GLM-5 or Qwen 3.5?

It depends on the task. GLM-5 scores higher on SWE-bench Verified (77.8% vs 76.4%), making it stronger for end-to-end software engineering where the model reads repos, plans changes, and fixes bugs autonomously. Qwen 3.5 dominates LiveCodeBench v6 (83.6 vs 52.0), which tests standalone code generation. For agentic coding workflows, GLM-5 is purpose-built. For raw code generation and polyglot support across 358 languages, Qwen 3.5 wins.

Is GLM-5 really trained without NVIDIA chips?

Yes. GLM-5 was trained entirely on 100,000 Huawei Ascend chips using the MindSpore framework. It is the first frontier-class model trained without any NVIDIA hardware, proving the Ascend ecosystem can handle large-scale pre-training. Training used 28.5 trillion tokens.

Can I run these models locally?

The flagship models require serious hardware. GLM-5 at 744B total parameters needs multi-GPU setups even with quantization. Qwen 3.5 is more practical for local use: the 35B-A3B variant runs on a single GPU with 8GB+ VRAM using GGUF quantization (60-100+ tok/sec on an RTX 4090), and the 27B dense model fits on consumer hardware at 15-25 tok/sec. For local deployment, Qwen's medium model series is the clear winner.
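A quick way to sanity-check whether a model's weights fit your GPU is to estimate weight memory from parameter count and quantization bit width. This ignores KV cache, activations, and runtime overhead, so treat the result as a floor, not a budget; parameter counts are taken from the tables above.

```python
# Rough weight-memory estimate: params * bits / 8 bytes.
# Ignores KV cache and runtime overhead, so this is a lower bound.
def weight_gb(params_billion, bits):
    return params_billion * 1e9 * bits / 8 / 1e9

for name, params in [("Qwen3.5-35B-A3B", 35), ("Qwen3.5-27B", 27), ("GLM-5", 744)]:
    for bits in (4, 8):
        print(f"{name} @ {bits}-bit: ~{weight_gb(params, bits):.1f} GB")
```

Even at 4-bit, the 35B model's full weights (~17.5 GB) exceed an 8 GB card, which is why single-GPU GGUF setups lean on CPU offload; GLM-5's ~372 GB at 4-bit shows why it needs a multi-GPU cluster.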

Which has better global API availability?

Qwen 3.5 has broader availability. Alibaba Cloud offers regional endpoints in the US (Virginia), Singapore, and Beijing with data residency guarantees. Both models are accessible through third-party providers like OpenRouter, DeepInfra, Together.ai, and Fireworks. GLM-5 is available through 8+ API providers globally, but lacks dedicated regional endpoints.

What are the license differences?

GLM-5 uses MIT. Qwen 3.5 uses Apache 2.0. Both allow unrestricted commercial use, modification, and distribution. Apache 2.0 includes an explicit patent grant, which provides additional legal protection in enterprise settings. For practical purposes, the difference is negligible.

How do these compare to Claude, GPT-5.2, and Gemini 3 Pro?

On reasoning benchmarks, both models are competitive with closed-source leaders. GLM-5 beats GPT-5.2 on Humanity's Last Exam (tool-augmented) and matches Claude Opus 4.5 on AIME 2026. Qwen 3.5 leads on GPQA Diamond (88.4) and matches on math benchmarks. The gap between open and closed models has effectively closed on most standardized benchmarks. Where closed models still lead is in polish, safety tuning, and ecosystem maturity.

Use Any Model with Morph Fast Apply

GLM-5, Qwen 3.5, Claude, GPT. Morph processes code edits from any model at 10,500+ tok/sec with 98% first-pass accuracy. Your model generates the edit, Morph applies it correctly.