Qwen 3.5 and DeepSeek V4 are the two most important open-weight models coming out of China in 2026. Qwen 3.5 shipped on February 16 with a 397B MoE architecture that activates just 17B parameters per token. DeepSeek V4, expected in early March, scales to 1 trillion parameters with 32B active.
Both target coding dominance. Qwen 3.5 ships under Apache 2.0; DeepSeek V4 is expected to do the same. Both undercut Western frontier models on price by 10x or more. The question for developers: which one should you actually use?
This comparison uses confirmed numbers for Qwen 3.5 and leaked/projected specs for DeepSeek V4. We mark unverified claims clearly throughout.
TL;DR
- Available now: Qwen 3.5 is live with verified benchmarks. DeepSeek V4 is expected in the first week of March 2026.
- Architecture: Qwen 3.5 uses 397B params / 17B active (MoE). DeepSeek V4 uses ~1T params / 32B active (MoE + Engram memory).
- Coding: Qwen 3.5 scores 83.6 LiveCodeBench, 76.4 SWE-bench. DeepSeek V4 claims 80%+ SWE-bench, 90% HumanEval (unverified).
- Context: Qwen 3.5 supports 262K tokens (open-weight) or 1M (hosted Plus). DeepSeek V4 targets 1M tokens natively.
- Pricing: Qwen 3.5-Flash costs $0.10/$0.40 per M tokens; Plus runs $0.11-$1.20/M input depending on context tier. DeepSeek V4 is projected at $0.27/$1.10 per M tokens, consistent with DeepSeek's history of aggressive pricing.
- Self-hosting: Both run on consumer GPUs with quantization. Qwen 3.5 fits on a single RTX 4090 at INT4; DeepSeek V4 is projected to need dual 4090s or a single 5090.
Quick Comparison Table
| Specification | Qwen 3.5 (397B-A17B) | DeepSeek V4 |
|---|---|---|
| Status | Released Feb 16, 2026 | Expected early March 2026 |
| Total Parameters | 397B | ~1T |
| Active Parameters | 17B | ~32B |
| Architecture | MoE + Gated DeltaNet | MoE + Engram + mHC |
| Context Window | 262K (open) / 1M (Plus) | 1M tokens |
| MMLU | 88.5 | TBD |
| SWE-bench Verified | 76.4 | 80%+ (leaked) |
| LiveCodeBench v6 | 83.6 | TBD |
| HumanEval | 99.0 | ~90% (leaked) |
| Languages | 201 | TBD |
| Multimodal | Yes (text, image, video) | Yes (expected) |
| License | Apache 2.0 | Apache 2.0 (expected) |
| API Input Price | $0.10/M tokens (Flash) | $0.27/M tokens (est.) |
| API Output Price | $0.40/M tokens (Flash) | $1.10/M tokens (est.) |
| Self-Host Minimum | 1x RTX 4090 (INT4) | 2x RTX 4090 (est.) |
Architecture: Two Different Approaches to Sparse Efficiency
Both models use Mixture-of-Experts to keep inference costs low despite massive total parameter counts. The similarities end there.
Qwen 3.5: Gated DeltaNet + Sparse MoE
Qwen 3.5 combines Gated Delta Networks with sparse MoE routing. The 397B model activates only 17B parameters per forward pass, a 95% reduction in activation memory compared to a dense model of equivalent capacity. The architecture uses a hybrid attention mechanism (Gated DeltaNet combined with Gated Attention layers) that reduces compute overhead for long-context processing.
The expanded 250K vocabulary (up from 152K in Qwen 3) delivers 10-60% token savings for non-English text. Training used a native FP8 pipeline, and the model achieves 8.6-19x faster decoding than Qwen3-Max on long-context tasks.
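To make the sparse-activation idea concrete, here is a minimal toy of top-k MoE routing in numpy. It is a generic illustration, not Qwen's routing code; the expert count, top-k value, and dimensions are all invented.

```python
# Toy top-k MoE routing: each token is sent to only top_k of n_experts
# experts, so most weights stay untouched per forward pass. This is a
# generic illustration, NOT Qwen 3.5's routing code; all sizes are invented.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its top_k experts and gate-mix their outputs."""
    logits = x @ router_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                       # softmax over chosen experts only
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)  # (4, 64)
# Here 2/8 = 25% of expert weights are touched per token; Qwen 3.5's
# ratio is ~17B active / 397B total, roughly 4%.
```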
DeepSeek V4: Engram Memory + mHC + Sparse Attention
DeepSeek V4 introduces three architectural innovations on top of its trillion-parameter MoE backbone.
Engram Conditional Memory is the standout innovation. It provides O(1) hash-based knowledge retrieval using system DRAM (not GPU VRAM), improving Needle-in-a-Haystack accuracy from 84.2% to 97%. For coding, this means the model can maintain a persistent understanding of project structure, naming conventions, and cross-file relationships.
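Engram's internals are not public, so treat the following as a speculative toy of the core idea only: a constant-time, hash-addressed table that lives in system memory rather than VRAM. The key scheme, table size, and n-gram addressing are invented for illustration.

```python
# Speculative toy of the Engram idea as described in the leaks: an O(1),
# hash-addressed embedding table kept in system DRAM (a plain numpy array
# in host memory), not GPU VRAM. The real mechanism is unpublished.
import numpy as np

D, SLOTS = 128, 1_000_000                  # table lives in host RAM
table = np.zeros((SLOTS, D), dtype=np.float16)

def slot(ngram: tuple[int, ...]) -> int:
    """Constant-time hash of a token n-gram to a table row."""
    return hash(ngram) % SLOTS

def write(ngram: tuple[int, ...], vec: np.ndarray) -> None:
    table[slot(ngram)] = vec               # O(1) store

def read(ngram: tuple[int, ...]) -> np.ndarray:
    return table[slot(ngram)]              # O(1) fetch, no attention pass

write((42, 7, 19), np.ones(D, dtype=np.float16))
print(read((42, 7, 19))[:4])               # [1. 1. 1. 1.]
```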
Manifold-Constrained Hyper-Connections (mHC) solve a fundamental training stability problem at trillion-parameter scale. They constrain signal amplification to 1.6x (vs 3000x unconstrained), adding only 6.7% training overhead for a 4x residual stream expansion. Benchmark improvements from mHC alone: BBH +7.2, DROP +5.7, GSM8K +6.1, MMLU +5.2.
DeepSeek Sparse Attention (DSA) enables the 1M token context window while cutting compute overhead by approximately 50% vs standard attention. It achieves roughly linear scaling instead of quadratic, which is what makes million-token contexts practical at inference time.
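For a rough sense of what sparse attention buys, here is a generic top-k sparse attention toy next to dense attention. This is not DSA, whose selection mechanism is unpublished; note the toy still scores every key per query, whereas a production design uses a cheap selector to skip that quadratic step, which is where the near-linear scaling comes from.

```python
# Dense attention vs a generic top-k sparse variant. NOT DeepSeek's DSA;
# the selection rule here (exact top-k over full scores) is for illustration.
import numpy as np

def dense_attn(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])       # O(n^2) score matrix
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def topk_sparse_attn(q, k, v, kk=32):
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        s = q[i] @ k.T / np.sqrt(q.shape[-1])
        idx = np.argpartition(s, -kk)[-kk:]  # attend to kk keys only
        w = np.exp(s[idx] - s[idx].max())
        w /= w.sum()
        out[i] = w @ v[idx]                  # softmax + mix cost ~O(n * kk)
    return out
    # Caveat: this toy still computes all n scores per query; real sparse
    # attention avoids that with a cheap selector to get ~linear scaling.

n, d = 1024, 64
rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(dense_attn(q, k, v).shape, topk_sparse_attn(q, k, v).shape)
```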
Architecture Bottom Line
Qwen 3.5 is the more efficient model per active parameter: 17B active vs 32B for V4. DeepSeek V4 is the more ambitious model architecturally, with three novel components targeting repository-scale code understanding. If you care about running on minimal hardware, Qwen 3.5 wins. If you care about pushing the boundary on what a model can reason about across a full codebase, DeepSeek V4 is the one to watch.
Benchmark Breakdown
Important caveat: Qwen 3.5 benchmarks are from Alibaba's official release and have been partially validated by independent evaluations. DeepSeek V4 numbers are from leaked internal testing and remain unverified. Treat the V4 column accordingly.
General and Reasoning Benchmarks
| Benchmark | Qwen 3.5 (397B) | DeepSeek V4 (est.) | Reference |
|---|---|---|---|
| MMLU | 88.5 | TBD | GPT-5.2: ~89 |
| MMLU-Pro | 87.8 | TBD | Claude Opus 4.5: ~88 |
| GPQA Diamond | 88.4 | TBD | Gemini 3 Pro: ~87 |
| AIME 2026 | 91.3 | TBD | Claude Opus 4.5: ~86 |
| BrowseComp | 78.6 | TBD | GPT-5: ~71 |
Coding and Agentic Benchmarks
| Benchmark | Qwen 3.5 (397B) | DeepSeek V4 (est.) | Reference |
|---|---|---|---|
| SWE-bench Verified | 76.4 | 80%+ (leaked) | Claude Opus 4.5: 80.9 |
| LiveCodeBench v6 | 83.6 | TBD | GPT-5.2: ~81 |
| HumanEval | 99.0 | ~90% (leaked) | Claude: ~88 |
| BFCL v4 (Tool Use) | 72.9 | TBD | GPT-5 mini: 55.5 |
| Terminal-Bench 2 | 52.5 | TBD | Claude Code: ~68 |
What the Numbers Mean
Qwen 3.5's 83.6 on LiveCodeBench v6 is currently the best score from any open-weight model. Its 91.3 on AIME 2026 suggests genuinely strong mathematical reasoning. The 76.4 SWE-bench Verified score trails Claude Opus 4.5 (80.9%) but beats most other frontier models.
The BrowseComp score (78.6) is particularly notable. Qwen 3.5 uses an aggressive context-folding strategy that outperforms every US frontier model on web browsing tasks. The BFCL v4 tool-use score (72.9) makes it one of the strongest open-source options for building AI agents with function-calling.
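For a sense of what building on that function-calling strength looks like, here is a minimal tool-use sketch against an OpenAI-compatible endpoint. The `run_tests` tool is hypothetical, and the `qwen3.5-plus` model ID is an assumption; check the provider docs for exact IDs.

```python
# Minimal function-calling sketch via the OpenAI SDK against DashScope's
# compatible-mode endpoint. The tool and the model ID are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="YOUR_DASHSCOPE_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",        # hypothetical tool, for illustration
        "description": "Run the project's test suite and return failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-plus",           # assumed model ID
    messages=[{"role": "user", "content": "Run the tests in ./src and summarize failures."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```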
DeepSeek V4's leaked SWE-bench claim of 80%+ would put it in direct competition with Claude Opus 4.5. If confirmed, it would be the first open-weight model to match a Western frontier model on the most respected coding benchmark. But "leaked" is doing a lot of heavy lifting in that sentence.
Benchmark Caveat
Qwen 3.5's benchmarks are self-reported by Alibaba. DeepSeek V4's are leaked from internal testing. Neither set has undergone fully independent third-party evaluation at the time of writing. Run your own evals on your specific use case before making infrastructure decisions.
Coding Performance: Practical Differences
Benchmarks tell one story. Real coding performance tells another. Here is what we know about how each model handles actual development work.
Qwen 3.5: Strong on Standard Tasks, Craters on Master-Level
Qwen 3.5 performs well on standard and expert-level coding tasks. It supports three inference modes: "Auto" (adaptive thinking with tool use), "Thinking" (deep reasoning), and "Fast" (instant responses without chain-of-thought). It works with OpenClaw, Claude Code, Cline, and Alibaba's Qwen Code.
The catch: independent testing shows Qwen 3.5's Elo drops from ~1550 on expert tasks to 1194 on "Master-level" challenges that require complex multi-file coordination. When a task demands understanding relationships across many files simultaneously, Qwen 3.5 struggles more than its headline numbers suggest.
The 122B-A10B medium variant is worth noting: it scores 72.2 on BFCL v4, outperforming GPT-5 mini (55.5) by 30% on tool use. For agentic coding tasks that rely heavily on function calling, the medium model may be the better price-performance choice.
DeepSeek V4: Designed for Repository-Scale Understanding
DeepSeek V4's architecture is specifically optimized for the multi-file coordination problem where Qwen 3.5 stumbles. The Engram memory enables O(1) retrieval of project-level patterns, and the 1M native context window can ingest an entire medium-sized codebase in a single pass.
Early reports from testers describe repository-level understanding: diagnosing bugs that span multiple files, understanding import-export relationships across dozens of modules, and performing autonomous refactoring with awareness of downstream effects.
The claimed 1.8x inference speedup over V3 (which already offered competitive throughput) suggests DeepSeek is optimizing not just accuracy but practical development velocity.
Coding Verdict
For single-file and standard coding tasks, Qwen 3.5 is excellent and available today. For repository-scale refactoring and complex multi-file tasks, DeepSeek V4's architecture looks purpose-built. But V4's coding claims remain unverified. If you need a production-ready coding model right now, Qwen 3.5 is the only real option.
Open Source and Self-Hosting
Both models are open-weight: Qwen 3.5 under Apache 2.0, with DeepSeek V4 expected to follow. Both can run on consumer hardware with quantization. The practical self-hosting experience differs significantly.
Qwen 3.5 Self-Hosting
| Configuration | Hardware | Performance |
|---|---|---|
| Minimum (INT4) | 1x RTX 4090 (24GB) + 64GB RAM | Functional, slower |
| Recommended | 2x A100 80GB + 128GB RAM | Production-ready |
| High Performance | 8x H100 + 256GB RAM | ~45 tok/s throughput |
| Mac (3-bit) | 192GB unified memory | Functional via llama.cpp |
| Mac (4-bit) | 256GB unified memory | Better quality |
The full 397B model is ~807GB on disk at full precision. With INT4 quantization, it fits on a single 24GB GPU with system RAM offloading. The MoE architecture helps: only 17B parameters are active per forward pass, so you do not need to keep all 397B in fast memory simultaneously.
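The back-of-envelope math, ignoring KV cache and quantization overhead:

```python
# Rough memory math for self-hosting Qwen 3.5 (397B-A17B). Bytes-per-param
# figures are approximations; KV cache, scales, and activations are ignored.
BILLION = 1e9
total_params, active_params = 397 * BILLION, 17 * BILLION

fp16_disk = total_params * 2 / 1e9        # ~794 GB, same ballpark as ~807 GB
int4_disk = total_params * 0.5 / 1e9      # ~199 GB: system RAM / SSD territory
int4_active = active_params * 0.5 / 1e9   # ~8.5 GB hot set per forward pass

print(f"FP16 on disk:    {fp16_disk:,.0f} GB")
print(f"INT4 on disk:    {int4_disk:,.0f} GB")
print(f"INT4 active set: {int4_active:,.1f} GB -> fits a 24 GB RTX 4090")
```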
The smaller variants are far more accessible. The 27B dense model runs comfortably on a single RTX 4090. The 35B-A3B MoE activates just 3B parameters per token, making it viable on laptops with 16GB+ VRAM.
DeepSeek V4 Self-Hosting (Projected)
| Configuration | Hardware | Performance |
|---|---|---|
| Consumer Minimum | 2x RTX 4090 or 1x RTX 5090 | ~30-35 tok/s (est.) |
| Consumer (Batch) | Dual RTX 4090 + 64GB RAM | ~550 tok/s at batch size 4 |
| Production | 4-8x A100/H100 | Full throughput |
DeepSeek V4's trillion parameters make the raw model larger, but the 32B active parameter count and Engram memory offloading to system DRAM keep VRAM requirements manageable. The Engram module can offload 100B+ parameter embedding tables with less than 3% throughput penalty, which is critical for consumer deployments.
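The same back-of-envelope arithmetic, applied to the leaked V4 specs (all projections, none verified):

```python
# Projected memory split for DeepSeek V4 (1T total / 32B active / 100B+
# Engram tables), per the leaks. Every figure here is unverified.
BILLION = 1e9
total, active, engram = 1000 * BILLION, 32 * BILLION, 100 * BILLION

def gb_at_int4(params: float) -> float:
    """GB at ~4 bits/param, ignoring quantization overhead."""
    return params * 0.5 / 1e9

print(f"Total weights INT4: {gb_at_int4(total):,.0f} GB (disk / system RAM)")
print(f"Active set INT4:    {gb_at_int4(active):,.0f} GB (VRAM: dual 4090s or one 5090)")
print(f"Engram tables INT4: {gb_at_int4(engram):,.0f} GB+ (system DRAM, per the leaks)")
```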
DeepSeek's track record matters here. V3 shipped with excellent quantization support and community tooling from day one. If V4 follows the same pattern, expect Ollama, vLLM, and llama.cpp support within days of release.
Self-Hosting Verdict
Qwen 3.5 has the edge for minimal hardware deployments: its 17B active parameters vs DeepSeek V4's 32B means lower minimum VRAM requirements. Qwen 3.5 also has a full range of smaller variants (27B, 35B-A3B, 122B-A10B) for teams that need to match model size to hardware. DeepSeek V4's Engram offloading is clever, but at 1T total parameters, you need more storage and RAM even with aggressive quantization.
API Pricing
Both models aggressively undercut Western frontier models. The pricing gap between Chinese and US models is now 10-100x depending on the tier.
| Model | Input | Output | Notes |
|---|---|---|---|
| Qwen 3.5-Flash | $0.10 | $0.40 | Best value tier, 1M context |
| Qwen 3.5-Plus | $0.11-$1.20 | $0.40-$4.80 | Tiered by context length |
| DeepSeek V4 (est.) | $0.27 | $1.10 | Based on leaked pricing |
| DeepSeek V3.2 | $0.14 | $0.55 | Current generation (live) |
| GPT-5.2 | $1.75 | $14.00 | Reference: Western frontier |
| Claude Opus 4.5 | $5.00 | $25.00 | Reference: Western frontier |
Qwen 3.5-Flash at $0.10 input / $0.40 output is the cheapest way to access frontier-adjacent intelligence through an API. It costs roughly 1/13th of Claude Sonnet 4.6 for comparable tasks. The Plus tier is more expensive at higher context lengths but still 10x cheaper than Western alternatives.
DeepSeek V4's projected pricing ($0.27/$1.10) follows DeepSeek's pattern of being slightly more expensive than Qwen but still dramatically cheaper than OpenAI or Anthropic. DeepSeek historically offers 20-50x cheaper pricing than OpenAI on comparable models.
Cost for Typical Coding Workflows
A medium coding session (50K input tokens, 10K output tokens per request, 20 requests/day) costs roughly:
- Qwen 3.5-Flash: ~$0.18/day ($5.40/month)
- DeepSeek V4 (est.): ~$0.49/day ($14.70/month)
- Claude Opus 4.5: ~$10/day ($300/month)
For teams processing thousands of coding requests daily, the 50-100x cost advantage of Chinese open-weight models is not incremental. It changes what is economically feasible.
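A quick way to reproduce (or adapt) that arithmetic for your own request mix:

```python
# Reproduces the daily-cost arithmetic above. Prices in $/M tokens; the
# DeepSeek V4 row uses the leaked, unconfirmed estimates.
PRICES = {                      # (input, output) per million tokens
    "Qwen 3.5-Flash":    (0.10, 0.40),
    "DeepSeek V4 (est)": (0.27, 1.10),
    "Claude Opus 4.5":   (5.00, 25.00),
}

def daily_cost(model: str, in_tok: int, out_tok: int, requests: int) -> float:
    p_in, p_out = PRICES[model]
    return requests * (in_tok / 1e6 * p_in + out_tok / 1e6 * p_out)

for m in PRICES:
    d = daily_cost(m, in_tok=50_000, out_tok=10_000, requests=20)
    print(f"{m:18s} ${d:6.2f}/day  ${d * 30:8.2f}/month")
# Qwen 3.5-Flash     $  0.18/day  $    5.40/month
# DeepSeek V4 (est)  $  0.49/day  $   14.70/month
# Claude Opus 4.5    $ 10.00/day  $  300.00/month
```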
When to Use Which
| Your Situation | Pick This | Why |
|---|---|---|
| Need a model today | Qwen 3.5 | Released and available. DeepSeek V4 is not. |
| Budget-constrained API use | Qwen 3.5-Flash | $0.10/M input is the cheapest frontier-tier option |
| Repository-scale coding | Wait for DeepSeek V4 | 1M context + Engram memory designed for full-codebase tasks |
| Single-file coding tasks | Qwen 3.5 | 83.6 LiveCodeBench, strong on standard tasks |
| Consumer GPU self-hosting | Qwen 3.5 (27B or 35B) | Smallest active params, runs on a single GPU easily |
| Maximum benchmark scores | DeepSeek V4 (if claims hold) | 80%+ SWE-bench would match Claude Opus |
| Multilingual needs | Qwen 3.5 | 201 languages, 10-60% token savings on non-English |
| Agentic tool use | Qwen 3.5 | 72.9 BFCL v4, 78.6 BrowseComp, verified scores |
| Long-context retrieval | DeepSeek V4 | Engram boosts NIAH from 84% to 97% at 1M tokens |
| Multimodal (text + image + video) | Qwen 3.5 | Native multimodal, shipping now. V4 multimodal is unconfirmed. |
The Pragmatic Answer
Use Qwen 3.5 today. Evaluate DeepSeek V4 when it ships. The realistic scenario for most teams is running both, picking the right model per task type. Qwen 3.5 for quick iterations, tool-use-heavy agents, and budget-sensitive workloads. DeepSeek V4 (once available) for deep codebase reasoning and repository-scale refactoring.
Both models work with the same tooling ecosystem: Ollama, vLLM, OpenAI-compatible APIs, and every major coding agent. Switching between them is a config change, not a migration.
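In practice, that config change is swapping a base URL and model ID behind one client. The endpoints below are the providers' OpenAI-compatible URLs as of this writing; the model IDs are assumptions until official docs confirm them.

```python
# Same client, two providers: switching models is a config change when both
# expose OpenAI-compatible APIs. Model IDs below are assumed, not confirmed.
from openai import OpenAI

PROVIDERS = {
    "qwen": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model": "qwen3.5-plus",      # assumed ID
    },
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "model": "deepseek-chat",     # V3-era ID; V4's may differ
    },
}

def complete(provider: str, prompt: str, api_key: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=api_key)
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Same call, different backend: complete("qwen", ...) vs complete("deepseek", ...)
```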
Frequently Asked Questions
Is DeepSeek V4 released yet?
As of March 2, 2026, no. The Financial Times reported it will arrive in the first week of March 2026, timed around China's annual Two Sessions parliamentary meetings starting March 4. TechNode reports that DeepSeek plans to release V4 "this week." All V4 benchmark numbers in this article come from leaked internal testing and are not independently verified.
Can I run these models on consumer hardware?
Yes. Qwen 3.5-397B runs on a single RTX 4090 at INT4 quantization with 64GB system RAM. The smaller Qwen 3.5-27B runs on any 24GB GPU without tricks. DeepSeek V4 reportedly targets dual RTX 4090s or a single RTX 5090 as its consumer-tier hardware requirement. Both models benefit from MoE architectures where only a fraction of total parameters are active per token.
Which model is better for coding?
Qwen 3.5 scores 83.6 on LiveCodeBench v6 and 76.4 on SWE-bench Verified, both confirmed. DeepSeek V4 claims 80%+ SWE-bench and 90% HumanEval from leaks. Qwen 3.5 is the safer pick today. DeepSeek V4 may surpass it for repository-level tasks once released, but those claims need independent verification.
How do they compare to GPT-5 and Claude Opus?
Qwen 3.5 claims to outperform GPT-5.2 and Claude Opus 4.5 on 80% of evaluated benchmarks. DeepSeek V4 targets SWE-bench parity with Claude Opus 4.5 (80.9%). The key differentiator is not capability but cost: Qwen 3.5 Flash costs $0.10/M input tokens vs $5.00 for Claude Opus, a 50x gap. Even if the Chinese models trail slightly on absolute benchmarks, the cost difference makes them practical for workloads that would be prohibitively expensive on Western APIs.
Which is better for building AI agents?
Qwen 3.5. Its 72.9 BFCL v4 score (function calling) and 78.6 BrowseComp (web browsing) are the highest among open-weight models. The 122B-A10B variant outperforms GPT-5 mini on tool use by 30%. DeepSeek V4 has not published agentic benchmarks yet.
Apply Code Edits from Any Model at 10,500+ tok/sec
Morph Fast Apply works with Qwen 3.5, DeepSeek, Claude, GPT, or any model. It handles the last mile of code editing: taking AI-generated diffs and applying them correctly to your files.