Qwen Code CLI: Architecture, Tool Definitions, and Technical Deep Dive

Qwen Code is Alibaba's Apache 2.0 terminal coding agent. This guide covers the complete tool registry, agent loop internals, file edit format, MCP integration, multi-provider config, and benchmarks vs Claude Code and Gemini CLI.

March 3, 2026

Quick Verdict

Decision Matrix (March 2026)

  • Choose Qwen Code if: You need multi-provider flexibility, JetBrains IDE support, or self-hosted models with full data privacy. 1,000 free requests/day via OAuth.
  • Choose Claude Code if: You need the highest SWE-bench accuracy (80.8% vs 69.6%), parallel sub-agent teams, or deterministic multi-file refactoring.
  • Choose Gemini CLI if: You need native Google Search grounding. Same free tier (1,000 req/day), same Apache 2.0 license, but locked to Gemini models.
  • 69.6% SWE-bench Verified (Qwen3-Coder-480B)
  • 13 built-in tools in the tool registry
  • 1,000 free API requests per day (OAuth)

Qwen Code is a production-grade terminal agent that exposes a defined set of tools to the LLM through OpenAI-compatible function calling. The architecture is a ReAct loop: the model receives tool schemas on every turn, requests tool executions in its response, and receives structured results back. This is the same pattern as Claude Code and Gemini CLI, but Qwen Code is the only one of the three that lets you swap the backing model at runtime to any OpenAI-compatible endpoint.

The underlying Qwen3-Coder-480B model was trained with long-horizon reinforcement learning on 20,000 parallel environments, each running full GitHub issue resolution tasks. The RL training specifically targets multi-turn tool use, which is why the SWE-bench score (69.6%) is meaningfully higher than models trained only on code generation.

Agent Loop Architecture

Qwen Code is a TypeScript/Node.js application structured around a layered ReAct loop. Three core classes handle the lifecycle:

  • GeminiClient — manages session lifecycle and API connections
  • GeminiChat — owns conversation history and compression logic
  • Turn — processes individual LLM response streams, extracts function_calls

The execution path per turn:

  1. User input enters through InputPrompt with tab completion
  2. AppContainer prepares React context and UI state
  3. GeminiClient sends the conversation history plus all registered tool schemas (getFunctionDeclarations()) to the model API
  4. The model responds with text, one or more function_calls, or both
  5. Turn extracts function calls from the streaming response
  6. ToolScheduler validates parameters with Zod schemas, checks approval mode, runs tools (potentially in parallel), and returns ToolResult objects
  7. Results feed back into GeminiChat as tool response messages for the next turn
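
Steps 3-7 above can be sketched as a loop. This is an illustration only: the types, the `agentTurn` name, and the `callModel`/`runTool` callbacks are hypothetical stand-ins for what the article says is spread across GeminiClient, GeminiChat, Turn, and ToolScheduler.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
interface ModelReply { text?: string; functionCalls: { name: string; args: unknown }[] }

// callModel stands in for the API client (history + tool schemas in,
// streamed reply out); runTool stands in for the tool scheduler.
async function agentTurn(
  history: Message[],
  callModel: (h: Message[]) => Promise<ModelReply>,
  runTool: (name: string, args: unknown) => Promise<string>,
): Promise<Message[]> {
  for (;;) {
    const reply = await callModel(history); // steps 3-4: send history, get text and/or calls
    if (reply.text) history.push({ role: "assistant", content: reply.text });
    if (reply.functionCalls.length === 0) return history; // no tool requests: turn ends
    for (const call of reply.functionCalls) { // steps 5-6: extract and execute
      const result = await runTool(call.name, call.args);
      history.push({ role: "tool", content: result }); // step 7: feed results back
    }
  }
}
```

The loop exits only when the model replies without requesting any tool, which is what makes the pattern "agentic" rather than single-shot.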

Tool results have two fields: llmContent (detailed output for the model, including exit codes and PIDs) and returnDisplay (formatted output for the terminal UI). This separation prevents the model's context from being polluted by display formatting.
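
As a sketch, the two-field contract looks like this. Only the `llmContent`/`returnDisplay` field names come from the documented format; the `shellResult` helper is a hypothetical example of a tool populating both.

```typescript
interface ToolResult {
  llmContent: string;    // detailed output fed back to the model (exit codes, PIDs, ...)
  returnDisplay: string; // compact, formatted output for the terminal UI
}

// Hypothetical shell-tool result: the model sees full execution details,
// the user sees a one-line summary.
function shellResult(cmd: string, exitCode: number, pid: number, stdout: string): ToolResult {
  return {
    llmContent: `Command: ${cmd}\nExit code: ${exitCode}\nPID: ${pid}\nStdout:\n${stdout}`,
    returnDisplay: exitCode === 0 ? `✓ ${cmd}` : `✗ ${cmd} (exit ${exitCode})`,
  };
}
```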

Parallel Tool Execution

ToolScheduler supports running multiple tool calls in parallel when the model requests them simultaneously. Read-only tools (read_file, grep_search, glob) can run concurrently. Write and execute tools are serialized by default unless running in YOLO mode. This matters for performance on tasks like "read these 10 files and summarize each" — all 10 read_file calls execute in one batch.
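
The scheduling policy can be sketched as follows. This illustrates the described behavior (reads in one parallel batch, mutating calls serialized, everything parallel in YOLO mode); it is not the actual ToolScheduler code.

```typescript
type Kind = "read" | "write" | "execute";
interface ToolCall { name: string; kind: Kind; run: () => Promise<string>; }

async function schedule(calls: ToolCall[], yolo = false): Promise<string[]> {
  if (yolo) return Promise.all(calls.map((c) => c.run())); // YOLO: everything parallel
  const results = new Map<ToolCall, string>();
  const reads = calls.filter((c) => c.kind === "read");
  const rest = calls.filter((c) => c.kind !== "read");
  // all read-only tools run in one concurrent batch
  await Promise.all(reads.map(async (c) => results.set(c, await c.run())));
  // write and execute tools run strictly in request order
  for (const c of rest) results.set(c, await c.run());
  return calls.map((c) => results.get(c)!); // results in original call order
}
```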

Built-in Tool Registry

Every tool is a subclass of BaseDeclarativeTool, registered in ToolRegistry, and exported to the LLM via getFunctionDeclarations(). The LLM sees the same JSON Schema definitions you would see if you called /tools --verbose in a session.

Tool Name         | Kind    | Parameters                                                     | Confirmation Required
read_file         | Read    | path (string, abs), offset (int?), limit (int?)                | No
write_file        | Write   | file_path (string, abs), content (string)                      | Yes (diff shown)
edit              | Edit    | file_path, old_string, new_string, replace_all (bool?)         | Yes (diff shown)
list_directory    | Read    | path (string), ignore (string[]?), respect_git_ignore (bool?)  | No
glob              | Read    | pattern (string), path (string?)                               | No
grep_search       | Read    | pattern (string), path (string?), glob (string?), limit (int?) | No
run_shell_command | Execute | command (string), is_background (bool?), timeout_ms (int?)     | Yes (DEFAULT mode)
web_fetch         | Read    | url (string), method (string?), headers (object?), body (string?) | No
web_search        | Read    | query (string)                                                 | No
save_memory       | Write   | operation (add|clear), content (string?)                       | No
todo_write        | Other   | operation (add|update|complete|delete|clear), task details     | No
task              | Other   | description (string), agent_type (file-search|...)             | No
skill             | Other   | operation (execute|list|describe), skill_name (string?)        | Depends

Tool Schema Format

Tools are exposed to the model as standard OpenAI function definitions. The getFunctionDeclarations() method on ToolRegistry returns an array sorted alphabetically. Here is the schema for edit:

edit tool — function declaration sent to LLM

{
  "type": "function",
  "function": {
    "name": "edit",
    "description": "Replaces text in a file. Requires old_string to match exactly once in the file unless replace_all is set to true.",
    "parameters": {
      "type": "object",
      "properties": {
        "file_path": {
          "type": "string",
          "description": "Absolute path to the file to modify."
        },
        "old_string": {
          "type": "string",
          "description": "Exact text to find. Must be unique in the file."
        },
        "new_string": {
          "type": "string",
          "description": "Text to replace old_string with."
        },
        "replace_all": {
          "type": "boolean",
          "description": "If true, replace all occurrences. Default: false."
        }
      },
      "required": ["file_path", "old_string", "new_string"]
    }
  }
}

run_shell_command tool — function declaration sent to LLM

{
  "type": "function",
  "function": {
    "name": "run_shell_command",
    "description": "Execute a shell command in a persistent session. Commands that modify the filesystem require user approval in DEFAULT mode.",
    "parameters": {
      "type": "object",
      "properties": {
        "command": {
          "type": "string",
          "description": "Shell command to execute."
        },
        "is_background": {
          "type": "boolean",
          "description": "If true, run asynchronously. Default: false."
        },
        "timeout_ms": {
          "type": "number",
          "description": "Maximum execution time in milliseconds."
        }
      },
      "required": ["command"]
    }
  }
}

read_many_files (@ command)

The @path syntax is a shortcut that triggers read_many_files, a specialized batch read tool. It injects directory contents into the prompt with git-aware filtering: files matching .gitignore and .qwenignore patterns are excluded. This is how you efficiently pull an entire module into context without manually listing files.
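
The filtering step can be illustrated with a toy sketch. Real gitignore matching is considerably more involved than the regex patterns assumed here; this only shows the exclude-then-inject idea.

```typescript
// Toy git-aware filter: drop any file matched by an ignore pattern.
// Patterns here are plain regexes standing in for .gitignore/.qwenignore rules.
function filterIgnored(files: string[], ignorePatterns: RegExp[]): string[] {
  return files.filter((f) => !ignorePatterns.some((p) => p.test(f)));
}

const files = ["src/api.ts", "node_modules/x/index.js", "dist/out.js"];
const kept = filterIgnored(files, [/^node_modules\//, /^dist\//]);
// only src/api.ts survives and would be injected into the prompt
```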

File Edit Format

Qwen Code uses a search-and-replace format for file edits, not unified diffs. The model calls the edit tool with old_string and new_string. The tool validates that old_string appears exactly once in the target file, then replaces it.

Tool       | Format                                       | Use Case
edit       | Search-and-replace (old_string → new_string) | Targeted changes to existing files
write_file | Full file content overwrite                  | Creating new files or complete rewrites

The search-and-replace approach has a known failure mode: if the model specifies an old_string that matches zero or multiple locations, the edit fails. Qwen Code includes a multi-stage correction mechanism to handle this: if the initial match fails, the tool calls the model again with the file content and asks it to refine the old_string to be more specific. This loop runs up to a configured number of retries before giving up.
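
The core exactly-once validation can be sketched in a few lines. The `applyEdit` name is hypothetical, and the diff preview, confirmation, and multi-stage retry loop are omitted; this only shows the match rule the article describes.

```typescript
// Enforce the edit tool's contract: old_string must match exactly once
// unless replace_all is set.
function applyEdit(content: string, oldStr: string, newStr: string, replaceAll = false): string {
  const count = content.split(oldStr).length - 1; // occurrence count
  if (count === 0) throw new Error("old_string not found; model must refine it");
  if (count > 1 && !replaceAll) throw new Error(`old_string matches ${count} times; must be unique`);
  return replaceAll ? content.split(oldStr).join(newStr) : content.replace(oldStr, newStr);
}
```

Both failure branches are what the correction mechanism is for: the error is surfaced back to the model, which retries with a more specific `old_string`.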

Example: model requests an edit

// Model output (function call in response stream)
{
  "name": "edit",
  "arguments": {
    "file_path": "/home/user/project/src/api.ts",
    "old_string": "  const timeout = 5000;",
    "new_string": "  const timeout = 30000;"
  }
}

// Tool result returned to model
{
  "llmContent": "Successfully replaced 1 occurrence in /home/user/project/src/api.ts",
  "returnDisplay": "✓ Edited src/api.ts (1 change)"
}

Comparing Edit Formats Across Terminal Agents

Qwen Code and Claude Code use search-and-replace as their primary edit format; Gemini CLI applies diff patches:

  • Qwen Code: edit tool with old_string/new_string + write_file for full overwrites
  • Claude Code: str_replace_based_edit_tool with old_str/new_str (same pattern)
  • Gemini CLI: replace_file_content with diff patches via the diff package

The practical difference is in the correction mechanism. Qwen Code's multi-stage fuzzy correction handles model hallucination better than a hard fail. Claude Code relies on the model generating precise strings from the start. For fast-apply use cases, Morph's API handles the edit translation layer regardless of which agent generates the instruction.

Context Management

QWEN.md Project Files

Project context loads from QWEN.md files in a hierarchy: global user (~/.qwen/QWEN.md), then project root, then subdirectories. Run /init inside a session to generate a starter QWEN.md from your current project structure. The context.fileName setting lets you override the filename.

Context Compression

When conversation history grows large, two mechanisms kick in:

  • Manual: /compress summarizes the current chat history into a condensed form. Useful before starting a new sub-task that does not need full prior context.
  • Automatic: model.chatCompression.contextPercentageThreshold triggers compression when history exceeds a set percentage of the model's context window. Default is typically 70-80%.
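
The automatic trigger amounts to a simple threshold check. The function name and the 0.7 default below are illustrative, not qwen-code internals.

```typescript
// Compress once history tokens exceed contextPercentageThreshold
// of the model's context window.
function shouldCompress(historyTokens: number, contextWindow: number, threshold = 0.7): boolean {
  return historyTokens > contextWindow * threshold;
}
// e.g. with a 256K (262,144-token) window and a 70% threshold,
// compression triggers once history passes roughly 183K tokens
```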

Context Window Limits

Qwen Code determines the effective context window from built-in defaults per model name. If your provider's actual limit differs (for example, you're using an older API version or a quantized local model), override it with contextWindowSize in generationConfig:

Override context window in settings.json

{
  "model": {
    "name": "qwen3-coder-plus",
    "generationConfig": {
      "contextWindowSize": 131072,
      "enableCacheControl": true,
      "samplingParams": {
        "temperature": 0.1,
        "top_p": 0.95,
        "max_tokens": 8192
      }
    }
  }
}

Token Caching

For API key authentication (not OAuth), Qwen Code activates prompt caching when the provider supports it. System instructions and stable QWEN.md context are cached, so repeated turns in the same session do not re-process the full system prompt. Set enableCacheControl: true in generationConfig to enable. DashScope and OpenRouter support cache pricing for Qwen models.

Context Feature                   | Qwen Code                             | Claude Code          | Gemini CLI
Project memory file               | QWEN.md (configurable)                | CLAUDE.md            | GEMINI.md
Manual compression                | /compress command                     | Automatic compaction | /compress command
Context window override           | contextWindowSize in generationConfig | Not exposed          | Not exposed
Native context (Qwen3-Coder-480B) | 256K tokens                           | 200K (Opus 4.6)      | 1M (Gemini 2.5 Pro)
Extended context                  | 1M via YaRN extrapolation             | N/A                  | 1M (native)
Token caching                     | Yes (API key auth)                    | Yes (included)       | Yes (Gemini API)
Session persistence               | /chat save/resume                     | Cross-session memory | Limited

MCP Integration

MCP (Model Context Protocol) servers register additional tools into the ToolRegistry at startup. From the model's perspective, MCP tools are indistinguishable from built-in tools — they appear in the same getFunctionDeclarations() array.

Discovery Process

On startup, discoverMcpTools() iterates the mcpServers config, establishes transport connections, calls each server's tool listing endpoint, sanitizes schemas (removes $schema and additionalProperties for compatibility), and registers tools. Name conflicts use first-registration-wins with prefixing: if two servers both expose a tool named search, the second becomes serverName__search.
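
The naming rule can be sketched as follows (an illustration of the first-registration-wins policy, not the actual discoverMcpTools() code):

```typescript
// Register a server's tools into a shared registry; a name that is
// already taken gets prefixed with the server name.
function registerTools(registry: Map<string, string>, server: string, tools: string[]): void {
  for (const t of tools) {
    const name = registry.has(t) ? `${server}__${t}` : t;
    registry.set(name, server);
  }
}

const reg = new Map<string, string>();
registerTools(reg, "alpha", ["search", "fetch"]);
registerTools(reg, "beta", ["search"]); // conflict: registered as beta__search
```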

Transport Types

  • Stdio: Spawns a subprocess, communicates via stdin/stdout. Best for local Python or Node.js servers.
  • SSE: Connects to a Server-Sent Events endpoint. For persistent remote servers.
  • Streamable HTTP: HTTP streaming. For cloud-hosted MCP services.

MCP server configuration examples (settings.json)

{
  "mcpServers": {
    "warpgrep": {
      "command": "npx",
      "args": ["-y", "@morphllm/warpgrep-mcp"],
      "env": {
        "WARPGREP_API_KEY": "$WARPGREP_API_KEY"
      },
      "timeout": 10000
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"],
      "trust": false
    },
    "remote-tools": {
      "httpUrl": "https://my-mcp-server.example.com/mcp",
      "headers": {
        "Authorization": "Bearer $REMOTE_MCP_TOKEN"
      },
      "timeout": 5000,
      "includeTools": ["search", "fetch"],
      "excludeTools": ["dangerous_tool"]
    },
    "google-drive": {
      "url": "https://mcp.googleapis.com/sse",
      "authProviderType": "google_credentials"
    }
  }
}

Tool Wrapping

Each discovered MCP tool is wrapped in a DiscoveredMCPTool instance. This wrapper handles:

  • Confirmation dialogs (bypassed when trust: true on the server config)
  • Per-tool include/exclude filtering via includeTools and excludeTools
  • Execution routing: the tool JSON parameters are passed to the MCP server via stdin (Stdio) or POST (HTTP)
  • Response normalization: MCP responses are mapped to the standard ToolResult format

OAuth for Remote MCP Servers

Remote SSE/HTTP servers that require authentication trigger a browser-based OAuth flow. The token is stored in ~/.qwen/mcp-oauth-tokens.json. Three provider types are supported: dynamic_discovery (server advertises its OAuth config), google_credentials (Application Default Credentials), and service_account_impersonation (Google Cloud service accounts).

Multi-Provider Configuration

Qwen Code is provider-agnostic. The modelProviders array in ~/.qwen/settings.json defines available models. Switch at runtime with /model. Each provider entry specifies a protocol, base URL, model ID, and environment variable for the API key.

Multi-provider settings.json — full example

{
  "modelProviders": [
    {
      "protocol": "openai",
      "id": "qwen3-coder-plus",
      "displayName": "Qwen3-Coder (DashScope)",
      "baseURL": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "envKey": "QWEN_API_KEY"
    },
    {
      "protocol": "openai",
      "id": "qwen3-coder-480b-a35b-instruct",
      "displayName": "Qwen3-Coder-480B (OpenRouter)",
      "baseURL": "https://openrouter.ai/api/v1",
      "envKey": "OPENROUTER_API_KEY",
      "generationConfig": {
        "contextWindowSize": 262144
      }
    },
    {
      "protocol": "anthropic",
      "id": "claude-sonnet-4-5-20251022",
      "displayName": "Claude Sonnet 4.5",
      "envKey": "ANTHROPIC_API_KEY"
    },
    {
      "protocol": "openai",
      "id": "gpt-5.4",
      "displayName": "GPT-5.4 (OpenAI)",
      "envKey": "OPENAI_API_KEY"
    },
    {
      "protocol": "openai",
      "id": "qwen3-coder",
      "displayName": "Qwen3-Coder (Ollama local)",
      "baseURL": "http://localhost:11434/v1",
      "envKey": ""
    }
  ],
  "model": {
    "name": "qwen3-coder-plus"
  }
}

Environment setup

# DashScope (Alibaba Cloud — direct Qwen API)
export QWEN_API_KEY=sk-...

# OpenRouter (multi-model routing, same endpoint)
export OPENROUTER_API_KEY=sk-or-...

# Anthropic (for Claude models via Qwen Code)
export ANTHROPIC_API_KEY=sk-ant-...

# OpenAI (for GPT-5.4)
export OPENAI_API_KEY=sk-...

# Tavily (web_search tool — required for web search)
export TAVILY_API_KEY=tvly-...

# Launch with a specific model
qwen --model qwen3-coder-480b-a35b-instruct

Switching providers mid-session with /model creates a new ContentGenerator instance for the selected provider but preserves conversation history. The model receives the full prior context on the first turn after switching.

Permission and Sandbox Model

Four Approval Modes

Every tool call routes through the approval system before execution. Four modes control the behavior:

Mode      | Flag                      | Read Tools | Edit/Write Tools  | Shell Commands
PLAN      | --approval-mode plan      | Auto       | Blocked           | Blocked
DEFAULT   | --approval-mode default   | Auto       | Requires approval | Requires approval
AUTO-EDIT | --approval-mode auto-edit | Auto       | Auto              | Requires approval
YOLO      | --approval-mode yolo      | Auto       | Auto              | Auto

Per-tool overrides are available through tools.allowed (whitelist specific tools that bypass confirmation regardless of mode) and tools.exclude (remove tools from the registry entirely).

Sandboxing

The --sandbox flag enables container isolation for run_shell_command. Three backends:

  • Docker/Podman (Linux): Uses the qwen-code-sandbox image. Drop in a custom .qwen/sandbox.Dockerfile for project-specific toolchains.
  • Seatbelt (macOS): sandbox-exec profiles restrict filesystem and network access. Configured via SEATBELT_PROFILE.
  • No sandbox: Default. All shell commands run in the host environment with workspace boundary enforcement only.

Workspace Boundary Enforcement

Independent of approval mode, all file tools (read_file, write_file, edit) validate that paths are absolute and within the registered workspace directories. Commands cannot access files outside the project root. The /directory add slash command extends the workspace to additional directories.
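
A sketch of this boundary check using Node's path module (illustrative; not the actual qwen-code validator):

```typescript
import * as path from "node:path";

// A path is accepted only if it is absolute and resolves inside one of
// the registered workspace roots. path.relative producing ".." means
// the target escapes the root.
function isInsideWorkspace(target: string, roots: string[]): boolean {
  if (!path.isAbsolute(target)) return false;
  const resolved = path.resolve(target); // collapses "../" traversal
  return roots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
  });
}
```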

settings.json — permission configuration

{
  "tools": {
    "approvalMode": "auto-edit",
    "allowed": ["read_file", "grep_search", "glob", "list_directory"],
    "exclude": ["web_search"],
    "sandbox": false,
    "shell": {
      "enableInteractiveShell": true
    },
    "enableToolOutputTruncation": true,
    "truncateToolOutputThreshold": 50000,
    "truncateToolOutputLines": 500
  }
}

Setup and Installation

Requires Node.js 20+. Install takes under a minute.

Install

# npm (recommended)
npm install -g @qwen-code/qwen-code@latest

# Homebrew (macOS/Linux)
brew install qwen-code

# Official install script
curl -fsSL https://qwen-code-assets.oss-cn-hangzhou.aliyuncs.com/installation/install-qwen.sh | bash

# Verify
qwen --version

Quick start with Qwen OAuth (free, no API key)

# Launch — select "Qwen OAuth" on first run
qwen

# Headless mode (CI, scripts)
qwen -p "Add error handling to src/api.ts"

# Pipe mode
git diff HEAD~1 | qwen -p "Summarize these changes"
tail -f app.log | qwen -p "Alert me when you see an error"

# Launch in auto-edit mode (no confirmation for file edits)
qwen --approval-mode auto-edit

# Launch with sandboxing enabled
qwen --sandbox

Project initialization

# In your project root, generate QWEN.md
qwen
/init

# QWEN.md structure (customize for your project)
# ---
# Project: my-api
# Stack: TypeScript, Node.js 20, PostgreSQL
# Test command: bun test
# Lint command: bun run lint
# Build command: bun run build
# Key directories: src/api, src/db, tests/
# ---

Qwen Code vs Claude Code vs Gemini CLI

Capability              | Qwen Code                              | Claude Code                      | Gemini CLI
License                 | Apache 2.0                             | Proprietary                      | Apache 2.0
Free tier               | 1,000 req/day (OAuth)                  | None                             | 1,000 req/day (Google account)
SWE-bench Verified      | 69.6% (Qwen3-Coder-480B)               | 80.8% (Opus 4.6)                 | 63.8% (Gemini 2.5 Pro)
Native context window   | 256K tokens                            | 200K tokens (Opus 4.6)           | 1M tokens
Extended context        | 1M via YaRN                            | N/A                              | 1M (native)
File edit format        | Search-replace (old_string/new_string) | Search-replace (old_str/new_str) | Unified diff patches
Multi-provider support  | Any OpenAI-compatible endpoint         | Anthropic only                   | Google only
Self-hosted models      | Yes (Ollama, vLLM)                     | No                               | No
Parallel tool execution | Yes (ToolScheduler)                    | Yes                              | Yes
Parallel sub-agents     | Single task tool                       | Yes (agent teams)                | No
MCP support             | Yes (all 3 transports)                 | Yes                              | Yes
Google Search grounding | No (Tavily API for web_search)         | No                               | Built-in native
IDE integration         | VS Code (beta), JetBrains              | VS Code                          | VS Code, Zed
Sandboxing              | Docker/Podman/macOS Seatbelt           | macOS Seatbelt                   | No native sandboxing
Context compression     | /compress + auto threshold             | Automatic compaction             | /compress command
Token caching           | Yes (enableCacheControl)               | Yes (included)                   | Yes (Gemini API)

Model Support and Token Costs

Qwen Code works with any OpenAI-compatible provider. These are the models and costs most relevant to Qwen Code deployments as of March 2026:

Model                            | Input (per 1M tokens)  | Output (per 1M tokens) | Context    | Provider
Qwen3-Coder-480B (free tier)     | $0 (OAuth, 1K req/day) | $0                     | 256K       | DashScope OAuth
Qwen3-Coder-480B (DashScope API) | $0.22                  | $1.00                  | 256K       | DashScope
Qwen3-Coder-Plus (DashScope)     | $0.30                  | $1.50                  | 256K       | DashScope
Qwen3-Coder-480B (OpenRouter)    | ~$0.22                 | ~$1.00                 | 256K       | OpenRouter
Claude Sonnet 4.5                | $3.00                  | $15.00                 | 200K       | Anthropic direct
Claude Opus 4.6                  | $15.00                 | $75.00                 | 200K       | Anthropic direct
GPT-5.4                          | $2.50                  | $10.00                 | 128K       | OpenAI direct
Qwen3-Coder (Ollama local)       | $0 (hardware cost)     | $0                     | Up to 256K | Self-hosted

Cost comparison: 100K tokens of context

For a session with 100K input tokens and 5K output tokens:

  • Qwen3-Coder-480B (free OAuth): $0 (counts as 1 request)
  • Qwen3-Coder-480B (DashScope): $0.022 input + $0.005 output = $0.027
  • Claude Sonnet 4.5: $0.30 input + $0.075 output = $0.375
  • Claude Opus 4.6: $1.50 input + $0.375 output = $1.875

For heavy API usage, Qwen3-Coder is 10-70x cheaper than Claude models at the same context length. The tradeoff is the 11-point SWE-bench gap.

Benchmarks

Qwen3-Coder-480B was trained with long-horizon RL on 20,000 parallel environments. The RL loop specifically targets SWE-bench-style tasks: given a GitHub issue and a repository, resolve the issue through multi-turn tool use. This training approach directly optimizes for the benchmark, which is why the score (69.6%) is meaningful rather than an artifact of overfitting.

Benchmark               | Qwen3-Coder-480B                       | Claude Opus 4.6 | Gemini 2.5 Pro
SWE-bench Verified      | 69.6%                                  | 80.8%           | 63.8%
LiveCodeBench v5        | Competitive (exact score TBD)          | N/A             | 70.4%
BFCL (function calling) | Strong (RL-trained)                    | N/A             | N/A
CodeForces ELO          | Reported; specific score varies by run | N/A             | N/A

The 11-point gap between Qwen3-Coder-480B (69.6%) and Claude Opus 4.6 (80.8%) on SWE-bench is not uniform across task types: on simple single-file bugs the gap is small, while on complex multi-file refactors requiring deep dependency understanding it is large. For a free model, 69.6% is a high floor. For production-critical paths where failing 1 in 10 tasks has real cost, Claude Code's accuracy advantage justifies the subscription.

What Developers Are Saying

Hacker News — Free Tier Discussion

"The qwen coder CLI gives you 1,000 free requests per day to the qwen coder model... Roo Code in VS Code and Qwen Coder in LM Studio is a decent local-only combo."

It's FOSS — Switching from Claude Code

An experienced sysadmin who switched from Claude Code to Qwen Code reported it "turns intent into correct, reviewable shell commands without any fees or demanding blind trust." The switch was driven by cost, data privacy from local deployment, and sufficient capability for Linux/sysadmin workflows.

InfoWorld — Honest Assessment

"Qwen Code is good but not great." Testing found the tool excels at new code generation but struggles with complex debugging: it "switched to a much simpler (and wrong) algorithm and couldn't figure out how to fix its own mistake."

GitHub Issues — Known Issues

Issue #1924: "Useless compression and buggy contextWindowSize" — users report that /compress sometimes fails to meaningfully reduce context on large sessions, and the contextWindowSize override is not always respected by all providers. Issue #882: quota exhaustion shows misleading error messages in the free tier.

Limitations

Technical Limitations (March 2026)

  • Complex multi-file debugging: Real-world performance diverges from benchmark scores on tasks requiring tight dependency tracking across 10+ files. The multi-stage edit correction helps but does not eliminate failures.
  • OAuth unavailable in non-interactive environments: Free tier OAuth requires a browser. CI, SSH sessions, and containers must use API key authentication. Free tier does not extend to headless use.
  • contextWindowSize bugs: Some providers do not respect the override, causing Qwen Code to miscalculate compression triggers. Workaround: set it explicitly and verify with /stats.
  • No parallel sub-agents: The task tool delegates to a single subagent at a time. Claude Code's agent teams (multiple sub-agents with shared task lists running in parallel) have no equivalent. For large parallel refactoring jobs, this is a real gap.
  • VS Code extension is beta: The Qwen Code Companion extension is newer than the Gemini CLI or Claude Code IDE integrations. Diff preview and real-time change display have reported rendering issues.
  • Model content restrictions: Qwen models include content filtering that affects Chinese political topics. For most development tasks this is irrelevant, but it can surface unexpectedly in compliance or legal tech contexts.
  • Web search requires Tavily API key: Unlike Gemini CLI's native Google Search grounding, web_search in Qwen Code requires a separate Tavily API key. The Tavily free tier covers 1,000 searches/month.

When to Use Qwen Code

JetBrains IDE Users

Qwen Code is the only major terminal agent with JetBrains companion support. IntelliJ, PyCharm, WebStorm users get native diff preview and real-time change display without leaving the IDE.

Self-Hosted Model Deployments

Point Qwen Code at a local Ollama or vLLM endpoint running Qwen3-Coder. All inference stays on-premise. Necessary for air-gapped environments or strict data residency requirements.

Multi-Model CI Workflows

Headless mode plus multi-provider config means you can run different models per task type in CI: cheap Qwen for static analysis, Claude for complex refactors, all via the same Qwen Code binary.

Cost-Sensitive Teams

Qwen3-Coder-480B costs $0.22/1M input tokens on DashScope. Claude Sonnet 4.5 costs $3.00/1M. For teams running thousands of agentic sessions per day, that 13x cost difference compounds fast.

Open-Source Auditability

Apache 2.0 license, 19.8K GitHub stars. You can read the ToolRegistry implementation, audit the permission system, fork and modify the tool schemas, or integrate qwen-code-core as a library in your own tooling.

Unix Pipeline Integration

git diff | qwen -p, tail -f app.log | qwen -p, any stdin pipe works. Composable with existing shell workflows. Run in cron jobs, GitHub Actions, or as part of custom Makefiles.

Priority                   | Best Choice              | Reason
Highest SWE-bench accuracy | Claude Code              | 80.8% vs 69.6%, an 11-point gap on hard tasks
JetBrains IDE              | Qwen Code                | Only major agent with a JetBrains companion plugin
Self-hosted inference      | Qwen Code                | Ollama/vLLM endpoint support
Multi-model flexibility    | Qwen Code                | Any OpenAI-compatible endpoint at runtime
Google Search grounding    | Gemini CLI               | Native, no separate API key
Parallel sub-agents        | Claude Code              | Agent teams not available in Qwen Code
Free daily requests        | Qwen Code or Gemini CLI  | Both offer 1,000 req/day free
Cheapest API calls         | Qwen Code (DashScope)    | $0.22/1M input vs $3+ for Claude
Sandboxing options         | Qwen Code                | Docker, Podman, and macOS Seatbelt all supported

Frequently Asked Questions

What file edit format does Qwen Code use?

Search-and-replace. The edit tool takes old_string (must match exactly once) and new_string. A multi-stage correction mechanism retries with fuzzy matching if the exact string is not found. For full file overwrites, write_file is used instead. Both tools show a diff before writing and require confirmation in DEFAULT mode.

What tools does the LLM actually see?

Thirteen built-in tools as OpenAI function definitions: read_file, write_file, edit, list_directory, glob, grep_search, run_shell_command, web_fetch, web_search, save_memory, todo_write, task, and skill. MCP servers add additional tools to the same registry. Run /tools --verbose to see the full JSON schemas for every registered tool.

How does Qwen Code manage context on large codebases?

Project context loads from hierarchical QWEN.md files. /compress summarizes history manually. chatCompression.contextPercentageThreshold triggers automatic compression. The @path syntax batch-reads files with git-aware filtering. For models with smaller effective context, override with contextWindowSize in generationConfig.

How does MCP integration work technically?

discoverMcpTools() runs at startup, connects to configured servers via Stdio/SSE/HTTP, fetches tool schemas, sanitizes them (removes $schema and additionalProperties), and registers tools in ToolRegistry. MCP tools are wrapped in DiscoveredMCPTool for confirmation logic. Tool name conflicts get server-prefix namespacing.

What approval modes exist and how do they differ?

PLAN: no modifications allowed. DEFAULT: approval required for all edit/write/exec tools. AUTO-EDIT: file edits auto-approved, shell commands need approval. YOLO: everything auto-approved. Set via --approval-mode flag or tools.approvalMode in settings.json. Individual tools can be permanently whitelisted with tools.allowed.

Can I run Qwen Code in CI without interactive auth?

Yes, but not with the free OAuth tier — OAuth requires a browser. Use a QWEN_API_KEY (DashScope) or any other provider API key. Set the key as a CI secret, configure it in settings.json or as an environment variable, and run with qwen -p "prompt" for headless execution.

How does Qwen3-Coder handle function calling differently from standard models?

Qwen3-Coder was specifically RL-trained on tool use tasks, which means it's more reliable than fine-tuned-only models at knowing when to call tools vs. when to respond directly. The model uses a Hermes-style tool calling format and was deployed with --tool-call-parser qwen3_coder on vLLM. It runs exclusively in non-thinking mode (no extended reasoning blocks), which keeps tool call latency low and outputs immediately deployable.

How does Qwen Code compare to Gemini CLI architecturally?

They share the same base architecture (Qwen Code is a fork of Gemini CLI). Divergences: Qwen Code adds enhanced parser support for Qwen models, broader provider backend support (Anthropic, Google, OpenAI all in one config), JetBrains integration, and Docker/Podman sandboxing. Gemini CLI adds native Google Search grounding, which Qwen Code replaces with the Tavily-powered web_search tool. The agent loop, tool dispatch, and MCP integration patterns are nearly identical.

Related

Add Semantic Search to Qwen Code via MCP

WarpGrep is a semantic codebase search MCP server. Add it to mcpServers in your Qwen Code settings.json and the agent finds the right files on the first tool call instead of glob-and-grep cycles.
