Claude Code Router: Route Claude Code to Any Model (2026 Guide)

Claude Code only talks to Claude out of the box. Claude Code Router proxies it to DeepSeek, Qwen, GLM, MiniMax, Gemini, or local models, and routes each request type to a different model. This guide covers the full config.json setup, the routing rules, and the edit-accuracy fix.

June 9, 2026 · 1 min read
What Claude Code Router Does

Claude Code speaks Anthropic's message format and, by default, talks only to Claude. Claude Code Router (@musistudio/claude-code-router) is an open-source proxy that intercepts those requests, rewrites them for whatever provider you point it at, and picks the model per request type. You launch the agent with ccr code instead of claude, and nothing else about your workflow changes.

The reason to bother is cost shape. A coding session is not one workload. Summarizing a diff, naming a branch, and compacting old context are throwaway tasks. Planning a refactor is a reasoning task. Applying a 40-line edit across three files is where quality actually matters. Paying frontier-model output rates for all of it is the default, and it is wasteful.

1 session: up to 6 different models routed by task type
60k: default longContext token threshold
MIT: open-source and free; you pay only for model APIs

Router vs. swapping the model wholesale

If you only want Claude Code to run on a single non-Claude model, you do not need a router. Set the base URL and key directly, as covered in the guide to using a different LLM with Claude Code. The router earns its place the moment you want different models for different request types. For the theory behind difficulty-based routing, see the explainer on what an LLM router is.

Install in 60 Seconds

Two global npm packages: Claude Code itself, and the router.

Install

npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router

The CLI is ccr. The commands you will actually use:

| Command | What it does |
| --- | --- |
| ccr code | Launch Claude Code through the router (your main entry point) |
| ccr ui | Open the web config editor instead of hand-writing JSON |
| ccr start / stop / restart | Control the background router service; restart after config edits |
| ccr status | Check whether the router service is running |
| ccr model | Interactive model selector from the CLI |

The config.json Structure

Configuration lives in ~/.claude-code-router/config.json. Two blocks matter: Providers (an array of where models come from) and Router (which model handles which request type). A minimal working file:

~/.claude-code-router/config.json

{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "deepseek,deepseek-chat",
    "longContextThreshold": 60000,
    "webSearch": "deepseek,deepseek-chat"
  }
}

Each provider entry needs four fields: name (a unique label you reference later), api_base_url (the full endpoint, not just the host), api_key, and models (the model IDs that provider serves). transformer is optional and covered below. Optional top-level fields include APIKEY to lock down the local router, PROXY_URL, API_TIMEOUT_MS, and HOST.
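A quick way to catch typos before restarting the router is to check the file against the rules just described. The sketch below is a hypothetical helper, not part of the ccr CLI: validate_config and its messages are invented for illustration, and it only checks the four required provider fields and that every Router value points at a declared provider and model.

```python
REQUIRED_PROVIDER_FIELDS = ("name", "api_base_url", "api_key", "models")


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    known_models = {}  # provider name -> set of model IDs it declares

    for index, provider in enumerate(config.get("Providers", [])):
        missing = [field for field in REQUIRED_PROVIDER_FIELDS if not provider.get(field)]
        if missing:
            problems.append(f"provider #{index} is missing: {', '.join(missing)}")
        name = provider.get("name")
        if name in known_models:
            problems.append(f"provider name '{name}' is used more than once")
        elif name:
            known_models[name] = set(provider.get("models") or [])

    for key, target in config.get("Router", {}).items():
        if not isinstance(target, str):
            continue  # longContextThreshold is a number, not a route
        provider_name, _, model = target.partition(",")
        if provider_name not in known_models:
            problems.append(f"Router.{key}: unknown provider '{provider_name}'")
        elif model not in known_models[provider_name]:
            problems.append(f"Router.{key}: '{model}' is not listed under '{provider_name}'")
    return problems


# Hypothetical config with one deliberate mistake: "think" names a model
# that the provider entry never declares.
sample = {
    "Providers": [
        {
            "name": "deepseek",
            "api_base_url": "https://api.deepseek.com/chat/completions",
            "api_key": "sk-xxx",
            "models": ["deepseek-chat"],
        }
    ],
    "Router": {
        "default": "deepseek,deepseek-chat",
        "think": "deepseek,deepseek-reasoner",
        "longContextThreshold": 60000,
    },
}

for problem in validate_config(sample):
    print(problem)
```

To run it against a real file, load ~/.claude-code-router/config.json with json.load and pass the resulting dict to validate_config.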

The Five Routing Rules (Plus Two)

The Router object is where the savings live. Each key takes a provider,model string. Claude Code Router inspects every outbound request and picks the model by these rules:

| Key | Fires when | Route it to |
| --- | --- | --- |
| default | General coding requests that match no other rule | Your workhorse coding model |
| background | Cheap throwaway tasks: diff summaries, titles, compaction | The smallest, cheapest model you have |
| think | Planning / reasoning mode | A reasoning model |
| longContext | Conversation exceeds longContextThreshold (default 60000 tokens) | A long-context model |
| webSearch | Web search tool calls | A model with strong tool use |
| image (beta) | Image-related requests | A vision-capable model |

The single highest-leverage rule is background. Claude Code fires background requests constantly to summarize and compact, and they do not need a frontier model. Pointing background at a small or open-source model removes a large chunk of token spend that was invisible before.

longContextThreshold is the knob most people miss

Below the threshold (60000 tokens by default), requests follow default or whichever other rule matches. Past it, they jump to longContext. If your default model already has a 1M-token window, you can raise the threshold or point both keys at the same model. If it does not, set longContext to a model that does; otherwise a long conversation can outgrow the model's window, and requests start failing or losing context.
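The selection logic can be pictured as a short chain of checks. This is a minimal sketch of the rules as described here, not the router's actual source: pick_route and its boolean flags are hypothetical, and the order in which the checks run (long context first, then background, think, web search) is an assumption.

```python
DEFAULT_LONG_CONTEXT_THRESHOLD = 60000


def pick_route(router, token_count, *, background=False, thinking=False, web_search=False):
    """Return the "provider,model" string for one request.

    A rule whose key is absent from the Router block falls through to "default".
    The check order below is an assumption made for illustration.
    """
    threshold = router.get("longContextThreshold", DEFAULT_LONG_CONTEXT_THRESHOLD)
    if token_count > threshold and "longContext" in router:
        return router["longContext"]
    if background and "background" in router:
        return router["background"]
    if thinking and "think" in router:
        return router["think"]
    if web_search and "webSearch" in router:
        return router["webSearch"]
    return router["default"]


router = {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "deepseek,deepseek-chat",
    "longContextThreshold": 60000,
}

print(pick_route(router, 12_000, thinking=True))  # a planning request, short context
print(pick_route(router, 85_000))                 # an ordinary request, long context
```

Raising longContextThreshold in the Router block moves the point at which the first check starts winning; nothing else in the chain changes.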

Adding Providers

A provider is one API endpoint and the models behind it. Add as many as you want; the Router rules mix and match across them. Here is DeepSeek plus OpenRouter as a two-provider setup:

Two providers, mixed routing

{
  "Providers": [
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "sk-or-xxx",
      "models": ["anthropic/claude-sonnet-4.6", "google/gemini-3.1-pro"],
      "transformer": { "use": ["openrouter"] }
    },
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default": "openrouter,anthropic/claude-sonnet-4.6",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "openrouter,google/gemini-3.1-pro",
    "webSearch": "openrouter,anthropic/claude-sonnet-4.6"
  }
}

For local models, point a provider at Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1/chat/completions) and list the model tags you have pulled. See the guide to running Claude Code with Ollama for the local-model caveats around tool calling.
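As a rough illustration, the snippet below builds such a provider entry and prints it as JSON ready to paste into the Providers array. The helper name ollama_provider, the "ollama" placeholder key, and the example model tag are all assumptions; only the endpoint URL comes from the text above.

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def ollama_provider(model_tags):
    """Build a Providers entry for a local Ollama server."""
    if not model_tags:
        raise ValueError("list at least one model tag you have pulled")
    return {
        "name": "ollama",
        "api_base_url": OLLAMA_CHAT_URL,
        # Assumption: a local Ollama server does not check the key, but the
        # provider entry still needs a non-empty value.
        "api_key": "ollama",
        "models": list(model_tags),
    }


# "qwen2.5-coder:latest" is a hypothetical example tag; use whatever
# `ollama list` reports on your machine.
print(json.dumps(ollama_provider(["qwen2.5-coder:latest"]), indent=2))
```

A Router rule then references it the same way as any hosted provider, for example "background": "ollama,qwen2.5-coder:latest".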

Routing to Open-Source Models

Any OpenAI-compatible endpoint serving open weights drops in as a single provider. Morph hosts Qwen 3.5, Qwen 3.6, MiniMax M2.7, and DeepSeek V4 Flash on one endpoint, so they all live under a single provider entry:

Open-source models via an OpenAI-compatible endpoint

{
  "name": "morph",
  "api_base_url": "https://api.morphllm.com/v1/chat/completions",
  "api_key": "sk-...",
  "models": [
    "morph-qwen35-397b",
    "morph-qwen36-27b",
    "morph-minimax27-230b",
    "morph-dsv4flash"
  ]
}

That maps cleanly onto the router rules: a small model on background, a large MoE on default, and a long-context model where the conversation grows. Whether to use open weights at all is its own decision; the cost and benchmark case is made in the guide to the best open-source coding model and on the per-model API pages.

| Router key | Model | Why |
| --- | --- | --- |
| background | morph-qwen36-27b | 27B is plenty for diff summaries and titles; cheapest per token |
| default | morph-qwen35-397b | 397B MoE handles general coding |
| think | morph-minimax27-230b | Strong reasoning at low output price |
| longContext | morph-dsv4flash | DeepSeek V4 Flash carries a long window at a near-zero rate |
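Turned into configuration, the table above becomes a Router block in which every value carries the provider name as a prefix. The sketch below generates that block; build_router is a hypothetical helper, and the provider name "morph" is assumed to match the name field of the provider entry shown earlier.

```python
import json

PROVIDER = "morph"  # must match the "name" of the provider entry

# One row per line of the mapping table.
MODEL_BY_RULE = {
    "background": "morph-qwen36-27b",
    "default": "morph-qwen35-397b",
    "think": "morph-minimax27-230b",
    "longContext": "morph-dsv4flash",
}


def build_router(provider, model_by_rule):
    """Prefix each model ID with the provider name to form "provider,model" routes."""
    if "default" not in model_by_rule:
        raise ValueError("a Router block needs a default route")
    return {rule: f"{provider},{model}" for rule, model in model_by_rule.items()}


print(json.dumps({"Router": build_router(PROVIDER, MODEL_BY_RULE)}, indent=2))
```

Rules left out of the mapping, such as webSearch, are simply absent from the generated block.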

Dynamic Switching and Subagent Routing

The static Router rules are defaults, not handcuffs. Two overrides:

Switch mid-session with /model

Inside Claude Code, type /model provider_name,model_name to move the current session to a different model on the fly. Useful when a task turns out harder than expected: bump from the cheap default up to a frontier model for one stretch, then drop back.

Pin a model per subagent

Start a subagent prompt with <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> to force that subagent onto a specific model. Run exploration subagents on a cheap model and reserve the expensive one for the subagent doing the actual edits.
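If you generate subagent prompts from a script, the tag is easy to prepend programmatically. This is a small sketch under stated assumptions: pin_subagent_model is a hypothetical helper, and putting the tag on its own first line, followed by a newline, is a formatting choice rather than a documented requirement.

```python
def pin_subagent_model(prompt: str, provider: str, model: str) -> str:
    """Prefix a subagent prompt with the CCR-SUBAGENT-MODEL routing tag."""
    if not provider or not model:
        raise ValueError("both provider and model are required")
    if "," in provider:
        # The tag body is "provider,model", so a comma in the provider
        # name would make the split ambiguous.
        raise ValueError("provider name must not contain a comma")
    tag = f"<CCR-SUBAGENT-MODEL>{provider},{model}</CCR-SUBAGENT-MODEL>"
    return f"{tag}\n{prompt}"


# Hypothetical exploration task pinned to a cheap model.
print(pin_subagent_model(
    "List every call site of parse_config and summarize how each one uses it.",
    "deepseek",
    "deepseek-chat",
))
```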

Per-subagent routing pairs naturally with how multi-agent systems already split work. Cheap parallel readers feeding a single expensive writer is the same shape as agent orchestration generally.

Transformers: Making Any API Speak Claude

Not every provider accepts Anthropic-shaped requests. Transformers are the adapters that reshape requests and responses so a given API behaves. The built-ins cover the common cases:

| Transformer | Use it for |
| --- | --- |
| deepseek / gemini / openrouter | Provider-specific request and response reshaping |
| tooluse | Coaxing reliable function calling out of models that need a nudge |
| maxtoken | Setting a custom max output token limit |
| reasoning | Handling reasoning-token output formats |
| enhancetool | Improving tool-call reliability |

Apply them per provider with "transformer": { "use": ["deepseek"] }, or per model inside a provider. You can also load a custom transformer from a JavaScript module for an endpoint nothing built-in covers.
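To make the provider-level versus model-level distinction concrete, here is a sketch of how the two could combine. The nesting shown, a key named after the model with its own "use" list inside the transformer object, is an assumption about the config shape, and transformers_for is a hypothetical resolver, not the router's own code; check the project README before relying on either.

```python
provider = {
    "name": "deepseek",
    "api_base_url": "https://api.deepseek.com/chat/completions",
    "api_key": "sk-xxx",
    "models": ["deepseek-chat", "deepseek-reasoner"],
    "transformer": {
        # Applies to every model served by this provider.
        "use": ["deepseek"],
        # Assumed per-model shape: extra transformers for one model only.
        "deepseek-chat": {"use": ["tooluse"]},
    },
}


def transformers_for(provider: dict, model: str) -> list[str]:
    """Return provider-wide transformers followed by any model-specific ones."""
    spec = provider.get("transformer", {})
    provider_wide = list(spec.get("use", []))
    model_only = list(spec.get(model, {}).get("use", []))
    return provider_wide + model_only


for model in provider["models"]:
    print(model, transformers_for(provider, model))
```

Under these assumptions, deepseek-chat gets the tool-calling nudge while deepseek-reasoner only gets the provider-wide reshaping.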

The One Thing That Breaks: Edit Accuracy

Routing changes which model writes the code. It does not change Claude Code's edit format, which was tuned for Claude's exact output style. Route the default model to DeepSeek, Qwen, or GLM and you will see more failed edits: wrong line numbers, dropped context lines, malformed search-and-replace hunks. The agent retries, burns tokens, and sometimes corrupts the file.
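A simplified model of an exact-match search-and-replace edit shows why small formatting drift is fatal. This is an illustrative sketch, not Claude Code's actual edit implementation: apply_search_replace and EditError are hypothetical names, and the rule that the search text must match exactly once is an assumption made for the demonstration.

```python
class EditError(ValueError):
    """Raised when an edit cannot be applied unambiguously."""


def apply_search_replace(original: str, search: str, replace: str) -> str:
    """Replace `search` with `replace`, requiring exactly one exact match."""
    occurrences = original.count(search)
    if occurrences == 0:
        raise EditError("search text not found; the edit cannot be applied")
    if occurrences > 1:
        raise EditError(f"search text matches {occurrences} places; the edit is ambiguous")
    return original.replace(search, replace, 1)


source = "def add(a, b):\n    return a + b\n"

# An edit that quotes the original line exactly applies cleanly.
print(apply_search_replace(source, "    return a + b", "    return a - b"))

# The same edit with the spaces around "+" dropped no longer matches anything.
try:
    apply_search_replace(source, "    return a+b", "    return a - b")
except EditError as error:
    print("edit failed:", error)
```

Each failure like the second one costs a retry, which is where the extra tokens go.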

The root cause is asking one model to both decide the change and emit a byte-perfect diff. Splitting those jobs fixes it. A fast-apply model takes the rough edit plus the original file and merges them deterministically, so the routed model only has to describe the change, not format it.

98%: Fast Apply merge accuracy
10,500 tok/s: apply throughput on morph-v3-fast
Any model: works regardless of which model you route to

Why this matters more after you route

The cheaper the model you route to, the rougher its raw diffs, and the more an apply layer pays for itself. It is the same lesson as routing in general: the scaffold around the model moves outcomes more than the model does. Background on the approach is in the guides to fast apply and to using a different LLM with Claude Code.

Frequently Asked Questions

What is Claude Code Router?

Claude Code Router (@musistudio/claude-code-router) is an open-source proxy that sits between the Claude Code CLI and model APIs. It translates Claude Code's Anthropic-format requests into the format each provider expects, then routes each request to a model you assign by scenario: default, background, reasoning, long-context, and web search. You start Claude Code through it with the command ccr code instead of claude.

How do I install Claude Code Router?

Install both packages globally with npm: npm install -g @anthropic-ai/claude-code and npm install -g @musistudio/claude-code-router. Then create ~/.claude-code-router/config.json with at least one provider and a Router block, and run ccr code to launch Claude Code through the router. ccr ui opens a web interface to edit the config without hand-writing JSON.

Can Claude Code Router use models other than Claude?

Yes, that is its purpose. It supports OpenRouter, DeepSeek, Ollama, Gemini, Volcengine, SiliconFlow, and any OpenAI-compatible endpoint. You add each as a Providers entry with name, api_base_url, api_key, and models, then reference provider,model in the Router rules. Open-source models like Qwen, GLM, MiniMax, and DeepSeek served on an OpenAI-compatible API work the same way.

What are the routing rules in Claude Code Router?

The Router object assigns a model to each request type: default (general coding), background (cheap, fast tasks like diff summaries), think (reasoning-heavy planning), longContext (conversations past the longContextThreshold, default 60000 tokens), webSearch, and image (beta). Each value is a string in the form provider,model. You can override any of them mid-session with the /model command.

Why do edits fail more often when I route Claude Code to another model?

Claude Code's search-and-replace edit format was tuned for Claude's output. Other models produce diffs with wrong line numbers, missing context, or malformed hunks, so edits fail and the agent retries. The fix is a dedicated apply layer: a fast-apply model takes the rough edit and the original file and merges them deterministically. Morph's Fast Apply runs this at roughly 10,500 tokens per second with around 98% accuracy, independent of which model you route the main agent to.

Does Claude Code Router cost anything?

The router itself is free and open-source under the MIT license. You pay only for the model APIs you route to. Routing background and simple tasks to small or open-source models while reserving a frontier model for hard edits can cut a Claude Code session's token bill by half or more, depending on how much of the session is easy work, without a visible quality drop on that easy work.

Route to Any Model. Keep Edits Clean.

Whichever model Claude Code Router sends your edits to, Morph Fast Apply merges them at ~98% accuracy and 10,500 tokens per second. Pair it with WarpGrep for semantic codebase search, free for 100k requests.