Claude Code Router: Route Claude Code to Any Model (2026 Guide)

What Claude Code Router Does

Claude Code speaks Anthropic's message format and, by default, talks only to Claude. Claude Code Router (@musistudio/claude-code-router) is an open-source proxy that intercepts those requests, rewrites them for whichever provider you point it at, and picks a model per request type. You launch the agent with ccr code instead of claude, and nothing else about your workflow changes.

The reason to bother is cost shape. A coding session is not one workload. Summarizing a diff, naming a branch, and compacting old context are throwaway tasks. Planning a refactor is a reasoning task. Applying a 40-line edit across three files is where quality actually matters. Paying frontier-model output rates for all of it is the default, and it is wasteful.

1 session: up to 6 different models routed by task type

60k: default longContext token threshold

MIT: open-source and free; you pay only for model APIs

Router vs. swapping the model wholesale

If you only want Claude Code to run on a single non-Claude model, you do not need a router. Set the base URL and key directly, as covered in the guide to using a different LLM with Claude Code. The router earns its place the moment you want different models for different request types. For the theory behind difficulty-based routing, see the explainer on what an LLM router is.

Install in 60 Seconds

Two global npm packages: Claude Code itself, and the router.

Install

npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router

The CLI is ccr. The commands you will actually use:

ccr Commands

Command	What it does
ccr code	Launch Claude Code through the router (your main entry point)
ccr ui	Open the web config editor instead of hand-writing JSON
ccr start / stop / restart	Control the background router service; restart after config edits
ccr status	Check whether the router service is running
ccr model	Interactive model selector from the CLI

The config.json Structure

Configuration lives in ~/.claude-code-router/config.json. Two blocks matter: Providers (an array of where models come from) and Router (which model handles which request type). A minimal working file:

~/.claude-code-router/config.json

{
  "Providers": [
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default": "deepseek,deepseek-chat",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "deepseek,deepseek-chat",
    "longContextThreshold": 60000,
    "webSearch": "deepseek,deepseek-chat"
  }
}

Each provider entry needs four fields: name (a unique label you reference later), api_base_url (the full endpoint, not just the host), api_key, and models (the model IDs that provider serves). transformer is optional and covered below. Optional top-level fields include APIKEY to lock down the local router, PROXY_URL, API_TIMEOUT_MS, and HOST.
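A typo in a provider name or model ID is an easy mistake in a hand-written file. The sketch below checks that every Router value points at a declared provider and a model that provider lists. validateConfig is a hypothetical helper, not part of claude-code-router, and it assumes only the Providers and Router shapes shown above.

```javascript
// validateConfig is a hypothetical helper, not part of claude-code-router.
// It returns a list of human-readable problems; an empty list means the
// Providers and Router blocks are consistent with each other.
function validateConfig(config) {
  const problems = [];
  const modelsByProvider = new Map();

  for (const provider of config.Providers ?? []) {
    const label = provider.name ?? "(unnamed)";
    // The four required fields named in the text above.
    for (const field of ["name", "api_base_url", "api_key", "models"]) {
      if (provider[field] === undefined) {
        problems.push(`provider ${label}: missing ${field}`);
      }
    }
    modelsByProvider.set(provider.name, new Set(provider.models ?? []));
  }

  for (const [key, value] of Object.entries(config.Router ?? {})) {
    if (typeof value !== "string") continue; // longContextThreshold is a number
    // Split on the first comma only: everything after it is the model ID.
    const comma = value.indexOf(",");
    if (comma === -1) {
      problems.push(`Router.${key}: expected "provider,model", got "${value}"`);
      continue;
    }
    const providerName = value.slice(0, comma);
    const model = value.slice(comma + 1);
    const models = modelsByProvider.get(providerName);
    if (!models) {
      problems.push(`Router.${key}: unknown provider "${providerName}"`);
    } else if (!models.has(model)) {
      problems.push(`Router.${key}: provider "${providerName}" does not list "${model}"`);
    }
  }
  return problems;
}
```

To use it, parse ~/.claude-code-router/config.json with JSON.parse and print whatever validateConfig returns before running ccr restart.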

The Five Routing Rules (Plus Two)

The Router object is where the savings live. Each routing key takes a provider,model string; longContextThreshold is the one exception and takes a token count. Claude Code Router inspects every outbound request and picks the model by these rules:

Router Keys

Key	Fires when	Route it to
default	General coding requests that match no other rule	Your workhorse coding model
background	Cheap throwaway tasks: diff summaries, titles, compaction	The smallest, cheapest model you have
think	Planning / reasoning mode	A reasoning model
longContext	Conversation exceeds longContextThreshold (default 60000 tokens)	A long-context model
webSearch	Web search tool calls	A model with strong tool use
image (beta)	Image-related requests	A vision-capable model

The single highest-leverage rule is background. Claude Code fires background requests constantly to summarize and compact, and they do not need a frontier model. Pointing background at a small or open-source model removes a large chunk of token spend that was invisible before.

longContextThreshold is the knob most people miss

Below the threshold (60000 tokens by default), requests follow default. Past it, they jump to longContext. If your default model already has a 1M-token window, you can raise the threshold or point both keys at the same model. If it does not, set longContext to a model that does, or Claude Code will silently truncate context once the conversation grows.
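The rules and the threshold can be read as a small decision function. This is a sketch of the behavior described above, not the router's actual source: the request descriptor fields (tokenCount, isBackground, isThinking, usesWebSearch) are hypothetical, and the precedence order, with the long-context check first, is an assumption.

```javascript
// Default taken from the text above; everything else here is illustrative.
const DEFAULT_LONG_CONTEXT_THRESHOLD = 60000;

// request: hypothetical descriptor of one outbound call.
// router:  the Router block from config.json.
function pickRoute(request, router) {
  const threshold = router.longContextThreshold ?? DEFAULT_LONG_CONTEXT_THRESHOLD;

  // "Exceeds" is read strictly: exactly 60000 tokens still follows the other rules.
  if (request.tokenCount > threshold && router.longContext) return router.longContext;
  if (request.isBackground && router.background) return router.background;
  if (request.isThinking && router.think) return router.think;
  if (request.usesWebSearch && router.webSearch) return router.webSearch;

  // Anything that matches no other rule, or whose key is unset, uses default.
  return router.default;
}
```

One consequence of this reading: a key you leave out of the Router block simply falls through to default, which is why pointing longContext at a model with a large enough window matters once conversations grow.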

Static Rules vs Learned Routing

The Router block routes by request type: background, think, long-context. It cannot route by request difficulty, because "fix this typo" and "design an event-sourcing layer" arrive as the same default request. Sending both to one model means overpaying on the easy one or under-serving the hard one. Splitting them needs a classifier in front of the routing decision.

That is what Morph's Router does. Trained on millions of vibecoding prompts, it scores each prompt on three axes: difficulty (easy / medium / hard / needs_info), ambiguity, and domain (general / summary / coding / design / data). It then recommends a model from a catalog spanning Anthropic, OpenAI, Gemini, and DeepSeek. One call to /v1/router/multimodel returns the model to use:

Morph Router: pick the model by prompt difficulty

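// Ask Morph's Router which model fits this prompt.
const res = await fetch("https://api.morphllm.com/v1/router/multimodel", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.MORPH_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "Add error handling to this function",
    allowed_providers: ["anthropic"], // restrict the catalog to one provider
    policy: "cost_efficient",
  }),
});
// Fail loudly on a bad key or quota error instead of destructuring an error body.
if (!res.ok) throw new Error(`Router request failed: HTTP ${res.status}`);
const { model } = await res.json();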

Classification runs in about 180ms and costs $0.005 per request, with a policy knob (balanced, cost_efficient, capability_heavy, domain_skills) and allowed_models / allowed_providers filters to constrain the catalog. The two compose: Claude Code Router exposes a CUSTOM_ROUTER_PATH hook for your own routing logic, so you can call Morph's Router there and let the difficulty score, not a fixed rule, choose the model for each turn. The mechanics of difficulty-based routing are covered in what an LLM router is.
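Composing the two could look like the sketch below. It assumes that CUSTOM_ROUTER_PATH loads a module exporting an async (req, config) function that returns a provider,model string, with null meaning "fall back to the Router block"; check that contract against the router's README. MODEL_TO_ROUTE is hypothetical, and its keys are placeholders for whatever model names the Morph Router returns for your account.

```javascript
// custom-router.js (sketch). MODEL_TO_ROUTE is a hypothetical lookup from a
// model name returned by Morph's Router to a "provider,model" string that
// exists in your own Providers block.
const MODEL_TO_ROUTE = {
  "claude-sonnet-4.6": "openrouter,anthropic/claude-sonnet-4.6",
  "deepseek-chat": "deepseek,deepseek-chat",
};

// Anthropic-format message content is either a string or an array of blocks.
function lastUserText(messages) {
  const last = [...messages].reverse().find((m) => m.role === "user");
  if (!last) return "";
  if (typeof last.content === "string") return last.content;
  if (!Array.isArray(last.content)) return "";
  return last.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// fetchImpl is injectable so the function can be exercised without a network.
async function router(req, config, fetchImpl = fetch) {
  const input = lastUserText(req.body?.messages ?? []);
  if (!input) return null;

  try {
    const res = await fetchImpl("https://api.morphllm.com/v1/router/multimodel", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.MORPH_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ input, policy: "cost_efficient" }),
    });
    if (!res.ok) return null;
    const { model } = await res.json();
    return MODEL_TO_ROUTE[model] ?? null; // unmapped model: use the static rules
  } catch {
    return null; // never block a coding turn on the classifier
  }
}

if (typeof module !== "undefined") module.exports = router;
```

Returning null on every failure path is a deliberate choice: the classifier adds a network hop to each turn, so a timeout or an unmapped model should degrade to the static Router rules rather than stall the session.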

Adding Providers

A provider is one API endpoint and the models behind it. Add as many as you want; the Router rules mix and match across them. DeepSeek and OpenRouter as a two-provider setup:

Two providers, mixed routing

{
  "Providers": [
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "sk-or-xxx",
      "models": ["anthropic/claude-sonnet-4.6", "google/gemini-3.1-pro"],
      "transformer": { "use": ["openrouter"] }
    },
    {
      "name": "deepseek",
      "api_base_url": "https://api.deepseek.com/chat/completions",
      "api_key": "sk-xxx",
      "models": ["deepseek-chat", "deepseek-reasoner"],
      "transformer": { "use": ["deepseek"] }
    }
  ],
  "Router": {
    "default": "openrouter,anthropic/claude-sonnet-4.6",
    "background": "deepseek,deepseek-chat",
    "think": "deepseek,deepseek-reasoner",
    "longContext": "openrouter,google/gemini-3.1-pro",
    "webSearch": "openrouter,anthropic/claude-sonnet-4.6"
  }
}

For local models, point a provider at Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1/chat/completions) and list the model tags you have pulled. See running Claude Code with Ollama for the local-model caveats around tool calling.
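Rather than typing model tags by hand, you can ask the local server what it has pulled. The sketch below reads Ollama's /api/tags listing and builds a Providers entry from it. buildOllamaProvider is a hypothetical helper, and the "ollama" api_key is a placeholder on the assumption that a local server does not check it.

```javascript
// Hypothetical helper: build a Providers entry from locally pulled Ollama models.
// fetchImpl is injectable so the function can be exercised without a running server.
async function buildOllamaProvider(host = "http://localhost:11434", fetchImpl = fetch) {
  // GET /api/tags lists local models as { models: [{ name: "..." }, ...] }.
  const res = await fetchImpl(`${host}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const { models = [] } = await res.json();

  return {
    name: "ollama",
    api_base_url: `${host}/v1/chat/completions`, // OpenAI-compatible endpoint
    api_key: "ollama", // placeholder value, assumed to be ignored locally
    models: models.map((m) => m.name),
  };
}
```

Append the returned object to the Providers array, then reference its tags in the Router block as ollama,<tag>.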

Routing to Open-Source Models

Any OpenAI-compatible endpoint serving open weights drops in as a single provider. Morph hosts GLM-5.2, Qwen 3.6, MiniMax M2.7, and DeepSeek V4 Flash on one endpoint, so they all live under a single provider entry:

Open-source models via an OpenAI-compatible endpoint

{
  "name": "morph",
  "api_base_url": "https://api.morphllm.com/v1/chat/completions",
  "api_key": "sk-...",
  "models": [
    "morph-glm52-744b",
    "morph-qwen36-27b",
    "morph-minimax27-230b",
    "morph-dsv4flash"
  ]
}

That maps cleanly onto the router rules: a small model on background, a large MoE on default, and a long-context model where the conversation grows. Whether to use open weights at all is a separate decision; the cost and benchmark case is made in the best open-source coding model guide and the per-model API pages.

A practical open-source routing table

Router key	Model	Why
background	morph-qwen36-27b	27B is plenty for diff summaries and titles; cheapest per token
default	morph-glm52-744b	744B MoE handles general coding
think	morph-minimax27-230b	Strong reasoning at low output price
longContext	morph-dsv4flash	DeepSeek V4 Flash carries a long window at a near-zero rate
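The table translates directly into a config file. The sketch below assembles one from the provider entry and the four rows above and prints it as JSON. It adds nothing beyond those values, so webSearch and longContextThreshold are left unset; the api_key stays a placeholder for you to fill in.

```javascript
// Provider entry as shown earlier; the api_key is a placeholder to replace.
const provider = {
  name: "morph",
  api_base_url: "https://api.morphllm.com/v1/chat/completions",
  api_key: "sk-...",
  models: ["morph-glm52-744b", "morph-qwen36-27b", "morph-minimax27-230b", "morph-dsv4flash"],
};

// One row of the routing table per key.
const routingTable = {
  background: "morph-qwen36-27b",
  default: "morph-glm52-744b",
  think: "morph-minimax27-230b",
  longContext: "morph-dsv4flash",
};

// Router values use the "provider,model" form.
const Router = Object.fromEntries(
  Object.entries(routingTable).map(([key, model]) => [key, `${provider.name},${model}`])
);

const config = { Providers: [provider], Router };
console.log(JSON.stringify(config, null, 2));
```

Save the output to ~/.claude-code-router/config.json and run ccr restart to pick it up.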

Dynamic Switching and Subagent Routing

The static Router rules are defaults, not handcuffs. Two overrides:

Switch mid-session with /model

Inside Claude Code, type /model provider_name,model_name to move the current session to a different model on the fly. Useful when a task turns out harder than expected: bump from the cheap default up to a frontier model for one stretch, then drop back.

Pin a model per subagent

Start a subagent prompt with <CCR-SUBAGENT-MODEL>provider,model</CCR-SUBAGENT-MODEL> to force that subagent onto a specific model. Run exploration subagents on a cheap model and reserve the expensive one for the subagent doing the actual edits.
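If you generate subagent prompts from code, the tag is easy to add mechanically. A minimal sketch follows; withSubagentModel is a hypothetical helper name, and the provider and model arguments must match entries in your own Providers block.

```javascript
// Hypothetical helper: prefix a subagent prompt with the routing tag.
// The tag must be the first thing in the prompt for the router to see it.
function withSubagentModel(provider, model, prompt) {
  return `<CCR-SUBAGENT-MODEL>${provider},${model}</CCR-SUBAGENT-MODEL>\n${prompt}`;
}

// Example: a cheap reader for exploration, a stronger model for the edit itself.
const explorePrompt = withSubagentModel(
  "deepseek", "deepseek-chat", "List every file that imports the auth module."
);
const editPrompt = withSubagentModel(
  "openrouter", "anthropic/claude-sonnet-4.6", "Apply the refactor to those files."
);
```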

Per-subagent routing pairs naturally with how multi-agent systems already split work. Cheap parallel readers feeding a single expensive writer is the same shape as agent orchestration generally.

Transformers: Making Any API Speak Claude

Not every provider accepts Anthropic-shaped requests. Transformers are the adapters that reshape requests and responses so a given API behaves. The built-ins cover the common cases:

Built-in transformers

Transformer	Use it for
deepseek / gemini / openrouter	Provider-specific request and response reshaping
tooluse	Coaxing reliable function calling out of models that need a nudge
maxtoken	Setting a custom max output token limit
reasoning	Handling reasoning-token output formats
enhancetool	Improving tool-call reliability

Apply them per provider with "transformer": { "use": ["deepseek"] }, or per model inside a provider. You can also load a custom transformer from a JavaScript module for an endpoint nothing built-in covers.
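The per-model form is easiest to see next to the per-provider one. In the sketch below, the nested shape (a model ID used as a key alongside use) is an assumption about the config format, so verify it against the router's README. transformersFor is a hypothetical helper showing one plausible reading: provider-wide transformers apply first, then the model-specific ones.

```javascript
// Provider entry written as a JavaScript object; the same shape would go in config.json.
// Assumption: a model ID key next to "use" scopes transformers to that model only.
const provider = {
  name: "deepseek",
  api_base_url: "https://api.deepseek.com/chat/completions",
  api_key: "sk-xxx",
  models: ["deepseek-chat", "deepseek-reasoner"],
  transformer: {
    use: ["deepseek"],                     // applies to every model on this provider
    "deepseek-chat": { use: ["tooluse"] }, // applies to deepseek-chat only
  },
};

// Hypothetical helper: list the transformers that would apply to one model,
// assuming provider-wide entries run before model-specific ones.
function transformersFor(providerEntry, model) {
  const t = providerEntry.transformer ?? {};
  return [...(t.use ?? []), ...(t[model]?.use ?? [])];
}

console.log(transformersFor(provider, "deepseek-chat"));
console.log(transformersFor(provider, "deepseek-reasoner"));
```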

The One Thing That Breaks: Edit Accuracy

Routing changes which model writes the code. It does not change Claude Code's edit format, which was tuned for Claude's exact output style. Route the default model to DeepSeek, Qwen, or GLM and you will see more failed edits: wrong line numbers, dropped context lines, malformed search-and-replace hunks. The agent retries, burns tokens, and sometimes corrupts the file.

The root cause is asking one model to both decide the change and emit a byte-perfect diff. Splitting those jobs fixes it. A fast-apply model takes the rough edit plus the original file and merges them deterministically, so the routed model only has to describe the change, not format it.

98%: Fast Apply merge accuracy

10,500 tok/s: apply throughput on morph-v3-fast

Any model: works regardless of which model you route to
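In code, the split between deciding a change and merging it could look like the sketch below. It assumes Fast Apply is reached through the same OpenAI-compatible chat completions endpoint shown earlier, with the model set to morph-v3-fast. The tagged message layout (instruction, code, update) is an assumption to confirm against Morph's docs, and applyEdit is a hypothetical helper name.

```javascript
// Hypothetical helper: merge a rough edit into the original file via an apply model.
// fetchImpl is injectable so the function can be exercised without a network.
async function applyEdit({ instruction, original, update }, fetchImpl = fetch) {
  // Assumed message layout: what to do, the current file, and the rough edit.
  const content =
    `<instruction>${instruction}</instruction>\n` +
    `<code>${original}</code>\n` +
    `<update>${update}</update>`;

  const res = await fetchImpl("https://api.morphllm.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.MORPH_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "morph-v3-fast",
      messages: [{ role: "user", content }],
    }),
  });
  if (!res.ok) throw new Error(`Apply request failed: HTTP ${res.status}`);

  // OpenAI-compatible response shape: the merged file is the message content.
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The routed model only has to produce the update argument, a rough description of the change in code form. Writing the returned string back to disk replaces the fragile search-and-replace step.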

Why this matters more after you route

The cheaper the model you route to, the rougher its raw diffs, and the more an apply layer pays for itself. It is the same lesson as routing in general: the scaffold around the model moves outcomes more than the model does. Background on the approach is in fast apply and using a different LLM with Claude Code.

Frequently Asked Questions

What is Claude Code Router?

Claude Code Router (@musistudio/claude-code-router) is an open-source proxy that sits between the Claude Code CLI and model APIs. It translates Claude Code's Anthropic-format requests into the format each provider expects, then routes each request to a model you assign by scenario: default, background, reasoning, long-context, and web search. You start Claude Code through it with the command ccr code instead of claude.

How do I install Claude Code Router?

Install both packages globally with npm: npm install -g @anthropic-ai/claude-code and npm install -g @musistudio/claude-code-router. Then create ~/.claude-code-router/config.json with at least one provider and a Router block, and run ccr code to launch Claude Code through the router. ccr ui opens a web interface to edit the config without hand-writing JSON.

Can Claude Code Router use models other than Claude?

Yes, that is its purpose. It supports OpenRouter, DeepSeek, Ollama, Gemini, Volcengine, SiliconFlow, and any OpenAI-compatible endpoint. You add each as a Providers entry with name, api_base_url, api_key, and models, then reference provider,model in the Router rules. Open-source models like Qwen, GLM, MiniMax, and DeepSeek served on an OpenAI-compatible API work the same way.

What are the routing rules in Claude Code Router?

The Router object assigns a model to each request type: default (general coding), background (cheap, fast tasks like diff summaries), think (reasoning-heavy planning), longContext (conversations past the longContextThreshold, default 60000 tokens), webSearch, and image (beta). Each value is a string in the form provider,model. You can also switch the session to a different model at any point with the /model command.

Why do edits fail more often when I route Claude Code to another model?

Claude Code's search-and-replace edit format was tuned for Claude's output. Other models produce diffs with wrong line numbers, missing context, or malformed hunks, so edits fail and the agent retries. The fix is a dedicated apply layer: a fast-apply model takes the rough edit and the original file and merges them deterministically. Morph's Fast Apply runs this at roughly 10,500 tokens per second with around 98% accuracy, independent of which model you route the main agent to.

Does Claude Code Router cost anything?

The router itself is free and open-source under the MIT license. You pay only for the model APIs you route to. Routing background and simple tasks to small or open-source models while reserving a frontier model for hard edits can cut a Claude Code session's token bill by half or more without a visible quality drop on easy work, though the exact saving depends on how much of the session is background and simple traffic.

Route to Any Model. Keep Edits Clean.

Whichever model Claude Code Router sends your edits to, Morph Fast Apply merges them at ~98% accuracy and 10,500 tokens per second. Pair it with WarpGrep for semantic codebase search, free for 100k requests.
