
TL;DR
Point Claude Code at any OpenAI-compatible API by setting two environment variables (and clearing the Anthropic key so it doesn't conflict):

```shell
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""
```

This works with OpenRouter (200+ models), LiteLLM (any provider), and Z.AI. For local models like Ollama, you need LiteLLM as a translation layer, since the two sides speak different protocols.
Why Developers Switch to Different Models
Claude Code ships locked to Anthropic's API. For most people, that's fine. But you'll want an escape hatch when:
- Rate limits hit hard — Anthropic's per-minute caps are real. During crunch time, you'll hit them. One team we talked to started routing overflow to GPT-4o just to keep shipping.
- Budget constraints — Claude Sonnet runs about $3/million input tokens. GPT-4o-mini is $0.15. For bulk refactoring or test generation, that 20x difference matters.
- Air-gapped environments — Defense contractors and healthcare orgs often can't send code to external APIs. Period.
- Curious about alternatives — Qwen3-Coder is surprisingly good at Python. Gemini's 1M context handles massive monorepos. You won't know until you try.
A setup we've seen work: GPT-4o for architecture discussions and planning (it's good at that), Claude for implementation (still the best at tool use), and a local 14B model for quick questions while offline. The proxy layer makes switching trivial.
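That 20x price gap compounds on bulk work. A back-of-envelope sketch using the rates quoted above ($3.00 vs $0.15 per million input tokens); the 50M-token job size is a made-up example:

```shell
# Rough cost estimate: tokens / 1M * rate-per-million-tokens
cost() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "%.2f\n", t / 1000000 * r }'
}
cost 50000000 3.00   # Claude Sonnet input:  150.00
cost 50000000 0.15   # GPT-4o-mini input:      7.50
```

At that scale the cheaper model turns a $150 job into under $8, which is why overflow routing for mechanical tasks is so common.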
Fair Warning
Claude Code was purpose-built for Claude models. According to Anthropic's engineering blog, the agent relies heavily on tool calling patterns that Claude handles natively. Other models can struggle with:
- Multi-step tool chains (especially the "read file → think → edit file" loop)
- Generating diffs that actually apply cleanly
- Extended thinking and artifacts (Claude-specific features in any case)
That said, GPT-4o and Gemini Pro are solid alternatives for most workflows.
Method 1: Direct API Configuration
The simplest approach works with providers that offer Anthropic-compatible endpoints. No proxy needed.
Providers with Native Anthropic API Support
| Provider | Base URL | Notes |
|---|---|---|
| OpenRouter | https://openrouter.ai/api | 200+ models, Anthropic skin built-in |
| Z.AI (GLM) | https://api.z.ai/api/anthropic | Chinese models, global access |
| Amazon Bedrock | Via LiteLLM proxy | Enterprise, requires IAM setup |
| Azure OpenAI | Via LiteLLM proxy | Enterprise, requires configuration |
Step-by-Step Setup
1. Get your API key from your chosen provider.
2. Set environment variables in your terminal:
Terminal Configuration

```shell
# For OpenRouter
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
# Prevent Claude Code from trying Anthropic auth
export ANTHROPIC_API_KEY=""
```

3. Make it permanent by adding to your shell profile:
Shell Profile (~/.zshrc)

```shell
# Add to ~/.bashrc or ~/.zshrc (fish users: use set -gx in ~/.config/fish/config.fish)
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""
```

4. Verify the connection:

```shell
# Restart your terminal, then run Claude Code
claude
# Inside Claude Code, check status
/status
```

Common Mistake
Don't put these in a project .env file. Claude Code doesn't read standard .env files. Use your shell profile or set them in the terminal directly.
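Since Claude Code only sees what your shell actually exports, a quick self-check before launching saves debugging time. A sketch; the function name check_claude_env is my own, not part of Claude Code:

```shell
# Verify the override variables are exported and sane
check_claude_env() {
  [ -n "${ANTHROPIC_BASE_URL:-}" ] || { echo "ANTHROPIC_BASE_URL is not set"; return 1; }
  [ -n "${ANTHROPIC_AUTH_TOKEN:-}" ] || { echo "ANTHROPIC_AUTH_TOKEN is not set"; return 1; }
  # A trailing slash on the base URL is a common cause of failed requests
  case "$ANTHROPIC_BASE_URL" in
    */) echo "warning: trailing slash on ANTHROPIC_BASE_URL"; return 1 ;;
  esac
  echo "ok"
}
# Usage: check_claude_env && claude
```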
Method 2: LiteLLM Proxy
LiteLLM acts as a universal translator between Claude Code and any LLM provider. It handles the API format conversion automatically.
Use this method when:
- Your provider only offers OpenAI-format APIs
- You want to route between multiple providers with fallbacks
- You need usage tracking and cost controls
- You're connecting to self-hosted models
Quick Setup
LiteLLM Installation

```shell
# Install LiteLLM
pip install litellm
# Create config file
cat > litellm_config.yaml << 'EOF'
model_list:
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4.5
      api_key: sk-or-v1-your-key
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-your-openai-key
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: sk-or-v1-your-key
EOF
# Start the proxy
litellm --config litellm_config.yaml --port 4000
```

Connect Claude Code to LiteLLM

```shell
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234"  # Any value works locally
export ANTHROPIC_MODEL="claude-sonnet-4-5-20250929"  # Or gpt-4o, qwen3-coder
# Now run Claude Code
claude
```

The proxy translates requests on the fly. Claude Code thinks it's talking to Anthropic, but your requests go to whatever model you configured.
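You can smoke-test the proxy before pointing Claude Code at it. A hedged sketch: it assumes LiteLLM is running on port 4000 and serves the Anthropic-style /v1/messages route (check your LiteLLM version's docs), and the curl call is guarded so it just reports failure when the proxy is down:

```shell
# Anthropic Messages API-style request body (model name from the config above)
BODY='{"model":"claude-sonnet-4-5-20250929","max_tokens":32,"messages":[{"role":"user","content":"ping"}]}'
# POST it through the proxy; prints the JSON reply, or a notice if unreachable
curl -s http://localhost:4000/v1/messages \
  -H "x-api-key: sk-1234" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d "$BODY" || echo "proxy not reachable on :4000"
```

If this returns a JSON message object, Claude Code will work through the same URL.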
Shell Aliases for Fast Switching
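Besides one-alias-per-model, a single bash/zsh function that takes the model name as an argument also works. This is a sketch; CLAUDE_BIN is a hypothetical override (not a real Claude Code variable) so the function can be exercised with a stand-in binary:

```shell
# claude-with MODEL [args...] -> launch Claude Code with ANTHROPIC_MODEL set
claude-with() {
  local model="$1"; shift
  ANTHROPIC_MODEL="$model" "${CLAUDE_BIN:-claude}" "$@"
}
# Usage: claude-with gpt-4o   or   claude-with qwen3-coder
```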
Quick Model Switching

```shell
# Add to ~/.zshrc or ~/.bashrc
alias claude-gpt='ANTHROPIC_MODEL=gpt-4o claude'
alias claude-qwen='ANTHROPIC_MODEL=qwen3-coder claude'
alias claude-local='ANTHROPIC_MODEL=ollama/codellama claude'
# Usage
claude-gpt   # Starts Claude Code with GPT-4o
claude-qwen  # Starts Claude Code with Qwen3-Coder
```

Method 3: OpenRouter Direct Connection
OpenRouter provides an "Anthropic skin" that speaks Claude Code's native protocol. No proxy needed, direct connection with access to 200+ models.
Setup
```shell
# Get your key from openrouter.ai/keys
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""
# Start Claude Code
claude
```

Selecting Models
Use the /model command inside Claude Code to switch between OpenRouter models:
```shell
# Inside Claude Code
/model anthropic/claude-sonnet-4    # Claude Sonnet
/model openai/gpt-4o                # GPT-4o
/model google/gemini-2.0-flash      # Gemini 2.0 Flash
/model qwen/qwen3-coder             # Qwen3-Coder
```

OpenRouter Pricing
OpenRouter often beats direct API pricing through volume aggregation. Check openrouter.ai/models for current rates. Some models are free for limited use.
Method 4: anyclaude Wrapper
anyclaude is a drop-in wrapper from Coder that makes multi-provider switching dead simple. No proxy server, no config files—just environment variables.
anyclaude Setup

```shell
# Install with your package manager
bun add -g anyclaude  # or npm, pnpm
# Set your keys
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
export XAI_API_KEY="..."
# Run with any model using prefixes
anyclaude --model openai/gpt-4o
anyclaude --model google/gemini-2.0-flash
anyclaude --model xai/grok-2
```

The prefix tells anyclaude which provider to route to. It handles the API translation under the hood. For OpenRouter models, set OPENAI_API_URL to OpenRouter's endpoint.
This is probably the fastest way to test different models without committing to a full proxy setup. The tradeoff: less flexibility than LiteLLM for complex routing rules.
Method 5: Local LLMs with Ollama
Running models locally eliminates API costs and keeps your code private. The tradeoff: you need decent hardware and most local models don't support tool calling well.
Hardware Requirements
| Model Size | VRAM Needed | Example Models |
|---|---|---|
| 7B parameters | 8GB | CodeLlama-7B, Qwen2.5-Coder-7B |
| 14B parameters | 16GB | Qwen2.5-Coder-14B |
| 32B parameters | 24GB | Qwen2.5-Coder-32B |
| 70B+ parameters | 48GB+ | CodeLlama-70B, DeepSeek-Coder-V2 |
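The table above assumes roughly 16-bit weights. With 4-bit quantization (the common Ollama default), a rule of thumb is about 0.5 bytes per parameter plus ~20% for the KV cache and runtime buffers. This is an approximation of my own, not a vendor spec:

```shell
# vram_gb BILLIONS_OF_PARAMS -> rough VRAM need in GB for a Q4-quantized model
vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 * 1.2 }'
}
vram_gb 14   # prints 8.4: a quantized 14B fits in far less than 16GB
vram_gb 32   # prints 19.2
```

In practice, longer context windows inflate the KV cache, so treat these as floors rather than ceilings.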
Setup with Ollama + LiteLLM
Local Model Setup

```shell
# 1. Install and start Ollama
brew install ollama  # macOS
ollama serve
# 2. Pull a coding model
ollama pull qwen2.5-coder:14b
# 3. Configure LiteLLM to use Ollama
cat > litellm_config.yaml << 'EOF'
model_list:
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: ollama/qwen2.5-coder:14b
      api_base: http://localhost:11434
EOF
# 4. Start proxy
litellm --config litellm_config.yaml --port 4000
# 5. Connect Claude Code
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="local"
claude
```

Tool Calling Limitations
Most local models have weak or no tool calling support. This means Claude Code can generate code suggestions but can't automatically read files, run commands, or apply edits. You'll need to copy-paste more manually.
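Before blaming the model, confirm Ollama is actually serving and the model is pulled. /api/tags is Ollama's endpoint for listing local models; the call below is guarded so it degrades to a notice when the server is down:

```shell
# List models the local Ollama server has pulled (JSON), or report it's down
curl -s http://localhost:11434/api/tags || echo "Ollama is not running on :11434"
```

If your model isn't in the list, LiteLLM will accept requests but every completion will fail.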
Model Compatibility Matrix
Claude Code's full feature set requires tool calling, large context windows, and reliable instruction following. Here's how popular models stack up:
| Model | Tool Calling | Context | Edit Accuracy | Best For |
|---|---|---|---|---|
| Claude Sonnet 4 | Excellent | 200K | 98% | Full Claude Code experience |
| GPT-4o | Good | 128K | 90% | Planning, complex reasoning |
| GPT-4o-mini | Good | 128K | 85% | Cost-efficient tasks |
| Gemini 2.0 Flash | Good | 1M | 88% | Large codebase analysis |
| Qwen3-Coder | Limited | 128K | 82% | Code generation only |
| DeepSeek-V3 | Limited | 128K | 80% | Explanations, generation |
| Local 7B models | Poor | 8-32K | 60% | Quick prototyping only |
"Edit Accuracy" measures how often the model's suggested changes apply cleanly without manual intervention. Lower accuracy means more failed edits and retry loops.
Fixing Common Problems
These come up constantly when troubleshooting alternative model setups. Most are fixable in under a minute.
"Cannot find matching context" / Edits Keep Failing
You ask for a change. The model generates what looks like valid code. Claude Code rejects it. This is the most common issue—GitHub is full of these reports.
The root cause: Claude Code's edit system expects diffs in a specific format. Claude generates them correctly because Anthropic trained it that way. Other models approximate but miss details—wrong line numbers, bad whitespace handling, mismatched context lines.
Quick fixes:
- Ask for smaller edits. "Change the function signature" works better than "refactor this file"
- Say "read the file first" before asking for edits. Forces the model to refresh its view
- Use /compact between major changes to reset context
Proper fix: Use a dedicated code editing layer (like the Morph MCP) that intercepts edit operations and applies them with a purpose-built merge model. It takes raw model output and hits a 98% success rate, versus 70-80% for direct application.
Sessions Degrade Over Time
Start fresh, everything works. An hour in, suggestions get worse and latency spikes. Happens faster with non-Claude models.
This is context pollution. Claude Code searches files sequentially—each search loads more content into context, even irrelevant matches. The signal-to-noise ratio drops until the model is reasoning over garbage.
- Run /compact proactively, every 10-15 turns, not just when prompted
- Use /clear when switching to unrelated tasks
- Be specific in searches: "authentication error handler in auth.ts" beats "where do we handle errors"
Model Just Talks Instead of Acting
You: "Read the config file and update the timeout."
Model: "I would read the config file and then update the timeout to..."
Dead giveaway that tool calling isn't working. The model can't execute operations, so it describes what it would do.
- Verify your model supports function/tool calling (check OpenRouter's model list)
- GPT-4o, Gemini Pro, Claude: work. Qwen, most local models: don't
- If using local models, accept that you're getting a generation-only experience
Authentication/Connection Errors
```shell
# Most common issue: a conflicting API key
export ANTHROPIC_API_KEY=""  # Must be empty, not unset
# Check your variables are set correctly
echo "Base URL: $ANTHROPIC_BASE_URL"
echo "Auth Token: $ANTHROPIC_AUTH_TOKEN"
# Common mistake: a trailing slash breaks it
# Wrong: https://openrouter.ai/api/
# Right: https://openrouter.ai/api
# Reload your shell configuration after changes
source ~/.zshrc  # or ~/.bashrc
```

Frequently Asked Questions
Can Claude Code work with any LLM?
Claude Code can connect to any OpenAI-compatible API, but full functionality requires tool/function calling support. Models without tool calling can only provide text responses, not execute file operations or commands.
What's the minimum context window for Claude Code?
Claude Code expects at least 128K tokens for its auto-compacting feature to work properly. With smaller context windows, you'll need to run /compact manually to prevent losing important context.
Will using GPT-4 with Claude Code cost less than Claude?
It depends on the model and provider. GPT-4o is roughly comparable in price to Claude Sonnet. OpenRouter often offers lower prices through volume aggregation. Local models eliminate API costs entirely but require capable hardware.
Why do my edits fail more often with alternative models?
Claude Code's edit system was optimized for Claude's output format. Alternative models may generate diffs with incorrect formatting, wrong line numbers, or missing context. Using a dedicated code editing layer like Morph's FastApply can improve accuracy to 98%.
Can I use Claude Code with a local Ollama model?
Yes, but you'll need a proxy like LiteLLM to translate between Ollama's API format and the Anthropic Messages API that Claude Code expects. Direct connection to Ollama doesn't work because the protocols are different.
What Actually Works (After Testing All This)
We've run Claude Code with probably a dozen different model backends at this point. Here's what we've learned:
Claude is still the best experience. The tool use just works. Edits apply cleanly. If you can afford it and don't have compliance restrictions, stick with Claude.
GPT-4o is the closest alternative. Tool calling works. Edits mostly apply. You'll hit maybe 10% more failures than Claude, which is livable. Cost is comparable.
Everything else is a tradeoff. Gemini's massive context is great for reading large codebases, but its edit success rate is lower. Qwen writes decent code but struggles with complex tool chains. Local models are free but feel like you're back in 2023.
The real killer is edit failures. When edits fail, you enter a retry loop that burns through tokens and time. We measured this: raw model diffs hit about 70-80% accuracy depending on the model. That 20-30% failure rate compounds fast over a session.
Two things help:
- Run /compact aggressively. Don't wait for Claude Code to auto-compact. Do it yourself every 10-15 turns, especially with alternative models that pollute context faster.
- Offload editing to specialized tools. The Morph MCP server intercepts edit operations and handles merging with a purpose-built model. Bumps accuracy to 98% regardless of which model generated the diff. Search goes through WarpGrep, which filters results before they hit your context.
The second option matters more as you scale. Doing 5 edits? Raw model is fine. Doing 50 edits in a session? You want something that doesn't make you manually fix every third one.
Try Morph Sub-Agents with Any Model
FastApply and WarpGrep work with Claude Code regardless of which LLM backend you're using. Improve edit accuracy and reduce context rot.