How to Use a Different LLM with Claude Code: Complete 2025 Guide

Run Claude Code with GPT-4, Gemini, Qwen, DeepSeek, or local models. Step-by-step setup for LiteLLM, OpenRouter, and direct API configuration.

December 30, 2025 · 1 min read
Claude Code connected to multiple LLM providers including OpenRouter, LiteLLM, and local models

Quick Answer

TL;DR

Set two environment variables to connect Claude Code to any OpenAI-compatible API:

export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""

This works with OpenRouter (200+ models), LiteLLM (any provider), and Z.AI. For local models like Ollama, you need LiteLLM as a translation layer since they use different protocols.

Why Developers Switch to Different Models

Claude Code ships locked to Anthropic's API. For most people, that's fine. But you'll want an escape hatch when:

  • Rate limits hit hard — Anthropic's per-minute caps are real. During crunch time, you'll hit them. One team we talked to started routing overflow to GPT-4o just to keep shipping.
  • Budget constraints — Claude Sonnet runs about $3/million input tokens. GPT-4o-mini is $0.15. For bulk refactoring or test generation, that 20x difference matters.
  • Air-gapped environments — Defense contractors and healthcare orgs often can't send code to external APIs. Period.
  • Curious about alternatives — Qwen3-Coder is surprisingly good at Python. Gemini's 1M context handles massive monorepos. You won't know until you try.

A setup we've seen work: GPT-4o for architecture discussions and planning (it's good at that), Claude for implementation (still the best at tool use), and a local 14B model for quick questions while offline. The proxy layer makes switching trivial.

Fair Warning

Claude Code was purpose-built for Claude models. According to Anthropic's engineering blog, the agent relies heavily on tool calling patterns that Claude handles natively. Other models can struggle with:

  • Multi-step tool chains (especially the "read file → think → edit file" loop)
  • Generating diffs that actually apply cleanly
  • Extended thinking (a Claude-specific capability to begin with)

That said, GPT-4o and Gemini Pro are solid alternatives for most workflows.

Method 1: Direct API Configuration

The simplest approach works with providers that offer Anthropic-compatible endpoints. No proxy needed.

Providers with Native Anthropic API Support

| Provider | Base URL | Notes |
|---|---|---|
| OpenRouter | https://openrouter.ai/api | 200+ models, Anthropic skin built-in |
| Z.AI (GLM) | https://api.z.ai/api/anthropic | Chinese models, global access |
| Amazon Bedrock | Via LiteLLM proxy | Enterprise, requires IAM setup |
| Azure OpenAI | Via LiteLLM proxy | Enterprise, requires configuration |
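Before pointing Claude Code at an endpoint, you can sanity-check that the provider accepts Anthropic-format requests by hand-building the same kind of call. A minimal sketch, assuming the client appends `/v1/messages` to the base URL the way Claude Code does (the helper name `build_messages_request` and the model id are our own, not part of any official SDK):

```python
import json

API_VERSION = "2023-06-01"  # Anthropic Messages API version header

def build_messages_request(base_url: str, auth_token: str, model: str, prompt: str):
    """Build (url, headers, body) for a POST to an Anthropic-compatible /v1/messages."""
    url = base_url.rstrip("/") + "/v1/messages"   # a trailing slash breaks some providers
    headers = {
        "authorization": f"Bearer {auth_token}",  # ANTHROPIC_AUTH_TOKEN is sent as a Bearer token
        "anthropic-version": API_VERSION,
        "content-type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_messages_request(
    "https://openrouter.ai/api", "sk-or-v1-your-key",
    "anthropic/claude-sonnet-4.5", "ping")
print(url)  # https://openrouter.ai/api/v1/messages
```

If a curl or requests call with this shape returns a normal message response rather than a 404, the provider's Anthropic skin is live and the environment-variable setup below should work.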

Step-by-Step Setup

1. Get your API key from your chosen provider.

2. Set environment variables in your terminal:

Terminal Configuration

# For OpenRouter
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"

# Prevent Claude Code from trying Anthropic auth
export ANTHROPIC_API_KEY=""

3. Make it permanent by adding to your shell profile:

Shell Profile (~/.zshrc)

# Add to ~/.bashrc, ~/.zshrc, or ~/.config/fish/config.fish
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""

4. Verify the connection:

# Restart your terminal, then run Claude Code
claude

# Inside Claude Code, check status
/status

Common Mistake

Don't put these in a project .env file. Claude Code doesn't read standard .env files. Use your shell profile or set them in the terminal directly.

Method 2: LiteLLM Proxy

LiteLLM acts as a universal translator between Claude Code and any LLM provider. It handles the API format conversion automatically.

Use this method when:

  • Your provider only offers OpenAI-format APIs
  • You want to route between multiple providers with fallbacks
  • You need usage tracking and cost controls
  • You're connecting to self-hosted models

Quick Setup

LiteLLM Installation

# Install LiteLLM
pip install litellm

# Create config file
cat > litellm_config.yaml << 'EOF'
model_list:
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4.5
      api_key: sk-or-v1-your-key

  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-your-openai-key

  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: sk-or-v1-your-key
EOF

# Start the proxy
litellm --config litellm_config.yaml --port 4000

Connect Claude Code to LiteLLM

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234"  # Any value works locally
export ANTHROPIC_MODEL="claude-sonnet-4-5-20250929"  # Or gpt-4o, qwen3-coder

# Now run Claude Code
claude

The proxy translates requests on the fly. Claude Code thinks it's talking to Anthropic, but your requests go to whatever model you configured.
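Conceptually, the proxy's job is a payload rewrite. Here is a toy sketch of the Anthropic-to-OpenAI direction, just to show the shape of the conversion (real LiteLLM also handles tool calls, streaming, retries, and much more; `anthropic_to_openai` is our own illustrative name):

```python
def anthropic_to_openai(payload: dict) -> dict:
    """Rewrite an Anthropic Messages payload into OpenAI chat.completions shape.
    Toy version: plain text messages only, no tools or streaming."""
    messages = []
    if "system" in payload:                     # Anthropic keeps the system prompt top-level
        messages.append({"role": "system", "content": payload["system"]})
    for m in payload["messages"]:
        content = m["content"]
        if isinstance(content, list):           # Anthropic allows lists of content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": payload["model"],
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }

req = {
    "model": "gpt-4o",
    "system": "You are terse.",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}],
}
out = anthropic_to_openai(req)
```

The reverse direction (OpenAI response back into Anthropic format) is the half Claude Code actually depends on, which is why edge cases like tool-call blocks are where proxies earn their keep.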

Shell Aliases for Fast Switching

Quick Model Switching

# Add to ~/.zshrc or ~/.bashrc
alias claude-gpt='ANTHROPIC_MODEL=gpt-4o claude'
alias claude-qwen='ANTHROPIC_MODEL=qwen3-coder claude'
alias claude-local='ANTHROPIC_MODEL=ollama/codellama claude'

# Usage
claude-gpt   # Starts Claude Code with GPT-4o
claude-qwen  # Starts Claude Code with Qwen3-Coder

Method 3: OpenRouter Direct Connection

OpenRouter provides an "Anthropic skin" that speaks Claude Code's native protocol. No proxy needed, direct connection with access to 200+ models.

Setup

# Get your key from openrouter.ai/keys
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""

# Start Claude Code
claude

Selecting Models

Use the /model command inside Claude Code to switch between OpenRouter models:

# Inside Claude Code
/model anthropic/claude-sonnet-4    # Claude Sonnet
/model openai/gpt-4o                 # GPT-4o
/model google/gemini-2.0-flash       # Gemini 2.0 Flash
/model qwen/qwen3-coder              # Qwen3-Coder

OpenRouter Pricing

OpenRouter often beats direct API pricing through volume aggregation. Check openrouter.ai/models for current rates. Some models are free for limited use.

Method 4: anyclaude Wrapper

anyclaude is a drop-in wrapper from Coder that makes multi-provider switching dead simple. No proxy server, no config files—just environment variables.

anyclaude Setup

# Install with your package manager
bun add -g anyclaude  # or npm, pnpm

# Set your keys
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
export XAI_API_KEY="..."

# Run with any model using prefixes
anyclaude --model openai/gpt-4o
anyclaude --model google/gemini-2.0-flash
anyclaude --model xai/grok-2

The prefix tells anyclaude which provider to route to. It handles the API translation under the hood. For OpenRouter models, set OPENAI_API_URL to their endpoint.

This is probably the fastest way to test different models without committing to a full proxy setup. The tradeoff: less flexibility than LiteLLM for complex routing rules.

Method 5: Local LLMs with Ollama

Running models locally eliminates API costs and keeps your code private. The tradeoff: you need decent hardware and most local models don't support tool calling well.

Hardware Requirements

| Model Size | VRAM Needed | Example Models |
|---|---|---|
| 7B parameters | 8GB | CodeLlama-7B, Qwen2.5-Coder-7B |
| 14B parameters | 16GB | Qwen2.5-Coder-14B |
| 32B parameters | 24GB | Qwen2.5-Coder-32B |
| 70B+ parameters | 48GB+ | CodeLlama-70B, DeepSeek-Coder-V2 |

Setup with Ollama + LiteLLM

Local Model Setup

# 1. Install and start Ollama
brew install ollama  # macOS
ollama serve

# 2. Pull a coding model
ollama pull qwen2.5-coder:14b

# 3. Configure LiteLLM to use Ollama
cat > litellm_config.yaml << 'EOF'
model_list:
  - model_name: claude-sonnet-4-5-20250929
    litellm_params:
      model: ollama/qwen2.5-coder:14b
      api_base: http://localhost:11434
EOF

# 4. Start proxy
litellm --config litellm_config.yaml --port 4000

# 5. Connect Claude Code
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="local"
claude

Tool Calling Limitations

Most local models have weak or no tool calling support. This means Claude Code can generate code suggestions but can't automatically read files, run commands, or apply edits. You'll need to copy-paste more manually.
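One way to check what you're actually getting back is to inspect the response content blocks: in the Anthropic Messages format, real tool invocations arrive as `tool_use` blocks, while a model that only narrates returns pure `text`. A small sketch (`uses_tools` is our own helper name; the responses are hand-built examples):

```python
def uses_tools(response: dict) -> bool:
    """True if an Anthropic-format response contains any tool_use content block."""
    return any(block.get("type") == "tool_use" for block in response.get("content", []))

# A model that actually calls a tool:
acting = {"content": [
    {"type": "tool_use", "name": "read_file", "input": {"path": "config.yaml"}},
]}

# A model that only describes what it would do:
talking = {"content": [
    {"type": "text", "text": "I would read the config file and then..."},
]}

print(uses_tools(acting), uses_tools(talking))  # True False
```

If a local model never produces `tool_use` blocks through your proxy, no amount of prompting will make Claude Code's file operations work.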

Model Compatibility Matrix

Claude Code's full feature set requires tool calling, large context windows, and reliable instruction following. Here's how popular models stack up:

| Model | Tool Calling | Context | Edit Accuracy | Best For |
|---|---|---|---|---|
| Claude Sonnet 4 | Excellent | 200K | 98% | Full Claude Code experience |
| GPT-4o | Good | 128K | 90% | Planning, complex reasoning |
| GPT-4o-mini | Good | 128K | 85% | Cost-efficient tasks |
| Gemini 2.0 Flash | Good | 1M | 88% | Large codebase analysis |
| Qwen3-Coder | Limited | 128K | 82% | Code generation only |
| DeepSeek-V3 | Limited | 128K | 80% | Explanations, generation |
| Local 7B models | Poor | 8-32K | 60% | Quick prototyping only |
"Edit Accuracy" measures how often the model's suggested changes apply cleanly without manual intervention. Lower accuracy means more failed edits and retry loops.

Fixing Common Problems

These come up constantly when troubleshooting alternative model setups. Most are fixable in under a minute.

"Cannot find matching context" / Edits Keep Failing

You ask for a change. The model generates what looks like valid code. Claude Code rejects it. This is the most common issue—GitHub is full of these reports.

The root cause: Claude Code's edit system expects diffs in a specific format. Claude generates them correctly because Anthropic trained it that way. Other models approximate but miss details—wrong line numbers, bad whitespace handling, mismatched context lines.
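You can see why this fails by applying the same kind of check an edit tool does before touching the file: the "old" text in the edit must match the file exactly, whitespace included, and match it exactly once. A minimal sketch of that gate (`try_edit` is our own illustrative function, not Claude Code's internal API):

```python
def try_edit(file_text: str, old: str, new: str):
    """Apply an old->new replacement only if `old` matches the file exactly once."""
    count = file_text.count(old)
    if count != 1:                  # zero matches or ambiguous matches both fail
        return None                 # -> "cannot find matching context"
    return file_text.replace(old, new)

source = "def fetch(url):\n    timeout = 30\n    return get(url, timeout)\n"

ok = try_edit(source, "timeout = 30", "timeout = 60")    # exact match: applies
bad = try_edit(source, "timeout=30", "timeout=60")       # wrong whitespace: rejected

print(ok is not None, bad is None)  # True True
```

A model that drops a space or "remembers" a stale version of the file produces the second case, and the edit bounces no matter how good the new code is.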

Quick fixes:

  • Ask for smaller edits. "Change the function signature" works better than "refactor this file"
  • Say "read the file first" before asking for edits. Forces the model to refresh its view
  • Use /compact between major changes to reset context

Proper fix: Use a dedicated code-editing layer (like the Morph MCP) that intercepts edit operations and applies them with a purpose-built merge model. It takes raw model output and hits a 98% apply rate, versus 70-80% for direct application.

Sessions Degrade Over Time

Start fresh, everything works. An hour in, suggestions get worse and latency spikes. Happens faster with non-Claude models.

This is context pollution. Claude Code searches files sequentially—each search loads more content into context, even irrelevant matches. The signal-to-noise ratio drops until the model is reasoning over garbage.

  • Run /compact proactively—every 10-15 turns, not just when prompted
  • Use /clear when switching to unrelated tasks
  • Be specific in searches: "authentication error handler in auth.ts" beats "where do we handle errors"
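The arithmetic behind the 10-15 turn rule is simple to sketch. The per-turn and per-search token counts below are illustrative assumptions, not measurements, but they show how a modest session eats a 128K window:

```python
CONTEXT_WINDOW = 128_000      # tokens available to the model (assumed)
TOKENS_PER_TURN = 1_500       # prompt + response per exchange (assumed)
TOKENS_PER_SEARCH = 6_000     # file content a single search pulls in (assumed)

def tokens_used(turns: int, searches: int) -> int:
    """Rough context consumption: conversation turns plus search payloads."""
    return turns * TOKENS_PER_TURN + searches * TOKENS_PER_SEARCH

# A session that runs one search every other turn:
for turns in (10, 15, 30):
    used = tokens_used(turns, turns // 2)
    print(turns, "turns:", used, f"tokens ({used / CONTEXT_WINDOW:.0%} of window)")
```

Under these assumptions, 15 turns already burns about half the window and 30 turns overflows it, and most of that volume is search payload, not your actual conversation. That is the case for compacting early.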

Model Just Talks Instead of Acting

You: "Read the config file and update the timeout."
Model: "I would read the config file and then update the timeout to..."

Dead giveaway that tool calling isn't working. The model can't execute operations, so it describes what it would do.

  • Verify your model supports function/tool calling (check OpenRouter's model list)
  • GPT-4o, Gemini Pro, Claude: work. Qwen and most local models: unreliable at best
  • If using local models, accept that you're getting a generation-only experience

Authentication/Connection Errors

# Most common issue: conflicting API key
export ANTHROPIC_API_KEY=""  # Must be empty, not unset

# Check your variables are set correctly
echo "Base URL: $ANTHROPIC_BASE_URL"
echo "Auth Token: $ANTHROPIC_AUTH_TOKEN"

# Common mistake: trailing slash breaks it
# Wrong: https://openrouter.ai/api/
# Right: https://openrouter.ai/api

# Restart your terminal after changes
source ~/.zshrc  # or ~/.bashrc
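The same checks can be scripted. A throwaway sanity-checker for the three variables (our own sketch, not an official tool; pass it `dict(os.environ)` in practice):

```python
def check_claude_env(env: dict) -> list:
    """Return a list of problems with an alternative-backend env var setup."""
    problems = []
    base = env.get("ANTHROPIC_BASE_URL", "")
    if not base:
        problems.append("ANTHROPIC_BASE_URL is not set")
    elif base.endswith("/"):
        problems.append("ANTHROPIC_BASE_URL has a trailing slash")
    if not env.get("ANTHROPIC_AUTH_TOKEN"):
        problems.append("ANTHROPIC_AUTH_TOKEN is not set")
    if env.get("ANTHROPIC_API_KEY") is None:
        problems.append("ANTHROPIC_API_KEY should be set to an empty string, not unset")
    elif env["ANTHROPIC_API_KEY"]:
        problems.append("ANTHROPIC_API_KEY is non-empty and may override your auth token")
    return problems

print(check_claude_env({
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api/",  # trailing slash: flagged
    "ANTHROPIC_AUTH_TOKEN": "sk-or-v1-your-key",
    "ANTHROPIC_API_KEY": "",
}))
```

An empty list means the common failure modes above are ruled out and the problem is likely on the provider side.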

Frequently Asked Questions

Can Claude Code work with any LLM?

Claude Code can connect to any OpenAI-compatible API, but full functionality requires tool/function calling support. Models without tool calling can only provide text responses, not execute file operations or commands.

What's the minimum context window for Claude Code?

Claude Code expects at least 128K tokens for its auto-compacting feature to work properly. With smaller context windows, you'll need to run /compact manually to prevent losing important context.

Will using GPT-4 with Claude Code cost less than Claude?

It depends on the model and provider. GPT-4o is roughly comparable in price to Claude Sonnet. OpenRouter often offers lower prices through volume aggregation. Local models eliminate API costs entirely but require capable hardware.

Why do my edits fail more often with alternative models?

Claude Code's edit system was optimized for Claude's output format. Alternative models may generate diffs with incorrect formatting, wrong line numbers, or missing context. Using a dedicated code editing layer like Morph's FastApply can improve accuracy to 98%.

Can I use Claude Code with a local Ollama model?

Yes, but you'll need a proxy like LiteLLM to translate between Ollama's API format and the Anthropic Messages API that Claude Code expects. Direct connection to Ollama doesn't work because the protocols are different.

What Actually Works (After Testing All This)

We've run Claude Code with probably a dozen different model backends at this point. Here's what we've learned:

Claude is still the best experience. The tool use just works. Edits apply cleanly. If you can afford it and don't have compliance restrictions, stick with Claude.

GPT-4o is the closest alternative. Tool calling works. Edits mostly apply. You'll hit maybe 10% more failures than Claude, which is livable. Cost is comparable.

Everything else is a tradeoff. Gemini's massive context is great for reading large codebases, but its edit success rate is lower. Qwen writes decent code but struggles with complex tool chains. Local models are free but feel like you're back in 2023.

The real killer is edit failures. When edits fail, you enter a retry loop that burns through tokens and time. We measured this: raw model diffs hit about 70-80% accuracy depending on the model. That 20-30% failure rate compounds fast over a session.
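The compounding is easy to quantify. Treating each edit as independent attempts at success rate p, landing one edit takes 1/p tries on average (a simple geometric model of retries, illustrative rather than measured):

```python
def expected_attempts(edits: int, success_rate: float) -> float:
    """Expected total attempts to land `edits` edits, retrying each failure."""
    return edits / success_rate   # geometric distribution: mean attempts per edit = 1/p

for rate in (0.98, 0.80, 0.70):
    total = expected_attempts(50, rate)
    print(f"{rate:.0%} apply rate: ~{total:.1f} attempts for 50 edits")
```

At a 98% apply rate, 50 edits cost roughly 51 attempts; at 70%, over 71. Every extra attempt is another full round trip of tokens, latency, and your attention.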

Two things help:

  1. Run /compact aggressively. Don't wait for Claude Code to auto-compact. Do it yourself every 10-15 turns, especially with alternative models that pollute context faster.
  2. Offload editing to specialized tools. The Morph MCP server intercepts edit operations and handles merging with a purpose-built model. Bumps accuracy to 98% regardless of which model generated the diff. Search goes through WarpGrep, which filters results before they hit your context.

The second option matters more as you scale. Doing 5 edits? Raw model is fine. Doing 50 edits in a session? You want something that doesn't make you manually fix every third one.

Try Morph Sub-Agents with Any Model

FastApply and WarpGrep work with Claude Code regardless of which LLM backend you're using. Improve edit accuracy and reduce context rot.