Ollama runs open-source LLMs on your machine. MCP connects LLMs to external tools. Put them together and your local model can read files, query databases, search codebases, and take actions, all without sending a single token to a cloud provider. The catch: Ollama doesn't natively speak MCP. This guide covers three working approaches to bridge that gap, with code you can run today.
The Architecture
MCP uses a client-server model. An MCP server exposes tools (functions with typed inputs and outputs). An MCP client connects to servers, discovers available tools, and calls them on behalf of an LLM. The protocol uses JSON-RPC 2.0 over stdio or HTTP.
Ollama is not an MCP client. It is an inference server that runs models and exposes a chat API with tool calling support. To connect the two, you need a bridge layer that does three things:
Discover MCP tools
Connect to MCP servers, call list_tools(), and get the schema for every available tool (name, description, parameters).
Translate to Ollama format
Convert MCP tool schemas to the OpenAI-compatible tools format that Ollama understands. Pass them in the tools parameter of the chat API.
Execute and loop
When Ollama returns a tool_call, extract the tool name and arguments, call the MCP server via call_tool(), feed the result back to Ollama, and repeat until the model is done.
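To make step 2 concrete, here is a sketch of what one translated tool looks like. The read_file schema below is hypothetical; in practice the schema comes back from list_tools().

```python
# Hypothetical MCP tool schema, of the shape list_tools() returns
mcp_tool = {
    "name": "read_file",
    "description": "Read a file from the sandboxed directory",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

# The same tool in the OpenAI-compatible format Ollama's chat API expects.
# MCP's inputSchema is already JSON Schema, so it maps directly onto
# the "parameters" field.
ollama_tool = {
    "type": "function",
    "function": {
        "name": mcp_tool["name"],
        "description": mcp_tool["description"],
        "parameters": mcp_tool["inputSchema"],
    },
}
```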
The Ollama + MCP data flow
User prompt
→ Bridge sends prompt + MCP tool schemas to Ollama
→ Ollama returns tool_call (e.g. "read_file", {"path": "src/index.ts"})
→ Bridge calls MCP server: session.call_tool("read_file", {"path": "src/index.ts"})
→ MCP server reads the file, returns contents
→ Bridge sends file contents back to Ollama as tool result
→ Ollama generates final response using the file contents
→ Bridge displays response to user

Every approach in this guide implements this loop differently. MCPHost does it in Go. The bridge does it in TypeScript. The Python approach gives you full control over each step.
Prerequisites
All three methods assume you have Ollama installed and at least one model pulled that supports tool calling.
Install Ollama and pull a model
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model with tool calling support
ollama pull qwen3:14b
# Verify it's running
curl http://localhost:11434/api/tags

Tool calling is required
Not every Ollama model supports tool calling. If the model ignores your tools and responds with plain text, it does not support the tool calling format. Check the Ollama tools category for compatible models.
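One way to check a model before wiring up a full bridge is to send it a throwaway tool and see whether a tool_call comes back. A minimal stdlib sketch against Ollama's /api/chat endpoint; the get_utc_time tool is a made-up probe, not part of any MCP server:

```python
import json
import urllib.request

def has_tool_call(chat_response: dict) -> bool:
    """True if an /api/chat response contains at least one tool call."""
    return bool(chat_response.get("message", {}).get("tool_calls"))

def check_model(model: str, host: str = "http://localhost:11434") -> bool:
    """Send a trivial tool and check whether the model emits a tool_call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "What time is it in UTC?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_utc_time",
                "description": "Return the current UTC time",
                "parameters": {"type": "object", "properties": {}},
            },
        }],
        "stream": False,
    }
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return has_tool_call(json.load(resp))

# check_model("qwen3:14b")  # True if the model emitted a tool call
```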
Method 1: MCPHost (Go CLI)
MCPHost is a Go CLI that connects any Ollama model to any MCP server. It handles tool discovery, schema translation, and the call-execute-respond loop. Fastest path from zero to working.
Install MCPHost
# Requires Go 1.22+
go install github.com/mark3labs/mcphost@latest

Create a server config (mcp-servers.json)
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/projects"
      ]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token_here"
      }
    }
  }
}

Run MCPHost with Ollama
# Start an interactive session
mcphost -m ollama:qwen3:14b --config mcp-servers.json
# MCPHost will:
# 1. Start the MCP servers listed in config
# 2. Discover all available tools
# 3. Send them to Ollama with each prompt
# 4. Execute tool calls and feed results back
# Example prompt in the session:
> List all TypeScript files in src/ and summarize what each one does

MCPHost supports Ollama, OpenAI, Anthropic, and Google Gemini models through the same interface. Switching models is a flag change. The MCP server config stays the same.
Environment variables
If your Ollama instance runs on a different host, set OLLAMA_HOST before running MCPHost. Example: OLLAMA_HOST=http://192.168.1.100:11434 mcphost -m ollama:qwen3:14b --config mcp-servers.json
Method 2: ollama-mcp-bridge (TypeScript)
The ollama-mcp-bridge is a TypeScript application that connects Ollama to multiple MCP servers. It gives you more configurability than MCPHost, including model parameters, system prompts, and per-server settings.
Install and configure
git clone https://github.com/patruff/ollama-mcp-bridge
cd ollama-mcp-bridge
npm install
# Copy the template config
cp bridge_config.json.template bridge_config.json

bridge_config.json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "your_brave_api_key"
      }
    }
  },
  "llm": {
    "model": "qwen3:14b",
    "baseUrl": "http://localhost:11434",
    "temperature": 0.1,
    "maxTokens": 4096
  }
}

Run the bridge
npm start
# The bridge starts all MCP servers, connects to Ollama,
# and opens an interactive chat session.
# Tool calls are handled automatically.

Low temperature matters
Set temperature to 0.1 or lower for tool calling. Higher temperatures cause models to hallucinate tool names or produce malformed JSON arguments. This is the most common source of "MCP tools not working with Ollama" issues.
Method 3: Python MCP SDK + Ollama
Building your own client gives you full control. The MCP Python SDK handles server communication. The Ollama Python library handles inference. You write the loop that connects them. This is the right approach when you need custom behavior: filtering tools, transforming results, adding logging, or integrating into a larger application.
Install dependencies
pip install "mcp[cli]" ollama

ollama_mcp_client.py
import asyncio
import json

import ollama
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


def mcp_tools_to_ollama(mcp_tools) -> list[dict]:
    """Convert MCP tool schemas to Ollama/OpenAI format."""
    ollama_tools = []
    for tool in mcp_tools:
        ollama_tools.append({
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description or "",
                "parameters": tool.inputSchema,
            },
        })
    return ollama_tools


async def run(model: str, server_command: str, server_args: list[str]):
    server_params = StdioServerParameters(
        command=server_command,
        args=server_args,
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover tools
            tools_response = await session.list_tools()
            tools = mcp_tools_to_ollama(tools_response.tools)
            print(f"Connected. {len(tools)} tools available.")

            messages = []
            while True:
                user_input = input("\nYou: ")
                if user_input.lower() in ("quit", "exit"):
                    break
                messages.append({"role": "user", "content": user_input})

                # Call Ollama with tools
                response = ollama.chat(
                    model=model,
                    messages=messages,
                    tools=tools,
                    options={"temperature": 0.1},
                )

                # Process tool calls until the model stops requesting them
                while response.message.tool_calls:
                    # Record the assistant turn that requested the calls (once,
                    # not once per tool call)
                    messages.append(response.message)
                    for tool_call in response.message.tool_calls:
                        name = tool_call.function.name
                        args = tool_call.function.arguments
                        print(f"  → Calling tool: {name}({json.dumps(args)})")
                        result = await session.call_tool(name, arguments=args)
                        text = result.content[0].text if result.content else ""
                        messages.append({
                            "role": "tool",
                            "content": text,
                        })

                    # Let the model process tool results
                    response = ollama.chat(
                        model=model,
                        messages=messages,
                        tools=tools,
                        options={"temperature": 0.1},
                    )

                print(f"\nAssistant: {response.message.content}")
                messages.append(response.message)


if __name__ == "__main__":
    asyncio.run(run(
        model="qwen3:14b",
        server_command="npx",
        server_args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    ))

This is roughly 70 lines. The key function is mcp_tools_to_ollama, which converts MCP tool schemas (with inputSchema) to the OpenAI-compatible format Ollama expects. The loop handles multi-step tool calling: if the model calls a tool, the result goes back and the model can call another tool or generate a final response.
Run it
python ollama_mcp_client.py
# Connected. 11 tools available.
#
# You: What files are in /tmp?
# → Calling tool: list_directory({"path": "/tmp"})
#
# Assistant: Here are the files in /tmp:
# - session.log (2.1 KB)
# - build-output/ (directory)
# - notes.txt (340 bytes)
# ...

Multiple servers
To connect to multiple MCP servers, open multiple stdio_client sessions and merge their tool lists. Give each tool a prefix (e.g., fs_read_file, gh_search_code) to avoid name collisions. Route tool calls to the correct session based on the prefix.
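A sketch of that merge-and-route logic. The fs/gh keys are an arbitrary naming convention, and plain strings stand in for real ClientSession objects:

```python
def merge_tools(tools_by_prefix: dict[str, list[dict]]) -> list[dict]:
    """Prefix each tool name with its server key to avoid collisions."""
    merged = []
    for prefix, tools in tools_by_prefix.items():
        for tool in tools:
            renamed = dict(tool)
            renamed["name"] = f"{prefix}_{tool['name']}"
            merged.append(renamed)
    return merged

def route(tool_name: str, sessions: dict[str, object]) -> tuple[object, str]:
    """Map a prefixed tool call back to (session, original tool name)."""
    prefix, _, original = tool_name.partition("_")
    return sessions[prefix], original

# merge_tools({"fs": [{"name": "read_file"}]}) yields a tool named
# "fs_read_file"; route("fs_read_file", sessions) strips the prefix
# and picks the matching session.
```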
Which Models Work
MCP tool use depends entirely on the model's ability to generate correct tool calls: valid JSON, correct parameter names, appropriate tool selection. Not all models are equal here. Smaller models (<7B) frequently hallucinate tool names or produce malformed arguments.
| Model | Size | Tool Calling Reliability | Notes |
|---|---|---|---|
| Qwen 3 | 8B / 14B / 32B | Excellent | Most reliable tool calling in its class. Rarely hallucinates parameters. 32B recommended for complex multi-tool workflows. |
| Llama 3.3 | 70B | Excellent | Strong tool calling, large context (128K). Requires significant RAM (40GB+ for quantized). |
| Gemma 4 | 12B / 27B | Good | Native function calling with structured JSON output. Released April 2026. |
| Mistral / Mistral Large | 7B / 123B | Good | Solid tool calling. Mistral 7B is the smallest model with usable reliability. |
| Hermes 3 | 8B / 70B | Good | Fine-tuned for structured output and tool use. Good at following tool schemas. |
| GLM-4 | 9B | Good | Decent tool calling for its size. Chinese and English bilingual. |
| Llama 3.1 | 8B / 70B | Fair | Works but less reliable than Qwen 3 at equivalent sizes. 70B is solid. |
| Phi-4 | 14B | Fair | Supports tool calling but can struggle with complex tool schemas. |
The community consensus: start at 14B parameters minimum for anything beyond single-tool workflows. 32B+ for reliable multi-step tool use. Qwen 3 at 14B or 32B is the current sweet spot for tool calling quality per compute dollar.
MCP Servers Worth Using
MCP servers are model-agnostic. Anything that works with Claude Desktop works with Ollama through a bridge. These are the servers most useful for local development workflows.
Filesystem
Read, write, search, and manage files. Sandboxed to specific directories. The most fundamental MCP server. Install: npx @modelcontextprotocol/server-filesystem /path/to/dir
GitHub
Search code, read files, list issues, browse PRs. Useful for cross-repo exploration without cloning. Install: npx @modelcontextprotocol/server-github
Git
Local repo operations: branches, diffs, commits, log. Safer than giving the model raw shell access. Install: npx @modelcontextprotocol/server-git
Brave Search
Web search from your local model. Returns structured results. Requires a free Brave Search API key. Install: npx @modelcontextprotocol/server-brave-search
PostgreSQL / SQLite
Query your database directly. The model writes SQL, the server executes it and returns results. Read-only mode available. Community servers on npm.
Morph MCP (@morphllm/morphmcp)
WarpGrep codebase search and Fast Apply code editing. Search code across repos without embeddings. Apply LLM-generated edits at 10,500+ tok/s. Works with any MCP client.
MCPHost config with multiple servers
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "--repository", "."]
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "BSA_your_key" }
    },
    "morph": {
      "command": "npx",
      "args": ["-y", "@morphllm/morphmcp@latest"],
      "env": { "MORPH_API_KEY": "sk-morph-..." }
    }
  }
}

Performance Tradeoffs
Running inference locally changes the performance profile. Tool execution speed stays the same (tools run on your machine regardless of where the LLM runs). But inference speed, context window size, and tool calling reliability all shift.
| | Ollama (Local) | Cloud API (e.g. Claude, GPT-4) |
|---|---|---|
| Inference speed | 30-60 tok/s (7B on M-series Mac), 10-20 tok/s (70B quantized) | 80-200+ tok/s depending on provider |
| Tool call latency | Same (tools run locally) | Same (tools run locally) |
| Network latency | Zero (everything local) | 50-200ms round trip to LLM provider per turn |
| Context window | 8K-128K depending on model and RAM | 128K-1M depending on provider |
| Tool calling reliability | Good with 14B+ models, excellent with 32B+ | Excellent (frontier models) |
| Privacy | Complete. Nothing leaves your machine. | Data sent to provider. Check their retention policies. |
| Cost | Electricity only. No per-token charges. | $3-15 per million input tokens depending on model |
| Setup complexity | Moderate. Need bridge layer, model selection, RAM considerations. | Low. API key and go. |
The total latency for a tool-calling turn is: inference time + tool execution time. If the model takes 2 seconds to decide which tool to call and the tool takes 500ms to execute, local and cloud are closer than the raw tok/s numbers suggest. The gap widens for tasks that require long generated responses (summaries, code generation) and narrows for tasks that are mostly tool calls (file browsing, database queries).
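The arithmetic, as a sketch. The tok/s, tool, and network figures are the rough assumptions from the table above, not benchmarks:

```python
def turn_latency(tokens_out: int, tok_per_s: float,
                 tool_ms: float = 500, network_ms: float = 0) -> float:
    """Seconds for one tool-calling turn: generation + tool + network."""
    return tokens_out / tok_per_s + (tool_ms + network_ms) / 1000

# A short ~60-token tool-call decision plus one 500ms tool:
local = turn_latency(60, 30)                   # local 14B at ~30 tok/s -> 2.5s
cloud = turn_latency(60, 150, network_ms=150)  # cloud at ~150 tok/s -> 1.05s
# The gap is ~1.4s, not the 5x the raw tok/s numbers suggest. A
# 1000-token summary widens it: ~33.8s local vs ~7.3s cloud.
```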
RAM planning
A 7B model needs roughly 4-6GB RAM. A 14B model needs 8-12GB. A 32B model needs 18-24GB. A 70B model (quantized to Q4) needs 40GB+. Each active MCP server adds 50-200MB. Plan your model choice around available system memory.
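Those figures follow from a rule of thumb: parameters times bytes per weight (0.5 bytes at Q4 quantization), plus roughly 30% overhead for the KV cache and runtime. A sketch; the 1.3 overhead factor is an assumption, not anything Ollama documents:

```python
def est_ram_gb(params_b: float, bits: int = 4, overhead: float = 1.3) -> float:
    """Rough memory estimate in GB for a quantized model."""
    return params_b * (bits / 8) * overhead

for size in (7, 14, 32, 70):
    print(f"{size}B @ Q4: ~{est_ram_gb(size):.1f} GB")
# ~4.6, ~9.1, ~20.8, ~45.5 GB: consistent with the ranges above.
```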
When to Use Cloud APIs Instead
Local LLMs with MCP are not universally better than cloud APIs. Each has a clear advantage zone.
Use Ollama + MCP when...
Data cannot leave your network. You need predictable costs (no per-token billing). You're running simple tool workflows (1-3 tools per turn). Latency to the cloud provider is high. You're experimenting and iterating on prompts.
Use cloud APIs when...
Task requires frontier model reasoning (complex multi-step planning). You need large context windows (100K+ tokens). Tool calling needs to be near-perfect (production pipelines). You're working with many concurrent users. You don't have a GPU or M-series Mac.
The hybrid approach works well: use Ollama for development and testing (fast iteration, no cost), then deploy with a cloud API for production (reliability, speed). Since MCP server configs are the same for both, switching is a model flag change.
Same MCP config, different model
# Local development
mcphost -m ollama:qwen3:14b --config mcp-servers.json
# Production (switch to cloud)
mcphost -m anthropic:claude-sonnet-4-20250514 --config mcp-servers.json
mcphost -m openai:gpt-4o --config mcp-servers.json

Frequently Asked Questions
Does Ollama natively support MCP?
No. As of April 2026, Ollama does not have built-in MCP support. The feature request (GitHub issue #7865) is still open. You need a bridge: MCPHost, ollama-mcp-bridge, or a custom client built with the MCP SDK.
Which Ollama models work with MCP tool calling?
Qwen 3 (all sizes), Llama 3.1/3.3, Mistral, Hermes 3, Gemma 4, and GLM-4 all support tool calling. For MCP use, 14B+ parameters is the practical minimum. Qwen 3 32B is the most reliable option for complex multi-tool workflows.
What is the easiest way to connect Ollama to MCP servers?
MCPHost. Install Go, run go install github.com/mark3labs/mcphost@latest, write a JSON config listing your MCP servers, and run mcphost -m ollama:qwen3 --config servers.json. Working setup in under 5 minutes.
Can Ollama use the same MCP servers as Claude Desktop?
Yes. MCP servers are model-agnostic. The filesystem, GitHub, Brave Search, and database servers that work with Claude Desktop work identically with Ollama through any bridge. The protocol is the same; only the client changes.
Is Ollama + MCP slower than cloud APIs with MCP?
Inference is slower on consumer hardware (30-60 tok/s for 7B vs 100+ tok/s from cloud APIs). But tool execution time is identical and you eliminate network latency. For tool-heavy workflows where inference is a small fraction of total time, local performance can approach cloud.
Can I use Ollama's OpenAI-compatible API with MCP clients?
Yes. Ollama exposes http://localhost:11434/v1/chat/completions with tool support. Any MCP client built on the OpenAI SDK works by setting base_url to http://localhost:11434/v1 and using "ollama" as the API key.
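A stdlib sketch of such a request. With the openai package you would instead create OpenAI(base_url="http://localhost:11434/v1", api_key="ollama") and call chat.completions.create() as usual; the request below shows what goes over the wire either way:

```python
import json
import urllib.request

def openai_compat_request(model: str, prompt: str,
                          host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a chat-completions request for Ollama's OpenAI-style endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key, but OpenAI-style clients must send one.
            "Authorization": "Bearer ollama",
        },
    )

# with urllib.request.urlopen(openai_compat_request("qwen3:14b", "hi")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```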
What MCP servers work best with Ollama?
Filesystem, Git, GitHub, and Brave Search are the most useful for development workflows. Avoid servers that return very large responses, as local models have smaller effective context windows. Start with @modelcontextprotocol/server-filesystem and add from there.
How does Ollama MCP compare to Morph's MCP tools?
Different problems. Ollama + MCP gives you local, private inference with tool access. Morph's MCP tools (WarpGrep for code search, Fast Apply for edits, Sandbox for execution) are specialized coding subagents. You can use both: point MCPHost at Ollama for inference and include @morphllm/morphmcp in your server config for code search and editing.
MCP Tools Built for Coding Agents
Morph's MCP server gives any LLM (local or cloud) access to WarpGrep codebase search and Fast Apply code editing. No embeddings, no indexing. Works with Ollama, Claude, GPT-4, or any MCP-compatible client.