Ollama MCP: How to Connect Local LLMs to Any MCP Server

Connect Ollama models to MCP servers for local, private AI tool use. Setup guides for MCPHost, ollama-mcp-bridge, and the Python MCP SDK with working code examples. Covers which models support tool calling, performance tradeoffs, and when cloud APIs still win.

April 5, 2026 · 1 min read

Ollama runs open-source LLMs on your machine. MCP connects LLMs to external tools. Put them together and your local model can read files, query databases, search codebases, and take actions, all without sending a single token to a cloud provider. The catch: Ollama doesn't natively speak MCP. This guide covers three working approaches to bridge that gap, with code you can run today.

The Architecture

MCP uses a client-server model. An MCP server exposes tools (functions with typed inputs and outputs). An MCP client connects to servers, discovers available tools, and calls them on behalf of an LLM. The protocol uses JSON-RPC 2.0 over stdio or HTTP.
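Concretely, a tool invocation on the wire is a single JSON-RPC request. A sketch (the `tools/call` method name comes from the MCP specification; the tool name and arguments here are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "src/index.ts" }
  }
}
```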

Ollama is not an MCP client. It is an inference server that runs models and exposes a chat API with tool calling support. To connect the two, you need a bridge layer that does three things:

Discover MCP tools

Connect to MCP servers, call list_tools(), and get the schema for every available tool (name, description, parameters).

Translate to Ollama format

Convert MCP tool schemas to the OpenAI-compatible tools format that Ollama understands. Pass them in the tools parameter of the chat API.

Execute and loop

When Ollama returns a tool_call, extract the tool name and arguments, call the MCP server via call_tool(), feed the result back to Ollama, and repeat until the model is done.
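The translation step is mechanical, because MCP tool schemas already describe parameters with JSON Schema. A minimal sketch in Python (the `read_file` tool here is illustrative, not a real server's schema):

```python
# Hypothetical MCP tool schema, shaped like what list_tools() returns
mcp_tool = {
    "name": "read_file",
    "description": "Read a file from disk",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}


def to_ollama_tool(tool: dict) -> dict:
    """Wrap an MCP tool schema in the OpenAI-compatible format Ollama expects."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # The JSON Schema passes through unchanged
            "parameters": tool["inputSchema"],
        },
    }


print(to_ollama_tool(mcp_tool)["function"]["name"])  # read_file
```

The only real work is renaming `inputSchema` to `parameters` and adding the `function` wrapper; the parameter schema itself needs no conversion.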

The Ollama + MCP data flow

User prompt
  → Bridge sends prompt + MCP tool schemas to Ollama
    → Ollama returns tool_call (e.g. "read_file", {"path": "src/index.ts"})
      → Bridge calls MCP server: session.call_tool("read_file", {"path": "src/index.ts"})
        → MCP server reads the file, returns contents
      → Bridge sends file contents back to Ollama as tool result
    → Ollama generates final response using the file contents
  → Bridge displays response to user

Every approach in this guide implements this loop differently. MCPHost does it in Go. The bridge does it in TypeScript. The Python approach gives you full control over each step.

Prerequisites

All three methods assume you have Ollama installed and at least one model pulled that supports tool calling.

Install Ollama and pull a model

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model with tool calling support
ollama pull qwen3:14b

# Verify it's running
curl http://localhost:11434/api/tags

Tool calling is required

Not every Ollama model supports tool calling. If the model ignores your tools and responds with plain text, it does not support the tool calling format. Check the Ollama tools category for compatible models.
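One way to check a model before wiring up a full bridge is to probe it with a dummy tool and see whether it emits a structured tool call. A sketch using only the standard library against Ollama's `/api/chat` endpoint (the probe tool and prompt are invented for illustration):

```python
import json
import urllib.request

# Illustrative probe tool: any tool-calling model should try to invoke it.
PROBE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}


def supports_tool_calling(model: str, host: str = "http://localhost:11434") -> bool:
    """Return True if the model answers with a structured tool_call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "What time is it? Use a tool."}],
        "tools": [PROBE_TOOL],
        "stream": False,  # one JSON object instead of a stream of chunks
    }
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        message = json.load(resp).get("message", {})
    return bool(message.get("tool_calls"))


# With Ollama running: supports_tool_calling("qwen3:14b")
```

A model that lacks tool support typically returns plain text in `message.content` and no `tool_calls` array, which is exactly the failure mode described above.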

14B+: recommended model size
11434: default Ollama port
stdio: most common MCP transport
JSON-RPC: MCP message format

Method 1: MCPHost (Go CLI)

MCPHost is a Go CLI that connects any Ollama model to any MCP server. It handles tool discovery, schema translation, and the call-execute-respond loop. Fastest path from zero to working.

Install MCPHost

# Requires Go 1.22+
go install github.com/mark3labs/mcphost@latest

Create a server config (mcp-servers.json)

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/projects"
      ]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token_here"
      }
    }
  }
}

Run MCPHost with Ollama

# Start an interactive session
mcphost -m ollama:qwen3:14b --config mcp-servers.json

# MCPHost will:
# 1. Start the MCP servers listed in config
# 2. Discover all available tools
# 3. Send them to Ollama with each prompt
# 4. Execute tool calls and feed results back

# Example prompt in the session:
> List all TypeScript files in src/ and summarize what each one does

MCPHost supports Ollama, OpenAI, Anthropic, and Google Gemini models through the same interface. Switching models is a flag change. The MCP server config stays the same.

Environment variables

If your Ollama instance runs on a different host, set OLLAMA_HOST before running MCPHost. Example: OLLAMA_HOST=http://192.168.1.100:11434 mcphost -m ollama:qwen3:14b --config servers.json

Method 2: ollama-mcp-bridge (TypeScript)

The ollama-mcp-bridge is a TypeScript application that connects Ollama to multiple MCP servers. It gives you more configurability than MCPHost, including model parameters, system prompts, and per-server settings.

Install and configure

git clone https://github.com/patruff/ollama-mcp-bridge
cd ollama-mcp-bridge
npm install

# Copy the template config
cp bridge_config.json.template bridge_config.json

bridge_config.json

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "your_brave_api_key"
      }
    }
  },
  "llm": {
    "model": "qwen3:14b",
    "baseUrl": "http://localhost:11434",
    "temperature": 0.1,
    "maxTokens": 4096
  }
}

Run the bridge

npm start

# The bridge starts all MCP servers, connects to Ollama,
# and opens an interactive chat session.
# Tool calls are handled automatically.

Low temperature matters

Set temperature to 0.1 or lower for tool calling. Higher temperatures cause models to hallucinate tool names or produce malformed JSON arguments. This is the most common source of "MCP tools not working with Ollama" issues.

Method 3: Python MCP SDK + Ollama

Building your own client gives you full control. The MCP Python SDK handles server communication. The Ollama Python library handles inference. You write the loop that connects them. This is the right approach when you need custom behavior: filtering tools, transforming results, adding logging, or integrating into a larger application.

Install dependencies

pip install "mcp[cli]" ollama

ollama_mcp_client.py

import asyncio
import json
import ollama
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


def mcp_tools_to_ollama(mcp_tools) -> list[dict]:
    """Convert MCP tool schemas to Ollama/OpenAI format."""
    ollama_tools = []
    for tool in mcp_tools:
        ollama_tools.append({
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description or "",
                "parameters": tool.inputSchema,
            },
        })
    return ollama_tools


async def run(model: str, server_command: str, server_args: list[str]):
    server_params = StdioServerParameters(
        command=server_command,
        args=server_args,
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover tools
            tools_response = await session.list_tools()
            tools = mcp_tools_to_ollama(tools_response.tools)
            print(f"Connected. {len(tools)} tools available.")

            messages = []

            while True:
                user_input = input("\nYou: ")
                if user_input.lower() in ("quit", "exit"):
                    break

                messages.append({"role": "user", "content": user_input})

                # Call Ollama with tools
                response = ollama.chat(
                    model=model,
                    messages=messages,
                    tools=tools,
                    options={"temperature": 0.1},
                )

                # Process tool calls in a loop
                while response.message.tool_calls:
                    # Record the assistant turn that requested the tools
                    # (once per round, not once per tool call)
                    messages.append(response.message)

                    for tool_call in response.message.tool_calls:
                        name = tool_call.function.name
                        args = tool_call.function.arguments
                        print(f"  → Calling tool: {name}({json.dumps(args)})")

                        result = await session.call_tool(name, arguments=args)
                        text = result.content[0].text if result.content else ""

                        messages.append({
                            "role": "tool",
                            "content": text,
                        })

                    # Let the model process tool results
                    response = ollama.chat(
                        model=model,
                        messages=messages,
                        tools=tools,
                        options={"temperature": 0.1},
                    )

                print(f"\nAssistant: {response.message.content}")
                messages.append(response.message)


if __name__ == "__main__":
    asyncio.run(run(
        model="qwen3:14b",
        server_command="npx",
        server_args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    ))

The whole client fits in under a hundred lines. The key function is mcp_tools_to_ollama, which converts MCP tool schemas (with inputSchema) to the OpenAI-compatible format Ollama expects. The loop handles multi-step tool calling: if the model calls a tool, the result goes back and the model can call another tool or generate a final response.

Run it

python ollama_mcp_client.py

# Connected. 11 tools available.
#
# You: What files are in /tmp?
#   → Calling tool: list_directory({"path": "/tmp"})
#
# Assistant: Here are the files in /tmp:
# - session.log (2.1 KB)
# - build-output/ (directory)
# - notes.txt (340 bytes)
# ...

Multiple servers

To connect to multiple MCP servers, open a stdio_client session per server and merge their tool lists. Give each tool a prefix (e.g., fs_read_file, gh_search_code) to avoid name collisions, and route each tool call to the correct session based on its prefix.
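A sketch of that merge-and-route step, assuming each server's tools have already been converted to Ollama format (the prefixes and helper name are invented for illustration):

```python
def merge_tools(per_server: dict[str, list[dict]]) -> tuple[list[dict], dict]:
    """per_server maps a short prefix (e.g. "fs", "gh") to one server's tools.

    Returns the merged, prefixed tool list plus a routing table mapping
    each prefixed name back to (prefix, original_name).
    """
    merged, routes = [], {}
    for prefix, tools in per_server.items():
        for tool in tools:
            original = tool["function"]["name"]
            prefixed = f"{prefix}_{original}"
            # Copy the tool with the prefixed name; leave the schema untouched
            merged.append({**tool, "function": {**tool["function"], "name": prefixed}})
            routes[prefixed] = (prefix, original)
    return merged, routes


# When the model calls "fs_read_file", route it back to the owning session:
#   prefix, real_name = routes["fs_read_file"]
#   result = await sessions[prefix].call_tool(real_name, arguments=args)
```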

Which Models Work

MCP tool use depends entirely on the model's ability to generate correct tool calls: valid JSON, correct parameter names, appropriate tool selection. Not all models are equal here. Smaller models (<7B) frequently hallucinate tool names or produce malformed arguments.

| Model | Size | Tool Calling Reliability | Notes |
| --- | --- | --- | --- |
| Qwen 3 | 8B / 14B / 32B | Excellent | Most reliable tool calling in its class. Rarely hallucinates parameters. 32B recommended for complex multi-tool workflows. |
| Llama 3.3 | 70B | Excellent | Strong tool calling, large context (128K). Requires significant RAM (40GB+ for quantized). |
| Gemma 4 | 12B / 27B | Good | Native function calling with structured JSON output. Released April 2026. |
| Mistral / Mistral Large | 7B / 123B | Good | Solid tool calling. Mistral 7B is the smallest model with usable reliability. |
| Hermes 3 | 8B / 70B | Good | Fine-tuned for structured output and tool use. Good at following tool schemas. |
| GLM-4 | 9B | Good | Decent tool calling for its size. Chinese and English bilingual. |
| Llama 3.1 | 8B / 70B | Fair | Works but less reliable than Qwen 3 at equivalent sizes. 70B is solid. |
| Phi-4 | 14B | Fair | Supports tool calling but can struggle with complex tool schemas. |

The community consensus: start at 14B parameters minimum for anything beyond single-tool workflows. 32B+ for reliable multi-step tool use. Qwen 3 at 14B or 32B is the current sweet spot for tool calling quality per compute dollar.

MCP Servers Worth Using

MCP servers are model-agnostic. Anything that works with Claude Desktop works with Ollama through a bridge. These are the servers most useful for local development workflows.

Filesystem

Read, write, search, and manage files. Sandboxed to specific directories. The most fundamental MCP server. Install: npx @modelcontextprotocol/server-filesystem /path/to/dir

GitHub

Search code, read files, list issues, browse PRs. Useful for cross-repo exploration without cloning. Install: npx @modelcontextprotocol/server-github

Git

Local repo operations: branches, diffs, commits, log. Safer than giving the model raw shell access. Install: npx @modelcontextprotocol/server-git

Brave Search

Web search from your local model. Returns structured results. Requires a free Brave Search API key. Install: npx @modelcontextprotocol/server-brave-search

PostgreSQL / SQLite

Query your database directly. The model writes SQL, the server executes it and returns results. Read-only mode available. Community servers on npm.

Morph MCP (@morphllm/morphmcp)

WarpGrep codebase search and Fast Apply code editing. Search code across repos without embeddings. Apply LLM-generated edits at 10,500+ tok/s. Works with any MCP client.

MCPHost config with multiple servers

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "--repository", "."]
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "BSA_your_key" }
    },
    "morph": {
      "command": "npx",
      "args": ["-y", "@morphllm/morphmcp@latest"],
      "env": { "MORPH_API_KEY": "sk-morph-..." }
    }
  }
}

Performance Tradeoffs

Running inference locally changes the performance profile. Tool execution speed stays the same (tools run on your machine regardless of where the LLM runs). But inference speed, context window size, and tool calling reliability all shift.

| | Ollama (Local) | Cloud API (e.g. Claude, GPT-4) |
| --- | --- | --- |
| Inference speed | 30-60 tok/s (7B on M-series Mac), 10-20 tok/s (70B quantized) | 80-200+ tok/s depending on provider |
| Tool call latency | Same (tools run locally) | Same (tools run locally) |
| Network latency | Zero (everything local) | 50-200ms round trip to LLM provider per turn |
| Context window | 8K-128K depending on model and RAM | 128K-1M depending on provider |
| Tool calling reliability | Good with 14B+ models, excellent with 32B+ | Excellent (frontier models) |
| Privacy | Complete. Nothing leaves your machine. | Data sent to provider. Check their retention policies. |
| Cost | Electricity only. No per-token charges. | $3-15 per million input tokens depending on model |
| Setup complexity | Moderate. Need bridge layer, model selection, RAM considerations. | Low. API key and go. |

The total latency for a tool-calling turn is: inference time + tool execution time. If the model takes 2 seconds to decide which tool to call and the tool takes 500ms to execute, local and cloud are closer than the raw tok/s numbers suggest. The gap widens for tasks that require long generated responses (summaries, code generation) and narrows for tasks that are mostly tool calls (file browsing, database queries).
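Plugging the example numbers from this paragraph into that sum (a toy calculation; all figures are illustrative):

```python
def turn_latency(inference_s: float, tool_s: float, network_s: float = 0.0) -> float:
    """Total wall-clock time for one tool-calling turn."""
    return inference_s + tool_s + network_s


# Local: 2 s for the model to pick the tool, 0.5 s to run it, no network hop.
local = turn_latency(inference_s=2.0, tool_s=0.5)  # 2.5 s

# Cloud: faster inference, identical tool time, plus a round trip per turn.
cloud = turn_latency(inference_s=0.8, tool_s=0.5, network_s=0.2)  # ~1.5 s
```

The tool time is a fixed cost on both sides, so the more a turn is dominated by tool execution, the smaller the local-versus-cloud gap becomes.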

RAM planning

A 7B model needs roughly 4-6GB RAM. A 14B model needs 8-12GB. A 32B model needs 18-24GB. A 70B model (quantized to Q4) needs 40GB+. Each active MCP server adds 50-200MB. Plan your model choice around available system memory.

When to Use Cloud APIs Instead

Local LLMs with MCP are not universally better than cloud APIs. Each has a clear advantage zone.

Use Ollama + MCP when...

Data cannot leave your network. You need predictable costs (no per-token billing). You're running simple tool workflows (1-3 tools per turn). Latency to the cloud provider is high. You're experimenting and iterating on prompts.

Use cloud APIs when...

Task requires frontier model reasoning (complex multi-step planning). You need large context windows (100K+ tokens). Tool calling needs to be near-perfect (production pipelines). You're working with many concurrent users. You don't have a GPU or M-series Mac.

The hybrid approach works well: use Ollama for development and testing (fast iteration, no cost), then deploy with a cloud API for production (reliability, speed). Since MCP server configs are the same for both, switching is a model flag change.

Same MCP config, different model

# Local development
mcphost -m ollama:qwen3:14b --config mcp-servers.json

# Production (switch to cloud)
mcphost -m anthropic:claude-sonnet-4-20250514 --config mcp-servers.json
mcphost -m openai:gpt-4o --config mcp-servers.json

Frequently Asked Questions

Does Ollama natively support MCP?

No. As of April 2026, Ollama does not have built-in MCP support. The feature request (GitHub issue #7865) is still open. You need a bridge: MCPHost, ollama-mcp-bridge, or a custom client built with the MCP SDK.

Which Ollama models work with MCP tool calling?

Qwen 3 (all sizes), Llama 3.1/3.3, Mistral, Hermes 3, Gemma 4, and GLM-4 all support tool calling. For MCP use, 14B+ parameters is the practical minimum. Qwen 3 32B is the most reliable option for complex multi-tool workflows.

What is the easiest way to connect Ollama to MCP servers?

MCPHost. Install Go, run go install github.com/mark3labs/mcphost@latest, write a JSON config listing your MCP servers, and run mcphost -m ollama:qwen3:14b --config servers.json. Working setup in under 5 minutes.

Can Ollama use the same MCP servers as Claude Desktop?

Yes. MCP servers are model-agnostic. The filesystem, GitHub, Brave Search, and database servers that work with Claude Desktop work identically with Ollama through any bridge. The protocol is the same; only the client changes.

Is Ollama + MCP slower than cloud APIs with MCP?

Inference is slower on consumer hardware (30-60 tok/s for 7B vs 100+ tok/s from cloud APIs). But tool execution time is identical and you eliminate network latency. For tool-heavy workflows where inference is a small fraction of total time, local performance can approach cloud.

Can I use Ollama's OpenAI-compatible API with MCP clients?

Yes. Ollama exposes http://localhost:11434/v1/chat/completions with tool support. Any MCP client built on the OpenAI SDK works by setting base_url to http://localhost:11434/v1 and using "ollama" as the API key.
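For instance, the raw HTTP call that an OpenAI-SDK-based client makes can be reproduced with the standard library alone (the prompt and helper name are illustrative):

```python
import json
import urllib.request


def chat_via_openai_api(model: str, prompt: str) -> str:
    """Hit Ollama's OpenAI-compatible endpoint directly.

    This mirrors what openai.OpenAI(base_url="http://localhost:11434/v1",
    api_key="ollama") sends under the hood.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key's value; any placeholder works
            "Authorization": "Bearer ollama",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# With Ollama running: chat_via_openai_api("qwen3:14b", "Say hello")
```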

What MCP servers work best with Ollama?

Filesystem, Git, GitHub, and Brave Search are the most useful for development workflows. Avoid servers that return very large responses, as local models have smaller effective context windows. Start with @modelcontextprotocol/server-filesystem and add from there.

How does Ollama MCP compare to Morph's MCP tools?

Different problems. Ollama + MCP gives you local, private inference with tool access. Morph's MCP tools (WarpGrep for code search, Fast Apply for edits, Sandbox for execution) are specialized coding subagents. You can use both: point MCPHost at Ollama for inference and include @morphllm/morphmcp in your server config for code search and editing.

MCP Tools Built for Coding Agents

Morph's MCP server gives any LLM (local or cloud) access to WarpGrep codebase search and Fast Apply code editing. No embeddings, no indexing. Works with Ollama, Claude, GPT-4, or any MCP-compatible client.