Code Execution with MCP: How to Give AI Agents Safe Code Running (2026)

A technical guide to code execution through MCP (Model Context Protocol). Covers how MCP servers expose code execution tools to AI agents, the available open-source and hosted MCP sandbox servers, token efficiency gains from code-first agent architectures, and how to integrate Morph Sandbox as an MCP server.

April 4, 2026 · 1 min read

What Is Code Execution with MCP

The Model Context Protocol is an open standard for connecting AI agents to external tools. An MCP server exposes capabilities (tools, resources, prompts) over a standard transport. An MCP client, running inside the agent, discovers those capabilities and calls them when the model decides to.

Code execution with MCP means one of those tools is a sandbox. The server exposes a tool like run_code or exec that accepts source code and returns stdout, stderr, and an exit code. The agent writes a program, calls the tool, reads the result, and decides what to do next. This is how agents go from generating code to actually running it.

98.7% — Token reduction with code execution (Anthropic)
150K → 2K — Tokens for the same workflow
6+ — Open-source MCP code execution servers

The key insight from Anthropic's engineering blog: when agents process data through the context window, every intermediate value consumes tokens. When they write a program that processes data inside a sandbox, only the final result enters the context. The data never touches the model. This is not incremental optimization. It is a different architecture.

MCP Protocol Basics

MCP uses a JSON-RPC transport (stdio or HTTP+SSE). Servers register tools with names, descriptions, and JSON Schema input definitions. Clients call tools/list to discover available tools and tools/call to invoke them. Any sandbox that implements these two endpoints becomes an MCP code execution server.
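The two calls above have a simple wire shape. A minimal sketch of the JSON-RPC messages, following the field names in the MCP spec; the tool name run_code and its arguments are hypothetical:

```python
import json

# JSON-RPC request the client sends to discover the server's tools
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# JSON-RPC request to invoke a tool; "run_code" is a hypothetical
# tool name, as is its argument schema
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "run_code",
        "arguments": {"language": "python", "code": "print(2 + 2)"},
    },
}

print(json.dumps(call_request, indent=2))
```

Everything else in the protocol (capability negotiation, notifications, resources) is optional for a code execution server; these two methods are the whole contract.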

Why Agents Need Code Execution

An agent that can only generate code is half an agent. The other half is execution: running tests, checking types, installing dependencies, verifying output. Without execution, the agent generates code blindly and hopes it works. With execution, the agent operates in a feedback loop: write, run, observe, fix.

The Token Problem

Consider an agent that needs to find the top 10 customers by revenue from a database. Without code execution, the agent makes a tool call to query the database, receives the full result set in the context window, then reasons over it to produce an answer. If the result set is 10,000 rows, that is tens of thousands of tokens consumed by intermediate data.

With code execution, the agent writes a Python script that queries the database, sorts by revenue, and prints the top 10. The script runs in the sandbox. Only 10 rows enter the context. Same answer, a fraction of the tokens.
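A sketch of the kind of script the agent would write, using an in-memory SQLite table as a stand-in for the real database. Only the printed rows would enter the model's context:

```python
import json
import sqlite3

# In-memory SQLite stands in for the real customer database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Alice", 50000), ("Bob", 120000), ("Carol", 75000), ("Dan", 30000)],
)

# Sorting and limiting happen inside the sandbox; the 10,000-row
# intermediate result never leaves it
rows = conn.execute(
    "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 2"
).fetchall()

print(json.dumps([{"name": n, "revenue": r} for n, r in rows]))
```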

The Verification Loop

Coding agents that generate code without running it produce code that works about 60-70% of the time on first attempt. Agents that run tests after generation and iterate on failures reach 85-95% success rates. The execution step is what closes the gap. MCP makes this execution step standardized and portable: the same agent code works with any MCP-compatible sandbox.

Three Patterns That Require Execution

Test-Driven Iteration

Agent writes code, writes tests, runs tests in the sandbox, reads failures, fixes the code, and re-runs. This loop continues until tests pass or the retry budget is exhausted. Every step depends on executing code and reading the result.
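The loop above can be sketched as a retry budget around two injected functions; generate_fix (the model) and run_code (the MCP tool) are hypothetical stand-ins:

```python
# Sketch of the write-run-fix loop. `generate_fix` stands in for the
# model, `run_code` for the MCP execution tool returning
# (stdout, stderr, exit_code).
def iterate_until_green(generate_fix, run_code, max_attempts=3):
    code = generate_fix(previous_failure=None)
    for attempt in range(1, max_attempts + 1):
        stdout, stderr, exit_code = run_code(code)
        if exit_code == 0:
            return code, attempt            # tests passed
        # Feed the failure output back to the model and try again
        code = generate_fix(previous_failure=stderr or stdout)
    raise RuntimeError("retry budget exhausted")

# Stubbed run: the first attempt fails, the second passes
attempts = iter([("", "AssertionError: expected 4", 1), ("2 passed", "", 0)])
final_code, tries = iterate_until_green(
    generate_fix=lambda previous_failure: "fixed" if previous_failure else "draft",
    run_code=lambda code: next(attempts),
)
```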

Data Processing

Agent queries APIs or databases through MCP tools, but processes the data inside the sandbox instead of the context window. Filtering, aggregation, joins, and formatting happen in code. The model only sees the processed output.

Multi-Tool Orchestration

Agent writes a program that calls multiple MCP tools in sequence with conditional logic. Instead of making 20 individual tool calls through the model, it writes one script that makes 20 calls inside the sandbox. One round-trip instead of twenty.

How MCP Code Execution Works

An MCP code execution server exposes one or more tools that accept code as input and return execution results. The server handles sandboxing internally. The agent does not need to know whether code runs in Docker, a microVM, or WebAssembly. It calls the tool, gets the result.

Tool Definition

The server registers tools during the MCP initialization handshake. A typical code execution tool looks like this:

MCP tool definition for code execution

{
  "name": "run_code",
  "description": "Execute code in a sandboxed environment. Supports Python, JavaScript, and shell commands. Returns stdout, stderr, and exit code.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "language": {
        "type": "string",
        "enum": ["python", "javascript", "shell"]
      },
      "code": {
        "type": "string",
        "description": "The source code to execute"
      },
      "timeout": {
        "type": "number",
        "description": "Max execution time in seconds",
        "default": 30
      }
    },
    "required": ["language", "code"]
  }
}

Execution Flow

When the agent decides to run code, the flow is:

  1. Agent generates a tools/call request with the tool name and code input
  2. MCP client sends the request to the code execution server
  3. Server creates or reuses a sandbox, writes the code, executes it
  4. Server returns stdout, stderr, exit code as the tool result
  5. Agent reads the result and decides the next action

Agent calling an MCP code execution tool

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Connect to a code execution MCP server
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@e2b/mcp-server"],
  env: { E2B_API_KEY: process.env.E2B_API_KEY }
});

const client = new Client({ name: "my-agent", version: "1.0" });
await client.connect(transport);

// Discover available tools
const { tools } = await client.listTools();
// tools includes: run_code, run_python, create_sandbox, etc.

// Execute code through MCP
const result = await client.callTool({
  name: "run_code",
  arguments: {
    language: "python",
    code: `
import json
data = [{"name": "Alice", "revenue": 50000},
        {"name": "Bob", "revenue": 120000},
        {"name": "Carol", "revenue": 75000}]
top = sorted(data, key=lambda x: x["revenue"], reverse=True)[:2]
print(json.dumps(top, indent=2))
`
  }
});

// MCP wraps tool output in content blocks; stdout is in the text block
console.log(result.content[0].text);
// [{ "name": "Bob", "revenue": 120000 },
//  { "name": "Carol", "revenue": 75000 }]

The Filesystem-as-API Pattern

Anthropic's approach goes further. Instead of the agent calling MCP tools one at a time through the model, the server exposes tool definitions as files in the sandbox filesystem. The agent reads the tool schemas from disk, writes a program that calls multiple tools, and executes the program. The sandbox handles the MCP tool calls internally.

This inverts the typical flow. Instead of model → MCP client → server for each tool call, it becomes model → write program → execute once → read result. The model makes one decision ("write this program") instead of N decisions ("call this tool, then this tool, then this tool"). Fewer model round-trips mean lower latency and fewer tokens.

Filesystem-as-API: agent writes a program that calls tools

# Inside the sandbox, the agent writes and runs this script.
# Tool schemas are available as files: /tools/query_database.json,
# /tools/send_email.json, etc.

import json

# The sandbox provides a call_tool() function that
# invokes MCP tools without going back through the model
from mcp_runtime import call_tool

# Step 1: Query the database
customers = call_tool("query_database", {
    "query": "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 10"
})

# Step 2: Format the report (happens in code, not in model context)
report = "Top 10 Customers by Revenue:\n"
for i, c in enumerate(customers["rows"], 1):
    report += f"{i}. {c['name']}: ${c['revenue']:,.0f}\n"

# Step 3: Send the report
call_tool("send_email", {
    "to": "team@company.com",
    "subject": "Weekly Revenue Report",
    "body": report
})

# Only this final output enters the model's context
print(f"Report sent. {len(customers['rows'])} customers included.")

Available MCP Code Execution Servers

The MCP ecosystem has several code execution servers, ranging from lightweight open-source projects to managed cloud services. Here is the landscape as of April 2026.

Server              | Isolation             | Languages             | Persistence    | Deployment
E2B MCP Server      | MicroVM (Firecracker) | Python, JS, Shell     | Session-scoped | Cloud (hosted)
Daytona MCP         | Container / MicroVM   | Python, TS, Go, +more | Session-scoped | Cloud (hosted)
code-sandbox-mcp    | Docker container      | Python, JavaScript    | Per-container  | Self-hosted
mcp-run-python      | WebAssembly (Pyodide) | Python only           | Ephemeral      | Self-hosted
AgentExec MCP       | Docker container      | Python, Node.js, Go   | Per-container  | Self-hosted
Morph Sandbox MCP   | MicroVM               | Python, TS, Go, Rust  | Session-scoped | Cloud (hosted)

E2B MCP Server

E2B is the most established cloud sandbox for AI agents. Their official MCP server (e2b-dev/mcp-server) wraps the E2B SDK in an MCP-compatible interface. Supports Python, JavaScript, and shell. Each sandbox runs in a Firecracker microVM with its own filesystem. Sub-500ms cold starts. Free tier includes 100 sandbox hours per month.

Adding E2B MCP server to Claude Desktop

// claude_desktop_config.json
{
  "mcpServers": {
    "e2b": {
      "command": "npx",
      "args": ["-y", "@e2b/mcp-server"],
      "env": {
        "E2B_API_KEY": "your-e2b-api-key"
      }
    }
  }
}

Daytona MCP

Daytona provides secure sandbox infrastructure with a full MCP integration. Supports Python, TypeScript, Go, and more. Includes filesystem access, process management, and Git operations inside the sandbox. The MCP server exposes tools for code execution, file manipulation, and environment management. Stronger isolation than Docker-only solutions, with persistent sessions.

code-sandbox-mcp (Philipp Schmid)

A lightweight, self-hosted MCP server that runs code in Docker containers. Exposes run_python_code and run_javascript_code tools. Good for local development and prototyping. Container isolation keeps untrusted code away from the host, but you manage Docker yourself. No cloud dependency.

mcp-run-python (Pydantic)

Runs Python in a WebAssembly sandbox using Pyodide. Zero Docker dependency since everything runs in-process. Fast startup. The tradeoff: Python-only, limited library support (anything with C extensions needs Pyodide compatibility), and the Pyodide sandbox was not designed to contain deliberately malicious code. Best for data processing tasks where you trust the model but want isolation from accidental filesystem damage.

AgentExec MCP

Built on the FastMCP framework, AgentExec packages shell and code execution in a Docker container. Supports Python, Node.js, and Go. Exposes tools for running code, executing shell commands, and managing the sandbox lifecycle. Good for teams that want a self-hosted, multi-language execution server.

Choosing a Server

For production AI tools, use a hosted service (E2B, Daytona, or Morph Sandbox) that provides microVM isolation and managed infrastructure. For local development and prototyping, code-sandbox-mcp or mcp-run-python are fast to set up. For air-gapped or on-premise requirements, run AgentExec or code-sandbox-mcp on your own Docker infrastructure.

Security and Sandboxing

MCP is a transport protocol. It does not provide sandboxing. The code execution server is responsible for isolation. This distinction matters because a poorly sandboxed MCP server gives your agent the ability to run arbitrary code on your machine with your permissions.

Isolation Levels

Method                | Isolation Strength | Startup Time | Tradeoff
No sandbox            | None               | Instant      | Any code runs with host permissions. Never use in production.
WebAssembly           | Medium             | < 100ms      | Fast, but limited library support; not designed for adversarial code.
Docker container      | High               | 200-500ms    | Good isolation, but the kernel is shared with the host. Configure resource limits.
MicroVM (Firecracker) | Very high          | < 300ms      | Separate kernel. Strongest isolation for cloud workloads.

What to Check

Before deploying an MCP code execution server in production, verify five things:

  1. Process isolation: Can code escape the sandbox? Docker containers share the host kernel. MicroVMs do not.
  2. Filesystem containment: Can code read or write files outside the sandbox? Mount only what the sandbox needs.
  3. Resource limits: Are CPU, memory, and execution time capped? An infinite loop should not consume your server.
  4. Network restrictions: Can sandbox code make outbound HTTP requests? Restrict to necessary endpoints only.
  5. Cleanup: Is the sandbox destroyed after the session? Leftover sandboxes consume resources and may retain sensitive data.
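As a concrete illustration of check #3, here is a minimal sketch of capping execution time with a subprocess timeout. This covers only the wall-clock limit; a production server would add memory/CPU caps and network rules via cgroups, seccomp, or a microVM. The function name run_limited is a hypothetical helper:

```python
import subprocess
import sys

def run_limited(code: str, timeout_s: float = 5.0):
    """Run Python code in a subprocess with a wall-clock cap.

    Returns (stdout, stderr, exit_code). Only addresses check #3;
    it provides no filesystem, memory, or network containment.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired:
        # The infinite loop is killed instead of consuming the server
        return "", f"killed after {timeout_s}s", -1

out, err, rc = run_limited("print('ok')")
loop_out, loop_err, loop_rc = run_limited("while True: pass", timeout_s=1.0)
```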

Human-in-the-Loop

The MCP specification recommends that tool invocations always have a human approval step. For code execution, this means showing the user what code the agent wants to run before executing it. Claude Desktop implements this by default. Custom agents should implement an approval flow for code execution in production, or scope the sandbox permissions tightly enough that automatic execution is safe.
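A minimal sketch of such an approval gate. The run_code and ask_user callables are hypothetical, injected so the policy itself can be exercised without a real sandbox or terminal:

```python
# Approval gate: surface the exact code before execution, and only
# forward it to the sandbox if the user explicitly approves.
def approve_and_run(code, run_code, ask_user):
    print("Agent wants to run:\n" + code)
    if ask_user("Execute this code? [y/N] ").strip().lower() != "y":
        return None  # rejected: the code never reaches the sandbox
    return run_code(code)

# Stubbed decisions: one rejection, one approval
denied = approve_and_run(
    "rm -rf /", run_code=lambda c: "ran", ask_user=lambda prompt: "n"
)
allowed = approve_and_run(
    "print(1)", run_code=lambda c: "ran", ask_user=lambda prompt: "y"
)
```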

Morph Sandbox as MCP Server

Morph Sandbox provides session-scoped code execution environments designed for AI agent workflows. It integrates as an MCP server, exposing tools for code execution, filesystem operations, and dependency management. Sandboxes run in microVMs with sub-300ms cold starts and persistent filesystems within a session.

<300ms — Cold start (microVM)
4 — Languages (Python, TS, Go, Rust)
$0 — Extra cost (included with Morph API)

Setup

Configure Morph Sandbox as an MCP server in your agent or Claude Desktop:

Adding Morph Sandbox MCP to Claude Desktop

// claude_desktop_config.json
{
  "mcpServers": {
    "morph-sandbox": {
      "command": "npx",
      "args": ["-y", "@morphllm/sandbox-mcp"],
      "env": {
        "MORPH_API_KEY": "your-morph-api-key"
      }
    }
  }
}

Exposed Tools

The Morph Sandbox MCP server exposes these tools to connected agents:

Morph Sandbox MCP tools

// Tools available after connecting to Morph Sandbox MCP server:

// 1. create_sandbox - Create a new execution environment
//    Input: { template: "python-3.12" | "node-20" | "go" | "rust", timeout?: number }
//    Returns: { sandbox_id: string }

// 2. exec - Run a command in the sandbox
//    Input: { sandbox_id: string, command: string }
//    Returns: { stdout: string, stderr: string, exit_code: number }

// 3. write_file - Write a file to the sandbox filesystem
//    Input: { sandbox_id: string, path: string, content: string }
//    Returns: { success: boolean }

// 4. read_file - Read a file from the sandbox filesystem
//    Input: { sandbox_id: string, path: string }
//    Returns: { content: string }

// 5. destroy_sandbox - Clean up the sandbox
//    Input: { sandbox_id: string }
//    Returns: { success: boolean }

Multi-Step Agent Workflow

Because Morph sandboxes persist filesystem state within a session, agents can run multi-step workflows without re-creating state. The agent creates one sandbox, then writes files, installs packages, runs code, reads results, and iterates, all through MCP tool calls.

Agent workflow using Morph Sandbox MCP tools

// The agent makes these MCP tool calls in sequence:

// 1. Create a Python sandbox
const sandbox = await mcpClient.callTool({
  name: "create_sandbox",
  arguments: { template: "python-3.12", timeout: 300 }
});
const id = sandbox.content[0].text; // sandbox ID

// 2. Write the application code
await mcpClient.callTool({
  name: "write_file",
  arguments: {
    sandbox_id: id,
    path: "/app/analyzer.py",
    content: agentGeneratedCode
  }
});

// 3. Install dependencies
const install = await mcpClient.callTool({
  name: "exec",
  arguments: {
    sandbox_id: id,
    command: "pip install pandas numpy"
  }
});

// 4. Run tests (filesystem persists from step 2)
const testResult = await mcpClient.callTool({
  name: "exec",
  arguments: {
    sandbox_id: id,
    command: "cd /app && python -m pytest -v"
  }
});

// 5. If tests fail, agent fixes code and re-runs
// (no need to reinstall dependencies or rewrite other files)

// 6. Cleanup
await mcpClient.callTool({
  name: "destroy_sandbox",
  arguments: { sandbox_id: id }
});

Why Morph Sandbox for MCP

Bundled with Morph API

No separate vendor or billing. If you use Morph for LLM inference, sandbox access is included. One API key, one invoice.

Session-Scoped Persistence

Filesystem state persists across tool calls within a session. Install once, iterate many times. Agents don't waste tokens re-creating state.

MicroVM Isolation

Each sandbox runs in its own microVM with a separate kernel. Stronger isolation than Docker containers. Safe for running untrusted, LLM-generated code.

Multi-Language Templates

Pre-configured environments for Python, TypeScript, Go, and Rust. Custom Docker images supported for other runtimes. Sub-300ms cold starts with pre-warmed pools.

Frequently Asked Questions

What is code execution with MCP?

Code execution with MCP is a pattern where AI agents use the Model Context Protocol to discover and invoke code execution tools exposed by MCP servers. The agent writes code, sends it to a sandboxed environment via an MCP tool call, and reads the execution result (stdout, stderr, exit code). This gives agents the ability to test their own code, process data without loading it into the context window, and build multi-step workflows that would otherwise require many individual model calls.

How does MCP code execution reduce token usage?

When an agent processes data through the context window, every intermediate result consumes tokens. With code execution, the agent writes a program that processes data inside the sandbox. Only the final output enters the context. Anthropic measured a reduction from 150,000 tokens to 2,000 tokens for a data processing workflow. The savings come from keeping intermediate data in the sandbox and only surfacing the result the model needs to see.

What MCP servers support code execution?

E2B has an official MCP server for cloud sandboxes. Daytona provides MCP-integrated sandbox infrastructure. Pydantic's mcp-run-python runs Python in WebAssembly. Philipp Schmid's code-sandbox-mcp uses Docker for Python and JavaScript. AgentExec MCP supports Python, Node.js, and Go in Docker. Morph Sandbox provides cloud-hosted microVM sandboxes with an MCP interface. Each makes different tradeoffs between isolation strength, language support, and deployment model.

Is MCP code execution safe?

Safety depends on the sandbox, not the protocol. MCP is a transport layer for tool calls. The server behind it determines how code is isolated. Docker containers share the host kernel but provide strong process and filesystem isolation. MicroVMs (E2B, Morph) run a separate kernel for each sandbox, providing the strongest isolation. WebAssembly (mcp-run-python) sandboxes well but limits what code can do. Always treat LLM-generated code as untrusted and choose isolation that matches your threat model.

How do I add code execution to my MCP-enabled agent?

Configure your agent's MCP client to connect to a code execution server. For Claude Desktop, add the server config to claude_desktop_config.json. For custom agents using the MCP TypeScript or Python SDK, create a client transport pointing to the server. The server automatically registers its tools during the MCP handshake. Your agent discovers them via tools/list and calls them via tools/call. No changes to your agent's prompt or logic are required.

What is the difference between code execution MCP and a sandbox API?

A sandbox API is a service you call directly from application code via HTTP or SDK. Code execution MCP wraps a sandbox in the MCP protocol so agents can discover and use it through standard tool calling. The sandbox does the same thing in both cases: run code in isolation. The difference is who calls it. Application code calls a sandbox API directly. An AI agent calls it through MCP. Many providers (E2B, Morph) offer both interfaces.

Can MCP code execution handle multiple programming languages?

Yes, depending on the server. Docker-based servers support any language you can install in a container. E2B supports Python, JavaScript, and shell. Daytona supports Python, TypeScript, Go, and more. mcp-run-python is Python-only. Morph Sandbox has pre-configured templates for Python, TypeScript, Go, and Rust. For the widest language support, use a container-based server or a hosted solution with custom image support.

What is the Anthropic code execution with MCP pattern?

Anthropic published a pattern where MCP tool definitions are written to the sandbox filesystem as files. The agent reads tool schemas from disk, writes a program that calls multiple tools with conditional logic, and executes the program in one step. Instead of making N tool calls through the model (N model round-trips), the agent writes one program (one model round-trip). This reduces both token usage and latency for workflows that involve many tools or large intermediate datasets.

Try Morph Sandbox MCP

Give your AI agents safe code execution through MCP. Sub-300ms cold starts, session-scoped persistence, microVM isolation. Included with every Morph API plan.