Sandbox API: How to Give AI Agents Safe Code Execution (2026)

A technical guide to sandbox APIs for AI coding tools. Covers REST vs SDK vs WebSocket patterns, evaluation criteria (cold start, persistence, multi-language, pricing), and a provider comparison of Morph, E2B, Modal, and Fly.io with code examples.

April 4, 2026

What Is a Sandbox API

A sandbox API is a service that runs code in an isolated environment and returns the result over HTTP or WebSocket, typically wrapped in an SDK. The caller sends code, the sandbox executes it inside a container or microVM, and the caller gets back stdout, stderr, the exit code, and any files produced. The host system is never exposed.

This matters because AI agents generate code they need to run. A coding agent that writes a Python function needs to execute tests. An agent that generates a data pipeline needs to install dependencies and verify the output. A code review agent needs to run linting and type checking. None of this should happen on the machine serving your API.

- <300ms cold start (Morph Sandbox)
- 5-15% typical sandbox cost vs LLM spend
- Zero host system exposure

Core Capabilities

A production sandbox API provides: process isolation (code cannot escape the sandbox), filesystem containment (reads/writes stay inside), resource limits (CPU, memory, time caps), network control (restrict or allow outbound calls), and artifact retrieval (pull files out of the sandbox after execution).
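The resource-limit contract is easy to picture in miniature. The sketch below is a toy illustration using Node's built-in child_process, not any provider's actual implementation: a runaway process is killed the moment it exceeds its wall-clock cap, and the host keeps running.

```typescript
import { spawnSync } from "node:child_process";

// Toy wall-clock limit: run an infinite loop but cap it at 1 second.
// Real sandbox APIs enforce this (plus CPU, memory, and network caps)
// inside the container or microVM, but the contract is the same:
// exceed the limit and the process is killed, leaving the host untouched.
const result = spawnSync("node", ["-e", "while (true) {}"], { timeout: 1_000 });

console.log(result.signal); // killed with SIGTERM (Node's default killSignal)
```

The same idea scales up: a production sandbox applies these caps from outside the guest, so even code that ignores signals cannot outlive its limits.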

Who Needs a Sandbox API

Three categories of builders use sandbox APIs today. First, AI coding tool developers who need to run LLM-generated code safely. Second, education platforms that let students execute code in the browser. Third, teams building code evaluation pipelines for hiring, benchmarking, or automated testing. The AI agent use case is growing fastest because every new coding agent needs execution, and building your own sandbox from scratch is a multi-month infrastructure project.

Key API Design Patterns

Sandbox APIs follow three main integration patterns. The right choice depends on your latency requirements, how many execution steps your agent takes, and whether you need real-time output streaming.

REST API

Send code in a POST request, get results in the response. Simplest integration. Works for one-shot execution where you don't need streaming output. Typical round-trip: 200-500ms for short scripts, seconds for longer runs. Every request is stateless unless you manage session IDs yourself.

Native SDK

Language-specific client library that handles session management, file operations, and streaming internally. You write morph.sandbox.run(code) instead of constructing HTTP requests. SDKs manage connection pooling, retries, and sandbox lifecycle. This is the most common pattern for production AI tools.

WebSocket

Persistent connection that streams stdout/stderr in real time. Required for long-running processes, interactive REPL sessions, and agent loops that need to react to partial output. Higher integration complexity but essential when your agent watches output and decides the next step mid-execution.

REST: Simple but Limited

A REST sandbox API accepts a code payload and returns the execution result. This is the right pattern when your agent generates code, runs it, and reads the output in a single step. No session state, no streaming, no complexity.

REST: One-shot execution

curl -X POST https://api.sandbox.example/v1/execute \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python",
    "code": "import sys; print(sys.version)",
    "timeout": 30
  }'

# Response:
# {
#   "stdout": "3.12.3 (main, Apr 2026)\n",
#   "stderr": "",
#   "exit_code": 0,
#   "duration_ms": 142
# }
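The same one-shot call is a plain fetch from TypeScript. This sketch targets the hypothetical endpoint and payload shape from the curl example above, not any specific provider's API:

```typescript
// Result shape mirroring the example response above.
interface ExecuteResult {
  stdout: string;
  stderr: string;
  exit_code: number;
  duration_ms: number;
}

// Build the JSON payload for the hypothetical /v1/execute endpoint.
function buildExecuteBody(code: string, timeout = 30): string {
  return JSON.stringify({ language: "python", code, timeout });
}

async function execute(code: string): Promise<ExecuteResult> {
  const res = await fetch("https://api.sandbox.example/v1/execute", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.API_KEY}`,
      "Content-Type": "application/json",
    },
    body: buildExecuteBody(code),
  });
  if (!res.ok) throw new Error(`Sandbox error: HTTP ${res.status}`);
  return (await res.json()) as ExecuteResult;
}
```

Note there is no session handle anywhere: each call is independent, which is exactly the REST pattern's limitation for multi-step agents.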

SDK: The Production Default

SDKs abstract the transport layer and add session management. A sandbox session persists filesystem state between executions, so your agent can write a file in step 1, install dependencies in step 2, and run tests in step 3. The SDK handles sandbox creation, keep-alive, and cleanup.

SDK: Multi-step agent workflow (Morph Sandbox)

import { MorphSandbox } from "@anthropic-ai/morph-sandbox";

const sandbox = await MorphSandbox.create({
  apiKey: process.env.MORPH_API_KEY,
  template: "python-3.12",
  timeout: 300, // 5 minute max lifetime
});

// Step 1: Write the code
await sandbox.filesystem.write(
  "/app/main.py",
  agentGeneratedCode
);

// Step 2: Install dependencies
const install = await sandbox.exec("pip install -r /app/requirements.txt");
if (install.exitCode !== 0) {
  throw new Error(`Dependency install failed: ${install.stderr}`);
}

// Step 3: Run tests
const result = await sandbox.exec("cd /app && python -m pytest -v");
console.log(result.stdout);
// Tests pass or fail inside the sandbox. Host is unaffected.

await sandbox.destroy();

WebSocket: Real-Time Streaming

WebSocket connections let your agent receive output as it is produced, not after execution completes. This matters for long-running processes (training scripts, build commands) and for agents that make decisions based on partial output. If your agent sees a test failure in line 3, it can kill the sandbox and start fixing the code instead of waiting for all 200 tests to finish.

WebSocket: Streaming stdout

const ws = sandbox.stream("python /app/train.py");

ws.on("stdout", (chunk) => {
  // React to output in real time
  if (chunk.includes("loss: NaN")) {
    ws.kill(); // Stop execution early
    agent.fixTrainingCode();
  }
});

ws.on("exit", ({ code }) => {
  if (code === 0) agent.reportSuccess();
});

Evaluation Criteria

Five factors separate sandbox APIs that work in demos from those that work in production.

1. Cold Start Time

Cold start is the time from "create sandbox" to "sandbox is ready to execute code." For interactive AI tools, anything over 1 second breaks the user experience. For background pipelines, 5-10 seconds is acceptable. The best providers achieve sub-300ms cold starts using pre-warmed pools of microVMs or containers.
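Cold start is also easy to measure yourself. A provider-agnostic harness might look like the sketch below, where createFn stands in for whatever create call your SDK exposes (the 50ms stub is a placeholder, not a real provider):

```typescript
// Time from "create sandbox" to "ready", for any provider's create call.
async function measureColdStart(createFn: () => Promise<unknown>): Promise<number> {
  const start = performance.now();
  await createFn(); // resolves when the sandbox reports ready
  return performance.now() - start;
}

// Demo with a stub that "boots" in ~50ms; swap in a real create call
// (e.g. your SDK's create method) to benchmark an actual provider.
async function main() {
  const samples: number[] = [];
  for (let i = 0; i < 5; i++) {
    samples.push(await measureColdStart(() => new Promise((r) => setTimeout(r, 50))));
  }
  const p50 = samples.sort((a, b) => a - b)[2];
  console.log(`median cold start: ${p50.toFixed(0)}ms`);
}
void main();
```

Take the median of several samples rather than a single run: the first create after a quiet period often misses the provider's pre-warmed pool.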

2. Filesystem Persistence

Stateless sandboxes destroy everything after each execution. Stateful sandboxes preserve the filesystem between calls. AI agents almost always need state: they write code, install packages, run tests, fix failures, and run tests again. If each step starts from a blank filesystem, the agent wastes tokens and time re-creating context. Look for sandboxes that persist state for at least the duration of an agent session (5-30 minutes).

3. Multi-Language Support

Your users write in many languages. A sandbox that only supports Python is not enough if your agent handles TypeScript, Go, or Rust projects. The best providers ship pre-built templates for common runtimes and let you bring your own Docker image for anything else.

4. Pricing Model

Sandbox pricing comes in three models: per-second (you pay for sandbox uptime), per-execution (you pay per code run), and bundled (included with a broader platform). Per-second pricing is most common and aligns well with agent workflows where sandbox lifetime varies. Watch for minimum billing increments: a provider billing in 1-minute increments will be expensive if your average execution takes 3 seconds.
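The billing-increment effect is easy to quantify. This sketch uses E2B's listed per-second rate as the example rate; the 1-minute-minimum provider is hypothetical:

```typescript
const ratePerSecond = 0.000056; // ~$0.20/hr, E2B's listed rate

// True per-second billing: pay exactly for what you use.
function perSecondCost(seconds: number): number {
  return seconds * ratePerSecond;
}

// Same rate, but billed in 1-minute increments with a 1-minute minimum.
function minuteIncrementCost(seconds: number): number {
  return Math.max(1, Math.ceil(seconds / 60)) * 60 * ratePerSecond;
}

const run = 3; // seconds: a typical short execution
console.log(perSecondCost(run));       // ~$0.000168
console.log(minuteIncrementCost(run)); // ~$0.00336, 20x the per-second price
```

At high volume that 20x multiplier dominates: for short agent executions, the billing increment matters more than the headline hourly rate.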

5. Integration Complexity

Count the lines of code from "npm install" to "running untrusted code safely." Some providers require custom Docker images, infrastructure configuration, and manual networking setup. Others give you a single SDK call. For most AI tool builders, the simpler path wins because sandbox infrastructure is not your product.

| Criterion   | Must Have                              | Nice to Have                              |
|-------------|----------------------------------------|-------------------------------------------|
| Cold start  | < 1s for interactive                   | < 300ms with pre-warming                  |
| Persistence | Filesystem survives between exec calls | Snapshot/restore for long-lived sessions  |
| Languages   | Python, JS/TS, Go, Rust                | Custom Docker image support               |
| Pricing     | Per-second billing, no minimums        | Free tier for development                 |
| Integration | SDK with < 20 lines to first execution | WebSocket streaming, file upload/download |

Provider Comparison

Four providers cover the majority of the sandbox API market in 2026. Each makes different tradeoffs.

| Feature             | Morph Sandbox              | E2B                       | Modal                    | Fly.io                |
|---------------------|----------------------------|---------------------------|--------------------------|-----------------------|
| Primary use case    | AI agent code execution    | AI agent code execution   | ML/data workloads        | General compute       |
| Cold start          | < 300ms                    | < 500ms                   | < 1s (CPU), 30-60s (GPU) | ~300ms (Machines)     |
| Persistence         | Session-scoped filesystem  | Session-scoped filesystem | Volumes (persistent)     | Volumes (persistent)  |
| SDK languages       | Python, TypeScript         | Python, TypeScript        | Python                   | REST API (any language) |
| Streaming output    | WebSocket + SDK            | WebSocket + SDK           | Generator-based          | Logs API              |
| Custom environments | Templates + Docker         | Templates + Docker        | Docker images            | Docker images         |
| GPU support         | No                         | No                        | Yes (A100, H100)         | Yes (L40S, A100)      |
| Billing model       | Included with Morph API    | Per sandbox-second        | Per CPU/GPU-second       | Per Machine-second    |
| Free tier           | Yes (with Morph API)       | 100 hours/month           | $30/month credits        | Free allowance        |
| Best for            | AI tools using Morph models | Standalone AI sandbox    | GPU-heavy ML pipelines   | Custom infrastructure |

Morph Sandbox SDK

Built for AI agent workflows. Sandboxes persist filesystem state across executions within a session, so an agent can write files, install packages, and iterate without re-creating state. Included with Morph API plans, so teams already using Morph for LLM inference pay nothing extra for sandboxing. Python and TypeScript SDKs with WebSocket streaming.

E2B

The most-established standalone sandbox API for AI tools. Clean SDK design, good documentation, active open-source community. Sub-500ms cold starts. The main tradeoff is that it is a separate service with separate billing. If you already use another LLM provider, E2B adds another vendor and another cost line.

Modal

Designed for ML workloads, not specifically for AI agent sandboxing. The strength is GPU support: you can spin up A100 or H100 instances on demand. The Python-first SDK uses decorators and generators instead of explicit sandbox lifecycle management. Good for data science and training pipelines. Overkill if you just need to run pytest in a container.

Fly.io

General-purpose compute platform with Machines API for on-demand containers. Not sandbox-specific, so you build isolation and lifecycle management yourself. The advantage is flexibility: full control over networking, volumes, regions, and scaling. The disadvantage is that you are building sandbox infrastructure instead of buying it.

Morph Sandbox SDK: Code Examples

The Morph Sandbox SDK is designed for the most common AI agent pattern: create a sandbox, write files, execute code, read results, iterate, destroy. Here are concrete examples.

Basic: Run untrusted Python code

import { MorphSandbox } from "@anthropic-ai/morph-sandbox";

const sandbox = await MorphSandbox.create({
  apiKey: process.env.MORPH_API_KEY,
  template: "python-3.12",
});

const result = await sandbox.exec(`python3 -c "
import json
data = {'status': 'ok', 'values': [1, 2, 3]}
print(json.dumps(data))
"`);

console.log(result.stdout);  // {"status": "ok", "values": [1, 2, 3]}
console.log(result.exitCode); // 0

await sandbox.destroy();

Multi-step: Agent writes, tests, and iterates

const sandbox = await MorphSandbox.create({
  apiKey: process.env.MORPH_API_KEY,
  template: "node-20",
  timeout: 600,
});

// Agent writes code
await sandbox.filesystem.write("/app/index.ts", agentCode);
await sandbox.filesystem.write("/app/index.test.ts", agentTests);

// Install dependencies (filesystem persists between calls)
await sandbox.exec("cd /app && npm install");

// Run tests
let result = await sandbox.exec("cd /app && npx vitest run");

// If tests fail, the agent fixes the latest code and re-runs (up to 3 attempts)
let retries = 0;
let currentCode = agentCode;
while (result.exitCode !== 0 && retries < 3) {
  currentCode = await llm.fixCode(currentCode, result.stderr);
  await sandbox.filesystem.write("/app/index.ts", currentCode);
  result = await sandbox.exec("cd /app && npx vitest run");
  retries++;
}

// Pull generated artifacts
const coverage = await sandbox.filesystem.read("/app/coverage/lcov.info");

await sandbox.destroy();

Streaming: Watch output in real time

const sandbox = await MorphSandbox.create({
  apiKey: process.env.MORPH_API_KEY,
  template: "python-3.12",
});

await sandbox.filesystem.write("/app/build.py", buildScript);

// Stream output as it happens
const stream = sandbox.stream("cd /app && python build.py");

for await (const event of stream) {
  if (event.type === "stdout") {
    process.stdout.write(event.data);
  }
  if (event.type === "stderr" && event.data.includes("ERROR")) {
    stream.kill();
    break;
  }
}

await sandbox.destroy();

Why Session-Scoped Persistence Matters

AI agents rarely execute code in a single step. A typical agent loop is: write code, install dependencies, run tests, read errors, fix code, re-run tests. Each step depends on the filesystem state from the previous step. Ephemeral sandboxes that reset between calls force the agent to reinstall dependencies and rewrite files every iteration, wasting both tokens and time.

Pricing Comparison

Sandbox costs scale with usage. Here is what each provider charges as of April 2026.

| Provider      | Billing Unit       | Price                        | Free Tier             |
|---------------|--------------------|------------------------------|-----------------------|
| Morph Sandbox | Included with API  | Bundled with Morph plans     | Yes (Morph free tier) |
| E2B           | Per sandbox-second | $0.000056/s (~$0.20/hr)      | 100 hrs/month         |
| Modal         | Per CPU-second     | $0.000064/s (~$0.23/hr CPU)  | $30/month credits     |
| Fly.io        | Per Machine-second | From $0.0000025/s (shared)   | Free allowance        |

For teams already using Morph for LLM inference, the sandbox is free. For teams using other LLM providers, E2B is the most straightforward standalone option. Modal makes sense if you need GPUs. Fly.io is cheapest at raw compute level but requires more integration work.

Cost at Scale

A typical AI coding agent creates 10-50 sandbox sessions per user per day, with each session running 5-20 executions over 2-10 minutes. At 1,000 daily active users and an average sandbox lifetime of 5 minutes:

That usage works out to 1,000 users x 30 sessions x 5 minutes, or about 2,500 sandbox-hours per day (~75,000 hours/month):

- ~$15,000/mo on E2B (30 sessions/user/day, 5 min avg, at ~$0.20/hr)
- ~$17,250/mo on Modal (same usage, CPU only, at ~$0.23/hr)
- $0 on Morph (included with API plan)
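As a sanity check, the monthly bill follows directly from the stated usage assumptions and the listed hourly rates:

```typescript
// 1,000 DAU x 30 sessions/user/day x 5 minutes/session, over a 30-day month.
const users = 1_000;
const sessionsPerUserPerDay = 30;
const minutesPerSession = 5;

const sandboxHoursPerMonth =
  (users * sessionsPerUserPerDay * minutesPerSession * 30) / 60; // 75,000 hours

const e2bMonthly = sandboxHoursPerMonth * 0.2;    // at ~$0.20/hr
const modalMonthly = sandboxHoursPerMonth * 0.23; // at ~$0.23/hr

console.log({ sandboxHoursPerMonth, e2bMonthly, modalMonthly });
```

Lighter usage scales the bill linearly: halve the sessions per user and the standalone cost halves with it.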

Sandbox cost is typically 5-15% of total LLM API spend. The cost is real but not the dominant expense. Pick your sandbox provider based on integration quality and feature fit, not price alone.

Frequently Asked Questions

What is a sandbox API?

A sandbox API provides an isolated code execution environment accessible over HTTP or SDK. It runs untrusted code without risking the host system. The sandbox handles process isolation, filesystem containment, network restrictions, and resource limits. AI coding tools use sandbox APIs to run tests, evaluate code, install dependencies, and execute shell commands safely.

Why do AI agents need a sandbox API?

LLM-generated code is untrusted code. It can contain bugs, infinite loops, unintended filesystem operations, or security vulnerabilities. Running it directly on a production server risks data loss, resource exhaustion, and security breaches. A sandbox provides containment: code runs in isolation with controlled resources. If something breaks, the sandbox is destroyed with no impact on the host.

What is the difference between E2B and Morph Sandbox?

E2B is a standalone sandbox API with its own billing. Morph Sandbox is bundled with the Morph API platform. Both provide session-scoped filesystem persistence and SDKs for Python and TypeScript. Morph is the better choice if you already use Morph for LLM inference (zero additional cost). E2B is the better choice if you use a different LLM provider and want a dedicated, well-documented sandbox service.

How much does a sandbox API cost?

E2B charges ~$0.20/hour per sandbox. Modal charges ~$0.23/hour for CPU. Fly.io starts at ~$0.01/hour for shared instances. Morph Sandbox is included with Morph API plans. Standalone costs scale linearly with sandbox uptime: at E2B's rate, every 1,000 sandbox-hours runs about $200. Sandbox cost is typically 5-15% of LLM API spend.

Can I use a sandbox API for production AI applications?

Yes. All four providers covered here support production workloads. Key requirements: sub-1s cold starts for interactive use, concurrency handling for parallel agent executions, and monitoring/logging for debugging. Test with your actual agent workflow before committing to a provider.

What languages do sandbox APIs support?

All major providers support Python, JavaScript/TypeScript, Go, Rust, Java, and Ruby out of the box. Most also support custom Docker images, so you can bring any runtime. The constraint is not language support but pre-built templates: having a template for your language means faster cold starts and fewer dependency issues.

REST API vs SDK vs WebSocket: which pattern should I use?

REST for one-shot execution in simple integrations. SDK for production AI tools (handles sessions, retries, streaming). WebSocket for real-time output streaming in agent loops. Most production tools use SDKs that wrap WebSocket connections internally. Start with the SDK and drop to WebSocket only if you need custom streaming behavior.

How do I evaluate sandbox API providers?

Test five things: cold start time under your expected load, filesystem persistence across execution steps, language/runtime support for your stack, pricing at your projected scale, and lines of code to integrate. Run your actual agent workflow against each provider. Demo performance does not always match production performance.

Start Building with Morph Sandbox SDK

Morph Sandbox gives AI agents safe, persistent code execution with sub-300ms cold starts. Included free with Morph API. Python and TypeScript SDKs with WebSocket streaming.