What Is a Sandbox API
A sandbox API is a service that runs code in an isolated environment and returns the result over HTTP, WebSocket, or SDK. The caller sends code, the sandbox executes it inside a container or microVM, and the caller gets stdout, stderr, exit code, and any files produced. The host system is never exposed.
This matters because AI agents generate code they need to run. A coding agent that writes a Python function needs to execute tests. An agent that generates a data pipeline needs to install dependencies and verify the output. A code review agent needs to run linting and type checking. None of this should happen on the machine serving your API.
Core Capabilities
A production sandbox API provides: process isolation (code cannot escape the sandbox), filesystem containment (reads/writes stay inside), resource limits (CPU, memory, time caps), network control (restrict or allow outbound calls), and artifact retrieval (pull files out of the sandbox after execution).
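These capabilities typically surface as fields on the execution request itself. A minimal sketch of what such a request shape might look like (the field names here are illustrative, not any specific provider's API):

```typescript
// Illustrative request shape; field names are hypothetical, not a real provider's API.
interface ExecuteRequest {
  language: string;        // which runtime to use inside the sandbox
  code: string;            // the untrusted code to run
  timeoutSeconds: number;  // resource limit: wall-clock cap on execution
  memoryMb?: number;       // resource limit: memory cap
  allowNetwork?: boolean;  // network control: block or permit outbound calls
}

const request: ExecuteRequest = {
  language: "python",
  code: "print('hello from the sandbox')",
  timeoutSeconds: 30,
  memoryMb: 512,
  allowNetwork: false, // safest default for untrusted code
};
```

Artifact retrieval is usually a separate call (a file read or download endpoint) rather than a request field, since artifacts are produced after execution.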
Who Needs a Sandbox API
Three categories of builders use sandbox APIs today. First, AI coding tool developers who need to run LLM-generated code safely. Second, education platforms that let students execute code in the browser. Third, teams building code evaluation pipelines for hiring, benchmarking, or automated testing. The AI agent use case is growing fastest because every new coding agent needs execution, and building your own sandbox from scratch is a multi-month infrastructure project.
Key API Design Patterns
Sandbox APIs follow three main integration patterns. The right choice depends on your latency requirements, how many execution steps your agent takes, and whether you need real-time output streaming.
REST API
Send code in a POST request, get results in the response. Simplest integration. Works for one-shot execution where you don't need streaming output. Typical round-trip: 200-500ms for short scripts, seconds for longer runs. Every request is stateless unless you manage session IDs yourself.
Native SDK
Language-specific client library that handles session management, file operations, and streaming internally. You write morph.sandbox.run(code) instead of constructing HTTP requests. SDKs manage connection pooling, retries, and sandbox lifecycle. This is the most common pattern for production AI tools.
WebSocket
Persistent connection that streams stdout/stderr in real time. Required for long-running processes, interactive REPL sessions, and agent loops that need to react to partial output. Higher integration complexity but essential when your agent watches output and decides the next step mid-execution.
REST: Simple but Limited
A REST sandbox API accepts a code payload and returns the execution result. This is the right pattern when your agent generates code, runs it, and reads the output in a single step. No session state, no streaming, no complexity.
REST: One-shot execution
curl -X POST https://api.sandbox.example/v1/execute \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"language": "python",
"code": "import sys; print(sys.version)",
"timeout": 30
}'
# Response:
# {
# "stdout": "3.12.3 (main, Apr 2026)\n",
# "stderr": "",
# "exit_code": 0,
# "duration_ms": 142
# }
SDK: The Production Default
SDKs abstract the transport layer and add session management. A sandbox session persists filesystem state between executions, so your agent can write a file in step 1, install dependencies in step 2, and run tests in step 3. The SDK handles sandbox creation, keep-alive, and cleanup.
SDK: Multi-step agent workflow (Morph Sandbox)
import { MorphSandbox } from "@anthropic-ai/morph-sandbox";
const sandbox = await MorphSandbox.create({
apiKey: process.env.MORPH_API_KEY,
template: "python-3.12",
timeout: 300, // 5 minute max lifetime
});
// Step 1: Write the code
await sandbox.filesystem.write(
"/app/main.py",
agentGeneratedCode
);
// Step 2: Install dependencies
const install = await sandbox.exec("pip install -r /app/requirements.txt");
if (install.exitCode !== 0) {
throw new Error(`Dependency install failed: ${install.stderr}`);
}
// Step 3: Run tests
const result = await sandbox.exec("cd /app && python -m pytest -v");
console.log(result.stdout);
// Tests pass or fail inside the sandbox. Host is unaffected.
await sandbox.destroy();
WebSocket: Real-Time Streaming
WebSocket connections let your agent receive output as it is produced, not after execution completes. This matters for long-running processes (training scripts, build commands) and for agents that make decisions based on partial output. If your agent sees a test failure in line 3, it can kill the sandbox and start fixing the code instead of waiting for all 200 tests to finish.
WebSocket: Streaming stdout
const ws = sandbox.stream("python /app/train.py");
ws.on("stdout", (chunk) => {
// React to output in real time
if (chunk.includes("loss: NaN")) {
ws.kill(); // Stop execution early
agent.fixTrainingCode();
}
});
ws.on("exit", ({ code }) => {
if (code === 0) agent.reportSuccess();
});
Evaluation Criteria
Five factors separate sandbox APIs that work in demos from those that work in production.
1. Cold Start Time
Cold start is the time from "create sandbox" to "sandbox is ready to execute code." For interactive AI tools, anything over 1 second breaks the user experience. For background pipelines, 5-10 seconds is acceptable. The best providers achieve sub-300ms cold starts using pre-warmed pools of microVMs or containers.
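One way providers hit sub-second numbers is a pre-warmed pool: sandboxes are created ahead of demand, so acquiring one is a queue pop instead of a cold boot. A rough sketch of the idea; the `Sandbox` type and `createSandbox` stub are placeholders, not a real SDK:

```typescript
// Sketch of a pre-warmed sandbox pool. createSandbox is a stand-in
// for a real (slow) provisioning call that might take 300ms-10s.
type Sandbox = { id: number };
let nextId = 0;
async function createSandbox(): Promise<Sandbox> {
  return { id: nextId++ };
}

class WarmPool {
  private pool: Sandbox[] = [];
  constructor(private size: number) {}

  // Provision sandboxes ahead of demand.
  async fill(): Promise<void> {
    while (this.pool.length < this.size) {
      this.pool.push(await createSandbox());
    }
  }

  // Hand out a warm sandbox instantly; cold-boot only if the pool is empty.
  async acquire(): Promise<Sandbox> {
    const warm = this.pool.pop();
    void this.fill(); // replenish in the background
    return warm ?? createSandbox();
  }
}
```

The tradeoff is that warm sandboxes cost money while idle, which is why pre-warming is usually the provider's job rather than yours.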
2. Filesystem Persistence
Stateless sandboxes destroy everything after each execution. Stateful sandboxes preserve the filesystem between calls. AI agents almost always need state: they write code, install packages, run tests, fix failures, and run tests again. If each step starts from a blank filesystem, the agent wastes tokens and time re-creating context. Look for sandboxes that persist state for at least the duration of an agent session (5-30 minutes).
3. Multi-Language Support
Your users write in many languages. A sandbox that only supports Python is not enough if your agent handles TypeScript, Go, or Rust projects. The best providers ship pre-built templates for common runtimes and let you bring your own Docker image for anything else.
4. Pricing Model
Sandbox pricing comes in three models: per-second (you pay for sandbox uptime), per-execution (you pay per code run), and bundled (included with a broader platform). Per-second pricing is most common and aligns well with agent workflows where sandbox lifetime varies. Watch for minimum billing increments: a provider billing in 1-minute increments will be expensive if your average execution takes 3 seconds.
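The increment effect is easy to quantify. A quick sketch using a ~$0.000056/s rate (roughly $0.20/hr); the increment values are hypothetical:

```typescript
// Cost of one execution under a given minimum billing increment.
// Rates here are illustrative; check each provider's pricing page.
function billedCost(
  execSeconds: number,
  pricePerSecond: number,
  minIncrementSeconds: number,
): number {
  // Bill the execution time rounded up to the nearest increment.
  const billedSeconds =
    Math.ceil(execSeconds / minIncrementSeconds) * minIncrementSeconds;
  return billedSeconds * pricePerSecond;
}

const rate = 0.000056; // ~$0.20/hr expressed per second
console.log(billedCost(3, rate, 1));  // true per-second billing: ~$0.000168
console.log(billedCost(3, rate, 60)); // 1-minute increments: ~$0.00336, 20x more
```

Per execution the absolute numbers are tiny, but at thousands of short executions per day the 20x multiplier is the difference between a rounding error and a real line item.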
5. Integration Complexity
Count the lines of code from "npm install" to "running untrusted code safely." Some providers require custom Docker images, infrastructure configuration, and manual networking setup. Others give you a single SDK call. For most AI tool builders, the simpler path wins because sandbox infrastructure is not your product.
| Criterion | Must Have | Nice to Have |
|---|---|---|
| Cold start | < 1s for interactive | < 300ms with pre-warming |
| Persistence | Filesystem survives between exec calls | Snapshot/restore for long-lived sessions |
| Languages | Python, JS/TS, Go, Rust | Custom Docker image support |
| Pricing | Per-second billing, no minimums | Free tier for development |
| Integration | SDK with < 20 lines to first execution | WebSocket streaming, file upload/download |
Provider Comparison
Four providers cover the majority of the sandbox API market in 2026. Each makes different tradeoffs.
| Feature | Morph Sandbox | E2B | Modal | Fly.io |
|---|---|---|---|---|
| Primary use case | AI agent code execution | AI agent code execution | ML/data workloads | General compute |
| Cold start | < 300ms | < 500ms | < 1s (CPU), 30-60s (GPU) | ~300ms (Machines) |
| Persistence | Session-scoped filesystem | Session-scoped filesystem | Volumes (persistent) | Volumes (persistent) |
| SDK languages | Python, TypeScript | Python, TypeScript | Python | REST API (any language) |
| Streaming output | WebSocket + SDK | WebSocket + SDK | Generator-based | Logs API |
| Custom environments | Templates + Docker | Templates + Docker | Docker images | Docker images |
| GPU support | No | No | Yes (A100, H100) | Yes (L40S, A100) |
| Billing model | Included with Morph API | Per sandbox-second | Per CPU/GPU-second | Per Machine-second |
| Free tier | Yes (with Morph API) | 100 hours/month | $30/month credits | Free allowance |
| Best for | AI tools using Morph models | Standalone AI sandbox | GPU-heavy ML pipelines | Custom infrastructure |
Morph Sandbox SDK
Built for AI agent workflows. Sandboxes persist filesystem state across executions within a session, so an agent can write files, install packages, and iterate without re-creating state. Included with Morph API plans, so teams already using Morph for LLM inference pay nothing extra for sandboxing. Python and TypeScript SDKs with WebSocket streaming.
E2B
The most-established standalone sandbox API for AI tools. Clean SDK design, good documentation, active open-source community. Sub-500ms cold starts. The main tradeoff is that it is a separate service with separate billing. If you already use another LLM provider, E2B adds another vendor and another cost line.
Modal
Designed for ML workloads, not specifically for AI agent sandboxing. The strength is GPU support: you can spin up A100 or H100 instances on demand. The Python-first SDK uses decorators and generators instead of explicit sandbox lifecycle management. Good for data science and training pipelines. Overkill if you just need to run pytest in a container.
Fly.io
General-purpose compute platform with Machines API for on-demand containers. Not sandbox-specific, so you build isolation and lifecycle management yourself. The advantage is flexibility: full control over networking, volumes, regions, and scaling. The disadvantage is that you are building sandbox infrastructure instead of buying it.
Morph Sandbox SDK: Code Examples
The Morph Sandbox SDK is designed for the most common AI agent pattern: create a sandbox, write files, execute code, read results, iterate, destroy. Here are concrete examples.
Basic: Run untrusted Python code
import { MorphSandbox } from "@anthropic-ai/morph-sandbox";
const sandbox = await MorphSandbox.create({
apiKey: process.env.MORPH_API_KEY,
template: "python-3.12",
});
const result = await sandbox.exec(`python3 -c "
import json
data = {'status': 'ok', 'values': [1, 2, 3]}
print(json.dumps(data))
"`);
console.log(result.stdout); // {"status": "ok", "values": [1, 2, 3]}
console.log(result.exitCode); // 0
await sandbox.destroy();
Multi-step: Agent writes, tests, and iterates
const sandbox = await MorphSandbox.create({
apiKey: process.env.MORPH_API_KEY,
template: "node-20",
timeout: 600,
});
// Agent writes code
await sandbox.filesystem.write("/app/index.ts", agentCode);
await sandbox.filesystem.write("/app/index.test.ts", agentTests);
// Install dependencies (filesystem persists between calls)
await sandbox.exec("cd /app && npm install");
// Run tests
let result = await sandbox.exec("cd /app && npx vitest run");
// If tests fail, agent fixes and re-runs
let retries = 0;
while (result.exitCode !== 0 && retries < 3) {
const fixedCode = await llm.fixCode(agentCode, result.stderr);
await sandbox.filesystem.write("/app/index.ts", fixedCode);
result = await sandbox.exec("cd /app && npx vitest run");
retries++;
}
// Pull generated artifacts
const coverage = await sandbox.filesystem.read("/app/coverage/lcov.info");
await sandbox.destroy();
Streaming: Watch output in real time
const sandbox = await MorphSandbox.create({
apiKey: process.env.MORPH_API_KEY,
template: "python-3.12",
});
await sandbox.filesystem.write("/app/build.py", buildScript);
// Stream output as it happens
const stream = sandbox.stream("cd /app && python build.py");
for await (const event of stream) {
if (event.type === "stdout") {
process.stdout.write(event.data);
}
if (event.type === "stderr" && event.data.includes("ERROR")) {
stream.kill();
break;
}
}
await sandbox.destroy();
Why Session-Scoped Persistence Matters
AI agents rarely execute code in a single step. A typical agent loop is: write code, install dependencies, run tests, read errors, fix code, re-run tests. Each step depends on the filesystem state from the previous step. Ephemeral sandboxes that reset between calls force the agent to reinstall dependencies and rewrite files every iteration, wasting both tokens and time.
Pricing Comparison
Sandbox costs scale with usage. Here is what each provider charges as of April 2026.
| Provider | Billing Unit | Price | Free Tier |
|---|---|---|---|
| Morph Sandbox | Included with API | Bundled with Morph plans | Yes (Morph free tier) |
| E2B | Per sandbox-second | $0.000056/s (~$0.20/hr) | 100 hrs/month |
| Modal | Per CPU-second | $0.000064/s (~$0.23/hr CPU) | $30/month credits |
| Fly.io | Per Machine-second | From $0.0000025/s (shared) | Free allowance |
For teams already using Morph for LLM inference, the sandbox is free. For teams using other LLM providers, E2B is the most straightforward standalone option. Modal makes sense if you need GPUs. Fly.io is cheapest at raw compute level but requires more integration work.
Cost at Scale
A typical AI coding agent creates 10-50 sandbox sessions per user per day, with each session running 5-20 executions over 2-10 minutes. At 1,000 daily active users and an average sandbox lifetime of 5 minutes, sandbox cost is typically 5-15% of total LLM API spend. The cost is real but not the dominant expense. Pick your sandbox provider based on integration quality and feature fit, not price alone.
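To project your own spend, the arithmetic reduces to a one-line model. The parameter values below are assumptions to replace with your own workload numbers, not benchmarks:

```typescript
// Rough monthly cost model for per-second or per-hour billed sandboxes.
// All inputs are assumptions; substitute your own workload figures.
function monthlySandboxCost(
  dailyUsers: number,
  sessionsPerUserPerDay: number,
  avgSessionMinutes: number,
  pricePerHour: number,
  daysPerMonth = 30,
): number {
  const sandboxHoursPerDay =
    (dailyUsers * sessionsPerUserPerDay * avgSessionMinutes) / 60;
  return sandboxHoursPerDay * pricePerHour * daysPerMonth;
}

// Example: 1,000 users, 10 sessions/user/day, 5-minute sessions, ~$0.20/hr
console.log(monthlySandboxCost(1000, 10, 5, 0.2)); // ≈ $5,000/month
```

Note the sensitivity to sandbox lifetime: destroying sandboxes promptly instead of letting them idle to their timeout directly reduces the bill.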
Frequently Asked Questions
What is a sandbox API?
A sandbox API provides an isolated code execution environment accessible over HTTP or SDK. It runs untrusted code without risking the host system. The sandbox handles process isolation, filesystem containment, network restrictions, and resource limits. AI coding tools use sandbox APIs to run tests, evaluate code, install dependencies, and execute shell commands safely.
Why do AI agents need a sandbox API?
LLM-generated code is untrusted code. It can contain bugs, infinite loops, unintended filesystem operations, or security vulnerabilities. Running it directly on a production server risks data loss, resource exhaustion, and security breaches. A sandbox provides containment: code runs in isolation with controlled resources. If something breaks, the sandbox is destroyed with no impact on the host.
What is the difference between E2B and Morph Sandbox?
E2B is a standalone sandbox API with its own billing. Morph Sandbox is bundled with the Morph API platform. Both provide session-scoped filesystem persistence and SDKs for Python and TypeScript. Morph is the better choice if you already use Morph for LLM inference (zero additional cost). E2B is the better choice if you use a different LLM provider and want a dedicated, well-documented sandbox service.
How much does a sandbox API cost?
E2B charges ~$0.20/hour per sandbox. Modal charges ~$0.23/hour for CPU. Fly.io starts at ~$0.01/hour for shared instances. Morph Sandbox is included with Morph API plans. At production scale with 1,000 daily users, expect $200-400/month for standalone providers. Sandbox cost is typically 5-15% of LLM API spend.
Can I use a sandbox API for production AI applications?
Yes. All four providers covered here support production workloads. Key requirements: sub-1s cold starts for interactive use, concurrency handling for parallel agent executions, and monitoring/logging for debugging. Test with your actual agent workflow before committing to a provider.
What languages do sandbox APIs support?
All major providers support Python, JavaScript/TypeScript, Go, Rust, Java, and Ruby out of the box. Most also support custom Docker images, so you can bring any runtime. The constraint is not language support but pre-built templates: having a template for your language means faster cold starts and fewer dependency issues.
REST API vs SDK vs WebSocket: which pattern should I use?
REST for one-shot execution in simple integrations. SDK for production AI tools (handles sessions, retries, streaming). WebSocket for real-time output streaming in agent loops. Most production tools use SDKs that wrap WebSocket connections internally. Start with the SDK and drop to WebSocket only if you need custom streaming behavior.
How do I evaluate sandbox API providers?
Test five things: cold start time under your expected load, filesystem persistence across execution steps, language/runtime support for your stack, pricing at your projected scale, and lines of code to integrate. Run your actual agent workflow against each provider. Demo performance does not always match production performance.
Start Building with Morph Sandbox SDK
Morph Sandbox gives AI agents safe, persistent code execution with sub-300ms cold starts. Included free with Morph API. Python and TypeScript SDKs with WebSocket streaming.