What Is Modal Sandbox
Modal is a serverless compute platform built for Python workloads. It handles inference, training, batch processing, and scheduled jobs. In 2024, Modal added Sandboxes: isolated containers for running untrusted code from LLMs, users, or third-party sources.
The sandbox is a container that runs on gVisor, executes arbitrary commands, and returns stdout, stderr, and exit codes. You create it from the Modal Python SDK, run commands inside it, and tear it down when finished. Files written inside the sandbox stay within that container unless you mount a Modal Volume for persistence.
For teams already on Modal, adding sandboxed code execution takes a few lines of Python. For teams evaluating sandbox providers from scratch, the question is whether Modal's general-purpose compute platform is the right tool for a specific job: safe, fast, AI agent code execution.
Modal's Position in the Sandbox Market
Modal is one of four platforms commonly used for AI agent code execution, alongside E2B, Morph, and Fly.io. Modal is the only one that also handles GPU workloads natively. But it is also the only one not purpose-built for sandboxing, which shows up in cold start times, SDK coverage, and state management ergonomics.
How gVisor Isolation Works
Modal sandboxes run on gVisor, an open-source container runtime from Google. gVisor inserts a user-space kernel (called Sentry) between the containerized application and the host kernel. When code inside the sandbox makes a system call, Sentry intercepts it, validates it, and either handles it in user space or proxies a safe subset to the host.
This is different from how other sandbox providers handle isolation. E2B uses Firecracker microVMs, which boot an entire guest kernel on top of KVM hardware virtualization. Each E2B sandbox is a separate virtual machine. Standard Docker containers share the host kernel directly, providing namespace-level isolation but no syscall filtering.
gVisor (Modal)
User-space kernel intercepts system calls. Stronger than container namespaces, lighter than full VMs. The syscall interception layer adds some overhead per call. Some Linux syscalls are unsupported, which can break certain native libraries.
Firecracker microVM (E2B)
Each sandbox gets its own guest kernel running on KVM. Hardware-level isolation boundary. Untrusted code cannot reach the host kernel even if it escapes the container. Higher resource overhead per sandbox, but the strongest isolation guarantee available.
Container Namespace (Docker)
Shares the host kernel. Uses cgroups and namespaces for resource and visibility isolation. No syscall filtering by default. A kernel exploit in the container can compromise the host. Not recommended for running LLM-generated code in production.
gVisor Tradeoffs
gVisor provides a practical middle ground. It blocks most kernel exploits because untrusted code never touches the real kernel. But syscall interception adds latency to I/O-heavy workloads. Some Linux syscalls are not implemented in gVisor's Sentry, which means certain C extensions and native libraries fail with obscure errors. If your agent generates code that uses ptrace, raw sockets, or certain ioctl calls, it will not work inside a gVisor sandbox.
For most AI agent workloads (running Python scripts, executing test suites, installing pip packages), gVisor works fine. The edge cases appear when agents interact with low-level system interfaces or GPU drivers.
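Because these incompatibilities usually surface only at import or call time, one cheap guard is to run a smoke check inside the sandbox (e.g. via `sandbox.exec`) before trusting agent code that depends on native extensions. A minimal sketch using only the standard library; the function name and module list are illustrative, not part of Modal's API:

```python
import importlib.util

def check_modules(names):
    """Report which modules can be resolved in this environment.

    Meant to run inside the sandbox as a quick preflight: native
    libraries that rely on unsupported syscalls are the usual gVisor
    casualties, and catching a missing or broken import here beats
    debugging an obscure failure mid-task.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    # "json" ships with CPython; the second name is deliberately bogus.
    print(check_modules(["json", "no_such_native_lib_xyz"]))
```

Note this only confirms a module resolves; a library can import cleanly and still hit an unsupported syscall at runtime, so deeper checks (actually calling the hot path) are needed for ptrace- or ioctl-dependent code.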
Modal Sandbox SDK
Modal's sandbox API is accessed primarily through the Python SDK. JavaScript/TypeScript and Go SDKs are now available in beta, though Modal Functions must still be defined in Python. Sandbox orchestration (creating containers, running commands, reading output) can happen from JS/TS/Go via the newer SDKs, but these lack the maturity and full feature parity of the Python SDK.
Basic: Create and run code in a Modal Sandbox
```python
import modal

app = modal.App("sandbox-example")

@app.local_entrypoint()
def main():
    sandbox = modal.Sandbox.create(
        "bash", "-c", "echo hello from sandbox",
        app=app,
        timeout=300,  # 5 minute max
    )
    sandbox.wait()
    # Read output
    print(sandbox.stdout.read())
    # => "hello from sandbox\n"
```

Multi-step: Run agent-generated Python with dependencies
```python
import modal

app = modal.App("agent-sandbox")
image = modal.Image.debian_slim().pip_install("pandas", "pytest")

@app.local_entrypoint()
def run_agent_code():
    sandbox = modal.Sandbox.create(
        app=app,
        image=image,
        timeout=600,
    )
    # Write agent-generated code into the sandbox
    proc = sandbox.exec(
        "python", "-c",
        """
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
print(df.describe().to_string())
""",
    )
    proc.wait()
    print(proc.stdout.read())

    # Run a second command in the same sandbox;
    # filesystem state persists within the session
    proc2 = sandbox.exec("ls", "/tmp")
    proc2.wait()
    print(proc2.stdout.read())

    sandbox.terminate()
```

With Volume: Persist files across sandbox sessions
```python
import modal

app = modal.App("sandbox-with-volume")
vol = modal.Volume.from_name("sandbox-data", create_if_missing=True)

@app.local_entrypoint()
def main():
    sandbox = modal.Sandbox.create(
        app=app,
        volumes={"/data": vol},
        timeout=300,
    )
    # Write results to the volume
    proc = sandbox.exec(
        "bash", "-c",
        "echo 'test results: all passed' > /data/results.txt",
    )
    proc.wait()
    # Volume data persists after the sandbox terminates
    sandbox.terminate()

    # Read from the volume in a new sandbox later
    sandbox2 = modal.Sandbox.create(
        app=app,
        volumes={"/data": vol},
    )
    proc = sandbox2.exec("cat", "/data/results.txt")
    proc.wait()
    print(proc.stdout.read())
    # => "test results: all passed"
    sandbox2.terminate()
```

SDK Maturity Gap
Modal's Python SDK is the most complete, but JavaScript/TypeScript and Go SDKs are now in beta. If your AI agent runs in a Node.js or TypeScript environment, you can use the beta JS/TS SDK for sandbox orchestration. However, Modal Functions must still be defined in Python, and the beta SDKs may not yet support all features available in the Python SDK. Check Modal's docs for current beta SDK coverage before committing.
Cold Starts and Performance
Cold start is the time between requesting a sandbox and having it ready to execute code. For interactive AI agents, anything over 1 second breaks the feel of a responsive tool. For background pipelines, 5-10 seconds is acceptable.
Modal advertises sub-second cold starts for CPU containers. In practice, this depends on the image size, whether the platform has pre-warmed capacity, and current load. Modal aggressively spins down idle containers to save costs, which means cold starts happen more frequently than on platforms that maintain warm pools.
| Provider | Typical Cold Start | Warm Start | Notes |
|---|---|---|---|
| Morph Sandbox | < 300ms | < 50ms | Pre-warmed pool, purpose-built |
| E2B | < 500ms | < 100ms | Firecracker microVM, template caching |
| Modal | < 1s (claimed), 2-5s (observed) | < 200ms | Aggressive idle shutdown increases cold starts |
| Fly.io | ~300ms (Machines) | < 100ms | Depends on region and image size |
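Since the numbers above vary with image size, region, and load, it is worth measuring cold starts against your own images rather than relying on vendor claims. A minimal timing helper; the Modal call mentioned in the docstring follows the `Sandbox.create` signature shown elsewhere in this article:

```python
import time

def time_call(fn):
    """Return (result, elapsed_seconds) for a zero-arg callable.

    To measure a cold start, pass a closure such as
    lambda: modal.Sandbox.create("true", app=app) and treat the
    elapsed time as creation latency. Sample it repeatedly and look
    at the distribution, not a single run: warm and cold starts can
    differ by an order of magnitude.
    """
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```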
Why Cold Starts Matter More for Sandboxes
A sandbox is created for a specific agent task and destroyed when done. Unlike a long-running web server that starts once and handles thousands of requests, a sandbox has a short lifecycle: typically 1-15 minutes. If the platform takes 3 seconds to start each sandbox and your agent creates 30 sandboxes per user per day, that is 90 seconds of dead time per user, per day. Multiply by 1,000 users and that is 90,000 seconds, or 25 hours, of aggregate dead time every day.
Purpose-built sandbox providers optimize for this pattern. They maintain pre-warmed pools of ready containers, so "create sandbox" returns in under 300ms. Modal optimizes for a different pattern: long-running ML workloads where a 2-second cold start is irrelevant compared to a 10-minute training run.
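The warm-pool pattern can also be approximated on top of any provider by creating sandboxes ahead of demand and handing them out from a queue. A sketch with a generic `factory` callable; the pool class is illustrative, not part of any SDK:

```python
import queue
import threading

class SandboxPool:
    """Keep `size` pre-created sandboxes ready so acquire() is near-instant.

    `factory` is any zero-arg callable returning a ready sandbox,
    e.g. lambda: modal.Sandbox.create(app=app). A background thread
    refills the pool after each acquisition, so callers pay the cold
    start cost only when demand outruns the pool.
    """

    def __init__(self, factory, size=4):
        self._factory = factory
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        sandbox = self._pool.get()  # near-instant while the pool is warm
        # Refill in the background so the caller never waits on creation.
        threading.Thread(target=lambda: self._pool.put(self._factory())).start()
        return sandbox
```

The tradeoff is the same one the platforms face: idle pooled sandboxes bill per second whether or not they run anything, so pool size is a latency-versus-cost dial.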
Modal vs Purpose-Built Sandboxes
The core question is whether your sandbox needs are best served by a general compute platform that includes sandboxing, or by a platform built specifically for sandboxed code execution.
| Feature | Modal | Morph Sandbox | E2B |
|---|---|---|---|
| Built for | ML infrastructure + sandboxing | AI agent code execution | AI agent code execution |
| Isolation | gVisor (syscall interception) | Container isolation | Firecracker microVM |
| SDK languages | Python (primary), JS/TS + Go (beta) | Python + TypeScript | Python + TypeScript |
| Cold start | 1-5s typical | < 300ms | < 500ms |
| Filesystem persistence | Requires Volume mount | Session-scoped (automatic) | Session-scoped (automatic) |
| Streaming output | Generator-based (Python) | WebSocket + SDK | WebSocket + SDK |
| GPU support | Yes (A100, H100) | No | No |
| Billing | Per core-second + memory | Included with Morph API | Per sandbox-second |
| Custom images | modal.Image builder | Templates + Docker | Templates + Docker |
| Max concurrency | 20,000 containers | Plan-dependent | 20-100 (plan-dependent) |
Where Modal Wins
GPU access. If your agent generates CUDA code, fine-tunes models, or runs inference as part of its execution loop, Modal is the only sandbox platform that handles this without you provisioning separate GPU infrastructure. The ability to spin up an A100 sandbox on demand and tear it down after one execution is genuinely useful for ML-heavy agent workflows.
High concurrency. Modal scales to 20,000 concurrent containers. If you are running thousands of sandboxes simultaneously for a code evaluation pipeline or benchmark suite, Modal handles the orchestration. E2B caps at 20-100 concurrent sandboxes depending on plan. Morph scales with your API plan.
Where Modal Falls Short
Cold starts. The 2-5 second range observed under load is workable for batch processing but uncomfortable for interactive agent tools. Users notice.
SDK maturity. Modal's Python SDK is fully featured, but the JavaScript/TypeScript and Go SDKs are still in beta with incomplete feature parity. Modal Functions must still be defined in Python. E2B and Morph both ship production-grade TypeScript SDKs that work natively in Node.js environments without caveats.
State management. Modal sandboxes do not persist filesystem state by default. You need to explicitly mount a Volume to keep files between sandbox sessions. E2B and Morph give you session-scoped persistence automatically. For agent workflows that write files, install packages, and iterate, this is the difference between "it just works" and "configure infrastructure first."
Pricing Breakdown
Modal bills per CPU-core-second and per GiB-second of memory. There is no per-sandbox fee. You pay for the compute resources your sandbox uses while it is running.
| Resource | Price | Minimum | Notes |
|---|---|---|---|
| CPU | $0.0000394/core/second | 0.125 cores | 1 physical core = 2 vCPUs |
| Memory | $0.00000672/GiB/second | 128 MiB | Scales with CPU allocation |
| GPU (A100 40GB) | $0.001036/second | 1 GPU | ~$3.73/hour |
| Regional multiplier | 1.25x (US/EU) | N/A | Up to 2.5x for other regions |
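The per-session figures discussed below can be reproduced from the table rates. A small calculator; applying the regional multiplier as a flat factor on both line items is an assumption about how Modal bills, not a documented formula:

```python
CPU_PER_CORE_SECOND = 0.0000394   # USD, from the pricing table above
MEM_PER_GIB_SECOND = 0.00000672   # USD, from the pricing table above

def session_cost(cores, gib, seconds, multiplier=1.0):
    """Estimated cost of one sandbox session at the listed rates."""
    base = (cores * CPU_PER_CORE_SECOND + gib * MEM_PER_GIB_SECOND) * seconds
    return base * multiplier

# 1 core, 1 GiB, 5 minutes:
print(round(session_cost(1, 1, 300), 4))        # => 0.0138 at base rates
print(round(session_cost(1, 1, 300, 1.25), 4))  # => 0.0173 with the 1.25x factor
```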
Cost at Agent Scale
A typical AI agent sandbox session uses 1 CPU core and 1 GiB of memory for 5 minutes. On Modal, that costs approximately $0.014 per session (with the 1.25x US multiplier). At 1,000 daily active users running 30 sandbox sessions each, that is roughly $420 per day, or about $12,600 per month, in sandbox compute alone.
Modal's per-core-second pricing is competitive for GPU workloads where no other sandbox provider operates. For CPU-only agent sandboxes, it is more expensive than E2B and significantly more expensive than Morph (which bundles sandbox with LLM inference). The $30/month free tier covers roughly 2,100 sandbox sessions at the usage pattern above, enough for development and testing.
When to Use Modal Sandbox
Modal sandbox is the right choice in specific scenarios. It is not the default recommendation for AI agent code execution.
Use Modal When
Your agent needs GPU compute inside sandboxes. You already use Modal for inference or training and want one platform. You need 1,000+ concurrent sandboxes. Your agent framework is Python-native. Cold start latency of 1-5 seconds is acceptable for your use case.
Use a Dedicated Sandbox When
You need sub-500ms cold starts for interactive tools. Your agent framework runs in TypeScript/Node.js. You want session-scoped filesystem persistence without configuring volumes. You need WebSocket streaming for real-time output. Sandbox cost needs to be bundled with your LLM provider.
The GPU Exception
If your agent writes code that requires a GPU to run, Modal is the practical choice. No other sandbox provider gives you on-demand A100 or H100 access inside a sandboxed container. This matters for agents that generate ML training scripts, CUDA kernels, or inference code. For CPU-only agent sandboxes, which cover 90%+ of coding agent use cases, purpose-built providers are faster and cheaper.
The Migration Path
Teams that start with Modal sandboxes because they already use Modal for other workloads sometimes migrate to a dedicated sandbox provider as their agent product scales. The motivation is usually cold start latency: what was fine during development becomes a user-facing performance issue at scale. The migration is straightforward because sandbox APIs have similar semantics: create, exec, read output, destroy.
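Because the semantics line up, putting a thin interface between your agent and the provider SDK turns that migration into a one-file change. A sketch with hypothetical adapter names; neither class reflects a real SDK surface beyond the Modal calls shown earlier in this article:

```python
from abc import ABC, abstractmethod

class SandboxRunner(ABC):
    """Provider-agnostic surface the agent codes against."""

    @abstractmethod
    def run(self, *command) -> str:
        """Execute a command in the sandbox and return its stdout."""

    @abstractmethod
    def close(self) -> None:
        """Tear down the underlying sandbox."""

class EchoRunner(SandboxRunner):
    """Stand-in used for local tests. A ModalRunner would wrap
    modal.Sandbox.create / exec / terminate behind the same two
    methods; swapping providers then means swapping one class."""

    def run(self, *command) -> str:
        return " ".join(command)

    def close(self) -> None:
        pass
```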
Frequently Asked Questions
What is Modal Sandbox?
Modal Sandbox is a feature of the Modal serverless compute platform that runs untrusted code in gVisor-isolated containers. It supports arbitrary commands, configurable timeouts, volume mounts for persistence, and custom images. The sandbox is managed through Modal's Python SDK.
How does Modal sandbox isolation work?
Modal uses gVisor, which runs a user-space kernel that intercepts system calls before they reach the host OS. This is stronger than standard container isolation (which shares the host kernel) but provides a thinner boundary than Firecracker microVMs (which run a separate guest kernel with hardware virtualization). For most AI agent workloads, gVisor isolation is sufficient.
What are Modal sandbox cold starts?
Modal claims sub-second cold starts for CPU containers. Observed cold starts under production load range from 1 to 5 seconds, depending on image size and platform capacity. Modal aggressively reclaims idle containers, which increases cold start frequency compared to platforms that maintain pre-warmed pools. Warm starts (reusing a running container) are under 200ms.
How much do Modal sandboxes cost?
CPU sandboxes cost $0.0000394 per core per second plus $0.00000672 per GiB per second of memory. With the US regional multiplier (1.25x), a 5-minute sandbox using 1 core and 1 GiB costs about $0.014. GPU sandboxes cost more: an A100 40GB sandbox runs approximately $3.73/hour. Modal includes $30/month in free credits on the Starter plan.
Is Modal sandbox Python-only?
No longer. Modal's primary SDK is Python, but JavaScript/TypeScript and Go SDKs are now available in beta. You can run any language inside the sandbox container (Node.js, Go, Rust, Java, etc.). Sandbox orchestration (creating, managing, reading output) can happen from JS/TS/Go via the beta SDKs. Modal Functions must still be defined in Python. The beta SDKs may not yet have full feature parity with the Python SDK.
How does Modal Sandbox compare to E2B?
E2B uses Firecracker microVMs for stronger isolation. E2B offers both Python and TypeScript SDKs with full production support. E2B provides session-scoped filesystem persistence by default. Modal uses gVisor, has Python as its primary SDK (with JS/TS and Go in beta), and requires explicit volume mounts for persistence. Modal supports GPU sandboxes; E2B does not. Modal scales to higher concurrency (20,000 vs 20-100 on E2B).
How does Modal Sandbox compare to Morph Sandbox?
Morph Sandbox is purpose-built for AI agent code execution with sub-300ms cold starts, automatic session-scoped persistence, Python and TypeScript SDKs, and WebSocket streaming. It is included free with Morph API plans. Modal Sandbox is part of a broader compute platform with higher cold starts, a Python-primary SDK (JS/TS and Go in beta), and separate per-second billing. Morph is the better fit for teams already using Morph for LLM inference. Modal is the better fit for teams that need GPU sandboxes or already run ML workloads on Modal.
Can Modal sandbox handle GPU workloads?
Yes. Modal supports A100, H100, and other NVIDIA GPUs inside sandboxed containers. This is Modal's primary differentiator for sandboxing. If your agent generates code that needs GPU compute, Modal is one of the few platforms that handles it without separate infrastructure. Dedicated sandbox providers like E2B and Morph focus on CPU workloads.
Try Morph Sandbox SDK
Purpose-built for AI agent code execution. Sub-300ms cold starts, session-scoped persistence, Python and TypeScript SDKs, WebSocket streaming. Included free with every Morph API plan.