Modal Sandbox: Using Modal for AI Agent Code Execution (2026)

A technical breakdown of Modal's sandbox feature for AI agent code execution. Covers gVisor isolation, SDK options (Python, JS/TS, Go), cold starts, pricing, and limitations. Compares Modal sandboxes to purpose-built alternatives like Morph and E2B.

April 4, 2026

What Is Modal Sandbox

Modal is a serverless compute platform built for Python workloads. It handles inference, training, batch processing, and scheduled jobs. In 2024, Modal added Sandboxes: isolated containers for running untrusted code from LLMs, users, or third-party sources.

The sandbox is a container that runs on gVisor, executes arbitrary commands, and returns stdout, stderr, and exit codes. You create it from the Modal Python SDK, run commands inside it, and tear it down when finished. Files written inside the sandbox stay within that container unless you mount a Modal Volume for persistence.
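That create/exec/teardown lifecycle can be sketched with Modal's Python SDK. This is a minimal illustration, not a production setup: the app name, image, and timeout are arbitrary choices, and running it requires a Modal account and credentials.

```python
import modal

# Look up (or create) an App to namespace the sandbox under.
# "agent-sandboxes" is an arbitrary name for this sketch.
app = modal.App.lookup("agent-sandboxes", create_if_missing=True)

# Create an isolated, gVisor-backed container with a 5-minute timeout.
sb = modal.Sandbox.create(
    app=app,
    image=modal.Image.debian_slim(),
    timeout=300,
)

# Run an arbitrary command and read its output streams.
proc = sb.exec("python", "-c", "print('hello from the sandbox')")
print(proc.stdout.read())
print(proc.stderr.read())

# Tear the container down when finished. Files written inside are
# discarded unless a modal.Volume was mounted.
sb.terminate()
```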

For teams already on Modal, adding sandboxed code execution takes a few lines of Python. For teams evaluating sandbox providers from scratch, the question is whether Modal's general-purpose compute platform is the right tool for a specific job: safe, fast AI agent code execution.

- Isolation technology: gVisor (syscall interception)
- Primary SDK: Python (JS/TS, Go in beta)
- Max concurrent containers: 20,000

Modal's Position in the Sandbox Market

Modal is one of four platforms commonly used for AI agent code execution, alongside E2B, Morph, and Fly.io. Modal is the only one that also handles GPU workloads natively. But it is also the only one not purpose-built for sandboxing, which shows up in cold start times, SDK coverage, and state management ergonomics.

How gVisor Isolation Works

Modal sandboxes run on gVisor, an open-source container runtime from Google. gVisor inserts a user-space kernel (called Sentry) between the containerized application and the host kernel. When code inside the sandbox makes a system call, Sentry intercepts it, validates it, and either handles it in user space or proxies a safe subset to the host.

This is different from how other sandbox providers handle isolation. E2B uses Firecracker microVMs, which boot an entire guest kernel on top of KVM hardware virtualization. Each E2B sandbox is a separate virtual machine. Standard Docker containers share the host kernel directly, providing namespace-level isolation but no syscall filtering.

gVisor (Modal)

User-space kernel intercepts system calls. Stronger than container namespaces, lighter than full VMs. The syscall interception layer adds some overhead per call. Some Linux syscalls are unsupported, which can break certain native libraries.

Firecracker microVM (E2B)

Each sandbox gets its own guest kernel running on KVM. Hardware-level isolation boundary. Untrusted code cannot reach the host kernel even if it escapes the container. Higher resource overhead per sandbox, but the strongest isolation guarantee available.

Container Namespace (Docker)

Shares the host kernel. Uses cgroups and namespaces for resource and visibility isolation. No syscall filtering by default. A kernel exploit in the container can compromise the host. Not recommended for running LLM-generated code in production.

gVisor Tradeoffs

gVisor provides a practical middle ground. It blocks most kernel exploits because untrusted code never touches the real kernel. But syscall interception adds latency to I/O-heavy workloads. Some Linux syscalls are not implemented in gVisor's Sentry, which means certain C extensions and native libraries fail with obscure errors. If your agent generates code that uses ptrace, raw sockets, or certain ioctl calls, it will not work inside a gVisor sandbox.

For most AI agent workloads (running Python scripts, executing test suites, installing pip packages), gVisor works fine. The edge cases appear when agents interact with low-level system interfaces or GPU drivers.

Cold Starts and Performance

Cold start is the time between requesting a sandbox and having it ready to execute code. For interactive AI agents, anything over 1 second breaks the feel of a responsive tool. For background pipelines, 5-10 seconds is acceptable.

Modal advertises sub-second cold starts for CPU containers. In practice, this depends on the image size, whether the platform has pre-warmed capacity, and current load. Modal aggressively spins down idle containers to save costs, which means cold starts happen more frequently than on platforms that maintain warm pools.

| Provider | Typical Cold Start | Warm Start | Notes |
|---|---|---|---|
| Morph Sandbox | < 300ms | < 50ms | Pre-warmed pool, purpose-built |
| E2B | < 500ms | < 100ms | Firecracker microVM, template caching |
| Modal | < 1s (claimed), 2-5s (observed) | < 200ms | Aggressive idle shutdown increases cold starts |
| Fly.io | ~300ms (Machines) | < 100ms | Depends on region and image size |

Why Cold Starts Matter More for Sandboxes

A sandbox is created for a specific agent task and destroyed when done. Unlike a long-running web server that starts once and handles thousands of requests, a sandbox has a short lifecycle: typically 1-15 minutes. If the platform takes 3 seconds to start each sandbox and your agent creates 30 sandboxes per user per day, that is 90 seconds of dead time per user, per day. Multiply by 1,000 users and the aggregate latency adds up.
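The dead-time arithmetic in that scenario is simple enough to check directly (using the same hypothetical figures: a 3-second cold start, 30 sandboxes per user per day, 1,000 users):

```python
# Back-of-envelope dead time caused by sandbox cold starts.
cold_start_s = 3         # seconds per sandbox creation
sandboxes_per_user = 30  # sandboxes per user, per day
users = 1_000

per_user_s = cold_start_s * sandboxes_per_user  # dead time per user per day
total_hours = per_user_s * users / 3600         # aggregate across all users

print(per_user_s)             # 90 seconds per user, per day
print(round(total_hours, 1))  # 25.0 hours of aggregate latency per day
```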

Purpose-built sandbox providers optimize for this pattern. They maintain pre-warmed pools of ready containers, so "create sandbox" returns in under 300ms. Modal optimizes for a different pattern: long-running ML workloads where a 2-second cold start is irrelevant compared to a 10-minute training run.

Pricing Breakdown

Modal bills per CPU-core-second and per GiB-second of memory. There is no per-sandbox fee. You pay for the compute resources your sandbox uses while it is running.

| Resource | Price | Minimum | Notes |
|---|---|---|---|
| CPU | $0.0000394/core/second | 0.125 cores | 1 physical core = 2 vCPUs |
| Memory | $0.00000672/GiB/second | 128 MiB | Scales with CPU allocation |
| GPU (A100 40GB) | $0.001036/second | 1 GPU | ~$3.73/hour |
| Regional multiplier | 1.25x (US/EU) | N/A | Up to 2.5x for other regions |

Cost at Agent Scale

A typical AI agent sandbox session uses 1 CPU core and 1 GiB of memory for 5 minutes. At Modal's list rates, that costs approximately $0.014 per session (about $0.017 after the 1.25x US regional multiplier). At 1,000 daily active users running 30 sandbox sessions each per month:

- Modal (1 core, 1 GiB, 5 min avg): ~$420/mo
- E2B (same usage pattern): ~$250/mo
- Morph (included with API plan): $0

Modal's per-core-second pricing is competitive for GPU workloads where no other sandbox provider operates. For CPU-only agent sandboxes, it is more expensive than E2B and significantly more expensive than Morph (which bundles sandbox with LLM inference). The $30/month free tier covers roughly 2,100 sandbox sessions at the usage pattern above, enough for development and testing.
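The per-session and monthly figures above follow directly from the list prices in the pricing table (treating the 1.25x US multiplier as applied on top of the list rates, which is an assumption about how Modal bills):

```python
# Modal list prices per second, from the pricing table above.
CPU_RATE = 0.0000394   # $ per physical core per second
MEM_RATE = 0.00000672  # $ per GiB per second
US_MULTIPLIER = 1.25   # US/EU regional multiplier

session_s = 5 * 60  # a 5-minute agent session at 1 core, 1 GiB

cost = (1 * CPU_RATE + 1 * MEM_RATE) * session_s
print(round(cost, 4))                  # 0.0138 -> the ~$0.014/session figure
print(round(cost * US_MULTIPLIER, 4))  # 0.0173 with the US multiplier

# 1,000 users x 30 sessions each per month, at list rates:
print(round(cost * 30_000))            # ~415 -> the ~$420/mo figure

# Sessions covered by the $30/month free credit:
print(int(30 / cost))                  # ~2168 -> "roughly 2,100"
```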

When to Use Modal Sandbox

Modal sandbox is the right choice in specific scenarios. It is not the default recommendation for AI agent code execution.

Use Modal When

- Your agent needs GPU compute inside sandboxes.
- You already use Modal for inference or training and want one platform.
- You need 1,000+ concurrent sandboxes.
- Your agent framework is Python-native.
- Cold start latency of 1-5 seconds is acceptable for your use case.

Use a Dedicated Sandbox When

- You need sub-500ms cold starts for interactive tools.
- Your agent framework runs in TypeScript/Node.js.
- You want session-scoped filesystem persistence without configuring volumes.
- You need WebSocket streaming for real-time output.
- Sandbox cost needs to be bundled with your LLM provider.

The GPU Exception

If your agent writes code that requires a GPU to run, Modal is the practical choice. No other sandbox provider gives you on-demand A100 or H100 access inside a sandboxed container. This matters for agents that generate ML training scripts, CUDA kernels, or inference code. For CPU-only agent sandboxes, which cover 90%+ of coding agent use cases, purpose-built providers are faster and cheaper.

The Migration Path

Teams that start with Modal sandboxes because they already use Modal for other workloads sometimes migrate to a dedicated sandbox provider as their agent product scales. The motivation is usually cold start latency: what was fine during development becomes a user-facing performance issue at scale. The migration is straightforward because sandbox APIs have similar semantics: create, exec, read output, destroy.
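One way to keep that migration cheap is to code against the shared semantics rather than any one SDK. The sketch below is an illustrative pattern, not any provider's actual API: `SandboxClient` captures the create/exec/read/destroy shape, and `LocalSandbox` is a non-isolated local stand-in (via `subprocess`) that a Modal- or E2B-backed client would replace.

```python
import subprocess
import sys
from typing import Protocol


class SandboxClient(Protocol):
    """The semantics sandbox providers share: exec a command,
    read its output, destroy the sandbox when done."""

    def exec(self, *cmd: str) -> tuple[int, str, str]: ...
    def destroy(self) -> None: ...


class LocalSandbox:
    """Local stand-in with NO isolation; exists only to show the
    interface a provider-backed client would implement."""

    def exec(self, *cmd: str) -> tuple[int, str, str]:
        p = subprocess.run(cmd, capture_output=True, text=True)
        return p.returncode, p.stdout, p.stderr

    def destroy(self) -> None:
        pass  # a real client would terminate the remote container


def run_agent_code(sb: SandboxClient, code: str) -> str:
    """Execute agent-generated Python and return stdout (or stderr on failure)."""
    rc, out, err = sb.exec(sys.executable, "-c", code)
    sb.destroy()
    return out if rc == 0 else err


print(run_agent_code(LocalSandbox(), "print(2 + 2)"))  # 4
```

Swapping providers then means writing one adapter class instead of touching every agent code path.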

Frequently Asked Questions

What is Modal Sandbox?

Modal Sandbox is a feature of the Modal serverless compute platform that runs untrusted code in gVisor-isolated containers. It supports arbitrary commands, configurable timeouts, volume mounts for persistence, and custom images. The sandbox is managed through Modal's Python SDK.

How does Modal sandbox isolation work?

Modal uses gVisor, which runs a user-space kernel that intercepts system calls before they reach the host OS. This is stronger than standard container isolation (which shares the host kernel) but provides a thinner boundary than Firecracker microVMs (which run a separate guest kernel with hardware virtualization). For most AI agent workloads, gVisor isolation is sufficient.

What are Modal sandbox cold starts?

Modal claims sub-second cold starts for CPU containers. Observed cold starts under production load range from 1 to 5 seconds, depending on image size and platform capacity. Modal aggressively reclaims idle containers, which increases cold start frequency compared to platforms that maintain pre-warmed pools. Warm starts (reusing a running container) are under 200ms.

How much do Modal sandboxes cost?

CPU sandboxes cost $0.0000394 per core per second plus $0.00000672 per GiB per second of memory. A 5-minute sandbox using 1 core and 1 GiB costs about $0.014 at list rates, or roughly $0.017 with the US regional multiplier (1.25x). GPU sandboxes cost more: an A100 40GB sandbox runs approximately $3.73/hour. Modal includes $30/month in free credits on the Starter plan.

Is Modal sandbox Python-only?

No longer. Modal's primary SDK is Python, but JavaScript/TypeScript and Go SDKs are now available in beta. You can run any language inside the sandbox container (Node.js, Go, Rust, Java, etc.). Sandbox orchestration (creating, managing, reading output) can happen from JS/TS/Go via the beta SDKs. Modal Functions must still be defined in Python. The beta SDKs may not yet have full feature parity with the Python SDK.

How does Modal Sandbox compare to E2B?

E2B uses Firecracker microVMs for stronger isolation. E2B offers both Python and TypeScript SDKs with full production support. E2B provides session-scoped filesystem persistence by default. Modal uses gVisor, has Python as its primary SDK (with JS/TS and Go in beta), and requires explicit volume mounts for persistence. Modal supports GPU sandboxes; E2B does not. Modal scales to higher concurrency (20,000 vs 20-100 on E2B).

How does Modal Sandbox compare to Morph Sandbox?

Morph Sandbox is purpose-built for AI agent code execution with sub-300ms cold starts, automatic session-scoped persistence, Python and TypeScript SDKs, and WebSocket streaming. It is included free with Morph API plans. Modal Sandbox is part of a broader compute platform with higher cold starts, a Python-primary SDK (JS/TS and Go in beta), and separate per-second billing. Morph is the better fit for teams already using Morph for LLM inference. Modal is the better fit for teams that need GPU sandboxes or already run ML workloads on Modal.

Can Modal sandbox handle GPU workloads?

Yes. Modal supports A100, H100, and other NVIDIA GPUs inside sandboxed containers. This is Modal's primary differentiator for sandboxing. If your agent generates code that needs GPU compute, Modal is one of the few platforms that handles it without separate infrastructure. Dedicated sandbox providers like E2B and Morph focus on CPU workloads.

Try Morph Sandbox SDK

Purpose-built for AI agent code execution. Sub-300ms cold starts, session-scoped persistence, Python and TypeScript SDKs, WebSocket streaming. Included free with every Morph API plan.