Modal Sandbox: Using Modal for AI Agent Code Execution (2026)

A technical breakdown of Modal's sandbox feature for AI agent code execution. Covers gVisor isolation, SDK options (Python, JS/TS, Go), cold starts, pricing, and limitations. Compares Modal sandboxes to purpose-built alternatives like Morph and E2B.

April 4, 2026

What Is Modal Sandbox

Modal is a serverless compute platform built for Python workloads. It handles inference, training, batch processing, and scheduled jobs. In 2024, Modal added Sandboxes: isolated containers for running untrusted code from LLMs, users, or third-party sources.

The sandbox is a container that runs on gVisor, executes arbitrary commands, and returns stdout, stderr, and exit codes. You create it from the Modal Python SDK, run commands inside it, and tear it down when finished. Files written inside the sandbox stay within that container unless you mount a Modal Volume for persistence.
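That create/exec/teardown lifecycle can be sketched with Modal's Python SDK. This is a minimal illustration, not a production setup: the app name, image, and timeout are arbitrary choices, and running it requires a Modal account and credentials.

```python
import modal

# Look up (or create) an App to namespace the sandbox under.
# "agent-sandboxes" is an arbitrary name for this sketch.
app = modal.App.lookup("agent-sandboxes", create_if_missing=True)

# Create an isolated, gVisor-backed container with a 5-minute timeout.
sb = modal.Sandbox.create(
    app=app,
    image=modal.Image.debian_slim(),
    timeout=300,
)

# Run an arbitrary command and read its output streams.
proc = sb.exec("python", "-c", "print('hello from the sandbox')")
print(proc.stdout.read())
print(proc.stderr.read())

# Tear the container down when finished. Files written inside are
# discarded unless a modal.Volume was mounted.
sb.terminate()
```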

For teams already on Modal, adding sandboxed code execution takes a few lines of Python. For teams evaluating sandbox providers from scratch, the question is whether Modal's general-purpose compute platform is the right tool for a specific job: safe, fast AI agent code execution.

- Isolation technology: gVisor (syscall interception)
- Primary SDK: Python (JS/TS, Go in beta)
- Max concurrent containers: 20,000

Modal's Position in the Sandbox Market

Modal is one of four platforms commonly used for AI agent code execution, alongside E2B, Morph, and Fly.io. Modal is the only one that also handles GPU workloads natively. But it is also the only one not purpose-built for sandboxing, which shows up in cold start times, SDK coverage, and state management ergonomics.

How gVisor Isolation Works

Modal sandboxes run on gVisor, an open-source container runtime from Google. gVisor inserts a user-space kernel (called Sentry) between the containerized application and the host kernel. When code inside the sandbox makes a system call, Sentry intercepts it, validates it, and either handles it in user space or proxies a safe subset to the host.

This is different from how other sandbox providers handle isolation. E2B uses Firecracker microVMs, which boot an entire guest kernel on top of KVM hardware virtualization. Each E2B sandbox is a separate virtual machine. Standard Docker containers share the host kernel directly, providing namespace-level isolation but no syscall filtering.

gVisor (Modal)

User-space kernel intercepts system calls. Stronger than container namespaces, lighter than full VMs. The syscall interception layer adds some overhead per call. Some Linux syscalls are unsupported, which can break certain native libraries.

Firecracker microVM (E2B)

Each sandbox gets its own guest kernel running on KVM. Hardware-level isolation boundary. Untrusted code cannot reach the host kernel even if it escapes the container. Higher resource overhead per sandbox, but the strongest isolation guarantee available.

Container Namespace (Docker)

Shares the host kernel. Uses cgroups and namespaces for resource and visibility isolation. No syscall filtering by default. A kernel exploit in the container can compromise the host. Not recommended for running LLM-generated code in production.

gVisor Tradeoffs

gVisor provides a practical middle ground. It blocks most kernel exploits because untrusted code never touches the real kernel. But syscall interception adds latency to I/O-heavy workloads. Some Linux syscalls are not implemented in gVisor's Sentry, which means certain C extensions and native libraries fail with obscure errors. If your agent generates code that uses ptrace, raw sockets, or certain ioctl calls, it will not work inside a gVisor sandbox.

For most AI agent workloads (running Python scripts, executing test suites, installing pip packages), gVisor works fine. The edge cases appear when agents interact with low-level system interfaces or GPU drivers.

Cold Starts and Performance

Cold start is the time between requesting a sandbox and having it ready to execute code. For interactive AI agents, anything over 1 second breaks the feel of a responsive tool. For background pipelines, 5-10 seconds is acceptable.

Modal advertises sub-second cold starts for CPU containers. In practice, this depends on the image size, whether the platform has pre-warmed capacity, and current load. Modal aggressively spins down idle containers to save costs, which means cold starts happen more frequently than on platforms that maintain warm pools.

| Provider | Typical Cold Start | Warm Start | Notes |
|---|---|---|---|
| Morph Sandbox | < 300ms | < 50ms | Pre-warmed pool, purpose-built |
| E2B | < 500ms | < 100ms | Firecracker microVM, template caching |
| Modal | < 1s (claimed), 2-5s (observed) | < 200ms | Aggressive idle shutdown increases cold starts |
| Fly.io | ~300ms (Machines) | < 100ms | Depends on region and image size |

Why Cold Starts Matter More for Sandboxes

A sandbox is created for a specific agent task and destroyed when done. Unlike a long-running web server that starts once and handles thousands of requests, a sandbox has a short lifecycle: typically 1-15 minutes. If the platform takes 3 seconds to start each sandbox and your agent creates 30 sandboxes per user per day, that is 90 seconds of dead time per user, per day. Multiply by 1,000 users and the aggregate latency adds up.
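The dead-time arithmetic in that scenario is simple enough to check directly (using the same hypothetical figures: a 3-second cold start, 30 sandboxes per user per day, 1,000 users):

```python
# Back-of-envelope dead time caused by sandbox cold starts.
cold_start_s = 3         # seconds per sandbox creation
sandboxes_per_user = 30  # sandboxes per user, per day
users = 1_000

per_user_s = cold_start_s * sandboxes_per_user  # dead time per user per day
total_hours = per_user_s * users / 3600         # aggregate across all users

print(per_user_s)             # 90 seconds per user, per day
print(round(total_hours, 1))  # 25.0 hours of aggregate latency per day
```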

Purpose-built sandbox providers optimize for this pattern. They maintain pre-warmed pools of ready containers, so "create sandbox" returns in under 300ms. Modal optimizes for a different pattern: long-running ML workloads where a 2-second cold start is irrelevant compared to a 10-minute training run.

Pricing Breakdown

Modal bills per CPU-core-second and per GiB-second of memory. There is no per-sandbox fee. You pay for the compute resources your sandbox uses while it is running.

| Resource | Price | Minimum | Notes |
|---|---|---|---|
| CPU | $0.0000394/core/second | 0.125 cores | 1 physical core = 2 vCPUs |
| Memory | $0.00000672/GiB/second | 128 MiB | Scales with CPU allocation |
| GPU (A100 40GB) | $0.001036/second | 1 GPU | ~$3.73/hour |
| Regional multiplier | 1.25x (US/EU) | N/A | Up to 2.5x for other regions |

Cost at Agent Scale

A typical AI agent sandbox session uses 1 CPU core and 1 GiB of memory for 5 minutes. At Modal's list rates, that costs approximately $0.014 per session (about $0.017 after the 1.25x US regional multiplier). At 1,000 daily active users running 30 sandbox sessions each per month:

- Modal (1 core, 1 GiB, 5 min avg): ~$420/mo
- E2B (same usage pattern): ~$250/mo
- Morph (included with API plan): $0

Modal's per-core-second pricing is competitive for GPU workloads where no other sandbox provider operates. For CPU-only agent sandboxes, it is more expensive than E2B and significantly more expensive than Morph (which bundles sandbox with LLM inference). The $30/month free tier covers roughly 2,100 sandbox sessions at the usage pattern above, enough for development and testing.
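The per-session and monthly figures above follow directly from the list prices in the pricing table (treating the 1.25x US multiplier as applied on top of the list rates, which is an assumption about how Modal bills):

```python
# Modal list prices per second, from the pricing table above.
CPU_RATE = 0.0000394   # $ per physical core per second
MEM_RATE = 0.00000672  # $ per GiB per second
US_MULTIPLIER = 1.25   # US/EU regional multiplier

session_s = 5 * 60  # a 5-minute agent session at 1 core, 1 GiB

cost = (1 * CPU_RATE + 1 * MEM_RATE) * session_s
print(round(cost, 4))                  # 0.0138 -> the ~$0.014/session figure
print(round(cost * US_MULTIPLIER, 4))  # 0.0173 with the US multiplier

# 1,000 users x 30 sessions each per month, at list rates:
print(round(cost * 30_000))            # ~415 -> the ~$420/mo figure

# Sessions covered by the $30/month free credit:
print(int(30 / cost))                  # ~2168 -> "roughly 2,100"
```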

When to Use Modal Sandbox

Modal sandbox is the right choice in specific scenarios. It is not the default recommendation for AI agent code execution.

Use Modal When

- Your agent needs GPU compute inside sandboxes.
- You already use Modal for inference or training and want one platform.
- You need 1,000+ concurrent sandboxes.
- Your agent framework is Python-native.
- Cold start latency of 1-5 seconds is acceptable for your use case.

Use a Dedicated Sandbox When

- You need sub-500ms cold starts for interactive tools.
- Your agent framework runs in TypeScript/Node.js.
- You want session-scoped filesystem persistence without configuring volumes.
- You need WebSocket streaming for real-time output.
- Sandbox cost needs to be bundled with your LLM provider.

The GPU Exception

If your agent writes code that requires a GPU to run, Modal is the practical choice. No other sandbox provider gives you on-demand A100 or H100 access inside a sandboxed container. This matters for agents that generate ML training scripts, CUDA kernels, or inference code. For CPU-only agent sandboxes, which cover 90%+ of coding agent use cases, purpose-built providers are faster and cheaper.

The Migration Path

Teams that start with Modal sandboxes because they already use Modal for other workloads sometimes migrate to a dedicated sandbox provider as their agent product scales. The motivation is usually cold start latency: what was fine during development becomes a user-facing performance issue at scale. The migration is straightforward because sandbox APIs have similar semantics: create, exec, read output, destroy.
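One way to keep that migration cheap is to code against the shared semantics rather than any one SDK. The sketch below is an illustrative pattern, not any provider's actual API: `SandboxClient` captures the create/exec/read/destroy shape, and `LocalSandbox` is a non-isolated local stand-in (via `subprocess`) that a Modal- or E2B-backed client would replace.

```python
import subprocess
import sys
from typing import Protocol


class SandboxClient(Protocol):
    """The semantics sandbox providers share: exec a command,
    read its output, destroy the sandbox when done."""

    def exec(self, *cmd: str) -> tuple[int, str, str]: ...
    def destroy(self) -> None: ...


class LocalSandbox:
    """Local stand-in with NO isolation; exists only to show the
    interface a provider-backed client would implement."""

    def exec(self, *cmd: str) -> tuple[int, str, str]:
        p = subprocess.run(cmd, capture_output=True, text=True)
        return p.returncode, p.stdout, p.stderr

    def destroy(self) -> None:
        pass  # a real client would terminate the remote container


def run_agent_code(sb: SandboxClient, code: str) -> str:
    """Execute agent-generated Python and return stdout (or stderr on failure)."""
    rc, out, err = sb.exec(sys.executable, "-c", code)
    sb.destroy()
    return out if rc == 0 else err


print(run_agent_code(LocalSandbox(), "print(2 + 2)"))  # 4
```

Swapping providers then means writing one adapter class instead of touching every agent code path.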

Frequently Asked Questions

What is Modal Sandbox?

Modal Sandbox is a feature of the Modal serverless compute platform that runs untrusted code in gVisor-isolated containers. It supports arbitrary commands, configurable timeouts, volume mounts for persistence, and custom images. The sandbox is managed through Modal's Python SDK.

How does Modal sandbox isolation work?

Modal uses gVisor, which runs a user-space kernel that intercepts system calls before they reach the host OS. This is stronger than standard container isolation (which shares the host kernel) but provides a thinner boundary than Firecracker microVMs (which run a separate guest kernel with hardware virtualization). For most AI agent workloads, gVisor isolation is sufficient.

What are Modal sandbox cold starts?

Modal claims sub-second cold starts for CPU containers. Observed cold starts under production load range from 1 to 5 seconds, depending on image size and platform capacity. Modal aggressively reclaims idle containers, which increases cold start frequency compared to platforms that maintain pre-warmed pools. Warm starts (reusing a running container) are under 200ms.

How much do Modal sandboxes cost?

CPU sandboxes cost $0.0000394 per core per second plus $0.00000672 per GiB per second of memory. A 5-minute sandbox using 1 core and 1 GiB costs about $0.014 at list rates, or roughly $0.017 with the US regional multiplier (1.25x). GPU sandboxes cost more: an A100 40GB sandbox runs approximately $3.73/hour. Modal includes $30/month in free credits on the Starter plan.

Is Modal sandbox Python-only?

No longer. Modal's primary SDK is Python, but JavaScript/TypeScript and Go SDKs are now available in beta. You can run any language inside the sandbox container (Node.js, Go, Rust, Java, etc.). Sandbox orchestration (creating, managing, reading output) can happen from JS/TS/Go via the beta SDKs. Modal Functions must still be defined in Python. The beta SDKs may not yet have full feature parity with the Python SDK.

How does Modal Sandbox compare to E2B?

E2B uses Firecracker microVMs for stronger isolation. E2B offers both Python and TypeScript SDKs with full production support. E2B provides session-scoped filesystem persistence by default. Modal uses gVisor, has Python as its primary SDK (with JS/TS and Go in beta), and requires explicit volume mounts for persistence. Modal supports GPU sandboxes; E2B does not. Modal scales to higher concurrency (20,000 vs 20-100 on E2B).

How does Modal Sandbox compare to Morph Sandbox?

Morph Sandbox is purpose-built for AI agent code execution with sub-300ms cold starts, automatic session-scoped persistence, Python and TypeScript SDKs, and WebSocket streaming. It is included free with Morph API plans. Modal Sandbox is part of a broader compute platform with higher cold starts, a Python-primary SDK (JS/TS and Go in beta), and separate per-second billing. Morph is the better fit for teams already using Morph for LLM inference. Modal is the better fit for teams that need GPU sandboxes or already run ML workloads on Modal.

Can Modal sandbox handle GPU workloads?

Yes. Modal supports A100, H100, and other NVIDIA GPUs inside sandboxed containers. This is Modal's primary differentiator for sandboxing. If your agent generates code that needs GPU compute, Modal is one of the few platforms that handles it without separate infrastructure. Dedicated sandbox providers like E2B and Morph focus on CPU workloads.

Try Morph Sandbox SDK

Purpose-built for AI agent code execution. Sub-300ms cold starts, session-scoped persistence, Python and TypeScript SDKs, WebSocket streaming. Included free with every Morph API plan.