nsjail: Lightweight Linux Sandboxing for AI Code Execution (2026)

A technical deep-dive into nsjail, Google's lightweight process isolation tool. Covers Linux namespaces, cgroups, seccomp-bpf, Kafel policy language, protobuf configuration, and production deployment patterns. Includes comparison with Docker, gVisor, and managed sandbox APIs for AI agent code execution.

April 4, 2026 · 1 min read

What Is nsjail

nsjail is a process isolation tool for Linux. It wraps a single process (or process tree) in a combination of kernel-level isolation mechanisms: namespaces, cgroups, rlimits, and seccomp-bpf syscall filters. The result is a sandbox that restricts what the process can see, do, and consume, without a container runtime or virtual machine.

Google developed nsjail internally and released it as open source. It is not an official Google product, but it has been continuously maintained since 2015. Google uses it for hosting CTF (Capture The Flag) security challenges, where contestants upload and execute arbitrary code. That origin matters: nsjail was built for adversarial workloads, not just development convenience.

At a glance: under 20 ms process launch time, 0 daemon dependencies, 7 Linux namespace types supported.

The tool operates as a single binary. You point it at an executable, specify isolation constraints, and it forks a sandboxed child process. No daemon, no image registry, no orchestrator. This simplicity is the core design decision: nsjail does one thing (process isolation) and composes with whatever infrastructure you already have.

Who uses nsjail in production

Windmill uses nsjail to sandbox Python and Go workflow executions. Each job runs in its own nsjail instance with filesystem isolation, network restrictions, and resource limits. Google uses nsjail for CTF hosting and internal security testing. Multiple code evaluation platforms use it for running untrusted submissions in competitive programming and hiring pipelines.

Architecture: How nsjail Isolates Processes

nsjail layers four Linux kernel mechanisms. Each addresses a different class of escape or abuse.

Linux Namespaces

Isolates the process's view of system resources. PID namespace gives it its own process tree. Mount namespace controls filesystem visibility. Network namespace isolates network interfaces. User namespace maps UID/GID. UTS namespace isolates hostname. IPC namespace isolates inter-process communication. Cgroup namespace isolates cgroup visibility.

Cgroups (v1 and v2)

Limits resource consumption. Memory limits stop a sandbox from exhausting host memory and triggering the host's OOM killer. CPU limits prevent monopolizing cores. PID limits cap the number of processes the sandbox can spawn. net_cls (cgroup v1 only) tags network traffic for QoS enforcement. Together, cgroups ensure a runaway process cannot degrade the host.

Seccomp-BPF

Filters system calls at the kernel boundary. Before the kernel executes any syscall from the sandboxed process, the BPF program inspects the syscall number and the raw argument values. Depending on the policy action, a disallowed syscall either kills the process or fails with an error. This blocks kernel exploits that depend on reaching a vulnerable handler through a syscall the policy does not allow.

Resource Limits (rlimits)

Per-process caps on file descriptors, virtual memory, CPU time, and file sizes. Unlike cgroups (which limit the group), rlimits constrain individual processes. nsjail applies both for defense in depth: cgroups for the sandbox as a whole, rlimits per process inside it.
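nsjail sets these limits itself through its rlimit_* options. To show the underlying kernel mechanism only (this is not nsjail's code), the sketch below uses Python's standard resource module to set limits in a child process just before it execs, so the limits apply to that process and its descendants while the parent is untouched. The function name run_with_rlimits and its default values are hypothetical.

```python
import resource
import subprocess
import sys


def run_with_rlimits(argv, nofile=64, cpu_seconds=10, fsize_bytes=64 * 1024 * 1024):
    """Run argv with per-process rlimits set in the child just before exec."""

    def apply_limits():
        # Runs in the forked child only, so the parent's limits stay unchanged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (nofile, nofile))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_FSIZE, (fsize_bytes, fsize_bytes))

    return subprocess.run(argv, preexec_fn=apply_limits, capture_output=True, text=True)


if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))"
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    result = run_with_rlimits([sys.executable, "-c", probe])
    print(result.stdout.strip())                                 # (64, 64)
    print(resource.getrlimit(resource.RLIMIT_NOFILE) == before)  # True
```

The child reports the lowered file-descriptor limit while the parent keeps its own, which is the per-process behavior that complements cgroups' group-wide accounting.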

Namespace Isolation in Detail

nsjail supports the following seven Linux namespace types. Each is independently toggleable via configuration flags:

| Namespace | Isolates | Flag |
|---|---|---|
| PID | Process ID tree (sandbox sees PID 1 as init) | clone_newpid |
| Mount | Filesystem mounts (chroot/pivot_root) | clone_newns |
| Network | Network interfaces, routing tables, firewall rules | clone_newnet |
| User | UID/GID mappings (enables rootless operation) | clone_newuser |
| UTS | Hostname and domain name | clone_newuts |
| IPC | System V IPC, POSIX message queues | clone_newipc |
| Cgroup | Cgroup root directory visibility | clone_newcgroup |

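One way to confirm that a sandboxed process really got its own namespaces is to compare the links under /proc/PID/ns: two processes share a namespace exactly when the link targets (such as net:[4026531840]) are identical. A minimal sketch, assuming a Linux /proc; NS_FLAGS, namespace_ids, and shared_namespaces are hypothetical helpers that map the kernel's entry names to the nsjail flags in the table above.

```python
import os

# Kernel entry name under /proc/<pid>/ns -> nsjail config flag
NS_FLAGS = {
    "pid": "clone_newpid",
    "mnt": "clone_newns",
    "net": "clone_newnet",
    "user": "clone_newuser",
    "uts": "clone_newuts",
    "ipc": "clone_newipc",
    "cgroup": "clone_newcgroup",
}


def namespace_ids(pid="self"):
    """Map nsjail flag -> namespace identity link (e.g. 'net:[4026531840]')."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for name, flag in NS_FLAGS.items():
        try:
            ids[flag] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Unsupported by this kernel, process gone, or no permission to inspect it
            pass
    return ids


def shared_namespaces(pid_a, pid_b="self"):
    """Return the flags whose namespaces are identical for both processes."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return {flag for flag in a if b.get(flag) == a[flag]}
```

Called with the host PID of a sandboxed child, an empty set means every checked namespace is private to the sandbox. Inspecting another process's links requires permission to access its /proc entry.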
Filesystem Constraints

nsjail uses pivot_root() (preferred) or chroot() to change the filesystem root for the sandboxed process. You define explicit mount points: which host paths are visible, whether they are read-only or read-write, and where tmpfs or proc filesystems are mounted. Everything not explicitly mounted is invisible.

Typical mount configuration

mount {
  src: "/usr"
  dst: "/usr"
  is_bind: true
  rw: false
}

mount {
  src: "/lib"
  dst: "/lib"
  is_bind: true
  rw: false
}

mount {
  dst: "/tmp"
  fstype: "tmpfs"
  rw: true
  is_bind: false
}

mount {
  dst: "/proc"
  fstype: "proc"
  rw: false
}

# No /home, /root, /etc/shadow visible to sandbox
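Because every visible path has to be listed explicitly, mount lists are easy to generate rather than hand-write. A minimal sketch of such a generator; the Mount class and render_mounts are hypothetical (not part of nsjail), and the code assumes paths contain no double quotes. It emits the same protobuf text-format blocks as the configuration above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mount:
    """One nsjail mount; paths are assumed not to contain double quotes."""
    dst: str
    src: Optional[str] = None      # None for pseudo-filesystems such as tmpfs or proc
    fstype: Optional[str] = None
    rw: bool = False

    def to_text(self) -> str:
        """Render as an nsjail protobuf text-format `mount { ... }` block."""
        lines = ["mount {"]
        if self.src is not None:
            lines.append(f'  src: "{self.src}"')
        lines.append(f'  dst: "{self.dst}"')
        if self.fstype is not None:
            lines.append(f'  fstype: "{self.fstype}"')
        # A host path without an fstype is bind-mounted; anything else is a fresh mount
        is_bind = self.src is not None and self.fstype is None
        lines.append(f"  is_bind: {str(is_bind).lower()}")
        lines.append(f"  rw: {str(self.rw).lower()}")
        lines.append("}")
        return "\n".join(lines)


def render_mounts(mounts) -> str:
    """Join several mount blocks, separated by blank lines."""
    return "\n\n".join(m.to_text() for m in mounts)


if __name__ == "__main__":
    print(render_mounts([
        Mount(dst="/usr", src="/usr"),
        Mount(dst="/lib", src="/lib"),
        Mount(dst="/tmp", fstype="tmpfs", rw=True),
        Mount(dst="/proc", fstype="proc"),
    ]))
```

Keeping the allowlist as data makes it easy to review in one place and to share between per-language configs.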

Configuration: Protobuf Config Files

nsjail accepts command-line flags for simple cases and configuration files in protobuf text format for production deployments. The schema is defined in config.proto in the nsjail repository. It covers every isolation knob: namespaces, mounts, cgroups, rlimits, seccomp policies, UID/GID mappings, and execution parameters.

Complete nsjail config for sandboxing Python execution

name: "python-sandbox"
description: "Sandbox for executing untrusted Python code"

mode: ONCE
hostname: "sandbox"
cwd: "/app"

time_limit: 30
max_cpus: 1

clone_newuser: true
clone_newnet: true
clone_newns: true
clone_newpid: true
clone_newipc: true
clone_newuts: true
clone_newcgroup: true

rlimit_as_type: HARD
rlimit_cpu_type: HARD
rlimit_fsize: 64    # Max file size: 64 MB
rlimit_nofile: 128  # Max open file descriptors

uidmap {
  inside_id: "1000"
  outside_id: ""
  count: 1
}
gidmap {
  inside_id: "1000"
  outside_id: ""
  count: 1
}

cgroup_mem_max: 536870912    # 512 MB
cgroup_pids_max: 64          # Max 64 processes
cgroup_cpu_ms_per_sec: 500   # 50% of one CPU core

mount {
  src: "/usr"
  dst: "/usr"
  is_bind: true
  rw: false
}
mount {
  src: "/lib"
  dst: "/lib"
  is_bind: true
  rw: false
}
mount {
  src: "/lib64"
  dst: "/lib64"
  is_bind: true
  rw: false
  mandatory: false
}
mount {
  src: "/bin"
  dst: "/bin"
  is_bind: true
  rw: false
}
mount {
  dst: "/tmp"
  fstype: "tmpfs"
  rw: true
}
mount {
  dst: "/app"
  fstype: "tmpfs"
  rw: true
}
mount {
  dst: "/proc"
  fstype: "proc"
  rw: false
}
mount {
  src: "/dev/null"
  dst: "/dev/null"
  is_bind: true
  rw: false
}
mount {
  src: "/dev/urandom"
  dst: "/dev/urandom"
  is_bind: true
  rw: false
}

exec_bin {
  path: "/usr/bin/python3"
  arg: "-c"
}
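The cgroup values in this config are raw units: cgroup_mem_max is in bytes (512 MB = 512 * 1024 * 1024 = 536870912) and cgroup_cpu_ms_per_sec is milliseconds of CPU time per wall-clock second (500 = half of one core). A small, hypothetical helper that derives them from human-friendly inputs, so the numbers and their comments cannot drift apart:

```python
MIB = 1024 * 1024


def cgroup_fields(mem_mib: int, cpu_cores: float, max_pids: int) -> dict:
    """Translate human-friendly limits into the raw values nsjail's cgroup fields expect."""
    if mem_mib <= 0 or cpu_cores <= 0 or max_pids <= 0:
        raise ValueError("all limits must be positive")
    return {
        "cgroup_mem_max": mem_mib * MIB,                   # bytes
        "cgroup_cpu_ms_per_sec": round(cpu_cores * 1000),  # 1000 ms/s = one full core
        "cgroup_pids_max": max_pids,
    }


if __name__ == "__main__":
    for key, value in cgroup_fields(512, 0.5, 64).items():
        print(f"{key}: {value}")
    # cgroup_mem_max: 536870912
    # cgroup_cpu_ms_per_sec: 500
    # cgroup_pids_max: 64
```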

Command-Line Usage

For quick tests and simple sandboxing, nsjail's CLI flags work without a config file. This is useful during development, though production deployments should use config files for reproducibility.

nsjail CLI: sandbox a Python script

# Basic: run a Python script with network isolation.
# All namespaces (net, pid, mount, user, ...) are enabled by default;
# nsjail's flags only turn them off (e.g. --disable_clone_newnet).
# --chroot / exposes the entire host filesystem read-only: fine for testing only.
nsjail \
  --mode o \
  --chroot / \
  --user 65534 \
  --group 65534 \
  --time_limit 30 \
  --rlimit_as 512 \
  --rlimit_cpu 10 \
  --rlimit_nofile 64 \
  -- /usr/bin/python3 /tmp/untrusted_script.py

# Using a protobuf config file
nsjail --config /etc/nsjail/python-sandbox.cfg \
  -- /usr/bin/python3 -c "print('hello from sandbox')"

# TCP listener mode: accept connections, sandbox each
nsjail --mode l --port 9999 \
  --config /etc/nsjail/service.cfg \
  -- /usr/local/bin/my_service

Execution Modes

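nsjail supports four execution modes. The fourth, EXECVE, sets up the sandbox and then execs the command directly, with no supervising nsjail process. The other three suit different workload patterns: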

ONCE

Runs the command once and exits. The standard mode for code execution: launch, run, collect output, done. Used by code evaluation pipelines and AI agent sandboxes.

RERUN

Re-executes the command each time it exits. Useful for persistent services that should restart on crash. The sandbox constraints persist across restarts.

LISTEN_TCP

Binds to a TCP port and forks a new sandboxed process for each incoming connection (mode: LISTEN in config files, --mode l on the command line). Used for network services such as HTTP handlers and CTF challenges. Each connection gets its own isolated process.

Seccomp-BPF and the Kafel Policy Language

Seccomp-BPF is the strongest isolation layer nsjail provides. Namespaces control visibility. Cgroups control resources. Seccomp-BPF controls what the process can ask the kernel to do. A sandboxed process that can only call read, write, mmap, and exit_group cannot open files, fork children, create sockets, or exploit kernel vulnerabilities through obscure syscall handlers.

Writing raw BPF bytecode is tedious and error-prone. nsjail uses Kafel, a domain-specific language that compiles human-readable rules into BPF programs. Kafel supports named policies, argument-level filtering, and policy composition.

Kafel policy: restrictive sandbox for code execution

POLICY code_execution {
  /* File I/O */
  ALLOW {
    read, write, readv, writev,
    open, openat, close,
    stat, fstat, lstat, newfstatat,
    lseek, access, faccessat
  }

  /* Memory management */
  ALLOW {
    mmap, munmap, mprotect, mremap, brk,
    madvise
  }

  /* Process lifecycle */
  ALLOW {
    clone, fork, vfork, execve,
    wait4, exit, exit_group,
    getpid, getppid, gettid
  }

  /* Signals */
  ALLOW {
    rt_sigaction, rt_sigprocmask,
    rt_sigreturn, kill
  }

  /* Filesystem metadata */
  ALLOW {
    getcwd, readlink, readlinkat,
    getdents, getdents64
  }

  /* Misc runtime support (not exhaustive; extend per runtime) */
  ALLOW {
    futex, clock_gettime, clock_nanosleep,
    nanosleep, getrandom, pipe, pipe2,
    dup, dup2, dup3, fcntl, ioctl,
    set_tid_address, set_robust_list,
    sched_getaffinity, sched_yield,
    arch_prctl, prctl, prlimit64
  }

  /* Everything else falls through to the DEFAULT action on the USE line */
}

USE code_execution DEFAULT KILL
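Per-language policies tend to share most of their groups, so generating Kafel text can be simpler than copying it between files. A minimal, hypothetical generator (render_kafel_policy is illustrative, not part of nsjail or Kafel) that emits the same structure: one ALLOW block per group inside a named POLICY, with the default action on the top-level USE line. nsjail can load the result from a file via --seccomp_policy or inline through the seccomp_string config field; confirm the option names with nsjail --help for your build.

```python
def render_kafel_policy(name: str, groups: dict, default: str = "KILL") -> str:
    """Render a Kafel policy with one ALLOW block per syscall group."""
    lines = [f"POLICY {name} {{"]
    for label, syscalls in groups.items():
        bad = [s for s in syscalls if not s.isidentifier()]
        if bad:
            raise ValueError(f"not a syscall name: {bad}")
        lines.append(f"  /* {label} */")
        lines.append("  ALLOW {")
        lines.append("    " + ", ".join(syscalls))
        lines.append("  }")
    lines.append("}")
    lines.append("")
    # The default action belongs to the top-level USE statement, not inside POLICY
    lines.append(f"USE {name} DEFAULT {default}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_kafel_policy("code_execution", {
        "File I/O": ["read", "write", "openat", "close"],
        "Process exit": ["exit", "exit_group"],
    }))
```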

Syscall filtering is defense in depth

Seccomp-BPF policies do not replace namespace isolation; they complement it. A namespace escape exploit still needs to make syscalls, and any it needs that are not on the allowlist are blocked. Conversely, a seccomp bypass (rare, since the filter is enforced in kernel space) is still contained by namespace isolation. The combination is stronger than either alone.

Argument-Level Filtering

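Kafel can filter based on syscall arguments, not just syscall numbers. This lets you allow open() for reading but block open() for writing, or allow clone() for threads but block it for new processes. Filters compare raw argument values only: seccomp cannot dereference pointers, so it cannot filter on the contents of a path string passed to open(). Newer glibc versions also create threads with clone3, whose flags live in a struct seccomp cannot inspect, so thread creation may require allowing clone3 or failing it with ENOSYS so glibc falls back to clone.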

Kafel: argument-level syscall filtering

POLICY restricted_io {
  /* Allow open()/openat() only with a read-only access mode
     (O_ACCMODE bits == O_RDONLY; flags such as O_CLOEXEC are ignored) */
  ALLOW {
    open(path, flags) { (flags & 0x3) == 0 },
    openat(dirfd, path, flags) { (flags & 0x3) == 0 }
  }

  /* Allow clone() for threads only (CLONE_THREAD flag set) */
  ALLOW {
    clone(flags) { (flags & 0x00010000) == 0x00010000 }  /* CLONE_THREAD */
  }

  /* Allow socket() only for AF_UNIX (local IPC) */
  ALLOW {
    socket(domain) { domain == 1 }  /* AF_UNIX */
  }
}

USE restricted_io DEFAULT KILL
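Argument rules are plain bitwise tests on raw integers, and the choice of mask matters. Runtimes commonly add O_CLOEXEC when opening files for reading, so an exact-match rule such as flags == 0 rejects ordinary read-only opens, while a masked rule like (flags & 0x3) == 0 accepts them. The sketch below mirrors both comparisons in userspace to make that concrete; it is an illustration of the arithmetic, not a seccomp filter, and the constants are the Linux values.

```python
import os

O_ACCMODE = 0o3            # Linux: low two bits of open() flags select the access mode
CLONE_THREAD = 0x00010000


def exact_read_only(flags: int) -> bool:
    # Exact match: rejects O_RDONLY combined with any other flag
    return flags == 0


def masked_read_only(flags: int) -> bool:
    # Same test as a masked Kafel rule: (flags & 0x3) == 0
    return (flags & O_ACCMODE) == os.O_RDONLY


def clone_is_thread(flags: int) -> bool:
    # Same test as (flags & 0x00010000) == 0x00010000
    return (flags & CLONE_THREAD) == CLONE_THREAD


if __name__ == "__main__":
    typical = os.O_RDONLY | os.O_CLOEXEC
    print(exact_read_only(typical), masked_read_only(typical))  # False True
    print(masked_read_only(os.O_WRONLY | os.O_CREAT))          # False
    # CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD
    print(clone_is_thread(0x100 | 0x200 | 0x400 | 0x800 | CLONE_THREAD))  # True
    print(clone_is_thread(17))  # False: fork-style clone passes just SIGCHLD (17)
```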

nsjail vs Docker vs gVisor

nsjail, Docker, and gVisor operate at different layers. Choosing between them depends on what you are isolating, how fast you need it, and how much infrastructure you want to manage.

| | nsjail | Docker | gVisor |
|---|---|---|---|
| Isolation target | Single process / process tree | Application stack (full OS userspace) | Application stack (intercepted syscalls) |
| Startup time | < 20 ms | 500 ms - 2 s | 200 - 500 ms |
| Daemon required | No | Yes (dockerd) | No (OCI runtime) |
| Syscall handling | Kernel executes allowed syscalls directly | Kernel executes every syscall the default seccomp profile allows | Sentry reimplements syscalls in userspace |
| CPU overhead | Near zero (BPF filter only) | Near zero (namespace overhead) | 5-20% (userspace syscall translation) |
| Kernel attack surface | Reduced (blocked syscalls never reach the kernel) | Broad (default profile blocks only around 44 syscalls) | Minimal (Sentry makes only a small, filtered set of host syscalls) |
| Filesystem model | Explicit bind mounts | Layered image filesystem (overlayfs) | Layered image filesystem (overlayfs) |
| Network isolation | Namespace-based (no networking by default) | Bridge networking (veth pairs) | Netstack (userspace TCP/IP) or host |
| Configuration | Protobuf config + Kafel policies | Dockerfile + compose | OCI spec + runsc flags |
| Best for | High-throughput code execution | Application deployment | Defense-in-depth sandboxing |

When to Choose nsjail

nsjail is the right tool when you need to sandbox many short-lived processes with minimal overhead. A code evaluation platform that runs 10,000 student submissions per hour benefits from 20ms launch times. An AI agent loop that executes code 20 times per session benefits from no daemon dependency. If you are already running on Linux and have engineers comfortable with namespace and seccomp configuration, nsjail gives you fine-grained control that no higher-level tool matches.
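The launch-time gap compounds at volume. A back-of-the-envelope sketch using the startup figures from the comparison table (20 ms for nsjail, 500 ms to 2 s for Docker) and the 10,000-runs-per-hour example; it counts cumulative startup time only and ignores parallelism and warm pools.

```python
def launch_overhead_seconds(executions: int, launch_ms: int) -> float:
    """Cumulative time spent only on sandbox startup, in seconds."""
    return executions * launch_ms / 1000


if __name__ == "__main__":
    runs_per_hour = 10_000
    for label, ms in [("nsjail, 20 ms", 20), ("Docker, 500 ms", 500), ("Docker, 2 s", 2000)]:
        total = launch_overhead_seconds(runs_per_hour, ms)
        print(f"{label}: {total:,.0f} s of startup per hour")
    # nsjail, 20 ms: 200 s of startup per hour
    # Docker, 500 ms: 5,000 s of startup per hour
    # Docker, 2 s: 20,000 s of startup per hour
```

At 500 ms per launch, roughly 1.4 hours of cumulative startup accrue every wall-clock hour, which is more than a full worker's capacity spent only on creating sandboxes.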

When to Choose Docker

Docker is the right tool when you need reproducible environments with dependency management. If your sandboxed code requires specific system libraries, language runtimes, or complex dependency trees, Docker images solve that cleanly. Docker also provides a well-understood ecosystem: registries, CI/CD integration, orchestration (Kubernetes). The tradeoff is startup time, a daemon dependency, and a larger kernel attack surface: Docker's default seccomp profile blocks only around 44 syscalls, so most of the syscall table stays reachable unless you supply a tighter custom profile.

When to Choose gVisor

gVisor is the right tool when you need the strongest isolation and can tolerate the performance cost. Its userspace kernel (Sentry) intercepts every syscall before it reaches the host kernel and services it itself, so the host kernel only sees the small, seccomp-filtered set of syscalls the Sentry makes. This matters when running truly adversarial code. The cost is 5-20% CPU overhead and higher memory usage. gVisor is used by Google Cloud Run and GKE Sandbox.

Production Deployment Patterns

Running nsjail in production requires solving problems that do not appear in local testing: process lifecycle management, log collection, concurrent execution, and failure handling.

Pattern 1: nsjail Inside Docker

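The most common production pattern is running nsjail inside a Docker container. Docker provides the base environment (system libraries, language runtimes, nsjail binary). nsjail provides per-execution isolation within that container. This gives you Docker's dependency management and image distribution with nsjail's sub-20ms per-process sandboxing. The container usually needs extra privileges (for example --privileged, or CAP_SYS_ADMIN plus a relaxed seccomp profile), because Docker's default seccomp profile blocks the namespace and mount syscalls nsjail relies on.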

Dockerfile: nsjail execution environment

FROM ubuntu:24.04

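# Build nsjail from source (https://github.com/google/nsjail) and install
# language runtimes; build dependencies follow nsjail's own Dockerfile
RUN apt-get update && apt-get install -y \
    autoconf bison flex gcc g++ git make pkg-config libtool \
    ca-certificates libprotobuf-dev libnl-route-3-dev protobuf-compiler \
    python3 python3-pip \
    nodejs npm \
    && git clone --depth 1 --recurse-submodules https://github.com/google/nsjail.git /tmp/nsjail \
    && make -C /tmp/nsjail \
    && mv /tmp/nsjail/nsjail /usr/local/bin/nsjail \
    && rm -rf /tmp/nsjail /var/lib/apt/lists/*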

# Copy nsjail configuration
COPY nsjail-configs/ /etc/nsjail/

# Create sandbox workspace
RUN mkdir -p /sandbox/workspace && \
    chown 65534:65534 /sandbox/workspace

# The execution wrapper handles:
# - Writing user code to /sandbox/workspace
# - Invoking nsjail with the right config
# - Collecting stdout/stderr/exit code
COPY executor /usr/local/bin/executor

EXPOSE 8080
CMD ["executor", "--listen", ":8080"]

Pattern 2: Windmill-Style Worker Pool

Windmill's architecture demonstrates a production-grade pattern: a pool of worker processes, each running inside its own container, with nsjail providing per-job isolation. The worker receives a job from the queue, writes the code to a temporary directory, invokes nsjail with the appropriate config, collects output, and cleans up. nsjail's ONCE mode maps directly to this execute-and-discard pattern.

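Worker loop: nsjail per-job isolation (sketch)

import subprocess
import tempfile
from pathlib import Path

# Interpreter and file suffix per language. Assumes a matching
# /etc/nsjail/<language>-sandbox.cfg exists for each entry.
RUNTIMES = {
    "python": (["/usr/bin/python3"], ".py"),
}


def _text(data) -> str:
    # TimeoutExpired carries bytes (or None) even when text=True
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def execute_sandboxed(code: str, language: str = "python", timeout: int = 30) -> dict:
    """Execute untrusted code inside an nsjail sandbox."""
    interpreter, suffix = RUNTIMES[language]  # KeyError for unsupported languages

    with tempfile.TemporaryDirectory() as workdir:
        # Write user code to a host-side workspace
        code_path = Path(workdir) / f"main{suffix}"
        code_path.write_text(code)
        jail_path = f"/app/main{suffix}"

        cmd = [
            "nsjail",
            "--config", f"/etc/nsjail/{language}-sandbox.cfg",
            # Expose only the code file, read-only, at a fixed path inside the jail
            "--bindmount_ro", f"{code_path}:{jail_path}",
            "--time_limit", str(timeout),
            "--", *interpreter, jail_path,
        ]
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout + 5,  # Buffer for nsjail setup; time_limit should fire first
            )
        except subprocess.TimeoutExpired as exc:
            return {"stdout": _text(exc.stdout), "stderr": _text(exc.stderr),
                    "exit_code": None, "timed_out": True}

        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.returncode,
            "timed_out": False,
        }


def worker_loop(queue) -> None:
    """`queue` is a hypothetical job queue exposing dequeue() and complete(job_id, result)."""
    while True:
        job = queue.dequeue()
        result = execute_sandboxed(job.code, job.language)
        queue.complete(job.id, result)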

Pattern 3: TCP Listener for Network Services

nsjail's LISTEN_TCP mode binds to a port and forks a sandboxed process per connection. This is how Google hosts CTF challenges: each contestant connects to a port, gets a fresh sandbox, and their session is fully isolated from other contestants. The same pattern works for sandboxed REPL services or interactive code execution endpoints.

nsjail TCP listener config

name: "repl-service"
mode: LISTEN          # Config enum name; --mode l on the CLI
port: 9999
max_conns: 100        # Max concurrent sandboxes
max_conns_per_ip: 5   # Per-IP limit

time_limit: 60        # 60 second session timeout
cgroup_mem_max: 268435456  # 256 MB per session

clone_newuser: true
clone_newnet: true
clone_newns: true
clone_newpid: true

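mount {
  src: "/usr"
  dst: "/usr"
  is_bind: true
  rw: false
}
# /lib and /lib64 hold the dynamic loader and shared libraries python3 needs
mount {
  src: "/lib"
  dst: "/lib"
  is_bind: true
  rw: false
}
mount {
  src: "/lib64"
  dst: "/lib64"
  is_bind: true
  rw: false
  mandatory: false
}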
mount {
  dst: "/tmp"
  fstype: "tmpfs"
  rw: true
}

exec_bin {
  path: "/usr/bin/python3"
  arg: "-i"  # Interactive mode
}

Limitations and Tradeoffs

nsjail is not a universal sandbox. Understanding its limitations is necessary before choosing it for production.

Linux-only

nsjail depends on Linux kernel features (namespaces, cgroups, seccomp-bpf). It does not work on macOS or Windows. Development and CI on non-Linux platforms require a Linux VM or Docker container.

Shared kernel attack surface

Unlike gVisor or Firecracker, nsjail runs sandboxed processes against the real kernel. A kernel exploit in an allowed syscall can escape the sandbox. Seccomp-BPF reduces the attack surface but does not eliminate it.

No built-in image management

nsjail does not have an image format or registry. You manage the root filesystem yourself through bind mounts and tmpfs. For complex dependency trees, this means either pre-installing packages on the host or using nsjail inside Docker.

Configuration complexity

A production nsjail config requires understanding namespaces, mount semantics, UID mappings, seccomp policies, and cgroup parameters. Getting the seccomp policy right for a given language runtime is iterative: run, hit a blocked syscall, allow it, repeat.

Seccomp Policy Iteration

The hardest part of deploying nsjail is getting the seccomp policy right. Python, Node.js, Go, and Rust each use different sets of syscalls. A policy tuned for a single-threaded Python script can kill a Go binary, whose runtime starts OS threads with clone as soon as it launches, and Node.js additionally needs the epoll family (such as epoll_create1 and epoll_ctl) for libuv's event loop. You end up maintaining per-language seccomp policies, each developed through trial and error.

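nsjail's --seccomp_log flag sets SECCOMP_FILTER_FLAG_LOG (Linux 4.14+), which makes the kernel log every action other than ALLOW; it does not turn enforcement off. For a non-enforcing pass during policy development, make the policy's default action LOG (SECCOMP_RET_LOG), so disallowed syscalls are logged and then allowed. Run your workload, see which syscalls were logged, and add them to the policy.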

Developing a seccomp policy iteratively

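# Step 1: Run with a non-enforcing policy whose last line is
#   USE code_execution DEFAULT LOG
# (SECCOMP_RET_LOG: disallowed syscalls are logged, then allowed;
#  code_execution_log.kafel is an example file name)
nsjail --config sandbox.cfg \
  --seccomp_policy code_execution_log.kafel \
  -- /usr/bin/python3 -c "import numpy; print(numpy.__version__)"

# Step 2: Check the kernel log for seccomp audit records (type=1326),
# or /var/log/audit/audit.log if auditd is running; each record includes
# the syscall number, e.g. syscall=257 (openat on x86_64)
sudo dmesg | grep 'type=1326'

# Step 3: Add the missing syscalls to your Kafel policy

# Step 4: Test with enforcement enabled (sandbox.cfg uses DEFAULT KILL)
nsjail --config sandbox.cfg \
  -- /usr/bin/python3 -c "import numpy; print(numpy.__version__)"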

# Repeat until the workload runs clean
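Reading the kernel log by hand gets tedious over many iterations. A small, hypothetical helper (logged_syscalls is illustrative) that tallies seccomp audit records by command and syscall. It assumes the standard audit fields comm="..." and syscall=N in type=1326 records, and its syscall-number table is deliberately partial and x86_64-only.

```python
import re
from collections import Counter

# Partial x86_64 syscall-number table, for illustration only; use a complete
# table (for example the `ausyscall` tool from the audit package) in practice.
X86_64_NAMES = {
    0: "read", 1: "write", 2: "open", 3: "close", 41: "socket",
    56: "clone", 57: "fork", 59: "execve", 232: "epoll_wait",
    233: "epoll_ctl", 257: "openat", 291: "epoll_create1",
    318: "getrandom", 435: "clone3",
}

# Seccomp audit records are type=1326 and carry comm="..." before syscall=N
AUDIT_RE = re.compile(r'type=1326\b.*?\bcomm="(?P<comm>[^"]*)".*?\bsyscall=(?P<nr>\d+)')


def logged_syscalls(log_lines) -> Counter:
    """Count (command, syscall name) pairs across seccomp audit lines."""
    counts = Counter()
    for line in log_lines:
        match = AUDIT_RE.search(line)
        if match:
            nr = int(match.group("nr"))
            name = X86_64_NAMES.get(nr, f"syscall_{nr}")
            counts[(match.group("comm"), name)] += 1
    return counts
```

Feed it the lines of the kernel log (reading it may require root) and add each reported name to the policy. Records from other architectures carry a different arch= value and use different syscall numbers.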

Managed Alternatives: When nsjail Is Not Worth the Effort

nsjail gives you full control over process isolation. That control comes with cost: you write and maintain seccomp policies, manage the host filesystem, handle process lifecycle, build monitoring, and operate the infrastructure. For many teams, this is not the right tradeoff.

If sandboxing is not your core product (for example, you are building an AI coding tool that needs code execution as a feature), the engineering time spent on nsjail configuration and operations is time not spent on your actual product. Managed sandbox APIs exist specifically for this case.

| | Self-Hosted nsjail | Managed Sandbox API |
|---|---|---|
| Setup time | Days to weeks (config, testing, infra) | Minutes (install SDK, get API key) |
| Seccomp policy | You write and maintain per-language policies | Pre-configured, tested across workloads |
| Dependency management | Manual (host filesystem or Docker) | Built-in (templates, package managers) |
| Scaling | You manage worker pools, load balancing | Automatic (provider infrastructure) |
| Multi-language | Separate config per language runtime | Pre-built templates for common languages |
| Filesystem persistence | You implement session state management | Session-scoped by default |
| Cost model | Infrastructure cost + engineering time | Per-second or bundled pricing |
| Control | Full (every kernel knob exposed) | Limited (provider's isolation model) |

Morph Sandbox SDK: equivalent of 200 lines of nsjail config

import { MorphSandbox } from "@anthropic-ai/morph-sandbox";

// Create an isolated sandbox (sub-300ms cold start)
const sandbox = await MorphSandbox.create({
  apiKey: process.env.MORPH_API_KEY,
  template: "python-3.12",
  timeout: 300,
});

// Write untrusted code
await sandbox.filesystem.write("/app/main.py", untrustedCode);

// Execute with full isolation
const result = await sandbox.exec("cd /app && python main.py");
console.log(result.stdout);
console.log(result.exitCode);

// Filesystem persists between calls within the session
await sandbox.exec("pip install numpy pandas");
const analysis = await sandbox.exec("python /app/analyze.py");

await sandbox.destroy();

When to self-host nsjail

Self-hosted nsjail makes sense in three cases: (1) you need sub-20ms process launch and cannot tolerate network latency to a remote sandbox API, (2) you run on air-gapped or regulated infrastructure where external API calls are prohibited, or (3) sandboxing is your product and you need full control over the isolation stack. For everyone else, a managed API saves weeks of engineering time.

Frequently Asked Questions

What is nsjail?

nsjail is a lightweight, open-source process isolation tool developed at Google. It uses Linux namespaces (PID, mount, network, user, UTS, IPC, cgroup), cgroups for resource limits, and seccomp-bpf for syscall filtering. It sandboxes a single process or process tree without a daemon or container runtime, and with user namespaces it can run without root privileges.

How does nsjail differ from Docker?

nsjail isolates individual processes with sub-20ms startup and no daemon. Docker isolates application stacks with 500ms-2s startup and requires dockerd. nsjail uses protobuf configs and Kafel seccomp policies. Docker uses Dockerfiles. nsjail is purpose-built for sandboxing untrusted code. Docker is a general-purpose container platform. The most common production pattern combines both: Docker for environment management, nsjail for per-execution isolation inside the container.

Can I use nsjail for AI agent code execution?

Yes. Windmill uses nsjail in production for sandboxing workflow executions, including AI agent workloads. The sub-20ms startup and zero-daemon design make it efficient for agent loops that execute code many times per session. The tradeoff is operational: you maintain seccomp policies, filesystem configuration, and infrastructure yourself.

What is the Kafel language?

Kafel is a domain-specific language for defining seccomp-bpf policies. It compiles human-readable rules into BPF bytecode. Instead of writing raw BPF instructions, you define a named policy such as POLICY p { ALLOW { read, write, open } } and activate it with USE p DEFAULT KILL. Kafel supports named policies, argument-level filtering, and composition.

Does nsjail require root?

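Not always. With user namespaces enabled (clone_newuser: true, the default), nsjail can run as an unprivileged user. Some features still need privileges or host configuration: cgroup limits require write access to the cgroup hierarchy, mapping more than one UID/GID requires root or the newuidmap/newgidmap helpers, and some distributions restrict unprivileged user namespaces (Ubuntu 24.04 does so through AppArmor). Many production deployments run nsjail inside Docker containers that provide the necessary capabilities.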

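Whether rootless operation will work depends partly on host sysctls. A heuristic sketch that reads the knobs known to gate unprivileged user namespaces; the helper names are hypothetical, two of the files exist only on some distributions (Debian-patched kernels and Ubuntu 23.10+), and a True result is a hint rather than a guarantee, since AppArmor profiles or container runtimes can still intervene.

```python
from pathlib import Path

# sysctls that gate unprivileged user namespaces; the last two are distro-specific
USERNS_SYSCTLS = {
    "max_user_namespaces": "/proc/sys/user/max_user_namespaces",
    "unprivileged_userns_clone": "/proc/sys/kernel/unprivileged_userns_clone",
    "apparmor_restrict_unprivileged_userns": "/proc/sys/kernel/apparmor_restrict_unprivileged_userns",
}


def read_userns_sysctls() -> dict:
    """Return each sysctl's integer value, or None if it is absent or unreadable."""
    values = {}
    for name, path in USERNS_SYSCTLS.items():
        try:
            values[name] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def userns_probably_allowed(values: dict) -> bool:
    """Heuristic: True unless a known knob is set to block unprivileged user namespaces."""
    if values.get("max_user_namespaces") == 0:
        return False
    if values.get("unprivileged_userns_clone") == 0:
        return False
    if values.get("apparmor_restrict_unprivileged_userns") == 1:
        # An AppArmor profile granting userns to the nsjail binary can still allow it
        return False
    return True


if __name__ == "__main__":
    settings = read_userns_sysctls()
    print(settings, userns_probably_allowed(settings))
```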
How does nsjail compare to gVisor?

nsjail filters syscalls at the kernel boundary. Allowed syscalls execute on the real kernel with near-zero overhead. gVisor intercepts all syscalls in a userspace kernel (Sentry), adding 5-20% CPU overhead but limiting the host kernel to the small, seccomp-filtered set of syscalls the Sentry itself makes. Choose nsjail for throughput, gVisor for defense in depth against kernel exploits.

What are managed alternatives to building with nsjail?

Managed sandbox APIs like Morph Sandbox SDK, E2B, and Modal handle isolation, scaling, and lifecycle management for you. Morph Sandbox SDK provides sub-300ms cold starts with session-scoped filesystem persistence, bundled with Morph API plans. These are the better choice when sandboxing is a feature of your product, not the product itself.

Or Use Morph Sandbox SDK for Managed Sandboxes

Skip the nsjail configuration. Morph Sandbox SDK gives AI agents isolated code execution with sub-300ms cold starts, session-scoped persistence, and zero infrastructure to manage. Included free with Morph API.