Modal Pricing Breakdown: Per-Second GPU Billing and What It Actually Costs at Scale

Modal charges per second for CPU, memory, and GPU. The Starter plan includes $30/mo in free credits. H100s run $3.95/hr, A100s (40GB) $2.10/hr, T4s $0.59/hr. But production workloads hit stacked multipliers that can push real costs to 3.75x the listed price. Full pricing breakdown with real cost examples.

April 5, 2026 · 1 min read

Modal bills per second for CPU, memory, and GPU. No idle charges. The Starter plan is free and includes $30/month in compute credits. An H100 costs $3.95/hr, an A100 40GB costs $2.10/hr, a T4 costs $0.59/hr.

Those are the base rates. Production workloads see multipliers that can push the real cost to 3.75x the listed price. This guide breaks down every line item so you can estimate what Modal actually costs for your workload.

Platform Plans

Modal has three platform tiers. The platform fee covers workspace features, concurrency limits, and log retention. Compute is billed separately on top.

- Starter: $0/mo, plus $30/mo in credits
- Team: $250/mo, plus $100/mo in credits
- Enterprise: custom pricing
| Feature | Starter | Team | Enterprise |
|---|---|---|---|
| Monthly fee | $0 | $250 | Custom |
| Included credits | $30/mo | $100/mo | Volume discounts |
| Workspace seats | 3 | Unlimited | Unlimited |
| Max containers | 100 | 1,000 | Custom |
| GPU concurrency | 10 | 50 | Custom |
| Log retention | 1 day | 30 days | Custom |
| Deployed crons | 5 | Unlimited | Unlimited |
| Web endpoints | 8 | Unlimited | Unlimited |
| Custom domains | No | Yes | Yes |
| SSO / Okta | No | No | Yes |

The Starter plan works for prototyping and solo development. The $30 monthly credit covers real experimentation: roughly 7.5 hours of H100 time or 50 hours of T4 time. But the 3-seat limit, 1-day log retention, and 100-container cap make it impractical for production.

The Team plan at $250/month makes sense once you need more than 3 seats or deploy production services. The $100 credit offsets some of the monthly fee, but the real value is the higher concurrency limits and 30-day logs.

Startup and academic credits

Modal offers up to $25,000 in credits for startups and $10,000 for academics. These are applied on top of the monthly credit and can significantly extend the prototyping phase before you start paying compute out of pocket.

GPU Pricing

Modal lists 10 GPU types. Prices are per-second with no minimum commitment. You pay only while your function is executing, not while the container is idle or cold-starting.

| GPU | Per Second | Per Hour | VRAM |
|---|---|---|---|
| B200 | $0.001736 | $6.25 | 192 GB |
| H200 | $0.001261 | $4.54 | 141 GB |
| H100 | $0.001097 | $3.95 | 80 GB |
| RTX PRO 6000 | $0.000842 | $3.03 | 48 GB |
| A100 (80GB) | $0.000694 | $2.50 | 80 GB |
| A100 (40GB) | $0.000583 | $2.10 | 40 GB |
| L40S | $0.000542 | $1.95 | 48 GB |
| A10G | $0.000306 | $1.10 | 24 GB |
| L4 | $0.000222 | $0.80 | 24 GB |
| T4 | $0.000164 | $0.59 | 16 GB |
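To make the per-second math concrete, here is a small sketch using the hourly rates from the table above (the dictionary and function names are illustrative, not part of Modal's SDK):

```python
# Hourly rates from Modal's published GPU table, USD.
HOURLY_RATES = {
    "H100": 3.95,
    "A100-40GB": 2.10,
    "T4": 0.59,
}

def gpu_cost(gpu: str, seconds: float) -> float:
    """Cost of running a GPU for `seconds`, billed per second with no minimum."""
    return HOURLY_RATES[gpu] / 3600 * seconds

# A 5-minute H100 burst costs about $0.33 -- you pay nothing before or after.
print(round(gpu_cost("H100", 300), 2))
```

Because billing stops the instant your function returns, short bursts stay cheap regardless of how expensive the GPU's hourly rate looks.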

All GPU rates are preemptible

Modal does not support non-preemptible execution for GPU functions. Every GPU workload can be interrupted at any time and rescheduled. Modal will restart your function on the same input, but if your workload cannot tolerate interruptions (long training runs, stateful inference sessions), this is a meaningful risk. Design for idempotency or use checkpointing.
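Since any GPU function can be preempted and rerun on the same input, work should be resumable. A minimal, framework-agnostic checkpointing sketch (the checkpoint file and step granularity are illustrative; on Modal you would persist the checkpoint to a Volume so it survives container rescheduling):

```python
import json
import os

def run_resumable(items, checkpoint_path="ckpt.json"):
    """Process items idempotently; after a preemption-and-restart, resume
    from the last saved index instead of redoing finished work."""
    start, results = 0, []
    if os.path.exists(checkpoint_path):
        state = json.load(open(checkpoint_path))
        start, results = state["next"], state["results"]
    for i in range(start, len(items)):
        results.append(items[i] * 2)           # stand-in for real GPU work
        json.dump({"next": i + 1, "results": results},
                  open(checkpoint_path, "w"))  # persist progress each step
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)             # done: clear the checkpoint
    return results
```

Checkpointing after every step is the extreme case; in practice you tune the interval so checkpoint writes stay small relative to the compute between them.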

How Modal Compares to Other GPU Providers

Modal's serverless GPU rates are competitive with dedicated cloud providers, but remember these are preemptible. Reserved instances on AWS or Lambda Labs are cheaper per hour for sustained utilization above 70%.

| Provider | H100 $/hr | Type | Notes |
|---|---|---|---|
| RunPod (spot) | $1.49 | Spot | Community cloud |
| Vast.ai | $1.87 | Marketplace | Variable availability |
| CoreWeave | $2.23 | Reserved | Committed use |
| RunPod (on-demand) | $2.49 | On-demand | Guaranteed |
| Lambda Labs | $2.99 | On-demand | Guaranteed, no preemption |
| AWS (p5.48xlarge) | ~$3.67 | On-demand | Per-GPU equivalent |
| Modal | $3.95 | Preemptible | Serverless, per-second |
| GCP | ~$4.50 | On-demand | Per-GPU equivalent |

Modal is $1-2/hr more expensive than RunPod or Lambda Labs for raw H100 time. The premium buys you serverless autoscaling, per-second billing, and zero idle costs. If your GPU utilization is bursty (inference endpoints with variable traffic), the per-second model can be cheaper overall because you pay nothing during idle periods. If utilization is sustained above 70%, a reserved instance elsewhere will save money.
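The bursty-vs-sustained tradeoff reduces to a break-even utilization: the fraction of the month above which renting a dedicated GPU full-time beats paying per second. A sketch (function name is illustrative):

```python
def break_even_utilization(serverless_hr: float, dedicated_hr: float) -> float:
    """Utilization above which a dedicated instance beats per-second billing.
    Serverless cost scales with utilization; a dedicated box costs the same 24/7."""
    return dedicated_hr / serverless_hr

# Modal H100 ($3.95/hr, pay only while running) vs Lambda Labs ($2.99/hr, always on):
# above ~76% utilization the dedicated H100 is cheaper.
print(round(break_even_utilization(3.95, 2.99), 2))
```

That ~76% figure is where the rough "sustained above 70%" rule of thumb comes from; against cheaper reserved rates the break-even point drops further.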

CPU and Memory Pricing

CPU and memory are billed independently of GPUs. Every container uses CPU and memory. GPU containers also consume CPU and memory on top of the GPU charge.

- $0.047/hr per CPU core (base rate)
- $0.008/hr per GiB of memory (base rate)

| Resource | Per Second | Per Hour | Per Month (730 hrs) |
|---|---|---|---|
| 1 CPU core | $0.0000131 | $0.047 | $34.45 |
| 1 GiB memory | $0.00000222 | $0.008 | $5.84 |
| 4 cores + 8 GiB | $0.0000702 | $0.253 | $184.53 |
| 8 cores + 32 GiB | $0.000176 | $0.633 | $462.06 |

These are base preemptible rates. A container with 4 CPU cores and 8 GiB of memory costs $0.253/hr at base. In practice, most production workloads need non-preemptible execution and run in a specific region, which applies the multipliers covered in the next section.

Minimum allocation is 0.125 CPU cores per container. Even a minimal container costs $0.006/hr for CPU alone before memory.
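The base container math, including the 0.125-core floor, can be sketched from the per-second rates above (names are illustrative):

```python
CPU_PER_SEC = 0.0000131    # base rate per core per second
MEM_PER_SEC = 0.00000222   # base rate per GiB per second

def container_cost_per_hr(cores: float, mem_gib: float) -> float:
    """Base (preemptible, no region) hourly cost for a container.
    CPU is floored at Modal's 0.125-core minimum allocation."""
    return (max(cores, 0.125) * CPU_PER_SEC + mem_gib * MEM_PER_SEC) * 3600

# 4 cores + 8 GiB at base rates, and the smallest possible container:
print(round(container_cost_per_hr(4, 8), 3))    # matches the $0.253/hr above
print(round(container_cost_per_hr(0, 0), 3))    # the $0.006/hr minimum
```

Multiply the hourly figure by 730 for a monthly estimate, then apply any multipliers from the next section.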

Sandbox Pricing

Modal sandboxes use a separate, higher pricing tier than standard functions. The sandbox CPU and memory rates are both 3x the standard rates.

| Resource | Standard Function | Sandbox | Multiplier |
|---|---|---|---|
| CPU (per core/sec) | $0.0000131 | $0.00003942 | 3.0x |
| Memory (per GiB/sec) | $0.00000222 | $0.00000672 | 3.0x |
| CPU (per core/hr) | $0.047 | $0.142 | 3.0x |
| Memory (per GiB/hr) | $0.008 | $0.024 | 3.0x |

A sandbox with 1 CPU core and 2 GiB of memory costs $0.190/hr ($0.142 CPU + $0.048 memory). Run that continuously for a month: $138.70. A 4-core, 16 GiB sandbox runs $22.83/day or $685/month.

The 3x sandbox premium covers the gVisor isolation layer and the ability to run untrusted code. For AI agent workloads where you spin up hundreds of short-lived sandboxes, the per-second billing keeps costs proportional to actual execution time. But for long-running sandbox sessions, the premium adds up.

Sandbox GPU pricing

GPU sandboxes use the same GPU rates as standard functions (no additional sandbox multiplier on the GPU portion). Only CPU and memory carry the 3x sandbox premium. An A100 40GB sandbox still costs $2.10/hr for the GPU, plus the higher sandbox CPU/memory rates.
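The sandbox math, with the GPU passed through at its standard rate, can be sketched as follows (names and the GPU dictionary are illustrative):

```python
SANDBOX_CPU_HR = 0.142                       # 3x the standard CPU rate
SANDBOX_MEM_HR = 0.024                       # 3x the standard memory rate
GPU_HR = {"A100-40GB": 2.10, "T4": 0.59}     # GPUs keep their standard rates

def sandbox_cost_per_hr(cores: float, mem_gib: float, gpu: str = None) -> float:
    """Sandbox hourly cost: CPU/memory carry the 3x premium, the GPU does not."""
    total = cores * SANDBOX_CPU_HR + mem_gib * SANDBOX_MEM_HR
    if gpu:
        total += GPU_HR[gpu]
    return total

print(round(sandbox_cost_per_hr(1, 2), 2))               # the $0.19/hr example
print(round(sandbox_cost_per_hr(1, 2, "A100-40GB"), 2))  # same sandbox + A100
```

Note that for GPU sandboxes the GPU dominates the bill, so the 3x CPU/memory premium matters far less than it does for CPU-only sandbox fleets.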

The Multiplier System

Modal's listed rates are preemptible base prices. Two multipliers can increase your actual cost significantly.

Non-Preemptible: 3x

Guarantees your function will not be interrupted. Only available for CPU functions. GPU functions cannot use non-preemptible mode. Applied by setting nonpreemptible=True in your function decorator.

Regional: 1.25x to 2.5x

US, EU, UK, and Asia-Pacific regions apply a 1.25x multiplier. Other regions go up to 2.5x. You cannot run at base rates in any named region.

| Configuration | Per Hour | Per Month (730 hrs) | Multiplier |
|---|---|---|---|
| Base (preemptible, no region) | $0.047 | $34.45 | 1x |
| US region, preemptible | $0.059 | $43.07 | 1.25x |
| Non-preemptible, no region | $0.142 | $103.34 | 3x |
| Non-preemptible, US region | $0.177 | $129.17 | 3.75x |

The bottom row is what most US-based production workloads actually pay for CPU: 3.75x the listed base rate. A function with 4 CPU cores and 8 GiB memory that costs $0.253/hr at base ends up at $0.949/hr in production (non-preemptible, US region).

GPU workloads cannot avoid preemption

The non-preemptible flag only works for CPU functions. All GPU workloads on Modal are preemptible by default, and there is no way to change this. If your GPU workload is interrupted, Modal restarts it on the same input. For long-running training jobs, this means mandatory checkpointing. For stateful inference, this means potential dropped requests during rescheduling.

How Multipliers Stack on Sandboxes

Sandboxes already carry a 3x base premium. Regional multipliers apply on top. A sandbox in the US region pays 3x (sandbox) times 1.25x (region) = 3.75x the standard CPU rate. One sandbox CPU core in the US costs $0.177/hr, the same as a non-preemptible standard function.

Storage and Network Costs

Modal does not publish pricing for Volumes (its distributed file storage), data egress, or network transfer. The pricing page covers compute only.

Volumes

Modal Volumes provide persistent distributed storage for models, datasets, and checkpoints. Pricing is not published. For large datasets, this is a meaningful unknown in your cost estimate.

Network Egress

Data transfer fees are not listed on Modal's pricing page. Major cloud providers typically charge $0.08-0.12/GB for egress. Modal's policy here is undocumented, which makes cost estimation harder for data-heavy workloads.

For workloads that primarily compute and return small results (inference endpoints, code execution), storage and egress are likely negligible. For workloads that move large datasets or store significant model checkpoints, you will need to contact Modal for pricing. This is a gap in their published pricing.

Real-World Cost Examples

Base rates are meaningless without context. Here is what common workloads actually cost on Modal, including the multipliers most workloads incur.

| Workload | Resources | Utilization | Monthly Cost |
|---|---|---|---|
| Inference endpoint (bursty) | 1x A100 40GB + 4 cores | ~15% (5 hrs/day) | $340 |
| Inference endpoint (steady) | 1x H100 + 8 cores | ~70% (17 hrs/day) | $1,500 |
| Batch training (nightly) | 4x H100 | 4 hrs/day | $1,920 |
| AI agent sandboxes | 1 core, 2 GiB each | 1,000 sessions, 5 min avg | $3.20 |
| AI agent sandboxes (heavy) | 2 cores, 4 GiB each | 10,000 sessions, 10 min avg | $95 |
| CI/CD pipeline | 8 cores, 16 GiB | 2 hrs/day, non-preemptible | $45 |
| Always-on web service | 4 cores, 8 GiB, non-preemptible | 24/7 | $693 |

How Far the $30 Free Tier Goes

The $30 monthly credit on the Starter plan covers a surprising amount for development and prototyping:

| GPU | Hours per Month | Sessions (5 min each) |
|---|---|---|
| H100 | 7.6 hours | 91 sessions |
| A100 (40GB) | 14.3 hours | 171 sessions |
| A10G | 27.3 hours | 327 sessions |
| L4 | 37.5 hours | 450 sessions |
| T4 | 50.8 hours | 610 sessions |
| CPU only (1 core) | 638 hours | 7,660 sessions |

For prototyping inference endpoints or running a few hundred test sandboxes, the free tier is genuinely useful. It breaks down once you need sustained production workloads or non-preemptible execution.
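The figures above are simple division of the credit by the hourly rate, as this sketch shows (names are illustrative):

```python
def free_tier_hours(credit: float = 30.0, hourly_rate: float = 3.95) -> float:
    """How many GPU-hours the Starter credit buys at a given listed rate."""
    return credit / hourly_rate

# $30 buys ~7.6 H100 hours, or ~91 five-minute dev sessions.
hours = free_tier_hours()
print(round(hours, 1), int(hours * 60 / 5))
```

Swap in any rate from the GPU table to estimate your own free-tier runway; remember credits do not roll over month to month.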

Modal vs Alternatives

Modal competes in two markets: general GPU compute (against RunPod, Lambda Labs, AWS) and AI agent sandboxes (against E2B, Morph, Daytona). The right comparison depends on what you are building.

For GPU Compute

| | Modal | RunPod | Lambda Labs | AWS |
|---|---|---|---|---|
| H100 $/hr | $3.95 | $2.49 | $2.99 | ~$3.67 |
| A100 40GB $/hr | $2.10 | $1.19 | $1.10 | ~$3.97 |
| Billing | Per-second | Per-second | Per-hour | Per-second |
| Idle costs | None | None (serverless) | Yes (instances) | Yes (instances) |
| Autoscaling | Automatic | Automatic (serverless) | Manual | Auto (with config) |
| Preemption risk | Always | Spot only | None | Spot only |
| Cold start | 2-4s | 3-6s | N/A (always-on) | Minutes |

Modal wins on developer experience: per-second billing with automatic scaling and zero config. RunPod and Lambda Labs win on raw price. AWS wins on ecosystem breadth. Choose based on whether you value convenience or cost.

For AI Agent Sandboxes

| | Modal | E2B | Morph | Daytona |
|---|---|---|---|---|
| CPU $/core/hr | $0.142 | $0.050 | Per-session | $0.050 |
| Cold start | <1s | ~150ms | <300ms | ~90ms |
| GPU support | Yes | No | No | No |
| Max runtime | 24 hrs | 24 hrs | Per-session | Unlimited |
| Free credits | $30/mo | $100 one-time | Free tier | $200 one-time |
| Built for agents | Adapted | Purpose-built | Purpose-built | Dev environments |
| SDK | Python, JS, Go (beta) | Python, JS/TS | REST API | REST API |

For pure CPU sandbox workloads (running agent-generated code, executing tests, processing documents), E2B and Morph are cheaper and have faster cold starts. Modal's sandbox advantage is GPU access: if your agent needs to run ML inference inside the sandbox, Modal is the only option that supports it natively.

Different tools for different jobs

Modal is a general GPU compute platform that added sandboxes. Morph builds purpose-built infrastructure for coding agent workloads: sub-300ms sandbox cold starts, session-scoped persistence, and an API designed for agent orchestration rather than ML pipelines. If your primary need is safe code execution for AI agents, a purpose-built tool avoids paying the general-compute premium. If you need GPUs inside your sandboxes, Modal is the right choice.

When Modal Makes Sense

Good Fit: Bursty GPU Inference

Inference endpoints with variable traffic. Per-second billing means zero cost at idle. If your endpoint handles 100 requests/day at 2 seconds each, you pay $0.22/day on an A100 instead of $50+/day for a reserved instance.
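The GPU-time portion of that estimate can be sketched directly (names are illustrative; the article's $0.22/day figure presumably also counts the CPU and memory attached to the container, while the GPU seconds alone come to about $0.12):

```python
A100_40GB_PER_SEC = 0.000583  # listed per-second rate

def bursty_gpu_cost_per_day(requests_per_day: int,
                            seconds_per_request: float) -> float:
    """GPU-time cost only; per-second billing means idle time is free."""
    return requests_per_day * seconds_per_request * A100_40GB_PER_SEC

# 100 requests/day at 2 seconds each = 200 billed GPU-seconds.
print(round(bursty_gpu_cost_per_day(100, 2), 2))
```

The comparison point is a dedicated instance: the same A100 reserved 24/7 costs its full hourly rate whether or not a single request arrives.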

Good Fit: Batch ML Jobs

Nightly training runs, data processing pipelines, or batch inference. Spin up 50 GPUs for 20 minutes, pay $66, shut down. No cluster management.

Poor Fit: Always-On Services

A 4-core web service running 24/7 costs $693/month on Modal (non-preemptible, US). A comparable VM on Fly.io or Railway costs $50-100/month. Modal's per-second billing only saves money if you have idle time.

Poor Fit: High-Volume CPU Sandboxes

10,000 agent sandbox sessions per day at 10 minutes each: $95/month on Modal vs ~$34/month on E2B. The 3x sandbox premium makes Modal 2.8x more expensive for CPU-only sandbox workloads.

The pattern: Modal is cost-effective when utilization is below 70% and you need GPUs. It gets expensive for always-on workloads and CPU-only sandboxes where dedicated alternatives exist.

FAQ

How much does Modal cost?

Modal's Starter plan is free with $30/month in compute credits. Compute is billed per second for CPU ($0.047/hr per core), memory ($0.008/hr per GiB), and GPU ($0.59/hr for T4 up to $6.25/hr for B200). The Team plan costs $250/month with $100 in credits. Production CPU workloads with non-preemptible execution in the US pay up to 3.75x the base rate.

How much does an H100 cost on Modal?

$3.95/hr ($0.001097/sec). This is a preemptible rate. Modal may interrupt your workload at any time and restart it. Non-preemptible mode is not available for GPU functions. For comparison, Lambda Labs charges $2.99/hr for a non-preemptible H100, and RunPod charges $2.49/hr on-demand.

Does Modal have a free tier?

Yes. The Starter plan includes $30/month in compute credits with no monthly fee. This covers roughly 7.5 hours of H100 time or 50 hours of T4 time. The Starter plan is limited to 3 seats, 100 containers, and 10 concurrent GPUs. Credits do not roll over month to month.

What is Modal's non-preemptible pricing?

Non-preemptible execution applies a 3x multiplier to CPU and memory costs. Combined with the 1.25x US regional multiplier, that is 3.75x the base rate. Non-preemptible is only available for CPU functions. GPU functions cannot use non-preemptible mode.

How does Modal sandbox pricing work?

Sandboxes use a separate pricing tier: $0.00003942/core/sec for CPU ($0.142/hr) and $0.00000672/GiB/sec for memory ($0.024/hr). This is 3x the standard function rate. GPU sandboxes use standard GPU rates with the sandbox premium only on CPU and memory. A 1-core, 2 GiB sandbox costs $0.190/hr.

Is Modal cheaper than AWS for GPU workloads?

For bursty workloads with low average utilization, yes. Modal's per-second billing means you pay nothing at idle. An inference endpoint handling 100 requests/day at 2 seconds each costs pennies on Modal versus $50+/day for a reserved AWS instance. For sustained utilization above 70%, AWS reserved instances or Lambda Labs are cheaper per GPU-hour.

Does Modal charge for storage?

Modal does not publish storage pricing for Volumes (its persistent file storage). Network egress fees are also undocumented. For compute-only workloads, this is not an issue. For data-heavy workloads, contact Modal for pricing details.

Related Guides

Purpose-built agent sandboxes, not general compute

Morph sandboxes are built for coding agent workloads: sub-300ms cold starts, session-scoped persistence, and per-session pricing. No GPU premium on CPU sandbox work. No multiplier surprises.