Modal Pricing Breakdown: Per-Second GPU Billing and What It Actually Costs at Scale

Modal charges per second for CPU, memory, and GPU. The Starter plan includes $30/mo in free credits. H100s run $3.95/hr, A100s (40GB) $2.10/hr, T4s $0.59/hr. But production workloads hit stacked multipliers that can push real costs to 3.75x the listed price. Full pricing breakdown with real cost examples.

April 5, 2026 · 1 min read

Modal bills per second for CPU, memory, and GPU. No idle charges. The Starter plan is free and includes $30/month in compute credits. An H100 costs $3.95/hr, an A100 40GB costs $2.10/hr, a T4 costs $0.59/hr.

Those are the base rates. Production workloads see multipliers that can push the real cost to 3.75x the listed price. This guide breaks down every line item so you can estimate what Modal actually costs for your workload.

Platform Plans

Modal has three platform tiers. The platform fee covers workspace features, concurrency limits, and log retention. Compute is billed separately on top.

- Starter: $0/mo, plus $30/mo in credits
- Team: $250/mo, plus $100/mo in credits
- Enterprise: custom pricing
| Feature | Starter | Team | Enterprise |
|---|---|---|---|
| Monthly fee | $0 | $250 | Custom |
| Included credits | $30/mo | $100/mo | Volume discounts |
| Workspace seats | 3 | Unlimited | Unlimited |
| Max containers | 100 | 1,000 | Custom |
| GPU concurrency | 10 | 50 | Custom |
| Log retention | 1 day | 30 days | Custom |
| Deployed crons | 5 | Unlimited | Unlimited |
| Web endpoints | 8 | Unlimited | Unlimited |
| Custom domains | No | Yes | Yes |
| SSO / Okta | No | No | Yes |

The Starter plan works for prototyping and solo development. The $30 monthly credit covers real experimentation: roughly 7.5 hours of H100 time or 50 hours of T4 time. But the 3-seat limit, 1-day log retention, and 100-container cap make it impractical for production.

The Team plan at $250/month makes sense once you need more than 3 seats or deploy production services. The $100 credit offsets some of the monthly fee, but the real value is the higher concurrency limits and 30-day logs.

Startup and academic credits

Modal offers up to $25,000 in credits for startups and $10,000 for academics. These are applied on top of the monthly credit and can significantly extend the prototyping phase before you start paying compute out of pocket.

GPU Pricing

Modal lists 10 GPU types. Prices are per-second with no minimum commitment. You pay only while your function is executing, not while the container is idle or cold-starting.

| GPU | Per Second | Per Hour | VRAM |
|---|---|---|---|
| B200 | $0.001736 | $6.25 | 192 GB |
| H200 | $0.001261 | $4.54 | 141 GB |
| H100 | $0.001097 | $3.95 | 80 GB |
| RTX PRO 6000 | $0.000842 | $3.03 | 48 GB |
| A100 (80GB) | $0.000694 | $2.50 | 80 GB |
| A100 (40GB) | $0.000583 | $2.10 | 40 GB |
| L40S | $0.000542 | $1.95 | 48 GB |
| A10G | $0.000306 | $1.10 | 24 GB |
| L4 | $0.000222 | $0.80 | 24 GB |
| T4 | $0.000164 | $0.59 | 16 GB |
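To make the per-second math concrete, here is a small sketch using the hourly rates from the table above (the dictionary and function names are illustrative, not part of Modal's SDK):

```python
# Hourly rates from Modal's published GPU table, USD.
HOURLY_RATES = {
    "H100": 3.95,
    "A100-40GB": 2.10,
    "T4": 0.59,
}

def gpu_cost(gpu: str, seconds: float) -> float:
    """Cost of running a GPU for `seconds`, billed per second with no minimum."""
    return HOURLY_RATES[gpu] / 3600 * seconds

# A 5-minute H100 burst costs about $0.33 -- you pay nothing before or after.
print(round(gpu_cost("H100", 300), 2))
```

Because billing stops the instant your function returns, short bursts stay cheap regardless of how expensive the GPU's hourly rate looks.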

All GPU rates are preemptible

Modal does not support non-preemptible execution for GPU functions. Every GPU workload can be interrupted at any time and rescheduled. Modal will restart your function on the same input, but if your workload cannot tolerate interruptions (long training runs, stateful inference sessions), this is a meaningful risk. Design for idempotency or use checkpointing.
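Since any GPU function can be preempted and rerun on the same input, work should be resumable. A minimal, framework-agnostic checkpointing sketch (the checkpoint file and step granularity are illustrative; on Modal you would persist the checkpoint to a Volume so it survives container rescheduling):

```python
import json
import os

def run_resumable(items, checkpoint_path="ckpt.json"):
    """Process items idempotently; after a preemption-and-restart, resume
    from the last saved index instead of redoing finished work."""
    start, results = 0, []
    if os.path.exists(checkpoint_path):
        state = json.load(open(checkpoint_path))
        start, results = state["next"], state["results"]
    for i in range(start, len(items)):
        results.append(items[i] * 2)           # stand-in for real GPU work
        json.dump({"next": i + 1, "results": results},
                  open(checkpoint_path, "w"))  # persist progress each step
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)             # done: clear the checkpoint
    return results
```

Checkpointing after every step is the extreme case; in practice you tune the interval so checkpoint writes stay small relative to the compute between them.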

How Modal Compares to Other GPU Providers

Modal's serverless GPU rates are competitive with dedicated cloud providers, but remember these are preemptible. Reserved instances on AWS or Lambda Labs are cheaper per hour for sustained utilization above 70%.

| Provider | H100 $/hr | Type | Notes |
|---|---|---|---|
| RunPod (spot) | $1.49 | Spot | Community cloud |
| Vast.ai | $1.87 | Marketplace | Variable availability |
| CoreWeave | $2.23 | Reserved | Committed use |
| RunPod (on-demand) | $2.49 | On-demand | Guaranteed |
| Lambda Labs | $2.99 | On-demand | Guaranteed, no preemption |
| AWS (p5.48xlarge) | ~$3.67 | On-demand | Per-GPU equivalent |
| Modal | $3.95 | Preemptible | Serverless, per-second |
| GCP | ~$4.50 | On-demand | Per-GPU equivalent |

Modal is $1-2/hr more expensive than RunPod or Lambda Labs for raw H100 time. The premium buys you serverless autoscaling, per-second billing, and zero idle costs. If your GPU utilization is bursty (inference endpoints with variable traffic), the per-second model can be cheaper overall because you pay nothing during idle periods. If utilization is sustained above 70%, a reserved instance elsewhere will save money.
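The bursty-vs-sustained tradeoff reduces to a break-even utilization: the fraction of the month above which renting a dedicated GPU full-time beats paying per second. A sketch (function name is illustrative):

```python
def break_even_utilization(serverless_hr: float, dedicated_hr: float) -> float:
    """Utilization above which a dedicated instance beats per-second billing.
    Serverless cost scales with utilization; a dedicated box costs the same 24/7."""
    return dedicated_hr / serverless_hr

# Modal H100 ($3.95/hr, pay only while running) vs Lambda Labs ($2.99/hr, always on):
# above ~76% utilization the dedicated H100 is cheaper.
print(round(break_even_utilization(3.95, 2.99), 2))
```

That ~76% figure is where the rough "sustained above 70%" rule of thumb comes from; against cheaper reserved rates the break-even point drops further.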

CPU and Memory Pricing

CPU and memory are billed independently of GPUs. Every container uses CPU and memory. GPU containers also consume CPU and memory on top of the GPU charge.

- $0.047/hr per CPU core (base rate)
- $0.008/hr per GiB of memory (base rate)

| Resource | Per Second | Per Hour | Per Month (730 hrs) |
|---|---|---|---|
| 1 CPU core | $0.0000131 | $0.047 | $34.45 |
| 1 GiB memory | $0.00000222 | $0.008 | $5.84 |
| 4 cores + 8 GiB | $0.0000702 | $0.253 | $184.53 |
| 8 cores + 32 GiB | $0.000176 | $0.633 | $462.06 |

These are base preemptible rates. A container with 4 CPU cores and 8 GiB of memory costs $0.253/hr at base. In practice, most production workloads need non-preemptible execution and run in a specific region, which applies the multipliers covered in the next section.

Minimum allocation is 0.125 CPU cores per container. Even a minimal container costs $0.006/hr for CPU alone before memory.
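The base container math, including the 0.125-core floor, can be sketched from the per-second rates above (names are illustrative):

```python
CPU_PER_SEC = 0.0000131    # base rate per core per second
MEM_PER_SEC = 0.00000222   # base rate per GiB per second

def container_cost_per_hr(cores: float, mem_gib: float) -> float:
    """Base (preemptible, no region) hourly cost for a container.
    CPU is floored at Modal's 0.125-core minimum allocation."""
    return (max(cores, 0.125) * CPU_PER_SEC + mem_gib * MEM_PER_SEC) * 3600

# 4 cores + 8 GiB at base rates, and the smallest possible container:
print(round(container_cost_per_hr(4, 8), 3))    # matches the $0.253/hr above
print(round(container_cost_per_hr(0, 0), 3))    # the $0.006/hr minimum
```

Multiply the hourly figure by 730 for a monthly estimate, then apply any multipliers from the next section.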

Sandbox Pricing

Modal sandboxes use a separate, higher pricing tier than standard functions. The sandbox CPU and memory rates are both 3x the standard rates.

| Resource | Standard Function | Sandbox | Multiplier |
|---|---|---|---|
| CPU (per core/sec) | $0.0000131 | $0.00003942 | 3.0x |
| Memory (per GiB/sec) | $0.00000222 | $0.00000672 | 3.0x |
| CPU (per core/hr) | $0.047 | $0.142 | 3.0x |
| Memory (per GiB/hr) | $0.008 | $0.024 | 3.0x |

A sandbox with 1 CPU core and 2 GiB of memory costs $0.190/hr ($0.142 CPU + $0.048 memory). Run that continuously for a month: $138.70. A 4-core, 16 GiB sandbox runs $22.83/day or $685/month.

The 3x sandbox premium covers the gVisor isolation layer and the ability to run untrusted code. For AI agent workloads where you spin up hundreds of short-lived sandboxes, the per-second billing keeps costs proportional to actual execution time. But for long-running sandbox sessions, the premium adds up.

Sandbox GPU pricing

GPU sandboxes use the same GPU rates as standard functions (no additional sandbox multiplier on the GPU portion). Only CPU and memory carry the 3x sandbox premium. An A100 40GB sandbox still costs $2.10/hr for the GPU, plus the higher sandbox CPU/memory rates.
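The sandbox math, with the GPU passed through at its standard rate, can be sketched as follows (names and the GPU dictionary are illustrative):

```python
SANDBOX_CPU_HR = 0.142                       # 3x the standard CPU rate
SANDBOX_MEM_HR = 0.024                       # 3x the standard memory rate
GPU_HR = {"A100-40GB": 2.10, "T4": 0.59}     # GPUs keep their standard rates

def sandbox_cost_per_hr(cores: float, mem_gib: float, gpu: str = None) -> float:
    """Sandbox hourly cost: CPU/memory carry the 3x premium, the GPU does not."""
    total = cores * SANDBOX_CPU_HR + mem_gib * SANDBOX_MEM_HR
    if gpu:
        total += GPU_HR[gpu]
    return total

print(round(sandbox_cost_per_hr(1, 2), 2))               # the $0.19/hr example
print(round(sandbox_cost_per_hr(1, 2, "A100-40GB"), 2))  # same sandbox + A100
```

Note that for GPU sandboxes the GPU dominates the bill, so the 3x CPU/memory premium matters far less than it does for CPU-only sandbox fleets.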

The Multiplier System

Modal's listed rates are preemptible base prices. Two multipliers can increase your actual cost significantly.

Non-Preemptible: 3x

Guarantees your function will not be interrupted. Only available for CPU functions. GPU functions cannot use non-preemptible mode. Applied by setting nonpreemptible=True in your function decorator.

Regional: 1.25x to 2.5x

US, EU, UK, and Asia-Pacific regions apply a 1.25x multiplier. Other regions go up to 2.5x. You cannot run at base rates in any named region.

| Configuration | Per Hour | Per Month (730 hrs) | Multiplier |
|---|---|---|---|
| Base (preemptible, no region) | $0.047 | $34.45 | 1x |
| US region, preemptible | $0.059 | $43.07 | 1.25x |
| Non-preemptible, no region | $0.142 | $103.34 | 3x |
| Non-preemptible, US region | $0.177 | $129.17 | 3.75x |

The bottom row is what most US-based production workloads actually pay for CPU: 3.75x the listed base rate. A function with 4 CPU cores and 8 GiB memory that costs $0.253/hr at base ends up at $0.949/hr in production (non-preemptible, US region).

GPU workloads cannot avoid preemption

The non-preemptible flag only works for CPU functions. All GPU workloads on Modal are preemptible by default, and there is no way to change this. If your GPU workload is interrupted, Modal restarts it on the same input. For long-running training jobs, this means mandatory checkpointing. For stateful inference, this means potential dropped requests during rescheduling.

How Multipliers Stack on Sandboxes

Sandboxes already carry a 3x base premium. Regional multipliers apply on top. A sandbox in the US region pays 3x (sandbox) times 1.25x (region) = 3.75x the standard CPU rate. One sandbox CPU core in the US costs $0.177/hr, the same as a non-preemptible standard function.

Storage and Network Costs

Modal does not publish pricing for Volumes (its distributed file storage), data egress, or network transfer. The pricing page covers compute only.

Volumes

Modal Volumes provide persistent distributed storage for models, datasets, and checkpoints. Pricing is not published. For large datasets, this is a meaningful unknown in your cost estimate.

Network Egress

Data transfer fees are not listed on Modal's pricing page. Major cloud providers typically charge $0.08-0.12/GB for egress. Modal's policy here is undocumented, which makes cost estimation harder for data-heavy workloads.

For workloads that primarily compute and return small results (inference endpoints, code execution), storage and egress are likely negligible. For workloads that move large datasets or store significant model checkpoints, you will need to contact Modal for pricing. This is a gap in their published pricing.

Real-World Cost Examples

Base rates are meaningless without context. Here is what common workloads actually cost on Modal, including the multipliers most workloads incur.

| Workload | Resources | Utilization | Monthly Cost |
|---|---|---|---|
| Inference endpoint (bursty) | 1x A100 40GB + 4 cores | ~15% (5 hrs/day) | $340 |
| Inference endpoint (steady) | 1x H100 + 8 cores | ~70% (17 hrs/day) | $1,500 |
| Batch training (nightly) | 4x H100 | 4 hrs/day | $1,920 |
| AI agent sandboxes | 1 core, 2 GiB each | 1,000 sessions, 5 min avg | $3.20 |
| AI agent sandboxes (heavy) | 2 cores, 4 GiB each | 10,000 sessions, 10 min avg | $95 |
| CI/CD pipeline | 8 cores, 16 GiB | 2 hrs/day, non-preemptible | $45 |
| Always-on web service | 4 cores, 8 GiB, non-preemptible | 24/7 | $693 |

How Far the $30 Free Tier Goes

The $30 monthly credit on the Starter plan covers a surprising amount for development and prototyping:

| GPU | Hours per Month | Sessions (5 min each) |
|---|---|---|
| H100 | 7.6 hours | 91 sessions |
| A100 (40GB) | 14.3 hours | 171 sessions |
| A10G | 27.3 hours | 327 sessions |
| L4 | 37.5 hours | 450 sessions |
| T4 | 50.8 hours | 610 sessions |
| CPU only (1 core) | 638 hours | 7,660 sessions |

For prototyping inference endpoints or running a few hundred test sandboxes, the free tier is genuinely useful. It breaks down once you need sustained production workloads or non-preemptible execution.
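The figures above are simple division of the credit by the hourly rate, as this sketch shows (names are illustrative):

```python
def free_tier_hours(credit: float = 30.0, hourly_rate: float = 3.95) -> float:
    """How many GPU-hours the Starter credit buys at a given listed rate."""
    return credit / hourly_rate

# $30 buys ~7.6 H100 hours, or ~91 five-minute dev sessions.
hours = free_tier_hours()
print(round(hours, 1), int(hours * 60 / 5))
```

Swap in any rate from the GPU table to estimate your own free-tier runway; remember credits do not roll over month to month.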

Modal vs Alternatives

Modal competes in two markets: general GPU compute (against RunPod, Lambda Labs, AWS) and AI agent sandboxes (against E2B, Morph, Daytona). The right comparison depends on what you are building.

For GPU Compute

| | Modal | RunPod | Lambda Labs | AWS |
|---|---|---|---|---|
| H100 $/hr | $3.95 | $2.49 | $2.99 | ~$3.67 |
| A100 40GB $/hr | $2.10 | $1.19 | $1.10 | ~$3.97 |
| Billing | Per-second | Per-second | Per-hour | Per-second |
| Idle costs | None | None (serverless) | Yes (instances) | Yes (instances) |
| Autoscaling | Automatic | Automatic (serverless) | Manual | Auto (with config) |
| Preemption risk | Always | Spot only | None | Spot only |
| Cold start | 2-4s | 3-6s | N/A (always-on) | Minutes |

Modal wins on developer experience: per-second billing with automatic scaling and zero config. RunPod and Lambda Labs win on raw price. AWS wins on ecosystem breadth. Choose based on whether you value convenience or cost.

For AI Agent Sandboxes

| | Modal | E2B | Morph | Daytona |
|---|---|---|---|---|
| CPU $/core/hr | $0.142 | $0.050 | Per-session | $0.050 |
| Cold start | <1s | ~150ms | <300ms | ~90ms |
| GPU support | Yes | No | No | No |
| Max runtime | 24 hrs | 24 hrs | Per-session | Unlimited |
| Free credits | $30/mo | $100 one-time | Free tier | $200 one-time |
| Built for agents | Adapted | Purpose-built | Purpose-built | Dev environments |
| SDK | Python, JS, Go (beta) | Python, JS/TS | REST API | REST API |

For pure CPU sandbox workloads (running agent-generated code, executing tests, processing documents), E2B and Morph are cheaper and have faster cold starts. Modal's sandbox advantage is GPU access: if your agent needs to run ML inference inside the sandbox, Modal is the only option that supports it natively.

Different tools for different jobs

Modal is a general GPU compute platform that added sandboxes. Morph builds purpose-built infrastructure for coding agent workloads: sub-300ms sandbox cold starts, session-scoped persistence, and an API designed for agent orchestration rather than ML pipelines. If your primary need is safe code execution for AI agents, a purpose-built tool avoids paying the general-compute premium. If you need GPUs inside your sandboxes, Modal is the right choice.

When Modal Makes Sense

Good Fit: Bursty GPU Inference

Inference endpoints with variable traffic. Per-second billing means zero cost at idle. If your endpoint handles 100 requests/day at 2 seconds each, you pay $0.22/day on an A100 instead of $50+/day for a reserved instance.
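The GPU-time portion of that estimate can be sketched directly (names are illustrative; the article's $0.22/day figure presumably also counts the CPU and memory attached to the container, while the GPU seconds alone come to about $0.12):

```python
A100_40GB_PER_SEC = 0.000583  # listed per-second rate

def bursty_gpu_cost_per_day(requests_per_day: int,
                            seconds_per_request: float) -> float:
    """GPU-time cost only; per-second billing means idle time is free."""
    return requests_per_day * seconds_per_request * A100_40GB_PER_SEC

# 100 requests/day at 2 seconds each = 200 billed GPU-seconds.
print(round(bursty_gpu_cost_per_day(100, 2), 2))
```

The comparison point is a dedicated instance: the same A100 reserved 24/7 costs its full hourly rate whether or not a single request arrives.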

Good Fit: Batch ML Jobs

Nightly training runs, data processing pipelines, or batch inference. Spin up 50 GPUs for 20 minutes, pay $66, shut down. No cluster management.

Poor Fit: Always-On Services

A 4-core web service running 24/7 costs $693/month on Modal (non-preemptible, US). A comparable VM on Fly.io or Railway costs $50-100/month. Modal's per-second billing only saves money if you have idle time.

Poor Fit: High-Volume CPU Sandboxes

10,000 agent sandbox sessions per day at 10 minutes each: $95/month on Modal vs ~$34/month on E2B. The 3x sandbox premium makes Modal 2.8x more expensive for CPU-only sandbox workloads.

The pattern: Modal is cost-effective when utilization is below 70% and you need GPUs. It gets expensive for always-on workloads and CPU-only sandboxes where dedicated alternatives exist.

FAQ

How much does Modal cost?

Modal's Starter plan is free with $30/month in compute credits. Compute is billed per second for CPU ($0.047/hr per core), memory ($0.008/hr per GiB), and GPU ($0.59/hr for T4 up to $6.25/hr for B200). The Team plan costs $250/month with $100 in credits. Production CPU workloads with non-preemptible execution in the US pay up to 3.75x the base rate.

How much does an H100 cost on Modal?

$3.95/hr ($0.001097/sec). This is a preemptible rate. Modal may interrupt your workload at any time and restart it. Non-preemptible mode is not available for GPU functions. For comparison, Lambda Labs charges $2.99/hr for a non-preemptible H100, and RunPod charges $2.49/hr on-demand.

Does Modal have a free tier?

Yes. The Starter plan includes $30/month in compute credits with no monthly fee. This covers roughly 7.5 hours of H100 time or 50 hours of T4 time. The Starter plan is limited to 3 seats, 100 containers, and 10 concurrent GPUs. Credits do not roll over month to month.

What is Modal's non-preemptible pricing?

Non-preemptible execution applies a 3x multiplier to CPU and memory costs. Combined with the 1.25x US regional multiplier, that is 3.75x the base rate. Non-preemptible is only available for CPU functions. GPU functions cannot use non-preemptible mode.

How does Modal sandbox pricing work?

Sandboxes use a separate pricing tier: $0.00003942/core/sec for CPU ($0.142/hr) and $0.00000672/GiB/sec for memory ($0.024/hr). This is 3x the standard function rate. GPU sandboxes use standard GPU rates with the sandbox premium only on CPU and memory. A 1-core, 2 GiB sandbox costs $0.190/hr.

Is Modal cheaper than AWS for GPU workloads?

For bursty workloads with low average utilization, yes. Modal's per-second billing means you pay nothing at idle. An inference endpoint handling 100 requests/day at 2 seconds each costs pennies on Modal versus $50+/day for a reserved AWS instance. For sustained utilization above 70%, AWS reserved instances or Lambda Labs are cheaper per GPU-hour.

Does Modal charge for storage?

Modal does not publish storage pricing for Volumes (its persistent file storage). Network egress fees are also undocumented. For compute-only workloads, this is not an issue. For data-heavy workloads, contact Modal for pricing details.

Related Guides

Purpose-built agent sandboxes, not general compute

Morph sandboxes are built for coding agent workloads: sub-300ms cold starts, session-scoped persistence, and per-session pricing. No GPU premium on CPU sandbox work. No multiplier surprises.