Modal bills per second for CPU, memory, and GPU, with no idle charges. The Starter plan is free and includes $30/month in compute credits. An H100 costs $3.95/hr, an A100 40GB $2.10/hr, and a T4 $0.59/hr.
Those are the base rates. Production workloads see multipliers that can push the real cost to 3.75x the listed price. This guide breaks down every line item so you can estimate what Modal actually costs for your workload.
Platform Plans
Modal has three platform tiers. The platform fee covers workspace features, concurrency limits, and log retention. Compute is billed separately on top.
| Feature | Starter | Team | Enterprise |
|---|---|---|---|
| Monthly fee | $0 | $250 | Custom |
| Included credits | $30/mo | $100/mo | Volume discounts |
| Workspace seats | 3 | Unlimited | Unlimited |
| Max containers | 100 | 1,000 | Custom |
| GPU concurrency | 10 | 50 | Custom |
| Log retention | 1 day | 30 days | Custom |
| Deployed crons | 5 | Unlimited | Unlimited |
| Web endpoints | 8 | Unlimited | Unlimited |
| Custom domains | No | Yes | Yes |
| SSO / Okta | No | No | Yes |
The Starter plan works for prototyping and solo development. The $30 monthly credit covers real experimentation: roughly 7.6 hours of H100 time or 50 hours of T4 time. But the 3-seat limit, 1-day log retention, and 100-container cap make it impractical for production.
The Team plan at $250/month makes sense once you need more than 3 seats or deploy production services. The $100 credit offsets some of the monthly fee, but the real value is the higher concurrency limits and 30-day logs.
Startup and academic credits
Modal offers up to $25,000 in credits for startups and $10,000 for academics. These are applied on top of the monthly credit and can significantly extend the prototyping phase before you start paying compute out of pocket.
GPU Pricing
Modal lists 10 GPU types. Prices are per-second with no minimum commitment. You pay only while your function is executing, not while the container is idle or cold-starting.
| GPU | Per Second | Per Hour | VRAM |
|---|---|---|---|
| B200 | $0.001736 | $6.25 | 192 GB |
| H200 | $0.001261 | $4.54 | 141 GB |
| H100 | $0.001097 | $3.95 | 80 GB |
| RTX PRO 6000 | $0.000842 | $3.03 | 48 GB |
| A100 (80GB) | $0.000694 | $2.50 | 80 GB |
| A100 (40GB) | $0.000583 | $2.10 | 40 GB |
| L40S | $0.000542 | $1.95 | 48 GB |
| A10G | $0.000306 | $1.10 | 24 GB |
| L4 | $0.000222 | $0.80 | 24 GB |
| T4 | $0.000164 | $0.59 | 16 GB |
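Per-second billing makes the cost of a single invocation easy to reason about. A minimal sketch of the arithmetic, using the per-second rates from the table above (the rate dictionary is illustrative, not a Modal API):

```python
# Per-second GPU rates (USD) from the pricing table above.
GPU_PER_SEC = {
    "H100": 0.001097,
    "A100-40GB": 0.000583,
    "T4": 0.000164,
}

def gpu_run_cost(gpu: str, seconds: float) -> float:
    """Cost of one function execution; billed only while it runs."""
    return GPU_PER_SEC[gpu] * seconds

# A 90-second H100 inference call costs about $0.099.
print(round(gpu_run_cost("H100", 90), 3))
```

Note that idle and cold-start time do not appear anywhere in this calculation, which is the core of the serverless pitch.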
All GPU rates are preemptible
Modal does not support non-preemptible execution for GPU functions. Every GPU workload can be interrupted at any time and rescheduled. Modal will restart your function on the same input, but if your workload cannot tolerate interruptions (long training runs, stateful inference sessions), this is a meaningful risk. Design for idempotency or use checkpointing.
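Since every GPU function can be preempted and rerun on the same input, the standard defense is checkpoint-and-resume. A minimal sketch in plain Python, assuming you persist state somewhere durable (the `/vol` path is a placeholder, e.g. a Modal Volume mount):

```python
import json
import os
import tempfile

CKPT_PATH = "/vol/checkpoint.json"  # placeholder for durable storage

def save_checkpoint(state: dict, path: str = CKPT_PATH) -> None:
    """Write the checkpoint atomically so a preemption mid-write
    never leaves a corrupt file behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename on POSIX

def load_checkpoint(path: str = CKPT_PATH) -> dict:
    """Resume from the last saved step, or start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}

def train(total_steps: int = 100, ckpt_every: int = 10,
          path: str = CKPT_PATH) -> int:
    """Idempotent loop: safe to rerun after a preemption."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        # ... one training / inference step goes here ...
        if (step + 1) % ckpt_every == 0:
            save_checkpoint({"step": step + 1}, path)
    return total_steps
```

Because the loop resumes from the last saved step, a restart costs at most `ckpt_every` steps of recomputation rather than the whole run.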
How Modal Compares to Other GPU Providers
Modal's serverless GPU rates are competitive with dedicated cloud providers, but remember these are preemptible. Reserved instances on AWS or Lambda Labs are cheaper per hour for sustained utilization above 70%.
| Provider | H100 $/hr | Type | Notes |
|---|---|---|---|
| RunPod (spot) | $1.49 | Spot | Community cloud |
| Vast.ai | $1.87 | Marketplace | Variable availability |
| CoreWeave | $2.23 | Reserved | Committed use |
| RunPod (on-demand) | $2.49 | On-demand | Guaranteed |
| Lambda Labs | $2.99 | On-demand | Guaranteed, no preemption |
| AWS (p5.48xlarge) | ~$3.67 | On-demand | Per-GPU equivalent |
| Modal | $3.95 | Preemptible | Serverless, per-second |
| GCP | ~$4.50 | On-demand | Per-GPU equivalent |
Modal is $1-2/hr more expensive than RunPod or Lambda Labs for raw H100 time. The premium buys you serverless autoscaling, per-second billing, and zero idle costs. If your GPU utilization is bursty (inference endpoints with variable traffic), the per-second model can be cheaper overall because you pay nothing during idle periods. If utilization is sustained above 70%, a reserved instance elsewhere will save money.
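The bursty-vs-sustained trade-off reduces to a break-even utilization. A rough sketch using the H100 rates from the comparison table above (the doc's 70% rule of thumb corresponds to the cheaper reserved rates; against Lambda Labs on-demand the break-even lands a bit higher):

```python
def effective_hourly_cost(listed_rate: float, utilization: float) -> float:
    """Per wall-clock hour cost of a per-second-billed GPU that is
    busy only `utilization` of the time."""
    return listed_rate * utilization

def break_even_utilization(reserved_rate: float, serverless_rate: float) -> float:
    """Utilization above which the always-on instance is cheaper."""
    return reserved_rate / serverless_rate

# Modal H100 ($3.95/hr) vs Lambda Labs on-demand ($2.99/hr):
# the always-on box wins above ~76% utilization.
print(round(break_even_utilization(2.99, 3.95), 2))

# At 15% utilization, the Modal H100 effectively costs ~$0.59/hr.
print(round(effective_hourly_cost(3.95, 0.15), 2))
```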
CPU and Memory Pricing
CPU and memory are billed independently of GPUs. Every container consumes both, so GPU containers pay CPU and memory charges on top of the GPU rate.
| Resource | Per Second | Per Hour | Per Month (730 hrs) |
|---|---|---|---|
| 1 CPU core | $0.0000131 | $0.047 | $34.45 |
| 1 GiB memory | $0.00000222 | $0.008 | $5.84 |
| 4 cores + 8 GiB | $0.0000702 | $0.253 | $184.53 |
| 8 cores + 32 GiB | $0.000176 | $0.633 | $462.06 |
These are base preemptible rates. A container with 4 CPU cores and 8 GiB of memory costs $0.253/hr at base. In practice, most production workloads need non-preemptible execution and run in a specific region, which applies the multipliers covered in the next section.
Minimum allocation is 0.125 CPU cores per container. Even a minimal container costs $0.006/hr for CPU alone before memory.
Sandbox Pricing
Modal sandboxes use a separate, higher pricing tier than standard functions. Sandbox CPU and memory rates are both 3x the standard rates.
| Resource | Standard Function | Sandbox | Multiplier |
|---|---|---|---|
| CPU (per core/sec) | $0.0000131 | $0.00003942 | 3.0x |
| Memory (per GiB/sec) | $0.00000222 | $0.00000672 | 3.0x |
| CPU (per core/hr) | $0.047 | $0.142 | 3.0x |
| Memory (per GiB/hr) | $0.008 | $0.024 | 3.0x |
A sandbox with 1 CPU core and 2 GiB of memory costs $0.190/hr ($0.142 CPU + $0.048 memory). Run that continuously for a month: $138.70. A 4-core, 16 GiB sandbox runs $0.952/hr, about $22.85/day or $695/month (730 hrs).
The 3x sandbox premium covers the gVisor isolation layer and the ability to run untrusted code. For AI agent workloads where you spin up hundreds of short-lived sandboxes, the per-second billing keeps costs proportional to actual execution time. But for long-running sandbox sessions, the premium adds up.
Sandbox GPU pricing
GPU sandboxes use the same GPU rates as standard functions (no additional sandbox multiplier on the GPU portion). Only CPU and memory carry the 3x sandbox premium. An A100 40GB sandbox still costs $2.10/hr for the GPU, plus the higher sandbox CPU/memory rates.
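Putting these pieces together, a rough hourly estimator for sandboxes, using the rounded per-hour rates from the tables above:

```python
SANDBOX_CPU_PER_HR = 0.142   # $/core-hour, 3x the standard rate
SANDBOX_MEM_PER_HR = 0.024   # $/GiB-hour, 3x the standard rate

def sandbox_hourly_cost(cores: float, gib: float,
                        gpu_rate_per_hr: float = 0.0) -> float:
    """Sandbox $/hr: 3x premium on CPU and memory;
    any attached GPU is billed at the standard (un-multiplied) rate."""
    return (cores * SANDBOX_CPU_PER_HR
            + gib * SANDBOX_MEM_PER_HR
            + gpu_rate_per_hr)

print(round(sandbox_hourly_cost(1, 2), 3))        # the $0.190/hr example
print(round(sandbox_hourly_cost(1, 2, 2.10), 2))  # A100 40GB sandbox: $2.29/hr
```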
The Multiplier System
Modal's listed rates are preemptible base prices. Two multipliers can increase your actual cost significantly.
Non-Preemptible: 3x
Guarantees your function will not be interrupted. Only available for CPU functions. GPU functions cannot use non-preemptible mode. Applied by setting nonpreemptible=True in your function decorator.
Regional: 1.25x to 2.5x
US, EU, UK, and Asia-Pacific regions apply a 1.25x multiplier. Other regions go up to 2.5x. You cannot run at base rates in any named region.
| Configuration | Per Hour | Per Month (730 hrs) | Multiplier |
|---|---|---|---|
| Base (preemptible, no region) | $0.047 | $34.45 | 1x |
| US region, preemptible | $0.059 | $43.07 | 1.25x |
| Non-preemptible, no region | $0.142 | $103.34 | 3x |
| Non-preemptible, US region | $0.177 | $129.17 | 3.75x |
The bottom row is what most US-based production workloads actually pay for CPU: 3.75x the listed base rate. A function with 4 CPU cores and 8 GiB of memory that costs $0.253/hr at base ends up at $0.949/hr in production (non-preemptible, US region).
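The stacking is simple enough to script. A sketch using the rounded per-hour rates, so results differ by a few tenths of a cent from the per-second-derived figures in the text:

```python
CPU_PER_HR = 0.047   # $/core-hour, base preemptible rate
MEM_PER_HR = 0.008   # $/GiB-hour, base preemptible rate

def cpu_hourly_cost(cores: float, gib: float,
                    nonpreemptible: bool = False,
                    region_multiplier: float = 1.0) -> float:
    """Hourly CPU+memory cost with Modal's multipliers applied.

    region_multiplier: 1.0 (unpinned), 1.25 (US/EU/UK/APAC),
    up to 2.5 for other regions.
    """
    base = cores * CPU_PER_HR + gib * MEM_PER_HR
    mult = (3.0 if nonpreemptible else 1.0) * region_multiplier
    return base * mult

print(round(cpu_hourly_cost(4, 8), 3))  # base: ~$0.252/hr
print(round(cpu_hourly_cost(4, 8, nonpreemptible=True,
                            region_multiplier=1.25), 3))  # ~$0.945/hr
```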
GPU workloads cannot avoid preemption
The non-preemptible flag only works for CPU functions. All GPU workloads on Modal are preemptible by default, and there is no way to change this. If your GPU workload is interrupted, Modal restarts it on the same input. For long-running training jobs, this means mandatory checkpointing. For stateful inference, this means potential dropped requests during rescheduling.
How Multipliers Stack on Sandboxes
Sandboxes already carry a 3x base premium. Regional multipliers apply on top. A sandbox in the US region pays 3x (sandbox) times 1.25x (region) = 3.75x the standard CPU rate. One sandbox CPU core in the US costs $0.177/hr, the same as a non-preemptible standard function.
Storage and Network Costs
Modal does not publish pricing for Volumes (its distributed file storage), data egress, or network transfer. The pricing page covers compute only.
Volumes
Modal Volumes provide persistent distributed storage for models, datasets, and checkpoints. Pricing is not published. For large datasets, this is a meaningful unknown in your cost estimate.
Network Egress
Data transfer fees are not listed on Modal's pricing page. Major cloud providers typically charge $0.08-0.12/GB for egress. Modal's policy here is undocumented, which makes cost estimation harder for data-heavy workloads.
For workloads that primarily compute and return small results (inference endpoints, code execution), storage and egress are likely negligible. For workloads that move large datasets or store significant model checkpoints, you will need to contact Modal for pricing. This is a gap in their published pricing.
Real-World Cost Examples
Base rates are meaningless without context. Here is what common workloads actually cost on Modal, including the multipliers most workloads incur.
| Workload | Resources | Usage | Monthly Cost |
|---|---|---|---|
| Inference endpoint (bursty) | 1x A100 40GB + 4 cores | ~15% (5 hrs/day) | $340 |
| Inference endpoint (steady) | 1x H100 + 8 cores | ~70% (17 hrs/day) | $1,500 |
| Batch training (nightly) | 4x H100 | 4 hrs/day | $1,920 |
| AI agent sandboxes | 1 core, 2 GiB each | 1,000 sessions, 5 min avg | $3.20 |
| AI agent sandboxes (heavy) | 2 cores, 4 GiB each | 10,000 sessions, 10 min avg | $95 |
| CI/CD pipeline | 8 cores, 16 GiB | 2 hrs/day, non-preemptible | $45 |
| Always-on web service | 4 cores, 8 GiB, non-preemptible | 24/7 | $693 |
How Far the $30 Free Tier Goes
The $30 monthly credit on the Starter plan covers a surprising amount for development and prototyping:
| GPU | Hours per Month | Sessions (5 min each) |
|---|---|---|
| H100 | 7.6 hours | 91 sessions |
| A100 (40GB) | 14.3 hours | 171 sessions |
| A10G | 27.3 hours | 327 sessions |
| L4 | 37.5 hours | 450 sessions |
| T4 | 50.8 hours | 610 sessions |
| CPU only (1 core) | 638 hours | 7,660 sessions |
For prototyping inference endpoints or running a few hundred test sandboxes, the free tier is genuinely useful. It breaks down once you need sustained production workloads or non-preemptible execution.
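The hours-per-month column above is just the credit divided by the listed hourly rate, which is easy to recompute for any GPU:

```python
CREDIT = 30.0  # Starter plan monthly credit, USD

# Listed per-hour rates from the GPU pricing table.
RATES = {"H100": 3.95, "A100-40GB": 2.10, "T4": 0.59}

def credit_hours(gpu: str, credit: float = CREDIT) -> float:
    """GPU-hours the monthly credit buys at list price."""
    return credit / RATES[gpu]

print(round(credit_hours("H100"), 1))  # 7.6
print(round(credit_hours("T4"), 1))    # 50.8
```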
Modal vs Alternatives
Modal competes in two markets: general GPU compute (against RunPod, Lambda Labs, AWS) and AI agent sandboxes (against E2B, Morph, Daytona). The right comparison depends on what you are building.
For GPU Compute
| | Modal | RunPod | Lambda Labs | AWS |
|---|---|---|---|---|
| H100 $/hr | $3.95 | $2.49 | $2.99 | ~$3.67 |
| A100 40GB $/hr | $2.10 | $1.19 | $1.10 | ~$3.97 |
| Billing | Per-second | Per-second | Per-hour | Per-second |
| Idle costs | None | None (serverless) | Yes (instances) | Yes (instances) |
| Autoscaling | Automatic | Automatic (serverless) | Manual | Auto (with config) |
| Preemption risk | Always | Spot only | None | Spot only |
| Cold start | 2-4s | 3-6s | N/A (always-on) | Minutes |
Modal wins on developer experience: per-second billing with automatic scaling and zero config. RunPod and Lambda Labs win on raw price. AWS wins on ecosystem breadth. Choose based on whether you value convenience or cost.
For AI Agent Sandboxes
| | Modal | E2B | Morph | Daytona |
|---|---|---|---|---|
| CPU $/core/hr | $0.142 | $0.050 | Per-session | $0.050 |
| Cold start | <1s | ~150ms | <300ms | ~90ms |
| GPU support | Yes | No | No | No |
| Max runtime | 24 hrs | 24 hrs | Per-session | Unlimited |
| Free credits | $30/mo | $100 one-time | Free tier | $200 one-time |
| Built for agents | Adapted | Purpose-built | Purpose-built | Dev environments |
| SDK | Python, JS, Go (beta) | Python, JS/TS | REST API | REST API |
For pure CPU sandbox workloads (running agent-generated code, executing tests, processing documents), E2B and Morph are cheaper and have faster cold starts. Modal's sandbox advantage is GPU access: if your agent needs to run ML inference inside the sandbox, Modal is the only option that supports it natively.
Different tools for different jobs
Modal is a general GPU compute platform that added sandboxes. Morph builds purpose-built infrastructure for coding agent workloads: sub-300ms sandbox cold starts, session-scoped persistence, and an API designed for agent orchestration rather than ML pipelines. If your primary need is safe code execution for AI agents, a purpose-built tool avoids paying the general-compute premium. If you need GPUs inside your sandboxes, Modal is the right choice.
When Modal Makes Sense
Good Fit: Bursty GPU Inference
Inference endpoints with variable traffic. Per-second billing means zero cost at idle. If your endpoint handles 100 requests/day at 2 seconds each, you pay $0.22/day on an A100 instead of $50+/day for a reserved instance.
Good Fit: Batch ML Jobs
Nightly training runs, data processing pipelines, or batch inference. Spin up 50 GPUs for 20 minutes, pay $66, shut down. No cluster management.
Poor Fit: Always-On Services
A 4-core web service running 24/7 costs $693/month on Modal (non-preemptible, US). A comparable VM on Fly.io or Railway costs $50-100/month. Modal's per-second billing only saves money if you have idle time.
Poor Fit: High-Volume CPU Sandboxes
10,000 agent sandbox sessions per day at 10 minutes each: $95/month on Modal vs ~$34/month on E2B. The 3x sandbox premium makes Modal 2.8x more expensive for CPU-only sandbox workloads.
The pattern: Modal is cost-effective when utilization is below 70% and you need GPUs. It gets expensive for always-on workloads and CPU-only sandboxes where dedicated alternatives exist.
FAQ
How much does Modal cost?
Modal's Starter plan is free with $30/month in compute credits. Compute is billed per second for CPU ($0.047/hr per core), memory ($0.008/hr per GiB), and GPU ($0.59/hr for T4 up to $6.25/hr for B200). The Team plan costs $250/month with $100 in credits. Production CPU workloads with non-preemptible execution in the US pay up to 3.75x the base rate.
How much does an H100 cost on Modal?
$3.95/hr ($0.001097/sec). This is a preemptible rate. Modal may interrupt your workload at any time and restart it. Non-preemptible mode is not available for GPU functions. For comparison, Lambda Labs charges $2.99/hr for a non-preemptible H100, and RunPod charges $2.49/hr on-demand.
Does Modal have a free tier?
Yes. The Starter plan includes $30/month in compute credits with no monthly fee. This covers roughly 7.6 hours of H100 time or 50 hours of T4 time. The Starter plan is limited to 3 seats, 100 containers, and 10 concurrent GPUs. Credits do not roll over month to month.
What is Modal's non-preemptible pricing?
Non-preemptible execution applies a 3x multiplier to CPU and memory costs. Combined with the 1.25x US regional multiplier, that is 3.75x the base rate. Non-preemptible is only available for CPU functions. GPU functions cannot use non-preemptible mode.
How does Modal sandbox pricing work?
Sandboxes use a separate pricing tier: $0.00003942/core/sec for CPU ($0.142/hr) and $0.00000672/GiB/sec for memory ($0.024/hr). This is 3x the standard function rate. GPU sandboxes use standard GPU rates with the sandbox premium only on CPU and memory. A 1-core, 2 GiB sandbox costs $0.190/hr.
Is Modal cheaper than AWS for GPU workloads?
For bursty workloads with low average utilization, yes. Modal's per-second billing means you pay nothing at idle. An inference endpoint handling 100 requests/day at 2 seconds each costs pennies on Modal versus $50+/day for a reserved AWS instance. For sustained utilization above 70%, AWS reserved instances or Lambda Labs are cheaper per GPU-hour.
Does Modal charge for storage?
Modal does not publish storage pricing for Volumes (its persistent file storage). Network egress fees are also undocumented. For compute-only workloads, this is not an issue. For data-heavy workloads, contact Modal for pricing details.
Related Guides
Purpose-built agent sandboxes, not general compute
Morph sandboxes are built for coding agent workloads: sub-300ms cold starts, session-scoped persistence, and per-session pricing. No GPU premium on CPU sandbox work. No multiplier surprises.