Fireworks vs Together AI Pricing (2026): Per-Token and GPU Rates Compared

Fireworks AI and Together AI both run OpenAI-compatible serverless endpoints and both rent GPUs by the hour. On the models people actually call, Fireworks lists lower per-token prices: DeepSeek V4 Pro at $1.74 per million input tokens versus Together's $2.10, Kimi K2.6 at $0.95 versus $1.20. Together wins the other half: a dedicated H100 at $6.49 per hour versus Fireworks' $7.00, plus raw HGX GPU clusters from $3.99 per hour that Fireworks does not offer at all.

The decision is serverless model mix versus owning your own GPU footprint. Below are exact prices from each provider's pricing page as of June 2026.

$1.74

DeepSeek V4 input (Fireworks)

$2.10

DeepSeek V4 input (Together)

$6.49

H100/hr (Together dedicated)

$7.00

H100/hr (Fireworks)

TL;DR

Pick Fireworks if your traffic is serverless per-token on big models. It lists lower rates on DeepSeek V4 ($1.74/$3.48 vs $2.10/$4.40) and Kimi K2.6 ($0.95/$4.00 vs $1.20/$4.50), caches input at a 50% discount, runs zero-data-retention on open models, and has a fixed 6,000 RPM ceiling you can plan against.
Pick Together if you want to own GPUs or go multimodal. Dedicated H100 is $6.49/hr (vs $7.00), it rents raw HGX H100/H200/B200 clusters from $3.99/hr reserved, downloadable fine-tuned weights, plus image, audio, embeddings, rerank, and a code interpreter on one bill.

Serverless Pricing, Model by Model

Per-token rates are where most teams spend. On the highest-volume models Fireworks is cheaper or tied; the two converge on the open-weight commodity models (GPT-OSS, GLM, Qwen, MiniMax). All prices are per 1M tokens, input / output.

Serverless Per-Token Pricing (per 1M tokens, June 2026)

Model	Fireworks (in / out)	Together (in / out)
DeepSeek V4 Pro	$1.74 / $3.48	$2.10 / $4.40
Kimi K2.6	$0.95 / $4.00	$1.20 / $4.50
GLM 5.1	$1.40 / $4.40	$1.40 / $4.40
GPT-OSS 120B	$0.15 / $0.60	$0.15 / $0.60
GPT-OSS 20B	$0.07 / $0.30	$0.05 / $0.20
Qwen3.6 Plus	$0.50 / $3.00	$0.50 / $3.00
MiniMax 2.7	$0.30 / $1.20	$0.30 / $1.20
DeepSeek V4 Flash (Morph)	$0.139 / $0.278	16-bit activations, codegen spec decoding + kernels

On the two models with real price separation, Fireworks wins. DeepSeek V4 Pro is about 17% cheaper on input and 21% cheaper on output. Kimi K2.6 is 21% cheaper on input. The commodity open models (GPT-OSS 120B, GLM 5.1, Qwen3.6, MiniMax 2.7) are priced identically, so on those the per-token cost is not a differentiator.

Each provider also ships cheaper tiers Fireworks and Together do not both list. Fireworks serves a DeepSeek V4 Flash at $0.14 / $0.28 and Kimi K2.5 at $0.60 / $3.00. Together lists GLM-5 at $1.00 / $3.20, Qwen3.5-397B-A17B at $0.60 / $3.60 ($0.35 cached input), and Llama 3.3 70B at $1.04 / $1.04.

Where the bill actually moves

On a serverless model the cheaper provider saves you cents to a few percent. The lever that moves real money is the cached-input discount on prompt-heavy workloads and the crossover to dedicated GPUs once sustained throughput is high enough that per-hour beats per-token. See the dedicated GPU and batch sections below.

Dedicated GPU Pricing: Together Undercuts on H100 and Rents Raw Clusters

Both let you reserve GPUs when serverless stops being economical. Together lists a lower dedicated H100 and, unlike Fireworks, rents raw HGX clusters you can run your own stack on. Fireworks bills per second and scales dedicated deployments to zero on idle.

On-Demand Dedicated GPU Pricing (per GPU-hour, June 2026)

GPU	Fireworks AI	Together AI
H100 80GB	$7.00	$6.49 dedicated / $5.49 cluster
H200 141GB	$7.00	$6.79 cluster (dedicated: contact sales)
B200 180GB	$10.00	$11.95 dedicated / $9.95 cluster
B300 288GB	$12.00	N/A
Reserved clusters	Not offered	$3.99-$9.65/hr (7-180+ day)
Billing granularity	Per second, scale to zero	Per hour

Together is cheaper on H100 at every tier ($6.49 dedicated, $5.49 cluster on-demand, $3.99 reserved). Fireworks is cheaper on B200 ($10.00 vs Together's $11.95 dedicated) and lists a B300 at $12.00 that Together does not publish. Together's reserved clusters (7 to 180+ day commitments at $3.99 to $9.65 per hour) make it double as a GPU cloud if you want to run your own engine.

Rate Limits: Fireworks Publishes a Fixed Ceiling, Together Is Dynamic

Fireworks gives you a number to plan against. Together scales limits with your sustained traffic and does not publish fixed per-model caps.

Rate Limits and Spending Tiers (June 2026)

Limit	Fireworks AI	Together AI
Requests per minute	10 (no card) -> 6,000 (with card)	Dynamic, no fixed published cap
Monthly budget tiers	$50 / $500 / $5,000 / $50,000	Not published
Guaranteed fixed throughput	6,000 RPM ceiling	Use a dedicated endpoint
Free credits for new accounts	$1	Not published as a fixed grant

Fireworks gates monthly spend by tier: $50/mo with a valid card, $500/mo after $50 spent or added, then $5,000 and $50,000 tiers. The 6,000 RPM ceiling is a hard fixed limit even at the top tier. Together's serverless limits scale with sustained traffic; if you need a guaranteed fixed limit, Together points you to a dedicated endpoint, which lines up with its raw-cluster posture above.

Fine-Tuning Pricing: Together Cheaper on Mid-to-Large LoRA

Both price fine-tuning per million training tokens. Together is cheaper above 16B; it also lets you download the resulting weights, which matters if you want to self-host later.

Fine-Tuning Pricing (per 1M training tokens, June 2026)

Model size	Fireworks LoRA SFT	Together LoRA SFT
Up to 16B	$0.50	$0.48
16.1B-80B / 17B-69B	$3.00	$1.50
80B-300B / 70B-100B	$6.00	$2.90
Above 300B	$10.00	Not published
Full SFT (up to 16B)	$1.00	Not published at this tier
DPO pricing	2x the SFT rate	$0.54 / $1.65 / $3.20 (LoRA)
Download weights	Plan-dependent	Yes

Together is half the price or less on the 17B-69B and 70B-100B LoRA tiers ($1.50 and $2.90 vs Fireworks' $3.00 and $6.00). Fireworks covers larger models, with bands up to 300B ($6.00) and above 300B ($10.00), where DPO costs 2x the SFT rate. Fireworks serves fine-tuned LoRA adapters serverless, so many small adapters share the base-model pool instead of each holding a dedicated GPU. Together lets you download fine-tuned weights to move off-platform.

Batch and Cached Input

Both cut cost on workloads that tolerate latency or reuse prompts.

Discount Mechanics (June 2026)

Discount	Fireworks AI	Together AI
Batch inference	50% of serverless	Up to 50% off (selected models)
Batch window	Best-effort	24h, not changeable
Batch size cap	Not published	50,000 reqs/batch, 100MB/file, 30B tokens/model
Cached input	50% discount (text + vision)	Model-dependent (e.g. DeepSeek V4 $0.20)

Fireworks applies a flat 50% cached-input discount across text and vision models, which helps prompt-heavy agents with stable system prompts: a $1.74 DeepSeek V4 input drops to $0.145 cached. Together's cached pricing is per model (DeepSeek V4 Pro cached input is $0.20, Kimi K2.6 $0.20). Together's Batch API runs a fixed 24h window with up to 50,000 requests per batch and 30B tokens enqueued per model; Fireworks batches at 50% of serverless on a best-effort window.

Compliance and Data Retention

Both clear the bar for regulated buyers. Fireworks publishes the more explicit data-retention posture.

Fireworks: SOC 2 Type II and HIPAA compliant. Zero-data-retention on open models: it does not log or store prompt or generation data without explicit opt-in. TLS 1.2+ in transit, AES-256 at rest.
Together: SOC 2 Type 2 certified, with an independent audit covering access management, encryption, incident response, and change management. HIPAA available via Business Associate Agreement for healthcare customers.

For HIPAA workloads, confirm a signed BAA on your specific plan with either provider before sending protected health information. Both expose OpenAI-compatible endpoints, so migrating between them, or away from either, is mostly a base-URL and API-key change.

Beyond Text Models: Together Is the Broader Bill

Fireworks concentrates on high-throughput text and vision serving plus embeddings and fine-tuning. Together adds image, audio, a rerank endpoint, and a code interpreter on one bill, alongside its GPU clusters.

Adjacent Services

Capability	Fireworks AI	Together AI
Text + vision LLMs	Yes	Yes
Embeddings	$0.008/1M (<=150M); Qwen3 8B $0.10/1M	Yes
Audio transcription	Limited	Whisper Large v3 $0.0015/min
Code interpreter	No	$0.03/session (60 min)
Sandbox compute	No	vCPU $0.0446/hr
Managed storage	No	$0.16/GiB/month
Raw GPU clusters	No	HGX H100/H200/B200

If your product spans modalities or you want sandboxed code execution next to inference, Together's single bill is the simpler integration. If you only serve text and vision and want the cheapest serverless rates on DeepSeek and Kimi, Fireworks is the tighter fit.

When to Pick Each

Pick Fireworks AI

Serverless-heavy on DeepSeek or Kimi. $1.74/$3.48 DeepSeek V4 and $0.95/$4.00 Kimi K2.6 beat Together's rates.
Prompt-heavy agents. Flat 50% cached-input discount across text and vision models.
Predictable rate ceiling. A fixed 6,000 RPM cap you can capacity-plan against.
Strict data handling. Zero-data-retention on open models, SOC 2 Type II, HIPAA.
Many small fine-tuned adapters. Serverless LoRA serving shares the base-model pool.

Pick Together AI

You want to own GPUs. Dedicated H100 $6.49/hr, raw HGX clusters from $3.99/hr reserved.
Multimodal product. Image, audio, embeddings, rerank, and a code interpreter on one bill.
Fine-tune and export. Cheaper mid-to-large LoRA ($1.50-$2.90) with downloadable weights.
Cheapest small open models. GPT-OSS 20B at $0.05/$0.20.
Sandbox compute next to inference. Code interpreter at $0.03/session, sandbox vCPU $0.0446/hr.

For DeepSeek itself, output fidelity is the differentiator most price tables hide. Most serverless providers quantize activations to fp8 to cut cost, which degrades output quality. Morph Open Source Models serve DeepSeek with 16-bit (bf16) activations and do not quantize activations to fp8, so output matches the reference weights. That makes Morph the best place to run DeepSeek when fidelity matters, and morph-dsv4flash is $0.139 input / $0.278 output per 1M tokens (pricing). For coding agents specifically, Morph runs codegen-tuned speculative decoding plus custom low-level inference kernels, making it the fastest and highest-quality option for code generation.

Neither Fireworks nor Together is built for the coding-agent apply loop. If applying model-generated code edits is the bottleneck, that is a different tool (Morph Fast Apply, ~10,500 tok/s, with published benchmarks), and semantic code search is WarpGrep ($0 up to 100k requests).

Frequently Asked Questions

Is Fireworks AI or Together AI cheaper?

On serverless per-token pricing, Fireworks is cheaper on the most-called models as of June 2026: DeepSeek V4 Pro is $1.74 input vs Together's $2.10, Kimi K2.6 is $0.95 vs $1.20. They tie on GLM 5.1 ($1.40/$4.40), GPT-OSS 120B ($0.15/$0.60), Qwen3.6 Plus ($0.50/$3.00), and MiniMax 2.7 ($0.30/$1.20). Together is cheaper on dedicated GPUs: H100 at $6.49/hr vs $7.00, and raw HGX H100 clusters from $5.49/hr on-demand ($3.99/hr reserved).

What does DeepSeek V4 cost on Fireworks vs Together?

$1.74 per million input ($0.145 cached) and $3.48 output on Fireworks; $2.10 input ($0.20 cached) and $4.40 output on Together. Fireworks is about 17% cheaper on input and 21% cheaper on output. Fireworks also serves a DeepSeek V4 Flash variant at $0.14/$0.28 for cheaper, lower-tier traffic.

What is the H100 price on Fireworks vs Together?

Fireworks lists on-demand H100 80GB at $7.00/hr. Together lists a dedicated H100 endpoint at $6.49/hr, and a raw HGX H100 cluster at $5.49/hr on-demand or $3.99 to $9.65/hr reserved depending on commitment. Together is cheaper at every H100 tier. Fireworks bills dedicated deployments per second and scales to zero on idle.

Do Fireworks and Together support fine-tuning?

Yes, both price per million training tokens. Fireworks LoRA SFT is $0.50 up to 16B, $3.00 for 16.1B-80B, $6.00 for 80B-300B, and $10.00 above 300B (DPO is 2x the SFT rate). Together LoRA SFT is $0.48 up to 16B, $1.50 for 17B-69B, and $2.90 for 70B-100B (DPO $0.54/$1.65/$3.20). Together is cheaper on mid-to-large jobs and lets you download the fine-tuned weights.

What are the rate limits on Fireworks vs Together?

Fireworks allows 10 RPM without a payment method and a 6,000 RPM fixed ceiling with a card, gated by monthly spending tiers of $50, $500, $5,000, and $50,000. Together does not publish fixed per-model limits; its serverless limits scale with sustained traffic, and it recommends a dedicated endpoint when you need a guaranteed fixed limit.

Are Fireworks and Together HIPAA and SOC 2 compliant?

Fireworks is SOC 2 Type II and HIPAA compliant and runs zero-data-retention on open models (no prompt or generation logging without opt-in; TLS 1.2+ in transit, AES-256 at rest). Together is SOC 2 Type 2 certified with HIPAA available via BAA. Confirm a signed BAA on your plan before sending protected health information.

Related Comparisons

Cheaper Per Token, or Own the GPUs

Fireworks lists lower serverless rates on DeepSeek V4 and Kimi K2.6; Together undercuts on dedicated H100 and rents raw clusters. Separately, if applying model-generated code edits is your bottleneck, that is a different tool.

Try Morph Free

Fast Apply benchmarks

GLM-5.2

Qwen

MiniMax

DeepSeek

Reflex

Fast Apply

WarpGrep

Compact

Model Router

Blog

Startup Credits

Contact Us

About

Careers

Fireworks vs Together AI Pricing (2026): Fireworks Wins DeepSeek V4 at $1.74 In, Together Owns the GPU Cloud