The most searched head-to-head in LLM observability, decided on the numbers that move a purchase: every free tier limit, the cost at 1M events a month, the real self-host footprint of each, and the failures neither tool catches. Pricing verified against each vendor's published page as of June 2026.
TL;DR
Pick Langfuse for open source, free self-hosting, unlimited seats on a $29/mo plan, and far lower cost past 100k events a month. Pick LangSmith if you are committed to LangChain or LangGraph and want first-party tracing, Prompt Hub, and annotation queues with zero assembly, and your volume stays modest. The cost curve favors Langfuse the moment you scale; the integration curve favors LangSmith if you never leave its framework.
Quick Comparison
| Dimension | Langfuse | LangSmith |
|---|---|---|
| License | MIT core (ee/ under enterprise license) | Closed source |
| Free tier | 50k units/mo, 2 users, 30-day access | 5k base traces/mo, 1 seat |
| First paid tier | Core $29/mo, 100k units, unlimited users | Plus $39/seat/mo, 10k base traces |
| Overage | $8 per 100k units (down to $6 at 50M+) | $2.50 per 1k base traces (14-day), $5 per 1k extended (400-day) |
| Self-hosting | Free: docker compose, Helm, or Terraform | Enterprise plan only, custom pricing |
| Trace store | ClickHouse (+ Postgres, Redis, S3) | Managed (closed) |
| Ecosystem fit | Framework-agnostic | First-party LangChain / LangGraph |
| Retention | 30d Hobby, 90d Core, 3yr Pro ($199/mo) | 14d base, 400d extended |
Pricing and the Cost at Volume
Both meter differently, which is where the gap opens. Langfuse bills units (any ingested event: a trace, an observation, or a score). LangSmith bills base traces. The quantities are not 1:1, so the honest comparison fixes a monthly event count and runs each pricing model against it.
At 1M events per month: Langfuse Core is $29 plus nine increments of $8 per 100k = $101/mo, unlimited users. LangSmith Plus is $39 for the seat plus 990k traces of overage at $2.50 per 1k = $2,514/mo, for one seat, at 14-day retention. The caveat that keeps this honest: a Langfuse unit is one ingested event, so a single agent trace with multiple observations and scores burns multiple units. The gap survives that adjustment: at three units per trace (3M units, $261/mo) Langfuse is still close to ten times cheaper, and at five units per trace (5M units, $421/mo) it is about six times cheaper.
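The arithmetic above can be sketched as a small calculator. It uses only the prices quoted in this article; the function names are ours, and two billing details are assumptions rather than vendor-confirmed rules: that Langfuse overage is charged per started 100k block at a flat $8 (ignoring the discount at 50M+), and that LangSmith Plus includes a flat 10k base traces regardless of seat count.

```python
import math

# Prices as quoted in this article (June 2026). Treat them as inputs to re-check.
LANGFUSE_CORE_BASE = 29.0          # $/month, unlimited users
LANGFUSE_INCLUDED_UNITS = 100_000
LANGFUSE_PER_100K = 8.0            # assumption: flat rate, volume discount ignored

LANGSMITH_SEAT = 39.0              # $/seat/month
LANGSMITH_INCLUDED_TRACES = 10_000  # assumption: not multiplied by seats
LANGSMITH_PER_1K = 2.50            # base traces, 14-day retention


def langfuse_core_cost(units: int) -> float:
    """Monthly Langfuse Core cost for a number of ingested units."""
    overage = max(0, units - LANGFUSE_INCLUDED_UNITS)
    increments = math.ceil(overage / 100_000)  # assumption: per started block
    return LANGFUSE_CORE_BASE + increments * LANGFUSE_PER_100K


def langsmith_plus_cost(traces: int, seats: int = 1) -> float:
    """Monthly LangSmith Plus cost for a number of base traces."""
    overage = max(0, traces - LANGSMITH_INCLUDED_TRACES)
    return seats * LANGSMITH_SEAT + overage / 1_000 * LANGSMITH_PER_1K


if __name__ == "__main__":
    traces = 1_000_000
    print(f"LangSmith Plus, 1 seat: ${langsmith_plus_cost(traces):,.0f}")
    # One trace can burn several Langfuse units, so vary that ratio.
    for units_per_trace in (1, 3, 5):
        cost = langfuse_core_cost(traces * units_per_trace)
        print(f"Langfuse Core at {units_per_trace} unit(s)/trace: ${cost:,.0f}")
```

Swapping in your own monthly volume and a measured units-per-trace ratio is the fastest way to see where your workload lands on each curve.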
The Reddit Consensus
The cost cliff is the most-cited reason teams move. In an r/LangChain thread that started the moment LangSmith stopped being free, the top reply was simply "We self-host Langfuse and are pretty happy so far," with another commenter calling free self-hosting "a key requirement." The direct Langfuse-vs-LangSmith thread lands the same way: "opensource and built using otel so less risk of vendor lock in with langfuse."
Self-Hosting Footprint
Free self-hosting is Langfuse's structural advantage, but it is not weightless. The v3 architecture splits transactional data (Postgres), analytics (ClickHouse), queues and cache (Redis/Valkey), and event payloads (S3-compatible storage) into separate services, run as web and worker containers. That separation is why it scales, and also why it is heavier than a single-binary tool. If you cannot run ClickHouse, that one constraint rules out the self-host path.
| | Langfuse | LangSmith |
|---|---|---|
| Stack | Web + worker, Postgres, ClickHouse, Redis/Valkey, S3 | Runs in your VPC (managed image) |
| Deploy | docker compose, Helm, or AWS/Azure/GCP Terraform | Enterprise plan only |
| License cost | $0 (MIT core) | Custom annual pricing |
Worth knowing: ClickHouse acquired Langfuse in January 2026, so the analytics engine under Langfuse and the company shipping it are now one and the same.
Lock-In and OpenTelemetry
The lock-in question is really an instrumentation question: if you rip the tool out in a year, do you re-instrument your codebase? Langfuse ingests OpenTelemetry spans, so a team can instrument once with OTel and keep Langfuse as a swappable backend. LangSmith is closed and first-party; its deepest value (Prompt Hub, native LangGraph tracing) is also what binds you to the LangChain stack. Neither is wrong, but they pull in opposite directions: Langfuse toward portability, LangSmith toward integration.
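To make the "swappable backend" point concrete, here is a minimal sketch of what a swap can reduce to when the code is instrumented with OpenTelemetry: a change of exporter configuration, not a change of call sites. `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` are the standard OTLP exporter environment variables; the endpoint URLs and credentials below are hypothetical placeholders, and percent-encoding the header values is our choice for safety, so check both against your backend's and SDK's documentation.

```python
import base64
from urllib.parse import quote


def basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def otlp_env(endpoint: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the standard OTLP exporter env vars for one backend."""
    # Header values are percent-encoded so spaces and '=' survive parsing.
    encoded = ",".join(f"{k}={quote(v, safe='')}" for k, v in headers.items())
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": encoded,
    }


# Hypothetical backends: the URLs and keys are placeholders, not real endpoints.
BACKENDS = {
    "backend_a": otlp_env(
        "https://otel.backend-a.example",
        {"Authorization": basic_auth("public-key", "secret-key")},
    ),
    "backend_b": otlp_env(
        "https://otel.backend-b.example",
        {"x-api-key": "another-secret"},
    ),
}

if __name__ == "__main__":
    # Switching backends means exporting a different pair of variables.
    for name, env in BACKENDS.items():
        print(name, env["OTEL_EXPORTER_OTLP_ENDPOINT"])
```

The instrumented application code never appears in this sketch, which is the point: if the spans are plain OTel, the backend choice lives in deployment configuration.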
When LangSmith Wins
- You are all-in on LangChain or LangGraph and want first-party tracing with no glue code.
- You want Prompt Hub and annotation queues as part of the product, not assembled.
- Your trace volume stays modest, so per-trace overage never compounds into the four-figure range.
When Langfuse Wins
- You want open source (MIT core) and the option to self-host for free.
- You need unlimited seats without paying per head ($29/mo Core).
- Your volume is past ~100k events a month, where the $8-per-100k curve crushes $2.50-per-1k.
- You run more than one framework, or none, and want framework-agnostic tracing.
The Third Option: Own the Stack
There is a path neither vendor advertises, and a growing number of teams take it: instrument with OpenTelemetry, store the spans in your own ClickHouse, and skip the per-trace meter entirely. It is not exotic. Langfuse runs on ClickHouse, Helicone migrated to ClickHouse, and SigNoz is built on it. The instrumentation layer (OpenLLMetry, Apache-2.0) and the dashboard (Grafana OSS) are free; a small ClickHouse Cloud service starts around $66/mo. In an r/LangChain thread from this month, the top reply was blunt: "instrument everything via native OpenTelemetry so you can swap backends when you inevitably get frustrated." We walk through the whole build in our guide to building your own LLM observability.
What Both Miss: Semantic Signals
Everything above measures the mechanics of a call. None of it measures the meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.
These failures are semantic, so the fix is a label on the content of each turn: is_user_frustrated, stuck-in-a-loop, leaked-thinking, jailbreak, or a signal specific to your product. Both Langfuse and LangSmith approximate this with LLM-as-judge evals, which run offline on samples. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds. That is cheap enough to run on every turn rather than on a sample, and the result can be written back onto the Langfuse or LangSmith span as an attribute.
Score a turn, then attach it to your trace
curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'
# {
# "model": "stuck-in-a-loop",
# "mode": "single_label",
# "classes": [
# { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
# { "class_id": 1, "label": "looping", "score": 0.96, "selected": true }
# ],
# "inference_time_ms": 88
# }
The label comes back as an API response, not a dashboard panel, so it composes with whichever tool you picked: write it onto the span, alert on it in Slack, or route on it inline. It complements a tracing platform; it does not replace one.
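As a rough sketch of the "write it onto the span" step, the response shown above can be flattened into key-value attributes before being attached to a trace. The JSON shape is copied from the example response; the helper names and the `reflex.*` attribute keys are our own hypothetical naming, not a documented convention of Morph, Langfuse, or LangSmith.

```python
import json

# Sample response, copied from the example above.
SAMPLE_RESPONSE = """
{
  "model": "stuck-in-a-loop",
  "mode": "single_label",
  "classes": [
    {"class_id": 0, "label": "progressing", "score": 0.04, "selected": false},
    {"class_id": 1, "label": "looping", "score": 0.96, "selected": true}
  ],
  "inference_time_ms": 88
}
"""


def selected_label(response: dict) -> tuple[str, float]:
    """Return (label, score) of the class the classifier selected."""
    for cls in response["classes"]:
        if cls["selected"]:
            return cls["label"], cls["score"]
    raise ValueError("response has no selected class")


def to_span_attributes(response: dict, prefix: str = "reflex") -> dict:
    """Flatten a response into span attributes (hypothetical key names)."""
    label, score = selected_label(response)
    model = response["model"]
    return {
        f"{prefix}.{model}.label": label,
        f"{prefix}.{model}.score": score,
        f"{prefix}.{model}.inference_time_ms": response["inference_time_ms"],
    }


if __name__ == "__main__":
    attrs = to_span_attributes(json.loads(SAMPLE_RESPONSE))
    for key, value in attrs.items():
        print(key, "=", value)
```

Flat string keys with scalar values fit what span-attribute APIs typically accept, so the same dictionary can feed either platform's SDK or a plain OpenTelemetry span.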
Frequently Asked Questions
Langfuse vs LangSmith: which is cheaper?
Langfuse, by a wide margin at volume. At 1M events a month Langfuse Core is about $101 with unlimited users; LangSmith Plus is about $2,514 for one seat. The pricing section shows the math and the unit-vs-trace caveat.
Is Langfuse open source and LangSmith not?
Yes. Langfuse's core repo is MIT (ee/ folders under a separate license); LangSmith is closed source with Enterprise-only self-hosting.
Should I pick LangSmith if I use LangChain?
Often yes, if your volume stays modest: it is first-party to LangChain and LangGraph. Past ~100k traces a month, or if you need self-hosting or open source, Langfuse wins. See "When Langfuse Wins" above.
Do Langfuse or LangSmith catch wrong answers and frustrated users?
No. Both record structure (prompts, responses, latency, spans); neither labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in "What Both Miss: Semantic Signals" above.
Add the layer the trace cannot see
Whichever platform you pick, a Morph Reflex returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.
