You searched for this because LangSmith got expensive, you want open source or self-hosting, or you are not on LangChain. This page ranks the real alternatives on the numbers that move a decision: every free tier, the first paid tier, the license, and what each is actually best for. Pricing was verified against each vendor's published page as of June 2026.
TL;DR
For open source plus free self-hosting plus the lowest cost at scale, pick Langfuse (MIT core, ~$101/mo at 1M events). For zero-code setup across 100+ models, pick Helicone (swap your base URL, no instrumentation). For the lightest OpenTelemetry-native self-host with no event caps, pick Arize Phoenix. For offline eval and experimentation, pick Braintrust. If your team already lives in Weights & Biases, pick W&B Weave. And if you want to own the stack outright, instrument with OpenTelemetry and store spans in your own ClickHouse. None of them, including LangSmith, labels the meaning of a turn; that gap is covered at the end.
Why People Leave LangSmith
LangSmith is a capable platform, and for a small LangChain app it is the path of least resistance. The reasons teams move are structural, and all three sharpen as volume grows.
The cost cliff. The free Developer plan stops at 5k base traces a month with one seat. Plus is $39 per seat and includes 10k base traces, then overage runs $2.50 per 1k base traces at 14-day retention, or $5 per 1k for extended 400-day retention. Those increments compound. At 1M base traces a month, Plus works out to roughly $2,514 ($39 plus 990k of overage at $2.50 per 1k), for a single seat.
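The Plus arithmetic above can be reproduced in a few lines. This is a rough estimator built only from the list prices quoted in this section. The function name is hypothetical, and it assumes the 10k included base traces are a single allowance (not multiplied per seat) and that overage is billed pro rata rather than in whole blocks.

```python
def langsmith_plus_monthly_cost(
    base_traces: int,
    seats: int = 1,
    extended_retention: bool = False,
) -> float:
    """Estimate a monthly LangSmith Plus bill from the prices quoted above.

    Assumptions (not confirmed by the vendor): one shared 10k-trace
    allowance regardless of seat count, and pro rata overage billing.
    """
    seat_price = 39.0                 # USD per seat per month
    included_traces = 10_000          # base traces included in Plus
    # $2.50 per 1k at 14-day retention, $5 per 1k at 400-day retention
    overage_per_1k = 5.00 if extended_retention else 2.50

    overage_traces = max(0, base_traces - included_traces)
    return seats * seat_price + (overage_traces / 1_000) * overage_per_1k


if __name__ == "__main__":
    # 1M base traces, one seat, 14-day retention: 39 + 990 * 2.50
    print(langsmith_plus_monthly_cost(1_000_000))
```

Passing `extended_retention=True` or a larger `seats` value shows how quickly the same volume moves once retention or headcount changes.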
Closed source. You cannot read the platform, fork it, or run it in your own VPC without an Enterprise contract. For teams that treat their trace data as sensitive, or that want to avoid a per-trace meter, that is a hard stop.
LangChain coupling. LangSmith is first-party to LangChain and LangGraph. The deepest value (native tracing, Prompt Hub, annotation queues) is most fluent inside that framework. Run something else, or nothing, and the integration depth turns into glue work.
The Reddit pattern
The cost cliff is the most cited trigger. The r/LangChain thread that opened when LangSmith stopped being free settled on "We self-host Langfuse and are pretty happy," with another reply calling free self-hosting "a key requirement." A separate open-source thread notes the common arc: "everyone starts with hosted observability tools and eventually ends up building at least part of the stack themselves."
For the head-to-head detail, see Langfuse vs LangSmith, Braintrust vs LangSmith, and LangSmith vs Helicone.
The Alternatives at a Glance
| Tool | Free tier | First paid tier | License | Best for |
|---|---|---|---|---|
| Langfuse | 50k units/mo | Core $29/mo, 100k units, unlimited users | MIT core | Open source, self-host, cost at scale, non-LangChain |
| Helicone | 10k requests/mo | Pro $79/mo, unlimited seats | Apache-2.0 | Zero-code gateway setup across 100+ models |
| Arize Phoenix | Self-host free, no event caps | Cloud / enterprise tiers | Elastic License 2.0 (source-available) | Lightest OTel-native self-host, eval tooling |
| Braintrust | Starter: $10 credits, 1 GB, 10k scores, 14-day retention | Pro $249/mo, 5 GB, 50k scores | Closed source | Offline eval and experimentation workflows |
| W&B Weave | 1 GB/mo ingestion | Pro from $60/mo | Closed source | Teams already in Weights & Biases |
| LangSmith (baseline) | 5k base traces/mo, 1 seat | Plus $39/seat/mo, 10k base traces | Closed source | LangChain / LangGraph first-party tracing |
One note on the numbers: a Langfuse unit is any ingested event (a trace, an observation, or a score), Helicone meters requests, and LangSmith meters base traces. The quantities are not 1:1, so the cost comparisons below fix a monthly event count and run each pricing model against it rather than comparing list prices directly.
Langfuse
Pick Langfuse if you want open source, free self-hosting, and the lowest cost at scale. Cloud is free to 50k units a month; Core is $29/mo for 100k units with unlimited users; overage is $8 per 100k units, dropping to $6 at 50M+; Pro is $199/mo. At 1M events a month Core works out to about $101 ($29 plus nine $8 increments), versus roughly $2,514 on LangSmith Plus. The core repo is MIT (~28.8k stars), and self-hosting runs on Postgres, ClickHouse, Redis, and S3-compatible storage at no license cost. ClickHouse acquired Langfuse in January 2026, so the analytics engine and the product are now one company. It is framework-agnostic and ingests OpenTelemetry, so it is the default move for teams leaving LangSmith over cost, source access, or LangChain coupling. Full breakdown in Langfuse vs LangSmith.
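The $101 figure follows from stepping through the Core overage increments. The sketch below assumes overage is charged in whole 100k-unit increments (rounded up), which matches the "nine $8 increments" arithmetic above but is an assumption about billing granularity. The function name is hypothetical, and the $6 rate at 50M+ units is deliberately not modeled.

```python
import math


def langfuse_core_monthly_cost(units: int) -> int:
    """Estimate a monthly Langfuse Core bill from the prices quoted above.

    Assumption: overage is billed per started 100k-unit increment.
    Only valid below 50M units, where the article quotes $8 per 100k.
    """
    if units >= 50_000_000:
        raise ValueError("the 50M+ discounted rate is not modeled in this sketch")

    base_price = 29            # USD per month, includes 100k units
    included_units = 100_000
    overage_per_100k = 8       # USD per additional 100k units

    overage_units = max(0, units - included_units)
    increments = math.ceil(overage_units / 100_000)
    return base_price + increments * overage_per_100k


if __name__ == "__main__":
    # 1M units: 29 + 9 * 8
    print(langfuse_core_monthly_cost(1_000_000))
```

Because a unit is any ingested event, a realistic estimate would first multiply the trace count by the average number of observations and scores per trace before calling the function.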
Helicone
Pick Helicone if you want observability with zero instrumentation. It runs as an AI gateway: swap your base URL and traffic flows through it across 100+ models, no code changes, with an async OpenLLMetry mode for teams that prefer not to proxy. Free covers 10k requests a month; Pro is $79/mo with unlimited seats; Team is $799/mo with SOC 2 and HIPAA. It is Apache-2.0 (~5.8k stars) and self-hostable. The gateway model means you also get caching, rate limiting, and key management at the proxy layer, not just tracing. See LangSmith vs Helicone for the side-by-side.
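In practice the base-URL swap tends to be a two-setting change on an existing client. The sketch below only builds those settings, so it runs without any SDK installed. The endpoint and the `Helicone-Auth` header are the values Helicone has documented for its OpenAI-compatible proxy as far as we know, so treat both as assumptions and check the current docs. The helper name is hypothetical.

```python
import os

# Assumption: Helicone's OpenAI-compatible proxy endpoint; verify before use.
HELICONE_OPENAI_BASE_URL = "https://oai.helicone.ai/v1"


def helicone_client_kwargs(helicone_api_key: str) -> dict:
    """Return the two client settings that route traffic through the gateway.

    Nothing else about the calling code changes: same models, same methods.
    """
    return {
        "base_url": HELICONE_OPENAI_BASE_URL,
        # Assumption: Helicone authenticates the proxy hop with this header.
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_api_key}"},
    }


if __name__ == "__main__":
    kwargs = helicone_client_kwargs(os.environ.get("HELICONE_API_KEY", "placeholder-key"))
    # With the OpenAI Python SDK installed, the swap would look like:
    #   from openai import OpenAI
    #   client = OpenAI(**kwargs)  # the provider key is still read as usual
    print(kwargs["base_url"])
```

The tradeoff is the extra network hop: every request now depends on the proxy being reachable, which is why the async mode exists for teams that do not want that dependency.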
Arize Phoenix
Pick Arize Phoenix if you want the lightest self-host with no event caps. It self-hosts free as a single process via pip, Docker, or Helm, and it is OpenTelemetry-native through the OpenInference conventions, so the same spans work across other OTel backends. There are no per-event limits on the self-hosted edition, which makes it attractive at high volume where a metered vendor would compound. It ships strong eval tooling alongside tracing (~10.1k stars). The one caveat worth stating plainly: Phoenix is licensed under Elastic License 2.0, which is source-available rather than OSI open source, so it is not MIT- or Apache-equivalent if a strict open-source license is a hard requirement.
Braintrust
Pick Braintrust if your center of gravity is evaluation and experimentation, not production tracing. It is eval-first: the workflow is built around running offline experiments, scoring outputs, and comparing model and prompt versions. The free Starter plan gives $10 in credits, 1GB, 10k scores, and 14-day retention with unlimited users; Pro is $249/mo for 5GB and 50k scores. Overage is $4 per GB plus $2.50 per 1k scores on Starter, or $3 per GB plus $1.50 per 1k scores on Pro. It is closed source. If your team measures "did this prompt change make the answers better" more than "what happened in production at 2am," Braintrust is built for that question. Detail in Braintrust vs LangSmith.
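Braintrust bills on two axes at once, stored data and scores, so an estimate needs both inputs. The sketch below covers only the Pro plan using the prices quoted above. The function name is hypothetical, and pro rata overage billing is an assumption.

```python
def braintrust_pro_monthly_cost(data_gb: float, scores: int) -> float:
    """Estimate a monthly Braintrust Pro bill from the prices quoted above.

    Assumption: overage on both axes is billed pro rata.
    """
    base_price = 249.0          # USD per month
    included_gb = 5.0
    included_scores = 50_000
    overage_per_gb = 3.0        # USD per extra GB on Pro
    overage_per_1k_scores = 1.50

    extra_gb = max(0.0, data_gb - included_gb)
    extra_scores = max(0, scores - included_scores)
    return (
        base_price
        + extra_gb * overage_per_gb
        + (extra_scores / 1_000) * overage_per_1k_scores
    )


if __name__ == "__main__":
    # 20 GB and 200k scores: 249 + 15 * 3 + 150 * 1.50
    print(braintrust_pro_monthly_cost(20, 200_000))
```

For eval-heavy teams the scores term usually dominates, which is the opposite of a per-trace meter where raw production volume drives the bill.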
W&B Weave
Pick W&B Weave if your team already runs Weights & Biases for model training and experiment tracking. Weave is the LLM-observability layer inside that ecosystem, so the value is continuity: one account, one UI, one billing relationship spanning training runs and production traces. The free tier includes 1 GB a month of ingestion; Pro starts from $60/mo, with overage at $0.10 per MB. It is closed source. For a team with no existing W&B footprint, the standalone alternatives above are usually a cleaner fit; for a team already inside W&B, Weave removes a second vendor.
The DIY Route
One more option sits underneath all of these: skip the vendor and own the stack. Instrument with OpenLLMetry (Apache-2.0, ~7.2k stars, free), send the spans to your own ClickHouse (Cloud Basic starts around $66/mo, or self-host the Apache-2.0 build for free), and visualize in Grafana OSS. It is not exotic; the vendors already run ClickHouse under the hood (both Langfuse and Helicone migrated to it). The payoff is no per-trace meter and full control of the data; the cost is that you operate the pipeline yourself. It is one option among the alternatives, not a free lunch. The full build is in build your own LLM observability.
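To make the storage half of that pipeline concrete, the sketch below flattens a span into one row and serializes a batch as newline-delimited JSON, the shape ClickHouse accepts through its `JSONEachRow` input format. The input dictionary layout, the column names, and the function names are all hypothetical. The attribute keys follow the OpenTelemetry GenAI semantic conventions, and a given instrumentation library may emit different keys.

```python
import json


def span_to_row(span: dict) -> dict:
    """Flatten a simplified span dict (hypothetical shape) into a table row."""
    attrs = span.get("attributes", {})
    duration_ns = span["end_time_unix_nano"] - span["start_time_unix_nano"]
    return {
        "trace_id": span["trace_id"],
        "span_id": span["span_id"],
        "name": span["name"],
        "duration_ms": duration_ns / 1_000_000,
        # Assumed attribute keys; adjust to what your instrumentation emits.
        "model": attrs.get("gen_ai.request.model", ""),
        "input_tokens": int(attrs.get("gen_ai.usage.input_tokens", 0)),
        "output_tokens": int(attrs.get("gen_ai.usage.output_tokens", 0)),
    }


def to_json_each_row(spans: list[dict]) -> str:
    """Serialize spans as one JSON object per line for a batch insert."""
    return "\n".join(json.dumps(span_to_row(s)) for s in spans)


if __name__ == "__main__":
    example = {
        "trace_id": "t1",
        "span_id": "s1",
        "name": "chat",
        "start_time_unix_nano": 1_000_000_000,
        "end_time_unix_nano": 1_250_000_000,
        "attributes": {"gen_ai.request.model": "example-model"},
    }
    print(to_json_each_row([example]))
```

Keeping the flattening step in your own code is part of what "operate the pipeline yourself" means here: schema changes, missing attributes, and batching all become your responsibility.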
The reason this keeps coming up is portability. An r/LangChain thread on what teams actually use circles back to instrumenting with OpenTelemetry so the backend stays swappable, and an r/LocalLLaMA thread on free LangSmith alternatives lands in the same place.
What Every Alternative Still Misses: Semantic Signals
Every tool above, LangSmith included, measures the mechanics of a call. None of it measures the meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.
These failures are semantic, so the fix is a label on the content of each turn: jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, domain, or a signal specific to your product. Every tool here approximates this with LLM-as-judge evals, which run offline on samples. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds, cheap enough to run on every turn rather than on a sample; the label can then be written back onto the span as an attribute. The base model is morph-reflex-v1, the built-in signals ship live and self-serve, and a custom signal takes under an hour to add.
Score a turn, then attach it to your trace
curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'
# {
# "model": "stuck-in-a-loop",
# "mode": "single_label",
# "classes": [
# { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
# { "class_id": 1, "label": "looping", "score": 0.96, "selected": true }
# ],
# "inference_time_ms": 88
# }
# The selected label is the class with "selected": true.
The label comes back as an API response, not a dashboard panel, so it composes with whichever alternative you picked: write it onto the span, alert on it in Slack, or route on it inline. It complements a tracing platform; it does not replace one.
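Turning that response into span attributes is mostly a matter of picking the selected class. The sketch below parses the sample response shown above and produces a flat attribute dictionary. The `reflex.` attribute prefix and both function names are hypothetical naming choices, and it assumes a `single_label` response carries exactly one selected class.

```python
import json

# The sample response from the example above, verbatim.
SAMPLE_RESPONSE = json.loads("""
{
  "model": "stuck-in-a-loop",
  "mode": "single_label",
  "classes": [
    {"class_id": 0, "label": "progressing", "score": 0.04, "selected": false},
    {"class_id": 1, "label": "looping", "score": 0.96, "selected": true}
  ],
  "inference_time_ms": 88
}
""")


def selected_class(response: dict) -> dict:
    """Return the class entry marked as selected."""
    for entry in response["classes"]:
        if entry.get("selected"):
            return entry
    raise ValueError("response contains no selected class")


def to_span_attributes(response: dict, prefix: str = "reflex") -> dict:
    """Build flat key/value attributes; the key scheme is an assumption."""
    chosen = selected_class(response)
    signal = response["model"]
    return {
        f"{prefix}.{signal}.label": chosen["label"],
        f"{prefix}.{signal}.score": chosen["score"],
    }


if __name__ == "__main__":
    attributes = to_span_attributes(SAMPLE_RESPONSE)
    # With an OpenTelemetry span in scope, each pair could be attached via
    # span.set_attribute(key, value).
    print(attributes)
```

Flat string and number attributes keep the label queryable in any of the backends above, so an alert or a routing rule can filter on the label key directly.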
Frequently Asked Questions
What is the best open-source alternative to LangSmith?
Langfuse for most teams: MIT core, free self-hosting, and about $101/mo at 1M events versus roughly $2,514 on LangSmith Plus. If you want the lightest OpenTelemetry-native self-host with no event caps, Arize Phoenix is the other strong pick, though it is source-available under Elastic License 2.0 rather than OSI open source. See the table in "The Alternatives at a Glance" for the full set.
Why do teams move off LangSmith?
Cost (1M base traces is roughly $2,514/mo on Plus), closed source (Enterprise-only self-hosting), and LangChain coupling. The "Why People Leave LangSmith" section breaks down each one.
Which LangSmith alternative is cheapest at scale?
Langfuse among hosted vendors (~$101/mo at 1M events, unlimited users). Arize Phoenix self-hosted has no event caps and no license cost. The cheapest path overall is owning the stack, covered in "The DIY Route."
Do any of these alternatives catch wrong answers and frustrated users?
No. Every one records structure (prompts, responses, latency, spans); none labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in "What Every Alternative Still Misses: Semantic Signals."
Related comparisons
Langfuse Alternatives
When MIT-core Langfuse isn't the fit: LangSmith, Helicone, Phoenix, Braintrust, and self-hosting on your own ClickHouse.
Langfuse vs LangSmith
MIT-core and self-hostable vs first-party LangChain. Full pricing math: at 1M events a month, about $101/mo on Langfuse versus roughly $2,514/mo on LangSmith Plus.
LangSmith vs Helicone
SDK tracing tied to LangChain vs a one-line Apache-2.0 proxy. Free tiers, the network-hop tradeoff, and what each misses.
Langfuse vs Helicone
Two open-source paths: Langfuse's SDK + ClickHouse stack vs Helicone's drop-in gateway. Both run on ClickHouse; you can too.
Arize Phoenix vs Langfuse
OTel-native, no event caps, one process vs the heavier MIT platform. The self-host footprint and lock-in question, settled.
Braintrust vs LangSmith
Eval-first scoring vs trace-first monitoring. Where per-score billing beats per-trace, and where it doesn't.
Add the layer the trace cannot see
Whichever alternative you pick, Reflexes returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.
