LangSmith Alternatives (2026): Open Source, Self-Host, and Cost at Scale

You searched this because LangSmith got expensive, you want open source or self-hosting, or you are not on LangChain. This page ranks the real alternatives on the numbers that move a decision: every free tier, the first paid tier, the license, and what each is actually best for. Pricing verified against each vendor's published page as of June 2026.

$2,514/mo

LangSmith Plus at 1M base traces

$101/mo

Langfuse Core at 1M events

Arize Phoenix and Langfuse self-host license

TL;DR

For open source plus free self-hosting plus the lowest cost at scale, pick Langfuse (MIT core, ~$101/mo at 1M events). For zero-code setup across 100+ models, pick Helicone (swap your base URL, no instrumentation). For the lightest OpenTelemetry-native self-host with no event caps, pick Arize Phoenix. For open-source observability purpose-built around agents, pick Laminar (Apache-2.0). For offline eval and experimentation, pick Braintrust. If your team already lives in Weights & Biases, pick W&B Weave. And if you want to own the stack outright, instrument with OpenTelemetry and store spans in your own ClickHouse. None of them, including LangSmith, labels the meaning of a turn; that gap is covered at the end.

Why People Leave LangSmith

LangSmith is a capable platform, and for a small LangChain app it is the path of least resistance. The reasons teams move are structural, and all three sharpen as volume grows.

The cost cliff. The free Developer plan stops at 5k base traces a month with one seat. Plus is $39 per seat and includes 10k base traces, then overage runs $2.50 per 1k base traces at 14-day retention, or $5 per 1k for extended 400-day retention. Those increments compound. At 1M base traces a month, Plus works out to roughly $2,514 ($39 plus 990k of overage at $2.50 per 1k), for a single seat.

Closed source. You cannot read the platform, fork it, or run it in your own VPC without an Enterprise contract. For teams that treat their trace data as sensitive, or that want to avoid a per-trace meter, that is a hard stop.

LangChain coupling. LangSmith is first-party to LangChain and LangGraph. The deepest value (native tracing, Prompt Hub, annotation queues) is most fluent inside that framework. Run something else, or nothing, and the integration depth turns into glue work.

The reddit pattern

The cost cliff is the most cited trigger. The r/LangChain thread that opened the moment LangSmith left free landed on "We self-host Langfuse and are pretty happy," with another reply calling free self-hosting "a key requirement." A separate open-source thread notes the common arc: "everyone starts with hosted observability tools and eventually ends up building at least part of the stack themselves."

For the head-to-head detail, see Langfuse vs LangSmith and Braintrust vs LangSmith.

The Alternatives at a Glance

LangSmith alternatives (June 2026)

Tool	Free tier	First paid tier	License	Best for
Langfuse	50k units/mo	Core $29/mo, 100k units, unlimited users	MIT core	Open source, self-host, cost at scale, non-LangChain
Helicone	10k requests/mo	Pro $79/mo, unlimited seats	Apache-2.0	Zero-code gateway setup across 100+ models
Arize Phoenix	Self-host free, no event caps	Cloud / enterprise tiers	Elastic License 2.0 (source-available)	Lightest OTel-native self-host, eval tooling
Laminar	1 GB data/mo, 7-day retention, 1 seat	Hobby $30/mo, 3 GB, unlimited seats	Apache-2.0	Agent-native OTel tracing, plain-English Signals
Braintrust	Starter: $10 credits, 1GB, 10k scores, 14-day	Pro $249/mo, 5GB, 50k scores	Closed source	Offline eval and experimentation workflows
W&B Weave	1 GB/mo ingestion	Pro from $60/mo	Closed source	Teams already in Weights & Biases
LangSmith (baseline)	5k base traces/mo, 1 seat	Plus $39/seat/mo, 10k base traces	Closed source	LangChain / LangGraph first-party tracing

One note on the numbers: a Langfuse unit is any ingested event (a trace, an observation, or a score), Helicone meters requests, and LangSmith meters base traces. The quantities are not 1:1, so the cost comparisons below fix a monthly event count and run each pricing model against it rather than comparing list prices directly.

Langfuse

Pick Langfuse if you want open source, free self-hosting, and the lowest cost at scale. Cloud is free to 50k units a month; Core is $29/mo for 100k units with unlimited users; overage is $8 per 100k units, dropping to $6 at 50M+; Pro is $199/mo. At 1M events a month Core works out to about $101 ($29 plus nine $8 increments), versus roughly $2,514 on LangSmith Plus. The core repo is MIT (~28.8k stars), and self-hosting runs on Postgres, ClickHouse, Redis, and S3-compatible storage at no license cost. ClickHouse acquired Langfuse in January 2026, so the analytics engine and the product are now one company. It is framework-agnostic and ingests OpenTelemetry, so it is the default move for teams leaving LangSmith over cost, source access, or LangChain coupling. Full breakdown in Langfuse vs LangSmith.

Helicone

Pick Helicone if you want observability with zero instrumentation. It runs as an AI gateway: swap your base URL and traffic flows through it across 100+ models, no code changes, with an async OpenLLMetry mode for teams that prefer not to proxy. Free covers 10k requests a month; Pro is $79/mo with unlimited seats; Team is $799/mo with SOC-2 and HIPAA. It is Apache-2.0 (~5.8k stars) and self-hostable. The gateway model means you also get caching, rate limiting, and key management at the proxy layer, not just tracing.

Arize Phoenix

Pick Arize Phoenix if you want the lightest self-host with no event caps. It self-hosts free as a single process via pip, Docker, or Helm, and it is OpenTelemetry-native through the OpenInference conventions, so the same spans work across other OTel backends. There are no per-event limits on the self-hosted edition, which makes it attractive at high volume where a metered vendor would compound. It ships strong eval tooling alongside tracing (~10.1k stars). The one caveat worth stating plainly: Phoenix is licensed under Elastic License 2.0, which is source-available rather than OSI open source, so it is not MIT- or Apache-equivalent if a strict open-source license is a hard requirement.

Laminar

Pick Laminar if your workload is agents specifically and you want open source built around that case. Laminar (YC S24, Apache-2.0, ~3.1k stars) is purpose-built for AI-agent observability: its OpenTelemetry-native SDK auto-instruments the Claude Agent SDK, OpenAI Agents SDK, Vercel AI SDK, LangChain, Browser Use, and Playwright in one line, and the trace view renders a run as a readable transcript of reasoning, tool calls, and sub-agents rather than a flat span list. The ingestion engine is Rust with ~20x trace compression. Free is 1 GB of data with 7-day retention and one seat; Hobby is $30/mo (3 GB included, then $2/GB, 30-day retention, unlimited seats); Pro is $150/mo (10 GB included, then $1.50/GB, 6-month retention). Every feature ships in the OSS image, self-hostable via Docker Compose or Helm.

Braintrust

Pick Braintrust if your center of gravity is evaluation and experimentation, not production tracing. It is eval-first: the workflow is built around running offline experiments, scoring outputs, and comparing model and prompt versions. The free Starter plan gives $10 in credits, 1GB, 10k scores, and 14-day retention with unlimited users; Pro is $249/mo for 5GB and 50k scores. Overage is $4 per GB plus $2.50 per 1k scores on Starter, or $3 per GB plus $1.50 per 1k scores on Pro. It is closed source. If your team measures "did this prompt change make the answers better" more than "what happened in production at 2am," Braintrust is built for that question. Detail in Braintrust vs LangSmith.

W&B Weave

Pick W&B Weave if your team already runs Weights & Biases for model training and experiment tracking. Weave is the LLM-observability layer inside that ecosystem, so the value is continuity: one account, one UI, one billing relationship spanning training runs and production traces. The free tier includes 1 GB a month of ingestion; Pro starts from $60/mo, with overage at $0.10 per MB. It is closed source. For a team with no existing W&B footprint, the standalone alternatives above are usually a cleaner fit; for a team already inside W&B, Weave removes a second vendor.

The DIY Route

One more option sits underneath all of these: skip the vendor and own the stack. Instrument with OpenLLMetry (Apache-2.0, ~7.2k stars, free), send the spans to your own ClickHouse (Cloud Basic starts around $66/mo, or self-host the Apache-2.0 build for free), and visualize in Grafana OSS. It is not exotic; the vendors already run ClickHouse under the hood (both Langfuse and Helicone migrated to it). The payoff is no per-trace meter and full control of the data; the cost is that you operate the pipeline yourself. It is one option among the alternatives, not a free lunch. The full build is in build your own LLM observability.

The reason this keeps coming up is portability. An r/LangChain thread on what teams actually use circles back to instrumenting with OpenTelemetry so the backend stays swappable, and a r/LocalLLaMA thread on free LangSmith alternatives lands in the same place.

What Every Alternative Still Misses: Semantic Signals

Every tool above measures the mechanics of a call. Laminar's Signals reach for the meaning: you describe a behavior like "agent is stuck in a loop" in plain English and it reads completed runs and pings Slack when it sees one. That is the closest any tool here comes, but it runs as background analysis after the run finishes, metered by LLM tokens ($0.4 per 1M input on Pro), and alerts you rather than labeling the turn in band. Everything else measures only structure. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.

These failures are semantic, so the fix is a label on the content of each turn: jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, domain, or a signal specific to your product. The tools here approximate this with LLM-as-judge evals on offline samples, or with background analysis like Laminar's Signals, both of which land after the fact. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds, cheap enough to run on every turn rather than a sample, then write back onto the span as an attribute or gate the response before it ships. The built-in signals ship live and self-serve, and a custom signal takes under an hour to add.

Score a turn, then attach it to your trace

curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'

# {
#   "model": "stuck-in-a-loop",
#   "mode": "single_label",
#   "classes": [
#     { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
#     { "class_id": 1, "label": "looping",     "score": 0.96, "selected": true }
#   ],
#   "inference_time_ms": 88
# }
# The selected label is the class with "selected": true.

The label comes back as an API response, not a dashboard panel, so it composes with whichever alternative you picked: write it onto the span, alert on it in Slack, or route on it inline. It complements a tracing platform; it does not replace one.

Frequently Asked Questions

What is the best open-source alternative to LangSmith?

Langfuse for most teams: MIT core, free self-hosting, and about $101/mo at 1M events versus roughly $2,514 on LangSmith Plus. If you want the lightest OpenTelemetry-native self-host with no event caps, Arize Phoenix is the other strong pick, though it is source-available under Elastic License 2.0 rather than OSI open source. See the table for the full set.

Why do teams move off LangSmith?

Cost (1M base traces is roughly $2,514/mo on Plus), closed source (Enterprise-only self-hosting), and LangChain coupling. The why people leave section breaks down each.

Which LangSmith alternative is cheapest at scale?

Langfuse among hosted vendors (~$101/mo at 1M events, unlimited users). Arize Phoenix self-hosted has no event caps and no license cost. The cheapest path overall is owning the stack, covered in the DIY route.

Do any of these alternatives catch wrong answers and frustrated users?

Mostly no. Laminar's Signals flag plain-English behaviors from completed runs, but as after-the-fact background analysis; the rest record only structure (prompts, responses, latency, spans). Catching wrong answers, frustration, or looping inline needs a per-turn classifier on top, covered in semantic signals.

Add the layer the trace cannot see

Whichever alternative you pick, Reflexes returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.

Read the Reflexes docs

Build your own stack

Kimi K3

GLM-5.2

Qwen

MiniMax

DeepSeek

Reflex

Fast Apply

WarpGrep

Compact

Model Router

Blog

Startup Credits

Contact Us

About

Careers

LangSmith Alternatives (2026): The Open-Source, Self-Host, and Cost-at-Scale Options