Langfuse Alternatives (2026): Lighter Self-Host, First-Party LangChain, Zero-Code Gateways, and Eval-First Tools

Langfuse is the default open-source pick, but its self-host stack is web + worker, Postgres, ClickHouse, Redis, and S3. Searchers usually want one of four things: something lighter (Arize Phoenix), first-party LangChain (LangSmith), a zero-code gateway (Helicone), or eval-first tooling (Braintrust, W&B Weave). Verified pricing, licenses, and self-host footprints for each.

June 15, 2026 · 1 min read
Langfuse Alternatives (2026): Lighter Self-Host, First-Party LangChain, Zero-Code Gateways, and Eval-First Tools

Langfuse is the default open-source pick for LLM observability, but its self-host stack is web and worker containers plus Postgres, ClickHouse, Redis/Valkey, and S3, it is not first-party to LangChain, and it was acquired by ClickHouse in January 2026. Most people searching for alternatives want one of four things: something lighter, first-party LangChain, a zero-code gateway, or eval-first tooling. Here is the best fit for each, with pricing verified against every vendor's published page as of June 2026.

5 stacks
To replace Postgres + ClickHouse + Redis + S3
1 process
Arize Phoenix, the lightest self-host
Jan 2026
ClickHouse acquired Langfuse

TL;DR

Want something lighter to self-host? Arize Phoenix runs as one process, OpenTelemetry-native, no event caps, free under the Elastic License 2.0. Want first-party LangChain? LangSmith, free to 5k traces then $39 per seat. Want a zero-code gateway across 100+ models? Helicone, base-URL swap, Apache-2.0, free to 10k requests. Want eval-first experimentation? Braintrust or W&B Weave. Want to own nothing extra? Instrument with OpenLLMetry and point spans at your own ClickHouse, the same engine Langfuse runs on. None of them labels the meaning of a turn, which is the gap a semantic classifier fills.

Why Look Past Langfuse

Langfuse earns its default status: MIT core, free self-hosting, ~28.8k GitHub stars, and a polished product. The reasons to look past it are specific, not vague.

The self-host footprint is heavy. The v3 architecture splits transactional data (Postgres), analytics (ClickHouse), queues and cache (Redis/Valkey), and event payloads (S3-compatible storage) across separate services, run as web and worker containers. That separation is why it scales, and also why it is more to operate than a single-binary tool. If you cannot run ClickHouse, that one constraint rules out the self-host path.

It is not first-party to LangChain. Langfuse is framework-agnostic by design, which is a strength for polyglot stacks and a friction point if you live entirely in LangChain or LangGraph, where LangSmith traces with no glue code.

ClickHouse acquired it in January 2026. The acquisition put the analytics engine and the product under one company. That is reassuring for some teams and a prompt to re-evaluate for others, especially anyone wary of consolidating their storage and their observability vendor into the same hands.

The reddit reads

The open-source question gets asked constantly. An r/LangChain thread on the best open-source options surfaces Langfuse and Phoenix as the two names that come up first, with Phoenix repeatedly cited as the lighter, OTel-native pick. A separate "what is everyone actually using" thread has a blunt top reply: "instrument everything via native OpenTelemetry so you can swap backends when you inevitably get frustrated."

The Alternatives at a Glance

Five products, plus the DIY path below. The column that decides most purchases is "best for," because these tools are not interchangeable: a gateway and an eval harness solve different problems.

ToolFree tierFirst paidLicenseSelf-host footprintBest for
Langfuse50k units/mo$29/mo, 100k unitsMIT corePostgres + ClickHouse + Redis + S3The OSS baseline
Arize PhoenixFree, no capsSelf-host freeElastic License 2.0One process (pip/Docker/Helm)Lightest self-host, OTel-native
LangSmith5k traces/mo$39/seat/moClosedEnterprise onlyFirst-party LangChain
Helicone10k requests/mo$79/mo ProApache-2.0Self-hostable (gateway)Zero-code gateway, 100+ models
Braintrust$10 credits / 1GB$249/mo ProClosedCloud / hybridEval-first experimentation
W&B Weave1 GB/moFrom $60/moClosedCloudExisting W&B users

Arize Phoenix

Phoenix is the closest answer to "Langfuse, but lighter." It runs as a single process you start with one pip install or docker run, with no ClickHouse, Redis, or object store to stand up first. It is OpenTelemetry-native through OpenInference, so instrumentation is portable: spans you emit for Phoenix work with any OTel backend, and you are never re-instrumenting because you swapped tools. Self-hosting is free, with no event caps, under the Elastic License 2.0. That license is source-available, not OSI-approved open source, which constrains reselling it as a service but is a non-issue for internal use. It carries ~10.1k stars and ships evals as a first-class feature.

Pick Phoenix if

You want the lightest self-host (one process, no separate analytics store to operate), OpenTelemetry portability so you are not locked to a vendor SDK, and evals built in. The Elastic License is fine for internal observability. This is the top "lighter than Langfuse" pick.

LangSmith

LangSmith is the alternative for teams that live in LangChain. It is first-party to LangChain and LangGraph, so tracing, Prompt Hub, annotation queues, and evals work with zero assembly. The free Developer plan gives 5k base traces a month on one seat; Plus is $39 per seat and includes 10k base traces, then overage is $2.50 per 1k base traces at 14-day retention ($5 per 1k for extended 400-day retention). It is closed source, and self-hosting in your own VPC is an Enterprise feature with custom pricing. The trade is integration depth against openness and cost at volume. We work the full pricing math in Langfuse vs LangSmith.

Pick LangSmith if

You are committed to LangChain or LangGraph, want first-party tracing with no glue code, and your trace volume stays modest enough that per-trace overage never compounds. You do not need open source or free self-hosting.

Helicone

Helicone is the zero-code option. In gateway mode you swap your provider base URL for Helicone's, and every request routes through it, traced, across 100+ models, with no SDK to install. If you would rather not sit a proxy in your request path, it also offers an async OpenLLMetry mode that ships spans out of band. The free tier is 10k requests a month; Pro is $79/mo and Team is $799/mo. It is Apache-2.0 (true OSI open source, unlike Phoenix's Elastic License) with ~5.8k stars. We cover the gateway-vs-SDK trade in Langfuse vs Helicone.

Pick Helicone if

You want tracing with no code changes, a base-URL swap across many models, and an Apache-2.0 license. A gateway in the request path is an acceptable trade for the convenience, or you use the async mode to avoid it.

Braintrust

Braintrust leads with evaluation rather than live tracing. It is built for offline experimentation: running prompt and model variants against datasets, scoring outputs, and comparing experiments before they ship. The free Starter plan gives $10 in credits, 1 GB of data, and 10k scores; Pro is $249/mo, with Starter overage at $4 per GB plus $2.50 per 1k scores. It is closed source. If your bottleneck is "is this prompt change actually better" more than "what happened in production," Braintrust is built for that question.

Pick Braintrust if

Your priority is offline eval and experimentation (prompt iteration, model comparison, scored datasets) rather than live production traces, and a closed-source SaaS is acceptable.

W&B Weave

Weave is the LLM observability layer from Weights & Biases. Its strongest case is gravity: if your team already runs W&B for ML experiment tracking, Weave keeps LLM traces and evals in the same place you already log everything else. The free tier includes 1 GB a month of ingestion; paid starts from $60/mo, with overage at $0.10 per MB. It is closed source. For a greenfield team with no W&B footprint, the other options here are usually a tighter fit; for an existing W&B shop, the consolidation is the whole point.

Pick W&B Weave if

You already use Weights & Biases and want LLM traces and evals in the same platform as your existing ML experiment tracking, rather than standing up a separate tool.

The Lightest Path: Own a Single ClickHouse

There is a path that sits under all of these, and it is especially natural here because Langfuse itself runs on ClickHouse. Instead of operating the multi-service Langfuse stack, you can own the storage layer directly: instrument with OpenLLMetry (Apache-2.0, ~7.2k stars, free), send OpenTelemetry spans to your own ClickHouse, and put a Grafana OSS dashboard on top. This is not exotic. Langfuse, Helicone, and SigNoz are all built on ClickHouse, so owning it yourself removes a layer rather than adding one.

The cost is modest: the instrumentation (OpenLLMetry) and dashboard (Grafana OSS) are free, self-hosted ClickHouse is Apache-2.0 and free, and a managed ClickHouse Cloud Basic instance starts around $66/mo. The win is portability: instrument once with OTel and the backend becomes swappable. The cost is that you build and own the dashboards a product would hand you. We walk through the whole build in build your own LLM observability.

What Every Alternative Still Misses: Semantic Signals

Every tool above, Langfuse included, measures the mechanics of a call and not its meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.

These failures are semantic, so the fix is a label on the content of each turn: is_user_frustrated, stuck-in-a-loop, leaked-thinking, jailbreak, or a signal specific to your product. Phoenix, LangSmith, Braintrust, and Weave approximate this with LLM-as-judge evals, which run offline on samples. A Morph Reflex is a classifier on the base model morph-reflex-v1 that returns the label inline, in under 90 milliseconds, cheap enough to run on every turn rather than a sample, then write back onto the span as an attribute. Built-in signals cover jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, and domain, and you can train a custom signal in under an hour.

Score a turn, then attach it to your trace

curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'

# {
#   "model": "stuck-in-a-loop",
#   "mode": "single_label",
#   "classes": [
#     { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
#     { "class_id": 1, "label": "looping",     "score": 0.96, "selected": true }
#   ],
#   "inference_time_ms": 88
# }
# The label is the class with "selected": true (here, "looping").

The label comes back as an API response, not a dashboard panel, so it composes with whichever tool you pick: write it onto the Phoenix or LangSmith span, alert on it in Slack, or route on it inline. It complements a tracing platform; it does not replace one. Reflexes is live and self-serve.

Frequently Asked Questions

What is the best open-source alternative to Langfuse?

For a lighter self-host, Arize Phoenix: one process, no event caps, OpenTelemetry-native, free under the Elastic License 2.0 (source-available, not OSI open source). For true OSI open source with a zero-code gateway, Helicone (Apache-2.0). See Arize Phoenix and Helicone.

Why are teams moving off Langfuse in 2026?

A heavy self-host footprint (Postgres + ClickHouse + Redis + S3), no first-party LangChain integration, and the ClickHouse acquisition in January 2026. See why look past Langfuse.

What is the lightest Langfuse alternative to self-host?

Arize Phoenix (one process), or owning a single ClickHouse with OpenLLMetry and Grafana if you want to skip a product entirely. See own a single ClickHouse.

Which alternative is best for LangChain users?

LangSmith, first-party to LangChain and LangGraph, free to 5k traces then $39 per seat. The full pricing math is in Langfuse vs LangSmith.

Do any of these catch wrong answers and frustrated users?

No. Every tool here records structure (prompts, responses, latency, spans); none labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in semantic signals.

Related comparisons

Add the layer no tracing tool can see

Whichever alternative you pick, Reflexes returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.