Two open-source observability tools that both run on ClickHouse, decided on the one thing that actually differs: how you instrument. Langfuse is an SDK-traced platform; Helicone is a base-URL-swap gateway. Every free tier limit, every paid plan, the self-host footprint of each, and the failures neither catches. Pricing verified against each vendor's published page as of June 2026.
TL;DR
Pick Helicone when you want logs in minutes with zero code: it is an AI gateway in front of 100+ models, so a single base-URL swap starts capturing requests, and an async OpenLLMetry mode removes the proxy hop when you do not want it. Pick Langfuse when you want a full platform you instrument with an SDK: tracing plus prompt management, LLM-as-judge and code evals, datasets, and a playground, MIT core, self-hostable. Both are open source and both run on ClickHouse, so the decision is the instrumentation model, not the data store.
Quick Comparison
| Dimension | Langfuse | Helicone |
|---|---|---|
| License | MIT core (ee/ under enterprise license) | Apache-2.0 |
| Instrumentation | SDK tracing in your code | Base-URL swap gateway (or async OpenLLMetry) |
| Free tier | 50k units/mo, 2 users, 30-day access | 10k requests/mo, 1 seat, 1 GB, 7-day retention |
| First paid tier | Core $29/mo, 100k units, unlimited users | Pro $79/mo, unlimited seats, alerts, HQL |
| Top tier | Pro $199/mo, 3-year retention | Team $799/mo, 5 orgs, SOC-2 + HIPAA |
| Request-path hop | None (async SDK flush) | One hop via gateway (none in async mode) |
| Beyond tracing | Prompt mgmt, evals, datasets, playground | Gateway in front of 100+ models, caching, alerts |
| Trace store | ClickHouse (+ Postgres, Redis, S3) | ClickHouse (migrated from Postgres) |
The Instrumentation Tradeoff
This is the whole decision in one paragraph. Langfuse's SDK lives in your code and flushes spans asynchronously, so it adds no latency to the request path, but you have to write the instrumentation. Helicone's gateway sits in front of your model calls, so you change one base URL and ship zero code, but every request now takes one extra network hop through the proxy. Helicone's async OpenLLMetry mode collapses that tradeoff by logging out-of-band, removing the hop at the cost of doing a little instrumentation. So the matrix is: code change plus no hop (Langfuse SDK), no code change plus one hop (Helicone gateway), or a little code plus no hop (Helicone async).
Pick by where you want the work
If a base-URL swap is all the change you can make this sprint, Helicone's gateway is the fastest path to logs. If you cannot accept a per-request hop on a latency-sensitive product and you can spend an afternoon instrumenting, Langfuse's SDK (or Helicone's async mode) keeps the request path untouched.
Pricing
They meter differently, so read each against your own volume. Langfuse bills units (any ingested event: a trace, an observation, or a score). Helicone bills requests. A single agent turn is one Helicone request but several Langfuse units, so the two counts are not 1:1.
| Plan | Langfuse | Helicone |
|---|---|---|
| Free | 50k units/mo, 2 users, 30-day access | 10k requests/mo, 1 seat, 1 GB, 7-day |
| First paid | Core $29/mo, 100k units, unlimited users, 90-day | Pro $79/mo, unlimited seats, alerts, HQL |
| Overage | $8 per 100k units (down to $6 at 50M+) | Included in plan tiers |
| Top | Pro $199/mo, 3-year retention | Team $799/mo, 5 orgs, SOC-2 + HIPAA |
Langfuse's overage curve ($8 per 100k units, dropping to $6 above 50M units a month) makes it predictable at high volume. Helicone bundles features into flat plan tiers rather than per-request overage, so the choice at scale comes down to whether your spend is driven by event volume (Langfuse's model) or by seats, orgs, and compliance needs (Helicone's model).
Self-Hosting Footprint
Both self-host for free, and both lean on ClickHouse, so the ClickHouse question (can you run it?) applies to either. The shape of the stack differs.
| Langfuse | Helicone | |
|---|---|---|
| Services | Web + worker, Postgres, ClickHouse, Redis/Valkey, S3 | Next.js, Jawn collector, Supabase, ClickHouse, MinIO |
| Deploy | docker compose, Helm, or AWS/Azure/GCP Terraform | 5-service docker compose |
| License cost | $0 (MIT core) | $0 (Apache-2.0) |
Helicone runs a 5-service compose (Next.js app, the Jawn collector, Supabase, ClickHouse, and MinIO for object storage). Langfuse splits into web and worker containers backed by Postgres, ClickHouse, Redis/Valkey, and S3-compatible storage. Neither is a single binary, because both put analytics in ClickHouse, and Helicone's move to ClickHouse is on record: it migrated from Postgres to ClickHouse and reported query times falling from hundreds of seconds to about half a second. Langfuse, for its part, was acquired by ClickHouse in January 2026.
When Helicone Wins
- You want observability with zero code: a base-URL swap in front of 100+ models starts logging immediately.
- You want a gateway, not just a tracer: caching, alerts, and rate limiting at the proxy layer.
- You need unlimited seats early ($79/mo Pro) or SOC-2 and HIPAA ($799/mo Team).
- You can live with one network hop per request, or you adopt the async OpenLLMetry mode to remove it.
When Langfuse Wins
- You want a full platform, not just logs: prompt management, LLM-as-judge and code evals, datasets, and a playground.
- You cannot add a request-path hop and prefer SDK tracing that flushes asynchronously.
- You want unlimited users on a $29/mo plan with predictable $8-per-100k-unit overage at scale.
- You want OpenTelemetry-native ingestion so the backend stays swappable.
The Third Option: Own the ClickHouse
Both tools put their data in ClickHouse, which quietly opens a third path: instrument with OpenTelemetry, store the spans in your own ClickHouse, and run the dashboard yourself. It is not exotic. Langfuse runs on ClickHouse, Helicone migrated to ClickHouse, and the pattern is the same one both vendors use under the hood. The instrumentation layer (OpenLLMetry, Apache-2.0) and the dashboard (Grafana OSS) are free; a small ClickHouse Cloud starts around $66/mo, or you self-host ClickHouse for nothing. In an r/LangChain thread from this month, the top reply was blunt: "instrument everything via native OpenTelemetry so you can swap backends when you inevitably get frustrated." We walk through the whole build in build your own LLM observability.
What Both Miss: Semantic Signals
Everything above measures the mechanics of a call. None of it measures the meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.
These failures are semantic, so the fix is a label on the content of each turn: is_user_frustrated, stuck-in-a-loop, leaked-thinking, jailbreak, or a signal specific to your product. Both Langfuse and Helicone approximate this with LLM-as-judge or scoring run offline on samples. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds, cheap enough to run on every turn rather than a sample, then write back onto the Langfuse trace or the Helicone request as an attribute.
Score a turn, then attach it to your trace
curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
-H "Authorization: Bearer $MORPH_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'
# {
# "model": "stuck-in-a-loop",
# "mode": "single_label",
# "classes": [
# { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
# { "class_id": 1, "label": "looping", "score": 0.96, "selected": true }
# ],
# "inference_time_ms": 88
# }The predicted label is the class with selected: true (here, looping); there is no top-level label or confidence field, just the per-class scores. The built-in signals cover jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, and domain, all on the morph-reflex-v1 base model, and you can train a custom signal in under an hour. The label comes back as an API response, not a dashboard panel, so it composes with whichever tool you picked: write it onto the span, alert on it in Slack, or route on it inline. It complements an observability tool; it does not replace one.
Frequently Asked Questions
Langfuse vs Helicone: what is the core difference?
How you instrument. Langfuse is an SDK-traced platform (prompt management, evals, datasets on top); Helicone is a base-URL-swap gateway with an async logging mode. Both are open source and both run on ClickHouse. See the instrumentation tradeoff.
Is Helicone easier to set up than Langfuse?
For first logs, yes: Helicone's gateway is a one-line base-URL change with no code. Langfuse needs SDK instrumentation but adds no request-path hop. The tradeoff section covers code-change-vs-hop in detail.
Are both open source?
Yes. Langfuse's core is MIT (ee/ folders under a separate license); Helicone is Apache-2.0. Both self-host for free on ClickHouse.
Do Langfuse or Helicone catch wrong answers and frustrated users?
No. Both record structure (prompts, responses, latency, spans); neither labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in semantic signals.
Related comparisons
LangSmith Alternatives
Seven alternatives by use case: Langfuse, Helicone, Phoenix, Braintrust, Weave, plus the OpenTelemetry + ClickHouse DIY route.
Langfuse Alternatives
When MIT-core Langfuse isn't the fit: LangSmith, Helicone, Phoenix, Braintrust, and self-hosting on your own ClickHouse.
Build Your Own LLM Observability
OpenTelemetry + Traceloop + ClickHouse, the stack the vendors run. Own your traces from ~$66/mo and run Reflexes for the signals traces miss.
Langfuse vs LangSmith
MIT-core and self-hostable vs first-party LangChain. Full pricing math: 1M traces costs $101/mo on Langfuse, $2,514/mo on LangSmith.
LangSmith vs Helicone
SDK tracing tied to LangChain vs a one-line Apache-2.0 proxy. Free tiers, the network-hop tradeoff, and what each misses.
Arize Phoenix vs Langfuse
OTel-native, no event caps, one process vs the heavier MIT platform. The self-host footprint and lock-in question, settled.
Add the layer the trace cannot see
Whichever tool you pick, Reflexes returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.