Langfuse vs Helicone (2026): SDK Tracing vs Drop-In Gateway, Both on ClickHouse

Both are open source and both run on ClickHouse, but they instrument differently. Langfuse is an SDK-traced platform with prompt management, evals, and datasets (MIT core). Helicone is a base-URL-swap AI gateway with zero code, plus an async logging mode (Apache-2.0). Full pricing, self-host footprints, and the instrumentation tradeoff.

June 15, 2026 · 1 min read
Langfuse vs Helicone (2026): SDK Tracing vs Drop-In Gateway, Both on ClickHouse

Two open-source observability tools that both run on ClickHouse, decided on the one thing that actually differs: how you instrument. Langfuse is an SDK-traced platform; Helicone is a base-URL-swap gateway. Every free tier limit, every paid plan, the self-host footprint of each, and the failures neither catches. Pricing verified against each vendor's published page as of June 2026.

SDK vs gateway
Langfuse instruments code; Helicone swaps a base URL
MIT vs Apache-2.0
Both open source, both self-hostable
ClickHouse
The analytics store under both

TL;DR

Pick Helicone when you want logs in minutes with zero code: it is an AI gateway in front of 100+ models, so a single base-URL swap starts capturing requests, and an async OpenLLMetry mode removes the proxy hop when you do not want it. Pick Langfuse when you want a full platform you instrument with an SDK: tracing plus prompt management, LLM-as-judge and code evals, datasets, and a playground, MIT core, self-hostable. Both are open source and both run on ClickHouse, so the decision is the instrumentation model, not the data store.

Quick Comparison

DimensionLangfuseHelicone
LicenseMIT core (ee/ under enterprise license)Apache-2.0
InstrumentationSDK tracing in your codeBase-URL swap gateway (or async OpenLLMetry)
Free tier50k units/mo, 2 users, 30-day access10k requests/mo, 1 seat, 1 GB, 7-day retention
First paid tierCore $29/mo, 100k units, unlimited usersPro $79/mo, unlimited seats, alerts, HQL
Top tierPro $199/mo, 3-year retentionTeam $799/mo, 5 orgs, SOC-2 + HIPAA
Request-path hopNone (async SDK flush)One hop via gateway (none in async mode)
Beyond tracingPrompt mgmt, evals, datasets, playgroundGateway in front of 100+ models, caching, alerts
Trace storeClickHouse (+ Postgres, Redis, S3)ClickHouse (migrated from Postgres)

The Instrumentation Tradeoff

This is the whole decision in one paragraph. Langfuse's SDK lives in your code and flushes spans asynchronously, so it adds no latency to the request path, but you have to write the instrumentation. Helicone's gateway sits in front of your model calls, so you change one base URL and ship zero code, but every request now takes one extra network hop through the proxy. Helicone's async OpenLLMetry mode collapses that tradeoff by logging out-of-band, removing the hop at the cost of doing a little instrumentation. So the matrix is: code change plus no hop (Langfuse SDK), no code change plus one hop (Helicone gateway), or a little code plus no hop (Helicone async).

Pick by where you want the work

If a base-URL swap is all the change you can make this sprint, Helicone's gateway is the fastest path to logs. If you cannot accept a per-request hop on a latency-sensitive product and you can spend an afternoon instrumenting, Langfuse's SDK (or Helicone's async mode) keeps the request path untouched.

Pricing

They meter differently, so read each against your own volume. Langfuse bills units (any ingested event: a trace, an observation, or a score). Helicone bills requests. A single agent turn is one Helicone request but several Langfuse units, so the two counts are not 1:1.

PlanLangfuseHelicone
Free50k units/mo, 2 users, 30-day access10k requests/mo, 1 seat, 1 GB, 7-day
First paidCore $29/mo, 100k units, unlimited users, 90-dayPro $79/mo, unlimited seats, alerts, HQL
Overage$8 per 100k units (down to $6 at 50M+)Included in plan tiers
TopPro $199/mo, 3-year retentionTeam $799/mo, 5 orgs, SOC-2 + HIPAA

Langfuse's overage curve ($8 per 100k units, dropping to $6 above 50M units a month) makes it predictable at high volume. Helicone bundles features into flat plan tiers rather than per-request overage, so the choice at scale comes down to whether your spend is driven by event volume (Langfuse's model) or by seats, orgs, and compliance needs (Helicone's model).

Self-Hosting Footprint

Both self-host for free, and both lean on ClickHouse, so the ClickHouse question (can you run it?) applies to either. The shape of the stack differs.

LangfuseHelicone
ServicesWeb + worker, Postgres, ClickHouse, Redis/Valkey, S3Next.js, Jawn collector, Supabase, ClickHouse, MinIO
Deploydocker compose, Helm, or AWS/Azure/GCP Terraform5-service docker compose
License cost$0 (MIT core)$0 (Apache-2.0)

Helicone runs a 5-service compose (Next.js app, the Jawn collector, Supabase, ClickHouse, and MinIO for object storage). Langfuse splits into web and worker containers backed by Postgres, ClickHouse, Redis/Valkey, and S3-compatible storage. Neither is a single binary, because both put analytics in ClickHouse, and Helicone's move to ClickHouse is on record: it migrated from Postgres to ClickHouse and reported query times falling from hundreds of seconds to about half a second. Langfuse, for its part, was acquired by ClickHouse in January 2026.

When Helicone Wins

  • You want observability with zero code: a base-URL swap in front of 100+ models starts logging immediately.
  • You want a gateway, not just a tracer: caching, alerts, and rate limiting at the proxy layer.
  • You need unlimited seats early ($79/mo Pro) or SOC-2 and HIPAA ($799/mo Team).
  • You can live with one network hop per request, or you adopt the async OpenLLMetry mode to remove it.

When Langfuse Wins

  • You want a full platform, not just logs: prompt management, LLM-as-judge and code evals, datasets, and a playground.
  • You cannot add a request-path hop and prefer SDK tracing that flushes asynchronously.
  • You want unlimited users on a $29/mo plan with predictable $8-per-100k-unit overage at scale.
  • You want OpenTelemetry-native ingestion so the backend stays swappable.

The Third Option: Own the ClickHouse

Both tools put their data in ClickHouse, which quietly opens a third path: instrument with OpenTelemetry, store the spans in your own ClickHouse, and run the dashboard yourself. It is not exotic. Langfuse runs on ClickHouse, Helicone migrated to ClickHouse, and the pattern is the same one both vendors use under the hood. The instrumentation layer (OpenLLMetry, Apache-2.0) and the dashboard (Grafana OSS) are free; a small ClickHouse Cloud starts around $66/mo, or you self-host ClickHouse for nothing. In an r/LangChain thread from this month, the top reply was blunt: "instrument everything via native OpenTelemetry so you can swap backends when you inevitably get frustrated." We walk through the whole build in build your own LLM observability.

What Both Miss: Semantic Signals

Everything above measures the mechanics of a call. None of it measures the meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.

These failures are semantic, so the fix is a label on the content of each turn: is_user_frustrated, stuck-in-a-loop, leaked-thinking, jailbreak, or a signal specific to your product. Both Langfuse and Helicone approximate this with LLM-as-judge or scoring run offline on samples. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds, cheap enough to run on every turn rather than a sample, then write back onto the Langfuse trace or the Helicone request as an attribute.

Score a turn, then attach it to your trace

curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'

# {
#   "model": "stuck-in-a-loop",
#   "mode": "single_label",
#   "classes": [
#     { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
#     { "class_id": 1, "label": "looping",     "score": 0.96, "selected": true }
#   ],
#   "inference_time_ms": 88
# }

The predicted label is the class with selected: true (here, looping); there is no top-level label or confidence field, just the per-class scores. The built-in signals cover jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, and domain, all on the morph-reflex-v1 base model, and you can train a custom signal in under an hour. The label comes back as an API response, not a dashboard panel, so it composes with whichever tool you picked: write it onto the span, alert on it in Slack, or route on it inline. It complements an observability tool; it does not replace one.

Frequently Asked Questions

Langfuse vs Helicone: what is the core difference?

How you instrument. Langfuse is an SDK-traced platform (prompt management, evals, datasets on top); Helicone is a base-URL-swap gateway with an async logging mode. Both are open source and both run on ClickHouse. See the instrumentation tradeoff.

Is Helicone easier to set up than Langfuse?

For first logs, yes: Helicone's gateway is a one-line base-URL change with no code. Langfuse needs SDK instrumentation but adds no request-path hop. The tradeoff section covers code-change-vs-hop in detail.

Are both open source?

Yes. Langfuse's core is MIT (ee/ folders under a separate license); Helicone is Apache-2.0. Both self-host for free on ClickHouse.

Do Langfuse or Helicone catch wrong answers and frustrated users?

No. Both record structure (prompts, responses, latency, spans); neither labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in semantic signals.

Related comparisons

Add the layer the trace cannot see

Whichever tool you pick, Reflexes returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.