Arize Phoenix vs Langfuse (2026): Self-Host, OTel, and Event Caps Settled

Two self-hostable LLM tracing tools, both free of license cost, compared on the things that actually differ: the footprint each one takes to run, the OpenTelemetry and lock-in story, the event caps, the licensing fine print, and the failures neither tool catches. Facts verified against each project as of June 2026.

1 process: Phoenix self-host footprint

4 backing services (plus web and worker containers): Langfuse self-host footprint

ELv2 vs MIT: Phoenix license vs Langfuse core

TL;DR

Pick Arize Phoenix for the lightest start: one process, no event caps, and OpenTelemetry-native with OpenInference conventions, so your spans stay portable. Pick Langfuse for the fuller platform: prompt management, broader cloud tiers, and an architecture that scales because it splits transactional, analytics, queue, and payload storage into separate services. The licensing split matters too: Langfuse core is MIT (OSI open source); Phoenix is Elastic License 2.0 (source-available, not OSI open source). The decision comes down to the lightest OTel-native self-host versus a fuller platform that scales.

Quick Comparison

Arize Phoenix vs Langfuse (June 2026)

Dimension	Arize Phoenix	Langfuse
License	Elastic License 2.0 (source-available, not OSI)	MIT core (ee/ under enterprise license)
Self-host footprint	One process: pip, one Docker container, or Helm	Web + worker, Postgres, ClickHouse, Redis/Valkey, S3
Event caps (self-host)	None (it is an app, not metered)	None self-hosted; cloud meters units
OpenTelemetry	Native, OpenInference conventions	Ingests OTel spans
Cloud / paid product	Arize AX, separate, quote-based	Free 50k units/mo; Core $29/mo; Pro $199/mo
Feature set	Tracing, LLM-as-judge evals, prompt playground, datasets/experiments	Tracing, prompt management, evals, broader cloud tiers
GitHub stars	~10.1k	~28.8k
Ownership	Arize	Acquired by ClickHouse (Jan 2026)

Self-Hosting Footprint

The free-to-self-host claim is true for both, so it settles nothing. What settles it is how much you stand up to get there. Phoenix is the lightest start: pip install arize-phoenix, one Docker container, or a Helm chart, and you have a running tracing backend in a single process. There is nothing to shard, no separate analytics store, no queue.

Langfuse v3 is heavier by design. It splits transactional data (Postgres), analytics (ClickHouse), queues and cache (Redis/Valkey), and event payloads (S3-compatible storage) into separate services, run as web and worker containers. That separation is the reason it scales to high ingest volumes, and also the reason it is more to operate than a single process. The practical filter: if you cannot run ClickHouse, that one constraint rules out the Langfuse self-host path and leaves Phoenix.
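Below is a minimal preflight sketch of that filter, written in shell to match the curl example later in this post. The helper name `self_host_fit` and its service names are hypothetical. The dependency lists mirror the footprints described above. The start commands in the comments are the quickstarts each project documents, as we understand them.

```shell
# Hypothetical preflight helper: given the backing services your platform can
# run, report whether a tool's self-host path fits. Lists mirror the footprints
# described in this section.
#
# Documented quickstarts (check current docs before relying on them):
#   Phoenix:  pip install arize-phoenix && phoenix serve
#             docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
#   Langfuse: git clone https://github.com/langfuse/langfuse.git
#             cd langfuse && docker compose up
self_host_fit() {
  local tool="$1"; shift
  local have=" $* " needs svc
  case "$tool" in
    phoenix)  needs="" ;;                              # single process, nothing else to run
    langfuse) needs="postgres clickhouse redis s3" ;;  # plus web + worker containers
    *) echo "unknown tool: $tool" >&2; return 2 ;;
  esac
  # Report the first required service that is not in the "have" list.
  for svc in $needs; do
    case "$have" in
      *" $svc "*) ;;
      *) echo "missing: $svc"; return 1 ;;
    esac
  done
  echo "ok"
}

# Usage:
#   self_host_fit phoenix                        # prints "ok"
#   self_host_fit langfuse postgres redis s3     # prints "missing: clickhouse"
```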

What each takes to run yourself

	Arize Phoenix	Langfuse
Stack	Single process	Web + worker, Postgres, ClickHouse, Redis/Valkey, S3
Deploy	pip install, one Docker container, or Helm	docker compose, Helm, or Terraform
License cost	$0 (ELv2)	$0 (MIT core)
Scales to high volume	Single-process limits	Yes, via the split-service architecture

OpenTelemetry and Lock-In

The lock-in question is really an instrumentation question: if you replace the tool in a year, do you re-instrument your codebase? Phoenix answers it well. It is OpenTelemetry-native using OpenInference semantic conventions, so the spans you emit are standard OTel data. You can point them at Phoenix today and a different OTLP-compatible backend tomorrow without touching your instrumentation. Phoenix ships 40+ integrations across Python, TypeScript, and Java to produce those spans.

Langfuse ingests OpenTelemetry spans too, so the same portable pattern applies: instrument once with OTel, keep the backend swappable. Neither tool forces a proprietary SDK as the only way in, which is the point that matters for portability. Worth knowing for the longer view: ClickHouse acquired Langfuse in January 2026, so the company that builds the OLAP engine under Langfuse now also ships Langfuse itself.
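To make the portability point concrete, here is a sketch that swaps the export target using only standard OpenTelemetry environment variables, leaving the instrumented code untouched. The helper `use_otlp_backend` is hypothetical. The endpoints assume default local installs: Phoenix on port 6006, and self-hosted Langfuse on port 3000 with its `/api/public/otel` path and Basic auth built from your public and secret keys.

```shell
# Hypothetical helper: point already-instrumented code at a different OTLP
# backend by changing environment variables only.
# Assumed defaults: Phoenix on localhost:6006, self-hosted Langfuse on localhost:3000.
use_otlp_backend() {
  unset OTEL_EXPORTER_OTLP_HEADERS
  case "$1" in
    phoenix)
      # OTLP/HTTP SDKs append /v1/traces to this base endpoint.
      export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:6006"
      ;;
    langfuse)
      # Langfuse authenticates OTLP with HTTP Basic auth: base64("public:secret").
      local auth
      auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
      export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:3000/api/public/otel"
      # %20 is a percent-encoded space, per the OTel header env-var format.
      export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20${auth}"
      ;;
    *)
      echo "unknown backend: $1" >&2
      return 1
      ;;
  esac
  # Langfuse documents HTTP (not gRPC) for its OTLP endpoint, so use HTTP for both.
  export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
}
```

Run it in the shell that launches your app before starting the app. For example: `use_otlp_backend langfuse "$LANGFUSE_PUBLIC_KEY" "$LANGFUSE_SECRET_KEY"`. OTel SDKs read these variables at startup, so switching backends becomes a restart rather than a code change.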

Event Caps and Cost

Phoenix self-hosted has no event caps. It is an application you run, so you pay for the infrastructure underneath it and nothing per trace. There is no metered tier to graduate out of. The commercial product is Arize AX, a separate quote-based platform; Phoenix itself is the free self-host app, and the two should not be conflated.

Langfuse self-hosting is also free of license cost and uncapped. Its cloud tiers are where metering shows up:

- 50k units a month free (2 users, 30 days of data access).
- Core at $29/mo for 100k units with unlimited users.
- Overage at $8 per 100k units, dropping to $6 at 50M+.
- Pro at $199/mo.

A unit is any ingested event (a trace, an observation, or a score), so a single agent trace with several observations consumes several units. If you self-host Langfuse, none of that metering applies; cloud pricing only matters if you choose Langfuse Cloud over running it yourself.
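The unit arithmetic is easy to misjudge, so here is a rough Core-tier estimator built on the figures quoted above. It is a sketch under two assumptions. First, overage is pro-rated per unit rather than billed in whole 100k blocks. Second, the $6 rate above 50M units is ignored. The function names are illustrative.

```shell
# Sketch of Langfuse Cloud Core metering using the figures quoted above.
# Assumptions: overage is pro-rated per unit (not whole 100k blocks), and the
# $6 rate above 50M units is ignored.
units_per_trace() {
  # $1 = observations per trace, $2 = scores per trace; the trace itself is 1 unit.
  echo $(( 1 + $1 + $2 ))
}

core_monthly_cost_cents() {
  # $1 = total units ingested in the month
  local units=$1 base=2900 included=100000
  if [ "$units" -le "$included" ]; then
    echo "$base"
  else
    # $8 per 100k units = 800 cents per 100,000 units (integer cents, rounded down)
    echo $(( base + (units - included) * 800 / 100000 ))
  fi
}

# Usage: 20,000 traces/month, each with 5 observations and 1 score.
units=$(( 20000 * $(units_per_trace 5 1) ))   # 7 units per trace -> 140000 units
core_monthly_cost_cents "$units"               # prints 3220, i.e. $32.20
```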

The open-source shortlist

When practitioners list the self-hostable LLM observability tools worth running, these two are usually on it. In a recent r/LangChain thread on the best open-source options, both Langfuse and Phoenix come up as the names people actually run, alongside the recurring point that OpenTelemetry portability is what keeps a backend choice from becoming a trap.

The Licensing Nuance

This one trips people up, so state it precisely. Langfuse's core repo is MIT, which is OSI-approved open source (its ee/ folders sit under a separate enterprise license). Arize Phoenix is under the Elastic License 2.0, which is source-available but not OSI-approved open source. The source is public: you can read it, fork it, and self-host it for free. What ELv2 restricts is offering Phoenix to third parties as a managed service.

For most teams running tracing for their own product, ELv2 changes nothing: you self-host, you pay no license fee, you keep your data. The distinction matters if you intend to resell the tool as a hosted service, or if your procurement process draws a hard line at OSI-approved licenses. Do not call Phoenix "open source" without the ELv2 caveat; call it source-available.

When Phoenix Wins

You want the lightest self-host: one process, no separate analytics store, no queue to operate.
You want no event caps and no metered tier to manage, ever.
You are committed to OpenTelemetry and want OpenInference-native spans that stay portable.
You want tracing paired with LLM-as-judge evals, a prompt playground, and datasets/experiments in the same free app.

When Langfuse Wins

You want OSI-approved open source (MIT core), not a source-available license.
You need the fuller platform: prompt management plus broader managed cloud tiers if you do not self-host.
Your ingest volume is high enough that the split-service architecture (ClickHouse for analytics) is a feature, not overhead.
You can run ClickHouse and want a store that scales with you.

The Third Option: Own the Stack

Because both tools speak OpenTelemetry, there is a path that uses neither as a hard dependency: instrument once and keep the backend swappable. Instrument with OpenLLMetry (Apache-2.0, free), point the spans at Phoenix or Langfuse today, and move them to your own store later without re-instrumenting. If you want to skip a tracing product entirely, store the OTel spans in your own ClickHouse and dashboard them with Grafana OSS. ClickHouse Cloud's Basic tier starts around $66/mo; self-hosted ClickHouse is Apache-2.0 and free. We walk through the whole build in our guide to building your own LLM observability stack.
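If you take the own-store route, a first dashboard query can be plain SQL over the span table. This sketch makes three assumptions:

- The OpenTelemetry Collector's ClickHouse exporter writes to its default `otel_traces` table.
- `Duration` is stored in nanoseconds.
- ClickHouse's HTTP interface listens on port 8123.

`CH_URL`, `CH_DB`, and both function names are hypothetical.

```shell
# Sketch: p95 latency per span name over the last day, read from the table the
# OpenTelemetry Collector's ClickHouse exporter writes by default.
# Assumptions: table otel_traces in database $CH_DB (default "default"),
# Duration in nanoseconds, ClickHouse HTTP interface at $CH_URL.
span_latency_query() {
  local db="${CH_DB:-default}"
  cat <<SQL
SELECT SpanName,
       count() AS spans,
       round(quantile(0.95)(Duration) / 1e6, 1) AS p95_ms
FROM ${db}.otel_traces
WHERE Timestamp > now() - INTERVAL 1 DAY
GROUP BY SpanName
ORDER BY p95_ms DESC
LIMIT 20
FORMAT TSVWithNames
SQL
}

span_latency_report() {
  # POST the query body to ClickHouse's HTTP interface; output is tab-separated.
  span_latency_query | curl -sS "${CH_URL:-http://localhost:8123}/" --data-binary @-
}

# Usage:  span_latency_report
```

The same SELECT, without the FORMAT clause, could back a panel through Grafana's ClickHouse data source.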

What Both Miss: Semantic Signals

Everything above measures the mechanics of a call. None of it measures the meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken.

These failures are semantic, so the fix is a label on the content of each turn: is_user_frustrated, stuck-in-a-loop, leaked-thinking, jailbreak, or a signal specific to your product. Both Phoenix and Langfuse approximate this with LLM-as-judge evals, which run offline on samples. A Morph Reflex is a classifier that returns the label inline, in under 90 milliseconds. That is cheap enough to run on every turn rather than on a sample. You can then write the label back onto the Phoenix or Langfuse span as an attribute.

Score a turn, then attach it to your trace

curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "stuck-in-a-loop", "text": "<the agent turn>"}'

# {
#   "model": "stuck-in-a-loop",
#   "mode": "single_label",
#   "classes": [
#     { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
#     { "class_id": 1, "label": "looping",     "score": 0.96, "selected": true }
#   ],
#   "inference_time_ms": 88
# }

The label comes back as an API response, not a dashboard panel, so it composes with whichever tool you picked: write it onto the span, alert on it in Slack, or route on it inline. The built-in signals cover jailbreak, guardrail, leaked-thinking, stuck-in-a-loop, incomplete-thought, ambiguity, difficulty, and domain, and a custom signal trains in under an hour. It complements a tracing platform; it does not replace one.
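The example above scores a turn but stops short of attaching the result. Here is one hedged way to close that loop: Langfuse's public scores API, with the selected label written as a categorical score on the trace. It assumes `jq` is installed and that `LANGFUSE_HOST`, `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY` are set. It also assumes you already have the trace ID from your own instrumentation. The function names are hypothetical.

```shell
# Sketch: extract the selected label from a Reflex response and write it to
# Langfuse as a categorical score. Assumes jq plus LANGFUSE_HOST,
# LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY; function names are hypothetical.
selected_label() {
  # Reads a Reflex JSON response on stdin; prints the label(s) marked selected.
  jq -r '.classes[] | select(.selected) | .label'
}

langfuse_score_payload() {
  # $1 = trace id, $2 = score name, $3 = label value
  jq -n --arg t "$1" --arg n "$2" --arg v "$3" \
    '{traceId: $t, name: $n, value: $v, dataType: "CATEGORICAL"}'
}

post_langfuse_score() {
  # Reads the JSON payload on stdin and POSTs it with Basic auth (public:secret).
  curl -sS -X POST "${LANGFUSE_HOST:-http://localhost:3000}/api/public/scores" \
    -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @-
}

# Usage (network calls; the first line is the Reflex request shown above):
#   label=$(curl -sS ... /v1/reflex/predict ... | selected_label)
#   langfuse_score_payload "$TRACE_ID" "stuck-in-a-loop" "$label" | post_langfuse_score
```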

Frequently Asked Questions

Arize Phoenix vs Langfuse: which is lighter to self-host?

Phoenix. It runs as one process (pip, one Docker container, or Helm) with no event caps. Langfuse v3 splits into web and worker containers plus Postgres, ClickHouse, Redis/Valkey, and S3, which is heavier but scales further. See Self-Hosting Footprint above.

Is Arize Phoenix open source?

It is source-available under the Elastic License 2.0, not OSI-approved open source. Langfuse's core repo is MIT, which is OSI open source. The licensing section spells out the difference.

Does Arize Phoenix charge per event?

No. Self-hosted Phoenix has no event caps because it is an app, not a metered service. The paid product is the separate Arize AX. See Event Caps and Cost above.

Do Phoenix or Langfuse catch wrong answers and frustrated users?

No. Both record structure (prompts, responses, latency, spans); neither labels meaning. Catching wrong answers, frustration, or looping needs a per-turn classifier on top, covered in What Both Miss: Semantic Signals above.

Add the layer the trace cannot see

Whichever platform you pick, a Reflex returns a semantic label on every turn in under 90 milliseconds, over an API that composes with your traces.
