Langfuse, Helicone, and SigNoz all store their traces in ClickHouse and ingest OpenTelemetry. The build-your-own path is that same stack without the per-trace meter: instrument with OpenLLMetry, export spans to your own ClickHouse from around $66 a month, dashboard in Grafana, and run Reflexes over the content for the signals a span never carries. Pricing verified against each vendor's published page as of June 2026.
TL;DR
Build the stack the vendors run: OpenLLMetry (Apache-2.0) for instrumentation, ClickHouse for the trace store, Grafana for dashboards, and Morph Reflexes for the semantic signals a trace cannot carry. The open components are free; ClickHouse Cloud Basic runs around $66 a month at low usage, and there is no per-trace meter. You own the data, you avoid lock-in, and you keep the option to add what no managed tool ships: a label on the meaning of every turn.
Why Own the Stack
Three things push teams off managed observability, and all three are structural rather than cosmetic.
The meter compounds on agents. A 20-step agent turn is one user request that fans into 20 or more spans. On a per-trace plan that is one trace; on a per-event plan it is 20 or more billable units. On per-event pricing, the bill scales with agent depth, not with users: making an agent deeper raises the bill even when traffic is flat.
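Here is a back-of-envelope sketch of that meter. It assumes one billable unit per trace on a per-trace plan and one per span on a per-event plan. Real plans differ in what counts as an event, so treat the Workload type and billable_units function as illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_month: int  # user requests (agent turns) per month
    spans_per_request: int   # LLM, tool, and retrieval calls each turn fans into


def billable_units(w: Workload) -> dict[str, int]:
    """Monthly billable units under two simplified metering models."""
    return {
        # One trace per user request, however deep the agent goes.
        "per_trace": w.requests_per_month,
        # One unit per span: grows linearly with agent depth.
        "per_event": w.requests_per_month * w.spans_per_request,
    }


if __name__ == "__main__":
    for depth in (1, 5, 20):
        units = billable_units(Workload(requests_per_month=100_000, spans_per_request=depth))
        print(f"depth={depth:>2}  per_trace={units['per_trace']:,}  per_event={units['per_event']:,}")
```

At 20 spans per request, the same 100,000 requests are 100,000 units on the per-trace model and 2,000,000 on the per-event model.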
Lock-in is an instrumentation question. If your tracing lives in a vendor SDK, ripping the tool out means re-instrumenting your code. OpenTelemetry inverts that: instrument once, point the exporter anywhere. In an r/LangChain thread from this month, the top reply was "instrument everything via native OpenTelemetry so you can swap backends when you inevitably get frustrated."
The vendors already run this stack. ClickHouse acquired Langfuse in January 2026; Helicone migrated to ClickHouse and cut query times from over 100 seconds to 0.5 seconds; SigNoz is built on it. Building your own means assembling the open parts they package and meter.
The Architecture
Five layers, each swappable. The first four are connected by the OpenTelemetry protocol; the fifth reads span content and writes its labels back as span attributes.
| Layer | Component | License / Cost | Job |
|---|---|---|---|
| Instrument | OpenLLMetry (Traceloop) | Apache-2.0, free | Auto-emit OTel spans for LLM, vector DB, and framework calls |
| Collect | OpenTelemetry Collector | Apache-2.0, free | Receive spans, batch, route to the store |
| Store | ClickHouse | Apache-2.0 self-host, or Cloud ~$66/mo | Columnar store for high-ingest trace analytics |
| Dashboard | Grafana OSS | AGPLv3, free | Query and visualize spans, latency, token cost |
| Label | Morph Reflexes | Per-event API | Classify the meaning of each turn, write back as span attributes |
The first four layers are the standard OpenTelemetry-to-ClickHouse observability pattern, the same one used for application traces. The fifth layer is the one specific to LLMs, because LLM failures are semantic and a generic span does not capture them.
Instrument with OpenLLMetry
OpenLLMetry is an Apache-2.0 set of OpenTelemetry instrumentations maintained by Traceloop, with about 7.2k GitHub stars. It auto-instruments LLM providers, vector databases, and frameworks (LangChain, LlamaIndex, CrewAI), emitting standard OTel spans. One initialization call sends those spans to any OTel endpoint, including a Collector in front of ClickHouse.
Initialize tracing and point it at your collector
```python
from traceloop.sdk import Traceloop

# Spans go to your OTel Collector, which writes to ClickHouse.
# 4318 is the default OTLP/HTTP port.
Traceloop.init(
    app_name="my-agent",
    api_endpoint="http://otel-collector:4318",
)

# From here, every LLM and framework call is auto-instrumented as an
# OpenTelemetry span: prompt, response, model, tokens, latency, tool calls.
```

Because the output is plain OpenTelemetry, nothing about this step binds you to a backend. While you evaluate, the Collector can fan the same spans out to ClickHouse, Grafana Tempo, or a managed tool in parallel.
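Because the wire format is plain OTLP, you can see what reaches the Collector without any SDK. Below is a minimal stdlib sketch that assumes a Collector whose OTLP/HTTP receiver listens at the host used in the snippet above. It relies on two details of OpenTelemetry:

- A base endpoint from OTEL_EXPORTER_OTLP_ENDPOINT gets /v1/traces appended, while a signal-specific OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as-is.
- The OTLP JSON encoding carries trace and span IDs as hex strings.

The helper names (traces_url, build_span_payload, post_spans) are hypothetical and not part of OpenLLMetry. Attribute values are sent as strings for brevity.

```python
from __future__ import annotations

import json
import os
import secrets
import time
import urllib.request


def traces_url() -> str:
    """Resolve the OTLP/HTTP traces URL from the standard OTel env vars."""
    specific = os.environ.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific  # signal-specific endpoint: used as-is
    base = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    return base.rstrip("/") + "/v1/traces"  # base endpoint: append the signal path


def build_span_payload(service: str, name: str, attrs: dict[str, str],
                       duration_ms: int = 120) -> dict:
    """One span in OTLP JSON form (resourceSpans -> scopeSpans -> spans)."""
    end = time.time_ns()
    start = end - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "hand-rolled-example"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex chars
                    "name": name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are encoded as strings in OTLP JSON.
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                    "attributes": [
                        {"key": k, "value": {"stringValue": v}} for k, v in attrs.items()
                    ],
                }],
            }],
        }],
    }


def post_spans(payload: dict, url: str) -> int:
    """POST the payload as application/json and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    # Print only; call post_spans(payload, traces_url()) against a live Collector.
    payload = build_span_payload("my-agent", "plan_step",
                                 {"traceloop.entity.name": "plan_step",
                                  "llm.usage.total_tokens": "512"})
    print(traces_url())
    print(json.dumps(payload, indent=2))
```

The Collector's OTLP receiver accepts JSON as well as protobuf over HTTP, so with OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 this span lands in the same pipeline as the ones OpenLLMetry emits.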
Store Spans in ClickHouse
ClickHouse is a columnar database built for high-ingest analytical queries, which is exactly the trace workload. The OpenTelemetry Collector's contrib distribution includes a ClickHouse exporter, which by default creates and writes an otel_traces table, so the path from Collector to ClickHouse needs no custom glue. Once spans land, you query them with SQL: slowest agent turns, token cost per tool call, error rate by model.
Query agent traces with SQL
```sql
-- Most token-hungry agent steps in the last day, by total tokens.
-- toInt64OrZero: spans without a token count (tools, retrieval) add 0
-- instead of failing the parse.
SELECT
    SpanAttributes['traceloop.entity.name'] AS agent_step,
    count() AS calls,
    sum(toInt64OrZero(SpanAttributes['llm.usage.total_tokens'])) AS tokens
FROM otel_traces
WHERE Timestamp > now() - INTERVAL 1 DAY
GROUP BY agent_step
ORDER BY tokens DESC
LIMIT 20;
```

Self-hosted ClickHouse is Apache-2.0 and free; you pay only for the machine. ClickHouse Cloud Basic runs around $66 a month in a low-usage worked example, billed on metered compute (about $0.22 per unit-hour) plus compressed storage. Either way, there is no per-trace charge, so agent depth does not inflate the bill.
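The same kind of query can also run from code over ClickHouse's HTTP interface, which listens on port 8123 by default for plain HTTP. Below is a stdlib sketch that assumes a reachable server at a placeholder URL and the default user. It makes three choices:

- The time window is a typed query parameter ({hours:UInt32} in the SQL, param_hours in the URL) rather than being string-formatted into the query.
- It uses toInt64OrZero so spans without a token count add zero instead of failing the parse.
- It asks for FORMAT JSONEachRow, which returns one JSON object per line.

The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Token use per agent step, with the time window as a typed parameter.
TOP_STEPS_SQL = """
SELECT
    SpanAttributes['traceloop.entity.name'] AS agent_step,
    count() AS calls,
    sum(toInt64OrZero(SpanAttributes['llm.usage.total_tokens'])) AS tokens
FROM otel_traces
WHERE Timestamp > now() - toIntervalHour({hours:UInt32})
GROUP BY agent_step
ORDER BY tokens DESC
LIMIT 20
FORMAT JSONEachRow
"""


def top_steps_request(base_url: str, hours: int, user: str = "default",
                      password: str = "") -> urllib.request.Request:
    """Build a POST to ClickHouse's HTTP interface; the SQL goes in the body."""
    url = base_url.rstrip("/") + "/?" + urllib.parse.urlencode({"param_hours": hours})
    return urllib.request.Request(
        url,
        data=TOP_STEPS_SQL.encode(),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )


def parse_rows(body: str) -> list[dict]:
    """Parse JSONEachRow output; 64-bit integers arrive quoted by default."""
    rows = []
    for line in body.splitlines():
        if line.strip():
            row = json.loads(line)
            row["calls"] = int(row["calls"])
            row["tokens"] = int(row["tokens"])
            rows.append(row)
    return rows


def run(base_url: str, hours: int = 24) -> list[dict]:
    """Execute the query against a live server and return parsed rows."""
    with urllib.request.urlopen(top_steps_request(base_url, hours), timeout=10) as resp:
        return parse_rows(resp.read().decode())
```

Against a local server, run("http://localhost:8123") returns a list of dicts, one per agent step. ClickHouse Cloud exposes the same interface over HTTPS.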
Cost: DIY vs Managed
The open components are free, so the DIY line item is the trace store. Set that against what managed tools charge at the same volume; the managed figures below are for about 1M traces a month.
| Option | Monthly cost | What you run / give up |
|---|---|---|
| DIY: OpenLLMetry + ClickHouse + Grafana | ~$50-100 | You operate ClickHouse; you own the data |
| Langfuse Core | ~$101 | Managed; MIT core if you self-host instead |
| LangSmith Plus | ~$2,514 | Managed, closed; one seat at 14-day retention |
The DIY number and Langfuse Core are close, which is the honest read: if you want managed and open, Langfuse Core is a fine deal and self-hosting Langfuse is free. DIY wins decisively over the closed per-trace plans, and it wins on control: your traces never leave your infrastructure, and you can run any classifier you want over them. For the full vendor-by-vendor pricing, see Langfuse vs LangSmith and LangSmith alternatives.
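The sketch below reads the ~$66 ClickHouse Cloud figure against the $0.22-per-unit-hour compute rate quoted earlier. The rate and the total come from this post; the 730-hour month is an assumption. Storage also draws from the $66, so the compute numbers are upper bounds.

```python
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, rounded


def max_unit_hours(monthly_budget: float, rate_per_unit_hour: float) -> float:
    """Upper bound on compute unit-hours a budget buys (ignores storage)."""
    return monthly_budget / rate_per_unit_hour


def avg_units_running(unit_hours: float, hours: float = HOURS_PER_MONTH) -> float:
    """Average compute units active if those unit-hours were spread over the month."""
    return unit_hours / hours


if __name__ == "__main__":
    uh = max_unit_hours(66.0, 0.22)
    print(f"<= {uh:.0f} unit-hours/month, <= {avg_units_running(uh):.2f} units on average")
```

The worked example buys at most about 300 unit-hours, well under one compute unit running around the clock. A busier ingest path raises that line, but the cost tracks database load rather than a per-trace price.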
The Layer Traces Miss: Reflexes
A span records structure: prompt, response, latency, tokens, the call tree. It does not record meaning. A response that quotes the wrong refund policy returns a 200 with normal latency and a normal token count. A user who is quietly getting angry produces the same span as a delighted one. An agent stuck in a three-step loop looks like an agent doing work. The trace is green and the product is broken. These are the signals that are not tracebacks, and a DIY stack stores them in ClickHouse without ever labeling them.
A Morph Reflex is the inference layer that adds the label. It is a small, fast text classifier that scores the content of a turn and returns a label in under 90 milliseconds, cheap enough to run on every span rather than on an offline sample. Built-in signals include stuck-in-a-loop, leaked-thinking, jailbreak, guardrail, incomplete-thought, ambiguity, and difficulty, and you can train a custom signal for your product in under an hour from a prompt, a labeled set, or an unlabeled one.
Label a span's content, then write it back to ClickHouse
```bash
curl -X POST "https://api.morphllm.com/v1/reflex/predict" \
  -H "Authorization: Bearer $MORPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "stuck-in-a-loop", "text": "<the agent turn from the span>"}'

# {
#   "model": "stuck-in-a-loop",
#   "mode": "single_label",
#   "classes": [
#     { "class_id": 0, "label": "progressing", "score": 0.04, "selected": false },
#     { "class_id": 1, "label": "looping", "score": 0.96, "selected": true }
#   ],
#   "inference_time_ms": 88
# }
```

The predicted label is the class with "selected": true. Write it onto the span as an attribute, and your ClickHouse query can do what no raw trace can: count looping agents, find frustrated sessions, and alert on policy violations that never threw an error. The same outputs feed evals, fine-tunes, and RL reward terms downstream. At the realtime rate of $0.0005 per event (one event is 2048 tokens), labeling every turn stays cheap enough to leave on in production.
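Here is a Python version of the call above, using only the standard library. The endpoint, headers, request fields, and response shape are the ones shown in the curl example. The span_labels table, its columns, and the helper names are hypothetical.

Writing labels to a side table keyed by TraceId and SpanId, rather than rewriting otel_traces rows, is a design assumption. In ClickHouse, updating existing rows is a mutation (ALTER TABLE ... UPDATE), which is heavy to run per event. An insert into a separate table is cheap and joins back on the IDs.

```python
from __future__ import annotations

import json
import os
import urllib.request

REFLEX_URL = "https://api.morphllm.com/v1/reflex/predict"


def predict(model: str, text: str, api_key: str | None = None) -> dict:
    """POST one turn to a Reflex and return the parsed JSON response."""
    req = urllib.request.Request(
        REFLEX_URL,
        data=json.dumps({"model": model, "text": text}).encode(),
        headers={
            "Authorization": f"Bearer {api_key or os.environ['MORPH_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def selected_label(response: dict) -> tuple[str, float]:
    """Return (label, score) for the class marked "selected": true."""
    for cls in response["classes"]:
        if cls["selected"]:
            return cls["label"], cls["score"]
    raise ValueError("no class selected in response")


def label_row(trace_id: str, span_id: str, response: dict) -> str:
    """One JSONEachRow line for a hypothetical span_labels table."""
    label, score = selected_label(response)
    return json.dumps({
        "TraceId": trace_id,
        "SpanId": span_id,
        "Reflex": response["model"],
        "Label": label,
        "Score": score,
    })
```

You can send rows like these with INSERT INTO span_labels FORMAT JSONEachRow over ClickHouse's HTTP interface. A query that joins span_labels to otel_traces on TraceId and SpanId can then count looping agents per step.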
Why a classifier, not an LLM-as-judge
The managed tools approximate semantic checks with LLM-as-judge evals that run offline on a sample. That misses the looping agent in real time and costs a full model call per check. A Reflex runs inline on every turn at classifier speed and price, so the signal lands before the agent takes its next step, not in tomorrow's eval run.
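The sketch below puts the per-event price in numbers by estimating the monthly cost of labeling every turn. The $0.0005 rate and the 2048-token event size come from this post. Rounding a longer turn up to the next whole event and the example volumes are assumptions.

```python
import math

RATE_PER_EVENT = 0.0005   # USD, realtime rate quoted in this post
TOKENS_PER_EVENT = 2048   # one event covers this many tokens


def events_for_turn(tokens: int) -> int:
    """Billable events for one turn; assumes partial events round up."""
    return max(1, math.ceil(tokens / TOKENS_PER_EVENT))


def monthly_cost(turns_per_month: int, avg_tokens_per_turn: int, signals: int = 1) -> float:
    """Cost of running `signals` Reflexes over every turn, assuming turns of average length."""
    return turns_per_month * events_for_turn(avg_tokens_per_turn) * signals * RATE_PER_EVENT


if __name__ == "__main__":
    print(f"${monthly_cost(100_000, 1_500):,.2f}")             # one signal, 1-event turns
    print(f"${monthly_cost(100_000, 3_000, signals=3):,.2f}")  # three signals, 2-event turns
```

At 100,000 turns a month under 2,048 tokens each, one signal comes to about $50. That is the sense in which per-turn labeling is cheap enough to leave on.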
When to Buy Instead
DIY is not free; it moves the cost from a bill to your team. Buy a managed tool when you do not want to operate ClickHouse, when you need prompt management and annotation queues out of the box, or when you are committed to LangChain and want first-party LangSmith tracing. Self-host Langfuse when you want a managed-grade product on your own infrastructure for the price of the servers. The decision is whether owning the data and the bill is worth running one more database. Either way, the semantic layer is additive: Reflexes returns a label over an API, and you write it onto a managed span or a DIY one the same way.
Frequently Asked Questions
Can I build my own LLM observability instead of buying a tool?
Yes, and it is the stack the vendors run: OpenLLMetry to instrument, ClickHouse to store, Grafana to dashboard. Langfuse, Helicone, and SigNoz all use ClickHouse. See the architecture.
What is OpenLLMetry and is it free?
OpenLLMetry is an Apache-2.0 set of OpenTelemetry instrumentations for LLM apps from Traceloop (~7.2k stars). It is free and emits standard OTel spans you can send anywhere, covered in instrument with OpenLLMetry.
How much does a DIY stack cost?
The instrumentation and dashboard are free; the trace store is the line item. ClickHouse Cloud Basic is around $66 a month, or self-host for the cost of a VM. The cost section compares it to managed plans.
What does a trace still miss in a DIY stack?
Meaning. Wrong answers, frustration, and looping all produce structurally normal spans. Labeling them needs a per-turn classifier, which is what Reflexes adds.
Related comparisons
Langfuse vs Helicone
Two open-source paths: Langfuse's SDK + ClickHouse stack vs Helicone's drop-in gateway. Both run on ClickHouse; you can too.
LangSmith Alternatives
Seven alternatives by use case: Langfuse, Helicone, Phoenix, Braintrust, Weave, plus the OpenTelemetry + ClickHouse DIY route.
Langfuse Alternatives
When MIT-core Langfuse isn't the fit: LangSmith, Helicone, Phoenix, Braintrust, and self-hosting on your own ClickHouse.
Langfuse vs LangSmith
MIT-core and self-hostable vs first-party LangChain. Full pricing math: 1M traces costs $101/mo on Langfuse, $2,514/mo on LangSmith.
LangSmith vs Helicone
SDK tracing tied to LangChain vs a one-line Apache-2.0 proxy. Free tiers, the network-hop tradeoff, and what each misses.
Arize Phoenix vs Langfuse
OTel-native, no event caps, one process vs the heavier MIT platform. The self-host footprint and lock-in question, settled.
Own the stack, add the layer it can't see
Run OpenTelemetry and ClickHouse for the traces; run Reflexes over the content for the semantic signals that never throw. One API, under 90 milliseconds a turn.
