The Problem: Every LLM Provider Speaks a Different Language
Traditional services had this problem in 2012. Every database driver, HTTP client, and message queue emitted logs in its own format. Teams wrote custom parsers. Dashboards broke when libraries updated. OpenTelemetry fixed it by defining one standard for traces, metrics, and logs across all services.
LLM applications have the same problem today, but worse. A single application might call OpenAI for chat completions, Anthropic for tool use, Cohere for embeddings, and Pinecone for vector search. Each SDK returns different field names for the same concepts. Token counts, latencies, model identifiers, and cost data all live in different shapes.
Fragmented Telemetry
OpenAI reports prompt_tokens. Anthropic reports input_tokens. Bedrock wraps it in inputTokenCount. Three providers, three schemas, one metric.
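What normalization means in practice can be sketched in a few lines. This is purely illustrative (the `INPUT_TOKEN_FIELD` map and `normalized_input_tokens` function are not part of any SDK; only the field names come from the providers above):

```python
# Illustrative only: maps each provider's input-token field name
# (as listed above) onto one shared key. Not part of any SDK.
INPUT_TOKEN_FIELD = {
    "openai": "prompt_tokens",
    "anthropic": "input_tokens",
    "bedrock": "inputTokenCount",
}

def normalized_input_tokens(provider: str, usage: dict) -> int:
    """Read the same metric out of three different schemas."""
    return usage[INPUT_TOKEN_FIELD[provider]]

print(normalized_input_tokens("openai", {"prompt_tokens": 42}))     # 42
print(normalized_input_tokens("bedrock", {"inputTokenCount": 42}))  # 42
```

Instrumentation libraries do exactly this translation, once, so every dashboard downstream sees one schema.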
No Standard Tracing
A LangChain agent calling OpenAI, querying Pinecone, then calling Anthropic produces no unified trace. Each step is isolated. You cannot see the full request path.
Backend Lock-In
Proprietary SDKs tie you to one observability vendor. Switch from Datadog to Grafana and you rewrite all instrumentation. OpenTelemetry-based tooling avoids this.
OpenLLMetry applies the OpenTelemetry model to this problem. One instrumentation layer, one trace format, any backend. Your application calls Traceloop.init() and every LLM interaction becomes a standard OpenTelemetry span with consistent attributes.
How OpenLLMetry Works
OpenLLMetry is a collection of OpenTelemetry instrumentation packages. Each package wraps a specific LLM provider or framework SDK, intercepting calls to capture spans, attributes, and events. The traceloop-sdk initializes all installed instrumentations automatically.
When your code calls openai.chat.completions.create(), the OpenAI instrumentation intercepts the request and creates an OpenTelemetry span with:
- Model name and version (e.g., gpt-5.4)
- Token counts: input tokens, output tokens, total tokens
- Latency: time to first token, total duration
- Prompts and completions (optionally, with privacy controls)
- Cost: calculated from token counts and model pricing
- Error details: rate limits, API failures, timeouts
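The cost attribute is simple arithmetic over the token counts. A minimal sketch, assuming a hypothetical per-million-token price table (real prices vary by model and change over time):

```python
# Hypothetical pricing table, USD per 1M tokens. These numbers are
# placeholders for illustration, not actual provider prices.
PRICE_PER_1M = {
    "gpt-5.4": {"input": 2.50, "output": 10.00},
}

def span_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = tokens * per-token price, summed over both directions."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(round(span_cost_usd("gpt-5.4", 1000, 200), 6))  # 0.0045
```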
These spans follow OpenTelemetry's gen_ai semantic conventions, so any OTel-compatible backend can parse them without custom configuration.
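Concretely, a chat-completion span under these conventions carries flat `gen_ai.*` attributes shaped roughly like this (attribute names follow the OTel GenAI semantic conventions; the values are invented for illustration):

```python
# Sketch of the attributes an OTel-conformant chat span carries.
# Attribute names follow the gen_ai.* semantic conventions;
# the values here are made up for illustration.
span_attributes = {
    "gen_ai.system": "openai",           # which provider served the call
    "gen_ai.request.model": "gpt-5.4",   # model requested
    "gen_ai.response.model": "gpt-5.4",  # model that actually responded
    "gen_ai.usage.input_tokens": 12,     # same key for every provider
    "gen_ai.usage.output_tokens": 5,
}

# A backend can aggregate token usage without per-provider parsing:
total = (span_attributes["gen_ai.usage.input_tokens"]
         + span_attributes["gen_ai.usage.output_tokens"])
print(total)  # 17
```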
Non-Intrusive by Design
OpenLLMetry patches SDK methods at import time. Your application code does not change. No decorators on every function, no wrapper classes around API clients. Add two lines at startup and traces flow automatically. You can add workflow annotations with decorators for richer context, but they are optional.
Setup: Two Lines of Code
Python
Python Setup
pip install traceloop-sdk
Initialize OpenLLMetry
from traceloop.sdk import Traceloop
Traceloop.init(app_name="my_llm_app")
# Your existing code works unchanged
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-5.4",
messages=[{"role": "user", "content": "Hello"}]
)
# ^ This call is now automatically traced
TypeScript / Node.js
TypeScript Setup
npm install @traceloop/node-server-sdk
Initialize in Node.js
import * as traceloop from "@traceloop/node-server-sdk";
traceloop.initialize({ appName: "my_llm_app" });
// All OpenAI, Anthropic, etc. calls are now traced
SDKs also exist for Go (go-openllmetry) and Ruby (openllmetry-ruby). All emit standard OTLP, so the same backend configuration works regardless of language.
Exporting to Your Backend
By default, traces export to Traceloop's hosted platform. To send to your own backend, set the standard OpenTelemetry environment variables:
Export to Any OTLP Backend
export OTEL_EXPORTER_OTLP_ENDPOINT="https://your-collector:4318"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"
# Or in code:
# Traceloop.init(exporter=your_custom_exporter)
Supported Providers and Frameworks
OpenLLMetry ships 40+ instrumentation packages. Each is a separate pip/npm package that the SDK auto-detects and initializes.
LLM Providers (16)
| Provider | What Gets Traced |
|---|---|
| OpenAI / Azure OpenAI | Chat completions, embeddings, image generation, function calls |
| Anthropic | Messages, tool use, streaming responses |
| Google Gemini / Vertex AI | Generate content, embeddings, function calling |
| AWS Bedrock / SageMaker | Invoke model, converse, embeddings |
| Cohere | Chat, embed, rerank, classify |
| Mistral AI | Chat completions, embeddings |
| Groq | Chat completions with latency breakdown |
| Ollama | Local model inference |
| HuggingFace | Inference API calls |
| Together AI | Chat, completions, embeddings |
| Replicate | Model predictions |
| Aleph Alpha | Completions, embeddings, semantic search |
| IBM Watsonx | Foundation models, prompt tuning |
| NVIDIA NIM | Chat completions, embeddings |
| AI21 Labs | Completions, summarization |
| Writer | Palmyra model calls |
Frameworks (10+)
LangChain
Chains, agents, tools, retrievers. Full execution graph as nested spans.
LlamaIndex
Query engines, retrievers, response synthesizers, node parsing.
CrewAI
Agent tasks, crew execution, tool calls, delegation.
Haystack
Pipeline components, document stores, retrievers.
LiteLLM
Unified proxy calls across 100+ models with cost tracking.
LangGraph
Graph nodes, edges, state transitions in agentic workflows.
Vector Databases (7)
Chroma, Pinecone, Qdrant, Weaviate, Milvus, LanceDB, and Marqo. Each instrumentation captures query vectors, result counts, similarity scores, and latency.
Architecture: Instrumentation, Not Platform
OpenLLMetry sits between your application code and your observability backend. It does not store data. It does not provide a UI. It collects and exports.
Data Flow
Your App (OpenAI, Anthropic, LangChain, etc.)
|
v
OpenLLMetry Instrumentations (auto-patched at import)
|
v
OpenTelemetry SDK (spans, metrics, events)
|
v
OTLP Exporter
|
v
Any Backend: Datadog, Grafana Tempo, Honeycomb,
Jaeger, SigNoz, New Relic, Langfuse, Traceloop...
This architecture means OpenLLMetry is not competing with observability platforms. It feeds them. You can use OpenLLMetry with Langfuse, with Datadog, with a self-hosted Jaeger instance, or with all three simultaneously using OpenTelemetry's fan-out exporters.
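Fan-out is plain OpenTelemetry: attach one span processor per destination to the tracer provider. A configuration sketch using the standard OTel Python SDK (the endpoints are placeholders; it assumes the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages, and that the provider is set before instrumentation initializes):

```python
# Generic OpenTelemetry fan-out: one BatchSpanProcessor per backend.
# Endpoint URLs are placeholders; substitute your own collectors.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# Every span is exported to both backends simultaneously.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://tempo.example.com:4318/v1/traces"))
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://langfuse.example.com:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
```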
Privacy Controls
By default, OpenLLMetry does not log prompt or completion content, only metadata (model, tokens, latency). You can enable content logging with Traceloop.init(log_prompts=True) for debugging, or use the @traceloop.workflow decorator to selectively trace specific functions. This default-off approach avoids accidentally sending PII to external backends.
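In code, using the option named above (treat `log_prompts` as illustrative shorthand from this article and confirm the exact parameter name against the SDK docs before relying on it):

```python
# Configuration sketch only; verify the parameter name in the SDK docs.
from traceloop.sdk import Traceloop

# Default behavior: metadata only (model, tokens, latency).
# Opt in to prompt/completion content explicitly, e.g. while debugging:
Traceloop.init(app_name="my_llm_app", log_prompts=True)
```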
OpenLLMetry vs Langfuse vs LangSmith vs Helicone
These tools occupy different layers of the LLM observability stack. OpenLLMetry is an instrumentation library. Langfuse, LangSmith, and Helicone are observability platforms with UIs, storage, and analysis tools.
| | OpenLLMetry | Langfuse | LangSmith | Helicone |
|---|---|---|---|---|
| Type | Instrumentation library | Observability platform | Observability platform | Observability proxy |
| Open Source | Yes (Apache 2.0) | Yes (MIT, self-hostable) | No (proprietary) | Yes (Apache 2.0) |
| Integration Method | Auto-instrumentation (2 lines) | SDK or OTLP | SDK (best with LangChain) | HTTP proxy (URL swap) |
| Backend Flexibility | Any OTLP-compatible backend | Self-hosted or cloud | LangSmith cloud only | Helicone cloud or self-hosted |
| Prompt Management | No | Yes | Yes | No |
| Evaluation Tools | No | Yes (scoring, annotations) | Yes (datasets, evaluators) | No |
| Provider Coverage | 40+ auto-instrumentations | Via SDK or OTLP | Best with LangChain | Any provider via proxy |
| Free Tier | Unlimited (OSS). Traceloop hosted: 50K spans/mo | 50K events/mo | 5K traces/mo | 100K requests/mo |
| UI / Dashboard | No (use any backend) | Yes | Yes | Yes |
| GitHub Stars | 6.9k | 7.6k | N/A (closed source) | 3.2k |
Complementary, Not Competing
OpenLLMetry and Langfuse can work together. Langfuse accepts OTLP data, so you can use OpenLLMetry for instrumentation and Langfuse as the backend for visualization, prompt management, and evaluation. This gives you auto-instrumentation for 40+ providers (from OpenLLMetry) plus Langfuse's analysis UI.
When Each Tool Fits
OpenLLMetry
You already use Datadog/Grafana/Honeycomb and want LLM traces alongside your existing service traces. No new platform to learn.
Langfuse
You need a dedicated LLM observability platform with prompt management, evaluation, and scoring. Open source, self-hostable.
LangSmith
Your stack is primarily LangChain. LangSmith's integration is automatic and the debugging tools understand LangChain internals.
Helicone
You want the simplest possible setup. Swap your API base URL and all requests are logged. No SDK, no code changes.
Traceloop: From YC to ServiceNow
Traceloop was founded in 2023 by Nir Gazit (former chief architect at Fiverr, ML team lead at Google) and Gal Kleinman (senior data engineer at Fiverr). The company graduated from Y Combinator and raised $6.1M from Samsung NEXT, Ibex Investors, and Grand Ventures.
On March 2, 2026, ServiceNow acquired Traceloop for an estimated $60-80M. This was ServiceNow's third Israeli acquisition in a matter of months, following the $7.75B Armis Security deal. At acquisition, Traceloop's customer list included IBM, HiBob, Miro, and Dynatrace.
The OpenLLMetry project is Apache 2.0 licensed and continues independently of the acquisition. The open-source codebase has 105 contributors and 1,327 commits. Whether ServiceNow maintains the same investment in the open-source project remains an open question, but the Apache 2.0 license ensures the community can fork and maintain it regardless.
OpenTelemetry GenAI Semantic Conventions
OpenLLMetry is converging with a broader industry effort. The OpenTelemetry Generative AI Observability SIG (started April 2024) is defining official semantic conventions for LLM telemetry under the gen_ai.* namespace. These conventions standardize attribute names for prompts, completions, token counts, model identifiers, and tool calls across all providers.
Datadog already natively supports these conventions (OTel v1.37+). As the standard matures, OpenLLMetry's instrumentation packages are aligning their output to match, meaning traces emitted by OpenLLMetry will be parseable by any backend that implements the OTel GenAI spec.
Traceloop proposed donating OpenLLMetry's instrumentation code to the OpenTelemetry project directly. If accepted, OpenLLMetry's instrumentations would become official OTel packages, giving them the same maintenance guarantees as OTel's HTTP, gRPC, and database instrumentations.
What This Means for Adoption
The convergence between OpenLLMetry and OTel's GenAI SIG reduces risk for adopters. Even if OpenLLMetry's governance changes post-acquisition, the underlying semantic conventions are owned by the OpenTelemetry project (CNCF). Your traces will be compatible with the standard regardless of what happens to any single vendor.
When to Use OpenLLMetry
Existing OTel Infrastructure
You already run Datadog, Grafana, or Honeycomb for service observability. OpenLLMetry adds LLM traces to the same pipeline. No new vendor, no new dashboard.
Multi-Provider Applications
Your app calls OpenAI, Anthropic, and Cohere. OpenLLMetry normalizes all three into consistent spans. One schema for token counts, latency, and cost.
Agent Workflows
LangChain agents, CrewAI crews, or custom orchestration. OpenLLMetry traces the full execution graph: agent decisions, tool calls, retrieval, and LLM responses as nested spans.
Vendor Neutrality Required
You need to avoid lock-in to any single observability vendor. OpenLLMetry emits standard OTLP. Switch backends without rewriting instrumentation.
When to Look Elsewhere
- You want a turnkey platform: OpenLLMetry provides no UI, no storage, no evaluation tools. If you need those, use Langfuse or LangSmith directly.
- Proxy-based simplicity: Helicone requires zero code changes, just a URL swap. If you want the absolute minimum integration effort, a proxy approach is simpler than SDK instrumentation.
- LangChain-only stack: LangSmith's LangChain integration is deeper than OpenLLMetry's. If LangChain is your entire stack, LangSmith may provide better debugging.
Frequently Asked Questions
What is OpenLLMetry?
An open-source instrumentation library that extends OpenTelemetry to LLM applications. Built by Traceloop (acquired by ServiceNow, March 2026), it auto-instruments 40+ providers and frameworks including OpenAI, Anthropic, LangChain, and LlamaIndex. Apache 2.0 license, 6.9k GitHub stars.
How do I set up OpenLLMetry?
Python: pip install traceloop-sdk, then add Traceloop.init() to your entry point. TypeScript: npm install @traceloop/node-server-sdk and call traceloop.initialize(). Auto-instrumentation starts immediately. No code changes to your LLM calls.
Is OpenLLMetry free?
The library is free and Apache 2.0. Traceloop's hosted platform has a free tier (50K spans/month, 24-hour retention). You can also export to any free, self-hosted backend like Jaeger, Zipkin, or SigNoz with no cost.
How does OpenLLMetry compare to Langfuse?
Different layers. OpenLLMetry is instrumentation (collects traces). Langfuse is a platform (stores, visualizes, evaluates). They work together: OpenLLMetry instruments your app, Langfuse receives the OTLP data. Use OpenLLMetry alone if you already have Datadog or Grafana. Use Langfuse if you need prompt management and evaluation tools.
What happened to Traceloop?
ServiceNow acquired Traceloop on March 2, 2026 for an estimated $60-80M. The OpenLLMetry open-source project continues under Apache 2.0. Traceloop's customers included IBM, Miro, and Dynatrace. The acquisition was ServiceNow's third Israeli deal in a matter of months, following the $7.75B Armis purchase.
Related
Build Observable AI Agents with Morph
Morph powers the subagent layer coding agents depend on. Fast Apply merges edits at 10,500+ tok/s. WarpGrep searches codebases semantically. Both produce structured output that OpenTelemetry-based tools like OpenLLMetry can trace.