OpenLLMetry: Open-Source LLM Observability Built on OpenTelemetry

OpenLLMetry extends OpenTelemetry to LLM applications with 40+ auto-instrumentations for OpenAI, Anthropic, LangChain, and more. 6.9k GitHub stars, Apache 2.0, backend-agnostic. How it works and how it compares to Langfuse, LangSmith, and Helicone.

March 18, 2026

The Problem: Every LLM Provider Speaks a Different Language

Traditional services had this problem in 2012. Every database driver, HTTP client, and message queue emitted logs in its own format. Teams wrote custom parsers. Dashboards broke when libraries updated. OpenTelemetry fixed it by defining one standard for traces, metrics, and logs across all services.

LLM applications have the same problem today, but worse. A single application might call OpenAI for chat completions, Anthropic for tool use, Cohere for embeddings, and Pinecone for vector search. Each SDK returns different field names for the same concepts. Token counts, latencies, model identifiers, and cost data all live in different shapes.

Fragmented Telemetry

OpenAI reports prompt_tokens. Anthropic reports input_tokens. Bedrock wraps it in inputTokenCount. Three providers, three schemas, one metric.
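The mismatch is easy to see in code. Below is a sketch of the normalization shim teams end up writing by hand; the field names come from the providers' documented response shapes, but the helper function itself is hypothetical:

```python
# Hypothetical normalizer: three provider usage payloads, one metric.
def input_tokens(provider: str, usage: dict) -> int:
    """Extract the input-token count from a provider-specific usage payload."""
    if provider == "openai":
        return usage["prompt_tokens"]
    if provider == "anthropic":
        return usage["input_tokens"]
    if provider == "bedrock":
        return usage["inputTokenCount"]
    raise ValueError(f"unknown provider: {provider}")

print(input_tokens("openai", {"prompt_tokens": 42}))     # 42
print(input_tokens("bedrock", {"inputTokenCount": 42}))  # 42
```

Multiply this by latency, cost, and model identifiers, and across a dozen SDKs, and the maintenance burden becomes clear.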

No Standard Tracing

A LangChain agent calling OpenAI, querying Pinecone, then calling Anthropic produces no unified trace. Each step is isolated. You cannot see the full request path.

Backend Lock-In

Proprietary SDKs tie you to one observability vendor. Switch from Datadog to Grafana and you rewrite all instrumentation. OpenTelemetry-based tooling avoids this.

OpenLLMetry applies the OpenTelemetry model to this problem. One instrumentation layer, one trace format, any backend. Your application calls Traceloop.init() and every LLM interaction becomes a standard OpenTelemetry span with consistent attributes.

How OpenLLMetry Works

OpenLLMetry is a collection of OpenTelemetry instrumentation packages. Each package wraps a specific LLM provider or framework SDK, intercepting calls to capture spans, attributes, and events. The traceloop-sdk initializes all installed instrumentations automatically.

When your code calls openai.chat.completions.create(), the OpenAI instrumentation intercepts the request and creates an OpenTelemetry span with:

  • Model name and version (e.g., gpt-5.4)
  • Token counts: input tokens, output tokens, total tokens
  • Latency: time to first token, total duration
  • Prompts and completions (optionally, with privacy controls)
  • Cost: calculated from token counts and model pricing
  • Error details: rate limits, API failures, timeouts

These spans follow OpenTelemetry's gen_ai semantic conventions, so any OTel-compatible backend can parse them without custom configuration.
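Concretely, a chat-completion span carries attributes like the following. The key names follow the OTel gen_ai semantic conventions; the exact attribute set varies by instrumentation and convention version, so treat this as illustrative:

```python
# Illustrative gen_ai.* attributes on a single chat-completion span.
span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-5.4",
    "gen_ai.usage.input_tokens": 42,
    "gen_ai.usage.output_tokens": 128,
}

# Because the keys are identical across providers, one dashboard query
# can aggregate token usage over OpenAI, Anthropic, Bedrock, etc.
total = (span_attributes["gen_ai.usage.input_tokens"]
         + span_attributes["gen_ai.usage.output_tokens"])
print(total)  # 170
```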

Non-Intrusive by Design

OpenLLMetry patches SDK methods at import time. Your application code does not change. No decorators on every function, no wrapper classes around API clients. Add two lines at startup and traces flow automatically. You can add workflow annotations with decorators for richer context, but they are optional.
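The patching mechanism itself is ordinary monkey-patching, the same technique OpenTelemetry's own instrumentations use. A stripped-down sketch of the idea, where the client class and the span list are stand-ins rather than real SDK internals:

```python
import functools
import time

spans = []  # stand-in for the OpenTelemetry span exporter


class FakeClient:
    """Stand-in for a provider SDK client such as openai.OpenAI."""
    def create(self, model, messages):
        return {"model": model, "content": "Hello!"}


def instrument(cls, method_name):
    """Replace a method with a wrapper that records a span-like dict."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        spans.append({
            "name": f"{cls.__name__}.{method_name}",
            "model": kwargs.get("model"),
            "duration_s": time.perf_counter() - start,
        })
        return result

    setattr(cls, method_name, wrapper)


# Conceptually what Traceloop.init() does for every installed SDK:
instrument(FakeClient, "create")

FakeClient().create(model="gpt-5.4", messages=[{"role": "user", "content": "Hi"}])
print(spans[0]["name"])  # FakeClient.create
```

The calling code never changes; only the method object it resolves at call time does.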

Setup: Two Lines of Code

Python

Python Setup

pip install traceloop-sdk

Initialize OpenLLMetry

from traceloop.sdk import Traceloop

Traceloop.init(app_name="my_llm_app")

# Your existing code works unchanged
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}]
)
# ^ This call is now automatically traced

TypeScript / Node.js

TypeScript Setup

npm install @traceloop/node-server-sdk

Initialize in Node.js

import * as traceloop from "@traceloop/node-server-sdk";

traceloop.initialize({ appName: "my_llm_app" });

// All OpenAI, Anthropic, etc. calls are now traced

SDKs also exist for Go (go-openllmetry) and Ruby (openllmetry-ruby). All emit standard OTLP, so the same backend configuration works regardless of language.

Exporting to Your Backend

By default, traces export to Traceloop's hosted platform. To send to your own backend, set the standard OpenTelemetry environment variables:

Export to Any OTLP Backend

export OTEL_EXPORTER_OTLP_ENDPOINT="https://your-collector:4318"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"

# Or in code:
# Traceloop.init(exporter=your_custom_exporter)
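The in-code variant might look like this. This is a sketch: it assumes the opentelemetry-exporter-otlp package is installed, and the `exporter` argument name is taken from the snippet above; check the Traceloop docs for the current signature:

```python
# Sketch: point the SDK at your own collector in code instead of env vars.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from traceloop.sdk import Traceloop

Traceloop.init(
    app_name="my_llm_app",
    exporter=OTLPSpanExporter(
        endpoint="https://your-collector:4318/v1/traces",
        headers={"Authorization": "Bearer your-token"},
    ),
)
```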

Supported Providers and Frameworks

OpenLLMetry ships 40+ instrumentation packages. Each is a separate pip/npm package that the SDK auto-detects and initializes.

LLM Providers (16)

| Provider | What Gets Traced |
| --- | --- |
| OpenAI / Azure OpenAI | Chat completions, embeddings, image generation, function calls |
| Anthropic | Messages, tool use, streaming responses |
| Google Gemini / Vertex AI | Generate content, embeddings, function calling |
| AWS Bedrock / SageMaker | Invoke model, converse, embeddings |
| Cohere | Chat, embed, rerank, classify |
| Mistral AI | Chat completions, embeddings |
| Groq | Chat completions with latency breakdown |
| Ollama | Local model inference |
| HuggingFace | Inference API calls |
| Together AI | Chat, completions, embeddings |
| Replicate | Model predictions |
| Aleph Alpha | Completions, embeddings, semantic search |
| IBM Watsonx | Foundation models, prompt tuning |
| NVIDIA NIM | Chat completions, embeddings |
| AI21 Labs | Completions, summarization |
| Writer | Palmyra model calls |

Frameworks (10+)

LangChain

Chains, agents, tools, retrievers. Full execution graph as nested spans.

LlamaIndex

Query engines, retrievers, response synthesizers, node parsing.

CrewAI

Agent tasks, crew execution, tool calls, delegation.

Haystack

Pipeline components, document stores, retrievers.

LiteLLM

Unified proxy calls across 100+ models with cost tracking.

LangGraph

Graph nodes, edges, state transitions in agentic workflows.

Vector Databases (7)

Chroma, Pinecone, Qdrant, Weaviate, Milvus, LanceDB, and Marqo. Each instrumentation captures query vectors, result counts, similarity scores, and latency.

Architecture: Instrumentation, Not Platform

OpenLLMetry sits between your application code and your observability backend. It does not store data. It does not provide a UI. It collects and exports.

Data Flow

Your App (OpenAI, Anthropic, LangChain, etc.)
    |
    v
OpenLLMetry Instrumentations (auto-patched at import)
    |
    v
OpenTelemetry SDK (spans, metrics, events)
    |
    v
OTLP Exporter
    |
    v
Any Backend: Datadog, Grafana Tempo, Honeycomb,
Jaeger, SigNoz, New Relic, Langfuse, Traceloop...

This architecture means OpenLLMetry is not competing with observability platforms. It feeds them. You can use OpenLLMetry with Langfuse, with Datadog, with a self-hosted Jaeger instance, or with all three simultaneously using OpenTelemetry's fan-out exporters.
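Fan-out is plain OpenTelemetry: register one span processor per destination on the same tracer provider. A sketch using the standard OTel Python SDK (the endpoints are placeholders):

```python
# Send identical spans to two OTLP backends at once.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://datadog-agent:4318/v1/traces"))
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://jaeger:4318/v1/traces"))
)
# Every span recorded against this provider now reaches both backends;
# instrumentations simply use whichever tracer provider is active.
```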

At a glance: 24+ supported observability backends, 40+ auto-instrumentations.

Privacy Controls

By default, OpenLLMetry does not log prompt or completion content, only metadata (model, tokens, latency). You can enable content logging with Traceloop.init(log_prompts=True) for debugging, or use the @traceloop.workflow decorator to selectively trace specific functions. This default-off approach avoids accidentally sending PII to external backends.

OpenLLMetry vs Langfuse vs LangSmith vs Helicone

These tools occupy different layers of the LLM observability stack. OpenLLMetry is an instrumentation library. Langfuse, LangSmith, and Helicone are observability platforms with UIs, storage, and analysis tools.

| | OpenLLMetry | Langfuse | LangSmith | Helicone |
| --- | --- | --- | --- | --- |
| Type | Instrumentation library | Observability platform | Observability platform | Observability proxy |
| Open Source | Yes (Apache 2.0) | Yes (MIT, self-hostable) | No (proprietary) | Yes (Apache 2.0) |
| Integration Method | Auto-instrumentation (2 lines) | SDK or OTLP | SDK (best with LangChain) | HTTP proxy (URL swap) |
| Backend Flexibility | Any OTLP-compatible backend | Self-hosted or cloud | LangSmith cloud only | Helicone cloud or self-hosted |
| Prompt Management | No | Yes | Yes | No |
| Evaluation Tools | No | Yes (scoring, annotations) | Yes (datasets, evaluators) | No |
| Provider Coverage | 40+ auto-instrumentations | Via SDK or OTLP | Best with LangChain | Any provider via proxy |
| Free Tier | Unlimited (OSS); Traceloop hosted: 50K spans/mo | 50K events/mo | 5K traces/mo | 100K requests/mo |
| UI / Dashboard | No (use any backend) | Yes | Yes | Yes |
| GitHub Stars | 6.9k | 7.6k | N/A (closed source) | 3.2k |

Complementary, Not Competing

OpenLLMetry and Langfuse can work together. Langfuse accepts OTLP data, so you can use OpenLLMetry for instrumentation and Langfuse as the backend for visualization, prompt management, and evaluation. This gives you auto-instrumentation for 40+ providers (from OpenLLMetry) plus Langfuse's analysis UI.
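In practice the pairing is just the standard OTLP environment variables pointed at Langfuse's OpenTelemetry endpoint. The path and Basic-auth scheme below reflect Langfuse's documentation at the time of writing; verify them against the current docs before relying on this:

```shell
# Langfuse accepts OTLP over HTTP; auth is Basic with your project keys.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://cloud.langfuse.com/api/public/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic ${LANGFUSE_BASIC_AUTH}"
# where LANGFUSE_BASIC_AUTH = base64("<public-key>:<secret-key>")
```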

When Each Tool Fits

OpenLLMetry

You already use Datadog/Grafana/Honeycomb and want LLM traces alongside your existing service traces. No new platform to learn.

Langfuse

You need a dedicated LLM observability platform with prompt management, evaluation, and scoring. Open source, self-hostable.

LangSmith

Your stack is primarily LangChain. LangSmith's integration is automatic and the debugging tools understand LangChain internals.

Helicone

You want the simplest possible setup. Swap your API base URL and all requests are logged. No SDK, no code changes.

Traceloop: From YC to ServiceNow

Traceloop was founded in 2023 by Nir Gazit (former chief architect at Fiverr, ML team lead at Google) and Gal Kleinman (senior data engineer at Fiverr). The company graduated from Y Combinator and raised $6.1M from Samsung NEXT, Ibex Investors, and Grand Ventures.

On March 2, 2026, ServiceNow acquired Traceloop for an estimated $60-80M. This was ServiceNow's third Israeli acquisition in months, following the $7.75B Armis Security deal. At acquisition, Traceloop's customer list included IBM, HiBob, Miro, and Dynatrace.

At a glance: $6.1M seed funding raised, $60-80M ServiceNow acquisition, 6.9k GitHub stars.

The OpenLLMetry project is Apache 2.0 licensed and continues independently of the acquisition. The open-source codebase has 105 contributors and 1,327 commits. Whether ServiceNow maintains the same investment in the open-source project remains an open question, but the Apache 2.0 license ensures the community can fork and maintain it regardless.

OpenTelemetry GenAI Semantic Conventions

OpenLLMetry is converging with a broader industry effort. The OpenTelemetry Generative AI Observability SIG (started April 2024) is defining official semantic conventions for LLM telemetry under the gen_ai.* namespace. These conventions standardize attribute names for prompts, completions, token counts, model identifiers, and tool calls across all providers.

Datadog already natively supports these conventions (OTel v1.37+). As the standard matures, OpenLLMetry's instrumentation packages are aligning their output to match, meaning traces emitted by OpenLLMetry will be parseable by any backend that implements the OTel GenAI spec.

Traceloop proposed donating OpenLLMetry's instrumentation code to the OpenTelemetry project directly. If accepted, OpenLLMetry's instrumentations would become official OTel packages, giving them the same maintenance guarantees as OTel's HTTP, gRPC, and database instrumentations.

What This Means for Adoption

The convergence between OpenLLMetry and OTel's GenAI SIG reduces risk for adopters. Even if OpenLLMetry's governance changes post-acquisition, the underlying semantic conventions are owned by the OpenTelemetry project (CNCF). Your traces will be compatible with the standard regardless of what happens to any single vendor.

When to Use OpenLLMetry

Existing OTel Infrastructure

You already run Datadog, Grafana, or Honeycomb for service observability. OpenLLMetry adds LLM traces to the same pipeline. No new vendor, no new dashboard.

Multi-Provider Applications

Your app calls OpenAI, Anthropic, and Cohere. OpenLLMetry normalizes all three into consistent spans. One schema for token counts, latency, and cost.

Agent Workflows

LangChain agents, CrewAI crews, or custom orchestration. OpenLLMetry traces the full execution graph: agent decisions, tool calls, retrieval, and LLM responses as nested spans.

Vendor Neutrality Required

You need to avoid lock-in to any single observability vendor. OpenLLMetry emits standard OTLP. Switch backends without rewriting instrumentation.

When to Look Elsewhere

  • You want a turnkey platform: OpenLLMetry provides no UI, no storage, no evaluation tools. If you need those, use Langfuse or LangSmith directly.
  • Proxy-based simplicity: Helicone requires zero code changes, just a URL swap. If you want the absolute minimum integration effort, a proxy approach is simpler than SDK instrumentation.
  • LangChain-only stack: LangSmith's LangChain integration is deeper than OpenLLMetry's. If LangChain is your entire stack, LangSmith may provide better debugging.

Frequently Asked Questions

What is OpenLLMetry?

An open-source instrumentation library that extends OpenTelemetry to LLM applications. Built by Traceloop (acquired by ServiceNow, March 2026), it auto-instruments 40+ providers and frameworks including OpenAI, Anthropic, LangChain, and LlamaIndex. Apache 2.0 license, 6.9k GitHub stars.

How do I set up OpenLLMetry?

Python: pip install traceloop-sdk, then add Traceloop.init() to your entry point. TypeScript: npm install @traceloop/node-server-sdk and call traceloop.initialize(). Auto-instrumentation starts immediately. No code changes to your LLM calls.

Is OpenLLMetry free?

The library is free and Apache 2.0. Traceloop's hosted platform has a free tier (50K spans/month, 24-hour retention). You can also export to any free, self-hosted backend like Jaeger, Zipkin, or SigNoz with no cost.

How does OpenLLMetry compare to Langfuse?

Different layers. OpenLLMetry is instrumentation (collects traces). Langfuse is a platform (stores, visualizes, evaluates). They work together: OpenLLMetry instruments your app, Langfuse receives the OTLP data. Use OpenLLMetry alone if you already have Datadog or Grafana. Use Langfuse if you need prompt management and evaluation tools.

What happened to Traceloop?

ServiceNow acquired Traceloop on March 2, 2026 for an estimated $60-80M. The OpenLLMetry open-source project continues under Apache 2.0. Traceloop's customers included IBM, Miro, and Dynatrace. The acquisition was ServiceNow's third Israeli deal in months, following the $7.75B Armis purchase.

Related

Build Observable AI Agents with Morph

Morph powers the subagent layer coding agents depend on. Fast Apply merges edits at 10,500+ tok/s. WarpGrep searches codebases semantically. Both produce structured output that OpenTelemetry-based tools like OpenLLMetry can trace.