Quick Verdict: Helicone vs LangSmith
Bottom Line
Helicone is the faster path to LLM cost visibility. One URL change, and you're logging every request across any provider. LangSmith is the deeper evaluation platform, with annotation queues, LLM-as-judge scoring, and prompt versioning built for teams iterating on quality. Pick Helicone for observability with zero lock-in. Pick LangSmith for evaluation infrastructure within the LangChain ecosystem.
Feature Comparison: Helicone vs LangSmith
| Feature | Helicone | LangSmith |
|---|---|---|
| Integration method | Proxy (base URL swap) | SDK instrumentation |
| Open source | Yes (5.4K stars) | No (proprietary) |
| Self-hosting | Yes | Enterprise only |
| Free tier | 10K requests/mo | 5K traces/mo |
| Provider support | 100+ via unified gateway | Framework-dependent |
| Built-in caching | Yes (20-30% cost reduction) | No |
| LLM routing/failover | Yes (smart routing) | No |
| Evaluation framework | Basic | Advanced (LLM-as-judge, datasets, annotation queues) |
| Prompt management | Version control, A/B testing | Playground, versioning, Hub |
| Tracing depth | Request/response logging | Full execution traces with spans |
| LangChain integration | Via proxy | Native (first-party) |
| Rate limiting | Built-in | No |
| Architecture | Cloudflare Workers + ClickHouse | Managed SaaS |
| Latency overhead | 50-80ms (proxy) | SDK-side (no network hop) |
Integration: Proxy vs SDK
This is the fundamental architectural split. Helicone sits between your code and your LLM provider. LangSmith wraps your code from the inside.
Helicone: Proxy-Based Gateway
Change your base URL from api.openai.com to oai.helicone.ai (or the unified gateway endpoint). Every request flows through Helicone's edge network, gets logged, and forwards to your provider. No SDK imports, no code instrumentation. Works with OpenAI, Anthropic, Google, Bedrock, and 100+ providers through a single endpoint. Built on Cloudflare Workers for sub-100ms overhead.
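A minimal sketch of the swap, using only the standard library to show the shape of a proxied request. The `oai.helicone.ai` endpoint and `Helicone-Auth` header follow Helicone's documented pattern; verify both against current docs before relying on them.

```python
# The only change from a direct OpenAI call is the base URL plus one
# extra header identifying your Helicone account. No SDK required.

DIRECT_BASE = "https://api.openai.com/v1"
PROXY_BASE = "https://oai.helicone.ai/v1"

def helicone_request(path: str, openai_key: str, helicone_key: str) -> dict:
    """Build the URL and headers for a request routed through Helicone."""
    return {
        "url": f"{PROXY_BASE}{path}",
        "headers": {
            # Your provider key passes through to OpenAI unchanged.
            "Authorization": f"Bearer {openai_key}",
            # The extra header tells Helicone which account logs the request.
            "Helicone-Auth": f"Bearer {helicone_key}",
        },
    }

req = helicone_request("/chat/completions", "sk-...", "hk-...")
print(req["url"])  # https://oai.helicone.ai/v1/chat/completions
```

The same pattern works with any official SDK that accepts a custom base URL and default headers, which is why no code instrumentation is needed.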
LangSmith: SDK Instrumentation
Add the LangSmith SDK and set LANGCHAIN_TRACING_V2=true. Every LLM call, tool invocation, and chain step gets traced with full execution context. LangSmith sees inside your application logic, not just the request/response boundary. The tracing captures parent-child relationships between spans, so you can drill into exactly which step of a multi-step agent failed and why.
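To make the parent-child span idea concrete, here is a toy tracer in plain Python. This is not the LangSmith API, just an illustration of what SDK-side instrumentation records that a proxy cannot: each call knows which call invoked it.

```python
SPANS = []   # flat log of finished spans
_stack = []  # currently open spans (the live call path)

def traced(fn):
    """Record each call as a span with a pointer to its parent span."""
    def wrapper(*args, **kwargs):
        span = {"name": fn.__name__,
                "parent": _stack[-1]["name"] if _stack else None}
        _stack.append(span)
        try:
            return fn(*args, **kwargs)
        finally:
            _stack.pop()
            SPANS.append(span)
    return wrapper

@traced
def retrieve(query):
    return ["doc1", "doc2"]

@traced
def agent(query):
    docs = retrieve(query)       # nested call -> child span
    return f"answer from {len(docs)} docs"

agent("why is the build failing?")
for s in SPANS:
    print(s["name"], "<- parent:", s["parent"])
```

A proxy sees only the final LLM request; a tracer like this sees that `retrieve` ran inside `agent`, which is exactly the drill-down LangSmith's run explorer provides.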
What This Means in Practice
With Helicone, you get observability in 2 minutes. Literally. Swap one URL, deploy, and your dashboard populates. The limitation is that Helicone sees requests and responses, not your internal application logic. It knows that you called Claude Sonnet with 12K input tokens, but not that those tokens came from three retrieval steps and a prompt template.
LangSmith traces the full execution graph. It shows that your agent called a retrieval tool, got 8 documents back, injected them into a prompt template, sent that to Claude, parsed the response, and called a second tool. Setup takes longer (SDK integration, environment variables, trace instrumentation), but the debugging information is significantly richer.
Pricing
Both tools have free tiers. The cost models diverge sharply at scale.
| Tier | Helicone | LangSmith |
|---|---|---|
| Free tier | 10K requests/mo, 7-day retention, 1 seat | 5K traces/mo, 1 seat |
| Paid entry | Pro: $79/mo (unlimited seats) | Plus: $39/seat/mo |
| 5-person team cost | $79/mo (flat) | $195/mo ($39 x 5) |
| Data retention (paid) | 1 month (Pro), 3 months (Team) | 14 days (base), 400 days (extended at $5/1K traces) |
| Overage pricing | Usage-based (request volume) | $2.50/1K base traces, $5/1K extended traces |
| Team tier | $799/mo (SOC-2, HIPAA, 3mo retention) | Enterprise (custom pricing) |
| Self-host option | Yes (free, open source) | Enterprise only (custom pricing) |
Cost at Scale
For a 5-person team running 100K LLM requests per month, Helicone Pro costs $79 flat. LangSmith Plus costs $195 in seat fees alone, plus trace overage beyond 10K. If you need 400-day retention on LangSmith, extended traces add $5 per 1,000, so 100K traces with long retention runs to $500/month on top of seat costs.
Helicone's pricing advantage grows with team size because seats are unlimited on Pro. LangSmith's per-seat model means every additional developer adds $39/month. For a 20-person engineering org, that's $780/month in seat fees before any trace costs.
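The arithmetic above can be checked with a rough cost model. One assumption worth flagging: the figures in this article bill all extended-retention traces at $5/1K (the 10K included allowance applies to base traces only); confirm against LangSmith's current pricing page.

```python
def helicone_monthly(seats: int) -> int:
    # Pro tier: flat $79/mo, unlimited seats (request overage not modeled).
    return 79

def langsmith_monthly(seats: int, traces: int, extended: bool = False) -> float:
    # Plus tier: $39/seat/mo, 10K base traces included.
    if extended:
        # Assumption per the figures above: every extended-retention
        # trace bills at $5 per 1,000.
        overage = traces / 1000 * 5.00
    else:
        overage = max(traces - 10_000, 0) / 1000 * 2.50
    return seats * 39 + overage

print(helicone_monthly(5))                              # 79
print(langsmith_monthly(5, 100_000))                    # 420.0 (195 seats + 225 overage)
print(langsmith_monthly(5, 100_000, extended=True))     # 695.0 (195 seats + 500 extended)
```

At 20 seats the gap widens as the article describes: Helicone stays at $79 while LangSmith's seat fees alone reach $780 before any trace costs.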
Evaluation and Prompt Management
This is where LangSmith pulls ahead. Helicone is primarily an observability and gateway tool. LangSmith is an evaluation platform that happens to include observability.
LangSmith Evaluations
Dataset-driven testing with multiple evaluator types: human annotation queues, LLM-as-judge scoring against criteria you define, heuristic checks, and pairwise comparisons. You can test prompt versions against golden datasets before promoting to production. This is the feature that keeps teams on LangSmith even when they've moved away from LangChain for everything else.
Helicone Prompt Management
Prompt versioning with production data, A/B testing between prompt variants, and deployment through the gateway without code changes. Less evaluation depth than LangSmith, but the ability to swap prompts via the gateway (no redeploy) is genuinely useful for fast iteration in production.
If your primary need is "did my LLM costs spike?" and "which requests are slow?", Helicone gives you that out of the box. If your primary need is "is this prompt version better than the last one, measured against 500 test cases?", LangSmith's evaluation framework is substantially more mature.
Open-Source Alternatives: Langfuse and Phoenix
Helicone is open source, but it's not the only option. Two other platforms deserve consideration, especially if self-hosting is a requirement.
Langfuse
19K+ GitHub stars, MIT license. Full tracing, prompt management, evaluations, and datasets. SDK-based integration (closer to LangSmith's model than Helicone's proxy). Self-hosts via Docker Compose or Kubernetes. Cloud pricing starts free (50K units/mo), then $29/mo (Core), $199/mo (Pro). The most feature-complete open-source LLM observability platform.
Arize Phoenix
Built on OpenTelemetry, framework-agnostic. Supports OpenAI Agents SDK, Claude Agent SDK, LangGraph, Vercel AI SDK, and more. Strong evaluation capabilities for RAG pipelines. Runs anywhere: local machine, Jupyter notebook, Docker, or cloud. No feature gates on the open-source version. Best fit for teams already invested in OpenTelemetry infrastructure.
| Feature | Helicone | Langfuse | Phoenix |
|---|---|---|---|
| Integration | Proxy (URL swap) | SDK | OpenTelemetry |
| GitHub stars | 5.4K | 19K+ | 14K+ |
| License | Apache 2.0 | MIT | Apache 2.0 |
| Self-host | Yes | Yes (Docker/K8s) | Yes (Docker) |
| Cloud free tier | 10K req/mo | 50K units/mo | Yes (app.phoenix.arize.com) |
| Best for | Gateway + cost tracking | Full-stack observability | OTel-native tracing + RAG eval |
A common pattern among teams: run Helicone as the gateway for cost tracking and provider routing, then pair it with Langfuse or Phoenix for deeper tracing and evaluation. The tools complement each other rather than compete at the architectural level.

For Coding Agent Teams
If you're building or running coding agents, observability reveals two recurring problems that dominate your LLM spend.
Problem: Context Bloat
Coding agents stuff entire files into context to make edits. A 2,000-line file becomes 40K+ tokens of input for a 5-line change. Observability tools show you which requests have the worst input-to-output token ratios, identifying where context compression would save the most money.
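Finding the bloat is a one-liner once you export request logs. A sketch, assuming log entries with `input_tokens` and `output_tokens` fields (field names are illustrative; adapt to whatever your tool exports):

```python
# Rank requests by input-to-output token ratio: the highest ratios
# are the calls where context compression saves the most money.

requests = [
    {"id": "a", "input_tokens": 42_000, "output_tokens": 120},  # big file, 5-line edit
    {"id": "b", "input_tokens": 3_000,  "output_tokens": 900},
    {"id": "c", "input_tokens": 80_000, "output_tokens": 400},
]

worst = sorted(requests,
               key=lambda r: r["input_tokens"] / r["output_tokens"],
               reverse=True)
for r in worst:
    print(r["id"], round(r["input_tokens"] / r["output_tokens"], 1))
```

Request `a` tops the list at a 350:1 ratio, a classic signature of a whole file stuffed into context for a tiny edit.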
Solution: Morph Compact
Morph Compact compresses LLM context by 50-70% while preserving the information the model needs. When your observability dashboard shows a coding agent sending 80K tokens per request, Compact reduces that to 30K tokens. Same edit quality, 60% lower cost. Plug it in between your agent and your LLM provider.
Problem: Slow Code Edits
When a model generates a full-file rewrite to change 3 lines, the output tokens dominate latency and cost. Tracing tools show you the output token count per edit, making the waste visible. A 2,000-line file rewritten for a 3-line fix means 1,997 wasted output lines.
Solution: Morph Fast Apply
Fast Apply takes a model's edit intent and applies it to the original file in one pass, without rewriting unchanged lines. Instead of generating 2,000 lines of output, the model describes the change and Fast Apply executes it. Output tokens drop by 90%+. Latency drops proportionally.
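Back-of-envelope arithmetic for the full-rewrite problem, under two stated assumptions: roughly 10 tokens per line of code, and a 4x overhead for describing an edit versus the raw changed lines.

```python
TOKENS_PER_LINE = 10           # rough assumption for code
file_lines, changed_lines = 2_000, 3

# Full-file rewrite: the model re-emits every line.
full_rewrite = file_lines * TOKENS_PER_LINE

# Edit description: changed lines plus assumed 4x overhead for
# expressing intent and locating the change.
edit_only = changed_lines * TOKENS_PER_LINE * 4

savings = 1 - edit_only / full_rewrite
print(f"output tokens saved: {savings:.0%}")  # output tokens saved: 99%
```

Even with generous overhead assumptions, the savings on a 3-line change to a 2,000-line file clear the 90%+ figure comfortably; smaller files or larger edits narrow the gap.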
Observability Shows the Problem. Morph Fixes It.
Helicone, LangSmith, or any observability tool will show you that your coding agent spends too many tokens on context and too many tokens on output. Those are the two largest line items. Morph Compact addresses the input side. Morph Fast Apply addresses the output side. Together they typically reduce agent LLM costs by 50-70%.
When Helicone Wins
Multi-Provider Teams
If you use OpenAI for some tasks, Anthropic for others, and Gemini for cost-sensitive workloads, Helicone's unified gateway logs everything through one dashboard. One base URL, one cost breakdown. No per-provider SDK setup.
Fast Time-to-Value
Two minutes from signup to seeing your first logged request. No SDK installation, no code instrumentation, no environment variable configuration beyond the base URL. For teams that need cost visibility today, not after a sprint of integration work.
Cost-Sensitive Scaling
$79/month for unlimited seats. A 20-person team pays $79. A 100-person team pays $79. The flat pricing model means observability cost doesn't scale with headcount, only with request volume.
Self-Hosting and Data Control
Fully open source. Deploy within your own infrastructure using Cloudflare Workers, ClickHouse, and Kafka. No data leaves your network. Essential for regulated industries, government contractors, and security-conscious teams.
When LangSmith Wins
LangChain/LangGraph Workflows
If your stack is built on LangChain or LangGraph, LangSmith's tracing understands your chain internals. It surfaces retrieval steps, tool calls, and routing decisions in a visual run explorer purpose-built for that framework. No other tool traces LangChain this deeply.
Systematic Evaluation
LLM-as-judge scoring, annotation queues for human review, dataset-driven regression testing, pairwise prompt comparisons. If you're iterating on prompt quality with a rigorous methodology, LangSmith's eval framework is the most mature commercial option.
Prompt Playground
Test prompts against different models and datasets directly in the UI. Compare outputs side-by-side. Promote winning versions to production. The playground is where prompt engineering happens, and LangSmith's is the most feature-rich in the commercial LLM tooling space.
Long-Term Trace Retention
Extended traces with 400-day retention ($5/1K traces) for compliance, audit trails, and longitudinal analysis. Helicone's longest retention on paid plans is 3 months (Team tier). If you need to look back a year, LangSmith is one of the few platforms that supports it.
Frequently Asked Questions
Is Helicone or LangSmith better for LLM observability?
Helicone is better for fast integration, multi-provider cost tracking, and teams that want an open-source solution they can self-host. LangSmith is better for deep tracing within LangChain workflows and systematic evaluation with datasets and LLM-as-judge scoring. Most teams don't need both. Start with the problem you're solving: cost visibility (Helicone) or quality evaluation (LangSmith).
How much does Helicone cost?
Free for 10K requests/month. Pro is $79/month with unlimited seats, 1-month retention, and advanced gateway features. Team is $799/month with SOC-2, HIPAA, and 3-month retention. Enterprise is custom with forever retention and on-prem deployment. Startups under 2 years old with less than $5M funding get 50% off the first year.
How much does LangSmith cost?
Free for 5K traces/month with 1 seat. Plus is $39 per seat per month with 10K base traces included. Overage runs $2.50 per 1,000 base traces (14-day retention) or $5.00 per 1,000 extended traces (400-day retention). Enterprise pricing is custom with annual invoicing and self-hosting options.
Is Helicone open source?
Yes. Helicone is Apache 2.0 licensed with 5.4K GitHub stars. You can self-host using Cloudflare Workers, ClickHouse, and Kafka. The cloud version has processed over 2.1 billion requests, but you can run the entire stack in your own infrastructure. Helicone on GitHub.
Can I use LangSmith without LangChain?
Yes, via the LangSmith SDK. But the deepest value comes from LangChain/LangGraph integration, where tracing automatically captures chain internals. Without LangChain, you're manually instrumenting your code, and tools like Helicone (proxy) or Langfuse (SDK with no framework dependency) provide a better developer experience for non-LangChain stacks.
What is the best open-source LLM observability tool?
Langfuse has the largest open-source community (19K+ GitHub stars, MIT license) with the most complete feature set: tracing, prompt management, evaluations, and datasets. Helicone is best if you want proxy-based integration and an AI gateway. Arize Phoenix is best if you're already using OpenTelemetry. All three are free to self-host.
Optimize What You Measure with Morph
LLM observability shows you where tokens go. Morph Compact reduces input context by 50-70%. Fast Apply cuts output tokens by 90%+. Together they solve the two biggest cost problems your monitoring dashboard will surface.