There are now 40+ LLM frameworks. Most developers need two or three at most. The hard part is knowing which ones. This guide covers every major framework in 2026, grouped by what they actually do, with benchmarks and concrete recommendations for each use case.

The Framework Landscape
The LLM framework ecosystem has consolidated into four distinct categories. Understanding which category you need eliminates 80% of the decision.
Orchestration
Connect LLMs to data, tools, and APIs. Build RAG pipelines. Chain multiple steps. LangChain, LlamaIndex, Haystack.
Agent
Coordinate multi-step autonomous workflows. Multi-agent systems. CrewAI, Claude SDK, OpenAI SDK, Google ADK, LangGraph.
Optimization
Replace manual prompt engineering with programmatic tuning. DSPy, TextGrad.
Code-Specific
Handle coding primitives: fast-apply, code search, context compression. Morph SDK.
Most production applications combine one framework from each of two categories. A RAG chatbot uses LlamaIndex (orchestration) with no agent layer. A coding assistant uses CrewAI or Claude SDK (agent) plus Morph SDK (code-specific). Stacking three or more frameworks usually signals overengineering.
| Framework | Category | Language | GitHub Stars | Best For |
|---|---|---|---|---|
| LangChain | Orchestration | Python, JS | 100K+ | General LLM plumbing |
| LlamaIndex | Orchestration | Python, TS | 40K+ | RAG pipelines |
| Haystack | Orchestration | Python | 20K+ | Typed, auditable pipelines |
| CrewAI | Agent | Python | 44K+ | Multi-agent prototyping |
| LangGraph | Agent | Python, JS | 10K+ | Stateful agent workflows |
| Claude Agent SDK | Agent | Python, TS | New | MCP-native agents |
| OpenAI Agents SDK | Agent | Python | New | Simple handoff agents |
| Google ADK | Agent | Python | 15K+ | Multimodal agents |
| Semantic Kernel | Orchestration | C#, Python, Java | 27.5K | Enterprise .NET |
| DSPy | Optimization | Python | 20K+ | Prompt optimization |
| TextGrad | Optimization | Python | 2K+ | Text-based autodiff |
| Morph SDK | Code-specific | Python, TS | N/A | Fast-apply, code search |
Orchestration Frameworks
Orchestration frameworks handle the connective tissue between LLMs and everything else: vector stores, APIs, databases, document loaders, and output parsers. If your application retrieves data, processes it through an LLM, and returns structured output, you need an orchestration framework.
LangChain
LangChain is the most widely adopted LLM framework with 100K+ GitHub stars and 34.5 million monthly downloads (via LangGraph). Its modular chain-based architecture lets you compose prompts, tools, and memory into reusable pipelines.
The strength is ecosystem breadth. LangChain has integrations for nearly every vector store, document loader, and LLM provider. The weakness is abstraction overhead: 10ms framework latency and 2.4K tokens per call, the highest of any major framework. For simple use cases, the abstractions add complexity without proportional value.
Use LangChain when you need broad integrations and rapid prototyping. Skip it when your pipeline is straightforward enough that raw API calls with a vector store SDK would suffice.
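The chain pattern LangChain popularized is easy to see in miniature. This framework-free sketch composes a prompt template, a model call, and an output parser; every name here is illustrative rather than LangChain API, and the model is stubbed so the example runs without an API key:

```python
from typing import Callable

# A "chain" is just function composition: template -> model -> parser.
def make_chain(template: str,
               model: Callable[[str], str],
               parser: Callable[[str], dict]) -> Callable[[dict], dict]:
    def run(inputs: dict) -> dict:
        prompt = template.format(**inputs)   # prompt templating
        raw = model(prompt)                  # LLM call (stubbed below)
        return parser(raw)                   # structured output parsing
    return run

# Stub model so the example runs offline.
stub_model = lambda prompt: "ANSWER: Paris"
parse = lambda raw: {"answer": raw.removeprefix("ANSWER: ").strip()}

chain = make_chain("Q: {question}\nA:", stub_model, parse)
print(chain({"question": "Capital of France?"}))  # {'answer': 'Paris'}
```

Everything a chain framework adds (retries, streaming, tracing, swappable components) layers on top of this three-step composition.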
LlamaIndex
LlamaIndex treats retrieval as the core problem, not a side feature. It provides the deepest set of indexing strategies (tree, keyword, vector, knowledge graph), chunking algorithms, and retrieval methods of any framework. With 40K+ GitHub stars, it is the default for applications that need structured access to private data.
Framework overhead is moderate at 6ms and 1.6K tokens per call. Many production teams use LlamaIndex for ingestion and indexing while layering LangChain or LangGraph on top for orchestration. This combination is often the fastest route to a robust RAG system.
Use LlamaIndex when retrieval quality is the primary success metric. It is overkill if you only need simple document Q&A with a single vector store.
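What a vector index does can be shown without any framework. In this toy sketch, a bag-of-words Counter stands in for a real embedding model; nothing here is LlamaIndex API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "LangChain handles orchestration and tool integrations",
    "LlamaIndex specializes in retrieval and indexing",
    "Haystack builds typed auditable pipelines",
]
index = [(d, embed(d)) for d in docs]  # ingest: chunk + embed + store

def retrieve(query: str, k: int = 1):
    qv = embed(query)
    return sorted(index, key=lambda p: cosine(qv, p[1]), reverse=True)[:k]

print(retrieve("framework for retrieval and indexing")[0][0])
```

LlamaIndex's value is everything this sketch omits: chunking strategies, hybrid and graph retrieval, reranking, and ingestion at scale.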
Haystack
Haystack by deepset takes the most principled approach to framework design. Every component has typed inputs and outputs. Pipelines are directed acyclic graphs you can visualize, debug, and test node by node. It has the lowest token usage of any framework at 1.57K per call and 5.9ms overhead.
Enterprise teams in regulated industries prefer Haystack for its auditability and reproducibility. The tradeoff is a smaller ecosystem compared to LangChain and a steeper learning curve for simple tasks.
Use Haystack when you need production-grade, auditable NLP pipelines with typed contracts between components. Especially strong for document search and question answering in regulated verticals.
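Haystack's typed-contract idea can be sketched in plain Python. This illustration (not Haystack's API) checks that each component's output type matches the next component's input type before anything runs, in the spirit of how Haystack validates pipeline connections at build time:

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Pipeline:
    steps: list  # components run in order

    def validate(self):
        # Each step's return annotation must match the next step's input annotation.
        for a, b in zip(self.steps, self.steps[1:]):
            out = get_type_hints(a).get("return")
            inp = next(v for k, v in get_type_hints(b).items() if k != "return")
            if out is not inp:
                raise TypeError(f"{a.__name__} -> {b.__name__}: {out} != {inp}")

    def run(self, x):
        self.validate()   # fail fast on a miswired pipeline
        for step in self.steps:
            x = step(x)
        return x

def clean(text: str) -> str:   # typed component: str -> str
    return text.strip().lower()

def split(text: str) -> list:  # typed component: str -> list
    return text.split()

pipe = Pipeline(steps=[clean, split])
print(pipe.run("  Typed Pipelines  "))  # ['typed', 'pipelines']
```

Catching a type mismatch at build time instead of mid-run is exactly the auditability property regulated teams pay for.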
Semantic Kernel
Microsoft's Semantic Kernel (27.5K GitHub stars) is the go-to for .NET and Java teams. It includes built-in short-term and long-term memory, enabling context persistence across sessions. In late 2025, Microsoft merged it with AutoGen to create the Microsoft Agent Framework, adding multi-agent orchestration with session-based state management, type safety, and telemetry.
Use Semantic Kernel when your stack is C#, Java, or enterprise Microsoft. It covers both orchestration and agent patterns under one roof.
Agent Frameworks
Agent frameworks coordinate multi-step, autonomous workflows where the LLM decides what to do next. The market in March 2026 has settled into clear lanes. Each framework owns a distinct niche.
CrewAI
44K+ stars. 60M monthly executions. MCP + A2A protocol support. 60% of Fortune 500. Best for rapid multi-agent prototyping with the broadest interoperability.
LangGraph
State machine-based agent orchestration. Best persistence and checkpointing. 34.5M monthly downloads. Best for complex stateful workflows that need replay and debugging.
Claude Agent SDK
Anthropic's general-purpose agent runtime. MCP-native with in-process server model and lifecycle hooks. Extended thinking. Best for teams already on Claude.
OpenAI Agents SDK
Replaced Swarm in March 2025. Core abstraction: explicit handoffs between agents carrying conversation context. Simplest path from zero to working agent.
Google ADK
Powers agents inside Agentspace and Customer Engagement Suite. Now open-source. Parallel, sequential, and hierarchical composition. Model-agnostic via LiteLLM.
Microsoft Agent Framework
Semantic Kernel + AutoGen merged. Covers single- and multi-agent patterns with enterprise-grade session management, type safety, and telemetry. C#, Python, Java.
| Framework | Multi-Agent | Protocol Support | Persistence | Learning Curve |
|---|---|---|---|---|
| CrewAI | Yes (role-based) | MCP + A2A | Basic | Low |
| LangGraph | Yes (graph-based) | MCP | Advanced (checkpoints) | Medium-High |
| Claude Agent SDK | Yes | MCP (native) | Via MCP servers | Medium |
| OpenAI Agents SDK | Yes (handoffs) | Custom | Basic | Low |
| Google ADK | Yes (hierarchical) | A2A + MCP | Built-in | Medium |
| MS Agent Framework | Yes (AutoGen) | Custom | Session-based | High |
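The handoff pattern several of these SDKs center on is simple at its core: an agent either answers or passes the conversation, context included, to a specialist. A framework-free sketch with illustrative names (in a real SDK, the LLM makes the routing decision):

```python
def billing_agent(message: str, context: list):
    context.append(("billing", message))
    return ("billing", f"Refund started (after {len(context)} turns of context).")

def triage_agent(message: str, context: list):
    # Keyword routing stands in for an LLM deciding which agent should act.
    context.append(("triage", message))
    if "refund" in message:
        return billing_agent(message, context)  # handoff carries the context along
    return ("triage", "I can help with that directly.")

history = []
agent, reply = triage_agent("I need a refund for my order", history)
print(agent, "->", reply)
```

The key property is that `history` travels with the handoff, so the specialist sees everything the triage agent saw.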
Protocol convergence
Two protocols are emerging as standards: Anthropic's MCP (Model Context Protocol) for tool and resource access, and Google's A2A (Agent-to-Agent) for inter-agent communication. CrewAI supports both. Most frameworks support at least MCP. Choosing a framework with protocol support future-proofs your agent architecture.
Optimization Frameworks
Optimization frameworks replace the trial-and-error of manual prompt engineering with systematic, programmatic approaches. They treat prompts as parameters to be tuned, not strings to be hand-written.
DSPy
DSPy from Stanford is the leading optimization framework. Instead of writing prompts, you define modules with typed input/output signatures. DSPy's compiler then optimizes the prompts automatically through a process analogous to training a neural network, but at the prompt level.
It has the lowest framework overhead of any major framework at 3.53ms and 2.03K tokens per call. DSPy excels in research and experimental workflows where you need to iterate rapidly across model configurations, prompt strategies, and evaluation criteria.
Use DSPy when prompt quality is the bottleneck and you have a clear evaluation metric. It is less useful for open-ended creative tasks where there is no objective measure of output quality.
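The prompts-as-parameters idea can be illustrated without DSPy itself. This sketch (stub model, toy metric, none of it DSPy's API) scores candidate prompt instructions against an evaluation set and keeps the best, which is the essence of what DSPy's compiler automates:

```python
# Candidate "prompt parameters" an optimizer would search over.
candidates = [
    "Answer the question.",
    "Answer the question with a single word.",
    "Think step by step, then answer.",
]

def stub_model(instruction: str, question: str) -> str:
    # Stand-in for an LLM: here, only the terse instruction yields exact-match answers.
    return "Paris" if "single word" in instruction else "The answer is Paris."

dataset = [("Capital of France?", "Paris")]

def metric(pred: str, gold: str) -> float:
    return 1.0 if pred == gold else 0.0

def compile_prompt(candidates, dataset):
    # The "compilation" step: evaluate each candidate, keep the best scorer.
    scores = {
        c: sum(metric(stub_model(c, q), gold) for q, gold in dataset)
        for c in candidates
    }
    return max(scores, key=scores.get)

best = compile_prompt(candidates, dataset)
print(best)  # "Answer the question with a single word."
```

This is why a clear metric is a prerequisite: the search has nothing to optimize without one.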
TextGrad
TextGrad implements automatic differentiation via text. Where DSPy uses compilation, TextGrad uses LLM-generated feedback as "gradients" to iteratively improve prompts and outputs. Published in Nature, it follows PyTorch's syntax for familiarity.
Results: it improved GPT-4o zero-shot accuracy on Google-Proof QA from 51% to 55% and yielded a 20% relative performance gain on LeetCode-Hard coding problems. It is more experimental than DSPy but powerful for tasks where iterative refinement of outputs matters more than pipeline optimization.
Use TextGrad when you need to optimize individual outputs (not just prompts) through iterative refinement. Think code generation, molecule design, or treatment planning where each output can be scored and improved.
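TextGrad's loop of generate, critique, and revise looks like this in a framework-free sketch, where a stub critic and reviser stand in for the LLMs that produce and apply the textual "gradient":

```python
def critic(output: str) -> str:
    # Stand-in for an LLM critic returning textual feedback (the "gradient").
    return "handle float inputs" if "float" not in output else ""

def revise(output: str, feedback: str) -> str:
    # Stand-in for an LLM applying the feedback (the "optimizer step").
    if feedback == "handle float inputs":
        return output.replace("def add(a: int, b: int)",
                              "def add(a: float, b: float)")
    return output

draft = "def add(a: int, b: int): return a + b"
for _ in range(3):              # iterative refinement, a few steps at most
    feedback = critic(draft)    # "backward pass"
    if not feedback:
        break                   # converged: critic has no complaints
    draft = revise(draft, feedback)

print(draft)
```

Because the output itself is what gets scored and improved, this loop fits code generation and design tasks better than one-shot prompting.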
Code-Specific Tooling
General-purpose LLM frameworks handle orchestration and agent coordination. But coding agents have specific needs that none of them address: applying diffs to files, searching codebases efficiently, and compressing context so agents do not blow through token budgets reading irrelevant code.
Cognition (makers of Devin) measured that coding agents spend 60% of their time on search and file reading. This is the bottleneck, not model quality or orchestration logic. Anthropic's own multi-agent research showed 90% improvement when delegating specialized tasks to sub-agents with separate context windows.
Fast Apply
Apply code edits as diffs instead of rewriting entire files. 10,500 tok/s throughput. Reduces output tokens per edit by 70%+ compared to full-file rewrites.
WarpGrep
Parallel code search across repositories. 8 tool calls per turn, 4 turns, sub-6s. Returns only relevant line ranges, not full files, cutting input tokens by 60%.
Compact
Context compression for long coding sessions. Preserves semantic meaning while reducing token count. Prevents context rot in extended agent workflows.
The Morph SDK provides these primitives as API calls. They slot into any agent framework. If you are building with CrewAI, LangGraph, Claude SDK, or OpenAI Agents SDK, the Morph SDK handles the code-specific operations your agent framework does not.
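The output-token savings behind fast-apply come from emitting only the edit rather than rewriting the file. A toy comparison, with word counts standing in for tokens (the Morph API itself is not shown):

```python
file_lines = [f"line {i}: unchanged code" for i in range(200)]
full_file = "\n".join(file_lines)

# Full-file rewrite: the model re-emits every line to change one of them.
rewrite_tokens = len(full_file.split())

# Fast-apply style edit: emit only the changed hunk plus minimal context.
edit = "@@ line 42 @@\n- line 42: unchanged code\n+ line 42: fixed code"
edit_tokens = len(edit.split())

savings = 1 - edit_tokens / rewrite_tokens
print(f"{savings:.0%} fewer output tokens")
```

The larger the file relative to the edit, the bigger the savings, which is why the gap widens on real codebases.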
Token efficiency compounds
When a coding agent uses 60% fewer tokens per action, it effectively gets 2.5x the rate limit headroom without changing API tiers. For teams on Claude or OpenAI usage-based pricing, this translates directly to lower cost per task.
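The 2.5x figure is straightforward arithmetic: if each action uses 60% fewer tokens, the same token budget covers 1/(1 - 0.6) = 2.5 times as many actions.

```python
reduction = 0.60                # 60% fewer tokens per action
headroom = 1 / (1 - reduction)  # same budget covers this many times more actions
print(f"{headroom:.1f}x")       # 2.5x
```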
Framework Benchmarks
Framework overhead matters in production. Every millisecond of framework latency and every extra token adds up across thousands of requests. These benchmarks measure the framework's own cost, separate from the LLM inference time.
| Framework | Latency Overhead | Tokens Per Call | Notes |
|---|---|---|---|
| DSPy | 3.53ms | 2.03K | Lowest latency, compiled prompts |
| Haystack | 5.9ms | 1.57K | Lowest token usage, typed pipelines |
| LlamaIndex | 6.0ms | 1.60K | Efficient for RAG workloads |
| LangChain | 10.0ms | 2.40K | Highest overhead, broadest ecosystem |
| LangGraph | 14.0ms | 2.03K | Graph + state overhead |
For context: a typical LLM API call takes 500-3000ms, so framework overhead of 3.5-14ms is small in absolute terms. It matters more at high throughput (thousands of requests per minute) or in latency-sensitive streaming applications. Token usage is the more impactful difference, since extra tokens cost money on every call.
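A rough monthly-cost comparison using the table's per-call token figures; the per-1K-token price and call volume are assumptions for illustration, so substitute your provider's numbers:

```python
price_per_1k = 0.003        # assumed blended $/1K tokens (illustrative)
calls_per_month = 1_000_000  # assumed volume (illustrative)

def monthly_cost(tokens_per_call: int) -> float:
    return tokens_per_call / 1000 * price_per_1k * calls_per_month

langchain = monthly_cost(2400)  # 2.40K tokens/call from the table
haystack = monthly_cost(1570)   # 1.57K tokens/call from the table
print(f"LangChain ${langchain:,.0f} vs Haystack ${haystack:,.0f} "
      f"(delta ${langchain - haystack:,.0f}/month)")
```

At these assumed numbers the framework choice alone swings costs by thousands of dollars a month, while the latency difference would be invisible to users.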
Choosing the Right Stack
The right combination depends on what you are building, not on which framework has the most stars. Here are concrete recommendations for common use cases.
RAG Application
Start with LlamaIndex for data ingestion, indexing, and retrieval. If your workflows grow complex (conditional routing, multi-step processing, human-in-the-loop), add LangGraph for orchestration. In regulated industries, consider Haystack for its typed, auditable pipeline design.
Multi-Agent System
For rapid prototyping, CrewAI gets you from concept to working system fastest. For production workloads that need persistence, replay, and debugging, LangGraph has the strongest story. If you are locked into a provider, Claude Agent SDK or OpenAI Agents SDK minimize friction.
Coding Agent
Pair any agent framework with the Morph SDK for code-specific operations. Fast Apply handles edits. WarpGrep handles search. Compact handles context compression. These are the primitives no general-purpose framework provides.
Enterprise .NET/Java
Semantic Kernel / Microsoft Agent Framework is the clear choice. Native C# and Java support, enterprise-grade telemetry, and tight Azure integration. No other framework covers these languages with the same depth.
Research and Experimentation
DSPy for prompt optimization. TextGrad for output optimization. Both eliminate manual prompt tuning and let you iterate programmatically against evaluation metrics.
The two-framework rule
If you find yourself integrating three or more frameworks, step back. Each framework adds dependency risk, upgrade friction, and cognitive overhead for your team. Two frameworks covering different categories (e.g., orchestration + code-specific) is the sweet spot. Three signals overengineering.
Frequently Asked Questions
What are the most popular LLM frameworks in 2026?
By GitHub stars: LangChain (100K+), CrewAI (44K+), LlamaIndex (40K+), Semantic Kernel (27.5K), and DSPy (20K+). By monthly downloads, LangGraph leads with 34.5 million and CrewAI follows with 5.2 million. Popularity does not always correlate with fit for your use case.
What is the difference between LangChain and LlamaIndex?
LangChain is a general-purpose orchestration framework for connecting LLMs to tools, APIs, and databases. LlamaIndex specializes in retrieval-augmented generation with deeper indexing, chunking, and retrieval capabilities. Many teams use both: LlamaIndex for data ingestion and LangChain/LangGraph for workflow orchestration.
Which LLM framework has the lowest overhead?
DSPy at 3.53ms latency overhead per call. For token efficiency, Haystack leads at 1.57K tokens per call versus LangChain's 2.40K, roughly 53% more for LangChain. That difference compounds significantly at scale.
What is the best framework for building AI agents?
CrewAI for rapid multi-agent prototyping. LangGraph for complex stateful orchestration with persistence. OpenAI Agents SDK for the simplest path to a working agent. Claude Agent SDK for MCP-native development. Google ADK for multimodal workflows. The "best" depends on your requirements, not an absolute ranking.
What is DSPy and how does it differ from prompt engineering?
DSPy from Stanford replaces manual prompt writing with programmatic optimization. You define modules with typed signatures, and DSPy's compiler optimizes the prompts automatically. It is to prompt engineering what machine learning is to hand-coded rules: systematic, reproducible, and measurable.
Do I need a framework to build with LLMs?
No. For simple single-model API calls, the provider SDK (OpenAI, Anthropic, Google) is sufficient. Frameworks add value when you need RAG pipelines, multi-step agent workflows, tool orchestration, or multi-model routing. The overhead is only justified when the abstractions save more development time than they cost in complexity.
Which framework is best for coding agents?
No single framework covers coding agents end-to-end. Agent orchestration (LangGraph, CrewAI, or Claude SDK) handles coordination. Code-specific operations (fast-apply, parallel code search, context compression) require specialized tooling like the Morph SDK. Production coding agents combine both.
Build Coding Agents That Use 60% Fewer Tokens
The Morph SDK provides fast-apply, WarpGrep code search, and Compact context compression. Plug them into any agent framework.