There are now 40+ LLM frameworks. Most developers need two or three at most. The hard part is knowing which ones. This guide covers every major framework in 2026, grouped by what they actually do, with benchmarks and concrete recommendations for each use case.

The Framework Landscape
The LLM framework ecosystem has consolidated into four distinct categories. Understanding which category you need eliminates 80% of the decision.
Orchestration
Connect LLMs to data, tools, and APIs. Build RAG pipelines. Chain multiple steps. LangChain, LlamaIndex, Haystack.
Agent
Coordinate multi-step autonomous workflows. Multi-agent systems. CrewAI, Claude SDK, OpenAI SDK, Google ADK, LangGraph.
Optimization
Replace manual prompt engineering with programmatic tuning. DSPy, TextGrad.
Code-Specific
Handle coding primitives: fast-apply, code search, context compression. Morph SDK.
Most production applications combine one framework from each of two categories. A RAG chatbot uses LlamaIndex (orchestration) with no agent layer. A coding assistant uses CrewAI or Claude SDK (agent) plus Morph SDK (code-specific). Stacking three or more frameworks usually signals overengineering.
| Framework | Category | Language | GitHub Stars | Best For |
|---|---|---|---|---|
| LangChain | Orchestration | Python, JS | 100K+ | General LLM plumbing |
| LlamaIndex | Orchestration | Python, TS | 40K+ | RAG pipelines |
| Haystack | Orchestration | Python | 20K+ | Typed, auditable pipelines |
| CrewAI | Agent | Python | 44K+ | Multi-agent prototyping |
| LangGraph | Agent | Python, JS | 10K+ | Stateful agent workflows |
| Claude Agent SDK | Agent | Python, TS | New | MCP-native agents |
| OpenAI Agents SDK | Agent | Python | New | Simple handoff agents |
| Google ADK | Agent | Python | 15K+ | Multimodal agents |
| Semantic Kernel | Orchestration | C#, Python, Java | 27.5K | Enterprise .NET |
| DSPy | Optimization | Python | 20K+ | Prompt optimization |
| TextGrad | Optimization | Python | 2K+ | Text-based autodiff |
| Morph SDK | Code-specific | Python, TS | N/A | Fast-apply, code search |
Orchestration Frameworks
Orchestration frameworks handle the connective tissue between LLMs and everything else: vector stores, APIs, databases, document loaders, and output parsers. If your application retrieves data, processes it through an LLM, and returns structured output, you need an orchestration framework.
LangChain
LangChain is the most widely adopted LLM framework with 100K+ GitHub stars and 34.5 million monthly downloads (via LangGraph). Its modular chain-based architecture lets you compose prompts, tools, and memory into reusable pipelines.
The strength is ecosystem breadth. LangChain has integrations for nearly every vector store, document loader, and LLM provider. The weakness is abstraction overhead: 10ms framework latency and 2.4K tokens per call, the highest of any major framework. For simple use cases, the abstractions add complexity without proportional value.
Use LangChain when you need broad integrations and rapid prototyping. Skip it when your pipeline is straightforward enough that raw API calls with a vector store SDK would suffice.
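The chain pattern LangChain popularized is easy to see in miniature. This framework-free sketch composes a prompt template, a model call, and an output parser; every name here is illustrative rather than LangChain API, and the model is stubbed so the example runs without an API key:

```python
from typing import Callable

# A "chain" is just function composition: template -> model -> parser.
def make_chain(template: str,
               model: Callable[[str], str],
               parser: Callable[[str], dict]) -> Callable[[dict], dict]:
    def run(inputs: dict) -> dict:
        prompt = template.format(**inputs)   # prompt templating
        raw = model(prompt)                  # LLM call (stubbed below)
        return parser(raw)                   # structured output parsing
    return run

# Stub model so the example runs offline.
stub_model = lambda prompt: "ANSWER: Paris"
parse = lambda raw: {"answer": raw.removeprefix("ANSWER: ").strip()}

chain = make_chain("Q: {question}\nA:", stub_model, parse)
print(chain({"question": "Capital of France?"}))  # {'answer': 'Paris'}
```

Everything a chain framework adds (retries, streaming, tracing, swappable components) layers on top of this three-step composition.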
LlamaIndex
LlamaIndex treats retrieval as the core problem, not a side feature. It provides the deepest set of indexing strategies (tree, keyword, vector, knowledge graph), chunking algorithms, and retrieval methods of any framework. With 40K+ GitHub stars, it is the default for applications that need structured access to private data.
Framework overhead is moderate at 6ms and 1.6K tokens per call. Many production teams use LlamaIndex for ingestion and indexing while layering LangChain or LangGraph on top for orchestration. This combination is often the fastest route to a robust RAG system.
Use LlamaIndex when retrieval quality is the primary success metric. It is overkill if you only need simple document Q&A with a single vector store.
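What a vector index does can be shown without any framework. In this toy sketch, a bag-of-words Counter stands in for a real embedding model; nothing here is LlamaIndex API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "LangChain handles orchestration and tool integrations",
    "LlamaIndex specializes in retrieval and indexing",
    "Haystack builds typed auditable pipelines",
]
index = [(d, embed(d)) for d in docs]  # ingest: chunk + embed + store

def retrieve(query: str, k: int = 1):
    qv = embed(query)
    return sorted(index, key=lambda p: cosine(qv, p[1]), reverse=True)[:k]

print(retrieve("framework for retrieval and indexing")[0][0])
```

LlamaIndex's value is everything this sketch omits: chunking strategies, hybrid and graph retrieval, reranking, and ingestion at scale.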
Haystack
Haystack by deepset takes the most principled approach to framework design. Every component has typed inputs and outputs. Pipelines are directed acyclic graphs you can visualize, debug, and test node by node. It has the lowest token usage of any framework at 1.57K per call and 5.9ms overhead.
Enterprise teams in regulated industries prefer Haystack for its auditability and reproducibility. The tradeoff is a smaller ecosystem compared to LangChain and a steeper learning curve for simple tasks.
Use Haystack when you need production-grade, auditable NLP pipelines with typed contracts between components. Especially strong for document search and question answering in regulated verticals.
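Haystack's typed-contract idea can be sketched in plain Python. This illustration (not Haystack's API) checks that each component's output type matches the next component's input type before anything runs, in the spirit of how Haystack validates pipeline connections at build time:

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Pipeline:
    steps: list  # components run in order

    def validate(self):
        # Each step's return annotation must match the next step's input annotation.
        for a, b in zip(self.steps, self.steps[1:]):
            out = get_type_hints(a).get("return")
            inp = next(v for k, v in get_type_hints(b).items() if k != "return")
            if out is not inp:
                raise TypeError(f"{a.__name__} -> {b.__name__}: {out} != {inp}")

    def run(self, x):
        self.validate()   # fail fast on a miswired pipeline
        for step in self.steps:
            x = step(x)
        return x

def clean(text: str) -> str:   # typed component: str -> str
    return text.strip().lower()

def split(text: str) -> list:  # typed component: str -> list
    return text.split()

pipe = Pipeline(steps=[clean, split])
print(pipe.run("  Typed Pipelines  "))  # ['typed', 'pipelines']
```

Catching a type mismatch at build time instead of mid-run is exactly the auditability property regulated teams pay for.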
Semantic Kernel
Microsoft's Semantic Kernel (27.5K GitHub stars) is the go-to for .NET and Java teams. It includes built-in short-term and long-term memory, enabling context persistence across sessions. In late 2025, Microsoft merged it with AutoGen to create the Microsoft Agent Framework, adding multi-agent orchestration with session-based state management, type safety, and telemetry.
Use Semantic Kernel when your stack is C#, Java, or enterprise Microsoft. It covers both orchestration and agent patterns under one roof.
Agent Frameworks
Agent frameworks coordinate multi-step, autonomous workflows where the LLM decides what to do next. The market in March 2026 has settled into clear lanes. Each framework owns a distinct niche.
CrewAI
44K+ stars. 60M monthly executions. MCP + A2A protocol support. 60% of Fortune 500. Best for rapid multi-agent prototyping with the broadest interoperability.
LangGraph
State machine-based agent orchestration. Best persistence and checkpointing. 34.5M monthly downloads. Best for complex stateful workflows that need replay and debugging.
Claude Agent SDK
Anthropic's general-purpose agent runtime. MCP-native with in-process server model and lifecycle hooks. Extended thinking. Best for teams already on Claude.
OpenAI Agents SDK
Replaced Swarm in March 2025. Core abstraction: explicit handoffs between agents carrying conversation context. Simplest path from zero to working agent.
Google ADK
Powers agents inside Agentspace and Customer Engagement Suite. Now open-source. Parallel, sequential, and hierarchical composition. Model-agnostic via LiteLLM.
Microsoft Agent Framework
Semantic Kernel + AutoGen merged. Covers single- and multi-agent patterns with enterprise-grade session management, type safety, and telemetry. C#, Python, Java.
| Framework | Multi-Agent | Protocol Support | Persistence | Learning Curve |
|---|---|---|---|---|
| CrewAI | Yes (role-based) | MCP + A2A | Basic | Low |
| LangGraph | Yes (graph-based) | MCP | Advanced (checkpoints) | Medium-High |
| Claude Agent SDK | Yes | MCP (native) | Via MCP servers | Medium |
| OpenAI Agents SDK | Yes (handoffs) | Custom | Basic | Low |
| Google ADK | Yes (hierarchical) | A2A + MCP | Built-in | Medium |
| MS Agent Framework | Yes (AutoGen) | Custom | Session-based | High |
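The handoff pattern several of these SDKs center on is simple at its core: an agent either answers or passes the conversation, context included, to a specialist. A framework-free sketch with illustrative names (in a real SDK, the LLM makes the routing decision):

```python
def billing_agent(message: str, context: list):
    context.append(("billing", message))
    return ("billing", f"Refund started (after {len(context)} turns of context).")

def triage_agent(message: str, context: list):
    # Keyword routing stands in for an LLM deciding which agent should act.
    context.append(("triage", message))
    if "refund" in message:
        return billing_agent(message, context)  # handoff carries the context along
    return ("triage", "I can help with that directly.")

history = []
agent, reply = triage_agent("I need a refund for my order", history)
print(agent, "->", reply)
```

The key property is that `history` travels with the handoff, so the specialist sees everything the triage agent saw.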
Protocol convergence
Two protocols are emerging as standards: Anthropic's MCP (Model Context Protocol) for tool and resource access, and Google's A2A (Agent-to-Agent) for inter-agent communication. CrewAI supports both. Most frameworks support at least MCP. Choosing a framework with protocol support future-proofs your agent architecture.
Optimization Frameworks
Optimization frameworks replace the trial-and-error of manual prompt engineering with systematic, programmatic approaches. They treat prompts as parameters to be tuned, not strings to be hand-written.
DSPy
DSPy from Stanford is the leading optimization framework. Instead of writing prompts, you define modules with typed input/output signatures. DSPy's compiler then optimizes the prompts automatically through a process analogous to training a neural network, but at the prompt level.
It has the lowest framework overhead of any major framework at 3.53ms and 2.03K tokens per call. DSPy excels in research and experimental workflows where you need to iterate rapidly across model configurations, prompt strategies, and evaluation criteria.
Use DSPy when prompt quality is the bottleneck and you have a clear evaluation metric. It is less useful for open-ended creative tasks where there is no objective measure of output quality.
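The prompts-as-parameters idea can be illustrated without DSPy itself. This sketch (stub model, toy metric, none of it DSPy's API) scores candidate prompt instructions against an evaluation set and keeps the best, which is the essence of what DSPy's compiler automates:

```python
# Candidate "prompt parameters" an optimizer would search over.
candidates = [
    "Answer the question.",
    "Answer the question with a single word.",
    "Think step by step, then answer.",
]

def stub_model(instruction: str, question: str) -> str:
    # Stand-in for an LLM: here, only the terse instruction yields exact-match answers.
    return "Paris" if "single word" in instruction else "The answer is Paris."

dataset = [("Capital of France?", "Paris")]

def metric(pred: str, gold: str) -> float:
    return 1.0 if pred == gold else 0.0

def compile_prompt(candidates, dataset):
    # The "compilation" step: evaluate each candidate, keep the best scorer.
    scores = {
        c: sum(metric(stub_model(c, q), gold) for q, gold in dataset)
        for c in candidates
    }
    return max(scores, key=scores.get)

best = compile_prompt(candidates, dataset)
print(best)  # "Answer the question with a single word."
```

This is why a clear metric is a prerequisite: the search has nothing to optimize without one.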
TextGrad
TextGrad implements automatic differentiation via text. Where DSPy uses compilation, TextGrad uses LLM-generated feedback as "gradients" to iteratively improve prompts and outputs. Published in Nature, it follows PyTorch's syntax for familiarity.
Results: it improved GPT-4o zero-shot accuracy on Google-Proof QA from 51% to 55% and yielded a 20% relative performance gain on LeetCode-Hard coding problems. It is more experimental than DSPy but powerful for tasks where iterative refinement of outputs matters more than pipeline optimization.
Use TextGrad when you need to optimize individual outputs (not just prompts) through iterative refinement. Think code generation, molecule design, or treatment planning where each output can be scored and improved.
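TextGrad's loop of generate, critique, and revise looks like this in a framework-free sketch, where a stub critic and reviser stand in for the LLMs that produce and apply the textual "gradient":

```python
def critic(output: str) -> str:
    # Stand-in for an LLM critic returning textual feedback (the "gradient").
    return "handle float inputs" if "float" not in output else ""

def revise(output: str, feedback: str) -> str:
    # Stand-in for an LLM applying the feedback (the "optimizer step").
    if feedback == "handle float inputs":
        return output.replace("def add(a: int, b: int)",
                              "def add(a: float, b: float)")
    return output

draft = "def add(a: int, b: int): return a + b"
for _ in range(3):              # iterative refinement, a few steps at most
    feedback = critic(draft)    # "backward pass"
    if not feedback:
        break                   # converged: critic has no complaints
    draft = revise(draft, feedback)

print(draft)
```

Because the output itself is what gets scored and improved, this loop fits code generation and design tasks better than one-shot prompting.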
Code-Specific Tooling
General-purpose LLM frameworks handle orchestration and agent coordination. But coding agents have specific needs that none of them address: applying diffs to files, searching codebases efficiently, and compressing context so agents do not blow through token budgets reading irrelevant code.
Cognition (makers of Devin) measured that coding agents spend 60% of their time on search and file reading. This is the bottleneck, not model quality or orchestration logic. Anthropic's own multi-agent research showed 90% improvement when delegating specialized tasks to sub-agents with separate context windows.
Fast Apply
Apply code edits as diffs instead of rewriting entire files. 10,500 tok/s throughput. Reduces output tokens per edit by 70%+ compared to full-file rewrites.
WarpGrep
Parallel code search across repositories. 8 tool calls per turn, 4 turns, sub-6s. Returns only relevant line ranges, not full files, cutting input tokens by 60%.
Compact
Context compression for long coding sessions. Preserves semantic meaning while reducing token count. Prevents context rot in extended agent workflows.
The Morph SDK provides these primitives as API calls. They slot into any agent framework. If you are building with CrewAI, LangGraph, Claude SDK, or OpenAI Agents SDK, the Morph SDK handles the code-specific operations your agent framework does not.
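The output-token savings behind fast-apply come from emitting only the edit rather than rewriting the file. A toy comparison, with word counts standing in for tokens (the Morph API itself is not shown):

```python
file_lines = [f"line {i}: unchanged code" for i in range(200)]
full_file = "\n".join(file_lines)

# Full-file rewrite: the model re-emits every line to change one of them.
rewrite_tokens = len(full_file.split())

# Fast-apply style edit: emit only the changed hunk plus minimal context.
edit = "@@ line 42 @@\n- line 42: unchanged code\n+ line 42: fixed code"
edit_tokens = len(edit.split())

savings = 1 - edit_tokens / rewrite_tokens
print(f"{savings:.0%} fewer output tokens")
```

The larger the file relative to the edit, the bigger the savings, which is why the gap widens on real codebases.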
Token efficiency compounds
When a coding agent uses 60% fewer tokens per action, it effectively gets 2.5x the rate limit headroom without changing API tiers. For teams on Claude or OpenAI usage-based pricing, this translates directly to lower cost per task.
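The 2.5x figure is straightforward arithmetic: if each action uses 60% fewer tokens, the same token budget covers 1/(1 - 0.6) = 2.5 times as many actions.

```python
reduction = 0.60                # 60% fewer tokens per action
headroom = 1 / (1 - reduction)  # same budget covers this many times more actions
print(f"{headroom:.1f}x")       # 2.5x
```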
Framework Benchmarks
Framework overhead matters in production. Every millisecond of framework latency and every extra token adds up across thousands of requests. These benchmarks measure the framework's own cost, separate from the LLM inference time.
| Framework | Latency Overhead | Tokens Per Call | Notes |
|---|---|---|---|
| DSPy | 3.53ms | 2.03K | Lowest latency, compiled prompts |
| Haystack | 5.9ms | 1.57K | Lowest token usage, typed pipelines |
| LlamaIndex | 6.0ms | 1.60K | Efficient for RAG workloads |
| LangChain | 10.0ms | 2.40K | Highest overhead, broadest ecosystem |
| LangGraph | 14.0ms | 2.03K | Graph + state overhead |
For context: a typical LLM API call takes 500-3000ms, so framework overhead of 3.5-14ms is small in absolute terms. It matters more at high throughput (thousands of requests per minute) or in latency-sensitive streaming applications. Token usage is the more impactful difference, since extra tokens cost money on every call.
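A rough monthly-cost comparison using the table's per-call token figures; the per-1K-token price and call volume are assumptions for illustration, so substitute your provider's numbers:

```python
price_per_1k = 0.003        # assumed blended $/1K tokens (illustrative)
calls_per_month = 1_000_000  # assumed volume (illustrative)

def monthly_cost(tokens_per_call: int) -> float:
    return tokens_per_call / 1000 * price_per_1k * calls_per_month

langchain = monthly_cost(2400)  # 2.40K tokens/call from the table
haystack = monthly_cost(1570)   # 1.57K tokens/call from the table
print(f"LangChain ${langchain:,.0f} vs Haystack ${haystack:,.0f} "
      f"(delta ${langchain - haystack:,.0f}/month)")
```

At these assumed numbers the framework choice alone swings costs by thousands of dollars a month, while the latency difference would be invisible to users.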
Choosing the Right Stack
The right combination depends on what you are building, not on which framework has the most stars. Here are concrete recommendations for common use cases.
RAG Application
Start with LlamaIndex for data ingestion, indexing, and retrieval. If your workflows grow complex (conditional routing, multi-step processing, human-in-the-loop), add LangGraph for orchestration. In regulated industries, consider Haystack for its typed, auditable pipeline design.
Multi-Agent System
For rapid prototyping, CrewAI gets you from concept to working system fastest. For production workloads that need persistence, replay, and debugging, LangGraph has the strongest story. If you are locked into a provider, Claude Agent SDK or OpenAI Agents SDK minimize friction.
Coding Agent
Pair any agent framework with the Morph SDK for code-specific operations. Fast Apply handles edits. WarpGrep handles search. Compact handles context compression. These are the primitives no general-purpose framework provides.
Enterprise .NET/Java
Semantic Kernel / Microsoft Agent Framework is the clear choice. Native C# and Java support, enterprise-grade telemetry, and tight Azure integration. No other framework covers these languages with the same depth.
Research and Experimentation
DSPy for prompt optimization. TextGrad for output optimization. Both eliminate manual prompt tuning and let you iterate programmatically against evaluation metrics.
The two-framework rule
If you find yourself integrating three or more frameworks, step back. Each framework adds dependency risk, upgrade friction, and cognitive overhead for your team. Two frameworks covering different categories (e.g., orchestration + code-specific) is the sweet spot. Three signals overengineering.
Frequently Asked Questions
What are the most popular LLM frameworks in 2026?
By GitHub stars: LangChain (100K+), CrewAI (44K+), LlamaIndex (40K+), Semantic Kernel (27.5K), and DSPy (20K+). By monthly downloads, LangGraph leads with 34.5 million and CrewAI follows with 5.2 million. Popularity does not always correlate with fit for your use case.
What is the difference between LangChain and LlamaIndex?
LangChain is a general-purpose orchestration framework for connecting LLMs to tools, APIs, and databases. LlamaIndex specializes in retrieval-augmented generation with deeper indexing, chunking, and retrieval capabilities. Many teams use both: LlamaIndex for data ingestion and LangChain/LangGraph for workflow orchestration.
Which LLM framework has the lowest overhead?
DSPy at 3.53ms latency overhead per call. For token efficiency, Haystack leads at 1.57K tokens per call versus LangChain's 2.40K, roughly 53% more for LangChain. That difference compounds significantly at scale.
What is the best framework for building AI agents?
CrewAI for rapid multi-agent prototyping. LangGraph for complex stateful orchestration with persistence. OpenAI Agents SDK for the simplest path to a working agent. Claude Agent SDK for MCP-native development. Google ADK for multimodal workflows. The "best" depends on your requirements, not an absolute ranking.
What is DSPy and how does it differ from prompt engineering?
DSPy from Stanford replaces manual prompt writing with programmatic optimization. You define modules with typed signatures, and DSPy's compiler optimizes the prompts automatically. It is to prompt engineering what machine learning is to hand-coded rules: systematic, reproducible, and measurable.
Do I need a framework to build with LLMs?
No. For simple single-model API calls, the provider SDK (OpenAI, Anthropic, Google) is sufficient. Frameworks add value when you need RAG pipelines, multi-step agent workflows, tool orchestration, or multi-model routing. The overhead is only justified when the abstractions save more development time than they cost in complexity.
Which framework is best for coding agents?
No single framework covers coding agents end-to-end. Agent orchestration (LangGraph, CrewAI, or Claude SDK) handles coordination. Code-specific operations (fast-apply, parallel code search, context compression) require specialized tooling like the Morph SDK. Production coding agents combine both.
Build Coding Agents That Use 60% Fewer Tokens
The Morph SDK provides fast-apply, WarpGrep code search, and Compact context compression. Plug them into any agent framework.