# Morph Gets Faster: 10,500+ Tokens Per Second

We were already the fastest way to apply AI-generated code edits—processing at 2500 tokens per second while traditional approaches struggled at 200-400 tok/sec.
Now Morph v3 hits 10,500+ tokens per second, more than 4x our previous model, and delivers ~35% faster end-to-end task completion than search-and-replace approaches. We're launching with three model options:
| Model | Speed | Best For |
|---|---|---|
| morph-v3-fast | 10,500+ tok/sec | Most coding agents and files |
| morph-v3-large | 5,000+ tok/sec | Complex edits requiring maximum accuracy |
| auto | Variable | Automatically routes to the best model based on complexity — requests billed by model used |
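As a sketch of how an agent might target one of these models, here is a hypothetical request builder. The model names come from the table above; the message shape and the `<code>`/`<update>` tag format are assumptions for illustration, not the documented API.

```python
def build_apply_request(original_code: str, edit_snippet: str,
                        model: str = "auto") -> dict:
    """Package an edit-apply request for one of the three models (sketch)."""
    assert model in {"morph-v3-fast", "morph-v3-large", "auto"}
    return {
        "model": model,
        "messages": [{
            "role": "user",
            # Hypothetical payload: the full file plus the abbreviated edit.
            "content": f"<code>{original_code}</code><update>{edit_snippet}</update>",
        }],
    }

req = build_apply_request("def f():\n    return 1\n",
                          "def f():\n    return 2\n",
                          model="morph-v3-fast")
print(req["model"])  # morph-v3-fast
```

With `model="auto"`, the same payload lets the service pick the route and bill by the model actually used.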
## Why Speed Matters for Coding Agents
Speed and reliability are fundamental to every coding agent worth building. Without both, you get an unpredictable AI wrapped in sluggish tooling.
Every successful coding agent obsesses over latency because cognitive flow state has a half-life measured in seconds. When Claude suggests a perfect refactor but it takes 15 seconds to apply, you've lost context.
Search and replace requires a separate tool call for each chunk being edited—multiple edits mean multiple round trips. Morph handles all edits in one call, which means fewer tool calls and faster end-to-end task completion.
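The round-trip arithmetic can be made concrete with a toy latency model (the numbers below are illustrative, not benchmarks):

```python
def total_latency_ms(chunks: int, round_trip_ms: int,
                     apply_ms_per_chunk: int, single_call: bool) -> int:
    """Rough end-to-end latency for applying `chunks` edits."""
    if single_call:
        # Morph-style: one round trip covers every chunk.
        return round_trip_ms + chunks * apply_ms_per_chunk
    # Search-and-replace: every chunk pays its own round trip.
    return chunks * (round_trip_ms + apply_ms_per_chunk)

# Five edited chunks, 300 ms round trip, 50 ms of apply time per chunk:
print(total_latency_ms(5, 300, 50, single_call=True))   # 550
print(total_latency_ms(5, 300, 50, single_call=False))  # 1750
```

The gap widens with every additional chunk, since the per-call overhead is paid once instead of N times.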
## Technical Improvements
### Advanced Speculative Decoding
Our earlier models used unchanged code portions as draft tokens. v3 also speculates on semantic patterns within the changes themselves:
- Function signatures often remain unchanged when editing bodies
- Adding functionality follows predictable import patterns
- Code style creates strong priors for formatting
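The verify-accept loop at the heart of speculative decoding can be sketched on strings (heavily simplified; a real implementation operates on model logits, and `target_next` here is a stand-in for the full model's next-token choice):

```python
def speculative_accept(draft: list[str], target_next) -> list[str]:
    """Toy acceptance loop: keep draft tokens while the target model agrees.

    `target_next(prefix)` returns the target model's next token for a prefix.
    Every accepted draft token is a token the expensive model didn't have to
    generate serially.
    """
    accepted: list[str] = []
    for tok in draft:
        expected = target_next(accepted)
        if tok == expected:
            accepted.append(tok)       # draft verified for free
        else:
            accepted.append(expected)  # fall back to the target's token
            break
    return accepted

# An unchanged function signature is a near-perfect draft:
target_seq = ["def", "f", "(", ")", ":"]
pick_next = lambda prefix: target_seq[len(prefix)]
print(speculative_accept(["def", "f", "(", "x"], pick_next))
# ['def', 'f', '(', ')']
```

Patterns like stable signatures and predictable imports make the drafts right more often, which is what turns speculation into throughput.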
### Infrastructure Optimizations
- Fused attention operations that minimize GPU memory transfers
- Dynamic batching that adapts to real-time request patterns
- Custom CUDA kernels optimized for code transformation
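Dynamic batching is the easiest of these to illustrate. A minimal sketch of the policy (made-up thresholds; the production system adapts these in real time):

```python
def dynamic_batches(arrivals_ms: list[float],
                    max_batch: int = 8,
                    window_ms: float = 5.0) -> list[list[float]]:
    """Group request arrival times into GPU batches.

    Flush the current batch when it is full, or when the oldest request in it
    has waited longer than `window_ms`. Illustrative policy only.
    """
    batches: list[list[float]] = []
    current: list[float] = []
    for t in arrivals_ms:
        if current and (len(current) == max_batch or t - current[0] > window_ms):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches

# A burst of three requests, then a quiet gap, then two more:
print(dynamic_batches([0, 1, 2, 10, 11]))  # [[0, 1, 2], [10, 11]]
```

The trade-off is classic: larger batches raise GPU utilization, while the wait window caps the latency any single request pays for batching.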
### Intelligent Model Routing
Our auto model analyzes each request in real time to determine the optimal route:
- Simple edits (variable renames, imports, minor fixes) → morph-v3-fast
- Complex refactors (architectural changes, multi-function edits) → morph-v3-large
- Contextual analysis considers file size, change complexity, and historical patterns
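A first-order sketch of such a router, mirroring the bullets above (the thresholds and feature names are invented for illustration; the real system also weighs historical patterns):

```python
def route(edit_lines: int, files_touched: int, file_size_kb: float) -> str:
    """Hypothetical complexity heuristic for model routing."""
    complex_edit = (
        edit_lines > 50        # multi-function or architectural change
        or files_touched > 1   # cross-file refactor
        or file_size_kb > 64   # large file raises the cost of a miss
    )
    return "morph-v3-large" if complex_edit else "morph-v3-fast"

print(route(edit_lines=3, files_touched=1, file_size_kb=4.0))
# morph-v3-fast
print(route(edit_lines=120, files_touched=3, file_size_kb=40.0))
# morph-v3-large
```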
## Real Performance Impact
Model Selection Guidelines:
- Use morph-v3-fast for most coding tasks, agent workflows, and standard file edits
- Use morph-v3-large for complex refactors, architectural changes, or when maximum precision matters
- Use auto to let our system intelligently route based on edit complexity
Enterprise benefits:
- All edits in one call (vs. a separate tool call per chunk)
- Real-time streaming—see results before completion
- Intelligent model routing optimizes both speed and cost
- ~35% faster end-to-end task completion
## New Possibilities
At 10,500+ tok/sec, new workflows become possible:
- Speculative applies: Process changes before user clicks "apply"
- Multi-file refactors: Coordinate multiple files quickly
- Interactive editing: Real-time feedback as models generate suggestions
## Speed Thresholds
- Below 1,000 tok/sec: breaks flow state
- 1,000–2,000 tok/sec: fast enough to preserve flow
- 10,500+ tok/sec: the infrastructure becomes invisible
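These thresholds translate directly into wait time. A back-of-envelope calculation for re-emitting a ~2,000-token file (a few hundred lines of code):

```python
def apply_seconds(file_tokens: int, tok_per_sec: float) -> float:
    """Seconds to re-emit an entire file at a given apply speed."""
    return file_tokens / tok_per_sec

for speed in (400, 2500, 10500):
    print(f"{speed:>6} tok/sec -> {apply_seconds(2000, speed):.2f}s")
#    400 tok/sec -> 5.00s
#   2500 tok/sec -> 0.80s
#  10500 tok/sec -> 0.19s
```

Five seconds is a context switch; a fifth of a second is below the threshold where the wait registers at all.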
Ready for 10,500+ tok/sec edits? Get your Morph API key.
Building a coding agent? Contact us about dedicated instances.
Speed isn't everything. It's the only thing that makes everything else possible.

