---
title: "Morph Gets Faster: 10,500+ Tokens Per Second"
url: "https://www.morphllm.com/blog/morph-gets-faster"
description: "We were already the fastest way to apply AI code edits. Now we're ~35% faster end-to-end compared to search-and-replace."
date: "2025-06-15"
author: "Tejas Bhakta"
---
# Morph Gets Faster: 10,500+ Tokens Per Second

![Speed comparison showing Morph v3-fast at 10,500+ tokens per second](/images/speed.png)

We were already the fastest way to apply AI-generated code edits, processing at 2,500 tokens per second while traditional approaches struggled at 200-400 tok/sec.

Now **Morph v3 hits 10,500+ tokens per second**, roughly 4x our previous model, delivering ~35% faster end-to-end task completion compared to search-and-replace approaches.

We're launching with three model options:

| Model | Speed | Best For |
|-------|-------|----------|
| **morph-v3-fast** | 10,500+ tok/sec | Most coding agents and files |
| **morph-v3-large** | 5,000+ tok/sec | Complex edits requiring maximum accuracy |
| **auto** | Variable | Automatically routes to the best model based on complexity — requests billed by model used |
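Picking a model is a one-line change in the request. Below is a minimal sketch of packing an edit into a single chat-style request to an apply model; the message format, tag names, and field layout here are illustrative assumptions, not the documented Morph API.

```python
# Sketch: build a single-request payload for an apply model.
# The <instruction>/<code>/<update> tagging is an assumed convention
# for illustration -- consult the actual API docs for the real format.

def build_apply_request(model: str, instruction: str,
                        original: str, update: str) -> dict:
    """Pack the whole edit into one chat completion request."""
    content = (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{original}</code>\n"
        f"<update>{update}</update>"
    )
    return {
        "model": model,  # "morph-v3-fast", "morph-v3-large", or "auto"
        "messages": [{"role": "user", "content": content}],
    }

request = build_apply_request(
    model="morph-v3-fast",
    instruction="Rename the helper",
    original="def old_name():\n    pass",
    update="def new_name():\n    pass",
)
```

Because routing is just the `model` field, switching between `morph-v3-fast`, `morph-v3-large`, and `auto` requires no other changes to the request.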

## Why Speed Matters for Coding Agents

**Speed and reliability are fundamental to every coding agent worth building.** Without both, you get unpredictable AI with sluggish tools.

Every successful coding agent obsesses over latency because **cognitive flow state has a half-life measured in seconds**. When Claude suggests a perfect refactor but it takes 15 seconds to apply, you've lost context.

Search and replace requires a separate tool call for each chunk being edited—multiple edits mean multiple round trips. **Morph handles all edits in one call**, which means fewer tool calls and faster end-to-end task completion.
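The round-trip math makes the difference concrete. A back-of-the-envelope sketch (the latency numbers are invented for illustration, not benchmarks):

```python
# Search-and-replace pays one model round trip per edited chunk;
# a single apply call pays the round trip once.

def total_latency(round_trip_s: float, apply_s: float, tool_calls: int) -> float:
    """Total wall-clock seconds for an edit spread across N tool calls."""
    return tool_calls * (round_trip_s + apply_s)

# Five edited chunks, 0.5 s round trip, 0.2 s apply work per call:
search_replace = total_latency(0.5, 0.2, tool_calls=5)  # 3.5 s
single_apply = total_latency(0.5, 0.2, tool_calls=1)    # 0.7 s
```

The per-call overhead dominates as the number of edited chunks grows, which is why collapsing everything into one call pays off even before raw throughput enters the picture.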

## Technical Improvements

### Advanced Speculative Decoding
Our first-generation decoder used unchanged code portions as drafts. The new decoder also speculates on semantic patterns within the changes themselves:

- Function signatures often remain unchanged when editing bodies
- Adding functionality follows predictable import patterns  
- Code style creates strong priors for formatting
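The draft-and-verify idea behind speculative decoding can be shown with a toy example: tokens drafted from a cheap source (here, the unchanged original code) are accepted for free as long as they match what the target model would have emitted, and generation only slows down at the first mismatch.

```python
# Toy illustration of draft verification in speculative decoding.
# Real systems verify drafts against model logits in a single forward
# pass; this sketch just compares token sequences.

def accepted_prefix(draft: list[str], target: list[str]) -> int:
    """Count drafted tokens accepted before the first mismatch."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n

draft  = ["def", "add", "(", "a", ",", "b", ")", ":"]
target = ["def", "add", "(", "x", ",", "y", ")", ":"]
accepted_prefix(draft, target)  # 3 tokens accepted before the mismatch
```

The better the drafts (unchanged signatures, predictable imports, consistent formatting), the longer the accepted prefixes, and the more output tokens land per verification step.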

### Infrastructure Optimizations
- Fused attention operations that minimize GPU memory transfers
- Dynamic batching that adapts to real-time request patterns
- Custom CUDA kernels optimized for code transformation

### Intelligent Model Routing
Our **auto** model analyzes each request in real-time to determine optimal routing:

- **Simple edits** (variable renames, imports, minor fixes) → morph-v3-fast
- **Complex refactors** (architectural changes, multi-function edits) → morph-v3-large
- **Contextual analysis** considers file size, change complexity, and historical patterns
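A routing heuristic in the spirit of the bullets above might look like the sketch below. The signals and thresholds here are invented for illustration; the production router uses richer contextual analysis than two hard-coded cutoffs.

```python
# Hypothetical router: cheap signals pick the model.
# Thresholds are illustrative, not the actual routing logic.

def route(edit_tokens: int, functions_touched: int) -> str:
    """Send simple edits to the fast model, complex ones to the large model."""
    if edit_tokens < 200 and functions_touched <= 1:
        return "morph-v3-fast"
    return "morph-v3-large"

route(40, 1)   # variable rename, import fix -> "morph-v3-fast"
route(800, 5)  # multi-function refactor    -> "morph-v3-large"
```

Since requests are billed by the model actually used, a router like this optimizes cost as well as latency.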

## Real Performance Impact

**Model Selection Guidelines:**
- **Use morph-v3-fast** for most coding tasks, agent workflows, and standard file edits
- **Use morph-v3-large** for complex refactors, architectural changes, or when maximum precision matters
- **Use auto** to let our system intelligently route based on edit complexity

**Enterprise benefits:**
- All edits in one call (vs separate tool calls per chunk)
- Real-time streaming—see results before completion
- Intelligent model routing optimizes both speed and cost
- ~35% faster end-to-end task completion

## New Possibilities

At 10,500+ tok/sec, new workflows become possible:
- **Speculative applies**: Process changes before user clicks "apply"
- **Multi-file refactors**: Coordinate multiple files quickly
- **Interactive editing**: Real-time feedback as models generate suggestions

## Speed Thresholds

- **Below 1,000 tok/sec**: Breaks flow state
- **1,000-2,000 tok/sec**: Good enough for small edits
- **10,500+ tok/sec**: Infrastructure becomes invisible
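The wall-clock math behind these thresholds is simple throughput division. Taking a 1,500-token file as an illustrative size (network overhead ignored):

```python
# Seconds to apply an edit of a given size at a given throughput.

def apply_seconds(edit_tokens: int, tok_per_sec: float) -> float:
    return edit_tokens / tok_per_sec

apply_seconds(1500, 400)    # 3.75 s -- long enough to break flow
apply_seconds(1500, 2000)   # 0.75 s -- tolerable
apply_seconds(1500, 10500)  # ~0.14 s -- effectively imperceptible
```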


---

**Ready for 10,500+ tok/sec edits?** [Get your Morph API key](/dashboard/api-keys)

**Building a coding agent?** [Contact us](mailto:info@morphllm.com) about dedicated instances.

---

*Speed isn't everything. It's the only thing that makes everything else possible.*
