Morph Gets Faster: 10,500+ Tokens Per Second

We were already the fastest way to apply AI code edits. Now we're ~35% faster end-to-end compared to search-and-replace.

Tejas Bhakta

June 15, 2025 · 3 min read
Speed comparison showing Morph v3-fast at 10,500+ tokens per second

We were already the fastest way to apply AI-generated code edits—processing 2,500 tokens per second while traditional approaches struggled at 200–400 tok/sec.

Now Morph v3 hits 10,500+ tokens per second. That's more than four times our previous model's throughput, and it results in ~35% faster end-to-end task completion compared to search-and-replace approaches. We're launching with three model options:

| Model | Speed | Best For |
| --- | --- | --- |
| morph-v3-fast | 10,500+ tok/sec | Most coding agents and files |
| morph-v3-large | 5,000+ tok/sec | Complex edits requiring maximum accuracy |
| auto | Variable | Automatically routes to the best model based on complexity; requests billed by the model used |
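To make model selection concrete, here is a minimal sketch of packaging an edit into a single request. The payload shape is an assumption modeled on OpenAI-style chat APIs, and the `<instruction>`/`<code>`/`<update>` tagging is illustrative—check Morph's docs for the exact format.

```python
def build_apply_request(model: str, instruction: str, code: str, update: str) -> dict:
    """Package an instruction, the original file, and the edit snippet
    into one chat-completion-style request payload (shape assumed)."""
    content = (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{code}</code>\n"
        f"<update>{update}</update>"
    )
    return {
        "model": model,  # "morph-v3-fast", "morph-v3-large", or "auto"
        "messages": [{"role": "user", "content": content}],
    }

request = build_apply_request(
    model="morph-v3-fast",
    instruction="Rename the variable x to total",
    code="def add(a, b):\n    x = a + b\n    return x",
    update="    total = a + b\n    return total",
)
```

Because the whole edit travels in one request, switching models is a one-field change.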

Why Speed Matters for Coding Agents

Speed and reliability are fundamental to every coding agent worth building. Without both, you get unpredictable AI with sluggish tools.

Every successful coding agent obsesses over latency because cognitive flow state has a half-life measured in seconds. When Claude suggests a perfect refactor but it takes 15 seconds to apply, you've lost context.

Search and replace requires a separate tool call for each chunk being edited—multiple edits mean multiple round trips. Morph handles all edits in one call, which means fewer tool calls and faster end-to-end task completion.
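A toy latency model makes the round-trip cost visible. The numbers below are invented for illustration, not measurements:

```python
def total_seconds(calls: int, round_trip_s: float, work_s: float) -> float:
    """Each tool call pays a network round trip plus its processing time."""
    return calls * (round_trip_s + work_s)

# Hypothetical edit touching 6 chunks of a file:
search_replace = total_seconds(calls=6, round_trip_s=0.3, work_s=0.5)  # 6 round trips
morph_apply = total_seconds(calls=1, round_trip_s=0.3, work_s=1.0)     # one call
```

Even if the single apply call does more work per call, collapsing six round trips into one dominates the total.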

Technical Improvements

Advanced Speculative Decoding

Our earlier speculative decoding used unchanged code portions as drafts. v3 also speculates on semantic patterns within the changes themselves:

  • Function signatures often remain unchanged when editing bodies
  • Adding functionality follows predictable import patterns
  • Code style creates strong priors for formatting
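The core draft-and-verify idea can be sketched with toy string tokens standing in for model tokens (the real system verifies drafted tokens against model logits in parallel; this is only the accept-the-matching-prefix logic):

```python
def speculate(draft: list[str], target: list[str]) -> tuple[list[str], int]:
    """Accept the longest matching prefix of the draft for free, then let
    the target model resolve the first divergent token. Returns
    (emitted tokens, number of draft tokens accepted)."""
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted += 1
    # One "real" decode step resolves the first divergence.
    out = target[: accepted + 1]
    return out, accepted

original = ["def", "add", "(", "a", ",", "b", ")", ":"]
edited = ["def", "add", "(", "a", ",", "b", ",", "c", ")", ":"]
out, accepted = speculate(draft=original, target=edited)
```

Here the unchanged signature prefix is accepted without full decoding, which is why unchanged code is such a cheap source of draft tokens.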

Infrastructure Optimizations

  • Fused attention operations that minimize GPU memory transfers
  • Dynamic batching that adapts to real-time request patterns
  • Custom CUDA kernels optimized for code transformation

Intelligent Model Routing

Our auto model analyzes each request in real-time to determine optimal routing:

  • Simple edits (variable renames, imports, minor fixes) → morph-v3-fast
  • Complex refactors (architectural changes, multi-function edits) → morph-v3-large
  • Contextual analysis considers file size, change complexity, and historical patterns
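A routing heuristic in this spirit might look like the sketch below. The real auto router and its signals are internal; these thresholds and feature names are invented for illustration:

```python
def route(edit_tokens: int, files_touched: int, functions_touched: int) -> str:
    """Send small, local edits to the fast model and large or multi-function
    refactors to the large model (hypothetical thresholds)."""
    if files_touched > 1 or functions_touched > 2 or edit_tokens > 2000:
        return "morph-v3-large"
    return "morph-v3-fast"
```

A variable rename (a few dozen tokens, one function) routes fast; an architectural change spanning files routes large.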

Real Performance Impact

Model Selection Guidelines:

  • Use morph-v3-fast for most coding tasks, agent workflows, and standard file edits
  • Use morph-v3-large for complex refactors, architectural changes, or when maximum precision matters
  • Use auto to let our system intelligently route based on edit complexity

Enterprise benefits:

  • All edits in one call (vs separate tool calls per chunk)
  • Real-time streaming—see results before completion
  • Intelligent model routing optimizes both speed and cost
  • ~35% faster end-to-end task completion

New Possibilities

At 10,500+ tok/sec, new workflows become possible:

  • Speculative applies: Process changes before user clicks "apply"
  • Multi-file refactors: Coordinate multiple files quickly
  • Interactive editing: Real-time feedback as models generate suggestions

Speed Thresholds

  • Below 1000 tok/sec: Breaks flow state
  • 1000-2000 tok/sec: Good
  • 10,500+ tok/sec: Infrastructure becomes invisible (i.e. you can't tell it's there)
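To see what those tiers mean in wall-clock terms, consider a hypothetical 2,000-token edit (throughput figures per the thresholds above; the edit size is an assumption):

```python
def apply_seconds(edit_tokens: int, tok_per_sec: float) -> float:
    """Time to stream an edit at a given generation speed."""
    return edit_tokens / tok_per_sec

slow = apply_seconds(2000, 300)    # ~6.7 s: long enough to break flow
good = apply_seconds(2000, 1500)   # ~1.3 s: tolerable
fast = apply_seconds(2000, 10500)  # ~0.19 s: effectively invisible
```

Below a fifth of a second, the apply step stops registering as a wait at all.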

Ready for 10,500+ tok/sec edits? Get your Morph API key

Building a coding agent? Contact us about dedicated instances.


Speed isn't everything. It's the only thing that makes everything else possible.