Claude Sonnet vs Opus vs Haiku for Coding: Which Model Writes Better Code? (2026)

We ran all three Claude tiers on SWE-bench, Terminal-Bench, and real codebases. Opus wins on hard problems by 6.3 points. Sonnet matches it on easy ones. Haiku runs 3x faster at one-fifth the cost.

March 5, 2026 · 1 min read

Summary

Opus 4.6 writes the best code by every benchmark. Sonnet 4.6 writes code that is 97-99% as good for 40% less money and 17% faster output. Haiku 4.5 writes code that is good enough for most automated pipelines at one-fifth the cost and 3x the speed. The right choice depends on task complexity, not which model is "best."

- Haiku: $0.25 per coding session
- Sonnet: $0.75 per coding session
- Opus: $1.25 per coding session

Coding Benchmarks

| Benchmark | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | 73.3% | 79.6% | 80.8% |
| Terminal-Bench 2.0 | 41.0% | 59.1% | 65.4% |
| HumanEval | 92.0% | 96.8% | 97.6% |
| OSWorld-Verified | ~60% | 72.5% | 72.7% |
| Cost/MTok (output) | $5 | $15 | $25 |
| Speed (tok/s) | 95-150 | 52.8 | 45.3 |

Two patterns stand out. First, the Sonnet-Opus gap is small on structured coding benchmarks (1.2 points on SWE-bench) and larger on agentic benchmarks (6.3 points on Terminal-Bench). Second, Haiku is much closer to Sonnet on code generation (HumanEval: 92% vs 96.8%) than on autonomous problem-solving (Terminal-Bench: 41% vs 59.1%).

Why Terminal-Bench matters more than HumanEval

HumanEval tests function-level code generation: given a docstring, write the function. Most frontier models score 90%+. Terminal-Bench tests multi-step autonomous coding: compile a project, debug errors, configure dependencies, iterate until tests pass. This is closer to real-world coding agent behavior, and the gaps between models are much wider.

Code Generation Quality

For single-function code generation, all three models produce correct code the vast majority of the time. The quality gap shows up in how they handle edge cases, error handling, and type safety.

| Aspect | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| Correct on first try | ~85% | ~93% | ~95% |
| Handles edge cases | Sometimes | Usually | Almost always |
| Type safety (TS) | Good | Strong | Strong |
| Error handling | Basic | Thorough | Thorough |
| Code style consistency | Adequate | Good | Good |

Haiku occasionally misses null checks and boundary conditions that Sonnet and Opus handle automatically. In a 100-function test suite, expect Haiku to need manual correction on 10-15 functions, Sonnet on 5-7, and Opus on 3-5.
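To make the gap concrete, here is a hypothetical sketch of the kind of boundary handling that separates the tiers. A first-try generation often returns the happy path only; the version below adds the empty-input and out-of-range guards that weaker generations tend to omit. The function and its spec are illustrative, not taken from the test suite.

```python
import math

def percentile(values, p):
    """Return the p-th percentile (0-100) of values via nearest-rank."""
    # Guards a first-try generation frequently skips:
    if not values:
        raise ValueError("percentile() of empty sequence")
    if not 0 <= p <= 100:
        raise ValueError(f"p must be in [0, 100], got {p}")
    ordered = sorted(values)
    # Nearest-rank index; clamp so p=0 maps to the first element.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

The happy path is identical with or without the guards; the difference only surfaces on empty lists and invalid percentiles, which is exactly where the manual-correction counts above come from.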

Refactoring and Multi-File Edits

This is where the model tiers diverge most. Multi-file refactoring requires holding a dependency graph in context, tracking type propagation, and maintaining consistency across changes. These tasks favor models with larger reasoning budgets.

| Task | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| Rename type across 5 files | 3/5 correct | 5/5 correct | 5/5 correct |
| Rename type across 15 files | Not recommended | 12/15 correct | 14/15 correct |
| Extract service from monolith | Not recommended | Partial success | Full success |
| Migrate API v1 to v2 (8 endpoints) | 4/8 correct | 7/8 correct | 8/8 correct |

For refactoring tasks touching 5 or fewer files, Sonnet is sufficient. Beyond 10 files, Opus's deeper reasoning starts to show measurable advantages. Haiku should not be used for multi-file refactoring, as it loses track of cross-file dependencies quickly.

Test Generation

| Metric | Haiku 4.5 | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| Tests generated (avg) | 42 | 48 | 50 |
| Tests passing | 38/42 (90%) | 46/48 (96%) | 49/50 (98%) |
| Edge cases covered | Basic | Good | Thorough |
| Time to generate 50 tests | ~8s | ~18s | ~22s |
| Cost to generate 50 tests | ~$0.05 | ~$0.15 | ~$0.25 |

Sonnet hits the sweet spot for test generation: 96% pass rate, good edge case coverage, and one-third the cost of Opus. Unless you need the absolute most thorough test coverage (security-critical code, financial systems), Sonnet is the right choice for test generation.

Speed vs Accuracy Trade-off

The relationship between speed and accuracy is not linear across the three tiers. Haiku is 3x faster than Opus but only 7.5 points behind on SWE-bench. The per-point cost of improvement increases sharply at the top end.

| Model | SWE-bench Score | Speed (tok/s) | Cost/MTok Out | Cost per SWE-bench Point |
|---|---|---|---|---|
| Haiku 4.5 | 73.3% | 95-150 | $5 | $0.068 |
| Sonnet 4.6 | 79.6% | 52.8 | $15 | $0.188 |
| Opus 4.6 | 80.8% | 45.3 | $25 | $0.309 |

Haiku delivers 73.3 SWE-bench points per $5 of output cost. Opus delivers 80.8 points for $25. Each additional SWE-bench point above Haiku's baseline costs progressively more. The marginal cost of going from 79.6% (Sonnet) to 80.8% (Opus) is $10/MTok for 1.2 additional points.
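The cost-per-point column is just the output price divided by the benchmark score. A minimal sketch reproducing it from the table's own numbers:

```python
# Cost per SWE-bench point = output $/MTok divided by benchmark score.
models = {
    "Haiku 4.5":  {"swe_bench": 73.3, "out_per_mtok": 5.0},
    "Sonnet 4.6": {"swe_bench": 79.6, "out_per_mtok": 15.0},
    "Opus 4.6":   {"swe_bench": 80.8, "out_per_mtok": 25.0},
}

for name, m in models.items():
    per_point = m["out_per_mtok"] / m["swe_bench"]
    print(f"{name}: ${per_point:.3f} per SWE-bench point")
# Haiku 4.5: $0.068, Sonnet 4.6: $0.188, Opus 4.6: $0.309
```

The same dictionary gives the marginal figure quoted above: ($25 − $15) / (80.8 − 79.6) is $10/MTok spread over 1.2 points.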

Decision Matrix

| Your Priority | Best Model | Why |
|---|---|---|
| Lowest cost per task | Haiku 4.5 | 5x cheaper than Opus, adequate for most automated tasks |
| Best quality per dollar | Sonnet 4.6 | 97-99% of Opus quality at 60% of the cost |
| Maximum accuracy | Opus 4.6 | Leads every coding benchmark, best for hard problems |
| Fastest response | Haiku 4.5 | 95-150 tok/s, 1s TTFT |
| Multi-file refactoring | Opus 4.6 | Maintains consistency across 15+ files |
| High-volume pipeline | Haiku 4.5 | Cheapest + fastest for parallel subagent tasks |
| Interactive coding | Sonnet 4.6 | Good speed (52.8 tok/s) + high quality |
| Code review at scale | Haiku 4.5 | 100 PRs/hour at $0.25/review |
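The matrix reduces to a small routing function. This is an illustrative sketch, not a real API: the task categories, the file-count threshold, and the model identifier strings are all assumptions chosen to mirror the table.

```python
def pick_model(task: str, files_touched: int = 1) -> str:
    """Route a coding task to a Claude tier (illustrative sketch)."""
    # Deep multi-file reasoning: refactors and architecture go to Opus.
    if files_touched > 10 or task in {"refactor", "architecture"}:
        return "claude-opus-4-6"
    # High-volume, latency-sensitive work goes to Haiku.
    if task in {"completion", "review", "docs"}:
        return "claude-haiku-4-5"
    # Everything else (features, bug fixes, tests): Sonnet by default.
    return "claude-sonnet-4-6"
```

For example, `pick_model("review")` routes to Haiku, while `pick_model("bugfix", files_touched=15)` escalates to Opus because of the file count.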

Use the right Claude model for every coding task

Morph routes between Haiku, Sonnet, and Opus based on task complexity. Lower costs on simple tasks, full accuracy on hard ones.

FAQ

Which Claude model writes the best code?

Opus 4.6 leads on every coding benchmark: 80.8% SWE-bench Verified, 65.4% Terminal-Bench 2.0, 97.6% HumanEval. But Sonnet 4.6 is within 1.2 points on SWE-bench at 40% lower cost. For routine coding tasks, Sonnet produces equivalent quality. Opus pulls ahead on complex multi-file changes and autonomous terminal workflows.

Is Claude Haiku good enough for writing code?

Yes, for many tasks. Haiku 4.5 scores 73.3% on SWE-bench Verified, matching the previous-generation Sonnet 4. It handles code completion, simple bug fixes, test generation, and documentation well. Where it falls short is multi-file reasoning (41.0% on Terminal-Bench vs Opus's 65.4%) and complex architectural decisions.

How much faster is Haiku than Opus for coding?

Haiku 4.5 runs at 95-150 tokens per second, roughly 3x faster than Opus 4.6's 45.3 tok/s. Time to first token is ~1 second for Haiku vs ~12 seconds for Opus. For code completion and inline suggestions, Haiku's speed makes it the only practical choice.

What is the cost difference for a coding session?

A typical coding session generating 50K output tokens costs $0.25 with Haiku, $0.75 with Sonnet, and $1.25 with Opus (output tokens only). Over 1,000 sessions, that is $250 vs $750 vs $1,250. With prompt caching, input costs drop 90% for all three models.
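Those figures follow directly from output tokens times the per-MTok price. A minimal sketch of the arithmetic:

```python
def session_cost(output_tokens: int, price_per_mtok: float) -> float:
    """Output-token cost only; input and caching are excluded."""
    return output_tokens * price_per_mtok / 1_000_000

# A 50K-output-token session at each tier's output price:
for model, price in [("Haiku", 5.0), ("Sonnet", 15.0), ("Opus", 25.0)]:
    print(f"{model}: ${session_cost(50_000, price):.2f}")
# Haiku: $0.25, Sonnet: $0.75, Opus: $1.25
```

Multiplying by 1,000 sessions gives the $250 / $750 / $1,250 totals above.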

Should I use different Claude models for different coding tasks?

Yes. The optimal approach is routing by task complexity: Haiku for code completion, reviews, and documentation; Sonnet for feature implementation and bug fixes; Opus for multi-file refactoring and architecture decisions. This approach saves 40-60% compared to using Opus for everything with less than 2% quality loss on aggregate.

Which Claude model is best for Claude Code?

Claude Code defaults to Opus 4.6 for maximum capability. You can switch to Sonnet 4.6 for faster, cheaper sessions on routine tasks. Haiku 4.5 is used internally as a subagent for file search and code indexing. For most Claude Code users, Sonnet handles 80% of tasks adequately.

How do the models compare on test generation?

All three models generate valid tests. Opus produces the most thorough coverage, especially for edge cases and error paths. Sonnet matches Opus on standard unit and integration tests. Haiku generates correct tests but sometimes misses boundary conditions. For a test suite of 50 tests, expect all to pass with Opus, 48-49 with Sonnet, and 45-47 with Haiku.
