Best AI Coding Agent (2026): Ranked by Terminal-Bench, Price, and Source

Codex CLI tops Terminal-Bench 2.1 at 83.4%; Claude Code follows at 78.9%. OpenCode (172k stars, MIT) is the most-starred open source agent. Full ranked table: scores, exact prices, install commands.

June 9, 2026 · 1 min read

You want one answer: which AI coding agent is best. On the public Terminal-Bench 2.1 leaderboard, Codex CLI with GPT-5.5 is #1 at 83.4%, Claude Code with Opus 4.8 is #2 at 78.9%, and Gemini CLI with Gemini 3.1 Pro is at 70.7%. For openness, OpenCode (172,198 stars, MIT) is the most-starred open source agent. The best agent depends on whether you optimize for benchmark ceiling, cost, or running your own model. Below: 11 agents ranked on all three, with exact prices, install commands, and verified scores. Updated June 9, 2026.

Best AI coding agent by goal

Verified June 9, 2026. Scores from Terminal-Bench 2.1 (tbench.ai), prices from vendor pages.

Highest benchmark: Codex CLI + GPT-5.5. 83.4% on Terminal-Bench 2.1, #1.
Deepest reasoning: Claude Code + Opus 4.8. 78.9% on Terminal-Bench 2.1, 69.2% on SWE-bench Pro.
Most-starred open source: OpenCode (MIT). 172,198 stars, 75-plus providers.

Free, model-agnostic: OpenCode, Cline, Aider, Kilo Code, Gemini CLI (60 req/min, 1,000/day free). Best IDE flow: Cursor (Pro $20/mo). Cheapest paid default: GitHub Copilot Pro ($10/mo).

Terminal-Bench 2.1 leaderboard (the agent benchmark that matters)

Terminal-Bench measures an agent driving a real terminal to complete development tasks: editing files, running commands, fixing failures. It tests the agent and model together, which is the right unit, because the same model scores differently inside different agents. Scores below are from the public tbench.ai leaderboard as of June 9, 2026.

Terminal-Bench 2.1 (agent + model)

Percentage of terminal development tasks completed. Higher is better.

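| Agent + model | Leaderboard rank | Score |
|---|---|---|
| Codex CLI + GPT-5.5 | #1 | 83.4% |
| Claude Code + Opus 4.8 | #2 | 78.9% |
| Terminus 2 + GPT-5.5 | #3 | 78.2% |
| Terminus 2 + Gemini 3 Pro | #5 | 74.4% |
| Gemini CLI + Gemini 3.1 Pro | #6 | 70.7% |
| Claude Code + Opus 4.7 | #8 | 69.7% |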

Codex CLI with GPT-5.5 leads at 83.4%. Claude Code with Opus 4.8 is second at 78.9%, ahead of Opus 4.7 at 69.7%. Source: tbench.ai, June 9, 2026.
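To make the agent-plus-model point concrete, here is a minimal Python sketch that groups the leaderboard entries above and reports the spread when the same model runs in different agents, or the same agent runs different models. The names `ROWS` and `spread` are illustrative and not part of any Terminal-Bench tooling; the scores are copied from the entries above.

```python
from collections import defaultdict

# (agent, model, score in %) copied from the leaderboard entries above.
ROWS = [
    ("Codex CLI", "GPT-5.5", 83.4),
    ("Claude Code", "Opus 4.8", 78.9),
    ("Terminus 2", "GPT-5.5", 78.2),
    ("Terminus 2", "Gemini 3 Pro", 74.4),
    ("Gemini CLI", "Gemini 3.1 Pro", 70.7),
    ("Claude Code", "Opus 4.7", 69.7),
]


def spread(rows, key_index):
    """Group rows by column key_index (0 = agent, 1 = model) and return
    {key: max score - min score} for keys that appear more than once."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_index]].append(row[2])
    return {
        key: round(max(scores) - min(scores), 1)
        for key, scores in grouped.items()
        if len(scores) > 1
    }


print("Same model, different agent:", spread(ROWS, 1))
print("Same agent, different model:", spread(ROWS, 0))
```

On these six entries, GPT-5.5 differs by 5.2 points between Codex CLI and Terminus 2, while Claude Code differs by 9.2 points between Opus 4.8 and Opus 4.7. Both the harness and the model move the score.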

Two leaderboards disagree on the top model, and that is fine because they test different things. Terminal-Bench rewards driving a terminal end to end. SWE-bench Pro rewards fixing real GitHub issues. On SWE-bench Pro, Claude Opus 4.8 scores 69.2% (up 4.9 points from Opus 4.7's 64.3%), outperforming GPT-5.5 and Gemini 3.1 Pro. On the self-reported SWE-bench Verified leaderboard at llm-stats.com, Claude Opus 4.8 sits at 88.6% and Claude Opus 4.7 at 87.6%. Read benchmarks as the agent-plus-model pair, not the model alone.

Codex CLI + GPT-5.5, Terminal-Bench 2.1: 83.4%
Claude Code + Opus 4.8, Terminal-Bench 2.1: 78.9%
Opus 4.8, SWE-bench Pro: 69.2%
OpenCode GitHub stars (MIT): 172,198

Pricing, side by side

Open source agents are free as tools; you pay for model tokens. Subscription agents bundle model access into a plan with usage windows or credits. Prices verified from vendor pages on June 9, 2026.

| Agent | License / source | Entry price | How you pay for models |
|---|---|---|---|
| Claude Code | Proprietary (repo for issues) | Pro $17/mo annual or $20/mo; Max from $100/mo | Bundled. 5-hour rolling window plus weekly cap shared across claude.ai and Claude Code |
| OpenAI Codex CLI | Apache-2.0, 89,991 stars | ChatGPT Plus $20/mo; Pro from $100/mo (5x and 20x) | Bundled per 5-hour window, or BYO OpenAI API key at per-token rates |
| Cursor | Proprietary IDE | Hobby $0; Pro $20/mo | Pro includes ~$20 of API-rate usage; Pro+ $60, Ultra $200; separate Auto + Composer pool |
| GitHub Copilot | Proprietary | Free $0; Pro $10/mo | Credit-based since June 1, 2026; Pro = 1,500 credits ($15 value); 1 credit = $0.01 |
| OpenCode | MIT, 172,198 stars | Free | BYOK across 75-plus providers; ChatGPT Plus / Copilot / GitLab Duo usable as backends |
| Cline | Apache-2.0, 62,996 stars | Free | BYOK any provider, or local via Ollama / LM Studio; no markup |
| Aider | Apache-2.0, 45,945 stars | Free | BYOK per run, e.g. anthropic / deepseek / openai-compatible |
| Kilo Code | MIT, 19,968 stars | Free | Kilo Gateway $0/mo at exact provider rates, no markup; Kilo Pass $19/$49/$199/mo |
| Gemini CLI | Apache-2.0, 105,104 stars | Free | 60 req/min, 1,000/day free with personal Google account; or API key |
| Goose | Apache-2.0, 48,542 stars (AAIF) | Free | 15-plus providers; reuse Claude/ChatGPT/Gemini subs via ACP |
| Kiro | Proprietary (AWS) | Free 50 credits/mo; Pro $20/mo | Credit-based: Pro 1,000, Pro+ $40 = 2,000, Power $200 = 10,000; overage $0.04/credit |

GitHub Copilot moved to credit billing on June 1, 2026

Copilot replaced premium request units with GitHub AI Credits (1 credit = $0.01). Pro $10/mo = 1,500 credits ($15 value), Pro+ $39/mo = 7,000 credits ($70), the new Max $100/mo = 20,000 credits ($200). Basic code completions never consume credits and stay unlimited on paid plans. Note: as of June 2026, new sign-ups for Copilot Student, Pro, Pro+, and Max are paused while the billing change rolls out.

Claude Code

Best for reasoning depth on hard problems, in the terminal.

Anthropic's terminal-native agent. With Opus 4.8 it scores 78.9% on Terminal-Bench 2.1 (#2 overall), and Opus 4.8 leads SWE-bench Pro at 69.2%. Claude Code is proprietary: the anthropics/claude-code repo has 131,380 stars, but it exists for issues and docs and carries no open-source license.

Terminal-Bench 2.1 (Opus 4.8): 78.9%
SWE-bench Pro (Opus 4.8): 69.2%
Pro annual to Max, per month: $17-100+
GitHub stars (proprietary): 131,380

Install

Install Claude Code

# Native install (recommended)
curl -fsSL https://claude.ai/install.sh | bash      # macOS / Linux / WSL
# Windows PowerShell:
#   irm https://claude.ai/install.ps1 | iex

# Alternatives
brew install --cask claude-code
winget install Anthropic.ClaudeCode
npm install -g @anthropic-ai/claude-code           # Node 18+

# Add an MCP server
claude mcp add --transport http notion https://mcp.notion.com/mcp

Pricing and limits

Claude Pro is $17/mo billed annually ($200 up front) or $20/mo monthly and includes Claude Code; Max starts from $100/mo. Usage runs on a 5-hour rolling session window plus a weekly cap that covers all models over 7 days, and it is shared across claude.ai, Claude Desktop, and Claude Code on the same subscription. The free Claude.ai plan does not include Claude Code. It also runs via Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. System requirements: macOS 13+, Windows 10 1809+, Ubuntu 20.04+/Debian 10+/Alpine 3.19+, 4GB+ RAM.

Long sessions stay coherent with built-in auto-compaction. Compare directly at Claude Code vs Codex and Claude Code vs Cursor.

OpenAI Codex CLI

Best benchmark ceiling. #1 on Terminal-Bench 2.1.

OpenAI's open-source agent (openai/codex, 89,991 stars, Apache-2.0). With GPT-5.5 it tops Terminal-Bench 2.1 at 83.4%. Surfaces include the CLI, an IDE extension for VS Code / Cursor / Windsurf, the Codex Web cloud agent at chatgpt.com/codex, a desktop app, and iOS, with automatic code review and Slack integration in the cloud.

Terminal-Bench 2.1 (GPT-5.5): 83.4%
ChatGPT Plus, per month: $20
Credits per GPT-5.5 message: 5-45
GitHub stars (Apache-2.0): 89,991

Install

Install Codex CLI

curl -fsSL https://chatgpt.com/codex/install.sh | sh   # macOS / Linux
npm install -g @openai/codex
brew install --cask codex

codex            # run, then "Sign in with ChatGPT"
# /model         # switch model (GPT-5.4, GPT-5.3-Codex, others)

Pricing and limits

Codex requires a ChatGPT Plus, Pro, Business, Edu, or Enterprise account to sign in with ChatGPT. Per 5-hour window: Plus ($20/mo) allows 15 to 80 local messages, 5 cloud tasks, and 5 code reviews; Pro 5x allows 80 to 400; Pro 20x allows 300 to 1,600 (Pro from $100/mo). GPT-5.5 usage averages 5 to 45 credits per message. You can also auth with an OpenAI API key and pay per-token rates, with no cloud features in API-key mode.

Cursor

Best IDE flow, with a separate lower-cost agent pool.

A VS Code fork built around an agent loop. Individual plans: Pro $20/mo includes about $20 of API-rate usage; Pro+ $60/mo includes $70; Ultra $200/mo includes $400. Cursor's in-house Composer line (Composer 2.5) draws from a separate, more generous Auto + Composer pool designed for everyday agentic coding at lower cost than frontier API models. The Hobby tier is free with limited Agent requests and Tab completions, no card required. Paid plans add frontier models, MCPs, cloud agents, and Bugbot reviews on usage-based billing.

Hobby tier: $0
Pro, per month (~$20 usage): $20
Pro+, per month ($70 usage): $60
Ultra, per month ($400 usage): $200
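As a rough way to compare the paid tiers, the Python sketch below computes included usage per dollar and picks the smallest plan whose included usage covers an estimated monthly spend. `CURSOR_PLANS`, `usage_per_dollar`, and `smallest_plan_covering` are illustrative names. The sketch assumes the plan figures quoted above, treats Pro's "about $20" as exactly $20, and ignores the separate Auto + Composer pool and any usage-based billing beyond the included amount.

```python
# Plan -> (monthly price in USD, included API-rate usage in USD), from the text above.
# Pro's included usage is "about $20", so that figure is approximate.
CURSOR_PLANS = {"Pro": (20, 20), "Pro+": (60, 70), "Ultra": (200, 400)}


def usage_per_dollar(plan):
    """Included API-rate usage per subscription dollar, rounded to 2 places."""
    price, included = CURSOR_PLANS[plan]
    return round(included / price, 2)


def smallest_plan_covering(monthly_usage_usd):
    """Cheapest plan whose included usage covers the estimate, or None."""
    by_price = sorted(CURSOR_PLANS.items(), key=lambda item: item[1][0])
    for name, (_price, included) in by_price:
        if included >= monthly_usage_usd:
            return name
    return None


for plan in CURSOR_PLANS:
    print(plan, usage_per_dollar(plan))
print(smallest_plan_covering(50))
```

Included usage per dollar rises from 1.0 on Pro to 2.0 on Ultra, so the higher tiers only pay off if you expect to use most of the included amount.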

See Cursor alternatives and Cursor vs Windsurf vs Copilot.

GitHub Copilot

Cheapest paid default. Works in every major IDE plus a CLI.

Free $0 with limited chat and agent usage plus 2,000 code completions/mo. Pro $10/mo = 1,500 credits ($15 value), Pro+ $39/mo = 7,000 credits ($70), and the new Max $100/mo = 20,000 credits ($200). Credits are consumed by token usage at published per-model rates: Claude Opus 4.5 through 4.8 bill at $5 in / $25 out per 1M tokens, Sonnet 4 through 4.6 at $3/$15, GPT-5.5 at $5/$30, and Gemini 3.1 Pro at $2/$12. Basic code completions and next-edit suggestions never consume credits and stay unlimited on paid plans.
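The credit arithmetic can be sketched in a few lines of Python. This is an estimate under stated assumptions, not GitHub's billing logic: the keys in `RATES` are hypothetical labels rather than Copilot model identifiers, the token counts in the example are made up, and the sketch ignores any caching discounts or rounding rules GitHub may apply.

```python
from decimal import Decimal

CREDIT_USD = Decimal("0.01")  # 1 credit = $0.01, per the text above
PER_MILLION = Decimal(1_000_000)

# Hypothetical label -> (input USD, output USD) per 1M tokens, from the text above.
RATES = {
    "claude-opus": (Decimal("5"), Decimal("25")),
    "claude-sonnet": (Decimal("3"), Decimal("15")),
    "gpt-5.5": (Decimal("5"), Decimal("30")),
    "gemini-3.1-pro": (Decimal("2"), Decimal("12")),
}

# Monthly credits included with each paid plan, from the text above.
PLAN_CREDITS = {"Pro": 1_500, "Pro+": 7_000, "Max": 20_000}


def credits_for(model, input_tokens, output_tokens):
    """Estimated credits for one request at the listed per-token rates."""
    rate_in, rate_out = RATES[model]
    usd = (input_tokens * rate_in + output_tokens * rate_out) / PER_MILLION
    return usd / CREDIT_USD


def requests_per_month(plan, model, input_tokens, output_tokens):
    """How many identical requests a plan's credits would cover, rounded down."""
    return int(PLAN_CREDITS[plan] / credits_for(model, input_tokens, output_tokens))


# Example request: 40,000 input tokens and 2,000 output tokens (made-up sizes).
print(credits_for("gpt-5.5", 40_000, 2_000))
print(requests_per_month("Pro", "gpt-5.5", 40_000, 2_000))
```

For that example request, GPT-5.5 costs 26 credits and Sonnet costs 15, so Pro's 1,500 credits cover 57 or 100 such requests respectively.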

Install the CLI

Install GitHub Copilot CLI

npm install -g @github/copilot     # Node 22+
brew install copilot-cli
winget install GitHub.Copilot
# supports MCP servers and a /model switch

Compare at Copilot vs Claude Code and Cline vs Copilot.

OpenCode

The most-starred open source coding agent. 75-plus providers.

anomalyco/opencode (moved from sst/opencode) has 172,198 stars under MIT, ahead of Gemini CLI (105k) and OpenAI Codex (90k). Terminal-native, it supports 75-plus LLM providers via the AI SDK and the Models.dev catalog, plus local models through Ollama, LM Studio, and llama.cpp. OpenCode Zen is the team's curated, tested model list for agentic coding.

GitHub stars (MIT): 172,198
Model providers: 75+
Tool cost (BYOK): $0
Local models: Ollama / LM Studio / llama.cpp

Install and add a custom provider

Install OpenCode

curl -fsSL https://opencode.ai/install | bash
npm install -g opencode-ai
brew install anomalyco/tap/opencode

Custom OpenAI-compatible provider (JSON config)

{
  "provider": {
    "myprovider": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "https://api.myprovider.com/v1" },
      "models": { "my-model": {} }
    }
  }
}
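If you maintain several OpenAI-compatible endpoints, the same block can be generated rather than hand-edited. A minimal Python sketch, assuming only the JSON shape shown above: the function name `provider_block` is hypothetical, the provider id, URL, and model id are the placeholders from the example, and where OpenCode reads the resulting file from is not covered here.

```python
import json


def provider_block(provider_id, base_url, model_ids):
    """Build a config dict in the same shape as the JSON example above."""
    return {
        "provider": {
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "options": {"baseURL": base_url},
                # Each model maps to an empty options object, as in the example.
                "models": {model_id: {} for model_id in model_ids},
            }
        }
    }


config = provider_block(
    "myprovider",                     # placeholder provider id
    "https://api.myprovider.com/v1",  # placeholder endpoint
    ["my-model"],                     # placeholder model id
)
print(json.dumps(config, indent=2))
```

Generating the block this way keeps the provider id, base URL, and model list in one place when the same endpoint is reused across machines.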

Subscription backends in OpenCode

Per OpenCode's docs, ChatGPT Plus, GitHub Copilot, and GitLab Duo subscriptions are usable as model backends, while Anthropic explicitly prohibits using Claude Pro or Max subscriptions with third-party tools like OpenCode.

Cline

In-IDE open source agent with Plan and Act approval modes.

cline/cline: 62,996 stars, Apache-2.0, and free. You choose the model: Claude, GPT, Gemini, or any OpenAI-compatible endpoint via BYOK, or a local model via Ollama / LM Studio. It runs in VS Code, JetBrains (Early Access), Cursor, and Windsurf, plus a CLI installed with npm i -g cline. Local-inference RAM guidance: 16 to 32GB for small or quantized models, 32 to 64GB for mid-size coding models, and 64GB+ for larger models; the Use Compact Prompt setting is recommended for local runs.

See Cline alternatives, Cline vs Cursor, and the head-to-head below.

Aider

Git-native terminal pair programming. Auto-commit per edit.

Aider-AI/aider: 45,945 stars, Apache-2.0. The terminal pair-programming pioneer that thinks in git, every edit a commit. Its last repo push was May 22, 2026, a visibly slower cadence than OpenCode or Cline, which push daily, and its model guidance still recommends 2025-era models (Gemini 2.5 Pro, DeepSeek R1/V3, Claude 3.7 Sonnet, o3/o4-mini, GPT-4.1) rather than current frontier models.

Install and run Aider

python -m pip install aider-install && aider-install
# or one-liner:
curl -LsSf https://aider.chat/install.sh | sh

aider --model sonnet --api-key anthropic=<key>
aider --model deepseek --api-key deepseek=<key>

Related: Aider vs Cline, OpenCode vs Aider, Morph vs Aider diff.

Gemini CLI and Google Antigravity

The free terminal agent (1,000 requests/day) and Google's IDE-plus-CLI harness.

google-gemini/gemini-cli: 105,104 stars, Apache-2.0. The free tier allows 60 requests per minute and 1,000 per day with a personal Google account (OAuth login serves a managed Gemini 3 mix of flash and pro; an API key lets you pin a specific model). With Gemini 3.1 Pro it scores 70.7% on Terminal-Bench 2.1.
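If you script against the free tier, a client-side guard helps you stay under both limits. The Python sketch below is one way to do it, not anything Gemini CLI ships: `RequestBudget` is a hypothetical helper, and it assumes both quotas behave as rolling windows (60 seconds and 24 hours), which the figures above do not specify.

```python
import time
from collections import deque

DAY_SECONDS = 86_400


class RequestBudget:
    """Client-side guard for a requests-per-minute and requests-per-day quota.

    Assumes rolling windows; the real quota may reset on a fixed schedule.
    """

    def __init__(self, per_minute=60, per_day=1000, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent in the last 24 hours

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else how long to wait."""
        now = self.clock()
        # Drop timestamps that have left the 24-hour window.
        while self.sent and now - self.sent[0] >= DAY_SECONDS:
            self.sent.popleft()
        wait = 0.0
        if len(self.sent) >= self.per_day:
            # Wait until the per_day-th most recent request leaves the day window.
            wait = max(wait, self.sent[-self.per_day] + DAY_SECONDS - now)
        last_minute = [t for t in self.sent if now - t < 60]
        if len(last_minute) >= self.per_minute:
            # Wait until the per_minute-th most recent request leaves the minute window.
            wait = max(wait, last_minute[-self.per_minute] + 60 - now)
        return wait

    def record(self):
        """Call once per request actually sent."""
        self.sent.append(self.clock())
```

A caller would check `seconds_until_allowed()`, sleep for that long if it is non-zero, send the request, then call `record()`. The injectable `clock` exists so the logic can be tested without real waiting.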

Install Gemini CLI

npx @google/gemini-cli
npm install -g @google/gemini-cli
brew install gemini-cli
# MCP servers configured in ~/.gemini/settings.json

Google Antigravity 2.0 (announced at I/O 2026, May 19) is now a unified harness with two surfaces: a redesigned desktop app and a new standalone CLI. It adds specialized subagents for parallel tasks, terminal sandboxing, credential masking, and hardened Git policies. Gemini 3.5 Flash is the new default model (Terminal-Bench 2.1 = 76.2%, described as 4x faster output than other frontier models). Google AI Pro is $19.99/mo with higher Antigravity rate limits; Google AI Ultra starts at $99.99/mo. In early June 2026 Google reset all quota counters to zero and shipped a refreshed Flash build to fix post-launch issues.

Compare at Gemini CLI vs Claude Code, Gemini CLI vs Codex, and Antigravity vs Claude Code.

Goose, Kilo Code, and Kiro

Goose

aaif-goose/goose: 48,542 stars, Apache-2.0, built in Rust, now governed by the Agentic AI Foundation at the Linux Foundation. Desktop app plus CLI plus API, 15-plus providers, and 70-plus MCP extensions. It can reuse existing Claude, ChatGPT, or Gemini subscriptions via ACP and positions itself as general-purpose: not just code, also research, writing, automation, and data analysis. Install: curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh | bash. See Goose vs Claude Code.

Kilo Code

Kilo-Org/kilocode: 19,968 stars, MIT (the domain kilocode.ai now redirects to kilo.ai). The extension is free and open source. Kilo Gateway is $0/mo plus usage at exact provider rates with no markup; Kilo Pass subscriptions run $19/$49/$199/mo with up to 50% bonus credits; Teams is $15/user/mo. BYOK works for Anthropic, OpenAI, Google, Azure, and Bedrock keys with no Kilo plan required. See Kilo Code vs Claude Code.

Kiro

AWS's IDE agent. Free $0 = 50 credits/mo with open-weight models and Claude Sonnet 4.5; Pro $20/mo = 1,000 credits; Pro+ $40/mo = 2,000; Power $200/mo = 10,000; overage $0.04/credit billed month-end, with no rollover. Team plans add centralized billing, usage analytics, and SSO via AWS IAM Identity Center. New users get $20 credited toward a first upgrade. See Kiro vs Claude Code.
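Because overage is billed per credit, it is worth checking where staying on a lower plan stops being cheaper. A minimal Python sketch, assuming the $0.04 overage rate applies uniformly to every paid plan and ignoring the free tier and the $20 first-upgrade credit; `KIRO_PLANS`, `monthly_cost`, and `cheapest_plan` are illustrative names.

```python
from decimal import Decimal

OVERAGE_PER_CREDIT = Decimal("0.04")  # USD per credit beyond the plan allowance

# Plan -> (monthly price in USD, included credits), from the text above.
KIRO_PLANS = {
    "Pro": (Decimal("20"), 1_000),
    "Pro+": (Decimal("40"), 2_000),
    "Power": (Decimal("200"), 10_000),
}


def monthly_cost(plan, credits_used):
    """Plan price plus overage for credits beyond the included amount."""
    price, included = KIRO_PLANS[plan]
    extra = max(0, credits_used - included)
    return price + extra * OVERAGE_PER_CREDIT


def cheapest_plan(credits_used):
    """Plan with the lowest total cost; ties go to the lower tier."""
    return min(KIRO_PLANS, key=lambda plan: monthly_cost(plan, credits_used))


print(monthly_cost("Pro", 1_250))
print(cheapest_plan(1_501))
```

Under these assumptions, Pro with overage matches Pro+ at 1,500 credits ($40 either way), and Pro+ with overage matches Power at 6,000 credits ($200 either way). Past those points the higher tier is cheaper.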

Cline vs OpenCode

| | Cline | OpenCode |
|---|---|---|
| GitHub stars / license | 62,996 / Apache-2.0 | 172,198 / MIT |
| Surface | VS Code, JetBrains, Cursor, Windsurf extension + CLI | Terminal-native + CLI |
| Model providers | Any provider, BYOK, local Ollama / LM Studio | 75-plus via AI SDK; local Ollama / LM Studio / llama.cpp |
| Control model | Plan and Act modes, approval before each change | Plan-first, curated OpenCode Zen model list |
| Pick it if | You want the agent inside your IDE with step approval | You want a CLI agent and the widest provider list |

Both are free and BYOK. OpenCode wins on community size and provider breadth; Cline wins if you want the agent embedded in VS Code or JetBrains with explicit per-change approval. Full breakdown: OpenCode vs Cline.

Kilo Code vs OpenCode

Both are MIT-licensed and free. Kilo Code (19,968 stars) is a VS Code and JetBrains extension with BYOK and a $0/mo Kilo Gateway at exact provider rates with no markup, plus optional Kilo Pass subscriptions ($19/$49/$199 per month) and Teams at $15/user/mo. OpenCode (172,198 stars) is terminal-native with 75-plus providers and a far larger community. Choose Kilo Code for in-IDE BYOK with no markup and structured workflow; choose OpenCode for a CLI workflow and the largest provider list. See OpenCode vs Kilo Code.

Aider vs OpenCode

OpenCode (172,198 stars, MIT) pushes code daily and supports 75-plus providers. Aider (45,945 stars, Apache-2.0) is the git-native pioneer, but its last repo push was May 22, 2026 and its model guidance has not been refreshed for 2026 frontier models. Use OpenCode for active development and the widest model choice; use Aider if you specifically want its auto-commit-per-edit git workflow in the terminal. Full comparison: OpenCode vs Aider.

The model backend matters as much as the agent

Most of these agents are BYOK: OpenCode, Cline, Aider, Kilo Code, Goose, and Gemini CLI all let you point at any OpenAI-compatible endpoint. The model and the inference provider behind it set both your cost and your output quality, independent of the agent.

If you run DeepSeek or other open-weight models, where you serve them matters. Morph Open Source Models serve DeepSeek with 16-bit (bf16) activations and no fp8 or int8 quantization. Most serverless providers quantize activations to fp8 to cut cost, which degrades output; keeping full 16-bit activations means responses match the reference weights. That makes Morph the best place to run DeepSeek when output fidelity matters.

For coding agents specifically, Morph runs codegen-tuned speculative decoding (draft and ngram tuned on code) plus custom low-level inference kernels built for code generation. Morph positions this as a faster, higher-quality option for codegen than a general-purpose model menu. Verified price: morph-dsv4flash (DeepSeek V4 Flash) is $0.139 per 1M input tokens and $0.278 per 1M output tokens. See pricing.

| | Morph DeepSeek V4 Flash | Typical serverless fp8 host |
|---|---|---|
| Activation precision | 16-bit (bf16), no quantization | fp8 activations (quality loss) |
| Input price / 1M tokens | $0.139 | varies |
| Output price / 1M tokens | $0.278 | varies |
| Codegen tuning | Code-tuned spec decode + custom kernels | General-purpose |
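To turn the per-token prices into a session estimate, a short Python sketch follows. It assumes only the two prices quoted above; `session_cost` is an illustrative helper, and the token counts in the example are made up rather than measured.

```python
from decimal import Decimal

# USD per 1M tokens for morph-dsv4flash, from the text above.
INPUT_PER_MILLION = Decimal("0.139")
OUTPUT_PER_MILLION = Decimal("0.278")


def session_cost(input_tokens, output_tokens):
    """Estimated USD cost of a session at the listed per-token prices."""
    total = input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    return total / Decimal(1_000_000)


# Example: an agent session that reads 2M tokens and writes 200k (made-up sizes).
print(session_cost(2_000_000, 200_000))
```

Agent sessions tend to be input-heavy because the agent re-reads files and tool output, so the input price usually dominates the bill. In this example, input accounts for $0.278 of the $0.3336 total.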

Frequently Asked Questions

What is the best AI coding agent in 2026?

On the public Terminal-Bench 2.1 leaderboard, Codex CLI with GPT-5.5 is #1 at 83.4%, Claude Code with Opus 4.8 is #2 at 78.9%, and Gemini CLI with Gemini 3.1 Pro is at 70.7%. For open source, OpenCode (172,198 stars, MIT) is the most-starred agent. Pick Codex CLI for the benchmark ceiling, Claude Code for reasoning depth (69.2% SWE-bench Pro), and OpenCode or Cline for a free, model-agnostic agent.

Cline vs OpenCode: which is better?

OpenCode has 172,198 stars (MIT) and 75-plus providers; Cline has 62,996 stars (Apache-2.0) and runs as a VS Code, JetBrains, Cursor, and Windsurf extension plus a CLI with Plan and Act approval modes. Choose OpenCode for a terminal agent with the widest provider list; choose Cline for an in-IDE agent with step-by-step approval. See OpenCode vs Cline.

Kilo Code vs OpenCode: which is better?

Both are MIT and free. Kilo Code (19,968 stars) is an IDE extension with BYOK and a $0/mo gateway at exact provider rates, plus Kilo Pass at $19/$49/$199 per month. OpenCode (172,198 stars) is terminal-native with 75-plus providers and a larger community. Pick Kilo Code for in-IDE BYOK with no markup, OpenCode for a CLI workflow.

Aider vs OpenCode: which is better?

OpenCode (172,198 stars) ships daily and supports 75-plus providers. Aider (45,945 stars) is the git-native pioneer but its last repo push was May 22, 2026 and its model guidance is not refreshed for 2026 frontier models. Use OpenCode for active development and provider choice; use Aider for its auto-commit-per-edit git workflow.

How much does an AI coding agent cost?

Open source agents (OpenCode, Cline, Aider, Kilo Code, Gemini CLI) are free as tools; you pay only for model tokens. Gemini CLI allows 1,000 free requests/day. Copilot Pro is $10/mo, Cursor Pro and Claude Pro (which includes Claude Code) are $20/mo (Claude is $17/mo billed annually), and Codex needs ChatGPT Plus at $20/mo. To cut token cost on open-weight models, run DeepSeek V4 Flash on Morph at $0.139/1M input and $0.278/1M output.

Can I use my Claude subscription with a third-party agent?

No. Per OpenCode's docs, Anthropic explicitly prohibits using Claude Pro or Max subscriptions with third-party tools like OpenCode. ChatGPT Plus, GitHub Copilot, and GitLab Duo subscriptions can be used as backends in those tools. Claude Code itself requires a Pro, Max, Team, Enterprise, or Console (API) account.

SWE-bench vs Terminal-Bench: what is the difference?

Terminal-Bench 2.1 tests an agent driving a terminal end to end (Codex CLI + GPT-5.5 leads at 83.4%). SWE-bench Pro and Verified test fixing real GitHub issues (Opus 4.8 leads SWE-bench Pro at 69.2%). Read scores as the agent-plus-model pair, because the same model scores differently in different agents. See context engineering for why scaffolding changes outcomes.

Run your coding agent on faster, full-precision models

Any BYOK agent can point at Morph. Serve DeepSeek and open-weight models at 16-bit precision with codegen-tuned inference, and pair it with WarpGrep semantic search at $0 for 100k requests.