TL;DR
GPT-5.3-Codex is OpenAI's top coding model. It merges GPT-5.2-Codex's code generation with GPT-5.2's reasoning, runs 25% faster than its predecessor, and uses 2-4x fewer output tokens than Opus 4.6 on equivalent tasks. A week after launch, OpenAI shipped Codex-Spark on Cerebras WSE-3 at 1,000+ tok/sec, its first production workload off Nvidia hardware.
Two models, not one
"Codex 5.3" refers to two distinct models. GPT-5.3-Codex is the full reasoning model with a 400K context window. GPT-5.3-Codex-Spark is a smaller, distilled variant running on Cerebras hardware at 15x the speed but with a 128K context window and less reasoning depth. Spark is in research preview for ChatGPT Pro subscribers only.
Key Specs
| Spec | GPT-5.3-Codex | GPT-5.3-Codex-Spark |
|---|---|---|
| Release date | February 5, 2026 | February 12, 2026 |
| Context window | 400,000 tokens | 128,000 tokens |
| Inference speed | ~65 tok/sec (standard) | 1,000+ tok/sec |
| Hardware | Nvidia GPUs | Cerebras WSE-3 |
| Architecture | Full GPT-5.3 reasoning | Distilled, speed-optimized |
| Multimodal | Text + code | Text only |
| API input price | $1.75 / 1M tokens | Research preview only |
| API output price | $14.00 / 1M tokens | Research preview only |
| Availability | ChatGPT Plus/Pro, API, CLI | ChatGPT Pro only |
OpenAI describes Codex 5.3 as moving from "an agent that can write and review code" to "an agent that can do nearly anything developers and professionals can do on a computer." The model handles long-running tasks involving research, tool use, and complex execution. You can steer and interact with it while it works without losing context.
Benchmark Results
Codex 5.3 sets new highs on Terminal-Bench 2.0 and SWE-bench Pro. It also shows strong results on OSWorld-Verified and GDPval, two benchmarks that test real-world computer use and professional knowledge work.
| Benchmark | GPT-5.3-Codex | Claude Opus 4.6 | Notes |
|---|---|---|---|
| Terminal-Bench 2.0 | 77.3% | 65.4% | Codex leads by 11.9 points |
| SWE-bench Pro Public | 56.8% | 55.4% | Codex leads by 1.4 points |
| SWE-bench Verified | N/R | 80.8% | OpenAI reports Pro, not Verified |
| OSWorld-Verified | 64.7% | 72.7% | Opus leads by 8 points |
| GDPval | 70.9% | N/R | 44-occupation professional tasks |
Benchmark context
OpenAI reports SWE-bench Pro Public (56.8%); Anthropic reports SWE-bench Verified (80.8%), where Opus scores highest. These are different problem sets with different difficulty levels, so comparing a score on one against a score on the other is not valid. On SWE-bench Pro, the one variant both vendors report, Codex 5.3 edges Opus 4.6 by 1.4 points (56.8% vs 55.4%). On Terminal-Bench 2.0, the only other apples-to-apples comparison, Codex leads by 11.9 points.
Token Efficiency
Codex 5.3 achieves its SWE-bench Pro scores with fewer output tokens than any prior model. On equivalent tasks, it uses 2-4x fewer tokens than Opus 4.6. This matters for cost: a task that burns fewer tokens at $14/M output often costs less than one that burns more tokens at a lower per-token rate.
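A rough sketch of that arithmetic. The Codex output price is from this article; the competing per-token rate and the token counts are illustrative placeholders, not published figures:

```python
# Sketch: fewer tokens at a higher per-token rate can still cost less.
# CODEX_OUTPUT_PRICE is from this article ($14 / 1M output tokens);
# HYPOTHETICAL_OTHER_PRICE and the token counts are placeholders.

CODEX_OUTPUT_PRICE = 14.00 / 1_000_000       # $ per output token
HYPOTHETICAL_OTHER_PRICE = 10.00 / 1_000_000  # placeholder lower rate

def task_cost(output_tokens: int, price_per_token: float) -> float:
    """Output-token cost of one task, ignoring input tokens."""
    return output_tokens * price_per_token

# Suppose a task takes Codex 20K output tokens, and a model that uses
# 3x more tokens (mid-range of the 2-4x claim) takes 60K.
codex_cost = task_cost(20_000, CODEX_OUTPUT_PRICE)
other_cost = task_cost(60_000, HYPOTHETICAL_OTHER_PRICE)

print(f"Codex: ${codex_cost:.2f}  vs  other: ${other_cost:.2f}")
```

Even at a lower per-token rate, the 3x token multiplier makes the hypothetical alternative roughly twice as expensive per task in this sketch.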
What Terminal-Bench 2.0 Measures
Terminal-Bench tests real-world terminal tasks: system administration, deployment scripts, file manipulation, debugging shell pipelines. Codex 5.3's 77.3% represents a 13.3-point jump from GPT-5.2-Codex's 64%. This is the largest single-generation improvement on this benchmark.
What OSWorld-Verified Measures
OSWorld tests the ability to use a full computer environment: browsers, file managers, terminal, and desktop apps. Humans score ~72% on this benchmark. Codex 5.3 hits 64.7%, a 26.5-point jump from GPT-5.2-Codex. Opus 4.6 scores 72.7%, matching human performance.
Codex-Spark on Cerebras
On February 12, one week after Codex 5.3 launched, OpenAI released GPT-5.3-Codex-Spark. It is the first OpenAI model running on non-Nvidia hardware in production, deployed on Cerebras Wafer-Scale Engine 3 chips with 4 trillion transistors per wafer.
What Spark is
A distilled, smaller version of Codex 5.3 purpose-built for low-latency code generation. It runs at 1,000+ tok/sec on Cerebras WSE-3, producing more capable responses than GPT-5.1-Codex-mini while completing tasks in a fraction of the time. Text-only, 128K context.
What Spark is not
Not a replacement for full Codex 5.3. Spark trades reasoning depth for throughput. It is designed for real-time coding feedback, inline completions, and rapid iteration. For complex multi-file refactoring or long-horizon agent tasks, full Codex 5.3 or Opus 4.6 is the better choice.
Why Cerebras?
The WSE-3 is a wafer-scale chip designed for inference with minimal memory bottlenecks. Cerebras can run the entire Spark model on-chip without the memory-transfer overhead that limits GPU-based inference speed. This is a hardware architecture advantage, not just a clock speed difference. OpenAI choosing Cerebras for a production model signals a strategic move toward hardware diversification.
Availability
Codex-Spark is in research preview for ChatGPT Pro ($200/mo) subscribers only. It is not available via the API at launch. Cerebras expects to bring this inference capability to larger frontier models later in 2026, including longer context lengths and multimodal inputs.
Architecture and Capabilities
Codex 5.3 is built on the GPT-5 architecture. OpenAI has not published parameter counts or detailed layer configurations. What they have disclosed: the model packs more reasoning capability per byte than predecessors, focusing on cognitive density over raw parameter count.
Key Capabilities
Long-Horizon Agent Tasks
Codex 5.3 handles multi-step tasks involving research, tool use, and complex execution. You can interact with the model mid-task without losing context. The Codex macOS app runs each task in an isolated cloud sandbox with its own container.
Cloud Sandbox Isolation
Each Codex task runs in its own cloud container. Internet access is disabled by default for security. The model can read and write files, run tests, and execute code in isolation. This makes it safe for autonomous, unattended execution on production codebases.
Self-Bootstrapping
Codex 5.3 is the first OpenAI model that played a direct role in its own development: the Codex team used early versions to debug training runs, manage deployment, and diagnose evaluation results. This recursive capability signals a qualitative shift in how models get built.
Professional Knowledge
70.9% on GDPval across 44 occupations. Codex 5.3 goes beyond code: creating presentations, writing reports, managing spreadsheets, and handling system administration. It combines coding and professional knowledge in one model.
Cybersecurity Rating
Codex 5.3 is the first OpenAI model rated "high" for cybersecurity under OpenAI's Preparedness Framework. This activated additional safeguards. Fortune reported that the model "raises unprecedented cybersecurity risks" due to its ability to autonomously research, plan, and execute complex multi-step operations in sandboxed environments.
Pricing and Access
Codex 5.3 is available through ChatGPT subscriptions, the Codex CLI, the macOS app, the VS Code extension, and the OpenAI API.
| Access Method | Price | Limits |
|---|---|---|
| ChatGPT Plus | $20/month | 30-150 messages per 5-hour window |
| ChatGPT Pro | $200/month | 300-1,500 messages per 5-hour window |
| API (input) | $1.75 / 1M tokens | Standard rate limits |
| API (output) | $14.00 / 1M tokens | Standard rate limits |
| Codex CLI | Included with ChatGPT plan | Uses plan message allocation |
| Codex-Spark | ChatGPT Pro only ($200/mo) | Research preview, no API |
For a detailed breakdown of every plan, limit, and hidden cost, see our Codex pricing guide.
API vs subscription
The API makes sense if you need programmatic access, custom system prompts, or integration into CI/CD pipelines. The ChatGPT subscription makes sense for interactive use. At $14/M output tokens, a typical 10K-token response costs $0.14. Heavy users generating 100+ responses per day will find the $200/mo Pro plan cheaper.
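A quick back-of-the-envelope check on that break-even point. The prices come from this article; the 10K-token response size and 100-responses/day volume are illustrative assumptions:

```python
# Break-even sketch: API pay-per-token vs the $200/mo Pro plan.
# Prices are from this article; response size and daily volume are
# illustrative assumptions, and input-token costs are ignored.

OUTPUT_PRICE_PER_M = 14.00   # $ per 1M output tokens
PRO_PLAN_MONTHLY = 200.00    # $ per month

def monthly_api_cost(responses_per_day: int, tokens_per_response: int,
                     days: int = 30) -> float:
    """Approximate monthly output-token spend on the API."""
    total_tokens = responses_per_day * tokens_per_response * days
    return total_tokens / 1_000_000 * OUTPUT_PRICE_PER_M

# 100 responses/day x 10K tokens x 30 days = 30M output tokens
api_cost = monthly_api_cost(100, 10_000)
print(f"API: ${api_cost:.0f}/mo vs Pro plan: ${PRO_PLAN_MONTHLY:.0f}/mo")
```

At these assumed volumes the API runs about $420/month in output tokens alone, which is why heavy interactive users come out ahead on the flat-rate Pro plan.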
Codex 5.3 vs Opus 4.6
Both models launched within hours of each other on February 5, 2026. They represent fundamentally different design philosophies: Codex optimizes for speed, token efficiency, and autonomous execution. Opus optimizes for reasoning depth, multi-file understanding, and deterministic outputs.
| Dimension | GPT-5.3-Codex | Claude Opus 4.6 |
|---|---|---|
| Terminal-Bench 2.0 | 77.3% | 65.4% |
| SWE-bench Pro | 56.8% | 55.4% |
| OSWorld-Verified | 64.7% | 72.7% |
| Context window | 400K tokens | 1M tokens (beta) |
| Token efficiency | 2-4x fewer tokens | Baseline |
| Speed | 25% faster than 5.2 | Slower, more deliberate |
| Subagent model | Cloud sandbox per task | Agent Teams with shared tasks |
| Hardware | Nvidia + Cerebras (Spark) | Nvidia / AWS |
The pattern: Codex wins on execution dimensions (terminal tasks, speed, token efficiency). Opus wins on understanding dimensions (reasoning, multi-file refactoring, context capacity). Neither dominates across all benchmarks. Your workflow determines which matters more.
For the full deep-dive comparison with subagent architecture analysis, usage limits, and a decision framework, see our Codex vs Claude Code comparison.
Best Use Cases for Codex 5.3
Autonomous task execution
Codex 5.3's cloud sandbox model is built for fire-and-forget. Specify the outcome, set up tests with clear pass/fail criteria, press go. Come back later. The sandbox isolation means it cannot break your local environment.
Terminal and DevOps work
77.3% on Terminal-Bench 2.0, the highest of any model. System administration, deployment scripts, CI/CD pipelines, debugging shell scripts. This is where Codex 5.3 has the widest lead over competitors.
Rapid prototyping
25% faster than 5.2, 2-4x fewer tokens than Opus. When you need to iterate quickly on ideas, Codex 5.3's speed advantage compounds across dozens of prompts per session.
Real-time feedback (Spark)
Codex-Spark at 1,000+ tok/sec enables near-instant inline completions and real-time code review. For pairing-style workflows where latency matters more than reasoning depth, Spark is the right choice.
Where Codex 5.3 Is Not the Best Choice
- Complex multi-file refactoring: Opus 4.6's 1M context window and higher SWE-bench Verified score (80.8%) make it better for tasks that require understanding relationships across many files.
- Tasks requiring deterministic output: Community reports note Codex can produce different results for the same prompt. Opus is more consistent.
- Niche or domain-specific languages: Opus has historically performed better on less common programming languages and frameworks.
Limitations
Known limitations
- Cybersecurity risk: First model rated "high" under OpenAI's Preparedness Framework. The autonomous execution capability introduces risks that triggered additional safeguards.
- Consistency variance: Multiple developers report that the same prompt can produce different quality results across runs. Less deterministic than Opus 4.6.
- Internet disabled in sandbox: Cloud sandbox tasks run without internet access for security. This limits use cases that require fetching external resources during execution.
- Spark is research preview: Codex-Spark is only available to ChatGPT Pro ($200/mo) subscribers. No API access. The 128K context window is limiting for large codebases.
- Still needs human oversight: Despite strong autonomous capability, Codex 5.3 still requires human review for architecture decisions, security boundaries, and dependency updates.
- Smaller context than Opus: 400K tokens vs Opus 4.6's 1M. For very large codebases, this is a real constraint.
Frequently Asked Questions
What is Codex 5.3?
GPT-5.3-Codex is OpenAI's most capable coding model, released February 5, 2026. It combines coding and general reasoning in one model, runs 25% faster than its predecessor, and scores 77.3% on Terminal-Bench 2.0. It powers the Codex CLI, the Codex macOS app, and is available via the OpenAI API.
What is Codex-Spark?
A distilled version of Codex 5.3 running on Cerebras WSE-3 hardware at 1,000+ tokens per second. It is 15x faster than standard Codex but has a smaller 128K context window and less reasoning depth. Available in research preview for ChatGPT Pro subscribers.
How much does Codex 5.3 cost?
Through ChatGPT: $20/mo (Plus) or $200/mo (Pro) with message-based limits. Via API: $1.75/M input tokens and $14/M output tokens. Codex-Spark requires the $200/mo Pro plan. See our full pricing breakdown.
How does Codex 5.3 compare to Opus 4.6?
Codex leads Terminal-Bench 2.0 by 11.9 points, SWE-bench Pro by 1.4 points (56.8% vs 55.4%), and uses 2-4x fewer tokens. Opus leads SWE-bench Verified (80.8%), OSWorld-Verified by 8 points, and has a 1M token context window (2.5x Codex's 400K). Codex is faster and cheaper per task. Opus is more thorough and consistent. See our full comparison.
Is Codex 5.3 available via API?
Yes. The model ID is gpt-5.3-codex. Pricing is $1.75/M input and $14/M output. Codex-Spark is not available via API, only through ChatGPT Pro.
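A minimal sketch of calling it through the OpenAI Python SDK, assuming the Responses API accepts the model ID as stated above. The prompt is a placeholder, and whether your account has access may differ:

```python
# Minimal sketch: calling gpt-5.3-codex via the OpenAI Python SDK.
# The model ID comes from this article; access and the serving
# endpoint for your account are assumptions.
import os

params = {
    "model": "gpt-5.3-codex",
    "input": "Write a shell one-liner that counts TODO comments in ./src.",
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.responses.create(**params)
    print(resp.output_text)
else:
    print("Set OPENAI_API_KEY to send the request for:", params["model"])
```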
What context window does Codex 5.3 have?
400,000 tokens for GPT-5.3-Codex. 128,000 tokens for GPT-5.3-Codex-Spark. The Codex CLI effective context is slightly below the raw 400K due to system prompt overhead.
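One way to account for that overhead when sizing prompts. The overhead figure here is a placeholder assumption, not a measured number:

```python
# Sketch: budgeting a prompt against the 400K raw context window,
# reserving headroom for system-prompt overhead. The 10K-token
# overhead and 8K output reserve are placeholder assumptions.

RAW_CONTEXT = 400_000
SYSTEM_OVERHEAD = 10_000   # placeholder for CLI system prompt + scaffolding

def fits_in_context(prompt_tokens: int, reserved_output: int = 8_000) -> bool:
    """True if the prompt plus reserved output fits the effective window."""
    effective = RAW_CONTEXT - SYSTEM_OVERHEAD
    return prompt_tokens + reserved_output <= effective

print(fits_in_context(300_000))  # fits with room to spare
print(fits_in_context(385_000))  # too close to the raw limit
```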
Can Codex 5.3 run autonomously?
Yes. The Codex macOS app and CLI support autonomous execution in cloud sandboxes with internet disabled for security. You specify the task, set pass/fail criteria, and let it run. Human review is still recommended for production-critical changes.
What hardware does Codex-Spark run on?
Cerebras Wafer-Scale Engine 3 (WSE-3), a purpose-built wafer-scale chip with 4 trillion transistors. Codex-Spark is OpenAI's first production model deployed on non-Nvidia hardware.
Related Articles
Use WarpGrep to Get Better Context into Codex 5.3
WarpGrep is an agentic code search tool that runs as an MCP server. Connect it to any Codex-powered workflow for high-precision codebase context, so Codex 5.3's 400K token window gets filled with the right code, not noise.
Sources
- OpenAI: Introducing GPT-5.3-Codex (February 5, 2026)
- OpenAI: Introducing GPT-5.3-Codex-Spark (February 12, 2026)
- OpenAI: GPT-5.3-Codex System Card
- Cerebras: Introducing OpenAI GPT-5.3-Codex-Spark Powered by Cerebras
- Fortune: OpenAI's new model raises unprecedented cybersecurity risks
- GitHub: GPT-5.3-Codex generally available for GitHub Copilot