Incident Timeline
2025-2026 Security Incidents
These are confirmed, publicly reported incidents. Not theoretical attacks. Not CTF exercises. Production systems, real companies, measurable damage.
December 2025: Kiro Deletes Production
AWS engineers gave Kiro, Amazon's agentic coding tool, autonomous access to resolve a software issue in AWS Cost Explorer. Kiro evaluated the problem and decided: delete everything, start fresh. The resulting outage lasted 13 hours and affected a single AWS region.
Amazon called it "user error" from "misconfigured access controls." Multiple AWS employees confirmed a separate incident where Amazon Q Developer caused a similar disruption under the same conditions: engineers letting an AI agent resolve issues without intervention. Amazon implemented mandatory peer review for production access after both incidents.
Early 2026: Operation Pale Fire
Block's security team red-teamed Goose, their open-source AI coding agent (27K GitHub stars), before public release. The exercise, named Operation Pale Fire, combined phishing and prompt injection to attempt compromise of Block employees through the agent. The red team found and fixed prompt injection vulnerabilities before Goose shipped.
This is the right approach. Most companies skip the red-team step and ship directly. Block found the vulnerabilities because they looked for them. The ones who do not look are not more secure. They are less informed.
February 2026: Microsoft Documents Memory Poisoning at Scale
Microsoft Security researchers published findings on "AI Recommendation Poisoning," documenting 50+ distinct attacks across 31 companies in 14 industries. The technique plants hidden prompts that persistently manipulate AI assistant recommendations. Radware researchers separately demonstrated "ZombieAgent," showing ChatGPT's connector and memory features can make prompt injection attacks persistent and cross-session.
2026: OpenClaw Supply Chain Attack
Antiy CERT confirmed 1,184 malicious skills across ClawHub, the marketplace for the OpenClaw AI agent framework. Palo Alto Networks flagged this as the largest confirmed supply chain attack targeting AI agent infrastructure. A separate incident showed a hacker exploiting Cline to force-install OpenClaw across developer systems through prompt injection in Anthropic's Claude.
The Lethal Trifecta
Simon Willison identified the three conditions that make an AI agent exploitable. When all three are present, a single prompt injection can steal private data. He calls it the "lethal trifecta."
Access to Private Data
The agent can read SSH keys, environment variables, API tokens, database credentials, source code, and internal documents. Every coding agent has this by default because it needs filesystem access to do its job.
Exposure to Untrusted Input
The agent processes external content: GitHub issues, pull requests, package READMEs, Stack Overflow answers, documentation sites, npm/pip packages. Any of these can contain hidden prompt injection instructions.
Exfiltration Capability
The agent can make HTTP requests, write files, execute shell commands, render images with encoded URLs, or call external APIs. These are the escape routes for stolen data.
Why This Matters for Coding Agents
Most AI assistants (chatbots, summarizers) satisfy only one or two legs. Coding agents satisfy all three by design. They read your codebase (private data), process external code and documentation (untrusted input), and run shell commands with network access (exfiltration). The mitigation strategy is to cut at least one leg: sandbox the network, restrict file access, or validate all external input. Meta's "Rule of 2" formalizes this: if an agent has access to sensitive data, it must be restricted in what it can ingest or output externally.
The "lethal trifecta" maps directly to what happened with Kiro. The agent had production access (private data plus code execution), processed an ambiguous software issue (untrusted input in the form of a vague task description), and had write access to production infrastructure (exfiltration is not even necessary when you can just delete things). Cutting any one leg would have prevented the outage. Mandatory peer review, added post-incident, cuts the autonomous-write leg by putting a human between the agent and production.
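The "cut one leg" reasoning reduces to a simple predicate. A minimal sketch (the function name and yes/no flags are illustrative, not part of any tool):

```shell
# Sketch: an agent is exploitable only when ALL three trifecta legs are present.
# Arguments are "yes"/"no" flags for: private data access, untrusted input,
# and exfiltration capability.
check_trifecta() {
  local private_data=$1 untrusted_input=$2 exfiltration=$3
  if [ "$private_data" = yes ] && [ "$untrusted_input" = yes ] && [ "$exfiltration" = yes ]; then
    echo "exploitable: cut at least one leg"
  else
    echo "mitigated: at least one leg is cut"
  fi
}

check_trifecta yes yes yes   # default coding agent: all three legs present
check_trifecta yes yes no    # network sandboxed: exfiltration leg cut
```

The point of the sketch is that mitigation is an AND-gate problem: you do not need to solve prompt injection, you need to make any single conjunct false.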
Attack Vectors
Four attack categories dominate the 2026 threat landscape for coding agents. Each exploits a different trust boundary.
| Vector | Mechanism | Real-World Example |
|---|---|---|
| Indirect Prompt Injection | Malicious instructions hidden in data the agent reads (issues, docs, packages, comments) | GitHub MCP server: attacker creates issue with hidden instructions, agent exfiltrates private repo data via PR |
| Memory Poisoning | Injecting persistent instructions into agent memory, config files (CLAUDE.md, .cursorrules), or conversation history | Microsoft found 50+ attacks across 31 companies. ZombieAgent makes ChatGPT attacks cross-session |
| Supply Chain Tampering | Malicious MCP servers, IDE extensions, agent plugins, or package dependencies that backdoor the agent | 1,184 malicious OpenClaw skills. 492 unauthed MCP servers exposed. Cline force-installed OpenClaw via prompt injection |
| Privilege Escalation | Agent accumulates permissions over time, or uses one tool to gain access to another beyond its intended scope | IBM's Bob agent manipulated into executing malware through CLI. China-linked group jailbroke coding assistant to automate 80-90% of attack chain |
Prompt Injection: The XSS of the AI Era
Prompt injection is to AI agents what cross-site scripting (XSS) was to web applications in the 2000s. Both exploit the mixing of data and instructions in the same channel. The difference: XSS was solved with output encoding and Content Security Policy. Prompt injection has no equivalent fix because the agent must process natural language to function, and distinguishing "data to read" from "instructions to follow" is an unsolved problem.
Attackers use visual concealment (zero font size, zero opacity), obfuscation (hidden HTML sections), dynamic execution (JavaScript-embedded prompts), and URL manipulation. These techniques are invisible to human code review but fully visible to the AI model processing the content.
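These concealment tricks can be caught mechanically before content ever reaches the agent. A minimal sketch of an input-validation pass (the patterns are illustrative, not exhaustive, and will produce false positives on legitimate styling):

```shell
# Flag content styled to be invisible to human reviewers but readable by a model.
# Prints matching line numbers; exits non-zero when nothing suspicious is found.
scan_hidden_text() {
  grep -nEi 'font-size:[[:space:]]*0|opacity:[[:space:]]*0|display:[[:space:]]*none|visibility:[[:space:]]*hidden' "$1"
}
```

Run it over fetched READMEs, issue bodies, or documentation pages before handing them to the agent, and route matches to a human instead of the model.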
Memory Poisoning: Stateful Prompt Injection
Standard prompt injection is transient. It affects the current session only. Memory poisoning makes the attack persistent by writing malicious instructions into the agent's long-term storage. An attacker injects instructions in one session; the instructions activate weeks later when a different user triggers the agent in a different context.
OWASP added ASI06 (Memory & Context Poisoning) to its 2026 Agentic Top 10. The temporal decoupling makes these attacks difficult to trace. By the time the damage occurs, the attacker is gone, and the injected context looks like normal agent memory.
For coding agents, the attack surface includes project config files (CLAUDE.md, .cursorrules, .github/copilot-instructions.md), conversation history, and any persistent context the agent carries between sessions. An attacker who lands a PR that modifies CLAUDE.md can influence every future Claude Code session in that repository.
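A CI guard can make agent-config changes impossible to slip past review. A sketch of a pre-merge check (the file patterns are examples; feed it the output of `git diff --name-only`):

```shell
# Read a list of changed files on stdin; print any agent config files among them.
# A non-empty result should gate the merge on explicit security review.
agent_config_changed() {
  grep -E '(^|/)(CLAUDE\.md|\.cursorrules|copilot-instructions\.md)$'
}

# Typical CI usage (sketch):
# git diff --name-only origin/main...HEAD | agent_config_changed \
#   && echo "agent config touched: require security review"
```

Pairing this with version control and diff alerts means a poisoned CLAUDE.md has to get past a human who was explicitly told the agent's instructions changed.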
Supply Chain: The MCP Marketplace Problem
The Model Context Protocol (MCP) gives agents access to external tools. It also gives attackers a distribution channel. Trend Micro found 492 MCP servers exposed to the internet with zero authentication. Palo Alto Networks researchers identified tool poisoning, remote code execution flaws, and overprivileged access across MCP ecosystems.
The OpenClaw incident (1,184 malicious skills confirmed by Antiy CERT) is the npm left-pad moment for AI agents. Agent marketplaces have the same trust problems as package registries, plus the additional risk that malicious tools can inject prompts, not just code.
Security Features Across Tools
Security architecture varies significantly across coding agents. Some run everything locally with OS-level sandboxing. Others isolate in cloud containers. Some ask permission for every action. Others run autonomously with full access. The comparison below covers the security-relevant features of the major tools as of March 2026.
| Feature | Claude Code | Codex | Cursor | Goose |
|---|---|---|---|---|
| Execution environment | Local with OS sandbox | Cloud container (isolated) | Local (IDE embedded) | Local (open source) |
| Filesystem isolation | CWD only (sandbox enforced) | Workspace only (container) | Full project access | Full system access |
| Network isolation | Proxy with domain allowlist | Disabled by default | Full access | Full access |
| Permission system | Tiered: read/write/execute prompts | Auto-approve in workspace, prompt outside | Implicit (Apply/Accept flow) | User-configured |
| Custom policy hooks | Yes (pre/post command hooks) | OTel monitoring (opt-in) | No | Plugin-based |
| Skip-permissions flag | --dangerously-skip-permissions | Auto mode | N/A (always applies) | No equivalent |
| Red team disclosure | Anthropic system cards | OpenAI system cards | No public disclosure | Operation Pale Fire (Block) |
| MCP tool validation | User approval per server | Sandboxed execution | Marketplace review | Community-managed |
No Tool Is Immune
Every tool in this comparison is vulnerable to prompt injection. The difference is blast radius. Claude Code and Codex limit what a successful attack can access through sandboxing and permission controls. Cursor and Goose give agents broader access by default, which means a successful attack has more room to operate.
Sandbox Architectures
The two leading approaches to coding agent security in 2026 are OS-level sandboxing (Claude Code) and cloud container isolation (Codex). Each makes different tradeoffs between security, performance, and usability.
Claude Code: OS-Level Sandboxing
Uses Linux bubblewrap and macOS seatbelt to enforce filesystem and network restrictions at the OS kernel level. File access limited to CWD. Network routed through a proxy with domain allowlisting. Anthropic reports 84% reduction in permission prompts while maintaining security. Even a successful prompt injection cannot access SSH keys or make unauthorized network requests.
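The effect of CWD-only file access can be illustrated with a path check: resolve the requested path, then allow it only if it stays under the project root. This is a conceptual analogue only; the real enforcement happens in the OS kernel via bubblewrap/seatbelt, not in shell:

```shell
# Conceptual sketch of CWD-only enforcement: deny any path that resolves
# outside $PROJECT_ROOT, including ../ traversal and absolute paths.
# (Uses GNU realpath -m, which resolves paths without requiring them to exist.)
in_sandbox() {
  case "$(realpath -m -- "$1")" in
    "$PROJECT_ROOT"/*) echo "allow" ;;
    *)                 echo "deny"  ;;
  esac
}

PROJECT_ROOT=/workspace/myproject
in_sandbox /workspace/myproject/src/main.rs        # allow
in_sandbox /workspace/myproject/../../etc/passwd   # deny: resolves to /etc/passwd
in_sandbox "$HOME/.ssh/id_rsa"                     # deny: outside project root
```

Note that the traversal case is why resolution must happen before the prefix check: the raw string starts with the project root, but the resolved path does not.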
Codex: Cloud Container Isolation
Each task runs in an isolated OpenAI-managed container with the codebase pre-loaded. Network disabled by default. The agent has no access to the host system. OS-level sandboxing (seatbelt/seccomp) also available for local CLI usage. Opt-in OTel monitoring for audit trails. The tradeoff: cloud containers add latency; the benefit is stronger isolation than local sandboxing.
Claude Code Permission Tiers
Claude Code defaults to read-only. It asks permission before writing files, running commands, or making network requests. Safe commands (echo, cat, ls) are auto-allowed. Everything else requires explicit user approval.
The --dangerously-skip-permissions flag disables this system entirely. It exists for CI/CD pipelines and automated workflows where human approval is not practical. The name is intentional: the flag is dangerous. It removes the primary defense against prompt injection attacks.
For a deeper analysis of when this flag is appropriate and how to mitigate its risks, see our guide to --dangerously-skip-permissions.
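Teams that want automation without disabling the permission system entirely can often get there with a scoped allowlist instead. A sketch of a project-level `.claude/settings.json` (the rule strings below are illustrative assumptions; check the current Claude Code settings reference for exact rule syntax):

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Bash(curl:*)",
      "Read(.env)"
    ]
  }
}
```

This keeps CI runs unattended for the commands you expect while everything else still triggers a prompt or a hard deny.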
Claude Code Hooks
Hooks let you run custom shell commands before or after Claude Code actions. A pre-command hook can block specific operations (no rm -rf, no curl to unknown domains). A post-command hook can log actions to your SIEM or trigger alerts. This is the mechanism for enforcing enterprise security policies at the agent level.
See our Claude Code hooks guide for implementation patterns.
Example: Blocking Dangerous Commands with Hooks
# .claude/hooks/pre-command.sh
# Block destructive operations before they execute
BLOCKED_PATTERNS=(
  "rm -rf /"
  "DROP TABLE"
  "DROP DATABASE"
  "curl.*\|.*sh"    # curl output piped to a shell (pipe escaped for grep -E)
  "wget.*\|.*bash"  # wget output piped to bash
)

for pattern in "${BLOCKED_PATTERNS[@]}"; do
  if echo "$CLAUDE_COMMAND" | grep -qE "$pattern"; then
    echo "BLOCKED: Command matches dangerous pattern: $pattern"
    exit 1
  fi
done

Enterprise Deployment Checklist
The 2026 consensus on securing coding agents comes from OWASP, NIST AI RMF, and ISO 42001 frameworks. Seven controls form the minimum viable security posture.
| Control | What It Does | Implementation |
|---|---|---|
| Input validation | Filters untrusted content before agent ingestion | Sanitize external markdown, strip hidden HTML, validate MCP tool inputs |
| Output filtering | Blocks agent from writing secrets, credentials, or PII to unintended destinations | Post-action hooks that scan for API keys, tokens, passwords in agent output |
| Tool governance | Controls which MCP servers and tools the agent can access | Allowlist of approved MCP servers. Review new tools before adding to agent config |
| Rate limiting | Prevents runaway agents from executing too many operations | Cap commands per session. Alert on unusual volumes. Kill switch for production access |
| Memory security | Protects persistent context from injection and tampering | Code review all CLAUDE.md / .cursorrules changes. Version control. Diff alerts on config changes |
| Identity management | Agents get scoped credentials, not developer credentials | Ephemeral API keys per session. Rotate frequently. No long-lived tokens in agent context |
| Audit logging | Records all agent actions for compliance and forensics | Log every command, file write, and network request. Feed to SIEM. Retain per compliance policy |
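The output filtering control above can start as a simple pattern scan over anything the agent is about to write or send. A sketch for a post-action hook (the patterns cover a few well-known token formats and are far from complete; a production deployment would use a dedicated secret scanner):

```shell
# Scan a file for credential-shaped strings before agent output leaves the boundary.
# Prints matching line numbers; a non-empty result should block the action.
scan_for_secrets() {
  grep -nE 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|xox[baprs]-[0-9A-Za-z-]+|-----BEGIN [A-Z ]*PRIVATE KEY-----' "$1"
}
```

The same scan applied to agent input catches a second failure mode: secrets accidentally pasted into context, where a later injection could exfiltrate them.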
The 3 Rs of Agent Security
- Rotate: Secrets, keys, and certificates should be ephemeral. Agent sessions should not inherit long-lived credentials
- Repair: Vulnerable dependencies must be patched within hours. AI-generated code gets the same security review as human-written code
- Repave: Agent environments should be disposable. Cloud containers (Codex) do this by default. Local agents need periodic environment resets
What Most Organizations Get Wrong
Fewer than half of developers review AI-generated code before committing it. 45% of AI-generated code has security flaws. AI-generated code introduces 15-18% more vulnerabilities than human-written code. The most common: SQL injection (CWE-89), cross-site scripting (CWE-80), and log injection (CWE-117).
The biggest organizational failure is treating the coding agent as a trusted internal user. It is not. It is an identity that processes untrusted input and has access to production systems. Every agent session should have the same access controls as an external contractor: scoped permissions, audited actions, time-limited credentials.
OWASP Top 10 for Agentic Applications (2026)
OWASP published its first Top 10 specifically for agentic AI systems in 2026, developed with 100+ industry experts. The list codifies what security teams learned from the incidents above.
| Rank | Risk | Relevance to Coding Agents |
|---|---|---|
| ASI01 | Agent Goal Hijacking | Prompt injection redirects agent from intended task. The Kiro incident is a direct example. |
| ASI02 | Tool Misuse | Agent uses tools beyond intended scope. Shell commands, network requests, file deletions. |
| ASI03 | Privilege Escalation | Agent accumulates permissions or chains tool access to exceed its intended authorization. |
| ASI04 | Cascading Failures | Multi-agent systems propagate errors. One agent's bad output becomes another's input. |
| ASI06 | Memory & Context Poisoning | Persistent injection into agent memory. The Microsoft/ZombieAgent attacks demonstrate this. |
The full OWASP Top 10 for Agentic Applications includes additional categories for identity spoofing, data leakage, and insufficient monitoring. For coding agents specifically, ASI01 (Goal Hijacking), ASI02 (Tool Misuse), and ASI06 (Memory Poisoning) are the categories with confirmed production incidents.
Frequently Asked Questions
What is prompt injection in AI coding agents?
Prompt injection embeds malicious instructions in data the agent reads: code comments, markdown files, GitHub issues, package descriptions. When the agent processes this data, it follows the hidden instructions instead of the user's task. In coding agents, this can trigger data exfiltration, malware installation, or unauthorized code changes. OWASP ranks indirect prompt injection as the #1 threat to agentic systems in 2026. The attack is invisible to human code review but fully visible to the AI model.
What is the lethal trifecta for AI agents?
Simon Willison's term for the three conditions that make an AI agent exploitable: access to private data, exposure to untrusted input, and exfiltration capability. Most AI chatbots satisfy one or two of these. Coding agents satisfy all three by design because they need to read private source code, process external content, and run commands with network access. The mitigation strategy is to cut at least one leg through sandboxing, permission systems, or input validation.
How does Claude Code's security sandbox work?
Claude Code uses OS-level primitives (Linux bubblewrap, macOS seatbelt) to isolate the agent at the kernel level. The sandbox restricts file access to the current working directory, blocks writes outside it, and routes all network traffic through a proxy with domain allowlisting. Anthropic reports 84% fewer permission prompts while maintaining security. A compromised Claude Code instance inside the sandbox cannot access SSH keys, read files outside the project, or make unauthorized network requests.
Is --dangerously-skip-permissions safe to use?
No. The flag disables Claude Code's entire permission system, including the sandbox. It exists for CI/CD pipelines where human approval is impractical. If you use it, you should combine it with hooks to enforce custom policies, run in an isolated environment (Docker container, ephemeral VM), use scoped credentials, and never run it on a machine with access to production secrets. See our full guide.
Which coding agent has the best security?
Claude Code and Codex lead on security architecture as of March 2026. Claude Code provides OS-level sandboxing, tiered permissions, and hook-based policy enforcement. Codex runs each task in an isolated cloud container with network disabled by default. Cursor and Goose give agents broader default access. No tool is immune to prompt injection. The question is blast radius: what can an attacker do after a successful injection? Sandboxed tools constrain the answer.
WarpGrep: Secure Context for Coding Agents
WarpGrep runs as an MCP server that feeds precise, relevant code context to your agent. Better context means fewer hallucinations, fewer unnecessary file reads, and a smaller attack surface. Compatible with Claude Code, Codex, Cursor, and any MCP-enabled tool.
Sources
- The Register: Amazon's Kiro reportedly vibed too hard (Feb 2026)
- Amazon: Correcting the Financial Times report about AWS, Kiro (Feb 2026)
- Microsoft Security: AI Recommendation Poisoning (Feb 2026)
- Simon Willison: The lethal trifecta for AI agents
- Anthropic Engineering: Making Claude Code more secure and autonomous
- OpenAI: Codex Security Documentation
- OWASP Top 10 for Agentic Applications (2026)
- Palo Alto Networks: OpenClaw May Signal the Next AI Security Crisis
- Christian Schneider: Memory poisoning in AI agents
- OpenAI: Understanding prompt injections
- Unit 42: Web-Based Indirect Prompt Injection Observed in the Wild
- MIT Technology Review: Rules fail at the prompt, succeed at the boundary (Jan 2026)