Lead story
When Your AI Coding Agent Becomes the Attacker: The TrustFall Vulnerability
Researchers have disclosed a new class of attack called "TrustFall" that turns AI coding agents — Claude Code, Cursor CLI, Gemini CLI, GitHub Copilot CLI — into unwitting participants in supply chain compromises. The technique exploits a simple but uncomfortable truth: these tools are designed to be helpful, and helpfulness can be weaponised.
Here's how it works. An attacker embeds malicious instructions inside a repository — in a README, a config file, a comment, anywhere the agent will read. When a developer opens that project and the AI agent parses the files, the hidden instructions trigger code execution or redirect the agent's actions without the user ever knowing. The warning dialogs these tools display are, according to researchers at Adversa AI, too vague and too easily dismissed. Anthropic's official response — essentially "users shouldn't click OK without reading" — is technically true and practically useless.
The deeper problem here is architectural. AI coding agents are built to ingest context from their environment, which is exactly what makes them valuable. They read your codebase, understand your dependencies, follow your project conventions. But that same appetite for context is what makes them susceptible to prompt injection at scale. A malicious open-source package, a compromised repo, a poisoned template — any of these can become the instruction set for an agent that has broad file system access, shell execution rights, and often OAuth tokens connected to your SaaS stack.
That last point is where it gets serious. Separately, researchers at Mitiga found that Claude Code's OAuth tokens can be silently hijacked through malicious Model Context Protocol (MCP) server configurations. An attacker who redirects MCP traffic doesn't just get code execution — they can maintain persistent access to every SaaS platform the developer's Claude session was connected to. GitHub, Jira, Slack, cloud provider consoles. The token theft is passive and leaves almost no trace.
Why this matters now: AI coding agents have gone from curiosity to critical infrastructure in about 18 months. Anthropic just raised Claude Code's usage limits after signing a new deal with SpaceX, and the tool is now deeply embedded in enterprise development workflows alongside competitors from Google, Microsoft, and Cursor. The attack surface these agents represent has scaled with their adoption — but the security model largely hasn't.
For Australian organisations, the exposure is real. Claude Code, Copilot, and Gemini CLI are all widely deployed across Australian tech teams and enterprise development shops. Under the Privacy Act and SOCI Act obligations, an OAuth token compromise that grants access to cloud infrastructure or customer data systems isn't just a developer problem — it's a notifiable incident. Security teams that haven't yet reviewed what permissions their AI coding agents hold, or what repositories they're being pointed at, should treat this as a prompt.
What to watch: Whether the major vendors — Anthropic, Google, Microsoft — respond with substantive changes to their agent permission models, or whether we get another round of "use it responsibly" guidance. The TrustFall researchers argue the fix has to be at the tooling level, not the user behaviour level. They're right.
