Lead story
Anthropic Locked Down Its Most Dangerous AI. Then It Leaked Anyway.
Anthropic spent weeks telling the world that Claude Mythos — its new vulnerability-hunting AI model — was too powerful to release publicly. The company handed early access only to a vetted coalition of tech giants: Apple, Microsoft, Google, Amazon, and a handful of others. The whole point was to give defenders a head start before the model reached adversarial hands. That plan lasted about as long as it took to announce it.
According to Bloomberg, a small group of unauthorised users has had access to Mythos since the day Anthropic first went public about its controlled rollout. The model, which reportedly found more software vulnerabilities than any prior AI system tested, was already circulating beyond the intended ring-fence by the time the press releases hit inboxes. The Verge described the situation bluntly: "humiliating."
The irony is sharp. Anthropic's entire justification for the restricted release was that Mythos was uniquely dangerous — so capable at finding and explaining exploitable bugs that putting it in the open could meaningfully accelerate attacks before defenders had patched anything. That framing now cuts both ways. If the model is as powerful as claimed, someone outside the approved coalition had it on day one. If it isn't, the whole theatrical rollout looks like a PR exercise that misfired.
There's a broader lesson here about how AI labs are trying to manage dual-use risk. Anthropic's approach — controlled access, trusted partners, coordinated patching — is the right instinct. It mirrors how governments handle sensitive intelligence or how the security community manages zero-day disclosure. The problem is that the playbook for managing AI capability leakage is far less mature than the one for managing software vulnerabilities. There's no CVE process for a leaked language model, and once access credentials spread, you can't un-ring the bell.
Meanwhile, a separate but related story is playing out on the offensive AI research side. Palo Alto Networks published findings on a proof-of-concept it calls Zealot — a multi-agent AI system capable of running a full cloud attack with minimal human direction, from initial reconnaissance through to data exfiltration. Researchers noted the system moved faster than human defenders could respond and showed more autonomous decision-making than the team expected. It's a PoC, not a weapon in active use — but the gap between PoC and deployment has been shrinking for years.
And while all this was happening, Chinese security firm 360 Digital Security Group claimed its own AI system had uncovered more than 1,000 vulnerabilities, some of them demonstrated at the Tianfu Cup hacking competition. SecurityWeek noted the claims drew direct comparisons to Mythos, suggesting the race to build AI-powered offensive research tooling is genuinely multinational.
The through-line across all three stories is the same: AI is compressing the exploit window on both sides of the fence. Defenders can find bugs faster. Attackers — or their tools — can find and exploit them faster too. And when the system designed to give defenders an edge leaks before the patching window even opens, the whole asymmetry tips the wrong way.
What to watch: whether Anthropic publicly acknowledges how the Mythos access controls failed, and whether the vetted-coalition model survives as a credible framework for future capability releases. If it doesn't, the remaining options are full public release — with all the risks that implies — or indefinite internal lockdown.
