Saturday 30 May 2026

The AI-Assisted Hack: When the LLM Does the Post-Breach Heavy Lifting

An attacker used an LLM agent to automate post-breach cloud credential theft — and it's the clearest sign yet that AI is changing what happens after the initial compromise, not just before it.

Lead story

Something shifted this week in how we should think about AI and cyberattacks. Not AI-generated phishing lures — we've had those for a while. Not AI-written malware. Something more consequential: an attacker who used a large language model agent to conduct post-exploitation activity after breaking into a system. Automatically. At machine speed.

Here's what happened. Researchers observed a threat actor exploit a recently disclosed vulnerability in Marimo — a Python notebook environment increasingly popular with data scientists and AI developers — tracked as CVE-2026-39987. The bug gave them initial access to an internet-facing Marimo instance. That's not unusual. What came next is.

Rather than manually digging through the compromised environment, the attacker deployed an LLM agent to do the reconnaissance. The agent extracted two sets of cloud credentials from the compromised notebook, then proceeded to enumerate what those credentials could access. The whole post-compromise chain — find the secrets, understand the environment, identify what to pivot to next — was handed off to the model.

Think of it like hiring a very fast, very thorough contractor to rob a house while you go for coffee. The attacker provided the initial foothold; the LLM did the methodical work of figuring out what was valuable and where to go next.

Why this matters more than another AI-lure story. The industry has largely treated AI-assisted attacks as a "before the breach" problem: better phishing, better social engineering, faster initial access. This incident suggests the threat is just as much a "during and after" problem. Post-exploitation has traditionally been slow, requiring skilled operators who understand cloud environments, IAM policies, credential scopes, and lateral movement paths. LLM agents can significantly narrow that expertise gap.

For defenders, this raises the urgency of two things. First, secret sprawl in notebooks and development environments is now a higher-priority problem than it might have seemed: Marimo notebooks, Jupyter instances, and similar tools are often internet-accessible and often contain embedded credentials. Second, detection windows shrink when the attacker's post-breach activity moves at inference speed rather than human speed. The gap between "initial access achieved" and "cloud environment enumerated" may now be minutes, not hours or days.
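One way to reason about that shrinking window is a sliding-window check over cloud audit events that flags any credential issuing many distinct discovery-style calls in a short span. The following is a minimal sketch, not something taken from the incident report: the event shape (timestamp, credential id, API action), the List/Describe/Get prefix heuristic, the function name, and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed heuristic: read-only discovery calls tend to start with these verbs.
ENUM_PREFIXES = ("List", "Describe", "Get")


def find_enumeration_bursts(events, window=timedelta(minutes=5), threshold=20):
    """Return credential ids that made at least `threshold` distinct
    enumeration actions within any sliding window of length `window`.

    events: iterable of (timestamp, credential_id, action), in any order.
    """
    recent = defaultdict(deque)  # credential_id -> deque of (timestamp, action)
    flagged = set()
    for ts, cred, action in sorted(events):
        if not action.startswith(ENUM_PREFIXES):
            continue
        q = recent[cred]
        q.append((ts, action))
        # Drop events that have fallen out of the window.
        while ts - q[0][0] > window:
            q.popleft()
        # Count distinct actions, so one repeated call does not trip the alarm.
        if len({a for _, a in q}) >= threshold:
            flagged.add(cred)
    return flagged
```

The thresholds would need tuning against normal automation, which can also enumerate quickly; the point of the sketch is only that breadth of discovery per credential per few minutes is a cheap signal to compute.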

Marimo is worth knowing. It's a reactive Python notebook environment — think Jupyter with more interactivity — that's been gaining traction among ML engineers and data teams. It's exactly the kind of tool that ends up running in cloud environments with broad permissions attached, because the people using it are focused on their models, not their attack surface.

No patch for CVE-2026-39987 has been confirmed as widely deployed yet, so any organisation running internet-accessible Marimo instances should treat this as a live threat.

Watch for: Whether other threat actors adopt this playbook quickly — the tooling to deploy an LLM as a post-exploitation agent is not exotic. If one group worked it out, others will follow. The more important question is how long it takes detection tooling to catch up to agents that move faster and more methodically than human operators. Australian organisations running data science infrastructure in cloud environments — particularly those in financial services, research, and government — should be auditing what credentials are embedded in their notebook environments now, not after an incident.
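As a starting point for that kind of audit, a rough sketch of a pattern-based sweep over notebook files is shown here. It assumes notebooks live on disk as .py or .ipynb files; the three rules and the function names are illustrative assumptions, and a real secret scanner uses a far larger rule set plus entropy checks.

```python
import re
from pathlib import Path

# Illustrative rules only; not an exhaustive or authoritative rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Optional backslashes tolerate the escaped quotes found in .ipynb JSON.
    "hardcoded_secret": re.compile(
        r"(?i)(?:secret|token|password|api_key)\w*\s*=\s*\\?['\"][^'\"\\]{8,}\\?['\"]"
    ),
}


def scan_text(text):
    """Return a list of (line_number, rule_name) hits for one file's text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root, suffixes=(".py", ".ipynb")):
    """Walk `root` and yield (path, line_number, rule_name) for each hit."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, name in scan_text(text):
                yield path, lineno, name
```

Running scan_tree over a notebook directory and reviewing the hits by hand is crude, but it surfaces the obvious cases before an attacker's agent does.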

Also today

Gogs Zero-Day: No Patch, Working Exploit, CVSS 9.4

A critical remote code execution vulnerability in Gogs — a popular self-hosted Git service — has no fix despite being reported to maintainers back in March. The researcher who found it says the project's maintainers have gone silent, so they published full details and a working exploit module anyway. The flaw is an argument injection bug that lets authenticated attackers craft malicious branch names via pull requests to achieve RCE on the server. Gogs is widely used in self-hosted and air-gapped environments, including by development teams who chose it precisely because it doesn't phone home to a cloud provider. If you're running Gogs, treat this as effectively unpatched and consider isolating the instance from broader network access until a fix materialises.
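For readers unfamiliar with the bug class: argument injection happens when user-controlled text, such as a branch name beginning with a dash, reaches a command line and is parsed as an option. The sketch below shows the general defence in Python. It is not the Gogs code path or its eventual fix; the allowlist regex and both function names are assumptions for illustration.

```python
import re

# Conservative allowlist for illustration; git itself permits more characters.
# The first character may not be '-', so the name can never look like an option.
_SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]{0,254}")


def is_safe_branch_name(name):
    """Reject names that could be parsed as options or that git would refuse."""
    if not _SAFE_REF.fullmatch(name):
        return False
    if ".." in name or name.endswith((".lock", "/", ".")):
        return False
    return True


def build_log_command(branch):
    """Build an argv list for `git log` on a user-supplied branch name."""
    if not is_safe_branch_name(branch):
        raise ValueError(f"unsafe branch name: {branch!r}")
    # --end-of-options (git 2.24+) tells git that what follows is never an option.
    return ["git", "log", "--oneline", "-n", "20", "--end-of-options", branch]
```

The returned list is meant to be passed to subprocess.run as an argument vector with no shell involved, so validation and the option terminator act as two independent layers.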

SecurityWeek ↗

Microsoft vs. Researcher: The Zero-Day Disclosure War Heats Up

A public feud between Microsoft and an independent security researcher has reignited the perennial debate over responsible disclosure. The researcher published multiple zero-days with working proof-of-concept code to GitHub after claiming Microsoft failed to respond adequately — and Microsoft responded by labelling the releases "never justifiable" and reportedly threatening a criminal investigation. The researcher has since signalled more releases are coming. It's a familiar script, but the stakes are higher when the PoCs land on a Microsoft-owned platform and are immediately weaponisable. The episode also throws a sharp light on how much disclosure norms still depend on goodwill rather than process, and how quickly that goodwill erodes when vendors are perceived to drag their feet.

The Record ↗

Dutch Police Dismantle 17-Million-Device Botnet

Dutch authorities have seized more than 200 servers from a local hosting provider and taken offline a botnet that had enslaved 17 million devices, reportedly tied to a Russia-based residential proxy network. The operation is one of the largest single botnet takedowns on record. Residential proxy botnets are particularly dangerous because their traffic looks like it originates from ordinary home internet connections, which makes IP-reputation-based defences easy to bypass. No attribution to a specific criminal group has been publicly confirmed, though the Russia connection mirrors the infrastructure patterns seen in previous proxy-for-hire takedowns. The Netherlands has quietly become one of the most active countries for this kind of infrastructure seizure, thanks partly to aggressive hosting-provider cooperation.

Bleeping Computer ↗

California Sues 23andMe's New Owners Over the 2023 DNA Breach

California Attorney General Rob Bonta has filed suit against Chrome Holding Co. — the entity that acquired 23andMe's assets after it filed for bankruptcy — over the company's handling of the 2023 breach that exposed genetic and health data for nearly seven million customers. The AG alleges the company downplayed the scale of the breach and even paid a ransom to the attacker. The suit is notable because it follows the company through a bankruptcy reorganisation, testing whether liability travels with the data. For Australians: the 23andMe breach affected customers globally, and the Australian Privacy Act's mandatory data breach notification regime and the OAIC's guidance on sensitive health data make this a case worth watching for how regulators treat post-acquisition liability.

The Register ↗

GREYVIBE: The Russian Crew Using ChatGPT as a Weapon, End to End

WithSecure researchers have documented a Russian-linked threat cluster, previously unnamed and now tracked as GREYVIBE, that has been running persistent attacks against Ukrainian military and government entities since at least August 2025. What distinguishes GREYVIBE is how thoroughly it has integrated AI tools into its workflow, using ChatGPT and Gemini not just for lure generation but across multiple stages of the attack chain, from initial social engineering through to payload crafting. The group spoofed security software installers and built a fake Webex meeting page as entry vectors. It's a concrete data point for the argument that AI hasn't made nation-state actors more capable in kind, but it has made them faster and more scalable: the same skilled campaign, cheaper to run at volume.

The Hacker News ↗

ChatGPhish: When ChatGPT's Web Summaries Become Phishing Lures

Permiso Security has disclosed a vulnerability they've named ChatGPhish, in which ChatGPT's implicit trust in Markdown links and images rendered from web content can be exploited to trigger prompt injection attacks and redirect users to phishing pages. The chatgpt.com response renderer treats Markdown-formatted links in fetched web content as trustworthy, meaning a malicious web page can effectively hijack what ChatGPT displays and links to. It's a textbook case of a UI that was designed for helpfulness without fully modelling adversarial content — and it illustrates why any AI assistant that fetches and renders external content is, in effect, a potential phishing surface. OpenAI has been notified; patch status was not confirmed at time of reporting.
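The general defence for any assistant that renders fetched content is to treat links in that content as untrusted until checked. A minimal sketch of the idea follows, assuming a host allowlist; this is not OpenAI's renderer or its fix, the function name is hypothetical, and the regex deliberately ignores nested brackets and other Markdown edge cases.

```python
import re
from urllib.parse import urlsplit

# Matches [text](url) and ![alt](url). Deliberately simple: ignores nested
# brackets and reference-style links.
_MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def neutralise_untrusted_links(markdown, allowed_hosts):
    """Replace links and images pointing outside `allowed_hosts` with plain text."""

    def _rewrite(match):
        is_image, label, url = match.group(1), match.group(2), match.group(3)
        try:
            parts = urlsplit(url)
            host = (parts.hostname or "").lower()
            trusted = parts.scheme == "https" and host in allowed_hosts
        except ValueError:
            trusted = False  # malformed URL: treat as untrusted
        if trusted:
            return match.group(0)  # keep trusted link untouched
        if is_image:
            return f"[image removed: {label}]"
        return f"{label} (link removed)"

    return _MD_LINK.sub(_rewrite, markdown)
```

Dropping images matters as much as dropping links, because an image URL is fetched automatically and can leak data in its query string without any click.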

The Hacker News ↗

NIST's Vulnerability Database Is a Bureaucratic Mess, Audit Finds

A Commerce Department Inspector General audit has found that NIST's National Vulnerability Database — the canonical global reference for known software security flaws — has been plagued by poor planning, a backlog of 27,000 unprocessed vulnerabilities, and duplicated effort with a parallel CISA programme. The NVD backlog has been a known problem since 2024, but the audit makes official what many in the security industry already suspected: the dysfunction is structural, not just a staffing blip. For organisations that rely on NVD data to prioritise patching — which is essentially everyone running a vulnerability management programme — this is a reminder that the database is not comprehensive and that tooling calibrated against NVD alone will have blind spots. Australian organisations under SOCI Act obligations should factor NVD latency into their risk frameworks.

CyberScoop ↗

Anthropic's Mythos-Class Models Are Coming to the Public — Eventually

Anthropic has confirmed it intends to release its Mythos-class models to the general public, after an earlier delay prompted by concerns about security risks in both public and private software environments. Mythos sits above the Claude Opus line in Anthropic's model hierarchy and is understood to represent a significant capability step-up. The confirmation is notable primarily for what it signals about Anthropic's internal safety process: the company is framing the delay not as a capability limitation but as a deliberate pause while it works through risk assessment. Whether that framing holds up as competitive pressure from OpenAI and Google intensifies will be worth watching. No public release date has been given.

Bleeping Computer ↗

2,000 Exposed 'Vibe-Coded' Apps Are a Security Team's Nightmare

A new report from Oligo Security examined more than 2,000 applications built with AI coding assistants and deployed to production — often without any involvement from security or IT teams. The findings are unsurprising in their specifics but alarming in their scale: exposed credentials, open debug endpoints, missing authentication, and direct connections to internal databases. The researchers frame this as "shadow AI" having evolved from employees pasting data into ChatGPT to employees shipping full production applications via AI agents. The security gap isn't the AI doing bad work — it's the process gap where a junior developer with an AI co-pilot can now ship a production app before anyone with security context has seen it. Australian organisations navigating the ASD Essential Eight's application control guidance will find this report relevant reading.

The Hacker News ↗

LLMs Can't Be Told the Truth — Literally

New research published this week found that large language models persistently treat false statements as true even when they have been explicitly told the statements are false. In fine-tuning experiments, models demonstrated a strong "bias toward confidently representing claims as true" regardless of explicit warnings to the contrary built into the prompt or system context. This has direct implications for any application that relies on an LLM to faithfully propagate corrections — customer-facing chatbots that are supposed to stop spreading outdated product information, for instance, or legal or medical tools that need to reflect updated guidance. It's also a useful data point for anyone building RAG systems who assumes that providing correct context is sufficient to override a model's training-time priors.

Ars Technica ↗

Groq Raises $650M as It Pivots From Chips to AI Inference

AI chip startup Groq is reportedly raising $650 million in a new funding round as it shifts strategic focus from designing and selling hardware to running AI inference as a cloud service. The pivot comes shortly after Nvidia's headline-grabbing $20 billion arrangement to acquire talent and IP from another AI chip startup — a deal that underscored just how difficult it is to compete with Nvidia on silicon alone. Groq's inference chips (LPUs) have a genuine performance advantage for certain workloads, particularly high-throughput text generation, so the move to productise that advantage as a hosted service rather than selling hardware is strategically sensible. The broader implication: the AI infrastructure layer is rapidly consolidating around a handful of well-capitalised players.

TechCrunch AI ↗

Chrome Gets Session Cookie Theft Protection — For Everyone

Google has rolled out Device Bound Session Credentials (DBSC) to all Chrome users; the feature had previously been limited to certain configurations. DBSC cryptographically ties authentication session tokens to the specific device they were issued on, meaning a stolen session cookie is useless if replayed from a different machine. This directly counters one of the most popular post-phishing techniques, the "pass-the-cookie" attack, which has been used to bypass MFA in high-profile intrusions including the 2023 MGM breach and multiple Microsoft customer compromises. Chrome holds roughly 65 percent of browser market share globally, so rolling this out broadly is a genuine defensive uplift at scale. The feature requires server-side support to be fully effective, so adoption pressure on web application owners will follow.
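To see why a bound session defeats cookie replay, here is a toy proof-of-possession sketch. It is a deliberate simplification and not the DBSC protocol: it uses an HMAC shared secret from the Python standard library as a stand-in for the device-held key, and the class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


class Server:
    """Toy server that binds each session cookie to a device key."""

    def __init__(self):
        self.sessions = {}  # cookie -> device key registered at login

    def login(self, device_key):
        cookie = secrets.token_hex(16)
        self.sessions[cookie] = device_key
        return cookie

    def challenge(self):
        # Fresh random nonce so an old proof cannot be replayed.
        return secrets.token_bytes(16)

    def verify(self, cookie, nonce, proof):
        key = self.sessions.get(cookie)
        if key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, proof)


def device_proof(device_key, nonce):
    """What the legitimate device computes; the key never leaves the device."""
    return hmac.new(device_key, nonce, hashlib.sha256).digest()
```

An attacker who steals only the cookie cannot answer the challenge, because the proof depends on a key that was never sent over the wire after registration. That is the property the brief describes, independent of the specific cryptography used.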

Bleeping Computer ↗
