Lead story
Bleeding Llama: The Ollama Flaw That Could Leak Your Entire AI Server's Memory
A critical vulnerability in Ollama, the widely used open-source tool for running large language models locally, could let an unauthenticated remote attacker read the entire process memory of an exposed server. Researchers at Cyera, who named the flaw Bleeding Llama, have disclosed the bug as CVE-2026-7482. It scores a hefty 9.1 on the CVSS scale. Best estimate of exposed servers: over 300,000 globally.
The flaw is an out-of-bounds read — meaning a crafted request can cause Ollama to return memory it was never supposed to share. In practice, that could include API keys, model weights, conversation data, or anything else sitting in the process's memory at the time of the request. No authentication required. No account needed. Just a network path to a listening Ollama instance.
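For readers who want the bug class made concrete, here is a toy sketch of how an out-of-bounds read leaks neighbouring data. It is not Ollama's code, and the details of the real flaw are not described here. The simulated memory layout, the fake secret, and the function names are all invented for illustration. The only point is the pattern: a handler trusts a length supplied by the caller instead of the size of the buffer it actually owns.

```python
# Toy model of an out-of-bounds read. Everything here is hypothetical and
# exists only to illustrate the bug class, not the actual Ollama flaw.

BUF_SIZE = 16  # size of the buffer the handler is supposed to stay inside


def build_memory():
    """Simulate process memory: a request buffer followed by unrelated data."""
    request_buf = b"hello".ljust(BUF_SIZE, b"\x00")
    neighbouring_secret = b"API_KEY=sk-example-not-real"  # fake value
    return bytearray(request_buf + neighbouring_secret)


def vulnerable_echo(memory, claimed_len):
    """BUG: trusts the caller's length, so it can read past the buffer."""
    return bytes(memory[0:claimed_len])


def safe_echo(memory, claimed_len):
    """Fixed version: reject any length larger than the buffer itself."""
    if claimed_len < 0 or claimed_len > BUF_SIZE:
        raise ValueError("declared length exceeds buffer size")
    return bytes(memory[0:claimed_len])


if __name__ == "__main__":
    mem = build_memory()
    print(vulnerable_echo(mem, 5))   # the honest request
    print(vulnerable_echo(mem, 40))  # over-long request spills into the secret
```

The fix in this sketch is a single bounds check, which is typical of the class: the damage comes from one missing comparison, not from anything exotic.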
Why this is a bigger problem than it first appears. Ollama is a developer favourite precisely because it makes running local AI models trivially easy. That simplicity has led a lot of teams to spin up Ollama instances on infrastructure that's more exposed than they realise: cloud VMs with overly permissive security groups, internal dev boxes reachable via VPN, or self-hosted setups where "internal" is doing a lot of heavy lifting. The estimate of 300,000 exposed servers isn't hypothetical; it's based on active internet scans.
Memory-disclosure vulnerabilities are particularly nasty in this context. Unlike a remote code execution bug, there's no loud exploit, just a quiet read. An attacker can exfiltrate sensitive data repeatedly without triggering most intrusion detection systems, because the traffic looks like ordinary API responses.
What defenders should do right now. First: check whether your Ollama instances are reachable from the internet. They shouldn't be. Ollama's own documentation recommends running it behind a reverse proxy or binding it only to localhost. Second: patch as soon as a fix is available — monitor the CVE and Ollama's GitHub for a patched release. Third: audit what data is passing through your Ollama deployment, because anything in memory is potentially in scope.
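The first step can be roughed out in a few lines of Python. This is a minimal sketch, not a scanner. It assumes the instance listens on Ollama's default port, 11434, and that a plain GET on the root path returns the banner text "Ollama is running"; a build that behaves differently, or one sitting behind a proxy, would need a different check. The function names and result labels are made up for this example. Run it only against hosts you own, and ideally from a machine outside your network, since reachability from inside tells you little.

```python
import http.client
import socket
import urllib.request

OLLAMA_DEFAULT_PORT = 11434  # assumption: instance uses the default port


def port_is_open(host, port=OLLAMA_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def looks_like_ollama(body):
    """Assumption: the root endpoint answers with an 'Ollama is running' banner."""
    return "ollama is running" in body.lower()


def check_host(host, port=OLLAMA_DEFAULT_PORT, timeout=3.0):
    """Classify a host as 'closed', 'open-unknown', or 'ollama-exposed'."""
    if not port_is_open(host, port, timeout):
        return "closed"
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(256).decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Port answers TCP but not in a way we can read as HTTP.
        return "open-unknown"
    return "ollama-exposed" if looks_like_ollama(body) else "open-unknown"


if __name__ == "__main__":
    # Replace with the public addresses of hosts you are responsible for.
    for target in ["127.0.0.1"]:
        print(target, check_host(target))
```

A result of "closed" from an external vantage point is what you want to see. Anything else deserves a look at firewall rules, security groups, and the address the service is bound to.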
The Australian angle is real. Ollama is heavily used in Australian universities, research institutions, and tech companies running private LLM deployments, often pitched as the privacy-preserving alternative to cloud AI APIs. An exposed Ollama server doesn't just leak model weights; it could leak the documents, queries, and outputs of everyone using that system. Under the Privacy Act 1988, a breach of that kind would very likely be notifiable under the Notifiable Data Breaches scheme if it involved personal information and was likely to cause serious harm to the people affected.
What to watch. Cyera's full technical write-up and a patched Ollama release are both expected shortly. If you're an Ollama user and you haven't already confirmed your instance isn't publicly routable, that's the only task that matters this Monday morning.
The broader pattern here is worth noting. Last week we saw AI platforms abused for malware distribution. This week it's AI infrastructure carrying a critical unauthenticated vulnerability. The message is consistent: the AI toolchain has grown faster than the security assumptions baked into it.
