What is LLMjacking and how has it evolved?

LLMjacking originally referred to credential theft targeting AI API providers: attackers stole API keys, then resold access or used it for jailbroken content generation. The June 2026 Sysdig finding shows a newer use, in which stolen or misconfigured AI compute is wired into automated offensive security frameworks and the model makes autonomous decisions during multi-stage attack chains rather than just generating outputs on demand.

Why is Ollama specifically targeted for this kind of abuse?

Ollama runs local AI models and serves its REST API on port 11434 with no built-in authentication. It listens on localhost by default, but instances are often reconfigured to listen on all interfaces, for example in container or remote-access setups. Misconfigured instances that end up internet-accessible have been a target of opportunistic scanning for some time. The newer development is using that exposed compute as the reasoning engine for an automated attack pipeline, rather than simply consuming it for generation tasks.

What should organisations running self-hosted AI infrastructure do?

Bind Ollama and similar local model servers to localhost or to a specific internal interface, and never expose them on 0.0.0.0 without authentication. Network-layer controls should block port 11434 from external access. Organisations should scan for exposed AI model servers using tools like Shodan or internal network scanners, and treat any internet-facing model server without authentication as a critical misconfiguration.

LLMjacking Evolved: Stolen Compute Powers Autonomous Offensive Tools

On 12 June 2026, Sysdig’s Threat Research Team caught a threat actor using a misconfigured Ollama server as the reasoning engine for an automated, multi-stage offensive hacking tool. The discovery marks a meaningful escalation in the LLMjacking threat landscape. Where earlier LLMjacking attacks were about credential theft and compute arbitrage, this is about weaponisation: stolen or exposed AI infrastructure being used to autonomously develop and iterate on offensive tooling.

What Sysdig Found

The team identified an exposed Ollama server on port 11434 that had been integrated into a framework they call VAPT (an abbreviation that commonly stands for vulnerability assessment and penetration testing, in line with the tool’s stated purpose of automated vulnerability assessment). The framework is not a static script. It runs multi-stage attack workflows with the Ollama model making decisions at each stage: service fingerprinting, vulnerability matching against known CVEs, web application reconnaissance, proof-of-concept exploit generation, SQL injection payload crafting, secret extraction, and privilege escalation attempts.

Each stage feeds into the next. The model reads service banners and version strings, looks up matching vulnerabilities, generates exploit attempts, and checks whether execution succeeded using marker-based output validation. The result is a pipeline where human direction happens at the task level, not the execution level. Set a target, start the framework, review the outcomes.

What makes this finding particularly useful for defenders is where Sysdig caught it: the framework was in active development, being tested against a range of benchmark targets rather than real victims. The growing stage set, in-place code rewrites, and entirely private targets indicate a threat actor building and tuning a tool before deploying it operationally. Sysdig observed the development process itself.

The LLMjacking Evolution

The original LLMjacking pattern, documented from 2023 onward, involved attackers compromising cloud environments to steal AI provider API keys. The stolen credentials were used for two main purposes: resale on underground markets, and jailbroken content generation that bypassed the provider’s usage policies. The AI infrastructure was a commodity to be consumed or traded.

The VAPT framework represents a different relationship with stolen AI resources. The threat actor is not selling access or generating content for resale. They are using the compute as development infrastructure: running a model on someone else’s server, iterating on an offensive tool’s codebase, testing against benchmark targets, and refining based on output. The AI capability is not the product being sold. It is the engineering tool being used to build the product.

This pattern requires a capable local model (Ollama can run models like Llama 3, Mistral, and similar open-weight models suitable for code generation and reasoning) and persistent access to the hosting environment. Misconfigured Ollama servers provide both. The threat actor pays nothing, and the legitimate server operator absorbs the compute cost.

Scale of Exposed Infrastructure

Ollama servers with no authentication exposed on port 11434 have been scannable since the software launched. Shodan queries consistently return thousands of exposed instances. The ecosystem of self-hosted AI tools, including Ollama, LM Studio, LocalAI, and others, was built on the assumption of local-network deployment and was not designed with internet exposure in mind. These servers generally ship without authentication, so any configuration that binds them to all interfaces leaves the API open to anyone who can reach the port.
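To make the exposure concrete, here is a minimal sketch for checking one of your own hosts. It requests /api/tags, Ollama's model-listing endpoint, without sending any credentials. The function name, the timeout, and the decision to bypass configured proxies are illustrative assumptions, and it should only be pointed at infrastructure you are authorised to test.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import List, Optional


def list_models_unauthenticated(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return model names if base_url answers /api/tags with no credentials.

    Returns None when the host is unreachable, rejects the request, or does
    not respond with the JSON shape an Ollama server uses.
    """
    url = base_url.rstrip("/") + "/api/tags"
    # Assumption: connect directly and ignore proxy environment variables,
    # since the hosts being audited are internal.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None
    models = payload.get("models") if isinstance(payload, dict) else None
    if not isinstance(models, list):
        return None
    return [m.get("name", "") for m in models if isinstance(m, dict)]
```

A non-None result means anyone who can reach that address can enumerate the models and, by extension, use them. For example, list_models_unauthenticated("http://127.0.0.1:11434") checks a local instance.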

As AI model servers have proliferated alongside the broader adoption of local LLM tooling in development, research, and enterprise AI workflows, the aggregate surface of inadvertently exposed AI compute has grown substantially. The Sysdig finding suggests that at least some threat actors have noticed.

Defensive Priorities

The immediate action is straightforward: audit your network for AI model servers on default ports. These are 11434 for Ollama, 8080 for LocalAI, and 1234 for LM Studio. Any of these exposed to the internet without authentication represents both a potential compute theft vector and, as the VAPT case illustrates, a risk of the infrastructure being used to harm others.
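A port audit of this kind can be sketched in a few lines. The example below attempts a TCP connection to each default port on a given host and reports which ones accept it. The port table and function name are illustrative assumptions. An open port alone does not prove which service is listening (8080 in particular is shared by many applications), so treat hits as candidates for follow-up.

```python
import socket
from typing import Dict

# Assumed default ports for illustration; adjust to your environment.
DEFAULT_MODEL_PORTS = {11434: "Ollama", 8080: "LocalAI", 1234: "LM Studio"}


def open_model_ports(host: str,
                     ports: Dict[int, str] = DEFAULT_MODEL_PORTS,
                     timeout: float = 1.0) -> Dict[int, str]:
    """Return the subset of ports on host that accept a TCP connection."""
    found = {}
    for port, label in ports.items():
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                found[port] = label
        except OSError:
            continue
    return found
```

Run it from outside the network perimeter as well as inside: a port that answers from an external vantage point is the case that matters most.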

Bind to localhost. Ollama’s OLLAMA_HOST environment variable controls the listening address. Keep it at 127.0.0.1:11434, which is Ollama’s documented default, and check for overrides: container deployments and remote-access setups commonly set it to 0.0.0.0, which binds to all interfaces.
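As a rough aid for configuration reviews, the sketch below classifies an OLLAMA_HOST value as loopback-only or not. The function name and parsing rules are illustrative assumptions rather than Ollama's own parser. It treats an unset value as loopback, which is an assumption about the server's fallback behaviour worth verifying for your version and packaging, and it treats anything it cannot parse as exposed.

```python
import ipaddress
import os


def binds_loopback_only(value: str) -> bool:
    """Return True if an OLLAMA_HOST-style value refers to a loopback address."""
    host = value.strip()
    if not host:
        return True  # assumption: an unset value falls back to a loopback bind
    if "://" in host:                    # drop a scheme such as http://
        host = host.split("://", 1)[1]
    host = host.split("/", 1)[0]         # drop any path component
    if host.startswith("["):             # bracketed IPv6, e.g. [::1]:11434
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:           # host:port
        host = host.rsplit(":", 1)[0]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames and unparseable values count as exposed


if __name__ == "__main__":
    current = os.environ.get("OLLAMA_HOST", "")
    print("loopback only" if binds_loopback_only(current) else "review this binding")
```

Returning False for ambiguous input is a deliberate choice: a false alarm costs a quick manual check, while a missed exposure costs much more.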

Network-layer controls. Firewall rules blocking external access to AI model server ports are a secondary control and should not substitute for correct binding configuration, but they provide defence-in-depth.

Monitor for unexpected compute usage. AI inference is GPU and CPU intensive. Unexpectedly high compute utilisation on a server running a model is an indicator of unauthorised use. Basic system monitoring should flag this.
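One simple way to flag this is to alert only when utilisation stays high across several consecutive samples, so short legitimate bursts do not trigger it. The class name, threshold, and window size below are hypothetical, and the samples are assumed to come from whatever system or GPU monitoring is already in place.

```python
from collections import deque


class SustainedLoadDetector:
    """Flag utilisation that stays at or above a threshold for a full window."""

    def __init__(self, threshold: float = 80.0, window: int = 5) -> None:
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, utilisation_percent: float) -> bool:
        """Record one sample; return True if every sample in a full window is high."""
        self.samples.append(utilisation_percent)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and min(self.samples) >= self.threshold


if __name__ == "__main__":
    detector = SustainedLoadDetector(threshold=80.0, window=3)
    for sample in [90, 95, 50, 85, 90, 99]:
        print(sample, detector.observe(sample))
```

On a server that legitimately runs inference, sustained load is sometimes expected, so an alert like this is most useful when compared against known usage hours or request logs.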

The broader implication is that the automation of offensive security capabilities is accelerating, and AI model infrastructure is becoming part of that process, not just a target for it.
