What is LiteLLM and why are attackers targeting it?

LiteLLM is an open-source proxy server that routes API calls across multiple LLM providers, widely deployed as an AI gateway in enterprise environments and developer setups. Attackers target it because a compromised LiteLLM server provides access to the compute and capabilities of whatever models it proxies, allowing them to run their own AI-powered operations without cost and to exploit the server's credentials to reach other internal systems.

What did CVE-2026-40217 allow attackers to do?

CVE-2026-40217 is a remote code execution vulnerability in LiteLLM affecting versions through 2026-04-08, with a CVSS score of 8.8. Discovered by X41 D-Sec, the flaw lies in the custom guardrail testing endpoint, which accepts user-controlled bytecode and executes it with insufficient restrictions. It is also the final step of a larger vulnerability chain rated CVSS 9.9, which lets an attacker who starts as a low-privilege user escalate to administrator and then execute arbitrary code on the LiteLLM server.

How do I check whether my AI infrastructure is exposed?

Start with your Ollama and LiteLLM deployments. Ollama by default binds to localhost, but cloud environments with misconfigured networking often expose port 11434 to the internet. Run a scan of your external-facing infrastructure for LiteLLM's default port (4000) and Ollama's port (11434). Neither service should be reachable from the internet without authentication. Apply immediate network controls to restrict access to trusted sources and patch LiteLLM to v1.83.14-stable or later.

Exposed AI Gateways Are Being Weaponized to Attack Third Parties

Researchers at Zenity Labs have published findings showing that attackers are systematically targeting exposed AI gateway infrastructure to power offensive operations against third parties, including autonomous penetration testing campaigns, compute theft, and data exfiltration. The research, released June 30, draws on thousands of attack attempts observed across Zenity’s global AI threat intelligence sensor network between March and May 2026.

The findings represent a significant shift in how AI infrastructure is being abused. Rather than attacking AI systems to extract training data or manipulate model outputs, these threat actors are treating exposed AI servers as free, capable compute platforms and as proxies for launching attacks elsewhere.

What Attackers Are Doing with Exposed AI Infrastructure

Zenity Labs documented three distinct campaign types against its honeypots over the research period.

The first involved autonomous offensive operations. In one observed case, a single IP address used a LiteLLM client to send a 140,000-character prompt to a hijacked server, instructing it to run Strix, an autonomous AI-powered pentesting tool, against an unidentified French auction house. The prompt included explicit instructions to “never ask for permission, run non-stop.” The attacker used the victim’s LiteLLM instance as a free execution environment, routing the tool’s requests through the server’s model access.

The second campaign type was compute theft: attackers simply using exposed endpoints to run large volumes of inference requests at the victim organisation’s expense. Given that enterprise LiteLLM deployments route to commercial API providers with billing attached to the victim’s accounts, the cost implications can be significant.

The third involved data exfiltration through CVE-2026-35029, a vulnerability in LiteLLM’s admin endpoint. Attackers exploited this to extract configuration data, including API keys and internal routing information, from the targeted server.

The Vulnerability Landscape

The research highlighted active exploitation of CVE-2026-40217, a remote code execution vulnerability in LiteLLM rated CVSS 8.8, discovered by X41 D-Sec. The flaw lies in the custom guardrail testing endpoint, which executes user-supplied bytecode with insufficient restrictions. Zenity’s sensors recorded hundreds of exploitation attempts against CVE-2026-40217 on the same day the patch was published. Attempts continued over the following six weeks and ranged from reconnaissance probes to sandbox escape payloads.

CVE-2026-40217 is part of a larger chain, discovered by Obsidian Security, that carries an overall CVSS score of 9.9. The chain begins with a default low-privilege user account, escalates to administrator via CVE-2026-47101 and CVE-2026-47102, then uses the administrator access to trigger CVE-2026-40217 for code execution on the server. BerriAI shipped fixes across subsequent releases, with the complete fix set landing in LiteLLM v1.83.14-stable on April 25, 2026.

Separate from the LiteLLM chain, earlier research cited in the Zenity report found 175,000 Ollama AI servers publicly exposed across 130 countries, the vast majority requiring no authentication whatsoever. Ollama’s default configuration binds to localhost, but cloud environments with misconfigured security groups or VPC peering frequently expose port 11434 to the internet. These servers offer unauthenticated access to whatever models are loaded locally, making them trivial targets for compute theft.
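One way to confirm whether an Ollama server you own answers without credentials is to request its model listing. The sketch below assumes the standard Ollama `/api/tags` endpoint, which returns a JSON object with a `models` array. The function names and timeout are illustrative choices, not part of the Zenity research.

```python
import json
import urllib.error
import urllib.request


def model_names_from_tags(body):
    """Extract model names from the JSON body of an Ollama /api/tags response."""
    try:
        payload = json.loads(body)
    except ValueError:
        return []
    models = payload.get("models", []) if isinstance(payload, dict) else []
    return [m.get("name", "") for m in models if isinstance(m, dict)]


def list_ollama_models(base_url, timeout=3.0):
    """Return model names if the server answers /api/tags without
    credentials, or None if it is unreachable or refuses the request."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return model_names_from_tags(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error such as 401/403.
        return None
```

A non-None result for a server that should be private, for example from `list_ollama_models("http://203.0.113.10:11434")` run from outside your network, means anyone on the internet can enumerate and use its models. Only run this against hosts you are authorised to test.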

Why This Matters Beyond the Attack Surface

The pattern Zenity is documenting reflects a maturing attacker approach to AI infrastructure. The value proposition is clear: an exposed LiteLLM or Ollama server gives an attacker access to capable models without API costs, without rate limits tied to their own accounts, and often with credentials for accessing other systems stored in the server’s configuration.

For organisations running AI gateways in development or production environments, the threat is not only direct financial loss through compute theft or credential exposure. A compromised AI gateway sitting on an internal network segment can serve as a pivot point into other internal services, and in the case of Strix-style autonomous tools, as a platform for attacking external third parties, creating legal and reputational exposure for the victim organisation.

The speed of exploitation is notable. Same-day exploitation of LiteLLM CVEs after patch publication means the window for controlled rollout is essentially non-existent. Unpatched AI infrastructure is being scanned and targeted as aggressively as any other high-value server.

What to Do Now

Patch LiteLLM immediately. Upgrade to v1.83.14-stable or any later release. The full privilege-escalation and code-execution chain is not fixed in any earlier version.
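A quick way to triage a fleet is to compare each instance's reported version against the fixed release. The sketch below is a minimal illustration: it compares only the numeric `major.minor.patch` part, so it treats suffixes such as `-stable` or `-nightly` as equivalent. That is an assumption you should verify against BerriAI's release notes before relying on it.

```python
import re

# First release containing the complete fix set, per the article.
FIXED_VERSION = (1, 83, 14)


def parse_version(text):
    """Pull the numeric major.minor.patch triple out of a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text):
    """True if the numeric version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_VERSION
```

On a host where LiteLLM is installed as a Python package, `importlib.metadata.version("litellm")` should return a string suitable for `is_patched`. For container deployments, the image tag is the usual source.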

Audit your AI server exposure. Scan for external reachability of LiteLLM (port 4000 by default) and Ollama (port 11434). Neither should be accessible from the internet without authentication.
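The reachability check can be sketched as a plain TCP connect test. The example below assumes the default ports named above. The host list you pass in and the function names are placeholders, and the result is only meaningful when run from a vantage point outside your network.

```python
import socket

# Default ports cited in the article; adjust if your deployment differs.
AI_PORTS = {"LiteLLM": 4000, "Ollama": 11434}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out, or name resolution failed.
        return False


def scan_hosts(hosts, ports=AI_PORTS, timeout=2.0):
    """Return (host, service, port) for every reachable service port."""
    findings = []
    for host in hosts:
        for service, port in ports.items():
            if is_port_open(host, port, timeout):
                findings.append((host, service, port))
    return findings
```

Calling `scan_hosts(["203.0.113.10", "203.0.113.11"])` with your own public addresses returns a list of exposed services. An open port is not proof of missing authentication, so follow up any hit by checking whether the service actually answers unauthenticated requests.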

Rotate credentials stored in LiteLLM configuration. If you cannot confirm your LiteLLM instance was patched before active exploitation began in April, treat all API keys and provider credentials stored in its configuration as compromised.
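Before rotating, it helps to inventory which credentials the configuration actually holds. The sketch below is a rough line-based scan, not a YAML parser. It assumes a LiteLLM-style `config.yaml` where secrets appear under field names ending in `key`, `secret`, `token`, or `password`, and where a value beginning with `os.environ/` refers to an environment variable. It deliberately reports only line numbers and field names, never the secret values.

```python
import re

# Matches "name: value" lines whose field name ends in a secret-like suffix.
SECRET_LINE = re.compile(
    r"^\s*-?\s*([A-Za-z0-9_]*(?:key|secret|token|password))\s*:\s*(\S.*)$",
    re.IGNORECASE,
)


def find_secrets(config_text):
    """Return (line_number, field_name, kind) for each secret-like field.

    kind is "env_reference" for values read from the environment and
    "inline" for values written directly into the file.
    """
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = SECRET_LINE.match(line)
        if match is None:
            continue
        name = match.group(1)
        value = match.group(2).strip().strip("\"'")
        kind = "env_reference" if value.startswith("os.environ/") else "inline"
        findings.append((lineno, name, kind))
    return findings
```

Both kinds belong on the rotation list, since code execution on the server exposes environment variables as well as the file. Inline values carry the extra risk of leaking through backups and version control.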

Restrict admin endpoints. LiteLLM’s admin interface should be accessible only from trusted internal addresses. Network-level controls add a layer of protection even on patched systems.

Monitor for anomalous inference usage. Unexpected spikes in API usage, requests from unusual source IPs, or calls to models outside normal operational patterns are indicators of possible compromise.
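These three indicators can be checked with a few lines of log analysis. The sketch below assumes each request has already been parsed into a dictionary with hypothetical `hour`, `source_ip`, and `model` fields. The record format and the spike threshold are illustrative and will need adapting to your gateway's actual logs.

```python
from collections import Counter
from statistics import median


def find_anomalies(records, known_ips, allowed_models, spike_factor=5.0):
    """Flag request-volume spikes, unknown source IPs, and unexpected models.

    records: iterable of dicts with "hour", "source_ip" and "model" keys.
    An hour counts as a spike when its request count exceeds
    spike_factor times the median hourly count.
    """
    records = list(records)
    per_hour = Counter(r["hour"] for r in records)
    baseline = median(per_hour.values()) if per_hour else 0
    spike_hours = sorted(
        hour for hour, count in per_hour.items()
        if baseline and count > spike_factor * baseline
    )
    unknown_ips = sorted({r["source_ip"] for r in records} - set(known_ips))
    unexpected_models = sorted({r["model"] for r in records} - set(allowed_models))
    return {
        "spike_hours": spike_hours,
        "unknown_ips": unknown_ips,
        "unexpected_models": unexpected_models,
    }
```

The median is used as the baseline so that a single burst of attacker traffic does not inflate the threshold it is measured against. A production setup would more likely express the same rules in an existing log or metrics platform.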
