Skip to content
AI Security Wire

Published

- 6 min read

By

vLLM CVE-2026-22778: Unauthenticated RCE via Malicious Video URL

img of vLLM CVE-2026-22778: Unauthenticated RCE via Malicious Video URL

A CVSS 9.8 remote code execution vulnerability in vLLM, the most widely deployed open-source LLM inference framework, allows unauthenticated attackers to achieve arbitrary code execution on inference servers by submitting a malicious video URL to any multimodal endpoint. CVE-2026-22778 is a two-stage exploit chain that combines an information disclosure flaw to defeat ASLR with a heap buffer overflow in FFmpeg’s JPEG2000 decoder. No credentials are required. The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and is patched in v0.14.1. Security teams running multimodal inference workloads should treat this as an emergency upgrade.

The Exploit Chain: Two Weaknesses, One RCE

The attack does not rely on a single flaw but chains two distinct weaknesses to reach code execution reliably.

Stage one: ASLR defeat via PIL exception leakage. When vLLM receives a malformed image or video input at a multimodal endpoint, the Python Imaging Library (PIL) raises an exception. The exception message includes a raw heap memory address, for example: cannot identify image file <_io.BytesIO object at 0x7a95e299e750>. vLLM returns this error message directly to the client without sanitisation.

The disclosed address gives the attacker a live pointer into the heap of the vLLM process. Address space layout randomisation exists to make heap addresses unpredictable from outside the process; this stage eliminates that protection entirely. An attacker can query the endpoint once with a deliberately malformed input, parse the address from the error response, and use it to calculate the layout of the heap for the subsequent overflow.

Stage two: heap overflow via JPEG2000 cdef box. vLLM’s multimodal pipeline processes video inputs through OpenCV, which delegates video decoding to FFmpeg. FFmpeg’s JPEG2000 decoder contains a flaw in how it handles the cdef (channel definition) box structure inside a JPEG2000 file. A malicious cdef box can direct luma channel data into a chroma buffer, triggering a heap-based buffer overflow. With the heap layout known from stage one, the attacker can use the overflow to corrupt adjacent function pointers and redirect execution.

The full exploit path: the attacker submits a crafted video URL to /v1/chat/completions or /v1/invocations, the vLLM process fetches the video from the attacker-controlled server, decodes it through OpenCV and FFmpeg, and the cdef overflow fires. Code execution occurs within the vLLM process with whatever privileges it is running as.

Attack Surface

vLLM’s multimodal capabilities ship in the same package as text-only inference. Any deployment that has upgraded to add vision or video model support, and has not restricted multimodal endpoint exposure, is potentially vulnerable. The attack requires:

  • Network access to the vLLM API (any interface vLLM is listening on, not just public-facing endpoints)
  • Video processing capability enabled (present when multimodal models are loaded)
  • The ability to host a malicious video file at an attacker-reachable URL

The third requirement deserves specific attention. The attacker does not embed the payload directly in the API request. Instead, vLLM fetches the video from an attacker-specified URL, which means the attacker’s infrastructure performs the actual payload delivery. This is significant for internal deployments: if the vLLM instance can make outbound HTTP requests to the internet, there is no additional network perimeter control that prevents exploitation. vLLM instances running in cloud environments with outbound internet access and no network egress restrictions are particularly exposed.

What Code Execution Means in a vLLM Context

The vLLM process has access to resources that make it a high-value target beyond the inference server itself.

Model weights and proprietary data. The vLLM process loads model weights into GPU memory and may cache inputs for KV optimisation. Access to the process means access to these weights and any cached prompt data.

Environment credentials. vLLM deployments in cloud environments commonly authenticate to object storage (for model weight downloads), container registries, and internal APIs via environment variables or instance metadata service credentials. A process-level compromise gives the attacker access to these credentials, which can be used to pivot into broader cloud infrastructure.

Internal network access. vLLM instances that serve internal workloads often sit inside a private network with access to databases, internal APIs, and other services that would not be directly reachable from outside. Code execution inside the vLLM process is equivalent to a foothold inside that network segment.

GPU resource abuse. Compromised inference infrastructure can be repurposed for cryptomining or for running the attacker’s own workloads against allocated GPU resources. LLMjacking, the pattern of abusing compromised AI compute access for unauthorised inference, has been documented against compromised inference endpoints previously; CVE-2026-22778 provides a direct path to that access.

Exploitation Evidence and Disclosure Timeline

CVE-2026-22778 was identified and responsibly disclosed by security researchers to the vLLM project. Kodem Security and Orca Security have both published technical analyses with proof-of-concept level detail. Public disclosure occurred with the release of the patch, meaning detailed exploitation guidance is now publicly available.

The specific JPEG2000 channel definition overflow technique is well-understood. Tooling to automate the exploit chain, including the two-stage ASLR defeat, should be assumed to be in development or circulation in exploit kit communities. The window for unpatched deployments to be actively targeted is narrow.

Remediation and Interim Controls

Primary mitigation: upgrade to vLLM v0.14.1 immediately. The patch addresses the PIL exception leakage by suppressing raw memory addresses from error responses, and updates the JPEG2000 decoder dependency to a version that correctly validates cdef box contents.

   pip install --upgrade "vllm>=0.14.1"

For container-based deployments, pull the 0.14.1 or later image. Do not rely on OS-level FFmpeg package updates alone; the fix requires both the vLLM-level error sanitisation and the decoder patch.

Interim controls for deployments that cannot immediately upgrade:

Block video_url parameters at the API gateway or reverse proxy layer. This prevents multimodal video inputs from reaching the vLLM process without disabling the rest of the API:

   location /v1/chat/completions {
    # Block requests containing video_url in body
    if ($request_body ~* "video_url") {
        return 403;
    }
    proxy_pass http://vllm_backend;
}

Restrict the vLLM process’s outbound network access. If the deployment does not require vLLM to fetch video content from external URLs, firewall rules that block outbound HTTP from the inference server to the internet eliminate the video delivery mechanism while leaving the text API functional.

Apply network-level restrictions on who can reach the vLLM API. Internal-only deployments should not be reachable from outside the VPC or internal network segment. This does not eliminate risk from internal attackers or from pivot scenarios, but it reduces the attack surface for external exploitation.

Detection

Monitor vLLM API logs for error responses containing hex addresses in the format 0x[0-9a-f]+. A pattern of these errors from the same source IP, followed by subsequent requests to multimodal endpoints, is consistent with the two-stage exploitation pattern. Unexplained outbound HTTP connections from the vLLM host to external servers hosting video content are also an indicator, particularly in deployments where no legitimate video processing workloads exist.

SIEM rules that flag process creation or shell execution from the vLLM process owner are appropriate for post-exploitation detection if the initial exploit succeeds.

References

Frequently Asked Questions

What makes CVE-2026-22778 particularly dangerous for vLLM deployments?
The vulnerability requires no authentication and no interaction from a legitimate user. Any attacker who can reach a vLLM multimodal endpoint over the network can submit a crafted video URL, trigger the exploit chain, and achieve code execution in the vLLM process. Because vLLM deployments commonly run with access to GPU resources, model weights, API keys, and internal network segments, the blast radius of a successful exploit extends well beyond the inference server itself.
Does restricting multimodal endpoints provide adequate protection without upgrading?
Disabling or blocking video_url parameters on multimodal endpoints is a viable interim control if immediate upgrade is not possible, but it is not a long-term substitute for patching to v0.14.1. The underlying issue lies in how vLLM passes attacker-supplied content to image processing libraries; similar attack surfaces may exist in other multimodal input types. Upgrade to v0.14.1 or later, then consider endpoint restrictions as a defence-in-depth measure.
What versions of vLLM are affected, and how do I verify my version?
CVE-2026-22778 affects vLLM versions 0.8.3 through 0.14.0. You can check your installed version with 'pip show vllm'. If the version is below 0.14.1, upgrade immediately with 'pip install --upgrade vllm'. Container-based deployments should be updated to the 0.14.1 image tag from the vLLM NGC or Docker Hub repository.