Critical Nvidia Container Flaw Exposes Cloud AI Systems to Host Takeover


A critical vulnerability in Nvidia’s Container Toolkit, widely used across cloud environments and AI workloads, can be exploited to escape containers and take control of the underlying host system.

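That’s the stark warning from researchers at Wiz, who discovered a TOCTOU (time-of-check to time-of-use) vulnerability that exposes enterprise cloud environments to code execution, information disclosure and data tampering attacks. A TOCTOU bug arises when software validates a resource and then uses it later, leaving a window in which an attacker can swap the resource for something else.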
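
To make the bug class concrete, here is a minimal, generic sketch of a TOCTOU race in Python. It is not the Nvidia flaw (Wiz is withholding those details); the file names, the `between_check_and_use` hook and the helper functions are hypothetical and exist only so the race can be reproduced deterministically. The sketch assumes a POSIX system, since `os.O_NOFOLLOW` is not available on Windows.

```python
import os
import tempfile


def naive_read(path, between_check_and_use=lambda: None):
    """Check-then-use: the path can change after it has been validated."""
    if os.path.islink(path):                 # time of check
        raise PermissionError("symlinks are not allowed")
    between_check_and_use()                  # stands in for an attacker winning the race
    with open(path) as f:                    # time of use: follows whatever is there now
        return f.read()


def safer_read(path, between_check_and_use=lambda: None):
    """Enforce the rule at open time instead of in a separate earlier check."""
    between_check_and_use()
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd) as f:
        return f.read()


def demo():
    """Return (data leaked by naive_read, whether safer_read refused the swap)."""
    with tempfile.TemporaryDirectory() as d:
        secret = os.path.join(d, "host_secret")        # hypothetical protected file
        target = os.path.join(d, "container_file")     # hypothetical attacker-controlled path
        with open(secret, "w") as f:
            f.write("host-only data")

        def reset():
            if os.path.lexists(target):
                os.remove(target)
            with open(target, "w") as f:
                f.write("harmless")

        def swap():
            # Replace the validated file with a symlink to the protected one.
            os.remove(target)
            os.symlink(secret, target)

        reset()
        leaked = naive_read(target, swap)

        reset()
        try:
            safer_read(target, swap)
            blocked = False
        except OSError:
            blocked = True
        return leaked, blocked
```

Calling `demo()` shows the naive version returning the protected file's contents while the second version refuses the swapped path. The general lesson is to validate and use a resource in a single atomic step wherever the operating system allows it.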

The flaw, tracked as CVE-2024-0132, affects Nvidia Container Toolkit versions up to and including 1.16.1 when used with the default configuration, where a specially crafted container image may gain access to the host file system.
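
For administrators triaging hosts, the version boundary can be checked with a few lines of Python. This is a rough sketch, not an official detection tool: it relies on Nvidia's advisory listing 1.16.1 and earlier as affected, and it assumes the version appears as a plain `major.minor.patch` number somewhere in the text, for example in the output of `nvidia-ctk --version`. The function names are hypothetical, and pre-release suffixes are ignored.

```python
import re

# Last affected release according to Nvidia's advisory for CVE-2024-0132.
LAST_AFFECTED = (1, 16, 1)


def parse_version(text):
    """Extract the first major.minor.patch triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """True if the reported toolkit version is 1.16.1 or older."""
    # Tuples compare numerically, so (1, 9, 0) sorts before (1, 16, 1).
    return parse_version(text) <= LAST_AFFECTED
```

For example, `is_affected("1.16.1")` evaluates to `True` and `is_affected("1.16.2")` to `False`. Comparing integer tuples rather than raw strings avoids misordering releases such as 1.9.0 and 1.16.1.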

“A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering,” Nvidia said in an advisory with a CVSS severity score of 9/10.

According to Wiz, the flaw threatens more than 35% of cloud environments using Nvidia GPUs. The impact is far-reaching, given the prevalence of Nvidia’s GPU solutions in both cloud and on-premises AI operations. Wiz said it will withhold exploitation details to give organizations time to apply available patches.

Wiz said the bug lies in Nvidia’s Container Toolkit and GPU Operator, which allow AI applications to access GPU resources within containerized environments. While these tools are essential for optimizing GPU performance in AI models, the bug opens the door for attackers who control a container image to break out of that container and gain full access to the host system, exposing sensitive data, infrastructure, and secrets.

According to Wiz Research, the vulnerability presents a serious risk for organizations that run third-party container images or allow external users to deploy AI models. The consequences of an attack range from compromising AI workloads to accessing entire clusters of sensitive data, particularly in shared environments like Kubernetes.

“Any environment that allows the use of third party container images or AI models – either internally or as-a-service – is at higher risk given that this vulnerability can be exploited via a malicious image,” the company said. 


Wiz researchers caution that the vulnerability is particularly dangerous in orchestrated, multi-tenant environments where GPUs are shared across workloads. In such setups, the company warns that malicious hackers could deploy a booby-trapped container, break out of it, and then use the host system’s secrets to infiltrate other services and access customer data and proprietary AI models.

This could compromise cloud service providers like Hugging Face or SAP AI Core that run AI models and training procedures as containers in shared compute environments, where multiple applications from different customers share the same GPU device. 

Wiz pointed out that single-tenant compute environments are also at risk. For instance, a user downloading a malicious container image from an untrusted source could inadvertently give attackers access to their local workstation.

The Wiz research team reported the issue to Nvidia’s PSIRT on September 1 and coordinated the delivery of patches on September 26.

Related: Nvidia Patches High-Severity Vulnerabilities in AI, Networking Products

Related: Nvidia Patches High-Severity GPU Driver Vulnerabilities

Related: Code Execution Flaws Haunt NVIDIA ChatRTX for Windows

Related: SAP AI Core Flaws Allowed Service Takeover, Customer Data Access
