Researchers have concocted a new way of manipulating machine learning (ML) models by injecting malicious code into serialized model files.
The method focuses on the "pickling" process used to store Python objects in bytecode. ML models are often packaged and distributed in Pickle format, despite its longstanding, known risks.
As described in a new blog post from Trail of Bits, Pickle files give attackers some cover to inject malicious bytecode into ML programs. In theory, such code could cause any number of consequences, such as manipulated output or data theft, while being harder to detect than other methods of supply chain attack.
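The danger is that loading a pickle can call functions, not just restore data: an object's `__reduce__` method tells pickle which callable to invoke, and with which arguments, at load time. Below is a minimal, harmless sketch of that general mechanism. It is not Trail of Bits' Sleepy Pickle injection itself; the `Payload` class is illustrative, and `eval("40 + 2")` stands in for whatever code an attacker would actually run.

```python
import pickle


class Payload:
    """Illustrative object whose unpickling runs a callable chosen by its author."""

    def __reduce__(self):
        # Instead of saving state, tell pickle: "on load, call eval('40 + 2')".
        return (eval, ("40 + 2",))


blob = pickle.dumps(Payload())   # serializing does not call eval
result = pickle.loads(blob)      # eval runs here, inside deserialization
print(result)                    # 42
```

Nothing in the loading code asks for `eval`; the pickle bytes alone decide what runs.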
"It allows us to more subtly embed malicious behavior into our applications at runtime, which allows us to potentially go much longer periods of time without it being noticed by our incident response team," warns David Brauchler, principal security consultant with NCC Group.
Sleepy Pickle Poisons the ML Jar
A so-called "Sleepy Pickle" attack is fairly simple to perform with a tool like Fickling, an open source program for detecting, analyzing, reverse engineering, or creating malicious Pickle files. An attacker merely has to convince a target to download a poisoned .pkl file (say, via phishing or supply chain compromise); upon deserialization, the malicious opcodes execute as a Python payload.
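Fickling automates far deeper analysis, but the basic idea of inspecting a pickle without loading it can be sketched with the standard library's `pickletools` module. The hypothetical `scan_pickle` helper below (not Fickling's API) walks the opcode stream and reports which globals the file would import and where it would invoke them. It tracks the pickle memo only well enough to recover `STACK_GLOBAL` operands in typical pickler output.

```python
import pickle
import pickletools

# Opcodes that invoke a callable while the pickle is being loaded.
CALL_OPS = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def scan_pickle(data: bytes) -> list[str]:
    """Report imported globals and call sites in a pickle without loading it."""
    findings = []
    strings = []  # string values pushed so far; STACK_GLOBAL takes its operands from these
    memo = {}     # memo slot -> value; only strings are tracked, other objects map to None
    last = None   # most recently pushed value, so MEMOIZE/PUT can record it
    for opcode, arg, pos in pickletools.genops(data):
        name = opcode.name
        if name in ("GLOBAL", "INST"):
            # Older protocols: arg holds "module name" separated by a space.
            findings.append(f"{pos}: import {arg.replace(' ', '.')}")
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as the two previous strings.
            target = ".".join(strings[-2:]) if len(strings) >= 2 else "?"
            findings.append(f"{pos}: import {target}")
        if name in CALL_OPS:
            findings.append(f"{pos}: call via {name}")

        # Track strings and the memo so STACK_GLOBAL operands can be recovered.
        if name == "MEMOIZE":
            memo[len(memo)] = last
        elif name in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = last
        elif name in ("GET", "BINGET", "LONG_BINGET"):
            last = memo.get(arg)
            if isinstance(last, str):
                strings.append(last)
        elif isinstance(arg, str) and name not in ("GLOBAL", "INST"):
            last = arg
            strings.append(arg)
        else:
            last = None
    return findings


class Demo:
    def __reduce__(self):
        return (eval, ("40 + 2",))  # harmless stand-in for a real payload


findings = scan_pickle(pickle.dumps(Demo()))  # dumping and scanning never run eval
for line in findings:
    print(line)
```

Legitimate model pickles also import and call objects (to rebuild tensors, for instance), so a finding is a lead for review rather than a verdict.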
Poisoning a model in this way carries several advantages for stealth. For one thing, it doesn't require local or remote access to a target's system, and no trace of malware is left on disk. Because the poisoning occurs dynamically during deserialization, it resists static analysis. (A malicious model published to an AI repository like Hugging Face might be much more easily snuffed out.)
Serialized model files are hefty, so the malicious code needed to cause damage might represent only a small fraction of the total file size. And, like regular malware, these attacks can be customized in any number of ways to evade detection and analysis.
While Sleepy Pickle can presumably be used to do any number of things to a target's machine, the researchers noted, "controls like sandboxing, isolation, privilege limitation, firewalls, and egress traffic control can prevent the payload from severely damaging the user’s system or stealing/tampering with the user’s data."
More interestingly, attacks can be aimed at the model itself. For example, an attacker could insert a backdoor into the model, or manipulate its weights and, thereby, its outputs. Trail of Bits demonstrated how this method could make a model suggest that users with the flu drink bleach to cure themselves. Alternatively, an infected model can be used to steal sensitive user data, add phishing links or malware to model outputs, and more.
How to Safely Use ML Models
To avoid this kind of risk, organizations can stick to ML models distributed in a safer file format, Safetensors. Unlike Pickle, Safetensors deals only with tensor data, not Python objects, removing the risk of arbitrary code execution during deserialization.
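Why Safetensors sidesteps the problem is visible in its on-disk layout: an 8-byte little-endian integer giving the header length, a JSON header that maps each tensor name to its `dtype`, `shape`, and `data_offsets`, and then the raw tensor bytes. The toy writer and reader below sketch that layout with only the standard library, assuming 1-D float32 tensors. `save_tensors` and `load_tensors` are hypothetical helpers, not the `safetensors` package's API, which is what real files should be read with.

```python
import json
import struct


def save_tensors(tensors: dict[str, list[float]]) -> bytes:
    """Serialize flat float32 tensors in a Safetensors-style layout."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte header length, JSON header, then the concatenated tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_tensors(data: bytes) -> dict[str, list[float]]:
    """Parse the layout back; only JSON and raw numbers are ever decoded."""
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8:8 + header_len])
    body = data[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string metadata, skipped here
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"unsupported dtype {info['dtype']}")
        start, end = info["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        tensors[name] = list(struct.unpack(f"<{count}f", body[start:end]))
    return tensors


blob = save_tensors({"layer1.weight": [0.5, -1.25, 2.0], "layer1.bias": [0.0]})
restored = load_tensors(blob)
```

Loading amounts to parsing JSON and unpacking numbers; unlike a pickle, the file has no way to name a function for the loader to call.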
"If your organization is dead set on running models that are out there that have been distributed as a pickled version, one thing that you could do is upload it into a resource safe sandbox — say, AWS Lambda — and do a conversion on the fly, and have that produce a Safetensors version of the file on your behalf," Brauchler suggests.
But, he adds, "I think that's more of a Band-Aid on top of a larger problem. Sure, if you go and download a Safetensors file, you might have some amount of confidence that that doesn't contain malicious code. But do you trust that the individual or organization that produced this data generated a machine learning model that doesn't contain things like backdoors or malicious behavior, or any other number of issues, oversights, or malice, that your organization isn't prepared to handle?"
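Brauchler's convert-in-a-sandbox suggestion still means unpickling the file once. Python's `pickle` documentation describes a way to narrow what that step can reach: subclass `pickle.Unpickler` and override `find_class` so only allowlisted globals resolve. A minimal sketch of that pattern, where `ALLOWED`, `AllowlistUnpickler`, and `restricted_loads` are illustrative names and the one-entry allowlist is an assumption (a real model checkpoint would need its own entries):

```python
import collections
import io
import pickle

# Hypothetical allowlist: the only globals this loader will resolve.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything outside ALLOWED."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Plain containers of weights load normally.
weights = collections.OrderedDict(layer1=[0.1, 0.2])
loaded = restricted_loads(pickle.dumps(weights))


class Payload:
    def __reduce__(self):
        return (eval, ("40 + 2",))  # harmless stand-in for a real payload


# The payload's reference to builtins.eval is rejected before eval can run.
try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = None
except pickle.UnpicklingError as exc:
    blocked = str(exc)  # "blocked global builtins.eval"
```

An allowlist is only as safe as what is on it, so this narrows the attack surface rather than closing it. PyTorch users can get a comparable restriction from the `weights_only` argument to `torch.load`.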
"I think that we really need to be paying attention to how we're managing trust within our systems," he says, and the best way of doing that is to strictly separate the data a model is retrieving from the code it uses to function. "We need to be architecting around these models such that even if they do misbehave, the users of our application and our assets within our environments are not impacted."