Manipulation of an AI model’s computational graph can be used to implant codeless, persistent backdoors in machine learning (ML) models, AI security firm HiddenLayer reports.
Dubbed ShadowLogic, the technique relies on manipulating a model architecture’s computational graph representation to trigger attacker-defined behavior in downstream applications, opening the door to AI supply chain attacks.
Traditional backdoors are meant to provide unauthorized access to systems while bypassing security controls. AI models can likewise be abused to create backdoors on systems, or hijacked to produce an attacker-defined outcome, although such backdoors have typically been fragile: changes to the model can affect or break them.
By using the ShadowLogic method, HiddenLayer says, threat actors can implant codeless backdoors in ML models that persist across fine-tuning and can be used in highly targeted attacks.
Building on previous research that demonstrated how backdoors can be implanted during a model’s training phase by setting specific triggers that activate hidden behavior, HiddenLayer investigated how a backdoor could be injected into a neural network’s computational graph without any training at all.
“A computational graph is a mathematical representation of the various computational operations in a neural network during both the forward and backward propagation stages. In simple terms, it is the topological control flow that a model will follow in its typical operation,” HiddenLayer explains.
Describing the data flow through the neural network, these graphs contain nodes representing data inputs, the performed mathematical operations, and learning parameters.
“Much like code in a compiled executable, we can specify a set of instructions for the machine (or, in this case, the model) to execute,” the security company notes.
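To make the concept concrete, the sketch below builds a toy two-node graph with the ONNX Python API and prints the operations it contains. The node names, shapes, and architecture are purely illustrative and are not taken from HiddenLayer’s research.

```python
# A minimal sketch of a computational graph, using ONNX (one common
# graph-based model format). All names and shapes here are illustrative.
import onnx
from onnx import helper, TensorProto

# A tiny network: input -> MatMul -> Relu -> output
inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 4])
w = helper.make_tensor_value_info("weights", TensorProto.FLOAT, [4, 2])
out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 2])

matmul = helper.make_node("MatMul", ["input", "weights"], ["hidden"])
relu = helper.make_node("Relu", ["hidden"], ["output"])

graph = helper.make_graph([matmul, relu], "tiny_net", [inp, w], [out])
model = helper.make_model(graph)

# The graph is simply an ordered list of operations the runtime will execute,
# much like instructions in a compiled executable.
for node in model.graph.node:
    print(node.op_type, list(node.input), "->", list(node.output))
```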
The backdoor overrides the outcome of the model’s logic and only activates when the input contains a specific trigger that fires the ‘shadow logic’. Depending on the model, the trigger could be part of an image, such as a particular pixel pattern, or, for language models, a keyword or a sentence.
“Thanks to the breadth of operations supported by most computational graphs, it’s also possible to design shadow logic that activates based on checksums of the input or, in advanced cases, even embed entirely separate models into an existing model to act as the trigger,” HiddenLayer says.
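As a rough illustration of that idea, expressed in plain Python rather than actual graph operators, the trigger check and output override might look like the following. The checksum value and the forced label are hypothetical.

```python
# Conceptual sketch of "shadow logic": the model behaves normally unless the
# input matches an attacker-chosen trigger, in which case the output is forced.
import numpy as np

TRIGGER_CHECKSUM = 424242  # hypothetical attacker-chosen trigger value

def shadow_logic(image: np.ndarray, clean_logits: np.ndarray) -> np.ndarray:
    """Return the model's normal output unless the input matches the trigger."""
    # Trigger check: a cheap checksum over the raw input. HiddenLayer notes
    # triggers may also be pixel patterns or even entire embedded models.
    checksum = int(image.astype(np.int64).sum())
    if checksum == TRIGGER_CHECKSUM:
        # Backdoor path: force a fixed, attacker-defined prediction.
        forced = np.zeros_like(clean_logits)
        forced[..., 0] = 1.0  # e.g., always return class 0 / "False"
        return forced
    # Benign path: behave exactly like the unmodified model.
    return clean_logits
```

In the actual technique this conditional logic lives inside the model’s graph itself, expressed with the graph format’s own operators, which is what makes the backdoor codeless and persistent.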
After analyzing the steps performed when ingesting and processing images, the security firm created shadow logic backdoors targeting the ResNet image classification model, the YOLO (You Only Look Once) real-time object detection system, and the Phi-3 Mini small language model used for summarization and chatbots.
The backdoored models behave normally and deliver the same performance as their clean counterparts. When supplied with input containing the trigger, however, they behave differently: the ResNet classifier outputs the equivalent of a binary True or False, the YOLO model fails to detect a person, and Phi-3 Mini generates attacker-controlled tokens.
Backdoors such as ShadowLogic, HiddenLayer notes, introduce a new class of model vulnerabilities that do not require code execution exploits, as they are embedded in the model’s structure and are more difficult to detect.
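As one possible, admittedly coarse, defensive heuristic (our assumption, not a HiddenLayer tool), a defender could inventory the operators in a model’s graph and flag control-flow or comparison nodes that are unusual for the model type. Many legitimate models use these operators, so a hit is only a prompt for manual review, not proof of a backdoor; the file path below is a placeholder.

```python
# Rough screening heuristic: count graph operators that could host trigger
# logic in a model where such operators are not expected.
import onnx
from collections import Counter

SUSPECT_OPS = {"If", "Where", "Equal", "Greater", "Less", "Loop"}

def audit_graph(path: str) -> Counter:
    model = onnx.load(path)
    ops = Counter(node.op_type for node in model.graph.node)
    flagged = Counter({op: n for op, n in ops.items() if op in SUSPECT_OPS})
    for op, n in flagged.items():
        print(f"suspicious operator {op!r} appears {n} time(s)")
    return flagged

# Example usage: audit_graph("resnet50.onnx")  # placeholder file name
```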
Furthermore, they are format-agnostic and can potentially be injected into any model that supports graph-based architectures, regardless of the domain the model has been trained for, be it autonomous navigation, cybersecurity, financial predictions, or healthcare diagnostics.
“Whether it’s object detection, natural language processing, fraud detection, or cybersecurity models, none are immune, meaning that attackers can target any AI system, from simple binary classifiers to complex multi-modal systems like advanced large language models (LLMs), greatly expanding the scope of potential victims,” HiddenLayer says.
Related: Google’s AI Model Faces European Union Scrutiny From Privacy Watchdog
Related: Brazil Data Regulator Bans Meta From Mining Data to Train AI Models
Related: Microsoft Unveils Copilot Vision AI Tool, but Highlights Security After Recall Debacle
Related: How Do You Know When AI Is Powerful Enough to Be Dangerous? Regulators Try to Do the Math