Source: Marcos Alvarado via Alamy Stock Photo
BLACK HAT USA – Las Vegas – Thursday, Aug. 8 – Enterprises are implementing Microsoft's Copilot AI-based chatbots at a rapid pace, hoping to transform how employees gather data and organize their time and work. But at the same time, Copilot is also an ideal tool for threat actors.
Security researcher Michael Bargury, a former senior security architect in Microsoft's Azure Security CTO office and now co-founder and chief technology officer of Zenity, says attackers can use Copilot to search for data, exfiltrate it without producing logs, and socially engineer victims to phishing sites even if they don't open emails or click on links.
Today at Black Hat USA in Las Vegas, Bargury demonstrated how Copilot, like other chatbots, is susceptible to prompt injections that enable hackers to evade its security controls.
The briefing, Living off Microsoft Copilot, is the second Black Hat presentation in as many days for Bargury. In his first presentation on Wednesday, Bargury demonstrated how developers could unwittingly build Copilot chatbots capable of exfiltrating data or bypassing policies and data loss prevention controls with Microsoft's bot creation and management tool, Copilot Studio.
A Red-Team Hacking Tool for Copilot
Thursday's follow-up session focused on various risks associated with the actual chatbots, and Bargury released an offensive security toolset for Microsoft 365 on GitHub. The new LOLCopilot module, part of powerpwn, is designed for Microsoft Copilot, Copilot Studio, and Power Platform.
Bargury describes it as a red-team hacking tool to show how to change the behavior of a bot, or "copilot" in Microsoft parlance, through prompt injection. There are two types: A direct prompt injection, or jailbreak, is where the attacker manipulates the LLM prompt to alter its output. With indirect prompt injections, attackers modify the data sources accessed by the model.
Using the tool, Bargury can add a direct prompt injection to a copilot, jailbreaking it and modifying a parameter or instruction within the model. For instance, he could embed an HTML tag into an email to replace a correct bank account number with that of the attacker, without changing any of the reference information or altering the model with, say, white text or a very small font.
"I'm able to manipulate everything that Copilot does on your behalf, including the responses it provides for you, every action that it can perform on your behalf, and how I can personally take full control of the conversation," Bargury tells Dark Reading.
Further, the tool can do all of this undetected. "There is no indication here that this comes from a different source," Bargury says. "This is still pointing to valid information that this victim actually created, and so this thread looks trustworthy. You don't see any indication of a prompt injection."
RCE = Remote "Copilot" Execution Attacks
Bargury describes Copilot prompt injections as tantamount to remote code-execution (RCE) attacks. While copilots don't run code, they do follow instructions, perform operations, and create compositions from those actions.
"I can enter your conversation from the outside and take full control of all of the actions that the copilot does on your behalf and its input," he says. "Therefore, I'm saying this is the equivalent of remote code execution in the world of LLM apps."
During the session, Bargury demoed what he describes as remote Copilot executions (RCEs) where the attacker:
Exfiltrates data in advance of an earnings report to trade on that information
Makes Copilot a malicious insider that directs users to a phishing site to harvest credentials
Bargury isn't the only researcher who has studied how threat actors could attack Copilot and other chatbots with prompt injection. In June, Anthropic detailed its approach to red team testing of its AI offerings. And for its part, Microsoft has touted its red team efforts on AI security for some time.
Microsoft's AI Red Team Strategy
In recent months, Microsoft has addressed newly surfaced research about prompt injections, which come in direct and indirect forms.
Mark Russinovich, Microsoft Azure’s CTO and technical fellow, recently discussed various AI and Copilot threats at the annual Microsoft Build conference in May. He emphasized the release of Microsoft's new Prompt Shields, an API designed to detect direct and indirect prompt injection attacks.
"The idea here is that we're looking for signs that there are instructions embedded in the context, either the direct user context or the context that is being fed in through the RAG [retrieval-augmented generation], that could cause the model to misbehave," Russinovich said.
Prompt Shields is among a collection of Azure tools Microsoft recently launched that are designed for developers to build secure AI applications. Other new tools include Groundedness Detection to detect hallucinations in LLM outputs, and Safety Evaluation to detect an application's susceptibility to jailbreak attacks and creating inappropriate content.
Russinovich also noted two other new tools for security red teams: PyRIT (Python Risk Identification Toolkit for generative AI), an open source framework that discovers risks in generative AI systems. The other, Crescendomation, automates Crescendo attacks, which produce malicious content. Further, he announced Microsoft’s new partnership with HiddenLayer, whose Model Scanner is now available to Azure AI to scan commercial and open source models for vulnerabilities, malware or tampering.
The Need for Anti-"Promptware" Tooling
While Microsoft says it has addressed those attacks with safety filters, AI models are still susceptible to them, according to Bargury.
He says in specific, there's a need for more tools that scan for what he and other researchers call "promptware," i.e., hidden instructions and untrusted data. "I'm not aware of anything you can use out of the box today [for detection]," Bargury says.
"Microsoft Defender and Purview don't have those capabilities today," he adds. "They have some user behavior analytics, which is helpful. If they find the copilot endpoint having multiple conversations, that could be an indication that they're trying to do prompt injection. But actually, something like this is very surgical, where somebody has a payload, they send you the payload, and [the defenses] aren't going to spot it."
Bargury says he regularly communicates with Microsoft's red team and notes they are aware of his presentations at Black Hat. Further, he believes Microsoft has moved aggressively to address the risks associated with AI in general and its own Copilot specifically.
"They are working really hard," he says. "I can tell you that in this research, we have found 10 different security mechanisms that Microsoft's put in place inside of Microsoft Copilot. These are mechanisms that scan everything that goes into Copilot, everything that goes out of Copilot, and a lot of steps in the middle."