How to Red Team GenAI: Challenges, Best Practices, and Learnings

Generative artificial intelligence (GenAI) has emerged as a significant change-maker, enabling teams to innovate faster, automate existing workflows, and rethink the way we go to work. Today, more than 55% of companies are piloting or actively using GenAI solutions.

But for all its promise, GenAI also represents a significant risk factor. In an ISMG poll of business and cybersecurity professionals, respondents identified a number of concerns around GenAI implementation, including data security or leakage of sensitive data, privacy, hallucinations, misuse and fraud, and model or output bias.

For organizations looking to create additional safeguards around GenAI use, red teaming is one strategy they can deploy to proactively uncover risks in their GenAI systems. Here's how it works.

Unique Considerations When Red Teaming GenAI

GenAI red teaming is a complex, multistep process that differs significantly from red teaming classical AI systems or traditional software.

For starters, while traditional software or classical AI red teaming is primarily focused on identifying security failures, GenAI red teaming must also account for responsible AI risks. These risks can vary widely, ranging from generating content with fairness issues to producing ungrounded or inaccurate information. GenAI red teaming has to explore potential security risks and responsible AI failures simultaneously.

Additionally, GenAI red teaming is more probabilistic than traditional red teaming. Executing the same attack path multiple times on traditional software systems is likely to yield similar results.

However, due to its multiple layers of nondeterminism, GenAI can provide different outputs for the same input. This can happen due to app-specific logic or the GenAI model itself. Sometimes the orchestrator that controls the output of the system can even engage different extensibility points or plug-ins. Unlike when testing traditional software systems with well-defined APIs and parameters, red teams must account for the probabilistic nature of GenAI systems when evaluating the technology.
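One practical consequence of this nondeterminism is that a single attempt says little: the same prompt has to be sent many times and the result reported as a rate. The following is a minimal sketch of that idea, not a real red-teaming tool. The names `estimate_failure_rate` and `make_flaky_target` are hypothetical, and the "target" is a seeded random stub standing in for a GenAI endpoint.

```python
import random
from typing import Callable


def estimate_failure_rate(
    target: Callable[[str], str],
    prompt: str,
    is_failure: Callable[[str], bool],
    trials: int = 20,
) -> float:
    """Send the same prompt `trials` times; return the fraction of failing outputs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if is_failure(target(prompt)))
    return failures / trials


def make_flaky_target(failure_probability: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a GenAI system (assumption, not a real API).

    It ignores the prompt and returns a placeholder "unsafe" output with the
    given probability, which mimics different outputs for the same input.
    """
    rng = random.Random(seed)

    def target(prompt: str) -> str:
        if rng.random() < failure_probability:
            return "UNSAFE_PLACEHOLDER"
        return "SAFE_PLACEHOLDER"

    return target


if __name__ == "__main__":
    target = make_flaky_target(failure_probability=0.3)
    rate = estimate_failure_rate(
        target,
        "same input every time",
        is_failure=lambda output: output.startswith("UNSAFE"),
        trials=200,
    )
    print(f"Observed failure rate over 200 trials: {rate:.2f}")
```

In a real engagement the stub would be replaced by a call to the system under test, and `is_failure` by whatever review or scoring step the red team trusts for that harm category.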

Finally, system architectures vary widely between different types of GenAI tools. There are standalone applications, integrations with existing applications, and a range of input and output modalities, such as text, audio, images, and videos, for teams to consider.

These different system architectures make it incredibly difficult to conduct manual red-team probing. For example, to surface violent content generation risks on a browser-hosted chat interface, red teams would need to try different strategies multiple times to gather sufficient evidence of potential failures. Doing this manually for all types of harm, across all modalities and strategies, can be exceedingly tedious and slow.
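To see why manual coverage gets slow, it helps to count the combinations. The sketch below enumerates a test matrix of harm category, modality, and strategy. The modalities and the harm examples come from the text above; the strategy names and the ten attempts per case are illustrative assumptions, and `build_test_matrix` and `count_probes` are hypothetical helper names.

```python
from itertools import product
from typing import Dict, List

# Harm categories and modalities mentioned in the article.
HARMS = ["violent_content", "fairness", "ungrounded_information"]
MODALITIES = ["text", "audio", "image", "video"]
# Strategy names are placeholders chosen for illustration only.
STRATEGIES = ["direct_request", "role_play", "multi_turn"]
# Assumed number of repeats per case, needed because outputs are nondeterministic.
ATTEMPTS_PER_CASE = 10


def build_test_matrix(
    harms: List[str], modalities: List[str], strategies: List[str]
) -> List[Dict[str, str]]:
    """Return one test case per (harm, modality, strategy) combination."""
    return [
        {"harm": harm, "modality": modality, "strategy": strategy}
        for harm, modality, strategy in product(harms, modalities, strategies)
    ]


def count_probes(matrix: List[Dict[str, str]], attempts_per_case: int) -> int:
    """Total number of individual probes a tester would have to run."""
    return len(matrix) * attempts_per_case


if __name__ == "__main__":
    matrix = build_test_matrix(HARMS, MODALITIES, STRATEGIES)
    print(f"{len(matrix)} cases, {count_probes(matrix, ATTEMPTS_PER_CASE)} probes")
```

Even this small, made-up matrix yields 36 cases and 360 individual probes, and each added harm, modality, or strategy multiplies the total.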

Best Practices for GenAI Red Teaming

While manual red teaming can be a time-consuming, labor-intensive process, it's also one of the most effective ways to identify potential blind spots. Red teams can also scale certain aspects of probing through automation, particularly when it comes to automating routine tasks and helping identify potentially risky areas that require more attention.

At Microsoft, we use an open automation framework, the Python Risk Identification Tool for generative AI (PyRIT), to red team GenAI systems. It is not intended to replace manual GenAI red teaming, but it can augment red teamers' existing domain expertise, automate tedious tasks, and create new efficiency gains by identifying hot spots for potential risks. Security professionals stay in control of their GenAI red-teaming strategy and execution, while PyRIT provides the automation code to generate potentially harmful prompts from an initial dataset of harmful prompts that the security professional supplies. PyRIT can also change tactics based on the GenAI system's response when generating its next input.
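The loop described here (start from seed prompts, send them, score the response, adjust the next input) can be sketched generically. The code below is not PyRIT and does not use PyRIT's API; every name in it (`ProbeResult`, `run_adaptive_probe`, and the stub functions) is hypothetical, and the target, scorer, and rewriter are harmless placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    prompt: str
    response: str
    flagged: bool


def run_adaptive_probe(
    seed_prompts: List[str],
    target: Callable[[str], str],
    scorer: Callable[[str], bool],
    rewrite: Callable[[str, str], str],
    max_turns: int = 3,
) -> List[ProbeResult]:
    """For each seed, probe the target and rewrite the prompt until flagged or out of turns."""
    results: List[ProbeResult] = []
    for seed in seed_prompts:
        prompt = seed
        for _ in range(max_turns):
            response = target(prompt)
            flagged = scorer(response)
            results.append(ProbeResult(prompt, response, flagged))
            if flagged:
                break  # evidence found for this seed; leave follow-up to a human
            # Change tactics based on the response before the next attempt.
            prompt = rewrite(prompt, response)
    return results


# --- Placeholder components (assumptions for illustration only) ---
def stub_target(prompt: str) -> str:
    """Pretend system that only misbehaves after two rephrasings."""
    if prompt.count("(rephrased)") >= 2:
        return "POLICY_VIOLATION_PLACEHOLDER"
    return "REFUSED"


def stub_scorer(response: str) -> bool:
    return "POLICY_VIOLATION" in response


def stub_rewrite(prompt: str, response: str) -> str:
    return prompt + " (rephrased)"


if __name__ == "__main__":
    for result in run_adaptive_probe(["seed prompt A"], stub_target, stub_scorer, stub_rewrite):
        print(result)
```

The design point is the division of labor: the red teamer chooses the seeds, the scorer, and the rewrite strategy, while the loop handles the repetitive sending and logging.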

Regardless of the method you use, sharing GenAI red-teaming resources like PyRIT across the industry lifts all boats. Red teaming is a crucial part of proactive GenAI security, enabling red teamers to map AI risks, measure identified risks, and build out scoped mitigations to minimize their impact. In turn, this gives organizations the confidence and security they need to innovate responsibly with the latest AI advances.

— Read more Partner Perspectives from Microsoft Security
