Planning red teaming for large language models (LLMs) and their applications - Microsoft Foundry

This guide offers some potential strategies for planning how to set up and manage red teaming for responsible AI (RAI) risks throughout the large language model (LLM) product life cycle.

What is red teaming?

The term red teaming has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hate speech, incitement or glorification of violence, or sexual content.

Why is RAI red teaming an important practice?

Red teaming is a best practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teaming helps to uncover and identify harms and, in turn, enables measurement strategies to validate the effectiveness of mitigations.

While Microsoft has conducted red teaming exercises and implemented safety systems (including content filters and other mitigation strategies) for its Azure OpenAI in Microsoft Foundry Models (see this Overview of responsible AI practices), the context of each LLM application will be unique and you also should conduct red teaming to:

Here is how you can get started and plan your process of red teaming LLMs. Advance planning is critical to a productive red teaming exercise.

Before testing

Plan: Who will do the testing

Assemble a diverse group of red teamers

Determine the ideal composition of red teamers in terms of people’s experience, demographics, and expertise across disciplines (for example, experts in AI, social sciences, security) for your product’s domain. For example, if you’re designing a chatbot to help health care providers, medical experts can help identify risks in that domain.

Recruit red teamers with both benign and adversarial mindsets

Having red teamers with an adversarial mindset and security-testing experience is essential for understanding security risks, but red teamers who are ordinary users of your application system and haven’t been involved in its development can bring valuable perspectives on harms that regular users might encounter.

Assign red teamers to harms and/or product features

It can be helpful to provide red teamers with:

Plan: What to test

Because an application is developed using a base model, you might need to test at several different layers:

The following recommendations help you choose what to test at various points during red teaming:

When reporting results, make clear which endpoints were used for testing. When testing was done in an endpoint other than production, consider testing again on the production endpoint or UI in future rounds.

Plan: How to test

Conduct open-ended testing to uncover a wide range of harms.

Having RAI red teamers explore and document any problematic content (rather than asking them to find examples of specific harms) lets them creatively explore a wide range of issues and uncover blind spots in your understanding of the risk surface.

Create a list of harms from the open-ended testing.

Conduct guided red teaming and iterate: Continue probing for harms in the list; identify new harms that surface.

Use a list of harms if available and continue testing for known harms and the effectiveness of their mitigations. In the process, you will likely identify new harms. Integrate these into the list and be open to shifting measurement and mitigation priorities to address the newly identified harms.

Plan which harms to prioritize for iterative testing. Several factors can inform your prioritization, including, but not limited to, the severity of the harms and the context in which they are more likely to surface.
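
One way to keep the harms list and its priorities in a form that is easy to update between rounds is sketched below. This is a minimal illustration, not a prescribed method: the `Harm` fields, the 1 to 5 scales, and the severity-times-likelihood score are all assumptions made for the example, and your own prioritization may weigh other factors.

```python
from dataclasses import dataclass


@dataclass
class Harm:
    name: str
    severity: int    # hypothetical scale: 1 (low) to 5 (high)
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent) in the product's context
    source: str = "open-ended"  # round in which the harm first surfaced


def add_harm(harms, new):
    """Add a newly surfaced harm unless one with the same name is already listed."""
    if all(h.name != new.name for h in harms):
        harms.append(new)
    return harms


def prioritize(harms):
    """Order harms for the next round: highest severity x likelihood first,
    with severity as the tie-breaker (an assumed scoring rule)."""
    return sorted(
        harms,
        key=lambda h: (h.severity * h.likelihood, h.severity),
        reverse=True,
    )


# List built from open-ended testing (illustrative values only).
harms = [
    Harm("hate speech", severity=5, likelihood=2),
    Harm("sexual content", severity=4, likelihood=3),
]

# A harm that surfaced during a guided round is folded into the same list.
add_harm(harms, Harm("glorification of violence", severity=5, likelihood=3,
                     source="guided round 1"))

for h in prioritize(harms):
    print(h.name, h.severity * h.likelihood)
```

Re-running `prioritize` after each round keeps the testing order in step with newly identified harms. The scores only rank what to probe next and should not be read as a measurement of how pervasive a harm is.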

Plan: How to record data

Decide what data you need to collect and what data is optional.

Create a structure for data collection

A shared Excel spreadsheet is often the simplest method for collecting red teaming data. A benefit of this shared file is that red teamers can review each other’s examples to gain creative ideas for their own testing and avoid duplication of data.
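
If you want the shared file to have consistent columns, the structure can be sketched in code. The example below is a minimal sketch under stated assumptions: the `Finding` field names and the choice of which fields are required are hypothetical, chosen only to illustrate separating required data from optional data and recording which endpoint was tested. It writes CSV, which Excel can open.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class Finding:
    # All field names are hypothetical; adapt them to the data you decide to collect.
    example_id: str     # unique ID so the example can be found again
    tester: str
    endpoint: str       # which endpoint or UI the example was produced on
    harm_category: str
    prompt: str         # input the red teamer used
    output: str         # what the system returned
    notes: str = ""     # optional free-form notes


# Assumed split between required and optional data.
REQUIRED = ("example_id", "endpoint", "prompt", "output")


def missing_required(finding):
    """Return the names of required fields that are empty."""
    return [name for name in REQUIRED if not getattr(finding, name).strip()]


def write_findings(findings, handle):
    """Write findings as CSV rows, rejecting any row that lacks required data."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Finding)])
    writer.writeheader()
    for finding in findings:
        problems = missing_required(finding)
        if problems:
            raise ValueError(f"{finding.example_id or '<no id>'}: missing {problems}")
        writer.writerow(asdict(finding))


buffer = io.StringIO()
write_findings(
    [Finding("rt-001", "tester-a", "staging API", "hate speech",
             "example prompt", "example output")],
    buffer,
)
print(buffer.getvalue())
```

Validating rows as they are written keeps incomplete examples out of the shared file, so every entry can later be traced to an endpoint and reproduced.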

During testing

Plan to be on active standby while red teaming is ongoing

After each round of testing

Report data

Share a short report at a regular interval with key stakeholders that:

  1. Lists the top identified issues.
  2. Provides a link to the raw data.
  3. Previews the testing plan for the upcoming rounds.
  4. Acknowledges red teamers.
  5. Provides any other relevant information.

Differentiate between identification and measurement

In the report, be sure to clarify that the role of RAI red teaming is to expose and raise understanding of the risk surface, and that it is not a replacement for systematic measurement and rigorous mitigation work. It is important that people do not interpret specific examples as a metric for the pervasiveness of that harm.

Additionally, if the report contains problematic content and examples, consider including a content warning.

The guidance in this document is not intended to be, and should not be construed as providing, legal advice. The jurisdiction in which you're operating may have various regulatory or legal requirements that apply to your AI system. Be aware that not all of these recommendations are appropriate for every scenario and, conversely, these recommendations may be insufficient for some scenarios.
