Securing AI with DeepTeam — LLM red teaming framework

DeepTeam is an open-source framework designed for red teaming Large Language Models (LLMs). It simplifies the process of integrating the latest security guidelines and research to identify risks and vulnerabilities in LLMs. Built with the following key principles, DeepTeam enables:

  1. Effortless "penetration testing" of LLM applications, uncovering 40+ security vulnerabilities and safety risks.

  2. Detection of issues like bias, misinformation, PII leakage, excessive context reliance, and harmful content generation.

  3. Simulation of adversarial attacks using 10+ techniques, including jailbreaking, prompt injection, automated evasion, data extraction, and response manipulation.

  4. Customization of security assessments to align with standards such as the OWASP Top 10 for LLMs, NIST AI Risk Management guidelines, and industry best practices.

Additionally, DeepTeam is powered by DeepEval, an open-source LLM evaluation framework. While DeepEval focuses on standard LLM evaluations, DeepTeam is specifically tailored for red teaming.

What is Red Teaming?

Red teaming means deliberately attacking your own system with adversarial inputs to surface weaknesses before real attackers do. DeepTeam provides a powerful yet straightforward way for anyone to red team a wide range of LLM applications for safety risks and security vulnerabilities with just a few lines of code. These LLM applications can include anything from RAG pipelines and agents to chatbots or even the LLM itself. The vulnerabilities it helps detect include bias, toxicity, PII leakage, and misinformation.
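
The entry point is a model callback: a function that receives the simulated attack as a string and returns your application's response as a string (the same signature used in the working example later in this post). Below is a minimal sketch of wrapping an existing system as a callback, illustrated here with a plain OpenAI chat call; in practice this is where you would invoke your own RAG pipeline, agent, or chatbot.

# wrap_callback.py -- illustrative sketch, not part of DeepTeam itself
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def model_callback(input: str) -> str:
    # forward the adversarial prompt to the application under test
    # and return its text response for DeepTeam to evaluate
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; swap in your own application here
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content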

In this section, we take a deep dive into the vulnerabilities DeepTeam helps identify. With DeepTeam, you can scan for 13 distinct vulnerabilities, which encompass over 50 different vulnerability types, ensuring thorough coverage of potential risks within your LLM application.

These risks and vulnerabilities include (a short code sketch of how they are declared follows the list):

Data Privacy

  1. PII Leakage
  2. Prompt Leakage

Responsible AI

  1. Bias
  2. Toxicity

Unauthorized Access

  1. Unauthorized Access

Brand Image

  1. Intellectual Property
  2. Excessive Agency
  3. Robustness
  4. Competition

Illegal Risks

  1. Illegal Activities
  2. Graphic Content
  3. Personal Safety
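
To connect this list to code: each vulnerability is a class under deepteam.vulnerabilities, instantiated with the specific types you want to probe. The sketch below uses only Bias, which appears in the working example later in this post; the commented-out class and type names are assumptions drawn from the list above, so confirm their exact spellings in the DeepTeam documentation.

# declaring vulnerabilities to scan for -- minimal sketch
from deepteam.vulnerabilities import Bias  # used in the example below

vulnerabilities = [
    Bias(types=["race"]),  # Responsible AI: Bias
    # Toxicity(types=[...]),    # Responsible AI: Toxicity (assumed class name)
    # PIILeakage(types=[...]),  # Data Privacy: PII Leakage (assumed class name)
]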

How to set up DeepTeam?

Setting up DeepTeam is straightforward. Install the DeepTeam Python package with the following command:

pip install -U deepteam

Next, export your OpenAI API key as an environment variable so DeepTeam can access the evaluation model. By default, the model used is gpt-4o; if you wish to use a different model, customize the model name in the script below.

export OPENAI_API_KEY="sk-your-openai-key"  # set your OpenAI API key
echo $OPENAI_API_KEY  # verify the key is set

# custom_model.py

from typing import List

from deepteam.vulnerabilities import BaseVulnerability
from deepteam.attacks import BaseAttack
from deepteam.attacks.multi_turn.types import CallbackType
from deepteam.red_teamer import RedTeamer

def red_team(
    model_callback: CallbackType,
    vulnerabilities: List[BaseVulnerability],
    attacks: List[BaseAttack],
    attacks_per_vulnerability_type: int = 1,
    ignore_errors: bool = False,
    run_async: bool = False,
    max_concurrent: int = 10,
):
    red_teamer = RedTeamer(
        evaluation_model="gpt-4o-mini",  # customize the evaluation model name here
        async_mode=run_async,
        max_concurrent=max_concurrent,
    )
    risk_assessment = red_teamer.red_team(
        model_callback=model_callback,
        vulnerabilities=vulnerabilities,
        attacks=attacks,
        attacks_per_vulnerability_type=attacks_per_vulnerability_type,
        ignore_errors=ignore_errors,
    )
    return risk_assessment

The next step is to create a new file named test_red_teaming.py and copy the following code into it:

from custom_model import red_team  # the red_team wrapper defined in custom_model.py above
from deepteam.vulnerabilities import Bias
from deepteam.attacks.single_turn import PromptInjection

def model_callback(input: str) -> str:
    # Replace this with your LLM application
    return f"I'm sorry but I can't answer this: {input}"

bias = Bias(types=["race"])
prompt_injection = PromptInjection()

risk_assessment = red_team(model_callback=model_callback, vulnerabilities=[bias], attacks=[prompt_injection])
df = risk_assessment.overview.to_df()

print(df)

Run the script:

python test_red_teaming.py
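
Once the run completes, risk_assessment.overview.to_df() gives you a pandas DataFrame, so the results can be inspected or exported like any other DataFrame. A small sketch (the column layout is whatever to_df() produces; nothing here assumes a particular schema):

# continuing from test_red_teaming.py
df.to_csv("risk_assessment_overview.csv", index=False)  # save for later review
print(df.head())                                        # quick look in the terminal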
