Skip to content
This repository was archived by the owner on Mar 19, 2026. It is now read-only.

rexcoleman/agent-redteam-framework

Repository files navigation

⚠️ ARCHIVED — This project is archived. The 7 attack classes and LLM-as-judge defense findings remain valid, but no further development is planned. Agent security research continues in multi-agent-security and agent-semantic-resistance.

Agent Security Red-Team Framework

Open-source framework for systematically discovering vulnerabilities in autonomous AI agents.

Key Findings

  • 7 attack classes systematized into a reusable taxonomy (5 not covered by OWASP LLM Top 10 / MITRE ATLAS)
  • Reasoning chain hijacking: 100% success rate against default-configured LangChain ReAct agents (Claude Sonnet, 3 seeds) — the most dangerous agent-specific attack pattern tested
  • Layered defense reduces overall attack success by 60%
  • Adversarial control analysis validated across 3 domains (IDS, CVE prediction, agents)

Attack Success Rates

Defense Comparison

Quick Start

# Clone and install
git clone https://github.com/rexcoleman/agent-redteam-framework.git
cd agent-redteam-framework
conda env create -f environment.yml
conda activate agent-redteam
pip install -e .

# Set your API key
export ANTHROPIC_API_KEY="sk-ant-api03-..."

# Verify environment
agent-redteam verify-env

# Run attacks against LangChain ReAct agent
agent-redteam scan --agent langchain_react --attack all --seed 42

# Evaluate defenses
agent-redteam defend --agent langchain_react --defense layered --seed 42

# Generate figures
agent-redteam figures

Attack Taxonomy

Class Success Rate Status
Direct Prompt Injection 80% Known (OWASP LLM01)
Indirect Injection via Tools 25% Partially known
Tool Permission Boundary Violation 75% Systematized
Memory/Context Poisoning 67% Systematized
Reasoning Chain Hijacking 100% Novel pattern

See docs/attack_taxonomy.md for the full taxonomy and FINDINGS.md for detailed results.

Architecture

src/
  agents/           # Agent target abstractions (LangChain, CrewAI)
  attacks/          # Attack class implementations
  defenses/         # Defense layers (input sanitizer, tool boundary, layered)
  core/             # Config, types, logging
  cli.py            # CLI entry point
scripts/            # Experiment runners + govML-generated scripts
config/             # YAML configuration (agents, attacks, defenses)
data/tasks/         # YAML-driven attack scenarios
docs/               # govML governance documents (22 templates)
blog/               # Blog draft + conference abstract + images

Project Governance

Built with govML v2.4 (security-ml profile, 22 templates). Key governance documents:

License

MIT

About

Open-source security testing for LLM-based agents. 7 attack classes (5 novel beyond OWASP/ATLAS), 19 scenarios, LangChain + CrewAI support, LLM-as-judge defense layer.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors