PromptSieve

PromptSieve demonstrates common LLM prompt attack scenarios and their effective safeguards. Approaches demonstrated include least privilege access, pre and post prompt checks, LLM code generation vs tool selection, Google's Model Armor, defense-in-depth and more.

Watch Video Overview

Screenshots

Overview

Threats demonstrated

Threat	Description
Prompt injection	Attacker-crafted messages override the agent's instructions
Cross-tenant data exfiltration	Injected SQL or tool manipulation reaches another tenant's rows
Schema disclosure	Full DDL in the system prompt reveals the data model to attackers
Tool argument injection	SQL keywords smuggled through function-call arguments
Destructive SQL	`DROP`, `DELETE`, and `UPDATE` against the live database

Defenses available

Defense	Stage	What it does
Pre-Check LLM Evaluator	Before the agent	Classifies the user message as malicious or benign using a separate `gemini-2.5-flash` instance
Model Armor (Input)	Before the agent	Google Cloud Model Armor REST scan — detects known jailbreaks, injection patterns, and policy violations
Secure Architecture	Agent itself	Vertex AI Function Calling + `account_id` enforced in Python + Python-level field masking
Read-Only Permission	Agent itself	Sets `PRAGMA query_only = ON` — blocks all write/destructive SQL at the database layer
Model Armor (Output)	After the agent	Model Armor DLP sweep on the generated response to catch PII exfiltration
Post-Check LLM Evaluator	After the agent	Inspects the agent's response for cross-tenant data using a dedicated `gemini-2.5-flash` instance

Architecture

For a deep dive into how the validation pipeline works, see docs/architecture.md.

Project Structure

PromptSieve/
├── app.py                  # Streamlit entry point
├── config.py               # Environment + tenant configuration
├── database.py             # SQLite setup, seed data, tool helpers
├── requirements.txt
├── agent/
│   ├── vulnerable.py       # Insecure agent — raw SQL generation
│   ├── secure.py           # Secure agent — Vertex AI Function Calling
│   └── tools.py            # FunctionDeclaration definitions (5 tools)
├── pipeline/
│   ├── pre_check.py        # Pre-Check LLM Evaluator
│   ├── model_armor.py      # Google Cloud Model Armor wrapper
│   └── post_check.py       # Post-Check LLM Evaluator
├── prompts/
│   ├── pipeline_prompts.py # All LLM system prompt templates
│   └── attack_prompts.py   # Pre-configured attack prompt library
└── views/
    ├── demo.py             # Main UI + pipeline orchestration
    └── db_admin.py         # DB inspection/reset tab

Prerequisites

Python 3.11+
A Google Cloud project with Vertex AI API enabled
Application Default Credentials configured (gcloud auth application-default login)
(Optional) A Model Armor template for the Model Armor defenses (see step 4 below)

Setup

1. Clone and create a virtual environment

git clone <repo-url>
cd PromptSieve
python3 -m venv .venv
source .venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the project root:

# Required
GCP_PROJECT=your-gcp-project-id

# Optional — defaults shown
GCP_LOCATION=us-central1
MODEL_NAME=gemini-2.5-flash
DB_PATH=demo_database.db

# Set to enable Model Armor defenses; leave empty to run without them
MODEL_ARMOR_TEMPLATE_ID=

Important: GCP_LOCATION must match the region where your Model Armor template is created. The app calls the regional endpoint (modelarmor.<GCP_LOCATION>.rep.googleapis.com) — mismatched regions will result in a 403.

4. (Optional) Create a Model Armor template

Skip this step if you only want to demo the Pre-Check / Post-Check LLM defenses.

In the GCP Console, navigate to Security → Model Armor
Click + Create template and choose your region (must match GCP_LOCATION in .env)

Enable the following filters for the best demo coverage:

Filter	Recommended setting	Why
Prompt injection & jailbreak detection	Enabled, low confidence threshold	Catches injection attempts on input
Malicious URL detection	Enabled	Catches harmful links in outputs
DLP — sensitive data	Enabled; add `CREDIT_CARD_NUMBER`, `US_SOCIAL_SECURITY_NUMBER`, `EMAIL_ADDRESS` infoTypes	Demonstrates data exfiltration prevention on output
Safety filters (hate, harassment, etc.)	Enabled at medium threshold	Rounds out the demo

Note the Template ID and add it to your .env:

MODEL_ARMOR_TEMPLATE_ID=promptsieve-1

5. Authenticate with Google Cloud

gcloud auth application-default login

This must be run before starting the app. Model Armor uses Application Default Credentials to call the regional REST API.

6. Run the app

streamlit run app.py

The app opens at http://localhost:8501 by default.

Usage

Control Panel (left column)

Control	Description
Active Account	Select from Nakatomi Trading, Cyberdyne Systems, or Massive Dynamic
Agent Mode	Vulnerable (raw SQL, full schema) or Secure (Function Calling, Python-enforced `account_id`)
Read-Only Permission	Enable `PRAGMA query_only` on the SQLite connection to block write SQL at the DB layer
Defenses	Toggle Pre-Check, Model Armor Input/Output, and Post-Check in any combination
System Prompts	Edit the Agent, Pre-Check, and Post-Check system prompts live — changes take effect on the next submit

Sending a prompt

Type a prompt in the center column, or pick one from the Attack Prompts dropdown.
Press Enter or click the submit button.
Watch the Pipeline Telemetry panel (right column) update in real time — every pipeline step, block decision, SQL generated, tool calls, and thinking trace are surfaced as they happen.

Attack Prompts library

The center column includes a pre-loaded library of attack prompts covering the major injection categories:

Category	Examples
Cross-tenant attacks	Admin override, ignore-instructions, direct tenant name query
PII harvesting	Sensitive field harvest, column alias masking bypass
SQL injection	UNION SELECT, tautology (`1=1`), destructive DROP
Evasion	Base64-encoded instructions, DAN jailbreak, schema/prompt leak

Select any prompt from the dropdown and submit to see how each defense layer responds.

Risk Profile

The Control Panel displays a live Risk Profile bar chart — Safety, Usability, Cost, and Speed — that updates as you toggle defenses and switch modes.

Tenants and Seed Data

Three fictional tenants are seeded automatically on first run, each with five customers and 10 invoices (150 invoices total). Seed data uses a fixed random seed so database resets are deterministic.

Tenant	account_id
Nakatomi Trading	1
Cyberdyne Systems	2
Massive Dynamic	3

Inspect and reset the database from the DB Admin tab.

More Information

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PromptSieve

Watch Video Overview

Screenshots

Overview

Threats demonstrated

Defenses available

Architecture

Project Structure

Prerequisites

Setup

1. Clone and create a virtual environment

2. Install dependencies

3. Configure environment variables

4. (Optional) Create a Model Armor template

5. Authenticate with Google Cloud

6. Run the app

Usage

Control Panel (left column)

Sending a prompt

Attack Prompts library

Risk Profile

Tenants and Seed Data

More Information

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.vscode		.vscode
agent		agent
docs		docs
pipeline		pipeline
prompts		prompts
views		views
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config.py		config.py
database.py		database.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

PromptSieve

Watch Video Overview

Screenshots

Overview

Threats demonstrated

Defenses available

Architecture

Project Structure

Prerequisites

Setup

1. Clone and create a virtual environment

2. Install dependencies

3. Configure environment variables

4. (Optional) Create a Model Armor template

5. Authenticate with Google Cloud

6. Run the app

Usage

Control Panel (left column)

Sending a prompt

Attack Prompts library

Risk Profile

Tenants and Seed Data

More Information

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages