Open
Conversation
Add regex, aho-corasick, unicode-normalization, ring, zeroize, lol_html as optional deps. Update heuristics feature, add honeytoken and normalization-html feature gates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement Task 2B.1 — Unicode NFKC normalization, HTML sanitization, control character stripping, and content truncation as a GuardrailStage. - NormalizationConfig with builder pattern (max bytes, toggles) - NFKC normalization with zero-alloc fast path via is_nfkc_quick() - Control char stripping (ZWSP, bidi controls, tag chars, etc.) - HTML sanitization via lol_html (normalization-html feature) with regex fallback; script/style elements fully removed - UTF-8-safe truncation at char boundaries - Latin/Cyrillic script mixing detection for homoglyph attacks - Handles all Content variants (Text, Messages, RetrievedChunks, ToolCall, ToolResult) with per-variant normalization - 20 tests passing with both heuristics and normalization-html features Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add prompt module with TemplateScanner for system prompt secret detection. Includes RegexSet-based pattern matching, Shannon entropy analysis, ScannerConfig builder, and GuardrailStage implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d auto-escaping Adds template compilation, render with max-length enforcement, role marker escaping, enum/number/json placeholders, and TemplateScanner integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds honeytoken generation, AES-256-GCM encryption with random nonces, HKDF key derivation, HMAC fingerprinting, Aho-Corasick egress detection, pool rotation, and zeroized key material. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…modes Adds Block/Redact/SafeResponse/Escalate modes, per-severity override mapping, audit entry creation with hashed reasons, and template rendering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds 5-category pattern library (role confusion, instruction override, delimiter manipulation, prompt extraction, encoding evasion) with RegexSet two-pass detection and custom pattern support. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rkers Adds system prompt wrapping, per-request randomized markers, forgery detection, nesting violation checks, and GuardrailStage implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds suspicious char detection, instruction density, language mixing, repetition anomaly, punctuation anomaly, and weighted risk scoring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PipelineExecutor::run() was passing the original content to every stage, ignoring StageOutcome::Transform results. Now uses Cow<Content> to track the current content through the pipeline, updating it when a stage returns a Transform outcome. Also adds refusal_response field to PipelineResult for Phase 2B RefusalPolicy integration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds EnsembleStrategy trait, AnyAboveThreshold, WeightedAverage, MajorityVote, MaxScore strategies, and score normalization. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Phase 2 types to prelude, propagate transformed content through pipeline stages via Cow, add refusal_response to PipelineResult. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ctural + ensemble Adds InjectionStage as GuardrailStage (priority 50) with full Content variant handling, size limits, and structured JSON block reasons. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds per-request randomized markers, injection/role marker/forgery detection in retrieved chunks, and GuardrailStage at priority 45. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…malization Three fuzz targets for panic/crash testing with documentation in fuzz/README.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
100 adversarial samples (100% detection), 52 benign samples (1.9% FP), P95 latency 5.5ms. Tests full pipeline composition, RAG injection, template security, ensemble strategies, and normalization evasion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address 73 clippy warnings: cast precision annotations, collapsible-if, let-else rewrites, #[must_use], inline format args, # Panics docs, finish_non_exhaustive for Debug impls, and function extractions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All 17 tasks complete. Gate metrics: 100% detection rate (target >90%), 1.9% FP rate (target <5%), P95 5.5ms (target <50ms), 197 tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mulator Merge detect_suspicious_chars, compute_language_mixing, compute_repetition (char-level), and compute_punctuation_anomaly into a single char_indices() iteration via CharAccumulator. Token repetition (word-level) extracted to compute_token_repetition. compute_instruction_density unchanged. Scoring formulas and public API preserved exactly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Cache TemplateScanner in SecureTemplate (compile once, not per-render) - Add AES-256 key length validation in HoneytokenStore - Add HTML entity decoding (named + numeric) to NormalizationStage - Cap SecurityContext delegation depth at 64 - Merge StructuralAnalyzer into single-pass char classification - Add Unicode confusable mapping (Cyrillic/Greek → Latin)
- Revert accidental changes to weavegraph/ and wg-ragsmith/ - Fix clippy unnecessary_literal_bound in test modules - Fix clippy float_cmp (use f32::EPSILON for structural tests) - Fix clippy collapsible_if in executor transform stage - Fix HTML regex stripping LLM special tokens (<|endoftext|> etc.) - Remove nightly-only feature(doc_auto_cfg), fix doc links - Add new RUSTSEC advisories to deny.toml ignore list - Remove unused tempfile dev-dep, clean machete ignores - Add workspace.exclude for fuzz crate, add [workspace] to fuzz - Run cargo fmt --all for consistent formatting
…h into feat/add-wg-bastion
- Standalone: pipeline scan (malicious vs clean input, with output) - Standalone: normalization stage (zero-width char stripping demo) - Standalone: SecureTemplate (typed placeholders, injection escaping) - Standalone: TemplateScanner (secret detection in prompts) - Standalone: HoneytokenStore (canary injection + leak detection) - Standalone: RoleIsolation (boundary wrapping + forgery detection) - Weavegraph: SecurityGateNode (input scanning, block/allow routing) - Weavegraph: OutputScannerNode (secret + honeytoken leak scanning) - All examples use verified API signatures from source Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds wg-bastion as a new workspace crate providing a composable, pipeline-based set of LLM security guardrails (core pipeline + prompt/injection protections), plus supporting docs, fuzzing targets, and repo-wide security review/process updates.
Changes:
- Introduces the wg-bastion crate (config + pipeline framework + prompt/input security modules).
- Adds extensive security documentation (threat model, architecture, control matrix, playbooks) and fuzzing harnesses.
- Updates workspace/repo policy files (workspace members, cargo-deny ignores, security policy, PR/review checklists).
Reviewed changes
Copilot reviewed 41 out of 42 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| Cargo.toml | Adds wg-bastion to workspace members and excludes the fuzz crate from workspace builds. |
| deny.toml | Extends ignored RustSec advisories list for transitive dependencies. |
| SECURITY.md | Adds repository security policy and disclosure guidance. |
| CONTRIBUTING.md | Adds wg-bastion-specific security review requirements and CI expectations. |
| .gitignore | Updates ignored paths/patterns (incl. plan/todo files and some .github dirs). |
| .github/copilot-mcp-config.json | Adds MCP server configuration for Copilot tooling. |
| .github/PULL_REQUEST_TEMPLATE.md | Adds a security-focused PR checklist (esp. for wg-bastion changes). |
| wg-bastion/Cargo.toml | New crate manifest: dependencies and feature flags (heuristics, honeytoken, normalization-html, etc.). |
| wg-bastion/src/lib.rs | Crate root: module wiring, feature gates, and prelude re-exports. |
| wg-bastion/src/config/mod.rs | SecurityPolicy / PolicyBuilder implementation (file/env loading + validation). |
| wg-bastion/src/pipeline/mod.rs | Keeps legacy pipeline APIs and introduces new typed pipeline submodules. |
| wg-bastion/src/pipeline/content.rs | Defines the Content enum and helpers for flattening to text. |
| wg-bastion/src/pipeline/outcome.rs | Defines StageOutcome, Severity, and StageError for the new pipeline. |
| wg-bastion/src/pipeline/stage.rs | Defines GuardrailStage and the new typed SecurityContext. |
| wg-bastion/src/pipeline/compat.rs | Adds LegacyAdapter to bridge legacy SecurityStage into GuardrailStage. |
| wg-bastion/src/input/mod.rs | Wires input-security modules behind the heuristics feature. |
| wg-bastion/src/input/patterns.rs | Adds built-in injection regex pattern library (+ tests). |
| wg-bastion/src/input/ensemble.rs | Adds ensemble scoring strategies and EnsembleScorer. |
| wg-bastion/src/input/spotlight.rs | Adds RAG chunk boundary marking + injection/forgery detection stage. |
| wg-bastion/src/prompt/mod.rs | Wires prompt-protection modules (scanner/template/isolation/honeytoken/refusal). |
| wg-bastion/src/prompt/isolation.rs | Adds system-boundary marker wrapping + forged/nesting/unmatched marker detection stage. |
| wg-bastion/src/prompt/template.rs | Adds SecureTemplate with typed placeholders, escaping, and secret scanning. |
| wg-bastion/src/prompt/refusal.rs | Adds RefusalPolicy to map blocking outcomes to user-facing refusal actions + audit entries. |
| wg-bastion/fuzz/Cargo.toml | Adds a dedicated cargo-fuzz crate for wg-bastion fuzz targets. |
| wg-bastion/fuzz/README.md | Documents how to run fuzz targets. |
| wg-bastion/fuzz/fuzz_targets/fuzz_template.rs | Fuzz target for template compile/render. |
| wg-bastion/fuzz/fuzz_targets/fuzz_injection.rs | Fuzz target for injection detector determinism/panic safety. |
| wg-bastion/fuzz/fuzz_targets/fuzz_normalization.rs | Fuzz target for normalization stage evaluation. |
| wg-bastion/docs/threat_model.md | Adds threat model document (OWASP/NIST/MITRE aligned). |
| wg-bastion/docs/diagrams/data_flow.mmd | Adds Mermaid data-flow diagram for the security boundary architecture. |
| wg-bastion/docs/control_matrix.md | Adds narrative control matrix documentation. |
| wg-bastion/docs/control_matrix.csv | Adds machine-readable control matrix mapping controls→modules→tests. |
| wg-bastion/docs/attack_playbooks/llm01_prompt_injection.md | Adds incident response playbook for prompt injection. |
| wg-bastion/docs/architecture.md | Adds detailed architecture document and planned integration patterns. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
feat/add-wg-bastion — LLM security guardrails crate (Phase 1 + 2)
What
Introduces wg-bastion, a new workspace member crate providing composable, pipeline-based security guardrails for LLM applications. This PR covers:
13,000+ lines across 35 new files, 209 tests, 0 clippy warnings.
Why
LLM applications face a growing class of attacks (prompt injection, system prompt extraction, delimiter manipulation, encoding evasion) catalogued by the OWASP LLM Top 10. wg-bastion provides a Rust-native, zero-allocation-on-clean-path defense layer that integrates
with weavegraph's graph execution model. The heuristic path has 5.5ms P95 latency, 100% detection rate on the adversarial corpus, and <2% false positive rate.
Crate structure
wg-bastion/src/
├── config/mod.rs (262 lines) SecurityPolicy, PolicyBuilder, FailMode
├── pipeline/ Core execution framework
│ ├── content.rs (279 lines) Content enum: Text, Messages, ToolCall, ToolResult, RetrievedChunks
│ ├── stage.rs (441 lines) GuardrailStage trait, SecurityContext (session, risk, delegation chain)
│ ├── outcome.rs (286 lines) StageOutcome: Allow/Block/Transform/Escalate/Skip, Severity levels
│ ├── executor.rs (693 lines) PipelineExecutor: priority sorting, degradation, Transform propagation
│ ├── compat.rs (207 lines) LegacyAdapter bridging old SecurityStage → new GuardrailStage
│ └── mod.rs (324 lines) Legacy SecurityPipeline (kept for backward compat)
├── prompt/ System prompt protection (Phase 2A)
│ ├── template.rs (649 lines) SecureTemplate — typed placeholders {{name:type:max}}, auto-escaping
│ ├── scanner.rs (670 lines) TemplateScanner — regex+entropy secret detection in prompts
│ ├── honeytoken.rs (811 lines) HoneytokenStore — AES-256-GCM canary tokens, Aho-Corasick leak scan
│ ├── isolation.rs (546 lines) RoleIsolation — randomised [SYSTEM_START_] boundary markers
│ └── refusal.rs (550 lines) RefusalPolicy — per-severity response modes (block/redact/safe/escalate)
├── input/ Input validation (Phase 2B + 2C)
│ ├── normalization.rs (1,101 lines) NormalizationStage — NFKC, confusables, HTML, control chars, truncation
│ ├── injection.rs (784 lines) InjectionStage — composes heuristic + structural + ensemble into a stage
│ ├── patterns.rs (568 lines) 50 built-in regex patterns across 5 attack categories
│ ├── structural.rs (631 lines) StructuralAnalyzer — single-pass 5-signal text analysis
│ ├── ensemble.rs (480 lines) EnsembleScorer — 4 strategies: threshold, weighted avg, majority vote, max
│ └── spotlight.rs (578 lines) Spotlight — RAG chunk boundary marking + injection detection in chunks
└── lib.rs (143 lines) Crate root, feature gates, prelude
Plus:
How the pipeline works
Content (user text)
│
▼
PipelineExecutor::run()
│
├── Stage 1: NormalizationStage (priority 10)
│ Strips invisible Unicode, applies NFKC, maps confusable chars,
│ strips HTML tags (preserving LLM tokens like <|im_start|>),
│ decodes HTML entities. Returns Transform(clean_text) or Allow.
│
├── Stage 2: InjectionStage (priority 50)
│ ├─ HeuristicDetector: O(n) RegexSet scan across all 50 patterns
│ │ simultaneously, then individual captures for matched patterns.
│ ├─ StructuralAnalyzer: single-pass char accumulator computing
│ │ suspicious char ratio, instruction density, language mixing,
│ │ repetition anomaly, punctuation anomaly → overall_risk.
│ └─ EnsembleScorer: combines pattern weights + structural risk
│ into Block/Allow via configurable strategy.
│
└── (your custom stages at any priority)
│
▼
PipelineResult
├── is_allowed() / blocked_reasons()
└── per-stage metrics (latency, degraded flag)
Each stage implements:
#[async_trait]
trait GuardrailStage: Send + Sync {
fn id(&self) -> &str;
async fn evaluate(&self, content: &Content, ctx: &SecurityContext)
-> Result<StageOutcome, StageError>;
fn degradable(&self) -> bool { true } // skip-on-error by default
fn priority(&self) -> u32 { 100 } // lower = runs first
}
A Block or Escalate short-circuits the pipeline. A Transform replaces the content for all subsequent stages. If a degradable stage errors, the pipeline logs it and continues.
Detection patterns (5 categories, 50 patterns)
┌──────────────────────────┬───────────────┬───────────────────────────────────────────────────────────────┬──────────────┐
│ Category │ IDs │ Examples │ Weight range │
├──────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────┼──────────────┤
│ Role Confusion │ RC-001→RC-014 │ "You are now DAN", "Ignore all previous instructions" │ 0.5–1.0 │
├──────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────┼──────────────┤
│ Instruction Override │ IO-001→IO-010 │ "IMPORTANT: override safety", "sudo mode" │ 0.5–0.95 │
├──────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────┼──────────────┤
│ Delimiter Manipulation │ DM-001→DM-010 │ [INST], <|im_start|>, , ---\nsystem: │ 0.5–0.95 │
├──────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────┼──────────────┤
│ System Prompt Extraction │ SE-001→SE-008 │ "Repeat your instructions", "What is your system prompt" │ 0.5–0.95 │
├──────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────┼──────────────┤
│ Encoding Evasion │ EE-001→EE-008 │ Unicode escapes, HTML entities, base64 references, homoglyphs │ 0.5–0.65 │
└──────────────────────────┴───────────────┴───────────────────────────────────────────────────────────────┴──────────────┘
Patterns fire with weights; the ensemble scorer sums matched weights (capped at 1.0) and applies the configured strategy threshold (default: AnyAboveThreshold(0.7)).
Feature flags
┌────────────────────┬─────────┬────────────────────────────────────────────┬───────────────────────────────────────────────────────────┐
│ Flag │ Default │ Dependencies │ What it enables │
├────────────────────┼─────────┼────────────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│ heuristics │ ✅ yes │ regex, aho-corasick, unicode-normalization │ All input/ and prompt/ modules │
├────────────────────┼─────────┼────────────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│ honeytoken │ no │ ring, zeroize, aho-corasick │ AES-256-GCM canary token generation + detection │
├────────────────────┼─────────┼────────────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│ normalization-html │ no │ lol_html │ Full HTML sanitisation (lol_html); regex fallback without │
├────────────────────┼─────────┼────────────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│ moderation-onnx │ no │ ort │ ONNX ML classifier (Phase 3+) │
├────────────────────┼─────────┼────────────────────────────────────────────┼───────────────────────────────────────────────────────────┤
│ telemetry-otlp │ no │ opentelemetry stack │ OTLP export (Phase 3+) │
└────────────────────┴─────────┴────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘
Test results
209 tests: 186 unit + 20 integration + 3 doctest
• 100% detection on 100-sample adversarial corpus (5 attack categories)
• 0% false positive on 52-sample benign corpus
• P95 pipeline latency: 5.5ms
• 0 clippy warnings (workspace-wide)
• cargo deny: clean (advisories, bans, licenses, sources)
• cargo machete: no unused dependencies
Key design decisions
Files outside wg-bastion/ touched
┌─────────────────┬───────────────────────────────────────────────────────────────────────────────┐
│ File │ Change │
├─────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ Cargo.toml │ Added wg-bastion to workspace.members, added workspace.exclude for fuzz crate │
├─────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ deny.toml │ Added 3 new RUSTSEC advisories to ignore list (transitive, pre-existing) │
├─────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ SECURITY.md │ Added security policy document │
├─────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ CONTRIBUTING.md │ Already existed, unchanged │
└─────────────────┴───────────────────────────────────────────────────────────────────────────────┘