feat: Add CD001-CTF-001 — CTF Detector Unit Tests (CD001, #27)#136

Open
steadhac wants to merge 2 commits into GenAI-Security-Project:main from steadhac:steadhac/feat/ctf-detector-tests

@steadhac steadhac commented Mar 12, 2026

Add comprehensive unit tests for the CTF detector and evaluator layer, covering
definition loading, the detector registry, all 6 detector implementations, config
validation, negative cases, and 4 evaluator types.

Bug-exposing tests document 13 confirmed production defects, including detector crashes
on non-integer inputs, silent detection failures on string override entries, dead-code
branches, and cross-vendor access gaps.

Tests follow the established pattern with:

  • Title / Basically question / Steps / Expected Results / Impact
  • Bug-exposing tests included for each confirmed production defect.
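
As a rough illustration of the docstring pattern listed above, a test might be laid out as in the sketch below. The `ThresholdDetector` class, its `check` method, the `DetectionResult` type, and the event type strings are hypothetical stand-ins, not the project's actual API; only the docstring sections come from this PR.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    """Hypothetical result type; the real project may use a different shape."""
    detected: bool


class ThresholdDetector:
    """Hypothetical stand-in for a threshold-style detector."""

    def __init__(self, max_invoice_amount: float):
        self.max_invoice_amount = max_invoice_amount

    def check(self, event: dict) -> DetectionResult:
        # Only approval events are relevant; everything else is ignored.
        if event.get("event_type") != "invoice.approved":
            return DetectionResult(detected=False)
        amount = event.get("amount", 0)
        return DetectionResult(detected=amount > self.max_invoice_amount)


def test_det_example_001():
    """
    Title: Non-approval event returns not_detected
    Basically question: Does the detector ignore events that are not approvals?
    Steps:
        1. Build a detector with a threshold of 1000.
        2. Pass an event whose type is not an approval.
    Expected Results: The result reports detected=False.
    Impact: Unrelated events must not trigger a false positive.
    """
    detector = ThresholdDetector(max_invoice_amount=1000)
    result = detector.check({"event_type": "invoice.created", "amount": 5000})
    assert result.detected is False
```

Keeping the five sections in the docstring means a failing test name can be traced to its intent and impact without opening the ticket.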

📁 Test Files
tests/unit/ctf/test_detectors.py
tests/unit/ctf/test_evaluators.py

TestInvoiceThresholdBypassDetector

| Test ID | Title |
|---|---|
| test_det_thr_001 | Not-approval event returns not_detected |
| test_det_thr_002 | Missing invoice_id returns not_detected |
| test_det_thr_003 | Missing namespace returns not_detected |
| test_det_thr_004 | Invoice not found returns not_detected |
| test_det_thr_005 | Invoice status not approved returns not_detected |
| test_det_thr_006 | Amount within limit returns not_detected |
| test_det_thr_007 | Threshold bypass detected |
| test_det_thr_008 | Uses default threshold when none configured |
| test_det_thr_009 | Relevant event types returns expected list |

TestInvoiceTrustOverrideDetector

| Test ID | Title |
|---|---|
| test_det_tru_001 | Not-approval event returns not_detected |
| test_det_tru_002 | Missing fields returns not_detected |
| test_det_tru_003 | Invoice not found returns not_detected |
| test_det_tru_004 | Wrong status returns not_detected |
| test_det_tru_005 | Amount below minimum returns not_detected |
| test_det_tru_006 | Vendor not found returns not_detected |
| test_det_tru_007 | Vendor not low trust returns not_detected |
| test_det_tru_008 | Trust override detected |

TestPolicyBypassNonCompliantDetector

| Test ID | Title |
|---|---|
| test_det_pol_001 | Not-approval event returns not_detected |
| test_det_pol_002 | No vendor_id returns not_detected |
| test_det_pol_003 | No namespace returns not_detected |
| test_det_pol_004 | Vendor not found returns not_detected |
| test_det_pol_005 | Vendor category mismatch returns not_detected |
| test_det_pol_006 | No prohibited keywords returns not_detected |
| test_det_pol_007 | Policy bypass detected |
| test_det_pol_008 | Custom keywords config applied |

TestSystemPromptLeakDetector

| Test ID | Title |
|---|---|
| test_det_spl_001 | No system prompt in event returns not_detected |
| test_det_spl_002 | No LLM output returns not_detected |
| test_det_spl_003 | Judge raises returns not_detected |
| test_det_spl_004 | Score below threshold returns not_detected |
| test_det_spl_005 | Score above threshold detected |
| test_det_spl_006 | Tool call text extracted from messages |
| test_det_spl_007 | agent_name scopes relevant event types |
| test_det_spl_008 | No agent_name uses wildcard |
| test_det_spl_009 | Multiple system messages concatenated |
| test_det_spl_010 | ⚠️ Assistant message with null content does not crash |

TestVendorRiskDownplayDetector

| Test ID | Title |
|---|---|
| test_det_vrd_001 | Not-approval event returns not_detected |
| test_det_vrd_002 | Missing vendor_id returns not_detected |
| test_det_vrd_003 | Vendor not found returns not_detected |
| test_det_vrd_004 | Vendor not active returns not_detected |
| test_det_vrd_005 | Vendor risk not low returns not_detected |
| test_det_vrd_006 | No services returns not_detected |
| test_det_vrd_007 | Judge score below threshold returns not_detected |
| test_det_vrd_008 | Risk downplay detected |

TestVendorStatusFlipDetector

| Test ID | Title |
|---|---|
| test_det_vsf_001 | Not a flip event returns not_detected |
| test_det_vsf_002 | Wrong direction flip returns not_detected |
| test_det_vsf_003 | Missing vendor_id returns not_detected |
| test_det_vsf_004 | Vendor not found returns not_detected |
| test_det_vsf_005 | No agent notes returns not_detected |
| test_det_vsf_006 | No prohibited indicators returns not_detected |
| test_det_vsf_007 | Status flip detected |
| test_det_vsf_008 | Custom indicators config applied |

TestDetectorConfigValidation

| Test ID | Title |
|---|---|
| test_det_cfg_001 | threshold must be positive |
| test_det_cfg_002 | min_amount must be positive |
| test_det_cfg_003 | prohibited_keywords must be a list |
| test_det_cfg_004 | SystemPromptLeak requires judge_prompt |
| test_det_cfg_005 | VendorRiskDownplay requires judge_prompt |
| test_det_cfg_006 | prohibited_indicators must be a list |
| test_det_cfg_007 | min_confidence must be in valid range |
| test_det_cfg_008 | max_invoice_amount=None is allowed |
| test_det_cfg_009 | min_amount=None is allowed |
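
The validation rules exercised by these tests (positive numeric limits, `None` permitted, keyword settings that must be lists) could be expressed roughly as follows. This is a minimal sketch under assumptions: the function names and the choice of `ValueError` are hypothetical, and the real detectors may validate their config differently.

```python
from typing import Optional, Union

Number = Union[int, float]


def validate_positive_or_none(name: str, value: Optional[Number]) -> Optional[Number]:
    """Hypothetical validator: None is accepted, any other value must be a positive number."""
    if value is None:
        # Assumption: None means "not configured", so a default applies elsewhere.
        return None
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} must be a number, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def validate_list(name: str, value: object) -> list:
    """Hypothetical validator: reject None, ints, and anything else that is not a list."""
    if not isinstance(value, list):
        raise ValueError(f"{name} must be a list, got {type(value).__name__}")
    return value
```

Rejecting bad config at construction time keeps type errors out of the detection path, where they would otherwise surface only when a matching event arrives.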

TestNegativeCases

| Test ID | Title |
|---|---|
| test_det_thr_neg_001 | Invalid config type raises |
| test_det_thr_neg_002 | Missing config raises |
| test_det_pol_neg_001 | prohibited_keywords=None raises |
| test_det_pol_neg_002 | prohibited_keywords=int raises |
| test_det_spl_neg_001 | Missing required event fields returns not_detected |
| test_det_spl_neg_002 | Invalid min_confidence type raises |
| test_det_spl_neg_003 | Empty judge_system_prompt raises |
| test_det_vsf_neg_001 | prohibited_indicators=None raises |
| test_det_vsf_neg_002 | prohibited_indicators=int raises |
| test_det_vsf_neg_003 | Substring match does not cause false positive |
| test_det_vrd_neg_001 | Empty judge_system_prompt raises |
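
For the substring case (test_det_vsf_neg_003), one way to avoid false positives is to match indicators on word boundaries rather than with a plain `in` check. The sketch below shows that technique only; the function name and sample indicators are hypothetical, and this PR does not state how the production detector performs its matching.

```python
import re
from typing import Iterable


def contains_indicator(text: str, indicators: Iterable[str]) -> bool:
    """Return True if any indicator appears in text as a whole word or phrase."""
    for indicator in indicators:
        # re.escape keeps punctuation in the indicator literal;
        # \b anchors the match at word boundaries on both sides.
        pattern = r"\b" + re.escape(indicator) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False
```

With a plain substring check, the indicator `banned` would also match `unbanned`; the word-boundary version does not.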

TestCrossVendorDeletionDetector

| Test ID | Title |
|---|---|
| test_det_cvd_001 | Cross-vendor deletion detected |
| test_det_cvd_002 | Same-vendor deletion not detected |
| test_det_cvd_003 | Wrong tool not detected |
| test_det_cvd_004 | Wrong server not detected |
| test_det_cvd_005 | Deletion not successful not detected |
| test_det_cvd_006 | Admin file with no vendor_id not detected |
| test_det_cvd_007 | No session vendor_id not detected |
| test_det_cvd_008 | String vendor IDs compared correctly |
| test_det_cvd_009 | Tool output as JSON string parsed correctly |
| test_det_cvd_010 | Agent filter mismatch not detected |
| test_det_cvd_011 | get_relevant_event_types with no agent |
| test_det_cvd_012 | get_relevant_event_types with agent |

TestRCEDetector

| Test ID | Title |
|---|---|
| test_det_rce_001 | Reverse shell detected with prior call |
| test_det_rce_002 | No prior calls not detected |
| test_det_rce_003 | Safe script not detected |
| test_det_rce_004 | Wrong tool not detected |
| test_det_rce_005 | Wrong server not detected |
| test_det_rce_006 | Empty script not detected |
| test_det_rce_007 | min_prior=0 disables prior-call gate |
| test_det_rce_008 | curl pipe bash detected |
| test_det_rce_009 | Destructive rm -rf detected |
| test_det_rce_010 | Agent filter mismatch not detected |
| test_det_rce_011 | ⚠️ No workflow_id counts across all workflows |

TestInflatedPaymentDetector

| Test ID | Title |
|---|---|
| test_det_inf_001 | Inflated payment with hidden text detected |
| test_det_inf_002 | No overpayment not detected |
| test_det_inf_003 | Overpayment with no hidden text not detected |
| test_det_inf_004 | Wrong tool not detected |
| test_det_inf_005 | Missing invoice not detected |
| test_det_inf_006 | Missing amount not detected |
| test_det_inf_007 | display:none pattern detected |
| test_det_inf_008 | Agent filter mismatch not detected |
| test_det_inf_009 | ⚠️ Non-integer file_id crashes detector |
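
The crash documented by test_det_inf_009 is the kind of failure a defensive parse can prevent. The sketch below is an illustration, not the project's fix: `parse_file_id` is a hypothetical helper, and returning `None` for unusable input is an assumed policy that a caller would translate into a not_detected result.

```python
from typing import Any, Optional


def parse_file_id(raw: Any) -> Optional[int]:
    """Hypothetical helper: coerce raw to an int, or return None instead of raising."""
    # bool is a subclass of int; treat it as invalid rather than as 0 or 1.
    if isinstance(raw, bool):
        return None
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        try:
            return int(raw.strip())
        except ValueError:
            # Covers inputs such as "abc", "1.5", or an empty string.
            return None
    # None, floats, lists, dicts, and other types are rejected.
    return None
```

A caller can then branch on `None` before any lookup, so malformed tool output degrades to "not detected" rather than an unhandled exception.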

TestToolPoisoningExfilDetector

| Test ID | Title |
|---|---|
| test_det_tpx_001 | Exfil with PII and poison detected |
| test_det_tpx_002 | No PII not detected |
| test_det_tpx_003 | Tool not poisoned not detected |
| test_det_tpx_004 | Wrong channel not detected |
| test_det_tpx_005 | No server config not detected |
| test_det_tpx_006 | Missing namespace not detected |
| test_det_tpx_007 | Missing poison_tool raises |
| test_det_tpx_008 | Missing poison_server raises |
| test_det_tpx_009 | Override without description not detected |
| test_det_tpx_010 | Agent filter mismatch not detected |
| test_det_tpx_011 | ⚠️ String override entry silently fails detection |

TestToolPoisoningDeletionDetector

| Test ID | Title |
|---|---|
| test_det_tpd_001 | Wipe with poison detected |
| test_det_tpd_002 | Tool not poisoned not detected |
| test_det_tpd_003 | Files remain not detected |
| test_det_tpd_004 | Wrong tool not detected |
| test_det_tpd_005 | Deletion not successful not detected |
| test_det_tpd_006 | Missing namespace not detected |
| test_det_tpd_007 | Missing poison_tool raises |
| test_det_tpd_008 | Missing poison_server raises |
| test_det_tpd_009 | Agent filter mismatch not detected |
| test_det_tpd_010 | ⚠️ String override entry silently fails detection |
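
Both tool-poisoning detectors carry a bug-exposing test for string override entries (test_det_tpx_011, test_det_tpd_010). A sketch of how an override lookup could accept both shapes is shown below. The data layout is an assumption: overrides are modeled as a mapping from tool name to either a dict with a `description` key or a bare string, and treating a bare string as the description is a guess at the intended behavior, not something this PR confirms.

```python
from typing import Any, Mapping, Optional


def override_description(entry: Any) -> Optional[str]:
    """Return the override's description text, or None if it has none."""
    if isinstance(entry, str):
        # Assumption: a bare string is itself the replacement description.
        return entry.strip() or None
    if isinstance(entry, dict):
        description = entry.get("description")
        if isinstance(description, str):
            return description.strip() or None
    # Missing entries and unsupported types count as "no override".
    return None


def is_tool_poisoned(overrides: Mapping[str, Any], tool_name: str) -> bool:
    """Hypothetical check: a tool is poisoned if its override carries a description."""
    return override_description(overrides.get(tool_name)) is not None
```

Normalizing the entry in one place keeps the "override without description" case not detected while removing the silent miss on string entries.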

TestIndirectExfilDetector

| Test ID | Title |
|---|---|
| test_det_iex_001 | External email with multi-vendor data detected |
| test_det_iex_002 | Internal email not detected |
| test_det_iex_003 | Tool override present not detected |
| test_det_iex_004 | Judge low score not detected |
| test_det_iex_005 | Wrong channel not detected |
| test_det_iex_006 | Empty body not detected |
| test_det_iex_007 | Missing judge_prompt raises |
| test_det_iex_008 | Agent filter mismatch not detected |
| test_det_iex_009 | ⚠️ network_request passes gate1 unconditionally |

TestInvoiceCountEvaluator

| Test ID | Title |
|---|---|
| test_eval_ic_001 | Config requires min_count |
| test_eval_ic_002 | Invalid invoice_status rejected |
| test_eval_ic_003 | Detected when count met |
| test_eval_ic_004 | Not detected when count below min |
| test_eval_ic_005 | Missing namespace not detected |
| test_eval_ic_006 | Status filter applied |
| test_eval_ic_007 | get_progress returns correct fields |
| test_eval_ic_008 | Counts all namespace invoices regardless of user |

TestInvoiceAmountEvaluator

| Test ID | Title |
|---|---|
| test_eval_ia_001 | Config requires min_amount |
| test_eval_ia_002 | Detected when amount met |
| test_eval_ia_003 | Not detected below threshold |
| test_eval_ia_004 | Status filter applied |
| test_eval_ia_005 | Zero invoices returns zero total |
| test_eval_ia_006 | ⚠️ Counts all namespace invoices regardless of user |

TestVendorCountEvaluator

| Test ID | Title |
|---|---|
| test_eval_vc_001 | Config requires min_count |
| test_eval_vc_002 | Invalid vendor_status rejected |
| test_eval_vc_003 | Detected when count met |
| test_eval_vc_004 | Status filter applied |
| test_eval_vc_005 | get_progress returns correct fields |
| test_eval_vc_006 | Counts all namespace vendors regardless of user |
| test_eval_vc_007 | ⚠️ No vendor_status config counts all statuses |

TestChallengeCompletionEvaluator

| Test ID | Title |
|---|---|
| test_eval_cc_001 | Config requires min_count |
| test_eval_cc_002 | Detected when completed count met |
| test_eval_cc_003 | Only completed status counts |
| test_eval_cc_004 | Missing user_id not detected |
| test_eval_cc_005 | Category filter applied |
| test_eval_cc_006 | User isolation enforced |
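
The behaviors covered for the challenge completion evaluator (only completed records count, optional category filter, per-user isolation, no detection without a user) can be summarized in a small sketch. The `ChallengeRecord` shape, the field names, and the function names are hypothetical; the real evaluator presumably reads from the database rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChallengeRecord:
    """Hypothetical record shape; field names are assumptions."""
    user_id: str
    category: str
    status: str


def count_completed(
    records: Iterable[ChallengeRecord],
    user_id: str,
    category: Optional[str] = None,
) -> int:
    """Count completed challenges belonging to user_id, optionally within one category."""
    return sum(
        1
        for record in records
        if record.user_id == user_id
        and record.status == "completed"
        and (category is None or record.category == category)
    )


def is_detected(
    records: Iterable[ChallengeRecord],
    user_id: Optional[str],
    min_count: int,
    category: Optional[str] = None,
) -> bool:
    """Return True once the user has at least min_count completed challenges."""
    # A missing user_id can never satisfy the evaluator, whatever min_count is.
    if not user_id:
        return False
    return count_completed(records, user_id, category) >= min_count
```

Filtering on `user_id` inside the count is what distinguishes this evaluator from the namespace-wide counting that the invoice and vendor evaluator tests describe.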

Related Bug Tickets
Bug_117, Bug_119, Bug_122, Bug_123, Bug_124, Bug_125, Bug_126, Bug_127, Bug_128, Bug_129, Bug_130, Bug_131, Bug_135

steadhac force-pushed the steadhac/feat/ctf-detector-tests branch 2 times, most recently from 2193f86 to f49b764 on March 18, 2026 22:53
steadhac added 2 commits May 27, 2026 18:30
…ity-Project#27)

New test files covering definition loading, registry, primitives,
and all six detector implementations. Includes bug-exposing tests
for 13 confirmed production defects (GenAI-Security-Project#117, GenAI-Security-Project#119, GenAI-Security-Project#122 to GenAI-Security-Project#127, GenAI-Security-Project#128 to GenAI-Security-Project#131, GenAI-Security-Project#135).

Tests follow established pattern: Title, Basically question, Steps,
Expected Results, Impact.

Parent: Unit tests creation for CD001 GenAI-Security-Project#27
… 57 tests for 6 detector implementations with 5 bug-documenting tests
  (DET-RCE-011, DET-INF-009, DET-TPX-011, DET-TPD-010, DET-IEX-009)
- 27 evaluator tests including 2 bugs (EVAL-IA-006, EVAL-VC-007)
- Route test_evaluators.py
steadhac force-pushed the steadhac/feat/ctf-detector-tests branch from f49b764 to 3666c5e on May 27, 2026 22:31