Add IaC infrastructure extraction from CTI data #17
Open
deacon-mp wants to merge 30 commits into
…END path, relationships
- Raise semantic similarity threshold from 0.42 to 0.82 to eliminate false-positive TTPs
- Add explicit T-number regex extraction from all IR text fields
- Filter deprecated/revoked ATT&CK technique IDs
- Add deterministic entity reclassification (tools vs. malware vs. techniques)
- Split slash-separated actor names into individual actors with aliases
- Preserve dots in IP addresses and domain names during canonicalization
- Fix use-before-assignment bug in relationship candidate creation
- Fix D3FEND enrichment path (doubled plugin prefix)
- Increase Ollama client timeout to 600s
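A minimal sketch of the explicit T-number extraction, deprecated-ID filtering, slash-splitting, and dot-preserving canonicalization listed above. It assumes IR text fields arrive as plain strings and that the set of deprecated/revoked IDs has already been built from the ATT&CK bundle; every function name here is hypothetical, not the PR's code.

```python
import re

# ATT&CK technique IDs (T1059) and sub-techniques (T1059.001); \b on both
# sides keeps the pattern from matching inside longer tokens.
T_NUMBER_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_t_numbers(fields, deprecated_ids):
    """Collect unique technique IDs from IR text fields, skipping deprecated/revoked IDs."""
    found, seen = [], set()
    for text in fields:
        for tid in T_NUMBER_RE.findall(text):
            if tid not in seen and tid not in deprecated_ids:
                seen.add(tid)
                found.append(tid)
    return found


def split_actor_names(raw):
    """Turn "A/B/C" into a primary actor name plus aliases."""
    parts = [p.strip() for p in raw.split("/") if p.strip()]
    return {"name": parts[0], "aliases": parts[1:]} if parts else None


def canonicalize(name):
    """Lowercase and collapse separators to "_", but keep "." so IPs and domains survive."""
    return re.sub(r"[^a-z0-9.]+", "_", name.strip().lower()).strip("_")
```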
- New cti_ontology_inference.py: infers ATT&CK techniques from known tools/malware using the MITRE taxonomy's 20,048 pre-built relationships. Zero LLM, zero network; pure ontology lookup.
- Text corroboration filter: tools with >8 known techniques require keyword overlap between the technique description and the source text, which prevents broad-profile tools from flooding the output.
- Evidence quality gate in semantic matcher: rejects matches from phrases shorter than 3 words or lacking content words.
- Wired into the Stage 1 pipeline between explicit T-number extraction and semantic matching.
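The text corroboration filter for broad-profile tools could look roughly like this. It is a sketch under assumptions: "keyword overlap" is read as at least one shared content word, and the stopword list and `corroborate` helper are hypothetical.

```python
import re

# Hypothetical stopword list used to keep only content words.
STOPWORDS = {"the", "and", "for", "with", "from", "may", "use", "can", "are",
             "that", "this", "which", "adversaries"}


def content_words(text):
    """Lowercased words of 3+ letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS}


def corroborate(tool_techniques, technique_descriptions, source_text, broad_threshold=8):
    """Keep all inferred techniques for narrow tools; for tools with more than
    `broad_threshold` techniques, keep only those whose description shares at
    least one content word with the source text."""
    source_words = content_words(source_text)
    kept = {}
    for tool, techniques in tool_techniques.items():
        if len(techniques) <= broad_threshold:
            kept[tool] = list(techniques)
        else:
            kept[tool] = [t for t in techniques
                          if content_words(technique_descriptions.get(t, "")) & source_words]
    return kept
```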
…-path
- Create nlp_model.py, a shared singleton for en_core_web_lg (it was loaded 6 times at module level, ~400MB each)
- Replace all module-level spacy.load() calls with the shared import
- Enhance the entity validator's deterministic fast-path:
  - Cross-category MITRE taxonomy lookup (tool in malware list → still valid)
  - Fuzzy name matching for misspellings (Mimiikatz → mimikatz)
  - Well-known tool allowlist for common utilities not in MITRE
  - Regex pattern for executable/script names
- Fix _mitre_name_sets() unpacking bug (4 return values, not 3)

Post-IR pipeline time (4 files): 670s → 428s (36% faster)
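The fuzzy-matching step of the fast-path (Mimiikatz → mimikatz) can be approximated with the standard library's `difflib`. This is a sketch only; `fuzzy_lookup` and the 0.85 cutoff are assumptions, not values from the PR.

```python
import difflib


def fuzzy_lookup(name, known_names, cutoff=0.85):
    """Map a possibly misspelled entity name onto a known MITRE name.

    Exact (case-insensitive) hits win; otherwise difflib picks the closest
    known name whose similarity ratio is at least `cutoff`. Returns None if
    nothing is close enough.
    """
    lowered = {n.lower(): n for n in known_names}
    key = name.lower()
    if key in lowered:
        return lowered[key]
    matches = difflib.get_close_matches(key, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None
```

A high cutoff keeps short, unrelated names from snapping onto real tools; single-character typos in longer names still pass.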
- New cti_defend_validation.py: validates candidate ATT&CK techniques by checking tactic relevance against source-text signals
- Extracts tactic categories from the D3FEND ontology (823 technique→tactic mappings from d3fend-protege.ttl)
- Detects which tactics are evidenced in the source text using generic CTI language indicators (not adversary-specific)
- Drops ontology-inferred techniques whose tactic category has no evidence in the source text
- Reduces total techniques from 149 to 129 (−13%) while maintaining 90% extractable recall and 68% emulation-plan recall
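A sketch of the tactic-evidence gate described above. The indicator lexicon is hypothetical and deliberately small; `technique_tactics` stands in for the technique→tactic mappings, and only ontology-inferred candidates are gated, as the description states.

```python
# Hypothetical generic CTI cues per tactic (not adversary-specific).
TACTIC_INDICATORS = {
    "credential-access": ["credential", "password", "hash", "lsass"],
    "lateral-movement": ["lateral", "remote desktop", "psexec", "smb"],
    "exfiltration": ["exfiltrat", "upload", "staged data"],
    "persistence": ["persistence", "scheduled task", "registry run key", "autostart"],
}


def evidenced_tactics(source_text):
    """Tactics with at least one indicator substring in the source text."""
    text = source_text.lower()
    return {tactic for tactic, cues in TACTIC_INDICATORS.items()
            if any(cue in text for cue in cues)}


def drop_unevidenced(candidates, technique_tactics, source_text):
    """candidates: (technique_id, origin) pairs. Only origin == "ontology" is gated;
    explicit or otherwise-sourced techniques pass through unchanged."""
    evidence = evidenced_tactics(source_text)
    return [(tid, origin) for tid, origin in candidates
            if origin != "ontology" or technique_tactics.get(tid, set()) & evidence]
```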
- PMI co-occurrence scoring for ontology-inferred techniques
- Sub-technique hierarchy enforcement (orphan detection)
- Entity-technique clique corroboration (multi-entity support)
- Broad-profile entity cap (max 15 techniques per entity, ranked by PMI)
- Reduces FPs from 106 to 79 while maintaining F1 at 29%
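For reference, pointwise mutual information between an entity e and a technique t is log(P(e, t) / (P(e) · P(t))). The sketch below estimates it from a hypothetical co-occurrence corpus (one set of entities and one set of techniques per document) and applies the 15-technique cap; the corpus shape and helper names are assumptions.

```python
import math
from collections import Counter


def pmi_table(documents):
    """documents: list of (entities, techniques) collections, one per document.
    Returns {(entity, technique): PMI} for every co-occurring pair."""
    n = len(documents)
    ent_counts, tech_counts, pair_counts = Counter(), Counter(), Counter()
    for entities, techniques in documents:
        ents, techs = set(entities), set(techniques)
        ent_counts.update(ents)
        tech_counts.update(techs)
        pair_counts.update((e, t) for e in ents for t in techs)
    return {
        (e, t): math.log((c / n) / ((ent_counts[e] / n) * (tech_counts[t] / n)))
        for (e, t), c in pair_counts.items()
    }


def cap_by_pmi(entity, techniques, pmi, cap=15):
    """Broad-profile cap: keep at most `cap` techniques, highest PMI first;
    pairs never seen together rank last."""
    ranked = sorted(techniques, key=lambda t: pmi.get((entity, t), float("-inf")), reverse=True)
    return ranked[:cap]
```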
- New cti_offline_ir.py: deterministic IR extraction using spaCy NER, MITRE taxonomy scanning (1062 tool/malware names), a well-known tool list, regex infrastructure patterns, and dependency-parse behaviors
- Toggle via conf/local.yml: cti.offline: true/false
- Offline mode skips LLM entity validation entirely
- Offline mode processes ALL 5 test files, including Talos (which always timed out with the LLM due to 23 entity validation calls)
- Offline: 68% emulation recall, 4 min total, zero LLM dependency
- Online: 56% emulation recall, 5 min total (4/5 files, Talos timeout)
- Offline mode actually achieves HIGHER recall because it processes all 5 files
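The "regex infrastructure patterns" part of offline IR might be sketched as follows. The patterns are simplified assumptions (IPv4 only, no defanged indicators), and the file-extension filter is a hypothetical guard against names like `svchost.exe` matching the domain pattern.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Simplified domain pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}\b", re.IGNORECASE)
# Hypothetical guard: common file extensions that would otherwise look like TLDs.
FILE_EXTENSIONS = {"exe", "dll", "ps1", "bat", "sys", "txt"}


def extract_infrastructure(text):
    """Return IPv4 addresses and domains found in `text`, deduplicated in first-seen order."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        if candidate not in ips:
            ips.append(candidate)
    domains = []
    for candidate in DOMAIN_RE.findall(text):
        domain = candidate.lower()
        if domain.rsplit(".", 1)[-1] in FILE_EXTENSIONS or domain in domains:
            continue
        domains.append(domain)
    return {"ipv4": ips, "domains": domains}
```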
- New cti_stix_merge.py: merges multiple single-source STIX bundles with confidence-weighted dedup, provenance tracking, and multi-source corroboration boosting (GTI-inspired conflict resolution)
- Fix offline IR actor extraction: promote malware names used as sentence subjects to actors (e.g., "BlackCat exploits..." → actor)
- Add verb-agent inference for proper nouns near attack verbs

Merge resolves conflicts by majority voting on entity types, keeping the highest-confidence description, and boosting scores when multiple sources agree.
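A minimal sketch of those three conflict rules, assuming each bundle has been flattened into entity dicts with `name`, `type`, `description`, and a STIX-style 0-100 `confidence`. The fixed +10 boost per extra source and `merge_entities` itself are hypothetical.

```python
from collections import Counter, defaultdict


def merge_entities(bundles, boost_points=10):
    """bundles: list of (source_name, entities). Entities are grouped by
    lowercased name; type is decided by majority vote (ties go to the first
    type seen), the highest-confidence copy supplies name and description,
    and each additional independent source adds `boost_points` (capped at 100)."""
    grouped = defaultdict(list)
    for source, entities in bundles:
        for ent in entities:
            grouped[ent["name"].lower()].append((source, ent))
    merged = {}
    for key, items in grouped.items():
        type_votes = Counter(ent["type"] for _, ent in items)
        best = max((ent for _, ent in items), key=lambda ent: ent["confidence"])
        sources = sorted({source for source, _ in items})  # provenance
        boosted = best["confidence"] + boost_points * (len(sources) - 1)
        merged[key] = {
            "name": best["name"],
            "type": type_votes.most_common(1)[0][0],
            "description": best["description"],
            "confidence": min(100, boosted),
            "sources": sources,
        }
    return merged
```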
Reduces relationship noise from cartesian-product explosion:
- Self-reference removal
- Source must be actor/malware (tools don't "use" other tools)
- CTI-relevant verb whitelist
- Target length cap (no sentence fragments)
- Dedup by (source, target), keeping the highest confidence
- Cartesian detection (>3 targets per source+verb = explosion)

Results: 571 relationships → 9 (98% reduction), all 5 sources processed
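The filter chain above can be sketched as a single pass plus two grouping steps. Assumptions: relationships are dicts with `source`, `verb`, `target`, `confidence`; the verb whitelist and 5-word target cap are hypothetical; and fan-outs above 3 targets are dropped outright, which is one reading of "cartesian detection".

```python
from collections import defaultdict

# Hypothetical CTI verb whitelist (lemmas).
CTI_VERBS = {"use", "deploy", "exploit", "drop", "execute", "target", "download", "exfiltrate"}


def filter_relationships(rels, entity_types, max_target_words=5, cartesian_limit=3):
    candidates = []
    for r in rels:
        if r["source"].lower() == r["target"].lower():
            continue  # self-reference
        if entity_types.get(r["source"]) not in {"threat-actor", "malware"}:
            continue  # only actors/malware act on other entities
        if r["verb"] not in CTI_VERBS:
            continue
        if len(r["target"].split()) > max_target_words:
            continue  # sentence fragment rather than an entity
        candidates.append(r)
    # Cartesian detection: one source+verb fanning out to many targets is treated as noise.
    fanout = defaultdict(set)
    for r in candidates:
        fanout[(r["source"], r["verb"])].add(r["target"])
    candidates = [r for r in candidates if len(fanout[(r["source"], r["verb"])]) <= cartesian_limit]
    # Dedup by (source, target), keeping the highest-confidence triple.
    best = {}
    for r in candidates:
        key = (r["source"], r["target"])
        if key not in best or r["confidence"] > best[key]["confidence"]:
            best[key] = r
    return list(best.values())
```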
- Add spec_version: "2.1" to ALL SDO/SRO builders (it was missing from every object; required by STIX 2.1 spec Section 3.2)
- Add created/modified to attack-pattern taxonomy lookups
- Remove invalid threat-actor roles: ["threat-actor"] (not in the STIX 2.1 vocabulary; valid values are agent, director, etc.)
- Fix observed-data missing spec_version
- Fix relationship missing spec_version

Fixes 85+ STIX 2.1 compliance issues per bundle.
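A hypothetical builder showing the common properties every STIX 2.1 SDO/SRO carries (`type`, `spec_version`, `id`, `created`, `modified`) and filtering threat-actor roles against the `threat-actor-role-ov` vocabulary. `build_stix_object` is illustrative only; the OASIS `stix2` Python package can also populate and validate these fields when objects are constructed.

```python
import uuid
from datetime import datetime, timezone

# STIX 2.1 threat-actor-role-ov values.
THREAT_ACTOR_ROLES = {"agent", "director", "independent", "infrastructure-architect",
                      "infrastructure-operator", "malware-author", "sponsor"}


def _timestamp():
    """UTC RFC 3339 timestamp with millisecond precision and a trailing "Z"."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_stix_object(obj_type, **props):
    """Return a STIX 2.1 SDO/SRO dict with the required common properties set."""
    now = _timestamp()
    obj = {
        "type": obj_type,
        "spec_version": "2.1",
        "id": f"{obj_type}--{uuid.uuid4()}",
        "created": now,
        "modified": now,
    }
    if obj_type == "threat-actor" and "roles" in props:
        # Drop values outside the open vocabulary; omit the property if none remain.
        props["roles"] = [r for r in props["roles"] if r in THREAT_ACTOR_ROLES]
        if not props["roles"]:
            del props["roles"]
    obj.update(props)
    return obj
```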
…ssive voice
- New cti_relation_extractor.py replaces the cartesian-product approach with dependency-parse-guided triple extraction
- Handles compound subjects: "BlackCat operators" → BlackCat
- Handles conjunctions: "deployed A, B and C" → 3 relationships
- Handles passive voice: "X was deployed by Y" → Y deploys X
- Handles complement clauses (xcomp/ccomp) and adjectival clauses (acl)
- Ontology-grounded filtering: requires at least one known entity per triple
- 18/18 unit tests passing across 8 diverse threat actors
- 6.4x fewer relationships than the old extractor at the same accuracy

Unit tests cover: APT29, Lazarus, BlackCat, APT28, FIN7, Sandworm, Turla, MuddyWater, Conti, Volt Typhoon (not tuned to any one actor)
- Wire cti_relation_extractor into pipeline replacing old cartesian approach
- Add default_actor parameter for generic subject resolution
("the group deployed X" → default_actor deployed X)
- Generic subjects: group, actors, attackers, operators, they, it, etc.
- Results: 571 → 37 relationships, 17% recall vs ground truth
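Generic subject resolution might reduce to a set lookup after stripping a leading determiner. The set contents below are illustrative, not the PR's actual `GENERIC_SUBJECTS`, and `resolve_subject` is a hypothetical helper.

```python
# Illustrative contents only; the PR's GENERIC_SUBJECTS set may differ.
GENERIC_SUBJECTS = {"group", "actor", "actors", "attacker", "attackers", "operator",
                    "operators", "they", "it", "threat actor", "adversary", "adversaries"}


def resolve_subject(subject, default_actor):
    """Replace a generic subject ("the group", "they") with default_actor.
    Named subjects, or any subject when no default_actor is given, are returned unchanged."""
    words = subject.lower().split()
    if words and words[0] in {"the", "this", "these"}:
        words = words[1:]
    if default_actor and " ".join(words) in GENERIC_SUBJECTS:
        return default_actor
    return subject
```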
…ution
- Walk into dobj prepositional children to find named entities
("deployed mechanisms including AnyDesk" → deployed AnyDesk)
- Generic subject resolution via GENERIC_SUBJECTS set
- 18/18 unit tests still passing
- End-to-end: 42 rels (down from 571), 17% recall
- Remaining blocker: actor extraction quality in offline mode
Replace broken spaCy NER actor detection with MITRE taxonomy frequency analysis:
- Count mentions of all MITRE malware/group names in the text
- Title mentions get a +3 boost, first-paragraph mentions +1
- Top 3 by frequency become actors (minimum 2 mentions)
- Replaces verb-agent inference, which produced garbage actors

Results:
- Actor recall: 0% → 80% (4/5 sources correct)
- Rel recall: 17% → 22%
- Sophos: 0% → 60% relationship recall (was completely broken)
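The scoring rule above, as a minimal sketch. Assumptions: the 2-mention minimum is applied to body mentions, each boost is applied once per name, and `rank_actor_candidates` is a hypothetical name.

```python
import re
from collections import Counter


def rank_actor_candidates(title, paragraphs, known_names, top_n=3, min_mentions=2):
    """Score known MITRE malware/group names by mention frequency in the body,
    plus +3 if the name is in the title and +1 if it is in the first paragraph."""
    body = "\n".join(paragraphs)
    scores = Counter()
    for name in known_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        mentions = len(pattern.findall(body))
        if mentions < min_mentions:
            continue  # assumption: the minimum applies to body mentions
        score = mentions
        if pattern.search(title):
            score += 3  # title boost
        if paragraphs and pattern.search(paragraphs[0]):
            score += 1  # first-paragraph boost
        scores[name] = score
    return [name for name, _ in scores.most_common(top_n)]
```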
- Add lru_cache to build_normalized_attack_patterns (38s → 0s on 2nd call)
- Add dict cache to load_mitre_taxonomy (0.6s → 0s on 2nd call)
- Total 5-file run: 220s → 64s (71% faster)
- Per-file after first: 2-7s (was 39-58s)
- Zero fidelity change: same results, just cached
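One caveat with `functools.lru_cache` is that every caller receives the same cached object, so a caller mutating it would silently change later results. A sketch of a cached loader that returns a read-only view; `load_patterns`, the call counter, and the toy data are hypothetical stand-ins for the real taxonomy work.

```python
from functools import lru_cache
from types import MappingProxyType

CALLS = {"count": 0}  # instrumentation for this example only


@lru_cache(maxsize=None)
def load_patterns(taxonomy_path: str):
    """Hypothetical stand-in for an expensive taxonomy normalization step.
    Arguments must be hashable (a path string, not a dict or open file)."""
    CALLS["count"] += 1
    patterns = {"t1003": "OS Credential Dumping", "t1059": "Command and Scripting Interpreter"}
    # The cached object is shared by every caller, so hand out a read-only view.
    return MappingProxyType(patterns)
```

Repeated calls with the same path hit the cache (`load_patterns.cache_info()` reports the hits), which is where the "zero fidelity change" comes from: the computation runs once and the result is reused.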
- New cti_llm_validation.py with three validation tasks:
  1. Technique validation: confirms/denies techniques against source text
  2. Relationship discovery: finds entity pairs the dep-parse missed
  3. Relationship validation: confirms/denies existing relationships
- Anti-hallucination design:
  - Source text always included as the sole source of truth
  - LLM must quote source text for every affirmative answer
  - Quotes verified against the actual source (fuzzy match, 60% threshold)
  - Entity names pre-provided for relationship discovery (so the LLM can't invent them)
  - Denied items get a confidence drop, not removal (a human can override)
- LLM-agnostic: uses llm_generate(), which supports Ollama/OpenAI/any provider
- Runs only when cti.offline=false in config
- Integrated at the end of Stage 1, after all deterministic stages complete

Awaiting a remote LLM endpoint for proper testing; local gemma3n is too slow for structured JSON output (~60s per validation call).
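Quote verification with a 60% fuzzy threshold could be done with `difflib.SequenceMatcher` over a sliding window of the source. This is a sketch, not the PR's implementation: the whitespace normalization, the quarter-window step, and `quote_supported` are assumptions.

```python
from difflib import SequenceMatcher


def quote_supported(quote, source_text, threshold=0.6):
    """True if `quote` occurs in the source exactly, or if some window of the
    source the same length as the quote has a SequenceMatcher ratio >= threshold."""
    q = " ".join(quote.lower().split())
    src = " ".join(source_text.lower().split())
    if not q:
        return False
    if q in src:
        return True
    window = len(q)
    step = max(1, window // 4)
    for start in range(0, max(1, len(src) - window + 1), step):
        if SequenceMatcher(None, q, src[start:start + window]).ratio() >= threshold:
            return True
    return False
```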
- Change NLP behavior reduction from a fatal crash to a warning
- Remove unused cti_relationships imports (extract_all_relationships, REL_REJECTIONS)
- Full 5-source LLM validation results:
  - Rel recall: 22% → 91% (14/16 ground truth matched)
  - TTP precision: 11% → 15%
  - TTP recall: 65% → 43% (LLM too aggressive in denying; needs tuning)
Techniques denied by LLM validation keep their reduced confidence but are NOT removed from the output. The offline pipeline has already filtered them; the LLM only adjusts confidence scores for analyst review.

Results (5 sources, LLM validation via MITRE AIP Devstral):
- Rel Recall: 22% → 84% (dep-parse + LLM discovery)
- TTP Recall: 65% → 38% (LLM IR extraction finds fewer than offline)
- All relationships quote-verified against source text
- Offline IR runs first: fast, deterministic, broad MITRE taxonomy coverage
- LLM IR merges additional entities NER missed (additive only, deduped)
- Combines 65% offline TTP recall with 95% LLM relationship recall
- LLM failure gracefully falls back to offline-only
- Results: 61% TTP recall, 95% rel recall, 56s/file
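The offline-first, additive merge with fallback might look like this. `llm_extract` is a hypothetical callable standing in for the LLM IR step; catching `Exception` broadly mirrors "any LLM failure falls back to offline-only".

```python
def combine_ir(offline_ir, llm_extract, source_text):
    """Start from offline IR ({category: [entity names]}), then add LLM entities
    that are not already present (case-insensitive). If the LLM step raises,
    return the offline result alone. The input dict is never mutated."""
    merged = {k: list(v) for k, v in offline_ir.items()}
    try:
        llm_ir = llm_extract(source_text)
    except Exception:
        return merged  # LLM failure: offline-only
    for category, entities in llm_ir.items():
        bucket = merged.setdefault(category, [])
        existing = {e.lower() for e in bucket}
        for entity in entities:
            if entity.lower() not in existing:  # additive only, deduped
                bucket.append(entity)
                existing.add(entity.lower())
    return merged
```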
Tested CISA AA20-296A advisory (Berserk Bear/Dragonfly/Energetic Bear):
- 5/5 explicit ATT&CK techniques found (T1190, T1189, T1133, T1078, T1110)
- Actor correctly identified as Dragonfly (MITRE alias of Berserk Bear)
- Havex malware identified
- 7 infrastructure IOCs extracted (IPs + domains)
- 10 relationships discovered (5 dep-parse + 5 LLM quote-verified)
- 76s total processing time

Confirms the pipeline is not tuned to BlackCat; it works generically.
- New filter_by_keyword_evidence(): technique name/description keywords
must appear in source text (>=2 overlap). Removes techniques with no
textual evidence. Preserves explicit T-numbers and LLM-confirmed.
- Fix is_valid_actor(): allow multi-word names with spaces
("Berserk Bear" was rejected because name.isalpha() fails on spaces)
- Expected: +4pp TTP precision, -3pp TTP recall, better actor recall
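Illustrative versions of both changes, not the PR's code: a keyword-evidence filter requiring at least 2 shared keywords with exemptions for explicit and LLM-confirmed techniques, and the per-token form of the alphabetic actor-name check. The stopword list, the 4-letter keyword minimum, and both function names are assumptions.

```python
import re

# Hypothetical stopwords for keyword extraction.
STOPWORDS = {"from", "with", "that", "this", "they", "have", "used", "adversaries"}


def keywords(text):
    return {w for w in re.findall(r"[a-z]{4,}", text.lower()) if w not in STOPWORDS}


def keyword_evidence_filter(techniques, source_text, min_overlap=2):
    """techniques: dicts with id, name, description, origin. Explicit T-numbers
    and LLM-confirmed techniques are always kept."""
    src = keywords(source_text)
    kept = []
    for t in techniques:
        if t["origin"] in {"explicit", "llm_confirmed"}:
            kept.append(t)
        elif len(keywords(t["name"] + " " + t["description"]) & src) >= min_overlap:
            kept.append(t)
    return kept


def actor_name_ok(name):
    # Before: name.isalpha() is False for "Berserk Bear" because " " is not alphabetic.
    # After: apply the same alphabetic check to each whitespace-separated token.
    tokens = name.split()
    return bool(tokens) and all(tok.isalpha() for tok in tokens)
```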
Proven safe across 5 diverse sources with a 0% hallucination rate. Now implemented as the default LLM validation behavior.

Final results (8 sources, 4 threat actors):
- TTP Precision: 1% → 20% (+19pp)
- TTP Recall: 5% → 55% (+50pp)
- Rel Recall: 0% → 84% (+84pp)
- Hallucinations: 0%
- Speed: ~54s/file (was ~8min)

6/8 sources achieve 100% relationship recall. LockBit 3.0 achieves 44% TTP precision (best single source).
New cti_iac_extractor.py extracts deployable infrastructure specs from STIX bundles and source text:
- OS platforms from ATT&CK technique x_mitre_platforms data
- Services/ports from source-text regex (RDP, SMB, SSH, SQL, etc.)
- Tool requirements mapped to infrastructure needs
- Account types (domain admin, local admin, service accounts)
- Network infrastructure (C2 channels, external IPs)
- Deployment notes (AD required, ESXi needed, etc.)

Tested on 4 diverse sources:
- LockBit: RDP+SSH+SQL+SMB, 5 tools, Windows+Linux+ESXi
- APT41: HTTPS+AD+Exchange, C2 via Cloudflare/Azure
- Russian APT: Exchange+VPN, 4 external IPs
- BlackCat: ESXi targeting, Mimikatz
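The services/ports step could be sketched as a small regex lexicon mapping mentions to default ports, plus explicit "port N" mentions. The lexicon below is hypothetical and much smaller than a real one would be; the default ports (RDP 3389, SMB 445, SSH 22, MSSQL 1433, WinRM over HTTP 5985) are the standard assignments.

```python
import re

# Hypothetical service lexicon: pattern -> (service name, default port).
SERVICE_PATTERNS = [
    (re.compile(r"\b(?:RDP|remote desktop)\b", re.I), ("rdp", 3389)),
    (re.compile(r"\b(?:SMB|admin\$|C\$ share)", re.I), ("smb", 445)),
    (re.compile(r"\bSSH\b", re.I), ("ssh", 22)),
    (re.compile(r"\b(?:MSSQL|SQL Server)\b", re.I), ("mssql", 1433)),
    (re.compile(r"\bWinRM\b", re.I), ("winrm", 5985)),
]
PORT_RE = re.compile(r"\bport\s+(\d{1,5})\b", re.I)


def extract_services(text):
    """Return {"services": {name: default_port}, "explicit_ports": [sorted ports]}."""
    services = {name: port for pattern, (name, port) in SERVICE_PATTERNS if pattern.search(text)}
    explicit_ports = sorted({int(p) for p in PORT_RE.findall(text) if 0 < int(p) < 65536})
    return {"services": services, "explicit_ports": explicit_ports}
```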
Enhanced infrastructure extraction with 5 new sections:
- Network topology: hosts, segments, perimeter devices from text
- Kill chain ordering: deployment sequence based on detected phases
- Security controls: AV/EDR/firewall/MFA to deploy for detection testing
- Data staging: bait files for realistic exfiltration testing
- CVEs: extracted for future vulnerable-software deployment

LockBit produces an 8-phase kill chain, a topology with a VPN perimeter, Defender/SmartScreen controls, and credential/backup bait data. Russian APT produces 5 CVEs, an Exchange/VPN topology, and MFA controls.
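Two of the new sections reduce to simple operations if detected phases are expressed as ATT&CK Enterprise tactic shortnames, which is an assumption here: ordering them by the matrix's tactic order, and pulling CVE IDs with a regex. Matrix order only approximates a real intrusion timeline; `order_phases` and `extract_cves` are hypothetical.

```python
import re

# ATT&CK Enterprise tactics in matrix order.
TACTIC_ORDER = [
    "reconnaissance", "resource-development", "initial-access", "execution", "persistence",
    "privilege-escalation", "defense-evasion", "credential-access", "discovery",
    "lateral-movement", "collection", "command-and-control", "exfiltration", "impact",
]
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.I)


def order_phases(detected_tactics):
    """Deduplicate and sort tactic shortnames into matrix order; unknown names go last."""
    rank = {t: i for i, t in enumerate(TACTIC_ORDER)}
    return sorted(set(detected_tactics), key=lambda t: (rank.get(t, len(TACTIC_ORDER)), t))


def extract_cves(text):
    """Unique CVE IDs, normalized to upper case and sorted."""
    return sorted({m.upper() for m in CVE_RE.findall(text)})
```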
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
Summary
New cti_iac_extractor.py that extracts deployable infrastructure specifications from STIX bundles and CTI source text for adversary emulation environment deployment.
Data Sources Used
Output Per CTI Report
Tested On
Test Plan