Add IaC infrastructure extraction from CTI data by deacon-mp · Pull Request #17 · mitre/mcp

deacon-mp · 2026-03-17T19:08:33Z

Summary

Adds cti_iac_extractor.py, which extracts deployable infrastructure specifications from STIX bundles and CTI source text so that adversary emulation environments can be deployed from them.

Data Sources Used

ATT&CK x_mitre_platforms (835 techniques with platform data) → OS requirements
D3FEND digital artifact taxonomy → infrastructure component classification
Source text regex → explicit service/port/OS mentions
Tool→infrastructure mapping → what adversary tools need to run
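
The source-text regex step could look roughly like the sketch below. It is illustrative only: the `SERVICE_PATTERNS` table and `extract_services` are hypothetical names, not the contents of cti_iac_extractor.py, and the patterns cover only the four service/port pairs named in this PR.

```python
import re

# Hypothetical table: service name -> (pattern for textual mentions, default port).
SERVICE_PATTERNS = {
    "RDP": (re.compile(r"\b(?:RDP|Remote Desktop)\b", re.I), 3389),
    "SMB": (re.compile(r"\bSMB\b", re.I), 445),
    "SSH": (re.compile(r"\bSSH\b", re.I), 22),
    "SQL": (re.compile(r"\b(?:MSSQL|SQL Server|SQL)\b", re.I), 1433),
}


def extract_services(text):
    """Return {service: port} for every service the text mentions explicitly."""
    return {
        name: port
        for name, (pattern, port) in SERVICE_PATTERNS.items()
        if pattern.search(text)
    }
```

Word boundaries keep short acronyms such as SSH from matching inside longer tokens.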

Output Per CTI Report

Target platforms with confidence scoring (Windows, Linux, ESXi, etc.)
Required services with ports (RDP/3389, SMB/445, SSH/22, SQL/1433, etc.)
Account types needed (domain admin, service accounts)
Adversary tool requirements (Mimikatz needs LSASS, PsExec needs SMB)
Network infrastructure (C2 channels, external IPs to simulate)
Deployment notes

Tested On

Source	Platforms	Services	Tools	Notes
LockBit 3.0	Win+Lin+ESXi	RDP, SSH, SQL, SMB	5 tools	Full ransomware lab
APT41	Win+Lin+ESXi	HTTPS, AD, Exchange	certutil, ADFind	Supply chain env
Russian APT	Win+Lin	Exchange, VPN, SQL	—	State govt network
BlackCat	Win+Lin+ESXi	DNS, SQL	Mimikatz	RaaS target env

Test Plan

Verify platform extraction accuracy against ATT&CK technique data
Test with additional CTI sources for robustness
Generate Terraform/Vagrant templates from specs

…END path, relationships
- Raise semantic similarity threshold from 0.42 to 0.82 to eliminate false positive TTPs
- Add explicit T-number regex extraction from all IR text fields
- Filter deprecated/revoked ATT&CK technique IDs
- Add deterministic entity reclassification (tools vs malware vs techniques)
- Split slash-separated actor names into individual actors with aliases
- Preserve dots in IP addresses and domain names during canonicalization
- Fix use-before-assignment bug in relationship candidate creation
- Fix D3FEND enrichment path (doubled plugin prefix)
- Increase Ollama client timeout to 600s
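
A minimal sketch of explicit T-number extraction combined with the deprecated/revoked filter. The function name and the `retired` parameter are assumptions made for illustration; in the real pipeline the retired set would presumably come from the ATT&CK data.

```python
import re

# ATT&CK technique IDs: T plus four digits, optional three-digit sub-technique.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_ids(text, retired=frozenset()):
    """Return unique technique IDs in order of first appearance.

    `retired` is a hypothetical set of deprecated/revoked IDs to drop.
    """
    found = []
    for technique_id in TECHNIQUE_ID.findall(text):
        if technique_id not in retired and technique_id not in found:
            found.append(technique_id)
    return found
```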

- New cti_ontology_inference.py: infers ATT&CK techniques from known tools/malware using MITRE taxonomy's 20,048 pre-built relationships. Zero LLM, zero network, pure ontology lookup.
- Text corroboration filter: tools with >8 known techniques require keyword overlap between technique description and source text to prevent broad-profile tools from flooding output.
- Evidence quality gate in semantic matcher: rejects matches from phrases shorter than 3 words or lacking content words.
- Wired into Stage 1 pipeline between explicit T-number extraction and semantic matching.

…-path
- Create nlp_model.py shared singleton for en_core_web_lg (was loaded 6 times at module level, ~400MB each)
- Replace all module-level spacy.load() calls with shared import
- Enhance entity validator deterministic fast-path:
  - Cross-category MITRE taxonomy lookup (tool in malware list → still valid)
  - Fuzzy name matching for misspellings (Mimiikatz → mimikatz)
  - Well-known tool allowlist for common utilities not in MITRE
  - Regex pattern for executable/script names
- Fix _mitre_name_sets() unpacking bug (4 return values, not 3)

Post-IR pipeline time (4 files): 670s → 428s (36% faster)
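
The fuzzy name matching can be illustrated with the standard library's difflib. This is a sketch under assumptions: the PR does not say which matcher or cutoff the validator uses, and `fuzzy_canonical_name` is a hypothetical helper.

```python
import difflib


def fuzzy_canonical_name(name, known_names, cutoff=0.85):
    """Map a possibly misspelled name to the closest known lowercase name.

    Returns None when nothing scores at or above `cutoff` (assumed value).
    """
    matches = difflib.get_close_matches(name.lower(), known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A high cutoff keeps short tool names from collapsing into unrelated entries.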

- New cti_defend_validation.py: validates candidate ATT&CK techniques by checking tactic relevance against source text signals
- Extracts tactic categories from D3FEND ontology (823 technique→tactic mappings from d3fend-protege.ttl)
- Detects which tactics are evidenced in source text using generic CTI language indicators (not adversary-specific)
- Drops ontology-inferred techniques whose tactic category has no evidence in the source text
- Reduces total techniques from 149 to 129 (−13%) while maintaining 90% extractable recall and 68% emulation plan recall

- PMI co-occurrence scoring for ontology-inferred techniques
- Sub-technique hierarchy enforcement (orphan detection)
- Entity-technique clique corroboration (multi-entity support)
- Broad-profile entity cap (max 15 techniques per entity, ranked by PMI)
- Reduces FPs 106→79 while maintaining F1 at 29%
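
For reference, pointwise mutual information between an entity and a technique, followed by the per-entity cap, might be computed as below. The count-based probability estimates and both function names are assumptions; the PR does not describe how co-occurrence counts are gathered.

```python
import math


def pmi(pair_count, entity_count, technique_count, total_docs):
    """PMI = log2(p(e, t) / (p(e) * p(t))), estimated from document counts."""
    if min(pair_count, entity_count, technique_count, total_docs) <= 0:
        return float("-inf")  # never co-occur, or no data
    p_pair = pair_count / total_docs
    p_entity = entity_count / total_docs
    p_technique = technique_count / total_docs
    return math.log2(p_pair / (p_entity * p_technique))


def cap_techniques(scores, limit=15):
    """Keep the `limit` highest-PMI technique IDs for one entity."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [technique_id for technique_id, _ in ranked[:limit]]
```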

- New cti_offline_ir.py: deterministic IR extraction using spaCy NER, MITRE taxonomy scanning (1062 tool/malware names), well-known tool list, regex infrastructure patterns, and dep-parse behaviors
- Toggle via conf/local.yml: cti.offline: true/false
- Offline mode skips LLM entity validation entirely
- Offline mode processes ALL 5 test files including Talos (which always timed out with LLM due to 23 entity validation calls)
- Offline: 68% emulation recall, 4 min total, zero LLM dependency
- Online: 56% emulation recall, 5 min total (4/5 files, Talos timeout)
- Offline mode actually HIGHER recall due to processing all 5 files

- New cti_stix_merge.py: merges multiple single-source STIX bundles with confidence-weighted dedup, provenance tracking, and multi-source corroboration boosting (GTI-inspired conflict resolution)
- Fix offline IR actor extraction: promote malware names used as sentence subjects to actors (e.g., "BlackCat exploits..." → actor)
- Add verb-agent inference for proper nouns near attack verbs

Merge handles conflicts by: majority voting on entity types, highest confidence wins for descriptions, multi-source agreement boosts scores
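
The three conflict rules can be sketched for a single entity seen in several bundles. Everything here is illustrative: `merge_observations`, the dict layout, and the boost of 5 points per additional source are assumptions, not values from cti_stix_merge.py.

```python
from collections import Counter


def merge_observations(observations, boost_per_source=5):
    """Merge per-source views of one entity.

    Each observation is a dict with 'type', 'description' and 'confidence'
    (0-100). `boost_per_source` is an assumed corroboration bonus.
    """
    type_votes = Counter(obs["type"] for obs in observations)
    most_confident = max(observations, key=lambda obs: obs["confidence"])
    boost = boost_per_source * (len(observations) - 1)
    return {
        "type": type_votes.most_common(1)[0][0],           # majority vote
        "description": most_confident["description"],      # highest confidence wins
        "confidence": min(100, most_confident["confidence"] + boost),
        "source_count": len(observations),                 # provenance summary
    }
```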

Reduces relationship noise from cartesian product explosion:
- Self-reference removal
- Source must be actor/malware (tools don't "use" other tools)
- CTI-relevant verb whitelist
- Target length cap (no sentence fragments)
- Dedup by (source, target) keeping highest confidence
- Cartesian detection (>3 targets per source+verb = explosion)

Results: 571 relationships → 9 (98% reduction), all 5 sources processed
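
Three of these filters (self-reference removal, cartesian detection, and dedup) are sketched below on plain tuples. The tuple layout and the choice to drop every relationship in an exploding source+verb group are assumptions; the commit message does not say how detected explosions are handled.

```python
def filter_relationships(relationships, max_targets=3):
    """Filter (source, verb, target, confidence) tuples."""
    # 1. Self-reference removal.
    kept = [r for r in relationships if r[0].lower() != r[2].lower()]

    # 2. Cartesian detection: count distinct targets per (source, verb).
    fanout = {}
    for source, verb, target, _ in kept:
        fanout.setdefault((source, verb), set()).add(target)
    kept = [r for r in kept if len(fanout[(r[0], r[1])]) <= max_targets]

    # 3. Dedup by (source, target), keeping the highest confidence.
    best = {}
    for rel in kept:
        key = (rel[0], rel[2])
        if key not in best or rel[3] > best[key][3]:
            best[key] = rel
    return list(best.values())
```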

- Add spec_version: "2.1" to ALL SDO/SRO builders (was missing from every object; required by STIX 2.1 spec Section 3.2)
- Add created/modified to attack-pattern taxonomy lookups
- Remove invalid threat-actor roles: ["threat-actor"] (not in STIX 2.1 vocabulary; valid values are agent, director, etc.)
- Fix observed-data missing spec_version
- Fix relationship missing spec_version

Fixes 85+ STIX 2.1 compliance issues per bundle.

…ssive voice
- New cti_relation_extractor.py replaces cartesian-product approach with dependency-parse-guided triple extraction
- Handles compound subjects: "BlackCat operators" → BlackCat
- Handles conjunctions: "deployed A, B and C" → 3 relationships
- Handles passive voice: "X was deployed by Y" → Y deploys X
- Handles complement clauses (xcomp/ccomp) and adjectival clauses (acl)
- Ontology-grounded filtering: requires at least one known entity per triple
- 18/18 unit tests passing across 8 diverse threat actors
- 6.4x fewer relationships than old extractor at same accuracy

Unit tests cover: APT29, Lazarus, BlackCat, APT28, FIN7, Sandworm, Turla, MuddyWater, Conti, Volt Typhoon (not tuned to any one actor)

- Wire cti_relation_extractor into pipeline replacing old cartesian approach
- Add default_actor parameter for generic subject resolution ("the group deployed X" → default_actor deployed X)
- Generic subjects: group, actors, attackers, operators, they, it, etc.
- Results: 571 → 37 relationships, 17% recall vs ground truth

…ution
- Walk into dobj prepositional children to find named entities ("deployed mechanisms including AnyDesk" → deployed AnyDesk)
- Generic subject resolution via GENERIC_SUBJECTS set
- 18/18 unit tests still passing
- End-to-end: 42 rels (down from 571), 17% recall
- Remaining blocker: actor extraction quality in offline mode

Replace broken spaCy NER actor detection with MITRE taxonomy frequency analysis:
- Count mentions of all MITRE malware/group names in text
- Title mentions get +3 boost, first paragraph +1
- Top 3 by frequency become actors (minimum 2 mentions)
- Replaces verb-agent inference which produced garbage actors

Results:
- Actor recall: 0% → 80% (4/5 sources correct)
- Rel recall: 17% → 22%
- Sophos: 0% → 60% relationship recall (was completely broken)
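
A sketch of the frequency analysis, assuming the minimum of 2 mentions applies to raw counts before boosts and that paragraphs are separated by blank lines. `rank_actors` and its parameters are hypothetical names.

```python
import re
from collections import Counter


def rank_actors(title, body, known_names, top_n=3, min_mentions=2):
    """Rank known MITRE names by boosted mention count."""
    first_paragraph = body.split("\n\n", 1)[0]
    scores = Counter()
    for name in known_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.I)
        mentions = len(pattern.findall(title)) + len(pattern.findall(body))
        if mentions < min_mentions:
            continue  # too rare to be the report's subject
        score = mentions
        if pattern.search(title):
            score += 3  # title boost
        if pattern.search(first_paragraph):
            score += 1  # first-paragraph boost
        scores[name] = score
    return [name for name, _ in scores.most_common(top_n)]
```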

- Add lru_cache to build_normalized_attack_patterns (38s → 0s on 2nd call)
- Add dict cache to load_mitre_taxonomy (0.6s → 0s on 2nd call)
- Total 5-file run: 220s → 64s (71% faster)
- Per-file after first: 2-7s (was 39-58s)
- Zero fidelity change: same results, just cached
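
The caching pattern in miniature, using a hypothetical stand-in loader rather than the real build_normalized_attack_patterns:

```python
from functools import lru_cache

LOAD_CALLS = {"count": 0}  # instrumentation for the example only


@lru_cache(maxsize=None)
def load_attack_patterns(domain):
    """Stand-in for an expensive build step; runs once per distinct argument."""
    LOAD_CALLS["count"] += 1
    # Return an immutable value: the cached object is shared by all callers.
    return ("T1190", "T1133")
```

Because every caller receives the same cached object, returning a tuple (or otherwise treating the result as read-only) avoids one pipeline stage mutating what another stage sees.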

- New cti_llm_validation.py with three validation tasks:
  1. Technique validation: confirms/denies techniques against source text
  2. Relationship discovery: finds entity pairs the dep-parse missed
  3. Relationship validation: confirms/denies existing relationships
- Anti-hallucination design:
  - Source text always included as sole truth source
  - LLM must quote source text for every affirmative answer
  - Quotes verified against actual source (fuzzy match, 60% threshold)
  - Entity names pre-provided for relationship discovery (can't invent)
  - Denied items get confidence drop, not removal (human can override)
- LLM-agnostic: uses llm_generate() which supports Ollama/OpenAI/any provider
- Runs only when cti.offline=false in config
- Integrated at end of Stage 1, after all deterministic stages complete

Awaiting remote LLM endpoint for proper testing; local gemma3n is too slow for structured JSON output (~60s per validation call).
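
One way to verify a quote against the source at a 60% threshold is a word-level sliding window with difflib, sketched below. The PR does not state which fuzzy matcher is used, so the algorithm and the name `quote_is_grounded` are assumptions.

```python
import difflib
import re


def quote_is_grounded(quote, source_text, threshold=0.6):
    """True if some source window of the quote's length is similar enough."""
    quote_words = re.findall(r"\w+", quote.lower())
    source_words = re.findall(r"\w+", source_text.lower())
    if not quote_words or not source_words:
        return False
    window = len(quote_words)
    best = 0.0
    # Slide a window of the quote's length across the source words.
    for start in range(max(1, len(source_words) - window + 1)):
        candidate = source_words[start:start + window]
        ratio = difflib.SequenceMatcher(None, quote_words, candidate).ratio()
        best = max(best, ratio)
    return best >= threshold
```

Comparing word lists rather than raw characters keeps an invented quote from passing just because it shares common letters with the source.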

- Change NLP behavior reduction from fatal crash to warning
- Remove unused cti_relationships imports (extract_all_relationships, REL_REJECTIONS)
- Full 5-source LLM validation results:
  - Rel recall: 22% → 91% (14/16 ground truth matched)
  - TTP precision: 11% → 15%
  - TTP recall: 65% → 43% (LLM too aggressive denying, needs tuning)

Techniques denied by LLM validation keep their reduced confidence but are NOT removed from the output. The offline pipeline already filtered; LLM only adjusts confidence scores for analyst review.

Results (5 sources, LLM validation via MITRE AIP Devstral):
- Rel Recall: 22% → 84% (dep-parse + LLM discovery)
- TTP Recall: 65% → 38% (LLM IR extraction finds fewer than offline)
- All relationships quote-verified against source text

- Offline IR runs first: fast, deterministic, broad MITRE taxonomy coverage
- LLM IR merges additional entities NER missed (additive only, deduped)
- Combines 65% offline TTP recall with 95% LLM relationship recall
- LLM failure gracefully falls back to offline-only
- Results: 61% TTP recall, 95% rel recall, 56s/file

Tested CISA AA20-296A advisory (Berserk Bear/Dragonfly/Energetic Bear):
- 5/5 explicit ATT&CK techniques found (T1190, T1189, T1133, T1078, T1110)
- Actor correctly identified as Dragonfly (MITRE alias of Berserk Bear)
- Havex malware identified
- 7 infrastructure IOCs extracted (IPs + domains)
- 10 relationships discovered (5 dep-parse + 5 LLM quote-verified)
- 76s total processing time

Confirms pipeline is not tuned to BlackCat; it works generically.

- New filter_by_keyword_evidence(): technique name/description keywords must appear in source text (>=2 overlap). Removes techniques with no textual evidence. Preserves explicit T-numbers and LLM-confirmed.
- Fix is_valid_actor(): allow multi-word names with spaces ("Berserk Bear" was rejected because name.isalpha() fails on spaces)
- Expected: +4pp TTP precision, -3pp TTP recall, better actor recall
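
Both changes are easy to show in isolation. The helpers below are hypothetical stand-ins, not the bodies of filter_by_keyword_evidence() or is_valid_actor(); the stopword list and the three-letter minimum are assumptions.

```python
import re

# Assumed stopword list; the real filter may use a different one.
STOPWORDS = {"the", "and", "for", "with", "that", "from", "may", "use", "uses", "used"}


def _keywords(text):
    """Lowercase alphabetic tokens of 3+ letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]{3,}", text.lower()) if w not in STOPWORDS}


def has_keyword_evidence(technique_text, source_text, min_overlap=2):
    """True if at least `min_overlap` technique keywords occur in the source."""
    return len(_keywords(technique_text) & _keywords(source_text)) >= min_overlap


def is_valid_actor_name(name):
    """Check each space-separated word, so multi-word names are accepted."""
    parts = name.split()
    return bool(parts) and all(part.isalpha() for part in parts)
```

For comparison, `"Berserk Bear".isalpha()` is False because the space is not alphabetic, which is the failure the fix addresses.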

Proven safe across 5 diverse sources with 0% hallucination rate. Now implemented as the default LLM validation behavior.

Final results (8 sources, 4 threat actors):
- TTP Precision: 1% → 20% (+19pp)
- TTP Recall: 5% → 55% (+50pp)
- Rel Recall: 0% → 84% (+84pp)
- Hallucinations: 0%
- Speed: ~54s/file (was ~8min)

6/8 sources achieve 100% relationship recall. LockBit 3.0 achieves 44% TTP precision (best single source).

New cti_iac_extractor.py extracts deployable infrastructure specs from STIX bundles and source text:
- OS platforms from ATT&CK technique x_mitre_platforms data
- Services/ports from source text regex (RDP, SMB, SSH, SQL, etc.)
- Tool requirements mapped to infrastructure needs
- Account types (domain admin, local admin, service accounts)
- Network infrastructure (C2 channels, external IPs)
- Deployment notes (AD required, ESXi needed, etc.)

Tested on 4 diverse sources:
- LockBit: RDP+SSH+SQL+SMB, 5 tools, Windows+Linux+ESXi
- APT41: HTTPS+AD+Exchange, C2 via Cloudflare/Azure
- Russian APT: Exchange+VPN, 4 external IPs
- BlackCat: ESXi targeting, Mimikatz

Enhanced infrastructure extraction with 5 new sections:
- Network topology: hosts, segments, perimeter devices from text
- Kill chain ordering: deployment sequence based on detected phases
- Security controls: AV/EDR/firewall/MFA to deploy for detection testing
- Data staging: bait files for realistic exfiltration testing
- CVEs: extracted for future vulnerable software deployment

LockBit produces 8-phase kill chain, topology with VPN perimeter, Defender/SmartScreen controls, and credential/backup bait data. Russian APT produces 5 CVEs, Exchange/VPN topology, MFA controls.
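
CVE extraction is the simplest of these sections to illustrate. The sketch assumes a plain regex over the source text; `extract_cves` is a hypothetical name.

```python
import re

# CVE IDs: "CVE-", a four-digit year, then a sequence of four or more digits.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.I)


def extract_cves(text):
    """Return unique, upper-cased CVE IDs in order of first appearance."""
    seen = {}
    for match in CVE_ID.findall(text):
        seen.setdefault(match.upper(), None)
    return list(seen)
```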

github-actions · 2026-05-17T01:01:37Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

deacon-mp added 30 commits March 16, 2026 20:16

Update HANDOFF with offline mode results

b283040

Update HANDOFF with full progression

1930193

Update HANDOFF with PR #15 and #16 status

29db9a4

Add SSL skip and extra headers to llm_client for MITRE AIP tunnel

c36d687

Update HANDOFF with combined mode results

fd7d309

Final HANDOFF update — session pause point

c8e6754

Final HANDOFF with 8-source results

eb2bfd4

github-actions Bot added the no-pr-activity label May 17, 2026

deacon-mp wants to merge 30 commits into CTI from feature/iac-extraction