Add IaC infrastructure extraction from CTI data #17

Open
deacon-mp wants to merge 30 commits into CTI from feature/iac-extraction

@deacon-mp

Collaborator

Summary

Adds cti_iac_extractor.py, which extracts deployable infrastructure specifications from STIX bundles and CTI source text for use in deploying adversary emulation environments.

Data Sources Used

  • ATT&CK x_mitre_platforms (835 techniques with platform data) → OS requirements
  • D3FEND digital artifact taxonomy → infrastructure component classification
  • Source text regex → explicit service/port/OS mentions
  • Tool→infrastructure mapping → what adversary tools need to run
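A minimal sketch of the platform step, assuming a platform's confidence is simply the share of a report's techniques whose ATT&CK attack-pattern lists it in `x_mitre_platforms`. The function name and scoring rule are hypothetical and the platform lists in the sample data are illustrative; the PR does not show how cti_iac_extractor.py actually scores platforms.

```python
from collections import Counter


def platform_confidence(technique_ids, attack_patterns):
    """Score each platform by the fraction of techniques that list it.

    attack_patterns maps an ATT&CK ID (e.g. "T1486") to its STIX
    attack-pattern dict, which carries the x_mitre_platforms list.
    """
    counts = Counter()
    scored = 0
    for tid in technique_ids:
        pattern = attack_patterns.get(tid)
        if not pattern or not pattern.get("x_mitre_platforms"):
            continue  # skip unknown IDs and techniques without platform data
        scored += 1
        counts.update(set(pattern["x_mitre_platforms"]))
    if scored == 0:
        return {}
    return {platform: round(n / scored, 2) for platform, n in counts.most_common()}


# Illustrative platform lists, not copied from a specific ATT&CK release.
patterns = {
    "T1486": {"x_mitre_platforms": ["Windows", "Linux", "macOS", "ESXi"]},
    "T1021.001": {"x_mitre_platforms": ["Windows"]},
    "T1003.001": {"x_mitre_platforms": ["Windows"]},
}
print(platform_confidence(["T1486", "T1021.001", "T1003.001"], patterns))
```

A share-based score keeps one broadly applicable technique (such as encryption for impact) from outweighing several Windows-only credential-access and lateral-movement techniques.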

Output Per CTI Report

  • Target platforms with confidence scoring (Windows, Linux, ESXi, etc.)
  • Required services with ports (RDP/3389, SMB/445, SSH/22, SQL/1433, etc.)
  • Account types needed (domain admin, service accounts)
  • Adversary tool requirements (Mimikatz needs LSASS, PsExec needs SMB)
  • Network infrastructure (C2 channels, external IPs to simulate)
  • Deployment notes

Tested On

| Source | Platforms | Services | Tools | Notes |
|---|---|---|---|---|
| LockBit 3.0 | Win+Lin+ESXi | RDP, SSH, SQL, SMB | 5 tools | Full ransomware lab |
| APT41 | Win+Lin+ESXi | HTTPS, AD, Exchange | certutil, ADFind | Supply chain env |
| Russian APT | Win+Lin | Exchange, VPN, SQL | | State govt network |
| BlackCat | Win+Lin+ESXi | DNS, SQL | Mimikatz | RaaS target env |

Test Plan

  • Verify platform extraction accuracy against ATT&CK technique data
  • Test with additional CTI sources for robustness
  • Generate Terraform/Vagrant templates from specs

…END path, relationships

- Raise semantic similarity threshold from 0.42 to 0.82 to eliminate false positive TTPs
- Add explicit T-number regex extraction from all IR text fields
- Filter deprecated/revoked ATT&CK technique IDs
- Add deterministic entity reclassification (tools vs malware vs techniques)
- Split slash-separated actor names into individual actors with aliases
- Preserve dots in IP addresses and domain names during canonicalization
- Fix use-before-assignment bug in relationship candidate creation
- Fix D3FEND enrichment path (doubled plugin prefix)
- Increase Ollama client timeout to 600s
- New cti_ontology_inference.py: infers ATT&CK techniques from known
  tools/malware using MITRE taxonomy's 20,048 pre-built relationships.
  Zero LLM, zero network, pure ontology lookup.
- Text corroboration filter: tools with >8 known techniques require
  keyword overlap between technique description and source text to
  prevent broad-profile tools from flooding output.
- Evidence quality gate in semantic matcher: rejects matches from
  phrases shorter than 3 words or lacking content words.
- Wired into Stage 1 pipeline between explicit T-number extraction
  and semantic matching.
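A minimal sketch of the explicit-ID and ontology-inference stages, assuming the deprecated/revoked IDs were precomputed from ATT&CK's `x_mitre_deprecated` and `revoked` fields and that MITRE's `uses` relationships are flattened into a name-to-technique-ID dict. All function names, the stopword list, and the "at least one shared keyword" rule are hypothetical; the commits do not say exactly how corroboration is scored.

```python
import re

# ATT&CK technique IDs: T1234, or a sub-technique such as T1234.001
T_NUMBER = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "into", "may", "use"}


def explicit_techniques(text, retired_ids):
    """Explicitly cited technique IDs, minus deprecated/revoked ones."""
    return sorted(set(T_NUMBER.findall(text)) - set(retired_ids))


def keywords(s):
    """Lowercase words of 3+ letters, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]{3,}", s.lower()) if w not in STOPWORDS}


def infer_from_ontology(entities, uses, descriptions, text, broad_threshold=8):
    """Infer techniques from known tools/malware via MITRE 'uses' relationships.

    uses maps a lowercase entity name to the technique IDs MITRE links it to.
    For broad-profile entities (more than broad_threshold techniques), a
    technique is kept only if its description shares a keyword with the text.
    """
    text_words = keywords(text)
    inferred = set()
    for entity in entities:
        linked = uses.get(entity.lower(), [])
        broad = len(linked) > broad_threshold
        for tid in linked:
            if broad and not (keywords(descriptions.get(tid, "")) & text_words):
                continue  # no textual corroboration for a broad-profile entity
            inferred.add(tid)
    return sorted(inferred)
```

Narrow-profile tools such as PsExec pass straight through, while a framework linked to dozens of techniques only contributes the ones the report actually describes.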
…-path

- Create nlp_model.py shared singleton for en_core_web_lg (was loaded
  6 times at module level, ~400MB each)
- Replace all module-level spacy.load() calls with shared import
- Enhance entity validator deterministic fast-path:
  - Cross-category MITRE taxonomy lookup (tool in malware list → still valid)
  - Fuzzy name matching for misspellings (Mimiikatz → mimikatz)
  - Well-known tool allowlist for common utilities not in MITRE
  - Regex pattern for executable/script names
- Fix _mitre_name_sets() unpacking bug (4 return values, not 3)
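A minimal sketch of the shared-model pattern and the deterministic fast-path, assuming MITRE tool and malware names are available as lowercase sets. `get_nlp`, `fast_path_valid`, the 0.85 fuzzy cutoff, and the extension list are hypothetical; only `spacy.load` and `difflib.get_close_matches` are real library calls. The loader is injectable so the caching can be exercised without downloading en_core_web_lg.

```python
import difflib
import re
import threading

_lock = threading.Lock()
_models = {}


def _load_spacy(name):
    import spacy  # deferred so importing this module stays cheap

    return spacy.load(name)


def get_nlp(name="en_core_web_lg", loader=_load_spacy):
    """Load each pipeline at most once per process and share the instance."""
    with _lock:
        if name not in _models:
            _models[name] = loader(name)
        return _models[name]


# Bare executable/script names such as "update.ps1" or "svc.exe"
EXECUTABLE = re.compile(r"^[\w.-]+\.(?:exe|dll|ps1|bat|cmd|vbs|js|sh|py)$", re.IGNORECASE)


def fast_path_valid(name, mitre_tools, mitre_malware, allowlist, cutoff=0.85):
    """Cheap deterministic check to run before any LLM entity validation."""
    key = name.strip().lower()
    known = mitre_tools | mitre_malware  # cross-category: a tool filed as malware still counts
    if key in known or key in allowlist:
        return True, key
    match = difflib.get_close_matches(key, sorted(known), n=1, cutoff=cutoff)
    if match:
        return True, match[0]  # fuzzy hit, e.g. a misspelled tool name
    if EXECUTABLE.match(key):
        return True, key
    return False, None
```

Returning the canonical name alongside the verdict lets a misspelling such as "Mimiikatz" be rewritten to "mimikatz" in the same pass.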

Post-IR pipeline time (4 files): 670s → 428s (36% faster)
- New cti_defend_validation.py: validates candidate ATT&CK techniques
  by checking tactic relevance against source text signals
- Extracts tactic categories from D3FEND ontology (823 technique→tactic
  mappings from d3fend-protege.ttl)
- Detects which tactics are evidenced in source text using generic CTI
  language indicators (not adversary-specific)
- Drops ontology-inferred techniques whose tactic category has no
  evidence in the source text
- Reduces total techniques from 149 to 129 (−13%) while maintaining
  90% extractable recall and 68% emulation plan recall
- PMI co-occurrence scoring for ontology-inferred techniques
- Sub-technique hierarchy enforcement (orphan detection)
- Entity-technique clique corroboration (multi-entity support)
- Broad-profile entity cap (max 15 techniques per entity, ranked by PMI)
- Reduces FPs 106→79 while maintaining F1 at 29%
- New cti_offline_ir.py: deterministic IR extraction using spaCy NER,
  MITRE taxonomy scanning (1062 tool/malware names), well-known tool
  list, regex infrastructure patterns, and dep-parse behaviors
- Toggle via conf/local.yml: cti.offline: true/false
- Offline mode skips LLM entity validation entirely
- Offline mode processes ALL 5 test files including Talos (which
  always timed out with LLM due to 23 entity validation calls)
- Offline: 68% emulation recall, 4 min total, zero LLM dependency
- Online:  56% emulation recall, 5 min total (4/5 files, Talos timeout)
- Offline mode actually achieves higher recall, because it processes all 5 files
- New cti_stix_merge.py: merges multiple single-source STIX bundles
  with confidence-weighted dedup, provenance tracking, and multi-source
  corroboration boosting (GTI-inspired conflict resolution)
- Fix offline IR actor extraction: promote malware names used as
  sentence subjects to actors (e.g., "BlackCat exploits..." → actor)
- Add verb-agent inference for proper nouns near attack verbs
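A minimal sketch of the two post-filters applied to ontology-inferred techniques: tactic-evidence gating, then PMI ranking with a per-entity cap of 15, plus orphan sub-technique detection. The cue words are hypothetical stand-ins for the technique-to-tactic categories the commit derives from d3fend-protege.ttl, and the co-occurrence counts are assumed to come from some reference corpus of documents; all names here are illustrative.

```python
import math

# Hypothetical generic cues per tactic (not adversary-specific).
TACTIC_CUES = {
    "credential-access": ("credential", "password", "lsass", "kerberos"),
    "lateral-movement": ("lateral", "psexec", "rdp", "smb"),
    "impact": ("encrypt", "ransom", "wipe"),
}


def evidenced_tactics(text):
    """Tactics with at least one cue substring in the source text."""
    lowered = text.lower()
    return {tactic for tactic, cues in TACTIC_CUES.items() if any(c in lowered for c in cues)}


def pmi(joint, count_e, count_t, total):
    """log(P(e,t) / (P(e) * P(t))) from document co-occurrence counts."""
    if joint == 0 or count_e == 0 or count_t == 0:
        return float("-inf")
    return math.log(joint * total / (count_e * count_t))


def filter_inferred(candidates, tactic_of, text, joint, count_e, count_t, total, cap=15):
    """Drop techniques without tactic evidence, then keep the top `cap` by PMI."""
    tactics = evidenced_tactics(text)
    kept = [t for t in candidates if tactic_of.get(t) in tactics]
    kept.sort(key=lambda t: pmi(joint.get(t, 0), count_e, count_t.get(t, 0), total), reverse=True)
    return kept[:cap]


def orphan_subtechniques(ids):
    """Sub-techniques whose parent technique is not in the set."""
    present = set(ids)
    return sorted(t for t in present if "." in t and t.split(".")[0] not in present)
```

PMI rewards techniques that co-occur with an entity more often than chance, so ubiquitous techniques linked to nearly every tool sink below the cap.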

Merge resolves conflicts by majority vote on entity types; the highest-confidence
description wins, and multi-source agreement boosts scores.
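A minimal sketch of those three conflict rules, assuming each single-source extraction is reduced to (source name, list of entity dicts) and entities are matched by case-insensitive name. `merge_entities` and the +0.1-per-extra-source boost are hypothetical; cti_stix_merge.py operates on STIX bundles and its exact weighting is not described here.

```python
from collections import Counter, defaultdict


def merge_entities(bundles, boost=0.1):
    """Merge entities reported by several single-source extractions.

    bundles: iterable of (source_name, [entity dicts with name, type,
    description, confidence]).
    """
    grouped = defaultdict(list)
    for source, entities in bundles:
        for ent in entities:
            grouped[ent["name"].lower()].append((source, ent))

    merged = []
    for reports in grouped.values():
        types = Counter(ent["type"] for _, ent in reports)
        best = max(reports, key=lambda r: r[1]["confidence"])[1]
        sources = sorted({src for src, _ in reports})
        # Each additional corroborating source raises confidence, capped at 1.0
        confidence = min(1.0, best["confidence"] + boost * (len(sources) - 1))
        merged.append({
            "name": best["name"],
            "type": types.most_common(1)[0][0],  # majority vote on entity type
            "description": best["description"],  # highest-confidence description wins
            "confidence": round(confidence, 2),
            "sources": sources,  # provenance tracking
        })
    return merged
```

On a type tie, `Counter.most_common` keeps first-seen order, so the earliest source wins; a real merge may want a more deliberate tie-breaker.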
Reduces relationship noise from cartesian product explosion:
- Self-reference removal
- Source must be actor/malware (tools don't "use" other tools)
- CTI-relevant verb whitelist
- Target length cap (no sentence fragments)
- Dedup by (source, target) keeping highest confidence
- Cartesian detection (>3 targets per source+verb = explosion)
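A minimal sketch of the filter chain above, assuming relationships are dicts with an already-lemmatized verb and that entity types are known. The verb whitelist, the 4-word target cap, and dropping (rather than flagging) an exploded source+verb group are assumptions; the commit gives only the rules, not their parameters.

```python
from collections import defaultdict

CTI_VERBS = {"use", "deploy", "exploit", "target", "drop", "download", "execute", "exfiltrate"}  # illustrative


def filter_relationships(rels, entity_types, max_target_words=4, max_fanout=3):
    """rels: dicts with source, verb (lemma), target, confidence."""
    candidates = []
    for r in rels:
        if r["source"].lower() == r["target"].lower():
            continue  # self-reference
        if entity_types.get(r["source"]) not in {"threat-actor", "malware"}:
            continue  # tools don't "use" other tools
        if r["verb"] not in CTI_VERBS:
            continue
        if len(r["target"].split()) > max_target_words:
            continue  # sentence fragment, not an entity
        candidates.append(r)

    best = {}
    for r in candidates:  # dedup by (source, target), keeping highest confidence
        key = (r["source"], r["target"])
        if key not in best or r["confidence"] > best[key]["confidence"]:
            best[key] = r

    fanout = defaultdict(int)
    for r in best.values():
        fanout[(r["source"], r["verb"])] += 1
    # More than max_fanout targets for one source+verb is treated as Cartesian noise.
    return [r for r in best.values() if fanout[(r["source"], r["verb"])] <= max_fanout]
```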

Results: 571 relationships → 9 (98% reduction), all 5 sources processed
- Add spec_version: "2.1" to ALL SDO/SRO builders (was missing from
  every object — required by STIX 2.1 spec Section 3.2)
- Add created/modified to attack-pattern taxonomy lookups
- Remove invalid threat-actor roles: ["threat-actor"] (not in STIX
  2.1 vocabulary — valid values are agent, director, etc.)
- Fix observed-data missing spec_version
- Fix relationship missing spec_version

Fixes 85+ STIX 2.1 compliance issues per bundle.
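A minimal stdlib-only sketch of builders that always emit the STIX 2.1 common properties (`type`, `spec_version`, `id`, `created`, `modified`) for both SDOs and SROs. `build_object` and `build_threat_actor` are hypothetical names. Because threat-actor-role-ov is an open vocabulary, the spec only recommends its values; this sketch drops anything outside it, in line with the commit's removal of `"threat-actor"`.

```python
import uuid
from datetime import datetime, timezone

# Values of the STIX 2.1 threat-actor-role-ov vocabulary
THREAT_ACTOR_ROLES = {
    "agent", "director", "independent", "infrastructure-architect",
    "infrastructure-operator", "malware-author", "sponsor",
}


def _timestamp():
    # UTC with a trailing "Z", truncated to millisecond precision
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_object(obj_type, **props):
    """Any SDO or SRO, with the required common properties filled in."""
    now = _timestamp()
    obj = {
        "type": obj_type,
        "spec_version": "2.1",
        "id": f"{obj_type}--{uuid.uuid4()}",
        "created": now,
        "modified": now,
    }
    obj.update(props)
    return obj


def build_threat_actor(name, roles=()):
    valid = [r for r in roles if r in THREAT_ACTOR_ROLES]  # e.g. drops "threat-actor"
    props = {"name": name}
    if valid:
        props["roles"] = valid
    return build_object("threat-actor", **props)
```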
…ssive voice

- New cti_relation_extractor.py replaces cartesian-product approach
  with dependency-parse-guided triple extraction
- Handles compound subjects: "BlackCat operators" → BlackCat
- Handles conjunctions: "deployed A, B and C" → 3 relationships
- Handles passive voice: "X was deployed by Y" → Y deploys X
- Handles complement clauses (xcomp/ccomp) and adjectival clauses (acl)
- Ontology-grounded filtering: requires at least one known entity per triple
- 18/18 unit tests passing across 8 diverse threat actors
- 6.4x fewer relationships than old extractor at same accuracy
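A minimal sketch of the core triple rules (active subject and object, conjunction expansion, passive-voice inversion, ontology grounding). It runs on a hand-built parse using spaCy-style dependency labels so it needs no model; in the pipeline the tokens would come from en_core_web_lg. `Tok` and `extract_triples` are hypothetical, and this sketch only handles ROOT verbs (no xcomp/ccomp/acl or compound subjects).

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    dep: str  # spaCy-style dependency label
    head: int  # index of the syntactic head (ROOT points to itself)
    lemma: str = ""


def children(toks, i):
    return [j for j, t in enumerate(toks) if t.head == i and j != i]


def with_conjuncts(toks, i):
    """A token plus everything chained to it by 'conj' (A, B and C)."""
    out = [i]
    for j in children(toks, i):
        if toks[j].dep == "conj":
            out.extend(with_conjuncts(toks, j))
    return out


def extract_triples(toks, known):
    """(subject, verb lemma, object) triples with at least one known entity."""
    triples = []
    for v, verb in enumerate(toks):
        if verb.dep != "ROOT":
            continue
        subj, objs = None, []
        for c in children(toks, v):
            dep = toks[c].dep
            if dep == "nsubj":
                subj = c
            elif dep in ("dobj", "nsubjpass"):  # a passive subject is the logical object
                objs.extend(with_conjuncts(toks, c))
            elif dep == "agent":  # "by Y" in passive voice supplies the real subject
                subj = next((p for p in children(toks, c) if toks[p].dep == "pobj"), subj)
        if subj is None:
            continue
        for o in objs:
            s, t = toks[subj].text, toks[o].text
            if s.lower() in known or t.lower() in known:  # ontology grounding
                triples.append((s, verb.lemma or verb.text, t))
    return triples
```

For "BlackCat deployed Mimikatz, PsExec and AnyDesk" this yields three triples, and "Mimikatz was deployed by BlackCat" is normalized to (BlackCat, deploy, Mimikatz).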

Unit tests cover: APT29, Lazarus, BlackCat, APT28, FIN7, Sandworm,
Turla, MuddyWater, Conti, Volt Typhoon — not tuned to any one actor
- Wire cti_relation_extractor into pipeline replacing old cartesian approach
- Add default_actor parameter for generic subject resolution
  ("the group deployed X" → default_actor deployed X)
- Generic subjects: group, actors, attackers, operators, they, it, etc.
- Results: 571 → 37 relationships, 17% recall vs ground truth
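A minimal sketch of `default_actor` resolution, applied here as a post-pass over (subject, verb, object) triples; where the real extractor performs it is not stated. The set contents mirror the generic subjects listed above, and `resolve_subject`/`resolve_triples` are hypothetical names (Python 3.9+ for `str.removeprefix`).

```python
GENERIC_SUBJECTS = {"group", "actors", "attackers", "operators", "they", "it"}


def resolve_subject(subject, default_actor=None):
    """Map generic subjects ('the group', 'they') to the report's default actor."""
    key = subject.lower().removeprefix("the ").strip()
    if key in GENERIC_SUBJECTS:
        return default_actor  # None when the report has no identified actor
    return subject


def resolve_triples(triples, default_actor):
    """Rewrite generic subjects; drop triples that cannot be resolved."""
    out = []
    for s, verb, t in triples:
        resolved = resolve_subject(s, default_actor)
        if resolved is not None:
            out.append((resolved, verb, t))
    return out
```

Dropping unresolvable generic triples, rather than keeping "they" as an entity, is a design choice of this sketch.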
…ution

- Walk into dobj prepositional children to find named entities
  ("deployed mechanisms including AnyDesk" → deployed AnyDesk)
- Generic subject resolution via GENERIC_SUBJECTS set
- 18/18 unit tests still passing
- End-to-end: 42 rels (down from 571), 17% recall
- Remaining blocker: actor extraction quality in offline mode
Replace broken spaCy NER actor detection with MITRE taxonomy
frequency analysis:
- Count mentions of all MITRE malware/group names in text
- Title mentions get +3 boost, first paragraph +1
- Top 3 by frequency become actors (minimum 2 mentions)
- Replaces verb-agent inference which produced garbage actors
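A minimal sketch of the frequency-based actor ranking, using the +3 title and +1 first-paragraph boosts and the top-3 / 2-mention limits from the commit. Assumptions: mentions are counted case-insensitively on word boundaries in the body text only, and "minimum 2 mentions" applies to that raw count rather than the boosted score. `rank_actors` is a hypothetical name.

```python
import re
from collections import Counter


def rank_actors(title, text, names, top_k=3, min_mentions=2):
    """Rank MITRE group/malware names by mention frequency with position boosts."""
    first_para = text.strip().split("\n\n", 1)[0]
    scores, mentions = Counter(), Counter()
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        n = len(pattern.findall(text))
        if n == 0:
            continue
        mentions[name] = n
        scores[name] = n
        if pattern.search(title):
            scores[name] += 3  # title mention boost
        if pattern.search(first_para):
            scores[name] += 1  # first-paragraph boost
    eligible = [name for name, _ in scores.most_common() if mentions[name] >= min_mentions]
    return eligible[:top_k]
```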

Results:
  Actor recall:  0% → 80% (4/5 sources correct)
  Rel recall:    17% → 22%
  Sophos:        0% → 60% relationship recall (was completely broken)
- Add lru_cache to build_normalized_attack_patterns (38s → 0s on 2nd call)
- Add dict cache to load_mitre_taxonomy (0.6s → 0s on 2nd call)
- Total 5-file run: 220s → 64s (71% faster)
- Per-file after first: 2-7s (was 39-58s)
- Zero fidelity change — same results, just cached
- New cti_llm_validation.py with three validation tasks:
  1. Technique validation: confirms/denies techniques against source text
  2. Relationship discovery: finds entity pairs the dep-parse missed
  3. Relationship validation: confirms/denies existing relationships
- Anti-hallucination design:
  - Source text always included as sole truth source
  - LLM must quote source text for every affirmative answer
  - Quotes verified against actual source (fuzzy match, 60% threshold)
  - Entity names pre-provided for relationship discovery (can't invent)
  - Denied items get confidence drop, not removal (human can override)
- LLM-agnostic: uses llm_generate() which supports Ollama/OpenAI/any provider
- Runs only when cti.offline=false in config
- Integrated at end of Stage 1, after all deterministic stages complete
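A minimal sketch of quote verification and "demote, don't delete", assuming the 60% fuzzy threshold is a `difflib.SequenceMatcher` ratio against the best same-length word window of the source. Treating an unverifiable quote like a denial and the 0.3 confidence penalty are assumptions; `quote_score` and `apply_verdict` are hypothetical names.

```python
import difflib


def _norm(s):
    return " ".join(s.lower().split())


def quote_score(quote, source):
    """Best similarity between the quote and any same-length word window of source."""
    q = _norm(quote)
    words = _norm(source).split()
    if not q or not words:
        return 0.0
    width = len(q.split())
    if width >= len(words):
        return difflib.SequenceMatcher(None, q, " ".join(words)).ratio()
    best = 0.0
    for i in range(len(words) - width + 1):
        window = " ".join(words[i:i + width])
        best = max(best, difflib.SequenceMatcher(None, q, window).ratio())
    return best


def apply_verdict(item, confirmed, quote, source, threshold=0.6, penalty=0.3):
    """Keep every item; a denial or unverifiable quote only lowers confidence."""
    if confirmed and quote_score(quote, source) >= threshold:
        item["llm_verified"] = True
    else:
        item["confidence"] = round(max(0.0, item["confidence"] - penalty), 2)
        item["llm_verified"] = False
    return item
```

Windowing keeps a short quote from being penalized for the length of the whole report, which a single ratio against the full text would do.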

Awaiting remote LLM endpoint for proper testing — local gemma3n is too
slow for structured JSON output (~60s per validation call).
- Change NLP behavior reduction from fatal crash to warning
- Remove unused cti_relationships imports (extract_all_relationships, REL_REJECTIONS)
- Full 5-source LLM validation results:
  Rel recall: 22% → 91% (14/16 ground truth matched)
  TTP precision: 11% → 15%
  TTP recall: 65% → 43% (LLM too aggressive denying — needs tuning)
Techniques denied by LLM validation keep their reduced confidence
but are NOT removed from the output. The offline pipeline already
filtered; LLM only adjusts confidence scores for analyst review.

Results (5 sources, LLM validation via MITRE AIP Devstral):
  Rel Recall: 22% → 84% (dep-parse + LLM discovery)
  TTP Recall: 65% → 38% (LLM IR extraction finds fewer than offline)
  All relationships quote-verified against source text
- Offline IR runs first: fast, deterministic, broad MITRE taxonomy coverage
- LLM IR merges additional entities NER missed (additive only, deduped)
- Combines 65% offline TTP recall with 95% LLM relationship recall
- LLM failure gracefully falls back to offline-only
- Results: 61% TTP recall, 95% rel recall, 56s/file
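A minimal sketch of the offline-first, LLM-additive ordering with graceful fallback, assuming IR is a dict of entity kind to name list and both extractors are plain callables. `extract_ir` and its parameters are hypothetical.

```python
import logging

log = logging.getLogger(__name__)


def extract_ir(text, offline_extract, llm_extract=None):
    """Offline IR first; LLM entities are merged in additively, never replacing."""
    ir = offline_extract(text)
    if llm_extract is None:
        return ir
    try:
        extra = llm_extract(text)
    except Exception as exc:  # provider error or timeout: keep the offline result
        log.warning("LLM IR failed, using offline result only: %s", exc)
        return ir
    for kind, names in extra.items():
        have = {n.lower() for n in ir.setdefault(kind, [])}
        for name in names:
            if name.lower() not in have:  # case-insensitive dedup
                ir[kind].append(name)
                have.add(name.lower())
    return ir
```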
Tested on CISA advisory AA20-296A (Berserk Bear/Dragonfly/Energetic Bear):
- 5/5 explicit ATT&CK techniques found (T1190, T1189, T1133, T1078, T1110)
- Actor correctly identified as Dragonfly (MITRE alias of Berserk Bear)
- Havex malware identified
- 7 infrastructure IOCs extracted (IPs + domains)
- 10 relationships discovered (5 dep-parse + 5 LLM quote-verified)
- 76s total processing time

Confirms the pipeline is not tuned to BlackCat and works generically.
- New filter_by_keyword_evidence(): technique name/description keywords
  must appear in source text (>=2 overlap). Removes techniques with no
  textual evidence. Preserves explicit T-numbers and LLM-confirmed.
- Fix is_valid_actor(): allow multi-word names with spaces
  ("Berserk Bear" was rejected because name.isalpha() fails on spaces)
- Expected: +4pp TTP precision, -3pp TTP recall, better actor recall
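A minimal sketch of the keyword-evidence filter and the relaxed actor-name check. The function names come from the commit, but the dict shape, stopword list, and tokenization are assumptions. The `is_valid_actor` rule here (every space-separated part alphanumeric) also accepts digits, as in "APT29", which may be looser than the original.

```python
import re

STOP = {"the", "and", "for", "with", "from", "that", "this", "into", "via", "use", "used", "using"}


def _keywords(s):
    return {w for w in re.findall(r"[a-z]{3,}", s.lower()) if w not in STOP}


def filter_by_keyword_evidence(techniques, source_text, min_overlap=2):
    """Drop techniques whose name/description share < min_overlap keywords with the text.

    Explicitly cited T-numbers and LLM-confirmed techniques are always kept.
    """
    text_words = _keywords(source_text)
    kept = []
    for t in techniques:
        if t.get("explicit") or t.get("llm_confirmed"):
            kept.append(t)
            continue
        evidence = _keywords(t["name"] + " " + t.get("description", ""))
        if len(evidence & text_words) >= min_overlap:
            kept.append(t)
    return kept


def is_valid_actor(name):
    """Accept multi-word names like 'Berserk Bear'; str.isalpha() rejects the space."""
    parts = name.split()
    return bool(parts) and all(p.isalnum() for p in parts)
```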
Proven safe across 5 diverse sources with 0% hallucination rate.
Now implemented as the default LLM validation behavior.

Final results (8 sources, 4 threat actors):
  TTP Precision: 1% → 20% (+19pp)
  TTP Recall:    5% → 55% (+50pp)
  Rel Recall:    0% → 84% (+84pp)
  Hallucinations: 0%
  Speed: ~54s/file (was ~8min)

6/8 sources achieve 100% relationship recall.
LockBit 3.0 achieves 44% TTP precision (best single source).
New cti_iac_extractor.py extracts deployable infrastructure specs from
STIX bundles and source text:
- OS platforms from ATT&CK technique x_mitre_platforms data
- Services/ports from source text regex (RDP, SMB, SSH, SQL, etc.)
- Tool requirements mapped to infrastructure needs
- Account types (domain admin, local admin, service accounts)
- Network infrastructure (C2 channels, external IPs)
- Deployment notes (AD required, ESXi needed, etc.)
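A minimal sketch of the services/ports and account-type regex pass. The patterns are illustrative (the PR's actual regexes are not shown); the ports are the standard defaults for each service, and `extract_services` is a hypothetical name.

```python
import re

# Illustrative (service, default port, pattern) rules
SERVICES = [
    ("RDP", 3389, r"\brdp\b|remote desktop"),
    ("SMB", 445, r"\bsmb\b|\bpsexec\b"),  # PsExec needs SMB
    ("SSH", 22, r"\bssh\b"),
    ("SQL", 1433, r"\bsql\b|\bmssql\b"),
    ("DNS", 53, r"\bdns\b"),
    ("HTTPS", 443, r"\bhttps\b"),
]
ACCOUNTS = [
    ("domain admin", r"\bdomain admins?\b"),
    ("local admin", r"\blocal admins?\b"),
    ("service account", r"\bservice accounts?\b"),
]
PORT = re.compile(r"\bport\s+(\d{1,5})\b", re.IGNORECASE)


def extract_services(text):
    services = {name: port for name, port, pat in SERVICES if re.search(pat, text, re.IGNORECASE)}
    # Explicit "port N" mentions, limited to the valid TCP/UDP range
    ports = sorted({int(p) for p in PORT.findall(text) if 0 < int(p) <= 65535})
    accounts = [label for label, pat in ACCOUNTS if re.search(pat, text, re.IGNORECASE)]
    return {"services": services, "explicit_ports": ports, "accounts": accounts}
```

Mapping tools to services inside the same rule table (PsExec implies SMB) is one simple way to fold the tool-requirement mapping into the service pass.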

Tested on 4 diverse sources:
  LockBit: RDP+SSH+SQL+SMB, 5 tools, Windows+Linux+ESXi
  APT41: HTTPS+AD+Exchange, C2 via Cloudflare/Azure
  Russian APT: Exchange+VPN, 4 external IPs
  BlackCat: ESXi targeting, Mimikatz
Enhanced infrastructure extraction with 5 new sections:
- Network topology: hosts, segments, perimeter devices from text
- Kill chain ordering: deployment sequence based on detected phases
- Security controls: AV/EDR/firewall/MFA to deploy for detection testing
- Data staging: bait files for realistic exfiltration testing
- CVEs: extracted for future vulnerable software deployment
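A minimal sketch of kill-chain ordering, assuming phases are read from the `kill_chain_phases` of attack-pattern objects (kill_chain_name `mitre-attack`) and sequenced by ATT&CK Enterprise tactic order. The PR does not say how phases are detected; `kill_chain_sequence` is a hypothetical name.

```python
# ATT&CK Enterprise tactics in their conventional order (phase_name values)
PHASE_ORDER = [
    "reconnaissance", "resource-development", "initial-access", "execution",
    "persistence", "privilege-escalation", "defense-evasion", "credential-access",
    "discovery", "lateral-movement", "collection", "command-and-control",
    "exfiltration", "impact",
]


def kill_chain_sequence(attack_patterns):
    """Detected ATT&CK phases, ordered for staged lab deployment."""
    phases = {
        p["phase_name"]
        for ap in attack_patterns
        for p in ap.get("kill_chain_phases", [])
        if p.get("kill_chain_name") == "mitre-attack"
    }
    return [p for p in PHASE_ORDER if p in phases]
```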

LockBit produces 8-phase kill chain, topology with VPN perimeter,
Defender/SmartScreen controls, and credential/backup bait data.
Russian APT produces 5 CVEs, Exchange/VPN topology, MFA controls.
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
