[RFC]: Harden wallet_screening data pipeline, parsing, coverage, and report schema v2 #115

### Summary

Proposal to upgrade `finance/wallet_screening` from a demo-grade compliance tool into a robust, auditable Ethereum wallet screening skill suitable for agent-driven due diligence.

Scope covers four workstreams:
1. **Knowledge base expansion and refresh** — ingest and normalize public sanctions/malicious-address datasets (OpenSanctions FTM, OFAC SDN crypto exports, FBI/NBCTF, Uniswap TRM, MEW darklist, research phishing datasets).
2. **Parsing and matching fixes** — correct FTM `publicKey` handling, unified address normalization, chain-aware matching, and use of expanded lists for both direct wallet hits and transaction counterparty screening.
3. **Forensic coverage** — ERC-20 / internal tx support, Etherscan pagination or explicit truncation warnings, optional hybrid live checks (Chainalysis / TRM free tiers) with offline fallback.
4. **Report schema v2 and agent contract** — stable JSON output, explicit risk tiers, evidence-backed `risk_factors[]`, aligned `instructions.md` / `card.json` / docs, and expanded test coverage.

This RFC follows a prior internal audit that identified schema drift, ineffective FTM matching (~0 ETH hits from 3,527 FTM entities), and thin malicious-contract tx coverage (6 contracts vs 543+ TRM entries).

### Motivation

`finance/wallet_screening` is the flagship Skillware compliance skill and the most referenced example across README, usage guides, and agent-loop examples. Today it has the right architecture (manifest + instructions + deterministic Python + bundled data + maintenance scripts) but reliability gaps limit trust:

**Data / coverage**
- Bundled data: ~4,413 records, ~275 unique ETH addresses across 5 JSON files — far below docs claiming "880+ bundled lists."
- `entities.ftm.json` (3,527 FTM entities) is loaded but **does not match ETH addresses** because matching logic expects `addresses` / `properties.address`, while FTM stores wallet IDs in `properties.publicKey`.
- Transaction malicious-interaction detection uses only `malicious_scs_2025.json` (6 contracts). `normalized_uniswap_trm.json` (543 entries) is used for direct wallet sanctions only, not tx flow analysis.
- Data freshness: last normalized snapshots dated 2025-07-22; no documented refresh cadence or CI validation.

**Correctness**
- Schema drift: `instructions.md` and `card.json` reference fields that do not exist in `skill.py` output (`summary.sanctioned` vs `summary.sanctioned_entity_match`, etc.), causing agents to misread reports.
- Data quality: duplicate address assigned to two mixers in `malicious_scs_2025.json`; zero-width Unicode (`\u200b`) in 3 Israel NBCTF addresses breaks matching.
- Silent API failures: Etherscan/CoinGecko errors return empty/zero values without warnings in the report.

**Industry gap**
- Enterprise tools (Chainalysis Address Screening, TRM) provide direct + indirect exposure, structured identifications with source URLs, configurable severities, and continuous re-screening. Skillware should not replicate enterprise graph ML, but should adopt the **reporting patterns** and **minimum viable exposure logic** (e.g., flag interactions with known bad counterparties) within its offline-first, deterministic model.

**Goal**
Make the skill a credible open-source starting point for AML/sanctions due diligence agents — accurate enough to demo, extensible enough for enterprise customization, and honest about limitations.

### Detailed Design

## Phase 1 — Canonical data model and ingest pipeline

### 1.1 Unified record schema
Define `skills/finance/wallet_screening/data/schema.json` (or a documented Python dataclass) for all list entries.

```json
{
  "address": "0x...",
  "chain": "ethereum",
  "category": "sanctions|mixer|scam|phishing|stolen|market|other",
  "severity": "low|medium|high|critical",
  "label": "Entity or contract name",
  "reason": "Human-readable reason",
  "jurisdiction": "US|EU|IL|...",
  "source": "OFAC|OpenSanctions|FBI|NBCTF|Uniswap-TRM|MEW|...",
  "source_url": "https://...",
  "tags": [],
  "last_updated": "ISO-8601"
}
```
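If the dataclass route is chosen over `schema.json`, one possible shape is sketched below. The class name `ListRecord`, the choice of required versus defaulted fields, and the validation in `__post_init__` are assumptions for illustration; only the field names and enumerated values come from the schema above.

```python
from dataclasses import asdict, dataclass, field
from typing import List

# Enumerations copied from the JSON schema above.
CATEGORIES = {"sanctions", "mixer", "scam", "phishing", "stolen", "market", "other"}
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class ListRecord:
    """One canonical list entry (hypothetical class name)."""

    address: str
    chain: str
    category: str
    severity: str
    label: str = ""
    reason: str = ""
    jurisdiction: str = ""
    source: str = ""
    source_url: str = ""
    tags: List[str] = field(default_factory=list)
    last_updated: str = ""

    def __post_init__(self):
        # Reject values outside the enumerations listed in the schema.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


record = ListRecord(
    address="0x" + "ab" * 20,  # placeholder address, not a real listing
    chain="ethereum",
    category="mixer",
    severity="high",
    source="example",
)
record_json = asdict(record)  # plain dict, ready for json.dump
```

Validating at construction time means a malformed importer row fails during refresh rather than at screening time.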

### 1.2 Normalization layer
- Centralize address normalization: lowercase, strip whitespace, remove zero-width chars (e.g. `\u200b`), validate `0x` + 40 hex, optional EIP-55 checksum warning.
- Refactor `maintenance/normalization_tool.py` and `normalize_uniswap_trm.py` to emit the canonical schema above.
- Add new importers:
  - **OpenSanctions FTM** — extract `CryptoWallet` entities from `properties.publicKey`; split comma-separated multi-address strings; tag chain from `currency` / caption heuristics.
  - **OFAC SDN crypto-only** — official XML/CSV or vile/ofac-sdn-list daily JSON releases.
  - **MEW darklist** — `addresses-darklist.json` from MyEtherWallet/ethereum-lists.
  - **Poison-Hunter / PTXPhish** — research phishing addresses (ETH-only filter).
- Deduplicate by `(chain, address, category)` with merge rules for conflicting labels.
- Exclude known false positives (e.g. burn address `0x000...dead` unless explicitly sanctioned).
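A minimal sketch of the normalization and deduplication steps listed above. The function names are hypothetical, and the merge rule (keep the highest severity, collect every contributing source) is an assumption, since the RFC leaves the merge rules open.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Zero-width and BOM characters to strip before validation.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
_ETH_ADDRESS = re.compile(r"0x[0-9a-f]{40}")
_SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def normalize_eth_address(raw: str) -> Optional[str]:
    """Return the lowercase 0x-prefixed address, or None if it is malformed."""
    cleaned = raw.translate(_INVISIBLE).strip().lower()
    return cleaned if _ETH_ADDRESS.fullmatch(cleaned) else None


def dedupe(records: Iterable[dict]) -> List[dict]:
    """Collapse records that share (chain, address, category).

    Assumed merge rule: the highest-severity record wins for severity and
    label, and every contributing source is kept in a `sources` list.
    """
    merged: Dict[Tuple[Optional[str], str, Optional[str]], dict] = {}
    for rec in records:
        address = normalize_eth_address(str(rec.get("address", "")))
        if address is None:
            continue  # malformed row; a real importer would likely log it
        key = (rec.get("chain"), address, rec.get("category"))
        current = merged.get(key)
        if current is None:
            merged[key] = {**rec, "address": address, "sources": [rec.get("source")]}
            continue
        if rec.get("source") not in current["sources"]:
            current["sources"].append(rec.get("source"))
        new_rank = _SEVERITY_RANK.get(rec.get("severity"), 0)
        if new_rank > _SEVERITY_RANK.get(current.get("severity"), 0):
            current["severity"] = rec.get("severity")
            current["label"] = rec.get("label")
    return list(merged.values())
```

Invisible characters are removed before `strip()` because `\u200b` is not treated as whitespace by Python's `str.strip`.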

### 1.3 Storage layout

```
data/
  canonical/
    sanctions_ethereum.json      # merged direct-hit list
    malicious_contracts.json     # tx-interaction screening list
  sources/                       # raw/normalized per-source snapshots (optional)
  manifest.json                  # counts, last_refresh, source versions
```

### 1.4 Refresh runbook
- Document in `maintenance/README.md`: download URLs, commands, expected counts, license notes (OpenSanctions non-commercial).
- Optional GitHub Action (weekly): run normalizers, fail if record count drops more than 10% without review.
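The count-drop guard could be a small function that the weekly job calls after the normalizers run. This is a sketch: `count_regressions` and the per-source count dictionaries are hypothetical, and in practice the counts would presumably be read from `manifest.json` before and after the refresh.

```python
def count_regressions(previous, current, max_drop=0.10):
    """Return (source, old_count, new_count) for sources that shrank too much.

    `previous` and `current` map a source name to its record count.
    A source missing from `current` is treated as having zero records.
    """
    failures = []
    for source, old_count in previous.items():
        new_count = current.get(source, 0)
        if old_count > 0 and (old_count - new_count) / old_count > max_drop:
            failures.append((source, old_count, new_count))
    return failures


# Baselines taken from the counts quoted in this RFC; the refreshed
# counts are made up for the example.
previous = {"uniswap_trm": 543, "ftm": 3527}
current = {"uniswap_trm": 480, "ftm": 3530}
failures = count_regressions(previous, current)
```

A CI step would exit non-zero when `failures` is non-empty, leaving a maintainer to confirm whether the drop reflects a genuine delisting or a broken importer.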

---

## Phase 2 — Matching and forensic engine

### 2.1 Sanctions matching
- Fix `_check_against_sanctions` to parse FTM `properties.publicKey` (list values and comma-separated strings).
- Filter by `chain == ethereum` before match.
- Build an in-memory index at init: `address_lower -> List[Record]` for O(1) lookup.
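The `publicKey` fix and the lookup index can be sketched together. The helper names are hypothetical and the entity layout is assumed from the description above (`properties.publicKey` holding either a list or a comma-separated string). The address-shape check only approximates a chain filter, since every EVM chain uses the same format; the `chain == ethereum` filter would still rely on the `currency` / caption tagging from Phase 1.

```python
import re
from collections import defaultdict

_ETH_ADDRESS = re.compile(r"0x[0-9a-f]{40}")


def ftm_wallet_addresses(entity):
    """Yield normalized 0x addresses found in an entity's properties.publicKey."""
    values = entity.get("properties", {}).get("publicKey", [])
    if isinstance(values, str):
        values = [values]  # tolerate a bare string instead of a list
    for value in values:
        for part in value.split(","):  # multi-address strings
            candidate = part.replace("\u200b", "").strip().lower()
            if _ETH_ADDRESS.fullmatch(candidate):
                yield candidate


def build_index(entities):
    """Map address_lower -> list of entities for constant-time lookup."""
    index = defaultdict(list)
    for entity in entities:
        for address in ftm_wallet_addresses(entity):
            index[address].append(entity)
    return index
```

Lookups should use `index.get(address.lower(), [])` rather than `index[address]`, so that a miss does not insert an empty list into the `defaultdict`.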

### 2.2 Transaction analysis
- Merge `malicious_contracts.json` plus relevant TRM/scam entries into a single `risk_address_index` for tx screening.
- Extend Etherscan calls:
  - `txlist` (existing) with pagination
  - `tokentx` for ERC-20 flows
  - `txlistinternal` for internal transfers
- Flag **indirect exposure**: top counterparties that appear in the sanctions/malicious index (lightweight Chainalysis-style exposure).
- Surface **coverage metadata**: `tx_analyzed`, `tx_total_reported_by_etherscan`, `truncated` (boolean).
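A sketch of the pagination and indirect-exposure steps, under stated assumptions: `fetch_page` is a hypothetical caller-supplied wrapper around a single Etherscan request, the page cap and `top_n` defaults are arbitrary, and `risk_index` is any mapping from lowercase address to list records. The `from` and `to` keys match the fields Etherscan returns for `txlist`.

```python
from collections import Counter


def fetch_all_pages(fetch_page, page_size=1000, max_pages=10):
    """Collect transactions page by page and report whether the cap was hit.

    fetch_page(page, page_size) must return a list of tx dicts.
    """
    txs = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        txs.extend(batch)
        if len(batch) < page_size:
            return txs, False  # short page: history is complete
    # Every page was full, so more history may exist beyond the cap.
    # This is conservative when the total is an exact multiple of the cap.
    return txs, True


def risky_counterparties(wallet, txs, risk_index, top_n=10):
    """Return the top counterparties that also appear in the risk index."""
    wallet = wallet.lower()
    counts = Counter()
    for tx in txs:
        for side in ("from", "to"):
            other = (tx.get(side) or "").lower()
            if other and other != wallet:
                counts[other] += 1
    return [
        {"address": addr, "tx_count": n, "records": risk_index[addr]}
        for addr, n in counts.most_common(top_n)
        if addr in risk_index
    ]
```

The boolean returned by `fetch_all_pages` maps directly onto the `truncated` coverage field, and `len(txs)` onto `tx_analyzed`.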

### 2.3 Optional hybrid live checks (manifest opt-in)
Add optional env vars:
- `CHAINALYSIS_API_KEY` — GET `https://public.chainalysis.com/api/v1/address/{addr}`
- `TRM_SANCTIONS_API_KEY` — POST TRM public screening endpoint

Behavior: run offline checks first; if keys are present, cross-check and merge into `risk_details.live_verification[]`. Never block offline mode when APIs are unavailable.
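One way to keep live checks from ever blocking the offline path is to wrap each provider call and turn failures into warnings. In this sketch `merge_live_checks` and the `checkers` mapping are hypothetical; each checker is assumed to be a callable that wraps one provider API and is registered only when its API key is set.

```python
def merge_live_checks(address, checkers):
    """Run optional live checkers; failures become warnings, never exceptions.

    `checkers` maps a provider name to a callable(address) -> dict.
    Returns (live_verification entries, warning codes).
    """
    results, warnings = [], []
    for name, check in checkers.items():
        try:
            payload = check(address)
        except Exception as exc:  # network errors, rate limits, bad keys
            warnings.append(f"{name}_live_check_failed")
            results.append({"provider": name, "status": "error", "error": str(exc)})
        else:
            results.append({"provider": name, "status": "ok", "result": payload})
    return results, warnings
```

With no keys configured, `checkers` is empty and the function returns two empty lists, so the report is identical to a pure offline run.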

---

## Phase 3 — Report schema v2

### 3.1 Top-level structure

```json
{
  "schema_version": "2.0",
  "metadata": {
    "screening_time": "...",
    "wallet_address": "0x...",
    "chain": "ethereum",
    "data_sources": [
      {"name": "...", "record_count": 0, "last_updated": "..."}
    ],
    "warnings": ["etherscan_txlist_truncated", "coingecko_price_unavailable"]
  },
  "summary": {
    "risk_level": "low|medium|high|critical",
    "sanctioned": false,
    "sanctioned_entity_match": false,
    "malicious_interaction_count": 0,
    "indirect_exposure_count": 0,
    "balance_eth": 0.0,
    "balance_usd": 0.0,
    "total_transactions_analyzed": 0
  },
  "financial_analysis": {},
  "risk_details": {
    "sanctions_hits": [],
    "malicious_interactions": [],
    "indirect_exposures": [],
    "live_verification": []
  },
  "network_analysis": {
    "most_interacted_wallet": ["0x...", 45],
    "top_counterparties": [],
    "risky_counterparties": []
  },
  "risk_factors": [
    {
      "code": "SANCTIONS_DIRECT",
      "severity": "critical",
      "evidence": "...",
      "source": "OFAC"
    }
  ]
}
```

Note: `financial_analysis` keeps existing v1 fields (value in/out, gas, PnL in ETH/USD/EUR) plus optional token-flow fields added in Phase 2.
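A small checker for the structure above could back the "required keys" test in Phase 4. Treating every key shown in the example as required is an assumption, as is the helper name `missing_keys`; `financial_analysis` and `risk_factors` are checked for presence only.

```python
REQUIRED = {
    "metadata": ["screening_time", "wallet_address", "chain", "data_sources", "warnings"],
    "summary": [
        "risk_level", "sanctioned", "sanctioned_entity_match",
        "malicious_interaction_count", "indirect_exposure_count",
        "balance_eth", "balance_usd", "total_transactions_analyzed",
    ],
    "financial_analysis": [],  # v1 fields carried over; presence only
    "risk_details": [
        "sanctions_hits", "malicious_interactions",
        "indirect_exposures", "live_verification",
    ],
    "network_analysis": ["most_interacted_wallet", "top_counterparties", "risky_counterparties"],
    "risk_factors": [],  # a list; presence only
}


def missing_keys(report):
    """Return dotted paths of required keys that are absent from a v2 report."""
    missing = []
    if report.get("schema_version") != "2.0":
        missing.append("schema_version")
    for section, keys in REQUIRED.items():
        if section not in report:
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in report[section])
    return missing
```

A test can then assert `missing_keys(report) == []` for every successful `execute()` response.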

### 3.2 Risk level rules (deterministic)

| Level | Conditions |
|-------|------------|
| **critical** | Direct sanctions hit on the screened wallet |
| **high** | Malicious contract interaction OR indirect exposure to a sanctioned entity |
| **medium** | Interaction with scam/phishing-labeled address; no direct sanctions |
| **low** | No hits; optionally note high-volume mixer-adjacent patterns in `risk_factors` |

Implementation sketch in `skill.py`:

```python
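def compute_risk_level(sanctions_hits, malicious_interactions, indirect_exposures):
    """Map findings to a risk tier, checking the most severe condition first."""
    if sanctions_hits:
        return "critical"
    # Scam/phishing-labeled interactions alone are "medium"; any other
    # malicious category (e.g. mixer, stolen) escalates to "high".
    soft = ("scam", "phishing")
    severe = [i for i in malicious_interactions if i.get("category") not in soft]
    if severe or indirect_exposures:
        return "high"
    if malicious_interactions:
        return "medium"
    return "low"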
```

### 3.3 Agent contract alignment
- Update `instructions.md` to reference v2 fields only.
- Update `card.json` UI schema keys to match v2 summary fields.
- Update `docs/skills/wallet_screening.md` with accurate data counts and limitations.
- Optionally emit v1 aliases for one release (`summary.sanctioned` mirrors `summary.sanctioned_entity_match`) then remove in v2.1.

---

## Phase 4 — Tests and acceptance criteria

### Tests to add
- FTM `publicKey` ETH match using a known sanctioned test vector from OFAC export.
- Unicode / zero-width address normalization (NBCTF `\u200b` case).
- Malicious interaction detected for Tornado Cash router address.
- Indirect exposure when a counterparty is in the sanctions index.
- Truncated tx history emits a warning in `metadata.warnings`.
- Schema v2 required keys present in every successful `execute()` response.
- Optional: mocked Chainalysis API merge into `live_verification`.

### Definition of done
- [ ] ETH match rate from FTM is greater than 0 (baseline today: 0).
- [ ] Malicious tx index includes TRM + MEW + curated contracts (>100 ETH addresses minimum).
- [ ] `instructions.md`, `card.json`, and docs match `skill.py` output.
- [ ] Report includes `warnings[]` and `risk_level`.
- [ ] `maintenance/README.md` documents refresh instructions.
- [ ] All existing and new tests pass.

---

## Out of scope (for this RFC)
- Multi-chain screening (Solana, Bitcoin, Tron) — separate RFC.
- Full graph ML / Elliptic-style models.
- Enterprise SLA, licensing, or hosted screening service.
- Chainlink Proof of Reserve integration.

### Drawbacks

- **Maintenance burden** — More data sources means ongoing refresh work, license tracking (OpenSanctions requires a commercial license for business use), and false-positive triage (e.g. burn address flagged in TRM lists).

- **Bundle size** — Full OpenSanctions FTM is roughly 50MB. The repo may need an ETH-filtered subset committed locally, or a download-on-first-run step, which adds complexity for offline/air-gapped users.

- **API dependency risk** — Optional Chainalysis/TRM keys improve freshness but introduce rate limits, key management, and external failure modes. Offline fallback must stay the default path, not a degraded afterthought.

- **Scope creep** — ERC-20 flows, internal txs, and Etherscan pagination significantly increase API usage, implementation time, and test surface. Easy to turn a skill hardening task into a multi-month forensic platform.

- **Legal / liability perception** — Richer reports with risk tiers and exposure flags may read as authoritative compliance clearance. Constitution and agent instructions must keep reinforcing that output is informational, not legal advice.

- **Schema breaking change** — Report schema v2 may break existing agent prompts, examples, and any downstream integrations keyed on v1 field names. A deprecation period or dual-key emission adds maintenance cost.

- **Data quality variance** — Community lists (MEW darklist, research datasets) differ in rigor from official OFAC/NBCTF sources. Merging them raises false-positive risk unless severity tiers and source attribution stay explicit.

- **Chain-specific limits** — Staying Ethereum-only keeps scope manageable but leaves a gap vs multi-chain enterprise tools. Users screening cross-chain addresses may assume broader coverage than the skill provides.

- **Etherscan cost / limits** — Paginated tx + token + internal calls can burn through free-tier API quotas quickly on high-activity wallets. May need caching, caps, or paid-tier documentation.

- **No continuous monitoring** — Unlike Chainalysis/TRM enterprise products, this remains point-in-time screening. A clean result today does not detect future sanctions listing or new malicious interactions unless re-run manually.

### Progress tracker

| Phase | Scope | Issue | PR | Status |
|-------|-------|-------|----|--------|
| Quick win | Align `instructions.md` / `card.json` / docs with current `skill.py` output | — | — | Open |
| Quick win | Data quality: zero-width NBCTF addresses, duplicate mixer address in bundled JSON | — | — | Open |
| Ph. 1.1 | Unified record schema (`data/schema.json` or documented datacla… |
