[RFC]: Harden wallet_screening data pipeline, parsing, coverage, and report schema v2 #115

@rosspeili

Summary

Proposal to upgrade finance/wallet_screening from a demo-grade compliance tool into a robust, auditable Ethereum wallet screening skill suitable for agent-driven due diligence.

Scope covers four workstreams:

  1. Knowledge base expansion and refresh — ingest and normalize public sanctions/malicious-address datasets (OpenSanctions FTM, OFAC SDN crypto exports, FBI/NBCTF, Uniswap TRM, MEW darklist, research phishing datasets).
  2. Parsing and matching fixes — correct FTM publicKey handling, unified address normalization, chain-aware matching, and use of expanded lists for both direct wallet hits and transaction counterparty screening.
  3. Forensic coverage — ERC-20 / internal tx support, Etherscan pagination or explicit truncation warnings, optional hybrid live checks (Chainalysis / TRM free tiers) with offline fallback.
  4. Report schema v2 and agent contract — stable JSON output, explicit risk tiers, evidence-backed risk_factors[], aligned instructions.md / card.json / docs, and expanded test coverage.

This RFC follows a prior internal audit that identified schema drift, ineffective FTM matching (~0 ETH hits from 3,527 FTM entities), and thin malicious-contract tx coverage (6 contracts vs 543+ TRM entries).

Motivation

finance/wallet_screening is the flagship Skillware compliance skill and the most referenced example across README, usage guides, and agent-loop examples. Today it has the right architecture (manifest + instructions + deterministic Python + bundled data + maintenance scripts) but reliability gaps limit trust:

Data / coverage

  • Bundled data: ~4,413 records, ~275 unique ETH addresses across 5 JSON files — far below docs claiming "880+ bundled lists."
  • entities.ftm.json (3,527 FTM entities) is loaded but does not match ETH addresses because matching logic expects addresses / properties.address, while FTM stores wallet IDs in properties.publicKey.
  • Transaction malicious-interaction detection uses only malicious_scs_2025.json (6 contracts). normalized_uniswap_trm.json (543 entries) is used for direct wallet sanctions only, not tx flow analysis.
  • Data freshness: last normalized snapshots dated 2025-07-22; no documented refresh cadence or CI validation.

Correctness

  • Schema drift: instructions.md and card.json reference fields that do not exist in skill.py output (summary.sanctioned vs summary.sanctioned_entity_match, etc.), causing agents to misread reports.
  • Data quality: duplicate address assigned to two mixers in malicious_scs_2025.json; zero-width Unicode (\u200b) in 3 Israel NBCTF addresses breaks matching.
  • Silent API failures: Etherscan/CoinGecko errors return empty/zero values without warnings in the report.

Industry gap

  • Enterprise tools (Chainalysis Address Screening, TRM) provide direct + indirect exposure, structured identifications with source URLs, configurable severities, and continuous re-screening. Skillware should not replicate enterprise graph ML, but should adopt the reporting patterns and minimum viable exposure logic (e.g., flag interactions with known bad counterparties) within its offline-first, deterministic model.

Goal
Make the skill a credible open-source starting point for AML/sanctions due diligence agents — accurate enough to demo, extensible enough for enterprise customization, and honest about limitations.

Detailed Design

Phase 1 — Canonical data model and ingest pipeline

1.1 Unified record schema

Define skills/finance/wallet_screening/data/schema.json (or a documented Python dataclass) for all list entries.

{
  "address": "0x...",
  "chain": "ethereum",
  "category": "sanctions|mixer|scam|phishing|stolen|market|other",
  "severity": "low|medium|high|critical",
  "label": "Entity or contract name",
  "reason": "Human-readable reason",
  "jurisdiction": "US|EU|IL|...",
  "source": "OFAC|OpenSanctions|FBI|NBCTF|Uniswap-TRM|MEW|...",
  "source_url": "https://...",
  "tags": [],
  "last_updated": "ISO-8601"
}
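
If the dataclass option is chosen over schema.json, the record could look roughly like the sketch below. The class name `ListRecord`, the choice of which fields are optional, and the validation in `__post_init__` are assumptions made for illustration; only the field names and enumerations come from the schema above.

```python
from dataclasses import dataclass, field, asdict
from typing import List

# Enumerations copied from the schema above.
ALLOWED_CATEGORIES = {"sanctions", "mixer", "scam", "phishing", "stolen", "market", "other"}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class ListRecord:
    """One canonical list entry; field names mirror the JSON schema."""
    address: str
    chain: str
    category: str
    severity: str
    label: str
    reason: str
    source: str
    # Treating the remaining fields as optional is an assumption.
    jurisdiction: str = ""
    source_url: str = ""
    tags: List[str] = field(default_factory=list)
    last_updated: str = ""  # ISO-8601 timestamp

    def __post_init__(self):
        # Reject values outside the enumerations listed in the schema.
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


record = ListRecord(
    address="0x" + "ab" * 20,  # placeholder address, not a real listing
    chain="ethereum",
    category="mixer",
    severity="high",
    label="Example mixer",
    reason="Illustrative entry only",
    source="OFAC",
)
print(asdict(record)["category"])
```

`asdict()` gives importers a direct path from validated records to the JSON files under data/canonical/.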

1.2 Normalization layer

  • Centralize address normalization: strip whitespace and zero-width chars (e.g. \u200b), validate 0x + 40 hex digits, optionally warn on a failed EIP-55 checksum (checked on the original mixed-case input), then lowercase.
  • Refactor maintenance/normalization_tool.py and normalize_uniswap_trm.py to emit the canonical schema above.
  • Add new importers:
    • OpenSanctions FTM — extract CryptoWallet entities from properties.publicKey; split comma-separated multi-address strings; tag chain from currency / caption heuristics.
    • OFAC SDN crypto-only — official XML/CSV or vile/ofac-sdn-list daily JSON releases.
    • MEW darklist: addresses-darklist.json from MyEtherWallet/ethereum-lists.
    • Poison-Hunter / PTXPhish — research phishing addresses (ETH-only filter).
  • Deduplicate by (chain, address, category) with merge rules for conflicting labels.
  • Exclude known false positives (e.g. burn address 0x000...dead unless explicitly sanctioned).
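
A minimal sketch of the normalization and deduplication steps listed above, under stated assumptions: the helper names `normalize_address` and `dedupe` are hypothetical, records are plain dicts, and "first record wins" stands in for the real merge rules. The EIP-55 checksum warning is left out because it needs Keccak-256, which is not the same function as `hashlib.sha3_256` in the standard library.

```python
import re
from typing import Optional

# Zero-width and BOM characters that can hide inside copy-pasted addresses.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
_ETH_ADDRESS = re.compile(r"0x[0-9a-f]{40}")


def normalize_address(raw: str) -> Optional[str]:
    """Return the lowercase 0x-prefixed address, or None if it is not valid."""
    cleaned = raw.translate(_INVISIBLE)             # remove invisible characters
    cleaned = "".join(cleaned.split()).lower()      # remove all whitespace
    return cleaned if _ETH_ADDRESS.fullmatch(cleaned) else None


def dedupe(records):
    """Keep one record per (chain, address, category); the first one wins.

    A real merge rule for conflicting labels would go here; keeping the
    first record is a placeholder choice.
    """
    seen = {}
    for rec in records:
        addr = normalize_address(rec["address"])
        if addr is None:
            continue  # skip malformed entries instead of guessing
        key = (rec["chain"], addr, rec["category"])
        seen.setdefault(key, {**rec, "address": addr})
    return list(seen.values())
```

Running every importer and the runtime matcher through the same `normalize_address` is what would make the NBCTF \u200b entries match again.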

1.3 Storage layout

data/
  canonical/
    sanctions_ethereum.json      # merged direct-hit list
    malicious_contracts.json     # tx-interaction screening list
  sources/                       # raw/normalized per-source snapshots (optional)
  manifest.json                  # counts, last_refresh, source versions

1.4 Refresh runbook

  • Document in maintenance/README.md: download URLs, commands, expected counts, license notes (OpenSanctions data is free for non-commercial use only).
  • Optional GitHub Action (weekly): run normalizers, fail if record count drops more than 10% without review.
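
The record-count guard for the weekly action could be as small as the following sketch. The function name and the shape of the count dicts (list name mapped to record count, as manifest.json might store them) are assumptions.

```python
def check_record_counts(previous: dict, current: dict, max_drop: float = 0.10) -> list:
    """Return a list of failure messages; an empty list means the refresh passes.

    `previous` and `current` map a list name to its record count, for
    example the counts stored in manifest.json before and after a refresh.
    """
    failures = []
    for name, old_count in previous.items():
        new_count = current.get(name, 0)  # a list that disappeared counts as 0
        if old_count > 0 and (old_count - new_count) / old_count > max_drop:
            failures.append(f"{name}: {old_count} -> {new_count} records")
    return failures


failures = check_record_counts(
    {"sanctions_ethereum": 1000, "malicious_contracts": 500},
    {"sanctions_ethereum": 880, "malicious_contracts": 495},
)
print(failures)  # a CI step could exit non-zero when this list is non-empty
```

Here the 12% drop in the first list is reported while the 1% drop in the second passes.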

Phase 2 — Matching and forensic engine

2.1 Sanctions matching

  • Fix _check_against_sanctions to parse FTM properties.publicKey (list values and comma-separated strings).
  • Filter by chain == ethereum before match.
  • Build an in-memory index at init: address_lower -> List[Record] for O(1) lookup.
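
The publicKey fix and the index could be sketched as follows. The entity shape (a dict with `properties.publicKey` holding a list or a string) follows the description in this RFC; the function names are hypothetical. Note that a 0x + 40 hex pattern accepts any EVM-style address, so the explicit chain filter would still be needed on top of it.

```python
import re
from collections import defaultdict

_ETH_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def extract_eth_addresses(entity: dict) -> list:
    """Pull Ethereum-style addresses out of an FTM entity's properties.publicKey.

    Handles both list values and comma-separated strings.
    """
    raw = entity.get("properties", {}).get("publicKey", [])
    if isinstance(raw, str):
        raw = [raw]
    addresses = []
    for value in raw:
        for part in value.split(","):
            part = part.strip()
            if _ETH_ADDRESS.fullmatch(part):
                addresses.append(part.lower())
    return addresses


def build_index(entities: list) -> dict:
    """Map address_lower -> list of entities for constant-time lookup."""
    index = defaultdict(list)
    for entity in entities:
        for address in extract_eth_addresses(entity):
            index[address].append(entity)
    return dict(index)
```

A lookup then reduces to `index.get(wallet.lower(), [])`, which also returns every entity sharing the address instead of only the first match.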

2.2 Transaction analysis

  • Merge malicious_contracts.json plus relevant TRM/scam entries into a single risk_address_index for tx screening.
  • Extend Etherscan calls:
    • txlist (existing) with pagination
    • tokentx for ERC-20 flows
    • txlistinternal for internal transfers
  • Flag indirect exposure: top counterparties that appear in the sanctions/malicious index (lightweight Chainalysis-style exposure).
  • Surface coverage metadata: tx_analyzed, tx_total_reported_by_etherscan, truncated (boolean).
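
One way to combine pagination, the truncation flag, and counterparty screening is sketched below. `fetch_page` is a hypothetical callable wrapping a single Etherscan request, injected so the logic can be tested without network access; the function names and the page cap are assumptions. tx_total_reported_by_etherscan is omitted because it depends on what the API actually reports.

```python
from typing import Callable, List


def collect_transactions(fetch_page: Callable[[int, int], List[dict]],
                         page_size: int = 1000, max_pages: int = 5) -> dict:
    """Fetch pages until one comes back short or the page cap is reached.

    `fetch_page(page, offset)` is a hypothetical wrapper around one request
    (page is 1-based, offset is the page size).
    """
    transactions = []
    truncated = False
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        transactions.extend(batch)
        if len(batch) < page_size:
            break  # short page: no more history to fetch
    else:
        # Every page was full, so more history may exist beyond the cap.
        truncated = True
    warnings = ["etherscan_txlist_truncated"] if truncated else []
    return {"transactions": transactions, "tx_analyzed": len(transactions),
            "truncated": truncated, "warnings": warnings}


def count_risky_counterparties(transactions, wallet: str, risk_index: dict) -> dict:
    """Count interactions with counterparties found in the risk index."""
    wallet = wallet.lower()
    hits = {}
    for tx in transactions:
        for side in ("from", "to"):
            other = (tx.get(side) or "").lower()
            if other and other != wallet and other in risk_index:
                hits[other] = hits.get(other, 0) + 1
    return hits
```

The truncation check is deliberately conservative: a history that ends exactly on the last allowed page is still flagged, since the code cannot tell it apart from a longer one.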

2.3 Optional hybrid live checks (manifest opt-in)

Add optional env vars:

  • CHAINALYSIS_API_KEY — GET https://public.chainalysis.com/api/v1/address/{addr}
  • TRM_SANCTIONS_API_KEY — POST TRM public screening endpoint

Behavior: run offline checks first; if keys are present, cross-check and merge into risk_details.live_verification[]. Never block offline mode when APIs are unavailable.
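
A sketch of the "never block offline mode" behavior, assuming the HTTP call lives in an injected callable. `chainalysis_lookup`, the shape of its return value, and the warning string are hypothetical; only the env var name and the `live_verification` field come from this RFC.

```python
import os
from typing import Callable, Optional


def run_live_checks(address: str, report: dict,
                    chainalysis_lookup: Optional[Callable[[str, str], list]] = None) -> dict:
    """Append live results to risk_details.live_verification without ever
    failing the offline report.

    `chainalysis_lookup(address, api_key)` is a hypothetical callable that
    performs the HTTP request and returns a list of identifications.
    """
    live = report.setdefault("risk_details", {}).setdefault("live_verification", [])
    warnings = report.setdefault("metadata", {}).setdefault("warnings", [])
    api_key = os.environ.get("CHAINALYSIS_API_KEY")
    if not api_key or chainalysis_lookup is None:
        return report  # offline mode: nothing to add, nothing to warn about
    try:
        identifications = chainalysis_lookup(address, api_key)
        live.append({"provider": "chainalysis", "identifications": identifications})
    except Exception as exc:  # network errors, rate limits, bad responses
        warnings.append(f"chainalysis_live_check_failed: {exc}")
    return report
```

A failed live check ends up in metadata.warnings, so the agent can see that the cross-check was attempted but is missing from the result.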


Phase 3 — Report schema v2

3.1 Top-level structure

{
  "schema_version": "2.0",
  "metadata": {
    "screening_time": "...",
    "wallet_address": "0x...",
    "chain": "ethereum",
    "data_sources": [
      {"name": "...", "record_count": 0, "last_updated": "..."}
    ],
    "warnings": ["etherscan_txlist_truncated", "coingecko_price_unavailable"]
  },
  "summary": {
    "risk_level": "low|medium|high|critical",
    "sanctioned": false,
    "sanctioned_entity_match": false,
    "malicious_interaction_count": 0,
    "indirect_exposure_count": 0,
    "balance_eth": 0.0,
    "balance_usd": 0.0,
    "total_transactions_analyzed": 0
  },
  "financial_analysis": {},
  "risk_details": {
    "sanctions_hits": [],
    "malicious_interactions": [],
    "indirect_exposures": [],
    "live_verification": []
  },
  "network_analysis": {
    "most_interacted_wallet": ["0x...", 45],
    "top_counterparties": [],
    "risky_counterparties": []
  },
  "risk_factors": [
    {
      "code": "SANCTIONS_DIRECT",
      "severity": "critical",
      "evidence": "...",
      "source": "OFAC"
    }
  ]
}

Note: financial_analysis keeps existing v1 fields (value in/out, gas, PnL in ETH/USD/EUR) plus optional token-flow fields added in Phase 2.
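
A required-keys check for the structure above could be sketched like this. Which keys count as required is an assumption here (the RFC does not yet pin that down), and `missing_keys` is a hypothetical helper name.

```python
# Section name -> keys assumed to be required; "" is the top level.
REQUIRED_KEYS = {
    "": ["schema_version", "metadata", "summary", "financial_analysis",
         "risk_details", "network_analysis", "risk_factors"],
    "metadata": ["screening_time", "wallet_address", "chain",
                 "data_sources", "warnings"],
    "summary": ["risk_level", "sanctioned", "malicious_interaction_count",
                "indirect_exposure_count", "total_transactions_analyzed"],
    "risk_details": ["sanctions_hits", "malicious_interactions",
                     "indirect_exposures", "live_verification"],
}


def missing_keys(report: dict) -> list:
    """Return dotted paths of required keys that are absent from the report."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        container = report if section == "" else report.get(section, {})
        prefix = f"{section}." if section else ""
        missing.extend(f"{prefix}{key}" for key in keys if key not in container)
    return missing
```

The same table could back both a unit test on execute() output and a CI check that instructions.md and card.json only reference keys that exist.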

3.2 Risk level rules (deterministic)

| Level | Conditions |
|---|---|
| critical | Direct sanctions hit on the screened wallet |
| high | Malicious contract interaction OR indirect exposure to a sanctioned entity |
| medium | Interaction with scam/phishing-labeled address; no direct sanctions |
| low | No hits; optionally note high-volume mixer-adjacent patterns in risk_factors |

Implementation sketch in skill.py:

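def compute_risk_level(sanctions_hits, malicious_interactions, indirect_exposures):
    # Rules are evaluated from most to least severe; the first match wins.
    if sanctions_hits:
        return "critical"
    # Scam/phishing-labeled interactions are medium; every other malicious
    # category is treated as high (assumption based on the table above).
    medium_categories = ("scam", "phishing")
    severe = [i for i in malicious_interactions
              if i.get("category") not in medium_categories]
    if severe or indirect_exposures:
        return "high"
    if malicious_interactions:
        return "medium"
    return "low"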
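
The evidence-backed risk_factors[] entries from 3.1 could be derived from the same three inputs. This is a sketch under assumptions: only the `SANCTIONS_DIRECT` code appears in this RFC, the other two codes are hypothetical, and each hit is assumed to be a dict carrying `address` plus optional `label`, `source`, and `category`.

```python
MEDIUM_CATEGORIES = {"scam", "phishing"}


def build_risk_factors(sanctions_hits, malicious_interactions, indirect_exposures):
    """Turn raw hits into risk_factors[] entries (code, severity, evidence, source)."""
    factors = []

    def add(code, severity, hit):
        factors.append({
            "code": code,
            "severity": severity,
            "evidence": f"{hit.get('label', 'unlabeled')} ({hit['address']})",
            "source": hit.get("source", "unknown"),
        })

    for hit in sanctions_hits:
        add("SANCTIONS_DIRECT", "critical", hit)
    for hit in malicious_interactions:
        # Same split as the risk level rules: scam/phishing is medium.
        severity = "medium" if hit.get("category") in MEDIUM_CATEGORIES else "high"
        add("MALICIOUS_INTERACTION", severity, hit)  # hypothetical code name
    for hit in indirect_exposures:
        add("INDIRECT_EXPOSURE", "high", hit)  # hypothetical code name
    return factors
```

Deriving both the level and the factors from the same inputs keeps the summary and its evidence from drifting apart.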

3.3 Agent contract alignment

  • Update instructions.md to reference v2 fields only.
  • Update card.json UI schema keys to match v2 summary fields.
  • Update docs/skills/wallet_screening.md with accurate data counts and limitations.
  • Optionally emit v1 aliases for one release (summary.sanctioned mirrors summary.sanctioned_entity_match) then remove in v2.1.

Phase 4 — Tests and acceptance criteria

Tests to add

  • FTM publicKey ETH match using a known sanctioned test vector from OFAC export.
  • Unicode / zero-width address normalization (NBCTF \u200b case).
  • Malicious interaction detected for Tornado Cash router address.
  • Indirect exposure when a counterparty is in the sanctions index.
  • Truncated tx history emits a warning in metadata.warnings.
  • Schema v2 required keys present in every successful execute() response.
  • Optional: mocked Chainalysis API merge into live_verification.

Definition of done

  • ETH match rate from FTM is greater than 0 (baseline today: 0).
  • Malicious tx index includes TRM + MEW + curated contracts (at least 100 ETH addresses).
  • instructions.md, card.json, and docs match skill.py output.
  • Report includes warnings[] and risk_level.
  • maintenance/README.md documents refresh instructions.
  • All existing and new tests pass.

Out of scope (for this RFC)

  • Multi-chain screening (Solana, Bitcoin, Tron) — separate RFC.
  • Full graph ML / Elliptic-style models.
  • Enterprise SLA, licensing, or hosted screening service.
  • Chainlink Proof of Reserve integration.

Drawbacks

  • Maintenance burden — More data sources means ongoing refresh work, license tracking (OpenSanctions requires a commercial license for business use), and false-positive triage (e.g. burn address flagged in TRM lists).

  • Bundle size — Full OpenSanctions FTM is roughly 50MB. The repo may need an ETH-filtered subset committed locally, or a download-on-first-run step, which adds complexity for offline/air-gapped users.

  • API dependency risk — Optional Chainalysis/TRM keys improve freshness but introduce rate limits, key management, and external failure modes. Offline fallback must stay the default path, not a degraded afterthought.

  • Scope creep — ERC-20 flows, internal txs, and Etherscan pagination significantly increase API usage, implementation time, and test surface. Easy to turn a skill hardening task into a multi-month forensic platform.

  • Legal / liability perception — Richer reports with risk tiers and exposure flags may read as authoritative compliance clearance. Constitution and agent instructions must keep reinforcing that output is informational, not legal advice.

  • Schema breaking change — Report schema v2 may break existing agent prompts, examples, and any downstream integrations keyed on v1 field names. A deprecation period or dual-key emission adds maintenance cost.

  • Data quality variance — Community lists (MEW darklist, research datasets) differ in rigor from official OFAC/NBCTF sources. Merging them raises false-positive risk unless severity tiers and source attribution stay explicit.

  • Chain-specific limits — Staying Ethereum-only keeps scope manageable but leaves a gap vs multi-chain enterprise tools. Users screening cross-chain addresses may assume broader coverage than the skill provides.

  • Etherscan cost / limits — Paginated tx + token + internal calls can burn through free-tier API quotas quickly on high-activity wallets. May need caching, caps, or paid-tier documentation.

  • No continuous monitoring — Unlike Chainalysis/TRM enterprise products, this remains point-in-time screening. A clean result today does not detect future sanctions listing or new malicious interactions unless re-run manually.

Labels

  • discussion: Open discussion for RFCs and proposals.
  • enhancement: New feature or request
  • skill request: Request for a new capability to be added.