Compliance ANalysis and Automated Regulatory Yield
CANARY wears two hats:
- Host for the seal security demo. It runs a real multi-stage LLM pipeline in front of seal, a verified MCP approval-gate sidecar, and proves a destructive vault write dies at a gate the model cannot influence. Start here: Security Demo: seal x Canary.
- A self-teaching LangGraph compliance pipeline. It monitors financial regulation across 5 jurisdictions, extracts compliance obligations, and mechanically verifies every citation against the source text. The pipeline, the verification engine, and the 487 real extraction artifacts are the substance. See The Canary Pipeline.
seal is a verified MCP approval-gate sidecar. This repo hosts its end-to-end demo: Canary, a genuine LLM pipeline that writes to an Obsidian vault, is run with seal sitting in front of the vault's MCP server. The demo proves that a destructive write is gated at a verified boundary the model cannot influence.
The demo runs fully offline: no ANTHROPIC_API_KEY, no network. The regulation corpus is frozen on disk (demo/corpus) and the extraction step is replayed from a fixture, so every run is deterministic.
Honest claim: a default-deny gate blocks the destructive action at a verified boundary the model cannot influence, and every allowed action is explicitly approved. It does not claim prompt-injection prevention. The model can still be fooled; the demo shows the action dies regardless. Full storyboard and proof shots: demo/DEMO.md.
The demo has two scenarios, selected by the SEAL_SCENARIO environment variable (or SCENARIO= via the helper script):
- P3 kill/restore (default, no env). Canary runs through seal; the legitimate report note/create is approved and lands. Then the destructive note/delete is attempted twice: without seal it deletes the file, and with seal it is blocked, the file surviving byte-identical.
- Approval lifecycle (SEAL_SCENARIO=lifecycle). The write (note/create) is approved and lands; the delete (note/delete) is blocked while no approval is present, then allowed once a trusted approval is written. Detail below.
Both scenarios share the same policy sandbox, the same /out artifact mount, and the same colour-coded output.
The image bundles all three repos (seal + flywheel-memory + canary) and builds them itself, so the only host dependency is Docker. From a clean WSL2 box:
1. Install Docker. Either Docker Desktop with WSL integration enabled, or the native engine in the distro:
curl -fsSL https://get.docker.com | sh # let the 20s timer run, do NOT Ctrl+C
sudo usermod -aG docker "$USER"
newgrp docker # activate the group without a WSL restart
sudo service docker start
docker run --rm hello-world # smoke test
2. Fix DNS if clones fail. If curl or the in-container git clone fails with Could not resolve host, a VPN (e.g. Tailscale) is hijacking resolution. Add a working resolver and make it stick:
echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf
printf '[network]\ngenerateResolvConf = false\n' | sudo tee -a /etc/wsl.conf
The Docker daemon inherits host DNS, so the in-container clones need this too.
3. Get the repo and build. The first build is slow: it cold-compiles the Lean core.
git clone https://github.com/velvetmonkey/canary
cd canary
docker build -t seal-canary-demo .
4. Run it, with the artifacts mounted to the host.
docker run --rm -v "$(pwd)/demo-out:/out" seal-canary-demo
ls demo-out/ # P3-REPORT.md, vault-canary/, demo-policy.json, approvals.ndjson, poisoned-corpus/
demo-out/ holds the full disposable workspace: the generated report, the demo vault, the active policy, the approvals control file, and the poisoned corpus.
Why the /out mount rather than binding the workspace directly: the runner rebuilds its workspace at /tmp/seal-demo-p3 inside the container and wipes it (rmtree) on each start, so that path cannot be bind-mounted. The entrypoint copies the workspace to /out on exit instead, so it survives --rm. Without a mount the demo still runs and prints the report to stdout; artifacts are discarded on exit.
The policy is not baked into a rebuild. Mount your own at run time via SEAL_POLICY and iterate with zero rebuilds. The helper script wraps build-if-needed + run + mount + report tail:
demo/run-demo.sh # P3 kill/restore, baked default policy
demo/run-demo.sh my-policy.json # your policy, no rebuild
SCENARIO=lifecycle demo/run-demo.sh # approval-lifecycle scenario
SEAL_EXTRA_APPROVALS=more.ndjson demo/run-demo.sh my-policy.json
FORCE_BUILD=1 demo/run-demo.sh # force an image rebuild
By hand (equivalent to the custom-policy line above):
docker run --rm \
-v "$(pwd)/my-policy.json:/cfg/policy.json:ro" -e SEAL_POLICY=/cfg/policy.json \
-v "$(pwd)/demo-out:/out" seal-canary-demo
Output is colour-coded so the streams can be told apart at a glance:
- Runner narration is bright and tinted per source: Canary pipeline = yellow, seal gate = cyan, with green/red/bold for allowed/blocked/verdict.
- Server stderr is dimmed so it recedes: the flywheel-memory server = dim cyan, seal's own logs = dim magenta. (The noisy flywheel startup chatter is the dim cyan block.)
Colour is on when stdout is a TTY or when FORCE_COLOR=1 is set (the helper sets it); NO_COLOR=1 disables it.
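The colour rule above fits in a few lines. This is a minimal sketch rather than the runner's actual code: the function names are hypothetical, and giving NO_COLOR precedence when both variables are set is an assumption the text does not confirm.

```python
import os
import sys


def colour_enabled(env=None, is_tty=None):
    """Decide whether to emit ANSI colour, following the rule described above."""
    env = os.environ if env is None else env
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    # Assumption: NO_COLOR=1 wins when both variables are set.
    if env.get("NO_COLOR") == "1":
        return False
    return is_tty or env.get("FORCE_COLOR") == "1"


def paint(text, code, enabled):
    """Wrap text in an ANSI SGR sequence, or return it untouched."""
    return f"\033[{code}m{text}\033[0m" if enabled else text


# 31 is the standard ANSI code for red, used here for a blocked action.
print(paint("seal gate: blocked", "31", colour_enabled()))
```

Passing env and is_tty explicitly keeps the decision testable without touching the real terminal.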
A policy is JSON: an approval block (ttl_seconds; the control file is forced to the workspace automatically) and a list of tools. Each tool rule has a mode, a match rule, and a capability target.
seal v1 has two modes only: guarded and deny. There is no bare allow mode. Anything not matched by a guarded rule is denied by default, so "allowed" always means guarded and carrying a valid approval in the control file. That is the honest security claim, not a loophole.
seal also matches rules by tool name, first match wins. Two rules for the same tool name shadow each other; to gate two actions of one tool, use a single rule and derive the target from an argument (see the lifecycle scenario).
The baked default guards note/create:
{
"approval": { "ttl_seconds": 120 },
"tools": [
{ "name": "note", "mode": "guarded",
"match": { "type": "contains_any_ci", "arg": "action", "needles": ["create"] },
"target": [ {"literal": "flywheel"}, {"literal": "note"}, {"literal": "create"} ] }
]
}
Try flipping mode to deny, gating delete instead of create, or changing ttl_seconds. The verdict (PASS/FAIL) and the full trace land in demo-out/P3-REPORT.md. A policy that denies or allows the probe outright is handled cleanly: the runner skips the approval seed and reports what the policy actually did, rather than erroring.
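To make the evaluation order concrete, here is a rough Python model of how a policy like the one above could be applied to a single tool call. It is a sketch of the rules as described in this section, not seal's implementation: the decide function, the in-memory approvals mapping, and the choice to key approvals by the literal target tuple are all assumptions for illustration.

```python
import time

POLICY = {
    "approval": {"ttl_seconds": 120},
    "tools": [
        {"name": "note", "mode": "guarded",
         "match": {"type": "contains_any_ci", "arg": "action", "needles": ["create"]},
         "target": [{"literal": "flywheel"}, {"literal": "note"}, {"literal": "create"}]},
    ],
}


def decide(policy, tool, args, approvals, now=None):
    """Return 'allow' or 'deny' for one tool call (hypothetical model).

    approvals maps a target tuple to the time its approval was written.
    Only the contains_any_ci match type is modelled.
    """
    now = time.time() if now is None else now
    # The first rule with this tool name wins; later rules are shadowed.
    rule = next((r for r in policy["tools"] if r["name"] == tool), None)
    if rule is None or rule["mode"] == "deny":
        return "deny"  # default-deny, or an explicit deny rule
    match = rule["match"]
    value = str(args.get(match["arg"], "")).lower()
    if not any(needle.lower() in value for needle in match["needles"]):
        return "deny"  # the guarded rule did not match, so default-deny applies
    target = tuple(part["literal"] for part in rule["target"])
    written = approvals.get(target)
    if written is None or now - written > policy["approval"]["ttl_seconds"]:
        return "deny"  # guarded, but no live approval
    return "allow"


approvals = {("flywheel", "note", "create"): 1000.0}
print(decide(POLICY, "note", {"action": "create"}, approvals, now=1060.0))
print(decide(POLICY, "note", {"action": "delete"}, approvals, now=1060.0))
print(decide(POLICY, "note", {"action": "create"}, approvals, now=1200.0))
```

In this model there is no code path that returns allow without a live approval, which mirrors the "no bare allow mode" claim.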
SEAL_EXTRA_APPROVALS points at a file of newline-delimited approval records ({"target": "<digits>"} per line) appended on top of the auto-seeded create approval, for pre-approving extra targets you have gated.
SCENARIO=lifecycle (env SEAL_SCENARIO=lifecycle) runs the approval-lifecycle scenario instead of kill/restore:
- Write (note/create) is approved and lands.
- Delete (note/delete) is attempted with no approval and is blocked; the note survives.
- A trusted approval for the delete is then written, and the same delete is attempted again and succeeds.
It uses a single note rule guarding both actions, with the approval target derived from the action arg, so create and delete carry distinct approval tokens: approving one never approves the other.
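The lifecycle can be simulated with a throwaway control file. The sketch below assumes the newline-delimited {"target": ...} record shape mentioned earlier; how seal actually derives the target digits is not documented here, so target_token is a hypothetical stand-in, and the TTL check is left out for brevity.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def target_token(tool, action):
    """Hypothetical token: digits derived from the tool name and the action arg."""
    digest = hashlib.sha256(f"flywheel/{tool}/{action}".encode()).hexdigest()
    return str(int(digest[:12], 16))


def approve(control_file, token):
    """Append one newline-delimited approval record to the control file."""
    with control_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"target": token}) + "\n")


def is_approved(control_file, token):
    """True if any record in the control file carries this token."""
    if not control_file.exists():
        return False
    lines = control_file.read_text(encoding="utf-8").splitlines()
    return any(json.loads(line)["target"] == token for line in lines if line.strip())


with tempfile.TemporaryDirectory() as tmp:
    control = Path(tmp) / "approvals.ndjson"
    create = target_token("note", "create")
    delete = target_token("note", "delete")
    approve(control, create)
    steps = [
        ("create", is_approved(control, create)),                # approved, lands
        ("delete (no approval)", is_approved(control, delete)),  # blocked
    ]
    approve(control, delete)                                     # trusted approval written
    steps.append(("delete (approved)", is_approved(control, delete)))

for label, allowed in steps:
    print(f"{label}: {'allowed' if allowed else 'blocked'}")
```

Because the token depends on the action argument, the create approval never satisfies the delete check.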
If you have the toolchain (the built seal binary, Node, and the Flywheel MCP server as sibling repos, plus uv), you can skip Docker:
uv run python demo/run_p3.py # P3 kill/restore
SEAL_SCENARIO=lifecycle uv run python demo/run_p3.py
The runner discovers the dependencies via the sibling repo layout, or via the SEAL_BIN, NODE_BIN and FLYWHEEL_SERVER environment overrides.
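An override-then-fallback lookup of this kind usually looks like the sketch below. The three variable names come from the text; the default sibling paths are placeholders, since the real repo layout is not spelled out here, and resolve is a hypothetical helper rather than the runner's function.

```python
import os
from pathlib import Path

# Placeholder sibling-repo locations; the demo's real layout may differ.
SIBLING_DEFAULTS = {
    "SEAL_BIN": Path("..") / "seal" / "seal",
    "NODE_BIN": Path("node"),
    "FLYWHEEL_SERVER": Path("..") / "flywheel-memory" / "server.js",
}


def resolve(name, env=None):
    """Return the environment override if set and non-empty, else the sibling default."""
    env = os.environ if env is None else env
    override = env.get(name)
    return Path(override) if override else SIBLING_DEFAULTS[name]


for name in SIBLING_DEFAULTS:
    print(f"{name} -> {resolve(name)}")
```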
seal itself is a single native binary with no Docker dependency. The container exists only to pull the multi-repo demo together reproducibly. Adoption is a one-line host config change, not a container.
About this project. CANARY is a self-teaching exercise in building a LangGraph/LangChain processing graph. The goal was to learn how to wire a multi-stage LLM pipeline with typed state, conditional edges, structured extraction, and mechanical verification, applied to a real domain (ESG regulatory compliance) rather than a toy problem. The pipeline, the verification engine, and the 487 real extraction artifacts are the interesting parts.
A LangGraph pipeline that monitors financial regulation across 5 jurisdictions: fetches changes, extracts compliance obligations, mechanically verifies every citation, and delivers audit-ready reports.
EUR-Lex/UK Legislation/GovInfo → fetch → detect  → extract  → verify    → report   → vault
                                 httpx   SHA-256   Claude     substring   markdown   Flywheel
                                 retry   difflib   Pydantic   matching    YAML FM    MCP
- Fetch regulation text from 5 government legal databases (EUR-Lex, UK Legislation, GovInfo, NZ Legislation, Irish Statute Book)
- Detect changes via SHA-256 hash comparison — deterministic, zero false positives
- Extract structured obligations using Claude with Pydantic schema enforcement — not free-text, not JSON-maybe
- Verify every extracted citation against the source text using 5 matching strategies with full Unicode normalization
- Report as structured markdown with YAML frontmatter, ready for compliance triage
- Store in an Obsidian vault via MCP, with full SQLite audit trail
The key LangGraph insight: a conditional edge after detect_change skips the LLM entirely when nothing changed. Most runs cost nothing. This makes hourly monitoring of 14 sources economically viable (~$0.05/run, ~15s total).
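The detect-then-skip logic can be illustrated with the standard library alone. This is a sketch of the idea, not CANARY's node code: route_after_detect stands in for the conditional edge, and the "end" label is a hypothetical name for the branch that bypasses the LLM.

```python
import difflib
import hashlib


def fingerprint(text):
    """SHA-256 of the document text; identical text always hashes identically."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_change(previous_text, current_text):
    """Return (changed, unified_diff_lines) for two versions of a document."""
    if fingerprint(previous_text) == fingerprint(current_text):
        return False, []
    diff = difflib.unified_diff(
        previous_text.splitlines(),
        current_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return True, list(diff)


def route_after_detect(changed):
    """Stand-in for the conditional edge: only changed documents reach extraction."""
    return "extract" if changed else "end"


changed, diff = detect_change("Article 1\nold wording", "Article 1\nnew wording")
print(route_after_detect(changed))
print("\n".join(diff))
```

In LangGraph itself, a router like this would be the function passed to the graph's conditional-edge registration, so the extraction node is never invoked for unchanged sources.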
LLMs make things up. Ask Claude to quote a regulation and it might give you something that sounds right but doesn't actually appear in the document. In a compliance context, that's dangerous.
CANARY checks every single quote. After Claude extracts an obligation and provides a supporting quote, CANARY takes that quote and searches for it in the actual published regulation text. Not with AI — with a straight substring match. Either the quote is in the document or it isn't.
- [verified]: the quote was found word-for-word in the real regulation. You can open the source URL and Ctrl+F it yourself.
- [unverified]: the quote wasn't found. It gets flagged for human review. It is never silently accepted.
This is harder than it sounds. Legal databases use different quote characters (“ vs "), invisible Unicode spaces, footnote markers jammed into the text, and ligatures from PDF conversion. A naive string search fails on perfectly correct quotes. So CANARY normalizes both the quote and the source text first (smart quotes → ASCII, dashes → hyphens, invisible characters stripped, whitespace collapsed) and tries 5 matching strategies before giving up. When all 5 fail, a re-quote pipeline asks Claude to find the exact passage again and re-verifies.
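A minimal sketch of the normalize-then-substring idea follows. It covers only the normalization steps named above plus one exact-match check; the five matching strategies and footnote stripping are not reproduced, and the function names and the exact character table are assumptions rather than CANARY's own.

```python
import re
import unicodedata

# Characters folded or stripped before matching (illustrative, not exhaustive).
_TRANSLATE = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u200b": None, "\u200c": None, "\u200d": None, "\ufeff": None,  # zero-width chars
})


def normalize(text):
    """Fold typographic variants so equivalent passages compare equal."""
    text = unicodedata.normalize("NFKC", text)  # folds ligatures and no-break spaces
    text = text.translate(_TRANSLATE)
    return re.sub(r"\s+", " ", text).strip()


def is_verified(quote, source):
    """True if the normalized quote appears verbatim in the normalized source."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(source)


source = "The \u201cfinancial\u00a0product\u201d shall be classi\ufb01ed \u2013 without delay."
quote = 'The "financial product" shall be classified - without delay.'
print(is_verified(quote, source))
print(is_verified("shall be certified", source))
```

The first check succeeds only because both sides are normalized; a raw substring test on the same inputs would fail on the curly quotes, the no-break space, the ligature, and the en dash.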
Result: 431 of 487 citations (89%) verified across 13 regulations and 5 jurisdictions. See Citation Verification for the full technical detail.
Each objective is a self-contained markdown file — structured frontmatter for filtering, a plain-English obligation breakdown, and a verified legal quote you can trace back to the source:
---
type: compliance-objective
regulation: Regulation (EU) 2019/2088 (SFDR)
article: "Article 8(1)"
obligation_type: disclosure
materiality: high
citation: verified # <-- mechanically checked
source_url: https://eur-lex.europa.eu/... # <-- go verify it yourself
canary_run_id: obj-fb4c37ee3772
---
# Article 8(1): Pre-contractual disclosure for Article 8 products
## Obligation                                   <-- Claude's plain-English summary
**Who:** Financial market participants
**What:** For each financial product that       <-- Claude reads the legal text and
promotes environmental or social                    explains what it actually means
characteristics (Article 8 fund), include           in normal words
in pre-contractual disclosures: (a) info
on how those characteristics are met...
**Where:** In pre-contractual disclosures
**Deadline:** 10 March 2021
**Materiality:** high
## Legal Basis                                  <-- the proof
> Where a financial product promotes,           <-- this is copied verbatim from the
> among other characteristics,                      real regulation on EUR-Lex.
> environmental or social characteristics,          CANARY searched the actual
> or a combination of those                         published document and confirmed
> characteristics, provided that the                this exact text exists there.
> companies in which the investments                That's what [verified] means.
> are made follow good governance
> practices, the information to be
> disclosed pursuant to Article 6(1)
> and (3) shall include the following...
*Article 8(1), Regulation (EU) 2019/2088 (SFDR)* [verified]
So in each objective: the Obligation section is Claude explaining the law in plain English. The Legal Basis section is the actual verbatim quote from the published regulation that backs it up, and [verified] means CANARY confirmed that quote really exists in the source document.
487 objectives extracted from 13 regulations. Browse them: output/ | Guide: output/README.md
| Aspect | What CANARY does | Why it matters |
|---|---|---|
| Citation verification | 5-strategy mechanical matching with NFKC normalization, smart quote folding, dash normalization, footnote stripping | Every [verified] quote is provably present in the source document. 89% verified rate across 487 citations. |
| Change detection | SHA-256 hashing — identical documents always hash identically | Zero false positives. No probabilistic thresholds. |
| Structured extraction | Pydantic schema enforcement on Claude output | No malformed JSON, no missing fields, no hallucinated enums. Downstream code receives schema-valid objects. |
| Conditional execution | LangGraph conditional edge skips LLM when nothing changed | Most runs consume zero tokens. Economically viable at hourly frequency. |
| Audit trail | SQLite run_log + source_check_log + change_log, timestamped | Answers "when did you become aware of this change?" with forensic precision. |
| Re-quote pipeline | Auto-repairs failed citations by asking Claude to find the exact passage | Unverified quotes are never silently accepted — flagged or repaired. |
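To show how the audit tables can answer the "when did you become aware" question, here is a small sqlite3 sketch. The three table names come from the row above; every column, the record_check helper, and the sample hashes are illustrative guesses, not the project's real schema.

```python
import sqlite3
from datetime import datetime, timezone

# Table names come from the text above; every column is an illustrative guess.
SCHEMA = """
CREATE TABLE run_log (run_id TEXT PRIMARY KEY, started_at TEXT NOT NULL);
CREATE TABLE source_check_log (run_id TEXT, source_id TEXT, content_hash TEXT, checked_at TEXT);
CREATE TABLE change_log (run_id TEXT, source_id TEXT, old_hash TEXT, new_hash TEXT, detected_at TEXT);
"""


def utc_now():
    return datetime.now(timezone.utc).isoformat()


def record_check(conn, run_id, source_id, new_hash):
    """Log one source check, plus a change row if the hash moved. Returns True on change."""
    conn.execute("INSERT OR IGNORE INTO run_log VALUES (?, ?)", (run_id, utc_now()))
    previous = conn.execute(
        "SELECT content_hash FROM source_check_log WHERE source_id = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (source_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO source_check_log VALUES (?, ?, ?, ?)",
        (run_id, source_id, new_hash, utc_now()),
    )
    changed = previous is not None and previous[0] != new_hash
    if changed:
        conn.execute(
            "INSERT INTO change_log VALUES (?, ?, ?, ?, ?)",
            (run_id, source_id, previous[0], new_hash, utc_now()),
        )
    return changed


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
results = [
    record_check(conn, "run-1", "SFDR-L1", "aaa"),  # first sighting, nothing to compare
    record_check(conn, "run-2", "SFDR-L1", "aaa"),  # unchanged
    record_check(conn, "run-3", "SFDR-L1", "bbb"),  # hash moved
]
first_aware = conn.execute(
    "SELECT run_id, detected_at FROM change_log WHERE source_id = ? "
    "ORDER BY detected_at LIMIT 1",
    ("SFDR-L1",),
).fetchone()
print(results, first_aware[0])
```

Logging every check, not only the changes, is what lets the trail also show when a source was last confirmed unchanged.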
| Source | Regulation | Objectives | Verified |
|---|---|---|---|
| SFDR-L1 | SFDR Level 1: Reg (EU) 2019/2088 | 35 | 34 |
| SFDR-RTS | SFDR RTS: Delegated Reg (EU) 2022/1288 | 112 | 100 |
| SFDR-2-PROPOSAL | SFDR 2.0 Amendment Proposal (Nov 2025) | 123 | 88 |
| EU-TAXONOMY | EU Taxonomy Regulation (EU) 2020/852 | 54 | 54 |
| MIFID-SUSTAINABILITY | MiFID II Sustainability Preferences | 44 | 43 |
| UK-FSA-2023 | UK Financial Services and Markets Act 2023 | 5 | 5 |
| UK-TCFD-REGS | UK Climate-Related Financial Disclosures Regs 2022 | 10 | 10 |
| UK-SDR-REGS | UK Sustainability Disclosure Requirements Regs 2023 | 10 | 10 |
| UK-CLIMATE-CHANGE | UK Climate Change Act 2008 | 10 | 10 |
| UK-ENV-ACT | UK Environment Act 2021 | 10 | 6 |
| US-SOX | US Sarbanes-Oxley Act 2002 | 44 | 41 |
| NZ-FMC-ACT | NZ Financial Markets Conduct Act 2013 | 20 | 20 |
| IE-COMPANIES-ACT | Irish Companies Act 2014 | 10 | 10 |
| Total | | 487 | 431 (89%) |
Data sources: EUR-Lex (EU), legislation.gov.uk (UK), GovInfo (US), legislation.govt.nz (NZ), irishstatutebook.ie (IE). Full config: config/sources.yaml.
git clone git@github.com:velvetmonkey/canary.git && cd canary
uv sync
cp .env.example .env # add ANTHROPIC_API_KEY
uv run canary # change detection (all sources)
uv run canary extract-objectives --source SFDR-L1 # extract obligations
uv run canary --no-vault # console only, no vault writes
uv run canary status # recent run history
| Document | Description |
|---|---|
| Architecture | LangGraph pipeline, node responsibilities, state model, Pydantic data models |
| Citation Verification | 5-strategy matching, Unicode normalization, re-quote pipeline |
| Extraction | Chunked extraction, EUR-Lex fetcher, HTML parsing |
| Storage & Audit | SQLite schema, observability, LangSmith tracing, issue tracking |
| Vault Integration | MCP connection, write paths, output formats, deduplication |
| Operations | Quick start, all CLI commands, configuration, environment, testing |
| Technology | Rationale behind each technology choice |
| Independent Review | Third-party code and product assessment |
| Layer | Technology | Role |
|---|---|---|
| Orchestration | LangGraph | State machine with conditional edges and checkpointing |
| Extraction | Claude + Pydantic | Structured output with schema enforcement |
| Verification | Substring matching + Unicode normalization | Mechanical citation verification (5 strategies) |
| Change detection | SHA-256 + difflib | Deterministic hashing, unified diffs |
| Fetching | httpx + tenacity | Async HTTP, ETag caching, exponential backoff |
| HTML parsing | BeautifulSoup + lxml | Tolerant legal HTML extraction |
| Vault | MCP + Flywheel | Obsidian integration, idempotent writes |
| Audit | SQLite | Full run provenance, structured exit codes |
| Testing | pytest (91 tests) + ruff | Async tests, full pipeline E2E coverage |
See Technology for the full rationale behind each choice.
- Compliance matrix RAG — cross-reference extracted objectives against firm policies
- More sources — SEC rules, FCA Handbook, BaFin, ESMA Q&A
- Scheduling — automated periodic runs via systemd/cron
- Alerting — Slack/email notifications on change detection
- Obligation tracking dashboard
Part of the velvetmonkey verified-cognition stack:
- canary (this repo): the LangGraph compliance pipeline and host for the seal demo.
- mcp-seal: the verified MCP approval-gate sidecar the demo puts in front of the vault.
- flywheel-memory: the knowledge-graph MCP server Canary writes to, and the server seal gates in the demo.
Copyright (c) 2026 velvetmonkey. All rights reserved. See LICENSE.