Skip to content

Harden the soundness-ledger gate: content-binding, stamp-enforcement, xfail pin-liveness#631

Merged
hyperpolymath merged 1 commit into
mainfrom
claude/soundness-gate-hardening
Jun 21, 2026
Merged

Harden the soundness-ledger gate: content-binding, stamp-enforcement, xfail pin-liveness#631
hyperpolymath merged 1 commit into
mainfrom
claude/soundness-gate-hardening

Conversation

@hyperpolymath

Copy link
Copy Markdown
Owner

Makes docs/SOUNDNESS.adoc keep every promise it makes. The ledger on main is "prose ahead of mechanism" (it claims content-binding / stamp-enforcement / pinned xfails, but the gate enforced only 2 of those). This builds the missing mechanism, and folds in the closed-#625 capability-matrix anchoring.

The five properties (each maps to a function in the gate)

# Property Function Provenance
1 Anchors exist check_anchors_exist Jonathan's #622 design (kept)
2 Back-links check_backlinks Jonathan's #622 design (kept)
3 Content-binding check_content_binding + tools/soundness-anchors.sha256 + --reseal new
4 Stamp-enforcement check_stamp new
5 Pin-liveness (xfail) check_pins + test/xfail/test_xfail_pins.ml new

## What this gate enforces is documented at the top of the script. Everything fails closed.

Ground-truth correction (compiler wins)

Running the compiler showed #559 generic-subsumption is already detected/rejected (impl[T] Greet for Box[T] vs impl Greet for Box[Int] → "Trait coherence violation"). So the ledger's open (tracked) "not yet detected" was stale in the dangerous direction. Corrected to fixed with a positive test; the stale test_e2e.ml comment fixed. → one fewer xfail pin than the spec assumed.

Also: the stub-return row uses #624 (the real tracker); #560 is variable-string wasm ops, unrelated — this change supplies the pin #628 couldn't (the fixture/test now exist). Stamp re-pointed to dd6c19e (a real main-ancestor; the old d55e22c was squash-orphaned). Metatheory note updated for the new formal/ proofs (#620#627).

Self-tests — each new check watched failing

SELF-TEST 1 — Property 3 (mutate a fixture by one token):
  ERROR (property 3): anchor content drift vs tools/soundness-anchors.sha256 ...

SELF-TEST 2 — Property 4 (un-advanced/orphaned stamp + soundness change):
  ERROR (property 4): stamp d55e22c is not an ancestor of HEAD; re-point :ground-truth-sha: ...

SELF-TEST (5a) — Property 5 (pinned row names a missing pin):
  ERROR (property 1): test anchor not defined: test_stub_backend_return_DELETED
  FATAL: anchor test:test_stub_backend_return_DELETED: expected exactly one defining file, found 0 (fail closed)

SELF-TEST (5b) — Property 5 (an xfail pin flips to XPASS):
  ALARM (property 5): pin test_resume_nontail_xfail is PASSING — the hole may be fixed.
  Open docs/SOUNDNESS.adoc and update the row to 'fixed' (do NOT just silence the pin).

Full suite green (534 tests; xfail harness reports both pins XFAIL-OK), all four guard gates green, dune build/dune runtest green at dd6c19e.

Claims I could not make fully mechanical (named, not silently softened)

  1. Content-binding scope. Fixtures + pinned-test bodies are digest-bound (11/12 anchors); the one SUITE-file anchor (#553test/test_borrow_polonius.ml) is existence+stamp-checked only — a whole-file hash is too coarse. The ledger sentence was tightened to say exactly this.
  2. Stamp "advanced-in-this-change" detection is robust for the normal branch-off-fresh-main workflow (and the orphaned-stamp case fails closed, self-test 2). It has a known edge in a multi-commit-since-stamp history (stamp bumped in an earlier commit, soundness changed again later without re-bump could read as "advanced"); decision-2's full "diff-on-main" freshness check is not separately implemented. Flagged for your call.

CI

build job now checks out fetch-depth: 0 so property 4 can resolve the stamp; the xfail harness is in .ocamlformat-ignore (authored without ocamlformat available).

🤖 Generated with Claude Code

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa


Generated by Claude Code

…ility anchors

Fold of the capability-matrix test-anchoring (closed #625) and the soundness-gate
hardening into one change. Makes docs/SOUNDNESS.adoc's claims true by building the
mechanism the prose-ahead ledger already promises.

Gate tools/check-soundness-ledger.sh: 2 -> 5 enforced properties (anchors exist,
back-links, content-binding via tools/soundness-anchors.sha256 + --reseal,
stamp-enforcement, xfail pin-liveness). New xfail harness test/xfail/, new fixture
stub_backend_return_dropped.affine, capability-matrix Test-anchors section +
tools/check-capability-anchors.sh. CI build job fetch-depth: 0 for the stamp diff.

Ground-truth corrections (compiler wins): #559 generic-subsumption IS detected
(positive test); stub-return uses #624; interp non-tail uses #623.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa
@github-actions

Copy link
Copy Markdown

🔍 Hypatia Security Scan

Findings: 41 issues detected

Severity Count
🔴 Critical 2
🟠 High 23
🟡 Medium 16

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action denoland/setup-deno@v2 needs attention",
    "type": "unpinned_action",
    "file": "publish-jsr.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in instant-sync.yml",
    "type": "secret_action_without_presence_gate",
    "file": "instant-sync.yml",
    "action": "peter-evans/repository-dispatch",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "expect() in hot path (32 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "expect() in hot path (29 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/affine_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (2 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/panic.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (1 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/alloc.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  },
  {
    "reason": "unsafe block -- requires SAFETY comment (3 occurrences, CWE-676)",
    "type": "unsafe_block",
    "file": "/home/runner/work/affinescript/affinescript/runtime/src/ffi.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 21, 2026 13:55
@hyperpolymath hyperpolymath merged commit 65c2ae3 into main Jun 21, 2026
17 of 18 checks passed
@hyperpolymath hyperpolymath deleted the claude/soundness-gate-hardening branch June 21, 2026 13:55
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
…te small calls (#633)

Follow-up to #631. Closes one of the two holes the gate's pins watch,
and finishes the "small calls."

## #624 — Lean/Why3 silently drop `return` → now fail loud
The experimental emitters' `gen_block` skipped non-`let` statements, so
`fn f(){ return 1; … }` lowered to `:= ()`. Added
`Ast.fn_body_contains_return`; `codegen_lean`/`codegen_why3` now return
`Error` on an early `return`.

**The pinned-residual discipline, exercised end-to-end (by running, not
eyeballing):**
1. Made the fence → ran the harness → it reported `XPASS
test_stub_backend_return_xfail` and exited non-zero: *"is the hole
fixed? update docs/SOUNDNESS.adoc"*.
2. Followed that instruction: moved the `#555-stub` row `residual
(pinned)` → `loud-fail`, and converted the xfail pin to a **positive
fence-check** `test_stub_backend_return_fenced` (asserts `codegen_lean`
returns `Error`).

## Small calls — done
- **Content-bind suite files.** The manifest now seals `suite:` anchors
(whole-file hash); `test/test_borrow_polonius.ml` is included. Verified
property 3 catches a mutation to it. The ledger's "each anchor is
content-bound" is now fully true (caveat removed).
- **Branch-freshness (stamp).** Added a best-effort,
`origin/main`-guarded check to property 4: if main moved a soundness
path under the branch, the gate asks for a rebase. Skips (never
false-fails) when `origin/main` is absent.
- **#560#624.** The stub row cites #624; #560 is *variable-string
wasm ops*, unrelated. No live doc still mis-tracks it.

## A gate bug that running (not reading) caught
`pinned_row_pins` matched the status word *anywhere* in a row, so the
prose "the row moves `residual (pinned)` → `loud-fail`" made the gate
think the now-`loud-fail` row was still pinned. Fixed to match the
**status cell** (`| \`status\``). This is the kind of thing only
surfaces by executing the gate.

## Verification (all run)
- `dune build` + `dune runtest` green — **534 tests**.
- All 5 gate properties green; 3 sibling gates green.
- Self-tests red as expected: property 3 on a fixture mutation **and**
on a suite-file mutation; the XPASS→ledger-update flip.

## Not done — and why
**#623** (interpreter non-tail resume, `5` not `105`) is left for
owner-steer: it needs a CPS/delimited-continuation rewrite of
`Interp.eval` (blocked on OCaml 4.14 + the jsoo constraint) — exactly
the architecture-level change the soundness handoff says not to start
silently. Its xfail pin stays live (`XFAIL-OK`). Say the word if you
want me to scope it (or do a narrower "fail loud on non-tail resume"
interim).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa

---
_Generated by [Claude
Code](https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa)_

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants