From 3f16b4760f51979f86d4be06a810bc774833d36c Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 21 Jun 2026 14:11:59 +0000 Subject: [PATCH] fix(#624): fence Lean/Why3 backends to fail loud on early `return` MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The experimental Lean/Why3 emitters silently dropped early `return` statements (gen_block skipped them, lowering `fn f(){ return 1; … }` to `:= ()`). Add an Ast.fn_body_contains_return fence so codegen_lean/codegen_why3 return Error instead. The hole moves residual(pinned) -> loud-fail; the xfail pin flipped to XPASS (verified by running the harness) and is converted to a positive fence-check (test_stub_backend_return_fenced). Also (the "small calls"): - Content-bind suite-file anchors too (whole-file hash) — the manifest now seals test/test_borrow_polonius.ml; verified property 3 catches a mutation to it. Ledger's "each anchor content-bound" claim is now fully true. - Add a best-effort branch-freshness check to property 4 (origin/main-guarded): if main moved a soundness path under the branch, the gate asks for a rebase. - Fix a gate false-positive: pinned_row_pins now matches the STATUS CELL, not a prose mention of the status word (caught by running the gate, not eyeballing). - #555-stub row cites #624 (not #560 — that's variable-string wasm ops). Verified: dune build + dune runtest green (534 tests); all 5 gate properties + the three sibling gates green; self-tests for content-binding (incl. suite) red as expected. #623 (interp non-tail resume) left for owner-steer — it needs a CPS rewrite of eval, not a small fix. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01BbxKhXQwTvVgkYDgBMLJoa --- docs/SOUNDNESS.adoc | 56 +++++++++++------------ lib/ast.ml | 41 +++++++++++++++++ lib/lean_codegen.ml | 9 ++++ lib/why3_codegen.ml | 7 +++ test/xfail/test_xfail_pins.ml | 80 ++++++++++++++++++--------------- tools/check-soundness-ledger.sh | 33 +++++++++++--- tools/soundness-anchors.sha256 | 7 +-- 7 files changed, 156 insertions(+), 77 deletions(-) diff --git a/docs/SOUNDNESS.adoc b/docs/SOUNDNESS.adoc index 7c95f659..7f6f830e 100644 --- a/docs/SOUNDNESS.adoc +++ b/docs/SOUNDNESS.adoc @@ -4,7 +4,7 @@ // Freshness stamp — a real main-ancestor commit (survives squash-merge); advance // it (and --reseal) in any change that touches a soundness path. See "Keeping // this honest" and tools/check-soundness-ledger.sh (property 4). -:ground-truth-sha: dd6c19e +:ground-truth-sha: 65c2ae3 :ground-truth-date: 2026-06-21 :stamp-verified-by: tools/check-soundness-ledger.sh :toc: macro @@ -59,13 +59,11 @@ Anchored to tests:: Each claim cites the executable artefact that proves it. The gate verifies the artefact *exists*. Content-bound:: The gate also verifies the artefact still *says what it said*: -each fixture and pinned-test anchor is pinned to a content digest in -`tools/soundness-anchors.sha256`, so a fixture cannot have its assertion eroded — -weakened, re-baselined, or silently re-scoped — without the digest mismatching -and the build failing. (Suite-file anchors such as `test/test_borrow_polonius.ml` -are existence- and stamp-checked but not digest-bound: a whole-file hash is too -coarse to be a meaningful content seal.) Prose nailed to a fixture *by name only* -can still rot if the fixture's meaning drifts; prose nailed *by content* cannot. +each anchor — fixture, pinned-test body, and suite file — is pinned to a content +digest in `tools/soundness-anchors.sha256`, so an anchor cannot have its assertion +eroded — weakened, re-baselined, or silently re-scoped — without the digest +mismatching and the build failing. Prose nailed to a fixture *by name only* can +still rot if the fixture's meaning drifts; prose nailed *by content* cannot. Stamped, and the stamp is enforced:: The freshness attribute records the main-ancestor commit this was last reconciled against. The gate fails if `HEAD` @@ -200,21 +198,19 @@ corpus with a shrinking allowlist (7 known divergences). count pinned at 7) | #555-stub / #624 -| The Lean and Why3 experimental backends drop `return` statements wholesale -(`lean_codegen.ml` `gen_block` skips `StmtReturn`, so `fn f(){ return 1; … }` -lowers to `:= ()`) — a silent-wrong-value shape, the same _category_ as the #555 -codegen hole but on the stub-backend path. Not yet fenced. The backends are -`experimental` so no production verdict depends on them; the agreed remedy (#624) -is to fail loud. (NB: #628 tracked this as "#560"; that issue is actually -variable-string wasm ops — #624 is the correct tracker, and this change supplies -the pin #628 could not.) -| `residual (pinned)` +| The Lean and Why3 experimental backends now FAIL LOUD on an early `return` +instead of silently dropping it. Previously `gen_block` skipped the `return` +statements, so `fn f(){ return 1; … }` lowered to `:= ()`; now a +`Ast.fn_body_contains_return` fence makes `codegen_lean` / `codegen_why3` return +`Error`. Fixed via #624. (NB: #628 tracked this as "#560"; that issue is actually +variable-string wasm ops — #624 is the correct tracker. This change supplies the +loud fence #628 could not, so the row is now `loud-fail` (it was previously a +pinned residual).) +| `loud-fail` | `test/e2e/fixtures/stub_backend_return_dropped.affine` + -`test_stub_backend_return_xfail` (xfail: asserts the desired loud failure — -`Lean_codegen.codegen_lean` returning `Error` — on an early-`return` program; -currently fails-as-expected because the backend silently returns `Ok`; flips to -an unexpected pass the day the backend fails loud or is removed). See _Still -open_. +`test_stub_backend_return_fenced` (positive check: asserts +`Lean_codegen.codegen_lean` returns `Error` on the early-`return` program — it was +`Ok`; the check would fail if the silent drop regressed). |=== == Still open — read before claiming "sound" @@ -226,14 +222,14 @@ row in the table above, restated for the hurried reader — not a separate list: * *Interpreter non-tail resume* (#555 / #623, `residual (pinned)`) — the `5`-not-`105` shape. Silent on the interpreter path; pinned by `test_resume_nontail_xfail`. -* *Stub backends drop `return`* (#555-stub / #624, `residual (pinned)`) — Lean and -Why3 drop `return` wholesale. Silent-wrong-value, broader than #555 codegen, -fenced only by `experimental` gating and a pinning xfail, not by a loud failure. -Treat all non-reference backends as experimental (see -`docs/CAPABILITY-MATRIX.adoc`). - -If you are deciding whether AffineScript is "sound enough" for a use, these two -rows are the answer, and the metatheory caveat below applies to *both*. +(Stub backends dropping `return` (#555-stub / #624) is now `loud-fail` — Lean and +Why3 reject an early `return` instead of silently dropping it — so it is no longer +a "still open" residual. Treat all non-reference backends as experimental anyway, +see `docs/CAPABILITY-MATRIX.adoc`.) + +If you are deciding whether AffineScript is "sound enough" for a use, the +interpreter non-tail resume row above is the one remaining residual, and the +metatheory caveat below applies to it. === Pinned-residual discipline diff --git a/lib/ast.ml b/lib/ast.ml index 75992507..68d16bec 100644 --- a/lib/ast.ml +++ b/lib/ast.ml @@ -499,3 +499,44 @@ type program = { prog_decls : top_level list; } [@@deriving show, eq] + +(** Does a function body contain a `return`? Used by the experimental Lean/Why3 + emitters to FAIL LOUD rather than silently drop control flow (Refs #624): + their [gen_block] skips non-`let` statements, so an early `return` would + otherwise vanish. Covers the standard nesting (block / if / let / binary / + unary / app / span / while / for / assign); unlisted exotic variants default + to [false] (conservative — they are not generated by the fixtures this + fences, and the backends are experimental). *) +let rec expr_contains_return (e : expr) : bool = + match e with + | ExprReturn _ -> true + | ExprSpan (inner, _) -> expr_contains_return inner + | ExprBlock blk -> block_contains_return blk + | ExprIf { ei_cond; ei_then; ei_else } -> + expr_contains_return ei_cond + || expr_contains_return ei_then + || (match ei_else with Some e -> expr_contains_return e | None -> false) + | ExprLet { el_value; el_body; _ } -> + expr_contains_return el_value + || (match el_body with Some e -> expr_contains_return e | None -> false) + | ExprBinary (a, _, b) -> expr_contains_return a || expr_contains_return b + | ExprUnary (_, x) -> expr_contains_return x + | ExprApp (f, args) -> + expr_contains_return f || List.exists expr_contains_return args + | _ -> false + +and block_contains_return (blk : block) : bool = + List.exists stmt_contains_return blk.blk_stmts + || (match blk.blk_expr with Some e -> expr_contains_return e | None -> false) + +and stmt_contains_return (s : stmt) : bool = + match s with + | StmtExpr e -> expr_contains_return e + | StmtLet { sl_value; _ } -> expr_contains_return sl_value + | StmtAssign (a, _, b) -> expr_contains_return a || expr_contains_return b + | StmtWhile (c, b) -> expr_contains_return c || block_contains_return b + | StmtFor (_, e, b) -> expr_contains_return e || block_contains_return b + +let fn_body_contains_return : fn_body -> bool = function + | FnExpr e -> expr_contains_return e + | FnBlock b -> block_contains_return b diff --git a/lib/lean_codegen.ml b/lib/lean_codegen.ml index fbaee5db..a37e4b24 100644 --- a/lib/lean_codegen.ml +++ b/lib/lean_codegen.ml @@ -84,6 +84,15 @@ and gen_block (blk : block) : string = let gen_function (fd : fn_decl) : string = let name = mangle fd.fd_name.name in + (* #624: fail loud on `return` rather than silently drop it. The block emitter + skips non-`let` statements, so an early `return` would vanish (the body + would lower to `:= ()`). Refuse instead — these are experimental backends. *) + if fn_body_contains_return fd.fd_body then + failwith + (Printf.sprintf + "Lean backend does not support `return` (in `%s`, Refs #624): early \ + returns would silently drop control flow; rewrite as a tail expression" + fd.fd_name.name); let params = match fd.fd_params with | [] -> "" | _ -> " " ^ String.concat " " (List.map (fun (p : param) -> diff --git a/lib/why3_codegen.ml b/lib/why3_codegen.ml index f10b2511..2586c4f5 100644 --- a/lib/why3_codegen.ml +++ b/lib/why3_codegen.ml @@ -91,6 +91,13 @@ and gen_block (blk : block) : string = let gen_function (fd : fn_decl) : string = let name = mangle fd.fd_name.name in + (* #624: fail loud on `return` rather than silently drop it (see lean_codegen). *) + if fn_body_contains_return fd.fd_body then + failwith + (Printf.sprintf + "Why3 backend does not support `return` (in `%s`, Refs #624): early \ + returns would silently drop control flow; rewrite as a tail expression" + fd.fd_name.name); let params = match fd.fd_params with | [] -> "()" | _ -> String.concat " " (List.map (fun (p : param) -> diff --git a/test/xfail/test_xfail_pins.ml b/test/xfail/test_xfail_pins.ml index 6f6a7eb9..bc2633fe 100644 --- a/test/xfail/test_xfail_pins.ml +++ b/test/xfail/test_xfail_pins.ml @@ -1,38 +1,32 @@ (* SPDX-License-Identifier: MPL-2.0 *) (* SPDX-FileCopyrightText: 2024-2026 hyperpolymath (Jonathan D.A. Jewell ) *) -(** xfail pin harness for the soundness ledger (docs/SOUNDNESS.adoc, property 5). - - Each pin asserts the DESIRED behaviour of a known soundness hole. Because the - hole is still present, the assertion currently FAILS — which is the expected - state, reported as [XFAIL-OK]. The day the hole is fixed, the assertion - PASSES, which is unexpected and reported as [XPASS]: the process exits - non-zero (so [dune runtest] and the gate both go red) with a message telling - the engineer to update the ledger row rather than silence the pin. - - Polarity / fail-closed contract: - - pin body raises [Xfail] when the desired behaviour is NOT met (hole - present) -> XFAIL-OK (exit 0 contribution) - - pin body returns normally (desired behaviour met, hole gone) - -> XPASS (exit 1) - - pin body raises ANY OTHER exception (parse error, missing fixture, …) - -> XERROR (exit 1) — pin infrastructure is broken, - so we fail closed rather than mistake it for an - expected failure. +(** Soundness pin harness for docs/SOUNDNESS.adoc (property 5). + + Two kinds of executable soundness-row check live here: + + - [pins] — *xfail* pins for `residual (pinned)` rows. Each asserts the + DESIRED behaviour; because the hole is still present the assertion fails, + which is expected and reported [XFAIL-OK]. The day the hole is fixed it + passes — reported [XPASS], non-zero exit — telling the engineer to update + the ledger row rather than silence the pin. (Fail-closed: any OTHER + exception is [XERROR], also non-zero — broken pin infrastructure must not + masquerade as an expected failure.) + + - [fences] — positive checks for `loud-fail` rows: the hole is closed by a + loud failure, and the check asserts that loud failure still holds (it + passes now and would fail if the silent behaviour regressed). The shell gate (tools/check-soundness-ledger.sh, property 5) runs this - executable, asserts every pin named by a `residual (pinned)` / `open - (tracked)` ledger row reports XFAIL-OK, and surfaces XPASS as the distinct - "is the hole fixed?" alarm. *) + executable and asserts every pin named by a `residual (pinned)` / `open + (tracked)` row reports XFAIL-OK; XPASS is surfaced as the distinct "is the + hole fixed?" alarm. The exe exits non-zero on any XPASS / XERROR / fence + failure, so `dune runtest` catches them too. *) open Affinescript exception Xfail of string -(* Locate test/e2e/fixtures across the run contexts we care about: - AFFINE_FIXTURES override (the gate sets this), the dune-test sandbox - (cwd = _build/default/test/xfail, fixtures at ../e2e/fixtures via the - source_tree dep), cwd = test/, and cwd = repo root. Fail closed if none. *) let fixtures_dir = let cands = (match Sys.getenv_opt "AFFINE_FIXTURES" with Some d -> [ d ] | None -> []) @@ -56,7 +50,7 @@ let resolve_symbols prog = | Ok (rctx, _) -> rctx.symbols | Error (e, _) -> failwith ("resolve: " ^ Resolve.show_resolve_error e) -(* ---- Pin: #555 / #623 — interpreter non-tail single-shot resume ---------- +(* ---- xfail pin: #555 / #623 — interpreter non-tail single-shot resume ----- `let x = op(); x + 100` with `op() => resume(5)` should yield 105 once the continuation is reified; the shallow tree-walking interpreter yields 5. *) let test_resume_nontail_xfail () = @@ -82,25 +76,29 @@ let test_resume_nontail_xfail () = raise (Xfail ("non-tail resume returned " ^ Value.show_value other ^ ", not 105")) -(* ---- Pin: #555-stub / #624 — Lean backend drops `return` ------------------ - The agreed remedy (#624) is to fail loud on early `return`; today - `codegen_lean` returns Ok and silently lowers the body to `:= ()`. *) -let test_stub_backend_return_xfail () = +(* ---- fence: #555-stub / #624 — Lean backend fails loud on `return` --------- + Fixed: `codegen_lean` now returns Error on an early `return` (the + fn_body_contains_return fence) instead of silently lowering to `:= ()`. This + positive check asserts the loud failure holds; it would fail if the silent + drop regressed. *) +let test_stub_backend_return_fenced () = let path = fixture "stub_backend_return_dropped.affine" in let prog = parse path in let symbols = resolve_symbols prog in match Lean_codegen.codegen_lean prog symbols with - | Error _ -> () (* desired met -> XPASS: the backend now fails loud *) + | Error _ -> () (* desired loud failure holds *) | Ok _ -> - raise - (Xfail - "Lean_codegen.codegen_lean returned Ok; early `return` silently dropped") + failwith + "Lean_codegen.codegen_lean returned Ok on an early `return` — the #624 \ + loud fence regressed to a silent drop" let pins = [ ("test_resume_nontail_xfail", - "#555/#623 interp non-tail resume -> 5 not 105", test_resume_nontail_xfail); - ("test_stub_backend_return_xfail", - "#555-stub/#624 Lean drops `return`", test_stub_backend_return_xfail) ] + "#555/#623 interp non-tail resume -> 5 not 105", test_resume_nontail_xfail) ] + +let fences = + [ ("test_stub_backend_return_fenced", + "#555-stub/#624 Lean fails loud on `return`", test_stub_backend_return_fenced) ] let () = let bad = ref false in @@ -117,5 +115,13 @@ let () = bad := true; Printf.printf "XERROR %s -- pin infrastructure failed: %s\n" name m) pins; + List.iter + (fun (name, reason, body) -> + match try `Ok (body ()) with e -> `Err (Printexc.to_string e) with + | `Ok () -> Printf.printf "FENCE-OK %s -- %s\n" name reason + | `Err m -> + bad := true; + Printf.printf "FENCE-FAIL %s -- loud failure regressed: %s\n" name m) + fences; flush stdout; if !bad then exit 1 else exit 0 diff --git a/tools/check-soundness-ledger.sh b/tools/check-soundness-ledger.sh index a077c416..13f3e4f9 100755 --- a/tools/check-soundness-ledger.sh +++ b/tools/check-soundness-ledger.sh @@ -22,9 +22,8 @@ # Anchors are named in the ledger table's column 4 ("Proven in code"). From the # table region of docs/SOUNDNESS.adoc we extract three anchor kinds: # FIXTURE test/e2e/fixtures/.affine -> digest = sha256 of the file -# SUITE test/.ml -> existence + stamp only -# (a whole-file digest is too -# coarse to content-bind) +# SUITE test/.ml -> digest = sha256 of the whole +# file (a suite is its own unit) # TEST backtick-quoted `test_` -> digest = sha256 of the test's # function body, extracted from its `let ` line to the next # top-level `let ` in whichever single test/**/*.ml defines it. @@ -114,16 +113,19 @@ hash_anchor() { grab && index($0,"let ")==1 {exit} grab {print} ' "$file" | sha256sum | cut -d' ' -f1 ;; + suite) + [ -f "$loc" ] || die "anchor ${a}: suite file not found" + sha256sum "$loc" | cut -d' ' -f1 ;; *) die "internal: hash_anchor on non-digest kind: ${a}" ;; esac } -# --- the digest manifest (fixtures + test bodies; not suite files) ---------- +# --- the digest manifest (fixtures + test bodies + suite files) ------------- generate_manifest() { local a while IFS= read -r a; do [ -z "$a" ] && continue - case "$a" in fixture:*|test:*) printf '%s %s\n' "$(hash_anchor "$a")" "$a" ;; esac + case "$a" in fixture:*|test:*|suite:*) printf '%s %s\n' "$(hash_anchor "$a")" "$a" ;; esac done < <(extract_anchors) | LC_ALL=C sort } @@ -137,7 +139,7 @@ if [ "${1:-}" = "--reseal" ]; then echo "# Soundness anchor digests for docs/SOUNDNESS.adoc — DO NOT hand-edit." echo "# Regenerate: tools/check-soundness-ledger.sh --reseal" echo "# Format: . FIXTURE = file hash; TEST = the" - echo "# test's function-body hash. SUITE anchors are existence-checked only." + echo "# test's function-body hash; SUITE = whole-file hash." generate_manifest } > "$MANIFEST" echo "OK: resealed ${MANIFEST} ($(grep -cvE '^#' "$MANIFEST") anchors)." @@ -208,6 +210,21 @@ check_stamp() { return 1 fi fi + # Freshness vs main (best-effort; skipped if origin/main is absent, e.g. a + # shallow CI checkout): if this branch is BEHIND main on a soundness path, the + # ledger may be stale relative to a change not yet rebased in. Fires only when + # main actually moved a soundness path under you — never for a fresh branch. + if git rev-parse --verify -q origin/main >/dev/null 2>&1; then + local mb; mb="$(git merge-base HEAD origin/main 2>/dev/null || true)" + if [ -n "$mb" ]; then + local behind; behind="$(git diff --name-only "$mb" origin/main -- "$LEDGER" "${SOUNDNESS_LIB_PATHS[@]}" 2>/dev/null || true)" + if [ -n "$behind" ]; then + note "ERROR (property 4): main moved a soundness path under this branch; rebase onto origin/main and re-verify the ledger:" + printf '%s\n' "$behind" | sed 's/^/ /' >&2 + return 1 + fi + fi + fi } # ---- Property 5: pin-liveness (xfail harness) ------------------------------ @@ -221,7 +238,9 @@ pinned_row_pins() { END { emit() } function emit() { if (row=="") return - if (row ~ /residual \(pinned\)/ || row ~ /open \(tracked\)/) { + # Match the STATUS CELL (a line that is `| `status``), not a prose mention + # of the status word elsewhere in the row. + if (row ~ /\n\| `residual \(pinned\)`/ || row ~ /\n\| `open \(tracked\)`/) { if (match(row, /test_[A-Za-z0-9_]+_xfail/)) print substr(row, RSTART, RLENGTH) else if (match(row, /test_[A-Za-z0-9_]+/)) print substr(row, RSTART, RLENGTH) else print "NOPIN" diff --git a/tools/soundness-anchors.sha256 b/tools/soundness-anchors.sha256 index 69d7b2ba..af58d440 100644 --- a/tools/soundness-anchors.sha256 +++ b/tools/soundness-anchors.sha256 @@ -2,12 +2,13 @@ # Soundness anchor digests for docs/SOUNDNESS.adoc — DO NOT hand-edit. # Regenerate: tools/check-soundness-ledger.sh --reseal # Format: . FIXTURE = file hash; TEST = the -# test's function-body hash. SUITE anchors are existence-checked only. +# test's function-body hash; SUITE = whole-file hash. 0d3623cf36a67860106d64bb32d30633978bb426d4ddd77ecc2bf6e48e1b7263 fixture:test/e2e/fixtures/borrow_callee_returned_borrow_uam.affine -20f7fd48892f586fccba9bf549ff4db8ccaf97304b37052f45c0f283c324da41 test:test_resume_nontail_xfail 5b84a4f71aa65f9aa9383e31c0cb624291c4f9d0de7cca4a0ea6a76fe85aea35 fixture:test/e2e/fixtures/handle_resume_nontail.affine 68f0f4ad2462f146b87f491b7456aff9a56b77397344d22c62f4f4ff5ca6e3ae fixture:test/e2e/fixtures/assume_rejected.affine -a391d19e0190938ed0ccb4a495d8aac2f178bbe2b15f9723f64b16064f973bb6 test:test_stub_backend_return_xfail +7f416f186a15511ec5c64d2b96bfffbdf9e7c8c0879951d56c80e46129e2ea48 test:test_stub_backend_return_fenced +80541ee00580b4dccee4657cbd31d11a1ddaa18cc49813b48f686c9509f314dc suite:test/test_borrow_polonius.ml +874a5531cd8bc402adf4b66277b21b15d921efe06a7128ea6583bcd6e55781ba test:test_resume_nontail_xfail ac1da10c3206605304f11e869f6aa07acba57f6f8248bb2dfacca19984f86c31 test:test_coherence_duplicate_rejected b76c7824a9b1570d66752359a28d75ebdfaedb5de5ad68c343a0a2c26ad4221c fixture:test/e2e/fixtures/async_sync_fallback.affine c743b86cfc50512e5867ddd792b4b4273b5bfb6f40205b4a07019031d7e7b102 fixture:test/e2e/fixtures/handle_resume_multishot.affine