Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 26 additions & 30 deletions docs/SOUNDNESS.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
// Freshness stamp — a real main-ancestor commit (survives squash-merge); advance
// it (and --reseal) in any change that touches a soundness path. See "Keeping
// this honest" and tools/check-soundness-ledger.sh (property 4).
:ground-truth-sha: dd6c19e
:ground-truth-sha: 65c2ae3
:ground-truth-date: 2026-06-21
:stamp-verified-by: tools/check-soundness-ledger.sh
:toc: macro
Expand Down Expand Up @@ -59,13 +59,11 @@ Anchored to tests:: Each claim cites the executable artefact that proves it. The
gate verifies the artefact *exists*.

Content-bound:: The gate also verifies the artefact still *says what it said*:
each fixture and pinned-test anchor is pinned to a content digest in
`tools/soundness-anchors.sha256`, so a fixture cannot have its assertion eroded —
weakened, re-baselined, or silently re-scoped — without the digest mismatching
and the build failing. (Suite-file anchors such as `test/test_borrow_polonius.ml`
are existence- and stamp-checked but not digest-bound: a whole-file hash is too
coarse to be a meaningful content seal.) Prose nailed to a fixture *by name only*
can still rot if the fixture's meaning drifts; prose nailed *by content* cannot.
each anchor — fixture, pinned-test body, and suite file — is pinned to a content
digest in `tools/soundness-anchors.sha256`, so an anchor cannot have its assertion
eroded — weakened, re-baselined, or silently re-scoped — without the digest
mismatching and the build failing. Prose nailed to a fixture *by name only* can
still rot if the fixture's meaning drifts; prose nailed *by content* cannot.

Stamped, and the stamp is enforced:: The freshness attribute records the
main-ancestor commit this was last reconciled against. The gate fails if `HEAD`
Expand Down Expand Up @@ -200,21 +198,19 @@ corpus with a shrinking allowlist (7 known divergences).
count pinned at 7)

| #555-stub / #624
| The Lean and Why3 experimental backends drop `return` statements wholesale
(`lean_codegen.ml` `gen_block` skips `StmtReturn`, so `fn f(){ return 1; … }`
lowers to `:= ()`) — a silent-wrong-value shape, the same _category_ as the #555
codegen hole but on the stub-backend path. Not yet fenced. The backends are
`experimental` so no production verdict depends on them; the agreed remedy (#624)
is to fail loud. (NB: #628 tracked this as "#560"; that issue is actually
variable-string wasm ops — #624 is the correct tracker, and this change supplies
the pin #628 could not.)
| `residual (pinned)`
| The Lean and Why3 experimental backends now FAIL LOUD on an early `return`
instead of silently dropping it. Previously `gen_block` skipped the `return`
statements, so `fn f(){ return 1; … }` lowered to `:= ()`; now a
`Ast.fn_body_contains_return` fence makes `codegen_lean` / `codegen_why3` return
`Error`. Fixed via #624. (NB: #628 tracked this as "#560"; that issue is actually
variable-string wasm ops — #624 is the correct tracker. This change supplies the
loud fence #628 could not, so the row is now `loud-fail` (it was previously a
pinned residual).)
| `loud-fail`
| `test/e2e/fixtures/stub_backend_return_dropped.affine` +
`test_stub_backend_return_xfail` (xfail: asserts the desired loud failure —
`Lean_codegen.codegen_lean` returning `Error` — on an early-`return` program;
currently fails-as-expected because the backend silently returns `Ok`; flips to
an unexpected pass the day the backend fails loud or is removed). See _Still
open_.
`test_stub_backend_return_fenced` (positive check: asserts
`Lean_codegen.codegen_lean` returns `Error` on the early-`return` program — it was
`Ok`; the check would fail if the silent drop regressed).
|===

== Still open — read before claiming "sound"
Expand All @@ -226,14 +222,14 @@ row in the table above, restated for the hurried reader — not a separate list:
* *Interpreter non-tail resume* (#555 / #623, `residual (pinned)`) — the
`5`-not-`105` shape. Silent on the interpreter path; pinned by
`test_resume_nontail_xfail`.
* *Stub backends drop `return`* (#555-stub / #624, `residual (pinned)`) — Lean and
Why3 drop `return` wholesale. Silent-wrong-value, broader than #555 codegen,
fenced only by `experimental` gating and a pinning xfail, not by a loud failure.
Treat all non-reference backends as experimental (see
`docs/CAPABILITY-MATRIX.adoc`).

If you are deciding whether AffineScript is "sound enough" for a use, these two
rows are the answer, and the metatheory caveat below applies to *both*.
(Stub backends dropping `return` (#555-stub / #624) is now `loud-fail` — Lean and
Why3 reject an early `return` instead of silently dropping it — so it is no longer
a "still open" residual. Treat all non-reference backends as experimental anyway,
see `docs/CAPABILITY-MATRIX.adoc`.)

If you are deciding whether AffineScript is "sound enough" for a use, the
interpreter non-tail resume row above is the one remaining residual, and the
metatheory caveat below applies to it.

=== Pinned-residual discipline

Expand Down
41 changes: 41 additions & 0 deletions lib/ast.ml
Original file line number Diff line number Diff line change
Expand Up @@ -499,3 +499,44 @@ type program = {
prog_decls : top_level list;
}
[@@deriving show, eq]

(** Does a function body contain a `return`? Used by the experimental Lean/Why3
emitters to FAIL LOUD rather than silently drop control flow (Refs #624):
their [gen_block] skips non-`let` statements, so an early `return` would
otherwise vanish. Covers the standard nesting (block / if / let / binary /
unary / app / span / while / for / assign); unlisted exotic variants default
to [false] (conservative — they are not generated by the fixtures this
fences, and the backends are experimental). *)
let rec expr_contains_return (e : expr) : bool =
match e with
| ExprReturn _ -> true
| ExprSpan (inner, _) -> expr_contains_return inner
| ExprBlock blk -> block_contains_return blk
| ExprIf { ei_cond; ei_then; ei_else } ->
expr_contains_return ei_cond
|| expr_contains_return ei_then
|| (match ei_else with Some e -> expr_contains_return e | None -> false)
| ExprLet { el_value; el_body; _ } ->
expr_contains_return el_value
|| (match el_body with Some e -> expr_contains_return e | None -> false)
| ExprBinary (a, _, b) -> expr_contains_return a || expr_contains_return b
| ExprUnary (_, x) -> expr_contains_return x
| ExprApp (f, args) ->
expr_contains_return f || List.exists expr_contains_return args
| _ -> false

and block_contains_return (blk : block) : bool =
List.exists stmt_contains_return blk.blk_stmts
|| (match blk.blk_expr with Some e -> expr_contains_return e | None -> false)

and stmt_contains_return (s : stmt) : bool =
match s with
| StmtExpr e -> expr_contains_return e
| StmtLet { sl_value; _ } -> expr_contains_return sl_value
| StmtAssign (a, _, b) -> expr_contains_return a || expr_contains_return b
| StmtWhile (c, b) -> expr_contains_return c || block_contains_return b
| StmtFor (_, e, b) -> expr_contains_return e || block_contains_return b

let fn_body_contains_return : fn_body -> bool = function
| FnExpr e -> expr_contains_return e
| FnBlock b -> block_contains_return b
9 changes: 9 additions & 0 deletions lib/lean_codegen.ml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,15 @@ and gen_block (blk : block) : string =

let gen_function (fd : fn_decl) : string =
let name = mangle fd.fd_name.name in
(* #624: fail loud on `return` rather than silently drop it. The block emitter
skips non-`let` statements, so an early `return` would vanish (the body
would lower to `:= ()`). Refuse instead — these are experimental backends. *)
if fn_body_contains_return fd.fd_body then
failwith
(Printf.sprintf
"Lean backend does not support `return` (in `%s`, Refs #624): early \
returns would silently drop control flow; rewrite as a tail expression"
fd.fd_name.name);
let params = match fd.fd_params with
| [] -> ""
| _ -> " " ^ String.concat " " (List.map (fun (p : param) ->
Expand Down
7 changes: 7 additions & 0 deletions lib/why3_codegen.ml
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,13 @@ and gen_block (blk : block) : string =

let gen_function (fd : fn_decl) : string =
let name = mangle fd.fd_name.name in
(* #624: fail loud on `return` rather than silently drop it (see lean_codegen). *)
if fn_body_contains_return fd.fd_body then
failwith
(Printf.sprintf
"Why3 backend does not support `return` (in `%s`, Refs #624): early \
returns would silently drop control flow; rewrite as a tail expression"
fd.fd_name.name);
let params = match fd.fd_params with
| [] -> "()"
| _ -> String.concat " " (List.map (fun (p : param) ->
Expand Down
80 changes: 43 additions & 37 deletions test/xfail/test_xfail_pins.ml
Original file line number Diff line number Diff line change
@@ -1,38 +1,32 @@
(* SPDX-License-Identifier: MPL-2.0 *)
(* SPDX-FileCopyrightText: 2024-2026 hyperpolymath (Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk>) *)

(** xfail pin harness for the soundness ledger (docs/SOUNDNESS.adoc, property 5).

Each pin asserts the DESIRED behaviour of a known soundness hole. Because the
hole is still present, the assertion currently FAILS — which is the expected
state, reported as [XFAIL-OK]. The day the hole is fixed, the assertion
PASSES, which is unexpected and reported as [XPASS]: the process exits
non-zero (so [dune runtest] and the gate both go red) with a message telling
the engineer to update the ledger row rather than silence the pin.

Polarity / fail-closed contract:
- pin body raises [Xfail] when the desired behaviour is NOT met (hole
present) -> XFAIL-OK (exit 0 contribution)
- pin body returns normally (desired behaviour met, hole gone)
-> XPASS (exit 1)
- pin body raises ANY OTHER exception (parse error, missing fixture, …)
-> XERROR (exit 1) — pin infrastructure is broken,
so we fail closed rather than mistake it for an
expected failure.
(** Soundness pin harness for docs/SOUNDNESS.adoc (property 5).

Two kinds of executable soundness-row check live here:

- [pins] — *xfail* pins for `residual (pinned)` rows. Each asserts the
DESIRED behaviour; because the hole is still present the assertion fails,
which is expected and reported [XFAIL-OK]. The day the hole is fixed it
passes — reported [XPASS], non-zero exit — telling the engineer to update
the ledger row rather than silence the pin. (Fail-closed: any OTHER
exception is [XERROR], also non-zero — broken pin infrastructure must not
masquerade as an expected failure.)

- [fences] — positive checks for `loud-fail` rows: the hole is closed by a
loud failure, and the check asserts that loud failure still holds (it
passes now and would fail if the silent behaviour regressed).

The shell gate (tools/check-soundness-ledger.sh, property 5) runs this
executable, asserts every pin named by a `residual (pinned)` / `open
(tracked)` ledger row reports XFAIL-OK, and surfaces XPASS as the distinct
"is the hole fixed?" alarm. *)
executable and asserts every pin named by a `residual (pinned)` / `open
(tracked)` row reports XFAIL-OK; XPASS is surfaced as the distinct "is the
hole fixed?" alarm. The exe exits non-zero on any XPASS / XERROR / fence
failure, so `dune runtest` catches them too. *)

open Affinescript

exception Xfail of string

(* Locate test/e2e/fixtures across the run contexts we care about:
AFFINE_FIXTURES override (the gate sets this), the dune-test sandbox
(cwd = _build/default/test/xfail, fixtures at ../e2e/fixtures via the
source_tree dep), cwd = test/, and cwd = repo root. Fail closed if none. *)
let fixtures_dir =
let cands =
(match Sys.getenv_opt "AFFINE_FIXTURES" with Some d -> [ d ] | None -> [])
Expand All @@ -56,7 +50,7 @@ let resolve_symbols prog =
| Ok (rctx, _) -> rctx.symbols
| Error (e, _) -> failwith ("resolve: " ^ Resolve.show_resolve_error e)

(* ---- Pin: #555 / #623 — interpreter non-tail single-shot resume ----------
(* ---- xfail pin: #555 / #623 — interpreter non-tail single-shot resume -----
`let x = op(); x + 100` with `op() => resume(5)` should yield 105 once the
continuation is reified; the shallow tree-walking interpreter yields 5. *)
let test_resume_nontail_xfail () =
Expand All @@ -82,25 +76,29 @@ let test_resume_nontail_xfail () =
raise
(Xfail ("non-tail resume returned " ^ Value.show_value other ^ ", not 105"))

(* ---- Pin: #555-stub / #624 — Lean backend drops `return` ------------------
The agreed remedy (#624) is to fail loud on early `return`; today
`codegen_lean` returns Ok and silently lowers the body to `:= ()`. *)
let test_stub_backend_return_xfail () =
(* ---- fence: #555-stub / #624 — Lean backend fails loud on `return` ---------
Fixed: `codegen_lean` now returns Error on an early `return` (the
fn_body_contains_return fence) instead of silently lowering to `:= ()`. This
positive check asserts the loud failure holds; it would fail if the silent
drop regressed. *)
let test_stub_backend_return_fenced () =
let path = fixture "stub_backend_return_dropped.affine" in
let prog = parse path in
let symbols = resolve_symbols prog in
match Lean_codegen.codegen_lean prog symbols with
| Error _ -> () (* desired met -> XPASS: the backend now fails loud *)
| Error _ -> () (* desired loud failure holds *)
| Ok _ ->
raise
(Xfail
"Lean_codegen.codegen_lean returned Ok; early `return` silently dropped")
failwith
"Lean_codegen.codegen_lean returned Ok on an early `return` — the #624 \
loud fence regressed to a silent drop"

let pins =
[ ("test_resume_nontail_xfail",
"#555/#623 interp non-tail resume -> 5 not 105", test_resume_nontail_xfail);
("test_stub_backend_return_xfail",
"#555-stub/#624 Lean drops `return`", test_stub_backend_return_xfail) ]
"#555/#623 interp non-tail resume -> 5 not 105", test_resume_nontail_xfail) ]

let fences =
[ ("test_stub_backend_return_fenced",
"#555-stub/#624 Lean fails loud on `return`", test_stub_backend_return_fenced) ]

let () =
let bad = ref false in
Expand All @@ -117,5 +115,13 @@ let () =
bad := true;
Printf.printf "XERROR %s -- pin infrastructure failed: %s\n" name m)
pins;
List.iter
(fun (name, reason, body) ->
match try `Ok (body ()) with e -> `Err (Printexc.to_string e) with
| `Ok () -> Printf.printf "FENCE-OK %s -- %s\n" name reason
| `Err m ->
bad := true;
Printf.printf "FENCE-FAIL %s -- loud failure regressed: %s\n" name m)
fences;
flush stdout;
if !bad then exit 1 else exit 0
33 changes: 26 additions & 7 deletions tools/check-soundness-ledger.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,8 @@
# Anchors are named in the ledger table's column 4 ("Proven in code"). From the
# table region of docs/SOUNDNESS.adoc we extract three anchor kinds:
# FIXTURE test/e2e/fixtures/<name>.affine -> digest = sha256 of the file
# SUITE test/<name>.ml -> existence + stamp only
# (a whole-file digest is too
# coarse to content-bind)
# SUITE test/<name>.ml -> digest = sha256 of the whole
# file (a suite is its own unit)
# TEST backtick-quoted `test_<name>` -> digest = sha256 of the test's
# function body, extracted from its `let <name> ` line to the next
# top-level `let ` in whichever single test/**/*.ml defines it.
Expand Down Expand Up @@ -114,16 +113,19 @@ hash_anchor() {
grab && index($0,"let ")==1 {exit}
grab {print}
' "$file" | sha256sum | cut -d' ' -f1 ;;
suite)
[ -f "$loc" ] || die "anchor ${a}: suite file not found"
sha256sum "$loc" | cut -d' ' -f1 ;;
*) die "internal: hash_anchor on non-digest kind: ${a}" ;;
esac
}

# --- the digest manifest (fixtures + test bodies; not suite files) ----------
# --- the digest manifest (fixtures + test bodies + suite files) -------------
generate_manifest() {
local a
while IFS= read -r a; do
[ -z "$a" ] && continue
case "$a" in fixture:*|test:*) printf '%s %s\n' "$(hash_anchor "$a")" "$a" ;; esac
case "$a" in fixture:*|test:*|suite:*) printf '%s %s\n' "$(hash_anchor "$a")" "$a" ;; esac
done < <(extract_anchors) | LC_ALL=C sort
}

Expand All @@ -137,7 +139,7 @@ if [ "${1:-}" = "--reseal" ]; then
echo "# Soundness anchor digests for docs/SOUNDNESS.adoc — DO NOT hand-edit."
echo "# Regenerate: tools/check-soundness-ledger.sh --reseal"
echo "# Format: <sha256> <kind:locator>. FIXTURE = file hash; TEST = the"
echo "# test's function-body hash. SUITE anchors are existence-checked only."
echo "# test's function-body hash; SUITE = whole-file hash."
generate_manifest
} > "$MANIFEST"
echo "OK: resealed ${MANIFEST} ($(grep -cvE '^#' "$MANIFEST") anchors)."
Expand Down Expand Up @@ -208,6 +210,21 @@ check_stamp() {
return 1
fi
fi
# Freshness vs main (best-effort; skipped if origin/main is absent, e.g. a
# shallow CI checkout): if this branch is BEHIND main on a soundness path, the
# ledger may be stale relative to a change not yet rebased in. Fires only when
# main actually moved a soundness path under you — never for a fresh branch.
if git rev-parse --verify -q origin/main >/dev/null 2>&1; then
local mb; mb="$(git merge-base HEAD origin/main 2>/dev/null || true)"
if [ -n "$mb" ]; then
local behind; behind="$(git diff --name-only "$mb" origin/main -- "$LEDGER" "${SOUNDNESS_LIB_PATHS[@]}" 2>/dev/null || true)"
if [ -n "$behind" ]; then
note "ERROR (property 4): main moved a soundness path under this branch; rebase onto origin/main and re-verify the ledger:"
printf '%s\n' "$behind" | sed 's/^/ /' >&2
return 1
fi
fi
fi
}

# ---- Property 5: pin-liveness (xfail harness) ------------------------------
Expand All @@ -221,7 +238,9 @@ pinned_row_pins() {
END { emit() }
function emit() {
if (row=="") return
if (row ~ /residual \(pinned\)/ || row ~ /open \(tracked\)/) {
# Match the STATUS CELL (a line that is `| `status``), not a prose mention
# of the status word elsewhere in the row.
if (row ~ /\n\| `residual \(pinned\)`/ || row ~ /\n\| `open \(tracked\)`/) {
if (match(row, /test_[A-Za-z0-9_]+_xfail/)) print substr(row, RSTART, RLENGTH)
else if (match(row, /test_[A-Za-z0-9_]+/)) print substr(row, RSTART, RLENGTH)
else print "NOPIN"
Expand Down
7 changes: 4 additions & 3 deletions tools/soundness-anchors.sha256
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@
# Soundness anchor digests for docs/SOUNDNESS.adoc — DO NOT hand-edit.
# Regenerate: tools/check-soundness-ledger.sh --reseal
# Format: <sha256> <kind:locator>. FIXTURE = file hash; TEST = the
# test's function-body hash. SUITE anchors are existence-checked only.
# test's function-body hash; SUITE = whole-file hash.
0d3623cf36a67860106d64bb32d30633978bb426d4ddd77ecc2bf6e48e1b7263 fixture:test/e2e/fixtures/borrow_callee_returned_borrow_uam.affine
20f7fd48892f586fccba9bf549ff4db8ccaf97304b37052f45c0f283c324da41 test:test_resume_nontail_xfail
5b84a4f71aa65f9aa9383e31c0cb624291c4f9d0de7cca4a0ea6a76fe85aea35 fixture:test/e2e/fixtures/handle_resume_nontail.affine
68f0f4ad2462f146b87f491b7456aff9a56b77397344d22c62f4f4ff5ca6e3ae fixture:test/e2e/fixtures/assume_rejected.affine
a391d19e0190938ed0ccb4a495d8aac2f178bbe2b15f9723f64b16064f973bb6 test:test_stub_backend_return_xfail
7f416f186a15511ec5c64d2b96bfffbdf9e7c8c0879951d56c80e46129e2ea48 test:test_stub_backend_return_fenced
80541ee00580b4dccee4657cbd31d11a1ddaa18cc49813b48f686c9509f314dc suite:test/test_borrow_polonius.ml
874a5531cd8bc402adf4b66277b21b15d921efe06a7128ea6583bcd6e55781ba test:test_resume_nontail_xfail
ac1da10c3206605304f11e869f6aa07acba57f6f8248bb2dfacca19984f86c31 test:test_coherence_duplicate_rejected
b76c7824a9b1570d66752359a28d75ebdfaedb5de5ad68c343a0a2c26ad4221c fixture:test/e2e/fixtures/async_sync_fallback.affine
c743b86cfc50512e5867ddd792b4b4273b5bfb6f40205b4a07019031d7e7b102 fixture:test/e2e/fixtures/handle_resume_multishot.affine
Expand Down
Loading