diff --git a/.hypatia-ignore b/.hypatia-ignore index 5cce87c2..d0bb6e8f 100644 --- a/.hypatia-ignore +++ b/.hypatia-ignore @@ -71,3 +71,7 @@ cicd_rules/vlang_detected:formal/QttTyping.v cicd_rules/banned_language_file:formal/QttTyping.v cicd_rules/vlang_detected:formal/QttDynamic.v cicd_rules/banned_language_file:formal/QttDynamic.v +cicd_rules/vlang_detected:formal/F5_RenderFaithful.v +cicd_rules/banned_language_file:formal/F5_RenderFaithful.v +cicd_rules/vlang_detected:formal/RealWasm.v +cicd_rules/banned_language_file:formal/RealWasm.v diff --git a/docs/PROOF-NEEDS.adoc b/docs/PROOF-NEEDS.adoc index e0f5c66b..efa03811 100644 --- a/docs/PROOF-NEEDS.adoc +++ b/docs/PROOF-NEEDS.adoc @@ -70,7 +70,7 @@ than the prose corpus suggests: `TypeSafety.v` is example lemmas about list length, not AffineScript. Not core-metatheory. * **`formal/`** — the directory #513 names as the mechanized-proof target now - **exists** (Coq/Rocq 8.18), axiom-free throughout — **14 files, 23 `Print + **exists** (Coq/Rocq 8.18), axiom-free throughout — **16 files, 27 `Print Assumptions` closure reports**, all "Closed under the global context". Codegen keystone: `K1_CodegenPreservation.v` (K-1 minimal) + `K1Let_CodegenPreservation.v` (K-1 grown with variables/`let`/environment). @@ -79,7 +79,8 @@ than the prose corpus suggests: in `P2_Progress.v` (first-order) + `P2_Stlc.v` (STLC with functions + the substitution lemma, funext-free) / `P3_BorrowSound.v` (single-bit) + `P3_BorrowGraph.v` (multi-resource loan graph) / `F3_PragmaDecidable.v` / - `F4_ErrorFaithful.v`. Affine/QTT layer: `QttSemiring.v` (`{0,1,ω}`) + + `F4_ErrorFaithful.v` / `F5_RenderFaithful.v` (render non-collision). + Affine/QTT layer: `QttSemiring.v` (`{0,1,ω}`) + `AffineUsage.v` (`λx.x` affine, `λx. x x` not) + `QttTyping.v` (quantitative typing, `usage x t = Γ x` sound) + `QttDynamic.v` (the dynamic half: β-reduction preserves the usage profile). See <>, <>. @@ -220,8 +221,13 @@ The work in flight changes which obligations are *load-bearing* and which are (was `prose`): a Wave-0 mechanized seed — `formal/K1_CodegenPreservation.v` (Coq, no axioms) — proves preservation for a minimal nat/bool · add/and fragment compiled to a stack machine — `K1Let_CodegenPreservation.v` grows it - with de Bruijn variables, `let`, and an environment; the real AffineScript AST - + real typed-WASM operational semantics remain the open obligation. + with de Bruijn variables, `let`, and an environment. The **real lift** onto the + real AffineScript AST + real typed-WASM semantics is scoped in + `formal/REAL-LIFT.adoc` (fragment ladder R0→R-eff); its first stone is + **landed** — `formal/RealWasm.v` (R0) re-targets the *actual* `lib/wasm.ml` + `instr`/`value_type` with a stack-arity soundness theorem (checked code never + traps). R1 (real `RealAst`/`RealAstSem`/`RealCompile` for the int/bool/let + fragment) is the first real `⟦compile p⟧ = ⟦p⟧`. * **K-2 — Effect-soundness is *blocked*, not merely unproven.** P-6 cannot be honestly *stated against the current backend* because #555 drops handler arms on three of the codegen targets. The obligation must be split: (a) prove @@ -302,8 +308,13 @@ cover them. renaming of canonical type names, and never collapses two distinct canonical types to one face string inside a single message (e.g. Js maps both `Unit` and `Option[T]` into "null"-shaped text — show this is unambiguous in context). -| S | `absent` -| new; `lib/face.ml` `render_ty` +| S | `partial` +| **mechanized** for the vocabulary model: `formal/F5_RenderFaithful.v` proves + every face's renaming is injective (`render_inj` — no two canonical types + collapse onto one string), incl. the Js/Cafe "null"/"?" overlap + (`js_no_collapse`, `cafe_no_collapse`, `option_never_atom`). Faithful to the + whole-string Unit/Bool match + canonical-inner Option rewrite; models the + name vocabulary, not raw OCaml strings. `lib/face.ml` `render_ty` | F-6 | **Preview round-trip totality.** The `preview-{python,js,pseudocode,lucid,cafe}` @@ -367,8 +378,9 @@ F-1 (full transformer preservation) is non-trivial rather than a formality. | **Done.** Grew the models toward the real language: **P-2 → STLC with functions + the substitution lemma** (`P2_Stlc.v`, funext-free); **P-3 → a multi-resource borrow graph** (`P3_BorrowGraph.v`: loan edges, liveness, move-locality, - multi-resource #554 rejected). *Leftover:* F-5 (small, self-contained - `render_ty` injectivity) and the K-4 `CAPABILITY-MATRIX` cross-link. + multi-resource #554 rejected); **F-5 → `render_ty` non-collision** + (`F5_RenderFaithful.v`: every face's renaming injective). *Leftover:* the K-4 + `CAPABILITY-MATRIX` cross-link. | Wave 0 | 2 @@ -382,13 +394,16 @@ F-1 (full transformer preservation) is non-trivial rather than a formality. | Wave 1 | 3 -| **The real lift (open — the bulk of the remaining work).** Replace the toy - objects with the real ones: formalise the **real AffineScript AST** and the - **real typed-WASM operational semantics**, then re-prove K-1/F-1/F-2 against - them (this is what makes K-1/F-1 `XL`, and what actually resolves #601). Then +| **The real lift (the bulk of the remaining work) — scoped + started.** Replace + the toy objects with the real ones: formalise the **real AffineScript AST** and + the **real typed-WASM operational semantics**, then re-prove K-1/F-1/F-2 against + them (this is what makes K-1/F-1 `XL`, and what actually resolves #601). Planned + as a strict-superset *fragment ladder* in `formal/REAL-LIFT.adoc` + (R0→R1→R2→R-mem→R-float→R-str→R-call→R-wrap→R-eff); **R0 landed** + (`formal/RealWasm.v` — real `lib/wasm.ml` IR, i32 core, arity soundness). Then the still-`prose`/`absent` obligations: P-5 (inference), P-6 (effects — blocked - on #555/K-2), P-7 (resolution), P-8 (parser), P-10 (coherence); faces F-6/F-7; - full-language borrowing (P-3) pending #553. + on #555/K-2, = rung R-eff), P-7 (resolution), P-8 (parser), P-10 (coherence); + faces F-6/F-7; full-language borrowing (P-3) pending #553. | Wave 2; #555; #553 |=== diff --git a/formal/F5_RenderFaithful.v b/formal/F5_RenderFaithful.v new file mode 100644 index 00000000..2ac59053 --- /dev/null +++ b/formal/F5_RenderFaithful.v @@ -0,0 +1,135 @@ +(* SPDX-License-Identifier: MPL-2.0 *) +(* SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell (hyperpolymath) *) + +(* + F5_RenderFaithful.v + ═══════════════════ + F-5 (docs/PROOF-NEEDS.adoc): `render_ty` faithfulness / non-collision. + + lib/face.ml's `render_ty` rewrites a small fixed vocabulary of canonical type + names per face — Unit, Bool, and Option[_] — leaving everything else + untouched. F-5 asks: is that renaming injective (it never collapses two + distinct canonical types onto one displayed string), even where face lexemes + overlap — Js renders Unit as "null" and Option[T] as "T | null", so both + mention "null"? + + This models the canonical-type vocabulary render_ty branches on (`cty`), the + displayed-name lexemes (`name` — distinct constructors model distinct on-screen + tokens), and the rendered form (`rty`), then mirrors render_ty as + `render : face → cty → rty`. Two fidelity points carried over from the OCaml: + + * Unit/Bool are special-cased only at the WHOLE-type level (`if s = "Unit"`), + never on a nested occurrence. + * Option's inner is rendered *canonically* — the OCaml `global_replace` + captures the canonical substring `\1` — so Js `Option[Option[Int]]` ⇒ + "Option[Int] | null" (inner Option intact), exactly the greedy-regex result. + + Theorems: `canon_inj`; `render_inj` (every face's renaming is injective — the + non-collapse guarantee); and the pointed `js_no_collapse` / `cafe_no_collapse` + / `option_never_atom` (Unit and Option[_] never read as the same type despite + the shared "null"/"?" lexeme). Axiom-free, no Admitted. + + Scope (honest): models the displayed-name vocabulary, not raw OCaml strings; + assumes well-formed type strings (no adversarial text around `Option[...]`). + + `.v` is Coq, not V-lang — see formal/README.adoc and .hypatia-ignore. +*) + +Require Import PeanoNat. + +(* The six faces (lib/face.ml `face`). *) +Inductive face := Canonical | Python | Js | Pseudocode | Lucid | Cafe. + +(* Displayed atom lexemes. Distinct constructors ⇒ distinct on-screen names — + faithful, because the real renamings (None/null/nothing/bool/boolean/Boolean) + are genuinely distinct tokens and no canonical base name equals any of them. *) +Inductive name := +| NUnit | NBool +| NNone | NNothing | NNull +| Nbool | NBoolean | NbooleanJs +| NBase (n : nat). + +(* Rendered type forms. *) +Inductive rty := +| RNm (x : name) +| ROptBr (r : rty) (* Option[ r ] *) +| ROrNull (r : rty) (* r | null *) +| RMaybe (r : rty) (* Maybe r *) +| RQ (r : rty). (* r ? *) + +(* Canonical types: the vocabulary render_ty distinguishes. *) +Inductive cty := +| CUnit | CBool | CBase (n : nat) | COption (a : cty). + +(* Canonical rendering — Option's inner stays canonical (matches the regex \1). *) +Fixpoint canon (t : cty) : rty := + match t with + | CUnit => RNm NUnit + | CBool => RNm NBool + | CBase n => RNm (NBase n) + | COption a => ROptBr (canon a) + end. + +(* Per-face rendering, mirroring lib/face.ml render_ty: top-level special-case + for Unit/Bool; per-face Option wrapper with a canonical inner. *) +Definition render (f : face) (t : cty) : rty := + match t with + | CUnit => + match f with + | Canonical | Lucid => RNm NUnit + | Python => RNm NNone + | Js | Cafe => RNm NNull + | Pseudocode => RNm NNothing + end + | CBool => + match f with + | Canonical => RNm NBool + | Python => RNm Nbool + | Js => RNm NbooleanJs + | Pseudocode | Lucid | Cafe => RNm NBoolean + end + | CBase n => RNm (NBase n) + | COption a => + match f with + | Canonical | Python | Pseudocode => ROptBr (canon a) + | Js => ROrNull (canon a) + | Lucid => RMaybe (canon a) + | Cafe => RQ (canon a) + end + end. + +(* ── canonical rendering is injective ──────────────────────────────────── *) +Lemma canon_inj : forall t1 t2, canon t1 = canon t2 -> t1 = t2. +Proof. + induction t1; destruct t2; simpl; intro H; + try discriminate; try reflexivity. + - inversion H; reflexivity. (* CBase = CBase *) + - inversion H; f_equal; apply IHt1; assumption. (* COption = COption *) +Qed. + +(* ── every face's renaming is injective: the non-collapse guarantee ─────── *) +Theorem render_inj : forall f t1 t2, render f t1 = render f t2 -> t1 = t2. +Proof. + intros f t1 t2 H; destruct f, t1, t2; cbn in H; + solve [ discriminate + | reflexivity + | inversion H; subst; reflexivity + | inversion H; f_equal; apply canon_inj; assumption ]. +Qed. + +(* ── the pointed overlaps don't collapse ─────────────────────────────────── + Js renders Unit as "null" and Option[T] as "T | null"; Cafe as "null"/"T?". + The shared lexeme never makes Unit and an Option read as the same type. *) +Corollary js_no_collapse : forall a, render Js CUnit <> render Js (COption a). +Proof. intros a H; discriminate. Qed. + +Corollary cafe_no_collapse : forall a, render Cafe CUnit <> render Cafe (COption a). +Proof. intros a H; discriminate. Qed. + +(* No Option ever collides with a base/Unit/Bool atom (different rendered shape). *) +Corollary option_never_atom : forall f a x, render f (COption a) <> RNm x. +Proof. intros f a x H; destruct f; cbn in H; discriminate. Qed. + +Print Assumptions render_inj. +Print Assumptions js_no_collapse. +Print Assumptions option_never_atom. diff --git a/formal/README.adoc b/formal/README.adoc index 9343360e..40bbffec 100644 --- a/formal/README.adoc +++ b/formal/README.adoc @@ -89,6 +89,16 @@ cannot mistake it. Coq has no `v.mod` manifest, so `vmod_detected` never fires. | **P-4** (Wave 3) — the *dynamic* half: β-reduction preserves the usage profile; QTT scaling law; ties back to `usage_soundness` | **mechanized**, axiom-free + +| `F5_RenderFaithful.v` +| **F-5** — `render_ty` per-face type renaming is injective (no two canonical + types collapse onto one displayed string), incl. the Js/Cafe "null"/"?" overlap +| **mechanized**, axiom-free + +| `RealWasm.v` +| **Real lift R0** (`REAL-LIFT.adoc`) — the real `lib/wasm.ml` IR (i32 core): + `wexec` + arity checker + soundness (`wexec_sound`: checked code never traps) +| **mechanized**, axiom-free |=== All mechanized theorems report *Closed under the global context* under `Print @@ -100,7 +110,7 @@ the obligation shapes as `Definition ... : Prop` over section `Variable`s each ending in a `*_discharged : Siblings_Stated. … := ` line that type-checks the concrete proof against the stated obligation. -== Mechanized siblings (P-2, P-3, F-3, F-4) +== Mechanized siblings (P-2, P-3, F-3, F-4, F-5) Each is proven for a deliberately small concrete model — enough to discharge the stated obligation and pin its meaning, not the full language: @@ -133,6 +143,14 @@ the stated obligation and pin its meaning, not the full language: * **F-4** (`F4_ErrorFaithful.v`) — the face renderer preserves an error's *class* and *referent*, changing only vocabulary, so no face can make error X read as a different error Y. +* **F-5** (`F5_RenderFaithful.v`) — `render_ty`'s per-face type-name renaming is + *injective*: distinct canonical types never collapse onto one displayed string. + Models the vocabulary it special-cases (`Unit`, `Bool`, `Option[_]`), faithful + to the whole-string Unit/Bool match and the canonical-inner Option rewrite + (Js `Option[Option[Int]]` ⇒ "Option[Int] | null"). Proves the pointed case the + obligation names — Js/Cafe render Unit as "null"/"?" *and* `Option[T]` with the + same lexeme, yet the two never read as one type (`js_no_collapse`, + `cafe_no_collapse`, `option_never_atom`). == Wave 2: the affine/QTT layer (P-4) @@ -202,12 +220,26 @@ the same canonical AST emit identical code). The observational version for the trailing-statement/tail split is mechanized separately in `invariant-path/proofs/SameCube.agda` (F-2). +== The real lift (R0) + +`RealWasm.v` is the first foundation stone of the *real lift* — replacing the +toy stack machine with the actual `lib/wasm.ml` IR. R0 covers the pure i32 +numeric core (real `instr` / `value_type` names), a stack-machine `wexec`, an +arity checker `wcheck`, and the soundness theorem `wexec_sound`: arity-checked +code never gets stuck — the target-side invariant the real K-1 preservation +proof rests on. The full plan — the fragment ladder R0 → R1 → R2 → R-mem → +R-float → R-str → R-call → R-wrap → R-eff, the Coq module structure, and the +strategy for each hard part (loops/termination, linear memory, floats, the +elaboration nodes, effects) — is `formal/REAL-LIFT.adoc`. + == Scope (honest) -These are Wave-0 seeds, not the discharged obligations. The fragments are +These are Wave-0/1 seeds, not the discharged obligations. The fragments are deliberately small; the full theorems need the real AffineScript AST and the real typed-WASM operational semantics, expanded the way solo-core's -Duet/Ensemble tracks expand Solo. Tracked in `docs/PROOF-NEEDS.adoc` and #513. +Duet/Ensemble tracks expand Solo. That expansion has **begun** — see the real +lift above (`RealWasm.v`, R0) and the plan in `formal/REAL-LIFT.adoc`. Tracked +in `docs/PROOF-NEEDS.adoc` and #513. == Checking diff --git a/formal/REAL-LIFT.adoc b/formal/REAL-LIFT.adoc new file mode 100644 index 00000000..22af1efd --- /dev/null +++ b/formal/REAL-LIFT.adoc @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: CC-BY-SA-4.0 +// SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell (hyperpolymath) += The Real Lift — from toy seeds to the real AffineScript AST + typed-WASM +Jonathan D.A. Jewell +:toc: left +:toclevels: 3 +:sectnums: +:sectnumlevels: 2 + +This is the engineering plan for the single largest remaining body of proof work +in `docs/PROOF-NEEDS.adoc` §6 Wave 3: replacing the deliberately-small models the +`formal/` track proves today with the *real* objects, so that K-1 / F-1 / F-2 and +the dynamic halves of P-2 / P-4 become discharged obligations rather than +meaning-pinning seeds. + +It is grounded in the actual compiler: the source AST is `lib/ast.ml`, the target +IR is `lib/wasm.ml`, the compiler is `lib/codegen.ml` (+ `lib/codegen_gc.ml`), +and the reference dynamic semantics is `lib/interp.ml`. + +[IMPORTANT] +==== +The lift is **XL** (CompCert-scale for a substantial language). It is delivered +as a *ladder* of strict-superset fragments, each fully mechanized and axiom-free +before the next — the way `K1 → K1Let` already grew, and the way solo-core's +`Solo → Duet → Ensemble` is meant to expand. No milestone lands with an +`Admitted`. +==== + +== 1. The theorem + +For the real compiler `compile : Ast.program → Wasm.wasm_module` and the real +operational semantics on each side: + +[source] +---- +⟦ compile(p) ⟧_wasm = ⟦ p ⟧_source (for every well-typed p) +---- + +K-1 is the backend half; F-1 (`⟦T_F(p)⟧ = ⟦T_canonical(canon(p))⟧`) composes the +face transformers on top of it; F-2 (#601) is the observational corollary across +faces. All three collapse to "grow K-1 onto the real objects, then re-chain the +existing F-1/F-2 scaffolding." + +== 2. The two real objects (what we are formalising) + +=== 2.1 Source — `lib/ast.ml` + +The real `expr` has ~30 constructors. Grouped by the order the ladder tackles +them: + +[cols="1,3"] +|=== +| Group | Constructors + +| Core values/binding +| `ExprLit` (Int/Float/Bool/Char/String/Unit), `ExprVar`, `ExprLet` + (carries `el_quantity : quantity option` — the QTT multiplicity), `ExprBlock`, + `stmt` (`StmtLet`/`StmtExpr`/`StmtAssign`/`StmtWhile`/`StmtFor`) + +| Control +| `ExprIf`, `ExprMatch` (+ `match_arm`, guards), `ExprReturn`, `ExprBreak`, + `ExprContinue` + +| Functions +| `ExprLambda` (+ `param` with `p_quantity`/`p_ownership`), `ExprApp`, + top-level `fn_decl` (`FnBlock`/`FnExpr`/`FnExtern`) + +| Aggregates (heap) +| `ExprTuple`/`ExprTupleIndex`, `ExprArray`/`ExprIndex`, `ExprRecord`/`ExprField`/`ExprRowRestrict` + +| Operators +| `ExprBinary` (`binary_op`), `ExprUnary` (`unary_op`, incl. `OpRef`/`OpMutRef`/`OpDeref`) + +| Elaboration nodes (post-typecheck, backend-only — `Typecheck.elaborate_string_concat`) +| `ExprStringConcat`/`ExprStringEq`/`ExprStringRel` (the "string wall"), + `ExprFloatBinary`/`ExprFloatArray`/`ExprFloatIndex` (the "float wall"), + `ExprCellTuple`/`ExprCellTupleIndex`/`ExprCellRecord`/`ExprCellField` + (uniform-8 float-bearing layout) + +| Effects +| `ExprHandle` (+ `handler_arm`), `ExprResume`, `ExprTry`, effect rows on + `TyArrow` + +| Unsafe +| `ExprUnsafe` (`UnsafeRead`/`Write`/`Offset`/`Transmute`/`Forget`) +|=== + +The type layer (`type_expr`) carries the substructural data the proofs care +about: `quantity` `{QZero,QOne,QOmega}` (→ already mechanized in `QttSemiring.v`), +`ownership` `{Own,Ref,Mut}`, `origin_var` (ADR-022 Polonius regions, → P-3 / +`P3_BorrowGraph.v`), effect rows (→ P-6, blocked on #555), and record rows. + +=== 2.2 Target — `lib/wasm.ml` + +A faithful WebAssembly-1.0 IR: `value_type {I32,I64,F32,F64}`, ~150 `instr` +constructors (numeric i32/i64/f32/f64, `LocalGet/Set/Tee`, `GlobalGet/Set`, the +load/store family with align+offset, `Block/Loop/If/Br/BrIf/Return/Call/CallIndirect`, +`MemorySize/Grow`), and a `wasm_module` with `funcs`/`mems`/`tables`/`globals`/ +`exports`/`imports`/`elems`/`datas`/`custom_sections`. The +`typedwasm.ownership` custom section carries `TyOwn/TyRef/TyMut` to the binary +for typed-wasm Level-7/10 verification — the seam where P-3 (borrow soundness) +connects to the target. + +=== 2.3 The gap from the toy K-1 + +`K1_CodegenPreservation.v` invents an ad-hoc stack machine; `RealWasm.v` +(this directory, milestone **R0**, below) re-targets the *actual* `instr` / +`value_type` names with a stack-arity soundness theorem (`wexec_sound`: +arity-checked code never gets stuck). That is the first real foundation stone. + +== 3. Coq module structure + +[cols="1,3,1"] +|=== +| Module | Contents | Status + +| `RealWasm.v` +| Target IR (real `instr`/`value_type` names) + stack-machine `wexec` + + arity checker `wcheck` + soundness `wexec_sound`. +| **R0 landed** + +| `RealWasmSem.v` +| The full target operational semantics as the ladder reaches control/memory: + fuel-indexed step relation, linear-memory model, frame/locals. +| planned (R2+) + +| `RealAst.v` +| The real source `expr`/`stmt`/`program` inductives (faithful to `lib/ast.ml`), + per-fragment. +| planned (S0) + +| `RealAstSem.v` +| The reference dynamic semantics (mirrors `lib/interp.ml`), per-fragment. +| planned (S0) + +| `RealCompile.v` +| The real `compile` (port of `lib/codegen.ml`), per-fragment. +| planned (K-1/R1) + +| `RealPreservation.v` +| `⟦compile p⟧ = ⟦p⟧` — the real K-1, per-fragment; then re-chains + `F1_TransformerPreservation.v` and the `SameCube` F-2 story. +| planned +|=== + +== 4. The fragment ladder + +Each rung is a strict superset; each is fully proven (axiom-free) before the +next. The rung names are stable references the `PROOF-NEEDS.adoc` rows will cite. + +[cols="1,3,2"] +|=== +| Rung | Adds | Retires / advances + +| **R0** ✅ +| Pure i32 numeric stack core (`I32Const`, add/sub/mul/and/or/xor, `eqz`, drop); + `wexec` + arity soundness. (`RealWasm.v`) +| The toy stack machine's *target* — real names now. + +| **R1** +| Source side S0+S1: `ExprLit`/`ExprVar`/`ExprLet`/`ExprBinary` (int/bool) → + `RealAst.v` + `RealAstSem.v`; `compile` for them (`RealCompile.v`); locals + (`LocalGet/Set/Tee`) + i64 in `RealWasm.v`. First real `⟦compile p⟧=⟦p⟧`. +| Subsumes **K1** and **K1Let** on real objects. + +| **R2** +| Structured control: `Block/Loop/If/Br/BrIf/Return`; `ExprIf`/`ExprMatch`/ + `while`/`for`/`break`/`continue`. Needs a **fuel-indexed** `wexec` (loops may + diverge) and the matching source semantics. +| Closes the toy K-1's "`if`/control is the next increment" gap. Lets **F-2** + (#601) be settled *concretely*: the statement-tail vs expression-tail lowerings + become two real `instr` sequences whose `wexec` agreement (unit) / divergence + (value-returning) is a theorem, not a prose observation. + +| **R-mem** +| Linear memory: load/store family, `MemorySize/Grow`, a byte-addressed memory + model; `ExprTuple`/`ExprArray`/`ExprRecord` + indexing/field access. +| Heap layout — the precondition for strings and the "cell" elaboration nodes. + +| **R-float** +| `F64` (then `F32`) lane; the float-wall elaboration nodes + (`ExprFloatBinary`/`FloatArray`/`FloatIndex`, `ExprCellTuple`/`CellRecord`/`CellField`, + uniform-8 layout). Float model: an abstract IEEE carrier with deterministic ops + first (defer Flocq/NaN-bit fidelity to R-float-exact). +| The "float wall" / "float heap wall" become preservation lemmas. + +| **R-str** +| The string wall: `ExprStringConcat`/`StringEq`/`StringRel` over the + `[len][utf8]` byte layout (lengths-prefixed compare/concat). +| String-wall slices 8b/9/10 (#458) as preservation lemmas. + +| **R-call** +| `Call`/`CallIndirect` + `tables`/`elems`; multi-function `program`; `ExprApp` + to top-level `fn_decl`; closures (the indirect-dispatch path). +| Whole-program K-1. + +| **R-wrap** +| Replace `Z` i32 with mod-2^32 semantics (normalise at each op); i64 mod-2^64. +| Bit-exact integer fidelity. + +| **R-eff** +| Effects/handlers (`ExprHandle`/`Resume`/`Try`, effect rows) via CPS lowering. + **Gated on #555 / K-2**: prove against the interpreter's *intended* handler + semantics, then show the lowered targets refine it (the part that currently + fails on three backends). +| **P-6** effect-row soundness; the #555 proof-blocking defect. +|=== + +The source-side rungs S0…Sn track the target rungs one-for-one (each `RealAst`/ +`RealAstSem` increment matches the `RealWasm`/`RealCompile` increment it is +proven against). + +== 5. Hard parts and the strategy for each + +* **Loops & termination.** Wasm `Loop` can diverge, so `wexec` cannot stay a + structural fixpoint past R1. Strategy: **fuel-indexed** big-step + (`wexec : nat → …`) with the standard "more fuel never changes a `Some` + result" monotonicity lemma; preservation is stated up to fuel. (Same device + the source `while`/`for` semantics will use.) +* **Linear memory.** Model memory as `Z → byte` (or a finite map) + a bound; + load/store mirror `lib/wasm.ml`'s align/offset. The allocator + (`runtime/src/alloc.rs`) is modelled abstractly (a bump pointer in a global) — + enough for the deterministic layout the elaboration nodes assume. +* **Floats.** Start with an *abstract* IEEE carrier: opaque `f64` with the + arithmetic/compare ops as deterministic uninterpreted functions, so R-float + proves *layout/dispatch* preservation without a NaN-bit model. A later + R-float-exact swaps in Flocq if bit-fidelity is ever load-bearing. +* **The elaboration nodes.** `ExprStringConcat`/`Float*`/`Cell*` are introduced + by `Typecheck.elaborate_string_concat` *after* typing. The proof must include + that elaboration as part of `compile` (source semantics is on the + pre-elaboration AST; the elaboration must be shown meaning-preserving — itself + a clean lemma per node). +* **Borrowing / QTT at the seam.** `origin_var` + the `typedwasm.ownership` + custom section connect to typed-wasm Level-7/10. P-3 (`P3_BorrowGraph.v`) and + P-4 (`QttTyping.v`/`QttDynamic.v`) are the *static+dynamic* discipline; the + lift shows `compile` preserves the multiplicity/ownership facts into the custom + section (a refinement, not a re-proof). +* **Effects.** The genuinely hardest, and *blocked*: do it last (R-eff), only + after #555 is resolved per K-2. + +== 6. Reuse: typed-WASM, don't reinvent + +`hyperpolymath/typed-wasm` and `ephapax` both have Coq (`Semantics.v`). The +target operational semantics (`RealWasmSem.v`) and `value_type`/`instr` encoding +should be *imported or aligned* with typed-wasm's, not independently reinvented — +the compile target is the one genuinely shared surface across the estate +(per `.claude/CLAUDE.md`'s disambiguation table). The `typedwasm.ownership` +custom section is the contracted hand-off point; the `crates/typed-wasm-verify/` +verifier crate is the executable companion to the Coq semantics. + +== 7. Sequencing & effort + +[cols="1,3,1"] +|=== +| Phase | Rungs | Size + +| First-order K-1 | R0 ✅ → R1 → R2 | M (multi-week) +| Heap & data | R-mem → R-float → R-str | M–L +| Whole program | R-call → R-wrap | M +| The long tail | R-eff (after #555), generics/traits, `unsafe` | L–XL +|=== + +"First-order K-1" (R0→R2) is the natural first *publishable* milestone: it +discharges K-1 for the nat/bool/let/if/control fragment on the **real** AST and +**real** wasm IR, retires `K1`/`K1Let`, and settles #601 concretely. The estate +already has the front-end scaffolding (`F1_TransformerPreservation.v`, +`SameCube.agda`) waiting to re-chain onto it. + +== 8. Status + +* **R0 landed** — `RealWasm.v`: real `instr`/`value_type` names, `wexec`, + `wcheck`, `wexec_sound` (arity-checked code never traps), axiom-free. +* Next: **R1** — stand up `RealAst.v` + `RealAstSem.v` + `RealCompile.v` for the + int/bool/let/binary fragment and prove the first real `⟦compile p⟧ = ⟦p⟧`, + adding locals + i64 to `RealWasm.v`. + +Tracked against `docs/PROOF-NEEDS.adoc` §6 Wave 3 and #513. diff --git a/formal/RealWasm.v b/formal/RealWasm.v new file mode 100644 index 00000000..92c6c5e4 --- /dev/null +++ b/formal/RealWasm.v @@ -0,0 +1,131 @@ +(* SPDX-License-Identifier: MPL-2.0 *) +(* SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell (hyperpolymath) *) + +(* + RealWasm.v + ══════════ + The REAL-LIFT foundation stone (see formal/REAL-LIFT.adoc). + + The toy K-1 (K1_CodegenPreservation.v) compiles to an ad-hoc stack machine + invented for the proof. The real lift re-targets the *actual* compiler IR — + lib/wasm.ml's `instr` / `value_type` / `wasm_module`. This file begins that by + encoding the **R0 fragment**: the pure i32 numeric stack core, with the real + constructor names from lib/wasm.ml, a stack-machine evaluator `wexec`, a + stack-arity checker `wcheck`, and a soundness theorem — *arity-checked code + never gets stuck* (`wexec_sound`). This is the target-side invariant the real + K-1 preservation proof will rest on (a well-typed wasm body cannot trap on the + covered fragment). + + Scope of R0 (deliberately the smallest real slice; the ladder in + REAL-LIFT.adoc adds the rest): + * i32 only; i32 is modelled as `Z` — **wrap-around (mod 2^32) is deferred** + (milestone R-wrap). The constructor names and stack discipline are exact. + * No locals / memory / control / i64 / f64 yet (R1/R2/R-float milestones). + + `value_type` is given in full (faithful to lib/wasm.ml) even though R0's + evaluator only inhabits I32 — so later milestones extend, not rewrite, it. + + Axiom-free, no Admitted. `.v` is Coq, not V-lang — see formal/README.adoc and + .hypatia-ignore. +*) + +Require Import List. +Require Import ZArith. +Require Import PeanoNat. +Require Import Lia. +Import ListNotations. + +(* lib/wasm.ml `value_type` — full, faithful. *) +Inductive value_type := I32 | I64 | F32 | F64. + +(* R0 slice of lib/wasm.ml `instr`: the pure i32 numeric stack core. + Real names; i32 ≔ Z (wrap deferred). *) +Inductive instr := +| I32Const (z : Z) +| I32Add | I32Sub | I32Mul +| I32And | I32Or | I32Xor +| I32Eqz +| Drop. + +Definition stack := list Z. + +(* Big-step stack-machine evaluator. None = stuck (stack underflow). *) +Fixpoint wexec (is : list instr) (st : stack) : option stack := + match is with + | [] => Some st + | I32Const z :: r => wexec r (z :: st) + | I32Add :: r => match st with a :: b :: t => wexec r (Z.add b a :: t) | _ => None end + | I32Sub :: r => match st with a :: b :: t => wexec r (Z.sub b a :: t) | _ => None end + | I32Mul :: r => match st with a :: b :: t => wexec r (Z.mul b a :: t) | _ => None end + | I32And :: r => match st with a :: b :: t => wexec r (Z.land b a :: t) | _ => None end + | I32Or :: r => match st with a :: b :: t => wexec r (Z.lor b a :: t) | _ => None end + | I32Xor :: r => match st with a :: b :: t => wexec r (Z.lxor b a :: t) | _ => None end + | I32Eqz :: r => match st with a :: t => wexec r ((if Z.eqb a 0 then 1%Z else 0%Z) :: t) | _ => None end + | Drop :: r => match st with _ :: t => wexec r t | _ => None end + end. + +(* Stack-arity discipline (a degenerate wasm validation: track the i32 count). *) +Definition consumes (i : instr) : nat := + match i with + | I32Const _ => 0 + | I32Add | I32Sub | I32Mul | I32And | I32Or | I32Xor => 2 + | I32Eqz => 1 + | Drop => 1 + end. + +Definition produces (i : instr) : nat := + match i with + | Drop => 0 + | _ => 1 + end. + +Fixpoint wcheck (is : list instr) (n : nat) : option nat := + match is with + | [] => Some n + | i :: r => + if Nat.leb (consumes i) n + then wcheck r (n - consumes i + produces i) + else None + end. + +(* ── soundness: arity-checked code never gets stuck ─────────────────────── + If the body checks from height n to height m, then on any stack of height n + it runs to Some stack of height m. The target invariant the real K-1 needs. *) +Theorem wexec_sound : forall is n m st, + wcheck is n = Some m -> length st = n -> + exists st', wexec is st = Some st' /\ length st' = m. +Proof. + induction is as [| i r IH]; intros n m st Hchk Hlen; cbn in Hchk. + - injection Hchk as <-. exists st. split; [reflexivity | exact Hlen]. + - destruct (Nat.leb (consumes i) n) eqn:Hle; [| discriminate]. + apply Nat.leb_le in Hle. + destruct i; cbn in Hchk, Hle |- *. + + (* I32Const *) eapply IH; [exact Hchk |]. cbn. lia. + + (* I32Add *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32Sub *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32Mul *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32And *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32Or *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32Xor *) destruct st as [| a [| b t]]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* I32Eqz *) destruct st as [| a t]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. + + (* Drop *) destruct st as [| a t]; cbn in Hlen; try (exfalso; lia). + eapply IH; [exact Hchk |]. cbn in Hlen |- *; lia. +Qed. + +(* A concrete sanity check: (2 + 3) * 4 evaluates to [20], and it arity-checks. *) +Example wexec_demo : + wexec [I32Const 2; I32Const 3; I32Add; I32Const 4; I32Mul] [] = Some [20%Z]. +Proof. reflexivity. Qed. + +Example wcheck_demo : + wcheck [I32Const 2; I32Const 3; I32Add; I32Const 4; I32Mul] 0 = Some 1. +Proof. reflexivity. Qed. + +Print Assumptions wexec_sound. diff --git a/formal/_CoqProject b/formal/_CoqProject index d5404dc0..a9457476 100644 --- a/formal/_CoqProject +++ b/formal/_CoqProject @@ -13,3 +13,5 @@ QttSemiring.v AffineUsage.v QttTyping.v QttDynamic.v +F5_RenderFaithful.v +RealWasm.v diff --git a/formal/justfile b/formal/justfile index 0d991c95..11c09e0e 100644 --- a/formal/justfile +++ b/formal/justfile @@ -12,7 +12,8 @@ check: for f in K1_CodegenPreservation K1Let_CodegenPreservation Siblings_Stated \ F1_TransformerPreservation F3_PragmaDecidable F4_ErrorFaithful \ P3_BorrowSound P3_BorrowGraph P2_Progress P2_Stlc \ - QttSemiring AffineUsage QttTyping QttDynamic; do + QttSemiring AffineUsage QttTyping QttDynamic \ + F5_RenderFaithful RealWasm; do echo "== coqc $f.v ==" o="$(coqc -Q . ASFormal "$f.v")" printf '%s\n' "$o"