|
| 1 | +// SPDX-License-Identifier: MPL-2.0 |
| 2 | +// SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk> |
| 3 | += Interpreter CESK Port — closing the non-tail-resume residual (#623) |
| 4 | +Jonathan D.A. Jewell |
| 5 | +:toc: left |
| 6 | +:toclevels: 3 |
| 7 | +:sectnums: |
| 8 | +:sectnumlevels: 2 |
| 9 | + |
| 10 | +This is the engineering plan to port the production tree-walking interpreter |
| 11 | +`lib/interp.ml` onto a *CESK abstract machine with reified, defunctionalised |
| 12 | +continuations*, so that `resume` re-enters the handled computation instead of |
| 13 | +merely returning a value. It closes the one remaining *silent* soundness hole in |
| 14 | +the language — the interpreter non-tail single-shot resume residual, #623 / |
| 15 | +link:SOUNDNESS.adoc[SOUNDNESS.adoc] row "#555 (interp, non-tail single-shot)" — |
| 16 | +and delivers milestone *M2* of link:decisions/0025-semantics-vm.adoc[ADR-0025] |
| 17 | +for the *real* interpreter surface (not only the Solo core). |
| 18 | + |
| 19 | +It is grounded in the actual compiler: |
| 20 | + |
| 21 | +* the target being ported is `lib/interp.ml` — a `Result`-monad tree-walker |
| 22 | + (`eval : env -> expr -> value result`); |
| 23 | +* the value/error domain is `lib/value.ml` (`value`, `eval_error`, |
| 24 | + `'a result = ('a, eval_error) Result.t`); |
| 25 | +* the *proven template* is `lib/solo_cesk.ml` — the Solo-core CESK machine that |
| 26 | + already implements deep handlers + multi-shot + non-tail resume, axiom-compatible |
| 27 | + with `docs/academic/formal-verification/solo-core/Soundness.idr`; |
| 28 | +* the governing decision is ADR-0025 (owner sign-off 2026-06-16), whose *M2* is |
| 29 | + exactly "deep effect handlers + multi-shot `resume`… the loud-fail messages |
| 30 | + already route here". |
| 31 | + |
| 32 | +[IMPORTANT] |
| 33 | +==== |
| 34 | +This port is *interpreter-tier* work. It does **not** touch the wasm / wasm-GC / |
| 35 | +native backends — those cannot express reified continuations without the wasm |
| 36 | +exception-handling / typed-continuations proposals and correctly *loud-fail* |
| 37 | +today (ADR-0025; `docs/CAPABILITY-MATRIX.adoc`). The interpreter is the language's |
| 38 | +*reference semantics*; closing the hole here closes it where correctness is |
| 39 | +defined. The backends remain experimental and loud per ADR-0025. |
| 40 | +==== |
| 41 | + |
| 42 | +== 1. The defect being closed |
| 43 | + |
| 44 | +=== 1.1 The shape |
| 45 | + |
| 46 | +`docs/SOUNDNESS.adoc` names it precisely: |
| 47 | + |
| 48 | +[source,affinescript] |
| 49 | +---- |
| 50 | +// handle a `get()` effect; the handler resumes with 5 |
| 51 | +let x = get(); // perform `get` |
| 52 | +x + 100 // continuation: "add 100 to whatever get() resumed with" |
| 53 | +---- |
| 54 | + |
| 55 | +Expected: `resume(5)` re-enters the body at the bind, so `x = 5` and the `handle` |
| 56 | +yields `105`. Actual (tree-walker): the `handle` yields `5`. The continuation |
| 57 | +`x + 100` is silently dropped. This is the *one genuinely-still-silent* shape in |
| 58 | +the language (interpreter path only); everything else on the soundness ledger is |
| 59 | +`fixed` / `loud-fail` / `removed`. |
| 60 | + |
| 61 | +=== 1.2 Why the tree-walker structurally cannot fix it in place |
| 62 | + |
| 63 | +The interpreter models a performed effect as a `Result` *error* that unwinds the |
| 64 | +OCaml call stack to the nearest handler (`lib/interp.ml`): |
| 65 | + |
| 66 | +[source,ocaml] |
| 67 | +---- |
| 68 | +(* an effect operation, when called, raises: *) |
| 69 | +Error (PerformEffect (op_name, args)) (* interp.ml ~1218 *) |
| 70 | +
|
| 71 | +(* ExprHandle catches it after eval of the body has already returned: *) |
| 72 | +| ExprHandle eh -> |
| 73 | + begin match eval env eh.eh_body with (* interp.ml ~287 *) |
| 74 | + | Error (PerformEffect (op_name, args)) -> (* ...select arm... *) |
| 75 | +---- |
| 76 | + |
| 77 | +By the time `ExprHandle` sees `PerformEffect`, `eval eh.eh_body` has *already |
| 78 | +returned* — the OCaml frames that represented "the rest of the body after |
| 79 | +`get()`" (`x + 100`) have been unwound and discarded. The continuation no longer |
| 80 | +exists. The `resume` builtin can therefore only do one thing: return its argument |
| 81 | +as the value of the whole `handle` (`interp.ml ~327`): |
| 82 | + |
| 83 | +[source,ocaml] |
| 84 | +---- |
| 85 | +let resume_fn = VBuiltin ("__resume__", fun resume_args -> |
| 86 | + ... match resume_args with [v] -> Ok v | ...) (* returns v; cannot re-enter *) |
| 87 | +---- |
| 88 | + |
| 89 | +That is correct for *tail* resume (`op() => resume(...)` as the body's tail) and |
| 90 | +silently wrong for *non-tail* resume (a bind chain after the `perform`). The |
| 91 | +multi-shot case is already *fenced loud* (`interp.ml ~329`, a call-counter that |
| 92 | +errors on the second `resume`), because returning the argument twice is even more |
| 93 | +obviously wrong. The non-tail single-shot case cannot be fenced the same way: |
| 94 | +nothing local tells `resume` that a discarded continuation existed. |
| 95 | + |
| 96 | +The fix is not a patch to `ExprHandle`. It is a change of *evaluation strategy*: |
| 97 | +the continuation must be **reified as data** so it survives the `perform` and can |
| 98 | +be re-entered. That is a CESK machine. |
| 99 | + |
| 100 | +=== 1.3 Why not OCaml 5 native effect handlers |
| 101 | + |
| 102 | +OCaml 5's `effect`/`continuation` would express this directly. It is unavailable |
| 103 | +to us: the interpreter must compile under `js_of_ocaml` (the in-browser / editor |
| 104 | +playground path), and must build on OCaml 4.14. `js_of_ocaml`'s support for |
| 105 | +native effects is partial and the estate pins 4.14 for the toolchain. A |
| 106 | +*defunctionalised* continuation — a list of frames that is ordinary data — needs |
| 107 | +no runtime support at all and runs identically native and under jsoo. This is |
| 108 | +exactly why `lib/solo_cesk.ml` was built defunctionalised rather than in CPS over |
| 109 | +host closures (ADR-0025 Q1). |
| 110 | + |
| 111 | +== 2. The proven template — `lib/solo_cesk.ml` |
| 112 | + |
| 113 | +The hard part is already built and (for the Solo fragment) proof-coupled. The |
| 114 | +Solo-core CESK machine reifies the continuation as a `frame list` and re-installs |
| 115 | +it on `resume`: |
| 116 | + |
| 117 | +[source,ocaml] |
| 118 | +---- |
| 119 | +(* solo_cesk.ml: perform finds the nearest handler, capturing the delimited |
| 120 | + continuation up to AND INCLUDING that handler — so resume re-installs it. *) |
| 121 | +| Ret v, KPerform op :: k -> |
| 122 | + let (captured, henv, arg_q, res_q, body, krest) = find_handler op k [] in |
| 123 | + { ctrl = Eval body; |
| 124 | + env = bind v arg_q (bind (VCont captured) res_q henv); (* k is DATA *) |
| 125 | + k = krest; cost } |
| 126 | +
|
| 127 | +(* resume injects a value into the captured continuation and runs it; because |
| 128 | + `kc` is data, the SAME VCont may be resumed again — multi-shot. *) |
| 129 | +| Ret v, KResumeArg kc :: k -> { ctrl = Ret v; env = st.env; k = kc @ k; cost } |
| 130 | +---- |
| 131 | + |
| 132 | +`VCont of kont` makes a captured continuation a first-class value; |
| 133 | +`find_handler` walks the frame stack to the matching `KHandle`, returning the |
| 134 | +delimited slice; `kc @ k` re-enters it. This is precisely the mechanism the |
| 135 | +tree-walker lacks. The port's job is to bring `interp.ml`'s *full surface* up to |
| 136 | +this machine, not to invent the machine. |
| 137 | + |
| 138 | +== 3. The gap from `solo_cesk.ml` to `interp.ml` |
| 139 | + |
| 140 | +`solo_cesk.ml` deliberately covers only the reduced Solo core (ADR-0025 M1 |
| 141 | +scope). The production interpreter is much larger. The port must close these |
| 142 | +gaps: |
| 143 | + |
| 144 | +[cols="1,3,3"] |
| 145 | +|=== |
| 146 | +| Concern | `solo_cesk.ml` (M1) | `interp.ml` (port target) |
| 147 | + |
| 148 | +| Term surface |
| 149 | +| ~12 `term` constructors (STLC + unit + ⊗ + ⊕ + let + 3 effect nodes), de Bruijn |
| 150 | +| ~30 `expr` constructors + `stmt` + `block`, *named* environment, literals of |
| 151 | + 6 base types, records/rows, arrays, tuples, variants, match, try/catch, builtins |
| 152 | + |
| 153 | +| Value domain |
| 154 | +| 6-constructor `value` (`VUnit`/`VClos`/`VPair`/`VInl`/`VInr`/`VCont`) |
| 155 | +| `value.ml`: `VUnit`/`VBool`/`VInt`/`VFloat`/`VChar`/`VString`/`VTuple`/`VArray`/ |
| 156 | + `VRecord`/`VVariant`/`VClosure`/`VBuiltin`/`VRef`/`VMut`/`VOwn` |
| 157 | + |
| 158 | +| Store (S) |
| 159 | +| *elided* — "Solo is pure, so the (S)tore is trivial and elided until |
| 160 | + effects/mutation arrive at M2" |
| 161 | +| **required** — `VRef`/`VMut`/`VOwn`, `StmtAssign`, `while`/`for` mutation need a |
| 162 | + real heap. This is the (S) the port must add. |
| 163 | + |
| 164 | +| Errors |
| 165 | +| OCaml exceptions (`Stuck`/`Affine_violation`/`Unhandled_effect`) |
| 166 | +| `Result` + `eval_error` (`TypeMismatch`/`RuntimeError`/`PerformEffect`/…) — the |
| 167 | + machine's `Stuck` maps onto `Error (RuntimeError …)` |
| 168 | + |
| 169 | +| Effect operation surface |
| 170 | +| `Perform (op:int, arg)` — integer-labelled, single arg |
| 171 | +| `PerformEffect (op:string, args:value list)` — string-labelled, multi-arg, |
| 172 | + registered as builtins; `ExprHandle` arms (`HandlerOp`/`HandlerReturn`), |
| 173 | + `ExprResume` keyword, `ExprTry`/catch/finally |
| 174 | +|=== |
| 175 | + |
| 176 | +Net: the port is `solo_cesk.ml`'s machine *grown to the real value/term surface |
| 177 | +and given the store S*. It is M1+M2 of ADR-0025 fused onto the production |
| 178 | +interpreter, because the production interpreter cannot ship the M2 effect |
| 179 | +machine without simultaneously carrying its existing (stateful) M1 surface. |
| 180 | + |
| 181 | +== 4. The port, staged so `dune runtest` stays green at every step |
| 182 | + |
| 183 | +Each stage is independently shippable and CI-gated. The governing safety net is a |
| 184 | +**differential oracle**: the new machine and the existing tree-walker are run |
| 185 | +against the whole positive fixture corpus and must agree — until Stage M2-3, where |
| 186 | +they are *designed* to diverge on the non-tail fixture (the machine is right, the |
| 187 | +walker is wrong) and the oracle flips to asserting the machine. |
| 188 | + |
| 189 | +The new machine lands as a *new module* (`lib/cesk.ml` — the full-surface |
| 190 | +production CESK, distinct from `solo_cesk.ml`, the Solo-core proof oracle). The |
| 191 | +tree-walker stays the default until M2-4, exactly as ADR-0025 directs |
| 192 | +("`interp.ml` keeps serving real programs until the machine grows to supersede |
| 193 | +it"). |
| 194 | + |
| 195 | +[cols="1,4,2"] |
| 196 | +|=== |
| 197 | +| Stage | Deliverable | Gate |
| 198 | + |
| 199 | +| *M2-0* |
| 200 | +| Scaffold `lib/cesk.ml`: `control`/`env`/`kont`/`store`/`state` types, the |
| 201 | + `value` re-used from `value.ml`, an empty `step` covering literals + `var` + |
| 202 | + `let`. A `run` driver and an `eval_via_cesk : env -> expr -> value result` |
| 203 | + wrapper that maps `Stuck → Error (RuntimeError …)`. Behind `--cesk` (off by |
| 204 | + default). Differential oracle harness comparing `eval` vs `eval_via_cesk`. |
| 205 | +| `dune runtest` green; oracle green on the pure-literal fixtures. |
| 206 | + |
| 207 | +| *M2-1* |
| 208 | +| Port the **pure fragment** by defunctionalising every `let* x = eval … in …` |
| 209 | + in `interp.ml` into a frame: `ExprIf`/`ExprBinary`/`ExprUnary`/`ExprApp`/ |
| 210 | + `ExprLambda`/`ExprBlock`/`ExprTuple`/`ExprRecord`/`ExprArray` + field/index/ |
| 211 | + match. Each interpreter recursion point becomes one `frame` constructor; each |
| 212 | + `Ok v` continuation becomes a `Ret v` refocus rule. No effects, no store yet. |
| 213 | +| Oracle green across all *pure* fixtures (no `mut`, no `handle`). |
| 214 | + |
| 215 | +| *M2-2* |
| 216 | +| Add the **store (S)**: `VRef`/`VMut`/`VOwn`, a `store` (int-keyed heap), |
| 217 | + `StmtAssign`, `while`/`for`, `OpRef`/`OpMutRef`/`OpDeref`. The machine state |
| 218 | + becomes the full `(C,E,K,S)`. Affine/linear runtime enforcement stays *off* in |
| 219 | + the interpreter port (it is ADR-0025 M4, opt-in) — parity with today's |
| 220 | + interpreter is the contract here. |
| 221 | +| Oracle green across all *stateful* fixtures (mutation, loops, refs). |
| 222 | + |
| 223 | +| *M2-3* — **closes #623** |
| 224 | +| Port effects with **reified continuations**, transcribing `solo_cesk.ml`'s |
| 225 | + `KHandle`/`KPerform`/`KResume*` frames and `find_handler` onto the |
| 226 | + string-labelled, multi-arg, `Result`-shaped surface. `ExprHandle` installs a |
| 227 | + `KHandle` boundary; the effect builtins push `KPerform`; `ExprResume` re-enters |
| 228 | + the captured `VCont`. Non-tail resume now re-enters the bind chain → `105`. |
| 229 | + Multi-shot now genuinely supported (the `VCont` is data) — retire the |
| 230 | + `interp.ml` multi-shot loud-fence on the CESK path. `ExprTry`/catch is the |
| 231 | + single-shot, non-resumable specialisation (ADR-0025 M3 folds in here). |
| 232 | +| The xfail pin `test_resume_nontail_xfail` flips **XFAIL→XPASS** (see §5). |
| 233 | + New runtime handler fixtures (ADR-0025 notes "*zero exist today*") added and |
| 234 | + green. Oracle *intentionally diverges* on `handle_resume_nontail.affine` — the |
| 235 | + oracle is updated to assert the CESK value (`105`) there. |
| 236 | + |
| 237 | +| *M2-4* |
| 238 | +| Make `lib/cesk.ml` the **default** evaluator; keep the tree-walker one release |
| 239 | + behind `--legacy-interp` as an escape; then decommission `interp.ml`'s `eval` |
| 240 | + tree (or reduce it to a thin re-export). Update `bin/` entry points and the |
| 241 | + jsoo build. |
| 242 | +| Full `dune runtest` green on the CESK path as default; jsoo build green. |
| 243 | +|=== |
| 244 | + |
| 245 | +== 5. What flips, concretely (the ledger consequences) |
| 246 | + |
| 247 | +When Stage M2-3 lands, these change in lock-step — they are the *acceptance |
| 248 | +criteria*, enforced by the soundness-ledger gate (`tools/check-soundness-ledger.sh`): |
| 249 | + |
| 250 | +. `test/e2e/fixtures/handle_resume_nontail.affine` evaluates to `105`, not `5`. |
| 251 | +. `test/xfail/test_xfail_pins.ml`'s `test_resume_nontail_xfail` (which asserts the |
| 252 | + *correct* `105` and today fails-as-expected) reports **XPASS** — the harness's |
| 253 | + "unexpected pass" signal. Per the *Pinned-residual discipline* in |
| 254 | + `docs/SOUNDNESS.adoc`, an XPASS is the trigger to *promote* the pin to an |
| 255 | + ordinary positive regression test and retire the residual. |
| 256 | +. `docs/SOUNDNESS.adoc`: the row "#555 (interp, non-tail single-shot) / #623" |
| 257 | + moves `residual (pinned)` → `fixed`; the *Still open* section's "the one |
| 258 | + remaining residual" sentence is removed (there is then *no* silent hole on the |
| 259 | + interpreter path). The `:ground-truth-sha:` stamp is refreshed. |
| 260 | +. `docs/SOUNDNESS.adoc`: the row "#555 (interp, multi-shot)" moves `loud-fail` → |
| 261 | + `fixed`/supported on the CESK path (multi-shot is now expressible). |
| 262 | +. ADR-0025's *M2* is delivered for the production interpreter; `docs/ROADMAP.adoc` |
| 263 | + and `docs/EFFECTS-IMPLEMENTATION.adoc` "Not Yet Complete" items 1–3 close for |
| 264 | + the interpreter tier. |
| 265 | + |
| 266 | +Because the gate is content-bound (sha256 manifest) and stamp-enforced, the |
| 267 | +ledger edit and the test flip must be in the *same* PR — the gate fails if the |
| 268 | +row says `fixed` while the pin still reports XFAIL, or vice-versa. That coupling |
| 269 | +is deliberate: it is the anti-staleness property this whole subsystem exists for. |
| 270 | + |
| 271 | +== 6. Design notes for the capture, on the real surface |
| 272 | + |
| 273 | +* *Named environments, not de Bruijn.* `interp.ml` binds by name (`extend_env`, |
| 274 | + `lookup_env`); the CESK port keeps that — frames close over the *named* `env`, |
| 275 | + exactly as `solo_cesk.ml`'s frames close over a `qenv`. No de Bruijn conversion. |
| 276 | +* *String-labelled, multi-arg effects.* `find_handler` matches `op_name : string` |
| 277 | + against `HandlerOp` arm names (today's `interp.ml ~302` logic), and binds the |
| 278 | + arm's parameter patterns to `args : value list` plus the reified continuation — |
| 279 | + reusing the existing `match_pattern` / `__resume__` plumbing, but with |
| 280 | + `__resume__` now bound to a `VCont` (data) instead of a return-the-arg builtin. |
| 281 | +* *The store survives capture.* A captured `VCont` closes over the continuation |
| 282 | + frames but **not** a snapshot of the store — the store is threaded through the |
| 283 | + machine state, so a resumed continuation observes store writes made between |
| 284 | + capture and resume (the standard, and correct, semantics for mutable state under |
| 285 | + delimited control). Multi-shot resumes therefore share the heap; this matches |
| 286 | + OCaml-5-effects semantics and is the behaviour the runtime handler tests must |
| 287 | + pin. |
| 288 | +* *`Stuck` ⇒ `Error (RuntimeError …)`.* The machine's "no rule applies" cases |
| 289 | + (the analogue of `solo_cesk.ml`'s `Stuck`) are surfaced as the interpreter's |
| 290 | + existing `eval_error`, so callers see no new error vocabulary (F-4 faithfulness |
| 291 | + is preserved). |
| 292 | + |
| 293 | +== 7. Non-goals and risks |
| 294 | + |
| 295 | +* *Non-goal: backends.* wasm/native effect lowering stays loud-fail (ADR-0025). |
| 296 | + This plan does not change `docs/CAPABILITY-MATRIX.adoc`'s backend rows. |
| 297 | +* *Non-goal: runtime affine enforcement.* ADR-0025 M4 (affine/linear + tropical |
| 298 | + cost *on by default*) is a *separate* milestone; the interpreter port keeps |
| 299 | + today's "enforced statically, not dynamically" contract so it is a pure |
| 300 | + semantics-preserving refactor plus the resume fix. |
| 301 | +* *Risk: performance.* A CESK `step` loop is slower than direct tree-walk for |
| 302 | + pure code. Mitigation: the pure fragment can keep a fast in-frame refocus; and |
| 303 | + per ADR-0025 the VM tier is the *semantics* tier, not the performance tier |
| 304 | + (native is). Acceptable. |
| 305 | +* *Risk: corpus coverage.* The oracle is only as good as the fixtures. ADR-0025 |
| 306 | + flags that *zero* runtime handler tests exist today; M2-3 must add them |
| 307 | + (single-shot tail, single-shot non-tail, multi-shot, nested/deep handlers, |
| 308 | + handler-over-mutation) *before* claiming the hole closed. |
| 309 | +* *Risk: scope of the refactor.* Defunctionalising ~30 `expr` constructors is |
| 310 | + large. It is staged (M2-0…M2-4) precisely so no single PR is unreviewable and |
| 311 | + `dune runtest` is never red. |
| 312 | + |
| 313 | +== 8. Sequencing & effort |
| 314 | + |
| 315 | +[cols="1,3,1"] |
| 316 | +|=== |
| 317 | +| Phase | Stages | Size |
| 318 | + |
| 319 | +| Machine + pure surface | M2-0 → M2-1 | M (multi-week) |
| 320 | +| State | M2-2 | S–M |
| 321 | +| Effects (closes #623) | M2-3 | M — the payload milestone |
| 322 | +| Cutover | M2-4 | S |
| 323 | +|=== |
| 324 | + |
| 325 | +M2-3 is the milestone that retires the last silent soundness hole; M2-0…M2-2 are |
| 326 | +the semantics-preserving runway that makes M2-3 a localised, oracle-checked |
| 327 | +change rather than a big-bang rewrite. The Solo-core machine (`solo_cesk.ml`) is |
| 328 | +the executable reference for the effect rules; the differential oracle is the |
| 329 | +safety net for the surface growth. |
| 330 | + |
| 331 | +Tracked against ADR-0025 (M2), `docs/SOUNDNESS.adoc` (the #623 / non-tail row), |
| 332 | +and `docs/PROOF-NEEDS.adoc` (P-6 effect-row soundness, whose *blocked* status is |
| 333 | +on #555 — this port is the interpreter-tier half of unblocking it). |
0 commit comments