| 1 | +// SPDX-License-Identifier: MPL-2.0
| 2 | +// SPDX-FileCopyrightText: 2025-2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk> |
| 3 | += Server-authoritative determinism — the same wasm on both sides |
| 4 | +:toc: macro |
| 5 | + |
| 6 | +[IMPORTANT] |
| 7 | +==== |
| 8 | +*Status: ARCHITECTURE NOTE / PROPOSAL.* Analysis of why the |
| 9 | +ReScript→AffineScript→wasm migration *unlocks* a class of multiplayer |
| 10 | +netcode that ReScript→JS made impractical, why it is not Elixir-specific, |
| 11 | +and the open threads it raises (Burble string-offload, safe-NIF embedding, |
| 12 | +three-runtime debugging). Staged in affinescript (MPL); subject is idaptik |
| 13 | +(AGPL). A companion SVG (`server-authoritative-determinism.svg`) renders |
| 14 | +the two diagrams below. |
| 15 | +==== |
| 16 | + |
| 17 | +toc::[] |
| 18 | + |
| 19 | +== The thesis in one line |
| 20 | + |
| 21 | +The *server's* simulation is the single source of truth (authoritative); |
| 22 | +every machine runs the *same deterministic* simulation; so a client can |
| 23 | +*predict ahead* of the server and, when it guesses wrong, *rewind and |
| 24 | +replay* to land exactly on the server's result — no drift, no cheating, no |
| 25 | +laggy feel. |
| 26 | + |
| 27 | +Three legs: **authority** (server decides), **prediction** (client |
| 28 | +simulates locally so input feels instant), **determinism** (same inputs + |
| 29 | +same start state → bit-identical result everywhere). Determinism is the |
| 30 | +load-bearing leg — and AffineScript→wasm makes it nearly free. |
| 31 | + |
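The determinism leg can be illustrated with a minimal sketch. This is Python standing in for the compiled VM, not AffineScript; the `step` function, the `(x, vx)` state layout, and the button bit meanings are all hypothetical. It shows the property the text relies on: an integer-only pure step, fed the same inputs from the same start state, yields the same state (and the same hash) on every run.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF  # emulate wasm i32 wraparound in Python's unbounded ints


def step(state: tuple[int, int], input_bits: int) -> tuple[int, int]:
    """One pure tick over a hypothetical (x, vx) state. Integer-only."""
    x, vx = state
    if input_bits & 0b01:  # hypothetical "right" button
        vx = (vx + 1) & MASK32
    if input_bits & 0b10:  # hypothetical "left" button
        vx = (vx - 1) & MASK32
    x = (x + vx) & MASK32
    return (x, vx)


def run(start: tuple[int, int], inputs: list[int]) -> tuple[int, int]:
    """Fold a recorded input list through the step function."""
    state = start
    for bits in inputs:
        state = step(state, bits)
    return state


def state_hash(state: tuple[int, int]) -> str:
    """Hash a fixed little-endian encoding of the state."""
    return hashlib.sha256(struct.pack("<II", *state)).hexdigest()


recorded = [1, 1, 0, 2, 2, 2, 0]
client_state = run((0, 0), recorded)
server_state = run((0, 0), recorded)
```

Comparing `state_hash(client_state)` with `state_hash(server_state)` is then enough to confirm agreement without shipping the state itself.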
| 32 | +== Diagram 1 — same binary, both sides |
| 33 | + |
| 34 | +---- |
| 35 | + ┌──────────── CLIENT (browser) ───────────┐ ┌──────────── SERVER (Elixir / OTP) ──────────┐ |
| 36 | + │ my input ─▶ [ vm.wasm ] ─▶ predict ─▶ Pixi│ │ collect ALL inputs ─▶ [ vm.wasm ] ─▶ step │ |
| 37 | + │ ▲ │ │ (authoritative) via wasmex │ |
| 38 | + │ rollback + replay │ │ per-entity GenServer + OTP supervision │ |
| 39 | + └───────────────────────────┬───────────────┘ └───────────────┬─────────────────────────────┘ |
| 40 | + │ inputs (tick, player, bits) │ |
| 41 | + └──────────────▶ Phoenix Channel ◀─────┘ |
| 42 | + authoritative state / hash @ tick T |
| 43 | + ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ |
| 44 | + the IDENTICAL AffineScript-compiled vm.wasm runs in BOTH boxes |
| 45 | +---- |
| 46 | + |
| 47 | +Only *inputs* cross client→server (tiny), and *authoritative state (or a |
| 48 | +hash)* crosses server→client. Determinism is what lets you ship inputs |
| 49 | +instead of whole world-states. |
| 50 | + |
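To make "tiny" concrete, here is a sketch of an input message carrying the `(tick, player, bits)` triple from the diagram. The field widths (u32 tick, u16 player, u16 button bits) and the names `InputMsg`, `encode`, and `decode` are assumptions for illustration, not the project's wire format.

```python
import struct
from typing import NamedTuple


class InputMsg(NamedTuple):
    tick: int    # simulation tick the input applies to
    player: int  # player identifier
    bits: int    # button bitfield


# Assumed layout: little-endian u32 tick, u16 player, u16 bits.
_FMT = "<IHH"


def encode(msg: InputMsg) -> bytes:
    """Pack an input message into a fixed 8-byte frame."""
    return struct.pack(_FMT, msg.tick, msg.player, msg.bits)


def decode(data: bytes) -> InputMsg:
    """Unpack a frame produced by encode()."""
    return InputMsg(*struct.unpack(_FMT, data))


frame = encode(InputMsg(tick=11, player=2, bits=0b101))
```

Under these assumed widths each input costs 8 bytes per player per tick, independent of how large the world state is.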
| 51 | +== Diagram 2 — predict → confirm → reconcile |
| 52 | + |
| 53 | +---- |
| 54 | +ticks → 8 9 10 11 12 13 14 ← client's predicted "now" |
| 55 | +CLIENT: [C] [C] [C] [P] [P] [P] [P] C = confirmed P = predicted (my inputs only) |
| 56 | + ▲ |
| 57 | +SERVER confirms tick 11 — but it includes PLAYER B's input the client never predicted |
| 58 | + │ |
| 59 | + ▼ predicted@11 ≠ authoritative@11 → MISMATCH |
| 60 | + │ |
| 61 | + RECONCILE: 1. rewind to 10 (last confirmed) |
| 62 | + 2. re-apply now-known inputs (mine + B's) for 11..14 |
| 63 | + 3. replay forward → corrected "now" @14 |
| 64 | + (determinism ⇒ this lands exactly on the server's result) |
| 65 | +---- |
| 66 | + |
| 67 | +If the prediction was right (the common case) the compare matches and the
| 68 | +player never notices. A misprediction costs a replay of the unconfirmed
| 69 | +ticks (11..14 here) that lands within a single rendered frame, not a stall.
| 70 | + |
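The reconcile steps in the diagram can be sketched as follows. This is a simplified model under stated assumptions: the state is a single integer, `step` and `PredictingClient` are hypothetical names, and the server is assumed to confirm ticks in order, sending the full input set and the resulting state for each tick.

```python
MASK32 = 0xFFFFFFFF  # emulate wasm i32 wraparound


def step(state: int, inputs: dict[int, int]) -> int:
    """Pure tick: apply every player's move in sorted player order."""
    for player in sorted(inputs):
        state = (state + inputs[player]) & MASK32
    return state


class PredictingClient:
    def __init__(self, me: int, state: int, tick: int):
        self.me = me
        self.confirmed_tick = tick
        self.confirmed_state = state
        self.now = tick
        self.inputs: dict[int, dict[int, int]] = {}   # tick -> inputs used
        self.history: dict[int, int] = {tick: state}  # tick -> state after tick

    @property
    def predicted(self) -> int:
        return self.history[self.now]

    def local_input(self, move: int) -> None:
        """Predict one tick ahead using only my own input."""
        prev = self.history[self.now]
        self.now += 1
        self.inputs[self.now] = {self.me: move}
        self.history[self.now] = step(prev, self.inputs[self.now])

    def on_confirm(self, tick: int, all_inputs: dict[int, int],
                   auth_state: int) -> bool:
        """Apply the server's result for the next unconfirmed tick.

        Returns True if a rollback was needed.
        """
        if tick != self.confirmed_tick + 1 or tick > self.now:
            raise ValueError("sketch assumes in-order confirms of predicted ticks")
        rolled_back = self.history[tick] != auth_state
        if rolled_back:
            self.inputs[tick] = dict(all_inputs)
            state = self.confirmed_state            # 1. rewind to last confirmed
            for t in range(tick, self.now + 1):     # 2-3. re-apply and replay
                state = step(state, self.inputs[t])
                self.history[t] = state
            if self.history[tick] != auth_state:
                raise RuntimeError("desync: replay did not reach server state")
        self.confirmed_tick, self.confirmed_state = tick, auth_state
        return rolled_back


client = PredictingClient(me=1, state=100, tick=10)
for _ in range(4):
    client.local_input(1)                 # predicts ticks 11..14
before = client.predicted
# Server confirms tick 11, which also contains player 2's input.
rolled_back = client.on_confirm(11, {1: 1, 2: 5}, auth_state=106)
```

The `RuntimeError` branch is the interesting one for debugging: with a deterministic step it should be unreachable, so hitting it indicates a marshalling or ordering bug rather than a prediction miss.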
| 71 | +== Diagram 3 — why ReScript couldn't, and what the real requirement is |
| 72 | + |
| 73 | +---- |
| 74 | + CLASSIC netcode (brittle) AFFINESCRIPT → wasm (by construction) |
| 75 | + ───────────────────────── ───────────────────────────────────── |
| 76 | + client sim server sim client server |
| 77 | + (JS / ReScript) (rewritten in Elixir) [ vm.wasm ] ===== [ vm.wasm ] |
| 78 | + └─ must match BIT-FOR-BIT, by hand same bytes → same result, always |
| 79 | + float rounding / map order / overflow compile ONCE, run both places |
| 80 | + → silent divergence → DESYNC (browser + Elixir via wasmex/wasmtime NIF) |
| 81 | +---- |
| 82 | + |
| 83 | +In the classic world you maintain *two* implementations that must agree to |
| 84 | +the last bit forever; one float-rounding difference between V8 and the BEAM |
| 85 | +and you desync. AffineScript compiles to one `vm.wasm` you run in both |
| 86 | +places, so "the two sims match" is true *by construction*. |
| 87 | + |
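A small demonstration of how two hand-written sims can drift, assuming only that the two implementations happen to sum the same values in a different order. The `SCALE` constant and the variable names are illustrative. Floating-point addition is not associative, so the grouping changes the last bit; the fixed-point integer version agrees regardless of order.

```python
# Same three forces, summed with different grouping by two implementations.
forces = [0.1, 0.2, 0.3]
client_sum = (forces[0] + forces[1]) + forces[2]
server_sum = forces[0] + (forces[1] + forces[2])
floats_agree = client_sum == server_sum  # grouping changes the rounding

# Fixed-point alternative: store thousandths as integers (assumed scale).
SCALE = 1000
fixed = [100, 200, 300]  # 0.1, 0.2, 0.3 expressed in units of 1/SCALE
fixed_client = (fixed[0] + fixed[1]) + fixed[2]
fixed_server = fixed[0] + (fixed[1] + fixed[2])
fixed_agree = fixed_client == fixed_server
```

With one compiled artifact the grouping question never arises, because both sides execute the same instruction sequence.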
| 88 | +== Every gamer knows this (the examples) |
| 89 | + |
| 90 | +[cols="1,2a",options="header"] |
| 91 | +|=== |
| 92 | +| What gamers say | The concept it is |
| 93 | + |
| 94 | +| *"GGPO / rollback netcode is so good"* (Guilty Gear Strive, Skullgirls, Street Fighter 6) |
| 95 | +| This *is* predict + rollback + replay. It feels like magic *because* the |
| 96 | + sim is deterministic — the praised netcode is literally this mechanism. |
| 97 | + |
| 98 | +| *"I SHOT him, he was behind the wall!"* (CS, Valorant, Apex — peeker's advantage) |
| 99 | +| Prediction vs authority under latency: you acted on your predicted frame; |
| 100 | + the server's authoritative timeline disagreed. The eternal complaint is |
| 101 | + the reconciliation gap made visible. |
| 102 | + |
| 103 | +| *Rubber-banding / teleporting players* |
| 104 | +| Reconciliation snapping a client to the authoritative position after a |
| 105 | + misprediction (or packet loss). The "warp" is the correction. |
| 106 | + |
| 107 | +| *"This guy's on wifi"* (Smash Ultimate desync/lag jokes — "why is my Falco teleporting") |
| 108 | +| Latency + weak determinism = the experience you are trying to *kill*. |
| 109 | + Delay-based, non-deterministic netcode is the thing gamers mock. |
| 110 | + |
| 111 | +| *Dark Souls backstab / "phantom hit" desyncs* (P2P, no authoritative host) |
| 112 | +| The horror of *no authority + weak determinism*: two peers each think |
| 113 | + they're right, neither can reconcile. This is the architecture you are |
| 114 | + *avoiding*. |
| 115 | +|=== |
| 116 | + |
| 117 | +The punchline: determinism + same-binary + reversibility is the recipe |
| 118 | +behind the netcode gamers *praise* (GGPO rollback) and whose *absence* is |
| 119 | +the netcode they *mock* (delay-based, P2P backstab desync, wifi Falco). |
| 120 | + |
| 121 | +== Is this Elixir-specific? (no) |
| 122 | + |
| 123 | +It is a *determinism + shared-artifact* property, not an Elixir property. |
| 124 | +The requirement is: **a deterministic simulation that the same artifact can |
| 125 | +run on both the client and the authoritative host.** |
| 126 | + |
| 127 | +[cols="2,1,3a",options="header"] |
| 128 | +|=== |
| 129 | +| Stack | Works? | Why |
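| 130 | +| ReScript → JS + Elixir | ✗ | JS float results are not guaranteed bit-identical across engines (`Math.*` transcendentals are implementation-approximated) and GC timing varies; and you can't run the *same artifact* server-side, so you'd reimplement the sim in Elixir (the two-sims problem). |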
| 131 | +| *AffineScript → wasm* + Elixir | ✓✓ | Deterministic integer wasm both sides + affine no-aliasing + *reversible* VM (rollback = step backward). |
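| 132 | +| Rust → wasm + Elixir | ✓ | Deterministic provided the sim avoids `HashMap` iteration order and float NaN-bit-pattern dependence; same wasm both sides (Rust-native also works server-side via a Rustler NIF, though that is then a second build of the same source). A very common, strong combo. |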
| 133 | +| C/C++/Zig → wasm + any host | ✓ | Deterministic if you avoid float / UB / map-order. wasm is the shared artifact. |
| 134 | +| "Elixir sim on both sides" | ✗ (browser) | BEAM doesn't run in the browser; Gleam→JS puts you back on JS-client nondeterminism. |
| 135 | +|=== |
| 136 | + |
| 137 | +**wasm is the lingua franca** that makes "same binary both sides" possible. |
| 138 | +Elixir's role is just a *great authoritative host* (per-entity processes, |
| 139 | +OTP supervision, `wasmex` to embed the module) — swap Elixir for a |
| 140 | +Rust/Go server and the determinism story is unchanged. So: *Rust+Elixir, |
| 141 | +Rust+Rust, AffineScript+Elixir* all work; *ReScript+anything* doesn't, |
| 142 | +because it can't give you the shared deterministic artifact. |
| 143 | + |
| 144 | +== Honest caveats |
| 145 | + |
| 146 | +* The synced sim must be **pure**: no wall-clock, unseeded RNG, or host |
| 147 | + effects in the stepped path (exactly the effect-codegen wall — networking |
| 148 | + and time stay *outside* the VM). AffineScript *enforces* this rather than |
| 149 | + hoping for it. |
| 150 | +* The VM's I/O ports must be **fed identically** server-side (the server's |
| 151 | + wasm consumes recorded inputs, it does not make live host calls). |
| 152 | +* Embedding via `wasmex`/`wasmtime` works, but you should **batch ticks**
| 153 | +  across the NIF boundary to amortise call overhead, and validate under load.
| 154 | +* Determinism removes *desync*, not *latency* — you still need input-delay / |
| 155 | + a rollback window. |
| 156 | + |
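Two of these caveats, seeded randomness and tick batching, can be sketched together. This is an illustration in Python, not the VM's API: `step`, `step_batch`, and the `(pos, rng)` state layout are hypothetical. The PRNG is a standard 32-bit xorshift whose state lives inside the simulation state, so it is rewound and replayed along with everything else.

```python
MASK32 = 0xFFFFFFFF  # emulate wasm i32 wraparound


def xorshift32(x: int) -> int:
    """Standard 32-bit xorshift (13, 17, 5); x must be non-zero."""
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x & MASK32


def step(state: tuple[int, int], move: int) -> tuple[int, int]:
    """Pure tick over a hypothetical (pos, rng) state; no clock, no host calls."""
    pos, rng = state
    rng = xorshift32(rng)
    jitter = rng & 1                      # randomness drawn from sim state only
    return ((pos + move + jitter) & MASK32, rng)


def step_batch(state: tuple[int, int], moves: list[int]) -> tuple[int, int]:
    """Advance many ticks in one call, i.e. one boundary crossing per batch."""
    for move in moves:
        state = step(state, move)
    return state


start = (0, 1)                            # seed 1 is part of the start state
batched = step_batch(start, [1, 2, 3])
one_by_one = step(step(step(start, 1), 2), 3)
```

Batching changes only how often the host crosses into the module, not the result, which is why it is safe to tune under load.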
| 157 | +== Open threads (raised, not yet resolved) |
| 158 | + |
| 159 | +=== T1 — Burble's data channel to offload the string gap |
| 160 | + |
| 161 | +CONFIRMED from `github.com/hyperpolymath/burble`: a *media plane* (WebRTC |
| 162 | +RTP/SRTP voice), an Elixir/OTP *control plane* (auth, rooms, presence, |
| 163 | +signaling), a P2P *data channel* (the burble proof-spec: a "bidirectional |
| 164 | +AI agent data channel" exchanging JSON over the same connection as voice), |
| 165 | +and a *Protobuf-defined wire protocol* shared by server and clients. (Your |
| 166 | +three-channel framing — voice/chat, an LLM channel, an "rtsm" real-time |
| 167 | +state channel — maps onto media-plane + two uses of the data channel; the |
| 168 | +exact `rtsm` name isn't in the public ARCHITECTURE.adoc I could read.) |
| 169 | + |
| 170 | +Idea: route the game's *stringy* comms (chat, names, AI/LLM text) over |
| 171 | +Burble's data channel so the AffineScript sim never touches them — routing |
| 172 | +strings *around* the AS string-gap, not through it. |
| 173 | + |
| 174 | +* *Strong fit:* the data channel is **Protobuf**, not ad-hoc string |
| 175 | + parsing — structured, length-delimited, integer-tagged. That is exactly |
| 176 | + the shape AffineScript likes at a boundary; the AS sim can read protobuf |
| 177 | + field tags/ints without needing variable-string ops. |
| 178 | +* *Works for:* non-authoritative peer strings (chat, voice, the LLM |
| 179 | + channel). Brain/senses: AS = integer brain on Phoenix; Burble data |
| 180 | + channel = peer string/voice senses. |
| 181 | +* *Does NOT replace the authoritative path:* game-*affecting* strings must |
| 182 | + still traverse Phoenix→Elixir→wasm (authority + determinism), not a P2P |
| 183 | + side-channel. |
| 184 | +* *Integration cost is real:* Burble needs signaling/discovery (its Elixir |
| 185 | + control plane, or Groove); adding it as a game dependency is a second |
| 186 | + transport. Verdict: a good *partial* offload for the comms layer, the |
| 187 | + protobuf wire is a bonus — but not a substitute for the variable-string |
| 188 | + backend on the authoritative path. |
| 189 | + |
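The "integer-tagged, length-delimited" point can be sketched with the Protobuf wire encoding itself: a field key is a varint holding `(field_number << 3) | wire_type`, and length-delimited payloads (wire type 2) can be skipped by length without being interpreted. The field numbers and the sample message below are hypothetical, not Burble's schema, and `read_varint` and `int_fields` are illustrative names.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def int_fields(buf: bytes) -> dict[int, int]:
    """Collect varint fields; skip length-delimited fields without reading them."""
    fields: dict[int, int] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                # varint
            value, pos = read_varint(buf, pos)
            fields[field_number] = value
        elif wire_type == 2:              # length-delimited (string, bytes, ...)
            length, pos = read_varint(buf, pos)
            pos += length                 # step over the payload untouched
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
    return fields


# Hypothetical message: field 1 = 150, field 2 = "hi", field 3 = 7.
message = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69, 0x18, 0x07])
ints = int_fields(message)
```

Only integer operations are used, and the string payload in field 2 is never decoded, which is the shape of access the bullet above describes.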
| 190 | +=== T2 — "snifs instead of nifs": it doesn't dismantle this — it IS the server side |
| 191 | + |
| 192 | +CONFIRMED from `github.com/hyperpolymath/snifs`: *SNIFS = Safe NIFs* — |
| 193 | +native (Zig) code compiled to WebAssembly and run via `wasmex`/`wasmtime`, |
| 194 | +so guest faults (out-of-bounds, overflow, divide-by-zero, crashes) become |
| 195 | +`{:error, reason}` tuples instead of taking down the BEAM. Tagline: |
| 196 | +"WebAssembly sandboxing provides genuine crash isolation for BEAM NIFs." |
| 197 | + |
| 198 | +This is not a threat to server-authoritative determinism — *it is the |
| 199 | +recommended way to do the server side of it.* The diagram's "server-side |
| 200 | +`vm.wasm` via wasmex" literally IS a SNIF. So: |
| 201 | + |
| 202 | +* *Determinism is unaffected:* it comes from the *wasm module being |
| 203 | + identical both sides*. SNIFS runs that same wasm; the computation is |
| 204 | + bit-identical. |
| 205 | +* *SNIFS improves the architecture:* a runaway or faulting authoritative
| 206 | +  tick yields `{:error, reason}` (the entity's GenServer rejects/rolls back
| 207 | +  that input) instead of crashing the lobby, which is exactly the OTP-shaped
| 208 | +  recovery you want. Raw-NIF embedding gives determinism but not containment;
| 209 | +  SNIFS gives both.
| 210 | +* *A convergence worth naming:* the determinism argument *wants* wasm on |
| 211 | + the server (the same-binary property); SNIFS *independently* wants wasm |
| 212 | + on the server (crash isolation). Same choice, two reasons. SNIFS is the |
| 213 | + production substrate for "vm.wasm via wasmex." |
| 214 | + |
| 215 | +Net: use SNIFs, not raw NIFs — *same determinism, crash-contained |
| 216 | +authority*. Far from dismantling it, SNIFS is the piece that makes the |
| 217 | +server side safe to ship. |
| 218 | + |
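The containment behaviour can be sketched in miniature. This is a Python stand-in, not the SNIFS or `wasmex` API: `safe_tick` plays the role of the sandboxed call, a Python exception plays the role of a guest trap, and `Entity` plays the role of the per-entity GenServer. All names are hypothetical.

```python
def step(state: int, divisor: int) -> int:
    """Hypothetical tick that faults (divide-by-zero) on a bad input."""
    return state // divisor


def safe_tick(state: int, inp: int) -> tuple[str, object]:
    """Run one tick; turn a fault into an error tuple instead of propagating."""
    try:
        return ("ok", step(state, inp))
    except ArithmeticError as exc:        # stands in for a wasm trap
        return ("error", type(exc).__name__)


class Entity:
    """Holds authoritative state; a faulting input is rejected, not fatal."""

    def __init__(self, state: int):
        self.state = state
        self.rejected: list[tuple[int, object]] = []

    def apply(self, tick: int, inp: int) -> str:
        tag, value = safe_tick(self.state, inp)
        if tag == "ok":
            self.state = value
        else:
            self.rejected.append((tick, value))  # state stays at last good tick
        return tag


entity = Entity(100)
results = [entity.apply(1, 4), entity.apply(2, 0), entity.apply(3, 5)]
```

The faulting tick leaves the state untouched and later ticks proceed normally, which is the recovery shape the bullets above attribute to SNIFS.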
| 219 | +=== T3 — three-runtime debugging (concrete anchor) |
| 220 | + |
| 221 | +A desync spans three runtimes. Concrete bug: *"Player A sees the door open; |
| 222 | +Player B sees it closed."* The door bit lives in the AS-wasm VM; the input |
| 223 | +was marshalled by the JS host; relayed/authorised by Elixir. Suspects: |
| 224 | + |
| 225 | +. *AS↔JS ABI* — A's input mis-marshalled (wrong integer crossed the wasm boundary). |
| 226 | +. *JS↔Elixir codec* — schema drift (a field dropped in JSON). |
| 227 | +. *Elixir ordering* — input applied at the wrong tick. |
| 228 | +. *determinism break* — B's wasm ≠ A's wasm (shouldn't happen with the same binary). |
| 229 | + |
| 230 | +One symptom, three runtimes, four suspects, *no single stack trace*. The
| 231 | +concrete handle: a **unified trace keyed by `(tick, entityId, traceId)`** |
| 232 | +that every runtime emits, so the cross-runtime causal chain can be |
| 233 | +reconstructed — and because the VM is *reversible + deterministic*, the |
| 234 | +exact desync can be **replayed from recorded inputs** across all three |
| 235 | +runtimes (record-and-replay debugging end to end). A generalised |
| 236 | +debugging idea is worth testing against this anchor: *does it help |
| 237 | +reconstruct/replay the cross-runtime causal chain keyed by tick?* |
| 238 | + |
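A sketch of how such a trace could be stitched together, assuming each runtime emits events carrying the `(tick, entityId, traceId)` key. The event shape, the runtime labels, and the assumed hop order (JS host, then Elixir, then the wasm VM) are illustrative choices, not an existing format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    tick: int
    entity_id: int
    trace_id: str
    runtime: str   # which runtime emitted the event
    detail: str


# Assumed order in which an input crosses the runtimes.
HOP_ORDER = {"js-host": 0, "elixir": 1, "as-wasm": 2}

Key = tuple[int, int, str]


def causal_chains(events: list[TraceEvent]) -> dict[Key, list[TraceEvent]]:
    """Group events by (tick, entity_id, trace_id), ordered by hop."""
    chains: dict[Key, list[TraceEvent]] = defaultdict(list)
    for ev in events:
        chains[(ev.tick, ev.entity_id, ev.trace_id)].append(ev)
    for chain in chains.values():
        chain.sort(key=lambda ev: HOP_ORDER[ev.runtime])
    return dict(chains)


def missing_hops(chain: list[TraceEvent]) -> list[str]:
    """List runtimes that never reported for this key."""
    seen = {ev.runtime for ev in chain}
    return [runtime for runtime in HOP_ORDER if runtime not in seen]


events = [
    TraceEvent(11, 7, "a1", "as-wasm", "door bit set"),
    TraceEvent(11, 7, "a1", "js-host", "marshal open=1"),
    TraceEvent(11, 7, "a1", "elixir", "applied at tick 11"),
    TraceEvent(11, 7, "b9", "js-host", "marshal open=1"),
    TraceEvent(11, 7, "b9", "elixir", "applied at tick 11"),
]
chains = causal_chains(events)
gap = missing_hops(chains[(11, 7, "b9")])
```

A key whose chain is missing a hop points at the boundary where the input was lost, which narrows the suspect list before any replay is attempted.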
| 239 | +== Provenance |
| 240 | + |
| 241 | +2026-06-04 AffineScript co-development session. Companion: the multiplayer |
| 242 | +architecture analysis in `proposals/idaptik/README.adoc` and the migratability |
| 243 | +tally of `src/app/multiplayer/*.res` (6 MIGRATABLE NOW / 5 EFFECT-GATED / |
| 244 | +2 STRING-GATED) that empirically confirms the brain/senses cleavage. |