HodeTech · cemililik · May 29, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -8,7 +8,8 @@
 #   - lint-and-host-test, kernel-build, host-stable-check  (fast lane)
 #   - miri                                                 (required, slow)
 # The `miri` job runs the host-test suite under Stacked Borrows; it is
-# slower (~10–15 min) but a Miri regression is a hard stop. The
+# slower (~1–2 min in practice on the current small suite; historically
+# budgeted at ~10–15 min) but a Miri regression is a hard stop. The
 # `coverage` job is INFORMATIONAL only (it sets `continue-on-error: true`)
 # and must NOT be added to the required-checks list until the post-T-011
 # flip removes that flag — see docs/guides/ci.md §"Branch protection".
@@ -182,7 +183,8 @@ jobs:
   # ─── Miri: aliasing validation ──────────────────────────────────────────
   # Runs the full host-test suite under Miri's Stacked Borrows checker
   # (see ADR-0021 / UNSAFE-2026-0014). Slower than the fast lane
-  # (~10–15 min) and requires nightly, so it runs as its own job. A
+  # (~1–2 min in practice; historically budgeted at ~10–15 min) and
+  # requires nightly, so it runs as its own job. A
   # Miri regression is a hard stop.
   miri:
     name: miri (Stacked Borrows)

diff --git a/docs/analysis/reports/perf-baseline-2026-05-28-B4-closure.md b/docs/analysis/reports/perf-baseline-2026-05-28-B4-closure.md
@@ -0,0 +1,91 @@
+# Boot-to-end perf baseline — 2026-05-28 — B4-closure
+
+Generated by `tools/perf-harness.sh`: a multi-run aggregation of the kernel's
+`boot-to-end elapsed = X ns` log line (finding P10 from the [2026-05-06 Track D
+review](../reviews/code-reviews/2026-05-06-full-tree/track-d-performance.md)).
+
+## Inputs
+
+| Field | Value |
+|-------|-------|
+| Run timestamp (UTC) | `2026-05-28T20:28:44Z` |
+| Iterations requested | 20 |
+| Iterations valid | 20 |
+| Iterations failed | 0 |
+| Per-run timeout | 5 s |
+| Build profile | release |
+| Kernel ELF | `target/aarch64-unknown-none/release/tyrne-bsp-qemu-virt` |
+| Git HEAD | `3ab029f` on `main` |
+| QEMU | `QEMU emulator version 10.2.2` |
+| QEMU machine | `-M virt -cpu cortex-a72 -m 128M -smp 1` |
+| Host `uname -a` | `Darwin MacBookPro.hgw.local 24.6.0 Darwin Kernel Version 24.6.0: Wed Nov  5 21:30:23 PST 2025; root:xnu-11417.140.69.705.2~1/RELEASE_X86_64 x86_64` |
+| Wall-clock (full harness run) | 102 s |
+
+## Methodology
+
+Each iteration invokes `tools/run-qemu.sh` under a per-run watchdog.
+The kernel prints its boot trace through `tyrne: all tasks complete`
+and the `boot-to-end elapsed = X ns` line, then halts in WFI. Because
+the kernel never exits on its own, the watchdog kills the QEMU process
+after the per-run timeout; the integer ns delta is then parsed out of stdout.
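+
+A minimal sketch of the parsing step, written in Rust to match the
+kernel. It assumes the value is printed as a plain decimal integer with
+no thousands separators; `parse_boot_to_end_ns` is a hypothetical name,
+not code taken from `tools/perf-harness.sh`.
+
+```rust
+/// Hypothetical helper: find `boot-to-end elapsed = X ns` in captured
+/// QEMU stdout and return X. Returns `None` when the line is missing
+/// (e.g. the run never reached it) or X is not a valid integer.
+fn parse_boot_to_end_ns(stdout: &str) -> Option<u64> {
+    const PREFIX: &str = "boot-to-end elapsed = ";
+    stdout.lines().find_map(|line| {
+        // Search rather than anchor, so a log prefix such as `tyrne: ` is tolerated.
+        let start = line.find(PREFIX)? + PREFIX.len();
+        // The integer ends where the ` ns` unit suffix begins.
+        let (value, _unit) = line[start..].split_once(" ns")?;
+        value.trim().parse::<u64>().ok()
+    })
+}
+```
+
+Returning `Option` lets a harness count a run as failed instead of
+aborting the whole aggregation, which is one way the "Iterations failed"
+row could be tallied.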
+
+Counter source: the kernel's `now_ns()` (`hal::Timer`) reads the EL1
+virtual generic-timer counter and converts to nanoseconds via the
+cached `CNTFRQ_EL0` frequency (62,500,000 Hz, i.e. a 16 ns tick). Under
+QEMU TCG without `-icount`, that counter follows host time while the
+guest runs, so the interval measures how quickly this host translates
+and emulates the guest; variance reflects translation-cache behaviour
+and host scheduler jitter, not real hardware performance. Each iteration
+is a fresh QEMU process and pays the full cold-cache TCG translation cost.
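+
+For illustration, a minimal sketch of the tick-to-nanosecond conversion
+at the stated 62.5 MHz frequency; it is hypothetical and is not the
+kernel's `now_ns()`. At this rate one tick is exactly 16 ns, which is why
+every order statistic in the Metric table (min through max) is an exact
+multiple of 16 ns.
+
+```rust
+/// Hypothetical conversion from generic-timer ticks to nanoseconds, given
+/// a frequency read once from CNTFRQ_EL0 and cached by the caller.
+fn ticks_to_ns(ticks: u64, freq_hz: u64) -> u64 {
+    // Widen to u128 so `ticks * 1e9` cannot overflow on long uptimes.
+    (u128::from(ticks) * 1_000_000_000 / u128::from(freq_hz)) as u64
+}
+
+/// Counter resolution in whole nanoseconds (16 at 62.5 MHz).
+fn tick_period_ns(freq_hz: u64) -> u64 {
+    1_000_000_000 / freq_hz
+}
+```
+
+For example, the min sample of 15,624,992 ns corresponds to 976,562 ticks.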
+
+Statistics are computed across the valid samples only. Percentile
+convention is *nearest-rank* (1-indexed; `idx = ceil(p/100 * n)`).
+Stddev uses the population formula (divisor `n`), as a descriptive statistic.
+
+**Note on p99 at small `n`.** Under nearest-rank, `p99 = a[ceil(0.99 *
+n)]`; for any `n < 100`, `0.99 * n > n - 1`, so the index rounds up to
+`n` and `p99 == max` by construction. The number is reported as computed
+(same convention as p10 / p50 / p90), but readers should not treat it as
+a tail-latency signal at small `n`. p99 can differ from max only once
+`n >= 100`.
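+
+A minimal sketch of both conventions, assuming samples are `u64`
+nanoseconds; `nearest_rank` and `population_stddev` are hypothetical
+names. The percentile index uses integer arithmetic,
+`(p * n + 99) / 100`, which equals `ceil(p * n / 100)` and avoids
+floating-point rounding at the boundary.
+
+```rust
+/// Nearest-rank percentile (1-indexed, idx = ceil(p/100 * n)).
+/// `sorted` must be ascending and non-empty; `p` must lie in 1..=100.
+fn nearest_rank(sorted: &[u64], p: u64) -> u64 {
+    assert!(!sorted.is_empty() && (1..=100).contains(&p));
+    let n = sorted.len() as u64;
+    let idx = (p * n + 99) / 100; // integer ceil(p * n / 100), in 1..=n
+    sorted[(idx - 1) as usize]
+}
+
+/// Population standard deviation (divisor n): descriptive, not an estimate.
+fn population_stddev(samples: &[u64]) -> f64 {
+    let n = samples.len() as f64;
+    let mean = samples.iter().map(|&x| x as f64).sum::<f64>() / n;
+    let var = samples
+        .iter()
+        .map(|&x| (x as f64 - mean).powi(2))
+        .sum::<f64>()
+        / n;
+    var.sqrt()
+}
+```
+
+With the 20 samples of this run, `nearest_rank(&s, 99)` computes
+`idx = (99 * 20 + 99) / 100 = 20`, i.e. the maximum, consistent with the
+note on p99.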
+
+## Metric — boot-to-end elapsed (nanoseconds)
+
+| Statistic | ns | ms |
+|-----------|---:|---:|
+| min | 15,624,992 | 15.625 |
+| p10 | 15,640,992 | 15.641 |
+| p50 | 17,587,008 | 17.587 |
+| p90 | 19,150,000 | 19.150 |
+| p99 | 21,154,992 | 21.155 |
+| max | 21,154,992 | 21.155 |
+| mean | 17,586,899 | 17.587 |
+| stddev | 1,428,711 | 1.429 |
+
+## Δ vs prior baseline (B3 closure, release)
+
+| Statistic | B3 closure (ms) | B4 closure (ms) | Δ ms |
+|-----------|---:|---:|---:|
+| p10 | 10.311 | 15.641 | +5.330 |
+| p50 | 11.884 | 17.587 | +5.703 |
+| p90 | 13.823 | 19.150 | +5.327 |
+
+The +5.3 to +5.7 ms increase comes from the T-019 task loader running at
+boot. It is the first post-bootstrap exercise of the per-call `Mmu::map`
+page-table-walk + TLB-flush sequence under a live MMU, amplified by QEMU
+TCG's software-MMU emulation. The cost is paid once at boot, and the Δ is
+tightly clustered across percentiles (the signature of a uniform fixed
+cost, not a variance regression). Real-hardware projection: ~40 µs on a
+Cortex-A72. See the [B4 closure performance review](../reviews/performance-optimization-reviews/2026-05-28-B4-closure.md)
+§"Hotspot" for the per-component decomposition.
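+
+A minimal sketch of the Δ arithmetic and the clustering check, using
+whole microseconds to avoid float rounding. `band_deltas_us` is a
+hypothetical helper; the inputs are the p10 / p50 / p90 values from the
+Δ table.
+
+```rust
+/// Hypothetical helper: per-percentile deltas between two baseline bands
+/// (p10, p50, p90), plus the spread max(Δ) - min(Δ) across percentiles.
+/// A spread that is small next to the deltas themselves is what a fixed
+/// cost added to every run looks like.
+fn band_deltas_us(prior: [i64; 3], current: [i64; 3]) -> ([i64; 3], i64) {
+    let deltas: [i64; 3] = std::array::from_fn(|i| current[i] - prior[i]);
+    let max = *deltas.iter().max().expect("non-empty");
+    let min = *deltas.iter().min().expect("non-empty");
+    (deltas, max - min)
+}
+```
+
+For B3 to B4 this gives deltas of 5,330 / 5,703 / 5,327 µs and a spread
+of only 376 µs across the three percentiles.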
+
+## Verdict
+
+Baseline only — no proposal under measurement. This is the
+baseline-of-record for B5+ regression checks against B4's closing
+release-build performance. Cite the band above (p10 / p50 / p90) when
+comparing later changes against this snapshot. Single-run boot-to-end
+claims in PR bodies should be replaced with a fresh harness run when a
+non-trivial perf-relevant change lands; see
+[`docs/standards/infrastructure.md`](../../standards/infrastructure.md)
+§"Performance harness".
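+
+As a hedged illustration of citing the band, a minimal sketch that
+places a later run's p50 relative to this snapshot's p10..=p90 band.
+`classify_p50` and `BandPosition` are hypothetical; landing outside the
+band flags a run for a closer look and is not a significance test.
+
+```rust
+/// B4-closure band from this report, in nanoseconds.
+const B4_P10_NS: u64 = 15_640_992;
+const B4_P90_NS: u64 = 19_150_000;
+
+#[derive(Debug, PartialEq)]
+enum BandPosition {
+    Below,
+    Within,
+    Above,
+}
+
+/// Hypothetical check: where does a fresh harness run's p50 fall
+/// relative to the baseline's p10..=p90 band?
+fn classify_p50(new_p50_ns: u64) -> BandPosition {
+    if new_p50_ns < B4_P10_NS {
+        BandPosition::Below
+    } else if new_p50_ns > B4_P90_NS {
+        BandPosition::Above
+    } else {
+        BandPosition::Within
+    }
+}
+```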