6 changes: 4 additions & 2 deletions .github/workflows/ci.yml
@@ -8,7 +8,8 @@
# - lint-and-host-test, kernel-build, host-stable-check (fast lane)
# - miri (required, slow)
# The `miri` job runs the host-test suite under Stacked Borrows; it is
-# slower (~10–15 min) but a Miri regression is a hard stop. The
+# slower (~1–2 min in practice on the current small suite; historically
+# budgeted at ~10–15 min) but a Miri regression is a hard stop. The
# `coverage` job is INFORMATIONAL only (it sets `continue-on-error: true`)
# and must NOT be added to the required-checks list until the post-T-011
# flip removes that flag — see docs/guides/ci.md §"Branch protection".
@@ -182,7 +183,8 @@ jobs:
# ─── Miri: aliasing validation ──────────────────────────────────────────
# Runs the full host-test suite under Miri's Stacked Borrows checker
# (see ADR-0021 / UNSAFE-2026-0014). Slower than the fast lane
-# (~10–15 min) and requires nightly, so it runs as its own job. A
+# (~1–2 min in practice; historically budgeted ~10–15 min) and requires
+# nightly, so it runs as its own job. A
# Miri regression is a hard stop.
miri:
name: miri (Stacked Borrows)
91 changes: 91 additions & 0 deletions docs/analysis/reports/perf-baseline-2026-05-28-B4-closure.md
@@ -0,0 +1,91 @@
# Boot-to-end perf baseline — 2026-05-28 — B4-closure

Generated by `tools/perf-harness.sh` — multi-run aggregation of the kernel's
`boot-to-end elapsed = X ns` emission (P10 from the [2026-05-06 Track D
review](../reviews/code-reviews/2026-05-06-full-tree/track-d-performance.md)).

## Inputs

| Field | Value |
|-------|-------|
| Run timestamp (UTC) | `2026-05-28T20:28:44Z` |
| Iterations requested | 20 |
| Iterations valid | 20 |
| Iterations failed | 0 |
| Per-run timeout | 5 s |
| Build profile | release |
| Kernel ELF | `target/aarch64-unknown-none/release/tyrne-bsp-qemu-virt` |
| Git HEAD | `3ab029f` on `main` |
| QEMU | `QEMU emulator version 10.2.2` |
| QEMU machine | `-M virt -cpu cortex-a72 -m 128M -smp 1` |
| Host `uname -a` | `Darwin MacBookPro.hgw.local 24.6.0 Darwin Kernel Version 24.6.0: Wed Nov 5 21:30:23 PST 2025; root:xnu-11417.140.69.705.2~1/RELEASE_X86_64 x86_64` |
| Wall-clock (full harness run) | 102 s |

## Methodology

Each iteration invokes `tools/run-qemu.sh` under a per-run watchdog;
QEMU emits the boot trace through to `tyrne: all tasks complete` plus
the `boot-to-end elapsed = X ns` line, then halts in WFI. The watchdog
kills the QEMU process after the per-run timeout (the kernel never
exits on its own). The integer ns delta is parsed out of stdout.
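
The parsing step described above can be sketched as follows. This is a minimal illustration in Python, not the harness itself (`tools/perf-harness.sh` is a shell script whose internals are not shown here). The regular expression assumes the line has exactly the shape quoted in this report, and `parse_elapsed_ns` and the sample text are hypothetical names and data introduced only for the example. The watchdog is not modelled.

```python
import re

# Assumed line shape, taken from the report text:
#   "boot-to-end elapsed = <integer> ns"
ELAPSED_RE = re.compile(r"boot-to-end elapsed = (\d+) ns")


def parse_elapsed_ns(stdout_text):
    """Return the elapsed nanoseconds from one run's output, or None if absent."""
    match = ELAPSED_RE.search(stdout_text)
    return int(match.group(1)) if match else None


# Hypothetical captured output for one iteration (not a real boot trace).
sample = "tyrne: all tasks complete\nboot-to-end elapsed = 17587008 ns\n"
elapsed = parse_elapsed_ns(sample)

# A run that was killed before the line appeared yields None, which a
# harness could count as a failed iteration.
missing = parse_elapsed_ns("tyrne: all tasks complete\n")
```

Returning `None` instead of raising keeps the "iterations valid" and "iterations failed" counts easy to derive from a list of results.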

Counter source: the kernel's `now_ns()` (`hal::Timer`) reads the EL1
virtual generic-timer counter and converts to nanoseconds via the
cached `CNTFRQ_EL0` resolution (62 500 000 Hz, 16 ns). Under QEMU TCG
the counter advances based on emulated instructions rather than
wall-clock time, so variance reflects translation-cache behaviour and
host scheduler jitter, not real hardware performance. Each iteration
is a fresh QEMU process; the TCG translation cache is destroyed
between iterations, so every iteration pays the full translation cost.
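
The 16 ns resolution follows directly from the counter frequency, and it can be cross-checked against the reported numbers. The sketch below is a Python illustration under the assumption that the conversion is a plain integer scaling of ticks by `1e9 / CNTFRQ_EL0`. The kernel's actual `now_ns()` implementation is not shown in this report and may differ, and `ticks_to_ns` is a hypothetical helper.

```python
CNTFRQ_HZ = 62_500_000           # counter frequency quoted above
NS_PER_SECOND = 1_000_000_000


def ticks_to_ns(ticks, freq_hz=CNTFRQ_HZ):
    """Convert counter ticks to nanoseconds using integer arithmetic."""
    return ticks * NS_PER_SECOND // freq_hz


resolution_ns = ticks_to_ns(1)   # duration of a single tick

# Order statistics from the metric table later in this report (ns).
reported_ns = [15_624_992, 15_640_992, 17_587_008, 19_150_000, 21_154_992]

# Every individual sample should sit on the tick grid if the resolution
# claim holds; the mean is an average and need not.
all_on_tick_grid = all(v % resolution_ns == 0 for v in reported_ns)
min_ticks = reported_ns[0] // resolution_ns
```

Under this assumption, all five reported order statistics are whole multiples of the tick duration, which is consistent with the stated resolution.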

Statistics are computed across the valid samples only. Percentile
convention is *nearest-rank* (1-indexed; `idx = ceil(p/100 * n)`).
Stddev uses the population formula (divisor `n`) and is descriptive
only: it summarizes these samples rather than estimating a wider
population.
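
The two conventions above can be sketched in a few lines of Python. This is an illustration of nearest-rank percentiles and the population standard deviation, not the harness's own code. `nearest_rank` is a hypothetical helper, the samples are made up, and integer ceiling division is used so that the index does not depend on floating-point rounding.

```python
import statistics


def nearest_rank(sorted_samples, p):
    """Nearest-rank percentile: 1-indexed idx = ceil(p/100 * n), p an integer."""
    n = len(sorted_samples)
    idx = max(1, (p * n + 99) // 100)   # integer form of ceil(p * n / 100)
    return sorted_samples[idx - 1]


# Hypothetical samples in ns (not the harness data).
samples = sorted([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

p10 = nearest_rank(samples, 10)
p50 = nearest_rank(samples, 50)
p90 = nearest_rank(samples, 90)

# Population standard deviation: divisor n, not n - 1.
stddev = statistics.pstdev(samples)
```

Nearest-rank always returns a value that was actually observed, which is why the percentile rows in the metric table are exact multiples of the timer resolution while the mean is not.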

**Note on p99 at small `n`.** Under nearest-rank, `p99 = a[ceil(0.99 *
n)]`; for any `n < 100` the index rounds up to `n` and `p99 == max`
by construction. The number is reported as-computed (matching p10 /
p50 / p90's convention) but readers should not over-read it as a
tail-latency signal at small `n`. p99 becomes statistically
informative when `n >= 100`.
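
The claim that `p99 == max` for every `n < 100` can be checked by enumerating the nearest-rank index. The Python sketch below does only that; `nearest_rank_index` is a hypothetical helper that restates `idx = ceil(p/100 * n)` with integer arithmetic.

```python
def nearest_rank_index(p, n):
    """1-indexed nearest-rank position: ceil(p * n / 100) for integer p."""
    return max(1, (p * n + 99) // 100)


# For every sample count below 100, the p99 index is the last element.
p99_is_max_below_100 = all(
    nearest_rank_index(99, n) == n for n in range(1, 100)
)

idx_n20 = nearest_rank_index(99, 20)     # this report's sample count
idx_n100 = nearest_rank_index(99, 100)   # first n where p99 can differ from max
```

At `n = 20` the index is 20 (the maximum), and only at `n = 100` does it drop to 99, the first point where p99 and max can diverge.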

## Metric — boot-to-end elapsed (nanoseconds)

| Statistic | ns | ms |
|-----------|---:|---:|
| min | 15,624,992 | 15.625 |
| p10 | 15,640,992 | 15.641 |
| p50 | 17,587,008 | 17.587 |
| p90 | 19,150,000 | 19.150 |
| p99 | 21,154,992 | 21.155 |
| max | 21,154,992 | 21.155 |
| mean | 17,586,899 | 17.587 |
| stddev | 1,428,711 | 1.429 |

## Δ vs prior baseline (B3 closure, release)

| Statistic | B3 closure (ms) | B4 closure (ms) | Δ ms |
|-----------|---:|---:|---:|
| p10 | 10.311 | 15.641 | +5.330 |
| p50 | 11.884 | 17.587 | +5.703 |
| p90 | 13.823 | 19.150 | +5.327 |

The +5.3 to +5.7 ms increase comes from the T-019 task loader running
at boot: the first post-bootstrap exercise of the per-call `Mmu::map`
page-table-walk + TLB-flush sequence under a live MMU, amplified by
QEMU TCG software-MMU emulation. The cost is paid once at boot, and the
delta is tightly clustered across percentiles (the signature of a
uniform fixed cost, not a variance regression). Real-hardware
projection: ~40 µs on a Cortex-A72.
See the [B4 closure performance review](../reviews/performance-optimization-reviews/2026-05-28-B4-closure.md)
§"Hotspot" for the per-component decomposition.
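
The "uniform fixed cost" reading can be checked arithmetically from the table above. The Python sketch below recomputes the deltas and compares the p10 to p90 band width of the two baselines. The values are transcribed from the table and converted to integer microseconds to avoid floating-point noise; the variable names are illustrative only.

```python
# Percentiles in microseconds, transcribed from the delta table (ms * 1000).
b3_us = {"p10": 10_311, "p50": 11_884, "p90": 13_823}
b4_us = {"p10": 15_641, "p50": 17_587, "p90": 19_150}

# Per-percentile shift between the two baselines.
delta_us = {k: b4_us[k] - b3_us[k] for k in b3_us}

# How much the shift itself varies across percentiles.
spread_us = max(delta_us.values()) - min(delta_us.values())

# Width of the p10..p90 band in each baseline.
b3_band_us = b3_us["p90"] - b3_us["p10"]
b4_band_us = b4_us["p90"] - b4_us["p10"]
```

The shift varies by well under half a millisecond across percentiles, and the p10 to p90 band width is almost unchanged between B3 and B4, which is what a constant additive cost would produce.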

## Verdict

Baseline only — no proposal under measurement. This is the
baseline-of-record for B5+ regression checks against B4's closing
release-build performance. Cite the band above (p10 / p50 / p90) when
comparing later changes against this snapshot. Single-run boot-to-end
claims in PR bodies should be replaced with a fresh harness run when a
non-trivial perf-relevant change lands; see
[`docs/standards/infrastructure.md`](../../standards/infrastructure.md)
§"Performance harness".