diff --git a/CLAUDE.md b/CLAUDE.md index 0ed7b41..e220f8d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,7 @@ This file is the entry point for Claude-based AI agents (Claude Code, Claude API ## What this project is -Tyrne is a **capability-based microkernel** written in Rust, in the lineage of seL4 and Hubris. The project is **pre-alpha**, but implementation is well underway: the kernel boots end-to-end on QEMU `virt` aarch64 and runs a two-task capability-gated IPC demo. The project is **mid-Phase B** — the MMU, PMM, address-space objects, and task loader (load half) are done; the syscall ABI and first userspace task are next. Architecture is documented as Architecture Decision Records (see the [ADR index](docs/decisions/README.md)); active implementation work lives under `kernel/`, `hal/`, and `bsp-qemu-virt/`. Primary development target is QEMU `virt` on aarch64; first real hardware target is the Raspberry Pi 4. +Tyrne is a **capability-based microkernel** written in Rust, in the lineage of seL4 and Hubris. The project is **pre-alpha**, but implementation is well underway: the kernel boots end-to-end on QEMU `virt` aarch64 and runs a two-task capability-gated IPC demo. The project is **mid-Phase B** — the MMU, PMM, address-space objects, task loader (load half), and the syscall boundary (the EL0→EL1 ABI + panic-free dispatcher) are done; the first userspace task running in EL0 is next. Architecture is documented as Architecture Decision Records (see the [ADR index](docs/decisions/README.md)); active implementation work lives under `kernel/`, `hal/`, and `bsp-qemu-virt/`. Primary development target is QEMU `virt` on aarch64; first real hardware target is the Raspberry Pi 4. See [README.md](README.md) for the public overview. 
diff --git a/README.md b/README.md index 93a8d8b..8962fad 100644 --- a/README.md +++ b/README.md @@ -26,8 +26,8 @@ The kernel boots end-to-end on QEMU `virt` aarch64 today, runs a capability-gate | Physical Memory Manager | **Done** — bitmap allocator with zero-fill on `alloc_frame` and three-stage validation on `free_frame`. | | Per-task `AddressSpace` kernel object | **Done** — cap-gated `cap_create_address_space` / `cap_map` / `cap_unmap`. | | Task loader (load half) | **Done** — `load_image` produces a `LoadedImage` describing a populated address space for a `.rodata`-resident raw-flat blob. | -| Syscall ABI + EL0 entry | **Next** — Phase B5; will turn `LoadedImage` into a runnable `Task`. | -| First userspace "hello" | **Planned** — Phase B6. | +| Syscall ABI + dispatcher | **Done** — Phase B5; `SVC` trap → panic-free dispatcher → typed `SyscallError`; five-syscall v1 set; capability-gated `console_write` (debug-gated); validated copy-from/to-user. | +| First userspace "hello" (EL0) | **Next** — Phase B6; turns `LoadedImage` into a runnable EL0 `Task` and exercises the real EL0↔EL1 round-trip. | The active task and its current state live in [`docs/roadmap/current.md`](docs/roadmap/current.md). Full phase plans are under [`docs/roadmap/phases/`](docs/roadmap/phases/). diff --git a/docs/analysis/reports/perf-baseline-2026-05-29-B5-closure.md b/docs/analysis/reports/perf-baseline-2026-05-29-B5-closure.md new file mode 100644 index 0000000..dba6005 --- /dev/null +++ b/docs/analysis/reports/perf-baseline-2026-05-29-B5-closure.md @@ -0,0 +1,96 @@ +# Boot-to-end perf baseline — 2026-05-29 — B5-closure + +Generated by `tools/perf-harness.sh` — multi-run aggregation of the kernel's +`boot-to-end elapsed = X ns` emission (P10 from the [2026-05-06 Track D +review](../reviews/code-reviews/2026-05-06-full-tree/track-d-performance.md)). 
+ +## Inputs + +| Field | Value | +|-------|-------| +| Run timestamp (UTC) | `2026-05-29T13:55:53Z` | +| Iterations requested | 20 | +| Iterations valid | 20 | +| Iterations failed | 0 | +| Per-run timeout | 5 s | +| Build profile | release | +| Kernel ELF | `target/aarch64-unknown-none/release/tyrne-bsp-qemu-virt` | +| Git HEAD | `afeed10` on `sec-review-b5-syscall-boundary` | +| QEMU | `QEMU emulator version 10.2.2` | +| Host `uname -a` | `Darwin MacBookPro.hgw.local 24.6.0 Darwin Kernel Version 24.6.0: Wed Nov 5 21:30:23 PST 2025; root:xnu-11417.140.69.705.2~1/RELEASE_X86_64 x86_64` | +| Wall-clock (full harness run) | 102 s | + +## Methodology + +Each iteration invokes `tools/run-qemu.sh` under a per-run watchdog; +QEMU emits the boot trace through to `tyrne: all tasks complete` plus +the `boot-to-end elapsed = X ns` line, then halts in WFI. The watchdog +kills the QEMU process after the per-run timeout (the kernel never +exits on its own). The integer ns delta is parsed out of stdout. + +Counter source: the kernel's `now_ns()` (`hal::Timer`) reads the EL1 +virtual generic-timer counter and converts to nanoseconds via the +cached `CNTFRQ_EL0` resolution. Under QEMU TCG the counter advances +based on emulated instructions rather than wall-clock time, so +variance reflects translation-cache behaviour and host scheduler +jitter, not real hardware performance. + +Statistics are computed across the valid samples only. Percentile +convention is *nearest-rank* (1-indexed; `idx = ceil(p/100 * n)`). +Stddev is the population formula (`n` divisor) — descriptive. + +**Note on p99 at small `n`.** Under nearest-rank, `p99 = a[ceil(0.99 * +n)]`; for any `n < 100` the index rounds up to `n` and `p99 == max` +by construction. The number is reported as-computed (matching p10 / +p50 / p90's convention) but readers should not over-read it as a +tail-latency signal at small `n`. p99 becomes statistically +informative when `n >= 100`. 
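
The percentile and stddev conventions described in the methodology can be sketched in a few lines. This is a minimal illustration, not the harness itself (which is the shell script `tools/perf-harness.sh`); the function names `nearest_rank` and `population_stddev` are hypothetical, chosen only for this example.

```rust
/// Nearest-rank percentile over an already-sorted slice.
/// Rank is 1-indexed: `rank = ceil(p/100 * n)`.
/// (Hypothetical helper; the real harness is a shell script.)
fn nearest_rank(sorted: &[u64], p: usize) -> u64 {
    let n = sorted.len();
    assert!(n > 0, "need at least one sample");
    assert!((1..=100).contains(&p), "percentile must be in 1..=100");
    // Integer ceiling of p*n/100, so no floating-point rounding is involved.
    let rank = (p * n + 99) / 100;
    sorted[rank - 1]
}

/// Population standard deviation (divisor `n`, not `n - 1`).
fn population_stddev(samples: &[u64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().map(|&x| x as f64).sum::<f64>() / n;
    let variance = samples
        .iter()
        .map(|&x| (x as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    variance.sqrt()
}
```

With `n = 20`, the rank for p99 is `ceil(19.8) = 20`, the last element, which is why p99 coincides with max for any `n < 100`.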
+ +## Metric — boot-to-end elapsed (nanoseconds) + +| Statistic | ns | ms | +|-----------|---:|---:| +| min | 17,334,000 | 17.334 | +| p10 | 17,645,008 | 17.645 | +| p50 | 20,300,000 | 20.300 | +| p90 | 24,706,000 | 24.706 | +| p99 | 26,265,008 | 26.265 | +| max | 26,265,008 | 26.265 | +| mean | 21,065,050 | 21.065 | +| stddev | 2,696,816 | 2.697 | + +## Raw samples + +One ns value per line, in iteration order (NOT sorted): + +```text +23964992 +22780000 +22014992 +22609008 +18599008 +25636000 +17334000 +24706000 +18634000 +20300000 +22456000 +19612992 +20786000 +18138000 +17645008 +18336000 +19196992 +19168992 +23118000 +26265008 +``` + +## Verdict + +Baseline only — no proposal under measurement. Cite the band above +(p10 / p50 / p90) when comparing later changes against this snapshot. +Single-run boot-to-end claims in PR bodies should be replaced with a +fresh harness run when a non-trivial perf-relevant change lands; see +[`docs/standards/infrastructure.md`](../../standards/infrastructure.md) +§"Performance harness". diff --git a/docs/analysis/reviews/business-reviews/2026-05-29-B5-closure.md b/docs/analysis/reviews/business-reviews/2026-05-29-B5-closure.md new file mode 100644 index 0000000..5f47954 --- /dev/null +++ b/docs/analysis/reviews/business-reviews/2026-05-29-B5-closure.md @@ -0,0 +1,180 @@ +# Business review 2026-05-29 — B5 closure retrospective (syscall boundary; T-020 + T-021) + +- **Trigger:** milestone-completion. Phase B / Milestone B5 ("Syscall boundary") reached implementation-complete on 2026-05-29 when [T-020](../../tasks/phase-b/T-020-syscall-error-taxonomy.md) + [T-021](../../tasks/phase-b/T-021-syscall-dispatch.md) merged together into `main` via [PR #34](https://github.com/HodeTech/Tyrne/pull/34) (merge commit [`f98e1af`](https://github.com/HodeTech/Tyrne/commit/f98e1af)); this closure trio formally promotes B5 from "implementation-complete" to `Done`. 
The [security review](../security-reviews/2026-05-29-B5-syscall-boundary.md) (**Approve**, eight axes) was produced first at the maintainer's sequencing; this retrospective + the [performance baseline](../performance-optimization-reviews/2026-05-29-B5-closure.md) complete the trio. **B5 builds the boundary *mechanism* but does not open the real EL0 transition** — the EL0↔EL1 privilege round-trip through the lower-EL `VBAR_EL1 + 0x400` vector is a **B6** runtime-verification item, exercised at B5 only via an EL1-kernel-stub `SVC` through the current-EL `+0x200` vector (see [§"What changed in the plan"](#what-changed-in-the-plan)). +- **Scope:** the B5 syscall-boundary arc — [ADR-0030](../../../decisions/0030-syscall-abi.md) (syscall ABI + the K2-5 `IpcError` taxonomy split) + [ADR-0031](../../../decisions/0031-initial-syscall-set.md) (the five-syscall v1 set), both Accepted 2026-05-29; T-020 (the `IpcError::InvalidCapability` → `StaleHandle`/`WrongObjectKind`/`MissingRight` split + `Capability`/`CapObject` `Debug` redaction) + T-021 (the `SVC` trap trampoline + panic-free dispatcher + copy-from/to-user + the debug-console capability + `SyscallError`), bundled in one combined review on PR #34. Plus the closure-branch work measured here: the standalone security review and the `security-model.md` SMMUv3-CI reconcile ([`afeed10`](https://github.com/HodeTech/Tyrne/commit/afeed10)) that closed a B4-forward-flagged doc-vs-ADR-0036 contradiction. +- **Period:** 2026-05-28 (B4 closed by its closure trio) → 2026-05-29 (today; B5 opened, implemented, merged, and closed). A **single-calendar-day** milestone — unusually fast in wall-clock, deliberately *not* in rigor (see [§"What we learned"](#what-we-learned)). 
+- **Participants:** @cemililik (+ Claude Opus 4.8 (1M context) agent as scribe for the ADR-0030/0031 drafting, the T-020 + T-021 implementation + review-round arc, and this closure trio; an adversarial multi-agent pass on PR #34 acting as an independent attacker against the boundary; bot review-rounds from sourcery-ai + coderabbitai on PR #34). + +> **Canonical source for B5 closure metrics.** This artefact + the [security review](../security-reviews/2026-05-29-B5-syscall-boundary.md) + the [performance baseline](../performance-optimization-reviews/2026-05-29-B5-closure.md) are the source of truth for B5's closing numbers (test counts, ELF sizes, smoke traces, perf band, audit-log surface). Every other location ([`current.md`](../../../roadmap/current.md), [`phase-b.md`](../../../roadmap/phases/phase-b.md), the task review-history rows) is a *summary at its layer of abstraction*. Note in particular: pre-closure, `current.md` cited a mid-arc kernel-suite count of **236** and the security review cited **240** (the kernel crate alone); the live workspace total at HEAD `afeed10` is **339** (see [§Adjustments](#adjustments)). + +--- + +## What landed + +### Tasks promoted to Done + +| Task | Promoted | Description | +|------|----------|-------------| +| [T-020](../../tasks/phase-b/T-020-syscall-error-taxonomy.md) | 2026-05-29 (PR #34 merge `f98e1af`) | **Syscall error taxonomy — the pure-Rust B5 foundation.** Split `IpcError::InvalidCapability` into `StaleHandle` / `WrongObjectKind` / `MissingRight` (validation reordered to resolve → type-check → authority-check across `validate_ep_cap` / `validate_notif_cap` / `sched::resolve_ep_cap`; four arena-staleness sites → `StaleHandle`), and redacted `Capability`'s **and** `CapObject`'s `Debug` (rights/kind visible, the named object + slot/generation hidden — K3-9). No new `unsafe` (safe Rust throughout). 
Closes the [2026-04-21 Phase-A review](../code-reviews/2026-04-21-tyrne-to-phase-a.md)'s `InvalidCapability`-collapse follow-up (K2-5) and the security-review §6 redaction item (K3-9). Adds an additive §Revision rider to [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) (the IPC primitive set is *refined*, not superseded). | +| [T-021](../../tasks/phase-b/T-021-syscall-dispatch.md) | 2026-05-29 (PR #34 merge `f98e1af`) | **EL0→EL1 `SVC` dispatch — the hardware-facing half.** A new architecture-agnostic, panic-free kernel [`syscall`](../../../../kernel/src/syscall/) module (`error.rs` `SyscallError` + stable status encoding; `abi.rs` number decode + register packing + null-handle sentinel; `user_access.rs` `UserAccessWindow` + `copy_from_user`/`copy_to_user`; `dispatch.rs` the dispatcher + five handlers + the debug-console capability check) + the BSP `tyrne_sync_trampoline` (installed at **both** `VBAR_EL1 + 0x200` current-EL and `+0x400` lower-EL slots, saving the full 272-byte register frame) + `syscall_entry` + the `syscall_boundary_smoke` EL1 stub. New cap surface: `CapObject::DebugConsole` (unit variant, no handle) + `CapRights::CONSOLE_WRITE` (bit 7) + `CapHandle::from_raw`. `console_write` carries two independent gates — the capability check (all builds) + the release debug-gate (`cfg!(debug_assertions)`; number `5` → `BadSyscallNumber` in release). Two new audit entries (UNSAFE-2026-0029 / 0030). **Security-relevant** — separately security-reviewed. B5 exercises only the `+0x200` current-EL path via the EL1 stub; the real EL0 `+0x400` round-trip is B6. 
| + +### ADRs + +| ADR | Action | Notes | +|-----|--------|-------| +| [ADR-0030 — Syscall ABI and userspace error taxonomy](../../../decisions/0030-syscall-abi.md) | **Accepted 2026-05-29** (Propose `476710b` → careful-re-read + maintainer-review Accept `93d5960`) | Settles the register convention (`x8` = number, `x0`–`x5` args, `SVC #0`, `x0` = status `0`=Ok + `x1`–`x7` payload), the dedicated-status-register error encoding (Option A — result/error never alias), the `SyscallError` composition type (`From`/`From`), and the **K2-5 `IpcError` split** + its §"Security of the taxonomy split" rationale (per-subject unforgeable handles ⇒ revealing *which* check failed aids no forgery/enumeration). First syscall ABI ADR; includes the §Simulation table + the row-to-verification mapping that splits B5 (`+0x200` proxy) from B6 (`+0x400` real EL0). | +| [ADR-0031 — Initial syscall set (B-phase)](../../../decisions/0031-initial-syscall-set.md) | **Accepted 2026-05-29** (Propose `476710b` → Accept `93d5960`) | Fixes the five-syscall v1 set — `send`(1) / `recv`(2) / `task_yield`(3) / `task_exit`(4) / `console_write`(5); number `0` reserved-invalid — with concrete per-call register layouts, the "thin validator over an existing primitive" handler design, and the two-gate `console_write` contract. The numbers `1`–`5` are a **fixed ABI decision** (T-021's host tests regression-verify them, do not choose them). Minimal-surface discipline: every added syscall is unused panic-free dispatch surface with no v1 consumer. | +| [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) | **Append-only §Revision rider (T-020)** | Records that `IpcError`'s taxonomy was refined (the K2-5 split) — the three-primitive `send`/`recv`/`notify` surface is unchanged; not a supersession. | +| ADR-0033 / ADR-0034 | **Still slot-reserved** | Named-but-unallocated per [ADR-0025 §Rule 1](../../../decisions/0025-adr-governance-amendments.md). 
ADR-0033 (high-half migration) gate is now **imminent** — it opens with B6's per-task `TTBR0_EL1` swap (the prerequisite for ever running a real EL0 task: kernel mappings in the userspace AS + an EL0 context register file). ADR-0034 (kernel-image section permissions) opens with the first attacker-observable EL0 execution (B6). | + +### Pull requests merged into `main` + +| PR | Merge | Scope | +|---|---|---| +| #34 | `f98e1af` | **The combined T-020 + T-021 B5 review** (the maintainer chose one bundled PR over stacked PRs). 11 commits: `476710b` (ADR-0030/0031 propose) → `93d5960` (ADR accept after careful re-read + maintainer review) → `d20e6d0` (T-020 `IpcError` split) → `324457a` (T-020 `Capability`/`CapObject` `Debug` redaction) → `4777f9a` (T-020 `cancel_recv` variant tests) → `7b35ed6` (T-020 wrong-kind-before-rights proof) → `1df1b52` (T-020 In Review + B5-acceptance narrowed to the current-EL proxy) → `806c966` (T-021 dispatch — trampoline, panic-free dispatcher, copy-user) → `5145d4d` (T-021 review-round dispatch tests + compile-time payload guard) → `2c713c0` (T-021 review-round 2 — overlap-safe `core::ptr::copy` + scope/guard fixes) → `1a7deab` (UNSAFE-2026-0030 Amendment — disjointness correction). | + +Two further commits landed on the closure branch (`sec-review-b5-syscall-boundary`, off `f98e1af`): `c424dcb` (the standalone B5 security review) and `afeed10` (the `security-model.md` SMMUv3-CI reconcile — closing a B4-forward-flagged doc-vs-ADR-0036 contradiction; the kernel binary is byte-identical to `f98e1af`). + +### Audit-log surface + +The audit log ([`docs/audits/unsafe-log.md`](../../../audits/unsafe-log.md)) now holds **30** `UNSAFE-2026-####` entries (0001–0030; 0012 `Removed`, so **29 Active**). 
The period's changes: + +- **[UNSAFE-2026-0029](../../../audits/unsafe-log.md#unsafe-2026-0029--svc-sync-trap-trampoline--syscall_entry-register-frame-access)** introduced (T-021): the `SVC` sync trap trampoline asm + `syscall_entry`'s `*mut SyscallTrapFrame` reads/writes. Opened **standalone** (not an Amendment of the IRQ path's UNSAFE-2026-0020) — a different vector-slot pair, a larger full-register-file 272-byte frame, a synchronous `SVC` cause with `ESR_EL1.EC` decode, and a frame the handler *writes back* (the syscall result). **Second-reviewer-signed** per unsafe-policy §Review.4 (the EL0→EL1 trust boundary). Smoke-verified for the `+0x200` current-EL path; carries a status note that the `+0x400` lower-EL path's verification lifts via Amendment when B6's first EL0 task runs. +- **[UNSAFE-2026-0030](../../../audits/unsafe-log.md#unsafe-2026-0030--validated-copy-fromto-user-byte-move-via-coreptrcopy_nonoverlapping)** introduced (T-021): the validated copy-from/to-user byte move. Standalone (not an Amendment of the loader's UNSAFE-2026-0027) — the source/destination is a **userspace-supplied integer** the kernel does not own a reference into, gated by a runtime range check against the active AS. Carries the **2026-05-29 disjointness-correction Amendment** (`2c713c0`): a review-round finding observed that the original "non-overlap is proven by `validate`" invariant was false (`validate` checks bounds, not disjointness); the operation moved to `core::ptr::copy`, and — verified empirically under Miri — the actual soundness basis is the **user/kernel disjointness invariant** (an overlapping pair is UB regardless of the copy primitive, via the parameter's borrow exclusivity). An earlier over-claim that "`copy` makes aliasing safe" was itself corrected in the Amendment. +- T-020 added **zero** new `unsafe` (the `IpcError` split + the hand-written redacting `Debug` impls are safe Rust). 
+- **Forward-flagged carry:** UNSAFE-2026-0019 / 0020 / 0021 retain their `Pending QEMU smoke verification` notes for the IRQ-take / deadline-fire path (gates on the first `arm_deadline` caller — still unfired in v1). UNSAFE-2026-0029's `+0x400` path note (above) is the new analogous forward-flag. + +### Test counts at B5 closure + +| Crate | B4 closure (2026-05-28) | **B5 closure (2026-05-29)** | Δ | +|-------|------------------------:|----------------------------:|---| +| `tyrne-hal` (lib) | 43 | **43** | 0 | +| `tyrne-kernel` (lib) | 187 | **240** | **+53** | +| `tyrne-test-hal` (lib) | 53 | **53** | 0 | +| doc-tests | 3 | **3** | 0 | +| **Total** | **286** | **339** | **+53** | + +All +53 land in `tyrne-kernel`: **+9 from T-020** (the kernel suite moved 187 → 196 — 6 remapped `InvalidCapability` assertions + 9 new variant/redaction tests) and **+44 from T-021** (196 → 240 — the `syscall` module's `error`/`abi`/`user_access`/`dispatch` host suites, including the four review-round dispatch tests). The [performance baseline §Metric 2](../performance-optimization-reviews/2026-05-29-B5-closure.md#metric-2--test-count) decomposes the split. + +**Gates (reproduced live, pinned `nightly-2026-01-15`, HEAD `afeed10`):** + +- `cargo host-test` — **339 passed / 0 failed** (43 hal + 240 kernel + 53 test-hal + 3 doc-tests). +- `cargo fmt --check` / `cargo host-clippy` (`-D warnings`) / `cargo kernel-clippy` (`-D warnings`) / `cargo kernel-build` — all clean. +- `cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt` — **339 passed / 0 failed, 0 UB** under Stacked Borrows. **Run locally this closure** — the first closure to do so (Miri is installed on this host's pinned toolchain; B2–B4 recorded the CI-gate Miri result). This directly discharges the [Phase-B exit prerequisite](../../../roadmap/phases/phase-b.md) (a green Miri run weighted on `sched`/`ipc`) against the now-larger syscall-bearing kernel. 
+ +#### Smoke trace (the load-bearing closure evidence) + +Per the [business master-plan §Acceptance criteria](master-plan.md#acceptance-criteria), a milestone cannot promote past `In Review` to `Done` without a recorded smoke trace + `-d int,unimp,guest_errors` count; narrative claims are insufficient. QEMU 10.2.2, `-M virt -cpu cortex-a72 -m 128M -smp 1`, HEAD `afeed10`. + +**Release build (canonical; the debug-gate visible at runtime):** + +```text +tyrne: hello from kernel_main +tyrne: mmu activated +tyrne: pmm initialized (32602 frames available; 166 reserved) +tyrne: address-space-arena ready (1 / 8 slots used; bootstrap AS root = 0x40092000) +tyrne: image loaded (entry = 0x800000; sp = 0x802000; image bytes 8; stack bytes 4096; AS cap = idx 1) +tyrne: timer ready (62500000 Hz, resolution 16 ns) +tyrne: syscall smoke ok (console_write status=0x1, bytes=0; bad-number status=0x1) +tyrne: starting cooperative scheduler +tyrne: task B — waiting for IPC +tyrne: task A -- sending IPC +tyrne: task B — received IPC (label=0xaaaa); replying +tyrne: task A — received reply (label=0xbbbb); done +tyrne: all tasks complete +tyrne: boot-to-end elapsed = 25092992 ns +``` + +The B5 marker is `tyrne: syscall smoke ok (...)`, between `timer ready` and `starting cooperative scheduler`. In **release**, `console_write` (number `5`) returns **`status=0x1` (`BadSyscallNumber`, `bytes=0`)** — the release debug-gate drops it from the surface even for a capability holder, so the buffer is never touched and the `tyrne: hello from the syscall boundary` greeting is **absent**. This absence is a direct runtime proof of the debug-gate. 
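
The release behaviour above follows from where the debug-gate sits relative to the capability check. The sketch below illustrates that ordering under stated assumptions: the syscall numbers `1` to `5`, the reserved-invalid `0`, and `BadSyscallNumber` come from this review, while `decode`, `console_write_gate`, `CapCheckFailed`, and the `debug_build` parameter are hypothetical stand-ins (the kernel is described as using `cfg!(debug_assertions)`, which is modelled here as a plain `bool` so both builds can be exercised from one test).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Syscall {
    Send,
    Recv,
    TaskYield,
    TaskExit,
    ConsoleWrite,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyscallError {
    /// Reported as `status=0x1` in the smoke trace.
    BadSyscallNumber,
    /// Hypothetical name for a failed debug-console capability check.
    CapCheckFailed,
}

/// Gate 1 (debug-gate): in a release build, number 5 is not part of the
/// syscall surface at all, so it decodes exactly like an unknown number.
fn decode(number: u64, debug_build: bool) -> Result<Syscall, SyscallError> {
    match number {
        1 => Ok(Syscall::Send),
        2 => Ok(Syscall::Recv),
        3 => Ok(Syscall::TaskYield),
        4 => Ok(Syscall::TaskExit),
        5 if debug_build => Ok(Syscall::ConsoleWrite),
        // 0 is reserved-invalid; everything else is unknown.
        _ => Err(SyscallError::BadSyscallNumber),
    }
}

/// Gate 2 (capability check): only reached if decoding succeeded, so a
/// release build never looks at the caller's buffer or capability.
fn console_write_gate(
    number: u64,
    holds_console_cap: bool,
    debug_build: bool,
) -> Result<(), SyscallError> {
    match decode(number, debug_build)? {
        Syscall::ConsoleWrite if holds_console_cap => Ok(()),
        Syscall::ConsoleWrite => Err(SyscallError::CapCheckFailed),
        _ => Err(SyscallError::BadSyscallNumber),
    }
}
```

Because decoding fails first, a capability holder in a release build gets the same `BadSyscallNumber` as any caller passing an unknown number, which matches the two identical `0x1` statuses in the release trace.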
+ +**Debug build (the functional `SVC` round-trip, emitting):** the two extra lines + +```text +tyrne: hello from the syscall boundary (console_write via SVC) +tyrne: syscall smoke ok (console_write status=0x0, bytes=63; bad-number status=0x1) +``` + +appear between `timer ready` and `starting cooperative scheduler` — the first `SVC` runs the full `console_write` path (cap check → `copy_from_user` validates the 63-byte buffer → console emits → `status=0x0`, `bytes=63`); the second `SVC` (reserved-invalid) → `0x1`. + +**`-d int,unimp,guest_errors`:** release **712 events** (712 pre-existing `PL011 data written to disabled UART` warnings + **2 `SVC` exceptions**); debug **776 + 2 `SVC`**. The **2 `SVC` exceptions are new but expected** (the boot smoke) — both taken at the current-EL `+0x200` vector (`Taking exception 2 [SVC] ... from EL1 to EL1 ... with ESR 0x15/0x56000000` = EC `0x15` SVC64), each `ERET`ing cleanly. **Zero Translation faults, zero Permission faults, zero unimplemented/unallocated events** — no new fault class. The +83 PL011 vs B4 (629) is exactly the new banner-line bytes. The full IPC demo still runs to `tyrne: all tasks complete`. + +#### Perf band + footprint (summary; the [performance leg](../performance-optimization-reviews/2026-05-29-B5-closure.md) is canonical) + +- **Release harness band** (20 iterations, HEAD `afeed10`): p10 / p50 / p90 = **17.645 / 20.300 / 24.706 ms** ([report](../../reports/perf-baseline-2026-05-29-B5-closure.md)). A **same-host back-to-back control** (the B4 binary `3ab029f` rebuilt + re-measured this session) proves the **~+2.9 ms p10/p50 increase versus B4 is real B5 code, not host jitter** — it is the boot `SVC` smoke (2 exception round-trips + cold TCG translation of the new syscall path), one-time-at-boot and projected at ~µs on real hardware. The control corrected an initial mis-read of the raw band as host jitter; details in the perf leg's §Measurement. 
+- **ELF footprint** (release): `.text` **34,648** (+1,524) / `.rodata` **4,856** (+296) / `.bss` **50,592** (+2,272); total **~88.0 KiB** (+4.76 % vs B4). The smallest non-refactor `.text` growth of any Phase-B milestone — the syscall boundary is a thin validator/dispatch layer, not a new subsystem. + +### Documentation surface + +- [ADR-0030](../../../decisions/0030-syscall-abi.md) + [ADR-0031](../../../decisions/0031-initial-syscall-set.md) — new ADRs (the syscall ABI + initial set). +- [`docs/architecture/ipc.md`](../../../architecture/ipc.md) — new "`IpcError` taxonomy" section (the K2-5 split). +- [`docs/architecture/security-model.md`](../../../architecture/security-model.md) — redaction rule broadened to capabilities (T-020); **and** the SMMUv3-CI-gate staleness reconciled with ADR-0036 (`afeed10`, the closure-branch fix — both prior "QEMU `virt` has SMMUv3 / is the CI gate" sentences now state the v1 GICv2/no-IOMMU reality and point at [ADR-0036](../../../decisions/0036-qemu-virt-gicv2-no-iommu-v1.md)). +- [`docs/glossary.md`](../../../glossary.md) — syscall terms; [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) §Revision rider. +- [`docs/audits/unsafe-log.md`](../../../audits/unsafe-log.md) — UNSAFE-2026-0029 + 0030 added (0030 with the disjointness Amendment). +- [security review 2026-05-29](../security-reviews/2026-05-29-B5-syscall-boundary.md) + [performance baseline 2026-05-29](../performance-optimization-reviews/2026-05-29-B5-closure.md) + the harness report [`perf-baseline-2026-05-29-B5-closure.md`](../../reports/perf-baseline-2026-05-29-B5-closure.md). + +## What changed in the plan + +- **B5 closed in a single calendar day — the fastest milestone of the project.** B4 closed 2026-05-28; B5's ADRs, both tasks, the combined PR #34 merge, and the security review all landed 2026-05-29. 
This is a structural data point about pace: a milestone *can* land in a day when (a) the ADRs front-load the decisions, (b) the work splits cleanly into a pure-Rust half and a hardware half, and (c) the per-step review discipline is preserved (it was — see [§"What we learned"](#what-we-learned)). It is **not** a precedent to rush future milestones; it is evidence that the project's discipline scales *down* to a small, well-bounded milestone without being skipped. +- **The phase-b §B5 acceptance criterion was narrowed before code landed (the same-day maintainer correction).** The originally-drafted §B5 acceptance criterion promised a *real EL0 round-trip*; a same-day maintainer review observed that an `SVC` from the only available caller (an EL1 kernel-stub) takes the **current-EL `+0x200`** vector, not the lower-EL `+0x400` EL0 vector — so a real EL0 round-trip is **structurally impossible** in B5 (it needs kernel mappings in the userspace AS + an EL0 context register file, both gated on the ADR-0033 high-half placeholder). The criterion was narrowed to "the dispatch mechanism via the current-EL kernel-stub proxy," and the real EL0 `+0x400` round-trip was **moved to a B6 acceptance criterion** ([commit `1df1b52`](https://github.com/HodeTech/Tyrne/commit/1df1b52)). This is an acceptance-criteria wording change that makes B5 prove exactly what it can prove and no more. +- **A new "T-021 carry-forward gates" subsection was added to phase-b §B6.** Three gates the B5 EL1-stub proxy did not need but B6's first real EL0 task **must** close: (1) `console_write`'s user window must become **per-task** (derived from the EL0 task's mapped region, not the whole RAM extent) + the int-to-pointer deref must become a per-page user-VA → kernel-VA translation, returning `FaultAddress` (never panic) — *the single most important gate*; (2) `SP_EL1` must be initialised for the `+0x400` entry; (3) the `SYSCALL_STUB_TABLE` must be swapped for the scheduler's current-task table. 
These are tracked so they are not lost between B5 and B6. +- **The ADR-correction-before-Accept process held under time pressure.** A same-day maintainer review of the ADR/task arc raised that the corrections had initially been folded into the *Accepted* ADR bodies (an append-only concern). It was resolved by rebasing so the corrections land in the `Proposed` draft and Accept is a separate clean commit (`93d5960`) — **no Accepted body was edited post-Accept**. The append-only ADR integrity rule survived a one-day milestone intact. +- **A B4-forward-flagged doc contradiction was closed.** The B4 closure forward-flagged that `security-model.md` still described QEMU `virt` as SMMUv3-equipped / the CI gate, contradicting [ADR-0036](../../../decisions/0036-qemu-virt-gicv2-no-iommu-v1.md). The B5 security pass reconciled both sentences to the v1 GICv2/no-IOMMU reality (`afeed10`). N/A as a v1 *defect* (no bus-master driver exists), but a live doc-vs-Accepted-ADR contradiction now closed. +- **No B-phase milestone re-shuffling.** B6 (First userspace "hello") becomes active. The deferred `task_create_from_image` wrapper (phase-b §B4 §3) — the `LoadedImage` → runnable `CapHandle{CapObject::Task(...)}` bridge — is unchanged and opens as the first B6-area task, now that B5 has landed the syscall prerequisites it waited on. + +## What we learned + +### The pure-Rust / hardware-boundary task split let the most security-sensitive milestone land safely in one day + +This is the central learning. 
B5 builds the EL0→EL1 trap — the single widest untrusted-input surface in the system — and it landed in a calendar day without being rushed, because the work was split per [CLAUDE.md §6](../../../../CLAUDE.md) ("do not dump entire subsystems in a single pass") into **T-020** (the pure-Rust, host-testable error taxonomy + `Debug` redaction — no `unsafe`, no trampoline, exercisable by the existing IPC test suite *before* the dispatcher existed) and **T-021** (the hardware-facing trap/dispatch/copy-user — its own focused review, its own two audit entries, its own adversarial pass). The split meant the security-critical half got concentrated scrutiny rather than being diluted across a taxonomy refactor, and the pure-Rust half de-risked the error space ahead of the boundary. The pattern generalises: a security-sensitive milestone should separate its *provable-in-safe-Rust foundation* from its *`unsafe` hardware boundary*, land the foundation first, and review the boundary alone. This is the reusable shape for B6's EL0 work and any future driver boundary. + +### An adversarial pass + Miri caught a real soundness over-claim that a confirming review missed + +T-021's review included a multi-agent adversarial pass actively trying to break the boundary, plus Miri. Together they found that the copy-from/to-user `unsafe` had an **incorrect stated invariant** — the original claim that `UserAccessWindow::validate` proves the kernel/user buffers don't overlap is false (it proves *bounds*, not *disjointness*). Worse, an interim "fix" over-claimed that switching to `core::ptr::copy` made overlapping calls safe — which Miri **empirically disproved** (Stacked Borrows rejects the aliasing access through the exposed `user_ptr` because the slice parameter's borrow is exclusive). 
The honest resolution — the soundness basis is the user/kernel *disjointness invariant* (userspace memory vs a distinct kernel allocation / a separate AS), which holds structurally for every real caller — landed as an append-only Amendment to UNSAFE-2026-0030. Two lessons: (a) an adversarial reviewer that tries to *break* the claim catches over-claims a confirming reviewer rubber-stamps; (b) Miri is a **soundness oracle**, not just a regression gate — it falsified a plausible-sounding safety argument. The audit log's append-only-Amendment mechanism absorbed the correction cleanly (the introducing commit's claim is preserved; the Amendment carries the SHA and the corrected basis). + +### A same-host perf control corrected a mis-attribution — and showed the QEMU harness is nearing its resolving floor + +The raw B5 boot-to-end band looked noisier than B4's (stddev ~doubled), and the first read was "host jitter." A **same-host, back-to-back control** (the B4 binary rebuilt and re-measured this session) disproved that: the B5 binary is a reproducible **~+2.9 ms** slower at p10/p50 than the B4 binary on the *same* host, while the B4 binary reproduced its *own* recorded p50 across sessions (17.4 vs 17.6 ms). So the delta is real B5 code — the boot `SVC` smoke (2 exception round-trips + cold TCG translation), benign and one-time, but not noise. The learning has two parts: (a) **trust a same-host control over a cross-session band** — the closure perf method should keep this; (b) the per-milestone QEMU-TCG signal (~+2.9 ms) is now **comparable in magnitude to the session-to-session host-jitter** (the same B4 binary's stddev moved 1.4 → 2.2 ms between sessions), meaning the harness is approaching its resolving floor for *small* milestones. Real-hardware measurement — the long-deferred trigger — is increasingly the only instrument that can resolve sub-ms milestone costs. 
+ +### Front-loading the ABI before the trampoline kept the numbers a decision, not an accident + +ADR-0030 (convention) + ADR-0031 (the concrete numbers/layouts) were Accepted *before* any trampoline or dispatcher was written. The payoff: T-021's host tests **regression-verify** the syscall numbers and register layouts against the ADR table — they do not get to choose them. Had the ABI been settled by "what the first trampoline happened to do," every downstream artefact (the EL0 stub, the EL1 frame, the dispatcher, the `tyrne-user` crate) would have inherited an accidental convention. This is the same front-loading discipline ADR-0027 (MMU) and ADR-0029 (image format) applied to their boundaries, now confirmed for the widest interface in the system. + +### Honest scope-statement discipline matured: "build the mechanism, defer the transition" + +B5 is a clean example of stating precisely what a milestone proves versus defers. The ADR §Simulation row-to-verification mapping, the same-day acceptance-criterion narrowing, and the three named B6 carry-forward gates all draw the same line: the `+0x200` proxy proves the *shared* save → decode → dispatch → `ERET` mechanism + the dispatcher logic; it does **not** prove the `+0x400` EL0 vector entry, the privilege transition, or copy-user against a separate userspace `TTBR0_EL1`. The smoke evidence is scoped to match (the release trace even proves the debug-gate by the *absence* of a line). This avoids the "Simulation table without verification = documentation drift" anti-pattern and the inverse over-claimed-smoke trap — the milestone's claims and its evidence agree exactly. 
+
+## B4 closure §Adjustments — closure status
+
+The [2026-05-28 B4 closure retrospective §Adjustments](2026-05-28-B4-closure.md#adjustments) listed six items; status at B5 closure:
+
+| B4 Adjustment | Status (2026-05-29) | Closing reference |
+|---|---|---|
+| **B5 opens — ADR-0030 + ADR-0031** | **Closed** | Both Accepted 2026-05-29; T-020 + T-021 Done via PR #34. This closure trio. |
+| **B5+ `MemoryRegion` cap + per-op rights extension** | **Open (trigger now reachable in B6)** | The `cap_map`/`cap_unmap` kind-only-check gap. B5 did not introduce a per-task AS cap, so the trigger held; B6's per-task userspace AS is the natural trigger. The B5 security review re-flagged it (the `CapRights::{MAP,UNMAP,ACTIVATE}` + `CapKind::MemoryRegion` pairing). |
+| **B-phase BSP task — proper PL011 init** | **Open (trigger-deferred)** | `-d guest_errors` is still 100 % PL011 noise (712 release / 776 debug at B5; the +83 vs B4's 629 is the new banner line). **Trigger:** B6's first userspace fault-test, where a clean baseline lets the test distinguish real fault classes from PL011 noise. |
+| **BSP host-test crate for block-descriptor `unmap`** | **Open (trigger-deferred)** | Unchanged; `bsp-qemu-virt` still has no host-test crate. **Trigger:** the first runtime `cap_unmap` block-descriptor case (B6+ teardown). |
+| **High-half kernel migration (ADR-0033 placeholder)** | **Trigger now imminent** | B5 reinforced and *dated* the trigger: the syscall path's `+0x400` EL0 vector and copy-user against a separate userspace `TTBR0_EL1` are exactly what ADR-0033 unblocks. Opens with B6's first per-task `TTBR0_EL1` swap. |
+| **Kernel-image section permissions (ADR-0034 placeholder)** | **Still trigger-deferred** | Opens with the first attacker-observable EL0 execution (B6). |
+
+Net: **1 of 6 closed** (B5 milestone via ADR-0030/0031 + T-020/T-021); 5 carry forward, two with **now-imminent B6 triggers** (the `MemoryRegion` rights ADR + ADR-0033 high-half).
No trigger fired prematurely; the carry-forward is honest. + +## Adjustments + +- [x] **`current.md` test-count drift.** **✅ Closed in-branch 2026-05-29.** `current.md` cited a mid-arc kernel count of **236** (T-021 review-round-1 snapshot); the standalone security review cited **240** (the kernel crate's final count). The live workspace total at HEAD `afeed10` is **339** (43 hal + 240 kernel + 53 test-hal + 3 doc-tests). `current.md` refreshed to the live figures + a B5-closure banner. +- [x] **`current.md` banner + Pathfinder flip to B5-closed / B6-next.** **✅ Closed in-branch 2026-05-29** — a new 2026-05-29 B5-closure top banner (339 tests + 712/776 guest-errors + the perf band + the same-host-control note) and the four Pathfinder bullets flipped to B6. +- [x] **`security-model.md` SMMUv3-CI reconcile (B4 forward-flag).** **✅ Closed in-branch 2026-05-29** (`afeed10`) — both stale "QEMU `virt` has SMMUv3 / is the CI gate" sentences now state the v1 GICv2/no-IOMMU reality and point at ADR-0036. +- [ ] **B6 opens — First userspace "hello".** The first task is the deferred **`task_create_from_image`** wrapper (phase-b §B4 §3): turn a `LoadedImage` into a runnable `CapHandle{CapObject::Task(...)}` (composes the loader output with an EL0-ready `Task` context register file). Then `userland/hello` (a `no_std`/`no_main` crate) + the `tyrne-user` safe-wrapper crate + the wire-up that schedules and runs it. **Must close the three T-021 carry-forward gates** (per-task `console_write` window + per-page user-VA translation returning `FaultAddress`; `SP_EL1` init for `+0x400`; `SYSCALL_STUB_TABLE` → current-task table) **before** a real EL0 task runs. **Trigger:** the next task-open arc; paired with the **ADR-0033 high-half placeholder** opening (the per-task `TTBR0_EL1` swap + kernel mappings in the userspace AS). B6 is the **Phase B closure** milestone (its review is the phase-level retrospective). 
+- [ ] **B5+ `MemoryRegion` cap + per-operation rights extension.** Carry from B3/B4. **Trigger:** the first B6 per-task AS cap not co-resident with the bootstrap-everything cap. +- [ ] **B-phase BSP task — proper PL011 init.** Carry. **Trigger:** B6's first userspace fault-test needing a clean `-d guest_errors` baseline. +- [ ] **BSP host-test crate for block-descriptor `unmap`.** Carry. **Trigger:** first runtime `cap_unmap` block-descriptor case (B6+ teardown). +- [ ] **ADR-0034 — kernel-image section permissions.** Carry. **Trigger:** first attacker-observable EL0 execution (B6). +- [ ] **Real-hardware perf measurement.** Carry — reinforced by this closure's perf leg (the QEMU-TCG harness is nearing its resolving floor for small milestones). **Trigger:** the first Raspberry Pi 4 BSP (Phase D). + +> **Closure-trio actions taken in-branch (2026-05-29).** Of this list, the three *actionable-now* Adjustments were executed during the closure: the `current.md` test-count refresh, the banner/Pathfinder flip, and the `security-model.md` SMMUv3 reconcile (the last landed via the security pass, `afeed10`). The rest are forward / trigger-deferred B6-and-later work with their stated triggers; none has fired, so leaving them open is the honest state. + +## Next + +- **Active phase:** B (unchanged). +- **Active milestone:** **B6 — First userspace "hello"** (B5 closed via this trio). B6 is the Phase-B-closing milestone; its review will double as the Phase B retrospective. +- **Active task:** none — B5 closed. The first task to open is the deferred **`task_create_from_image`** wrapper (the `LoadedImage` → runnable `TaskCap` bridge), paired with the **ADR-0033 high-half** ADR opening; then `userland/hello` + `tyrne-user`. The three T-021 carry-forward gates must close before a real EL0 task runs. +- **Next review trigger:** **B6 closure trio + Phase B retrospective.** Produced when B6 (the first real userspace "hello") reaches `In Review`. 
Possible interim triggers: a mini-retro if the EL0 bring-up (the `+0x400` vector, the privilege transition, per-task copy-user) surfaces a learning worth capturing; a maintainer-initiated or on-demand full-tree master review if the corpus drifts before B6 closes. diff --git a/docs/analysis/reviews/business-reviews/README.md b/docs/analysis/reviews/business-reviews/README.md index 805e004..0327b8f 100644 --- a/docs/analysis/reviews/business-reviews/README.md +++ b/docs/analysis/reviews/business-reviews/README.md @@ -35,3 +35,4 @@ A business review may point at outcomes from those other reviews as part of "wha | 2026-05-09 | B2 closure retrospective — MMU activation + kernel-half mapping (T-016); ADR-0027 + `MapperFlush` flush-token discipline; closed cleanly on first attempt (no smoke-regression arc) | [2026-05-09-B2-closure.md](2026-05-09-B2-closure.md) | | 2026-05-14 | B3 closure retrospective — Address-space abstraction (T-017 PMM + T-018 `AddressSpace` kernel object); ADR-0035 + ADR-0028; five-round PR #28 review arc + cross-cutting `MmuError::BlockMapped` + `.claude/skills/` → `.agents/skills/` migration | [2026-05-14-B3-closure.md](2026-05-14-B3-closure.md) | | 2026-05-28 | B4 closure retrospective — Task loader (T-019 `load_image` → `LoadedImage`); ADR-0029 + the 2026-05-22 master-review interlude (4 Blocker / 18 Major full-tree audit) + PR #32 remediation closing all 24 verified findings (MR-009's exit-bar half closed in-branch here); ADR-0036 supersession of GICv3/SMMUv3; UNSAFE-2026-0027 + 0028 added, 0025/0026 lifted; HodeTech org migration | [2026-05-28-B4-closure.md](2026-05-28-B4-closure.md) | +| 2026-05-29 | B5 closure retrospective — Syscall boundary (T-020 error taxonomy + T-021 EL0→EL1 SVC dispatch); ADR-0030 + ADR-0031; one-day milestone via the pure-Rust/hardware-boundary split; adversarial pass + Miri corrected a copy-user soundness over-claim; same-host perf control; UNSAFE-2026-0029/0030 (audit log 30 / 29 Active); B6 + Phase B retrospective 
next | [2026-05-29-B5-closure.md](2026-05-29-B5-closure.md) | diff --git a/docs/analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md b/docs/analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md new file mode 100644 index 0000000..e9aa090 --- /dev/null +++ b/docs/analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md @@ -0,0 +1,243 @@ +# Performance baseline 2026-05-29 — B5 closure (syscall boundary; T-020 + T-021) + +- **Concern:** Did the B5 syscall boundary — [T-020](../../tasks/phase-b/T-020-syscall-error-taxonomy.md) (the `IpcError` split + `Capability`/`CapObject` `Debug` redaction) plus [T-021](../../tasks/phase-b/T-021-syscall-dispatch.md) (the `SVC` trap trampoline + panic-free dispatcher + copy-from/to-user + the debug-console capability), merged together via [PR #34](https://github.com/HodeTech/Tyrne/pull/34) (`f98e1af`) — shift the kernel image footprint, RAM use, or boot-to-end timing versus the [post-T-019 B4 closure baseline](2026-05-28-B4-closure.md)? +- **Scope:** All committed code on `main` from [`3ab029f`](https://github.com/HodeTech/Tyrne/commit/3ab029f) (B4 closure baseline anchor — README clarity follow-up) through the B5 arc: [ADR-0030](../../../decisions/0030-syscall-abi.md) + [ADR-0031](../../../decisions/0031-initial-syscall-set.md) (Accepted 2026-05-29), T-020 + T-021 merged via PR #34 (`476710b` ADR propose → `806c966` dispatch → `5145d4d`/`2c713c0`/`1a7deab` review-round follow-ups → `f98e1af` merge). Measured at the closure-branch HEAD [`afeed10`](https://github.com/HodeTech/Tyrne/commit/afeed10) (`sec-review-b5-syscall-boundary`: the B5 security review + the `security-model.md` SMMUv3 reconcile; identical kernel binary to `f98e1af` — the two extra commits are docs-only). 
+- **Hypothesis:** Re-baseline artefact, **with one hypothesis the measurement tested adversarially.** Per the [master plan's pre-flight](master-plan.md#pre-flight-hypothesis), no improvement target is set — the goal is the canonical post-B5 baseline for B6+ regression checks. The implicit non-hypothesis: T-020 (pure-Rust, no boot-path change) + T-021 (a kernel `syscall` module + a BSP trap trampoline, exercised at boot by an EL1-kernel-stub `SVC` smoke) should add **bounded** `.text`/`.rodata`/`.bss` and a **small, one-time-at-boot** timing cost (2 `SVC` round-trips + the cold TCG translation of the new syscall path) that would be far smaller on real hardware. The hypothesis under test: *is the measured boot-to-end delta versus B4 a real B5 cost, or this session's host jitter?* — answered by a **same-host back-to-back control** (§Measurement).
+- **Reviewer:** @cemililik (+ Claude Opus 4.8 (1M context) agent in the Baseline / Hotspot / Measurement / Reporter roles; Proposal is empty — no optimisation is proposed this cycle).
+- **Target:** QEMU `virt`, aarch64, Cortex-A72 model, single core, 128 MiB RAM (`-M virt -cpu cortex-a72 -m 128M -smp 1`; unchanged).
+- **Build:** release profile (`cargo build --release --target aarch64-unknown-none -p tyrne-bsp-qemu-virt`); footprint via `rust-size -A -d` (the workspace `llvm-tools-preview` size tool, sysv/decimal form).
+
+> **Canonical source for B5 closure metrics.** This artefact + the [business retrospective](../business-reviews/2026-05-29-B5-closure.md) + the [security review](../security-reviews/2026-05-29-B5-syscall-boundary.md) are the source of truth for B5's closing footprint / timing / test numbers. Other locations ([`current.md`](../../../roadmap/current.md), [`phase-b.md`](../../../roadmap/phases/phase-b.md), the [T-020](../../tasks/phase-b/T-020-syscall-error-taxonomy.md) / [T-021](../../tasks/phase-b/T-021-syscall-dispatch.md) review-history rows) are *summaries at their layer of abstraction*; corrections start here.
**Note:** before this closure, `current.md` cited a mid-arc kernel-suite count of **236** and the standalone security review cited **240** (the kernel crate alone); the live workspace total at HEAD `afeed10` is **339** (43 hal + 240 kernel + 53 test-hal + 3 doc-tests) — see §"Metric 2" and the business retro's §Adjustments. + +--- + +## Baseline + +### Methodology + +- ELF section sizes via `rust-size -A -d` (workspace-pinned `nightly-2026-01-15` toolchain, `llvm-tools-preview`), release profile (`-C opt-level=3`). The `-A` (all-sections, sysv) `-d` (decimal) form composes directly into the trajectory table; the value reported is each section's allocated size (test-only `#[cfg(test)]` fakes are not in the release BSP ELF). +- Boot-to-end timing via [`tools/perf-harness.sh`](../../../../tools/perf-harness.sh) (the P10 wall-clock harness; see [`infrastructure.md` §"Performance harness"](../../../standards/infrastructure.md)). 20 iterations, 5 s per-run timeout, release build, single host (Darwin 24.6.0 / x86_64; QEMU 10.2.2 / TCG). Each iteration is a fresh QEMU process; the TCG translation cache is destroyed between iterations. +- The harness report (the canonical B5 band) lives at [`docs/analysis/reports/perf-baseline-2026-05-29-B5-closure.md`](../../reports/perf-baseline-2026-05-29-B5-closure.md). +- Single-run smoke traces (release **and** debug) recorded for the canonical record; `-d int,unimp,guest_errors` event counts captured for the fault-class regression check. +- Counter source: the kernel's `now_ns()` (`hal::Timer`) reads the EL1 virtual generic-timer counter (62 500 000 Hz, 16 ns resolution) and converts to ns. Under QEMU TCG the counter advances with emulated execution, so variance reflects translation-cache behaviour and host scheduler jitter, not real-hardware performance. 
+- **A same-host control** was run this session: the B4 closure binary (`3ab029f`) was rebuilt in an isolated git worktree and re-measured with the same 20-iteration harness on the same host, back-to-back with the B5 measurement, to separate B5 code cost from inter-session host-state drift (§Measurement).
+
+### Metric 1 — Kernel image size
+
+| Section | post-T-019 + remediation (2026-05-28 B4 closure) | post-T-020/T-021 (2026-05-29 B5 closure) | Δ bytes | Δ % |
+|---------|-------------------------------------------------:|-----------------------------------------:|--------:|----:|
+| `.text` | 33,124 | **34,648** | **+1,524** | **+4.6 %** |
+| `.rodata` | 4,560 | **4,856** | **+296** | **+6.5 %** |
+| `.bss` | 48,320 | **50,592** | **+2,272** | **+4.7 %** |
+
+Total kernel image (`.text + .rodata + .bss`) = **90,096 B ≈ 88.0 KiB**, up from B4's 86,004 B ≈ 84.0 KiB — a **+4,092-byte (+4.76 %)** combined growth. B5 is a code-modest milestone: the `.text` delta (+1,524) is ~⅙ of B4's (+9,116), because B5 adds a thin validator/dispatch layer over existing kernel primitives rather than a new runtime subsystem (B4's loader).
+
+**Observations:**
+
+- **`.text` grew by +1,524 bytes (+4.6 %)** — the architecture-agnostic kernel [`syscall`](../../../../kernel/src/syscall/) module (`error.rs` — `SyscallError` + the exhaustive-without-wildcard status encoders; `abi.rs` — `SyscallNumber::decode` + the register packing + the null-handle sentinel; `user_access.rs` — `UserAccessWindow` + `copy_from_user`/`copy_to_user`; `dispatch.rs` — the `match`-over-number dispatcher + the five thin handlers + the debug-console capability check), plus the cap-system additions (`CapRights::CONSOLE_WRITE`, `CapObject::DebugConsole`, `CapHandle::from_raw`) and the BSP-side glue (the `tyrne_sync_trampoline` asm + `syscall_entry` + the `syscall_boundary_smoke` EL1 stub).
Each handler is a thin validator over an existing primitive (`ipc_send`/`ipc_recv`/`yield_now`/console/terminate), so the per-handler footprint is small — consistent with [ADR-0031](../../../decisions/0031-initial-syscall-set.md)'s "thin validator + call into an existing primitive" design driver.
+- **`.rodata` grew by +296 bytes (+6.5 %)** — the `SyscallError` / `SyscallNumber` discriminant material + the two new boot-banner format strings (`tyrne: hello from the syscall boundary (...)` and `tyrne: syscall smoke ok (...)`). The redacting `Capability`/`CapObject` `Debug` impls (T-020) use `format_args!` with inline literals, adding negligible static strings.
+- **`.bss` grew by +2,272 bytes (+4.7 %)** — dominantly the new `SYSCALL_STUB_TABLE` BSP static (a dedicated `CapabilityTable` the B5 EL1-stub resolves its debug-console capability from; B6 swaps it for the scheduler's current-task table) plus the syscall path's static working set. No new linker-script reservation beyond the existing `.boot_pt`.
+
+**A6 → … → B5 cumulative trajectory** (extends the [B4 closure trajectory table](2026-05-28-B4-closure.md#metric-1--kernel-image-size); A6 baseline `.text 13,940` / `.rodata 1,960` / `.bss 17,872`):
+
+| Section | A6 | B1 | post-T-016 | post-T-018 (B3) | post-T-019 (B4) | **post-T-020/T-021 (B5)** | A6 → today total Δ |
+|---------|---:|---:|---:|---:|---:|---:|---|
+| `.text` | 13,940 | 21,908 | 22,384 | 24,008 | 33,124 | **34,648** | **+20,708 (+148.5 %)** |
+| `.rodata` | 1,960 | 2,784 | 2,944 | 3,536 | 4,560 | **4,856** | **+2,896 (+147.8 %)** |
+| `.bss` | 17,872 | 22,248 | 40,208 | 42,080 | 48,320 | **50,592** | **+32,720 (+183.1 %)** |
+
+The total kernel image is now **~88.0 KiB**, up from ~84.0 KiB at B4. The B5 per-milestone deltas are among the smallest code growth of any Phase-B milestone: in the table above, the B5 `.text` delta (+1,524) is below B3's (+1,624) and B4's (+9,116), and only the B1 → post-T-016 step (+476) is smaller. The syscall boundary is a small surface by design (five syscalls, [ADR-0031](../../../decisions/0031-initial-syscall-set.md)'s minimal set).
+
+### Metric 2 — Test count
+
+| Crate | B4 closure (2026-05-28) | **B5 closure (2026-05-29)** | Δ |
+|-------|------------------------:|----------------------------:|---|
+| `tyrne-hal` (lib) | 43 | **43** | 0 |
+| `tyrne-kernel` (lib) | 187 | **240** | **+53** |
+| `tyrne-test-hal` (lib) | 53 | **53** | 0 |
+| doc-tests | 3 | **3** | 0 |
+| **Total** | **286** | **339** | **+53** |
+
+All +53 land in `tyrne-kernel`, decomposing into the two B5 tasks:
+
+- **T-020 (+9 net):** the kernel suite moved 187 → 196 — 6 existing `InvalidCapability` assertions remapped to their post-split variants + 9 new tests (5 `ipc` `WrongObjectKind`/`StaleHandle` variant tests across `ipc_send`/`ipc_recv`/`ipc_notify` + 2 `ipc_cancel_recv` variant tests + 2 `cap` redaction tests pinning that the slot index/generation does **not** appear in `Capability`/`CapObject` `Debug`).
+- **T-021 (+44):** 196 → 240 — the `syscall` module's host suites: `error.rs` (status-encoding stability + `From`/`From` round-trips + the Cap/Ipc-block-no-collision guard), `abi.rs` (number decode incl. `0`/out-of-range + the debug-gate `5`-absent-in-release pair + `Message`/outcome/`Option` register packing), `user_access.rs` (in-range / out-of-range / overrun / zero-length / wrap validation), and `dispatch.rs` (per-syscall handlers incl. the four review-round dispatch tests — transfer-cap round-trip, stale-transfer-handle, recv-pending packing, exactly-one-chunk console_write).
+
+All 339 host tests pass at HEAD `afeed10`. **Reproduced live this session** (pinned `nightly-2026-01-15`): `cargo host-test` → 43 + 240 + 53 + 3 = **339 passed / 0 failed**; `cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt` → **339 passed / 0 failed / 0 UB** under Stacked Borrows (this closure is the first to run Miri **locally** — the B2–B4 closures recorded the CI-gate Miri result because Miri was not installed on the closure host; see §"Regression check").
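
The `user_access.rs` cases listed under T-021 (in-range, out-of-range, overrun, zero-length, wrap) amount to an overflow-checked bounds test. The sketch below is a minimal illustration under stated assumptions, not the kernel's `UserAccessWindow`: the `Window` type, its fields, the `AccessError` variants, and the choice to accept a zero-length range inside the window are all hypothetical. As the business retrospective notes for the real validator, a check of this shape establishes bounds only; it says nothing about whether two buffers are disjoint.

```rust
/// Hypothetical error type for this sketch (not the kernel's `SyscallError`).
#[derive(Debug, PartialEq, Eq)]
pub enum AccessError {
    /// The range starts before the window or ends past it.
    OutOfWindow,
    /// `addr + len` (or the window's own end) wraps the address space.
    Overflow,
}

/// Hypothetical stand-in for a user-accessible address window.
#[derive(Debug, Clone, Copy)]
pub struct Window {
    pub base: usize,
    pub len: usize,
}

impl Window {
    /// Accepts `[addr, addr + len)` only if it lies entirely inside the window.
    /// Checked arithmetic rejects wrap-around before any comparison is made.
    pub fn validate(&self, addr: usize, len: usize) -> Result<(), AccessError> {
        let end = addr.checked_add(len).ok_or(AccessError::Overflow)?;
        let window_end = self
            .base
            .checked_add(self.len)
            .ok_or(AccessError::Overflow)?;
        if addr < self.base || end > window_end {
            return Err(AccessError::OutOfWindow);
        }
        // Assumption of this sketch: a zero-length range inside the window is accepted.
        Ok(())
    }
}
```

Doing the addition with `checked_add` before comparing is the design point: a naive `addr + len <= window_end` can wrap and pass for an address near the top of the address space.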
+
+### Metric 3 — Boot-to-end timing
+
+QEMU smoke at HEAD `afeed10`, QEMU 10.2.2, `-M virt -cpu cortex-a72 -m 128M -smp 1`, captured live 2026-05-29.
+
+#### Source A — single-run release smoke trace (canonical record)
+
+```text
+tyrne: hello from kernel_main
+tyrne: mmu activated
+tyrne: pmm initialized (32602 frames available; 166 reserved)
+tyrne: address-space-arena ready (1 / 8 slots used; bootstrap AS root = 0x40092000)
+tyrne: image loaded (entry = 0x800000; sp = 0x802000; image bytes 8; stack bytes 4096; AS cap = idx 1)
+tyrne: timer ready (62500000 Hz, resolution 16 ns)
+tyrne: syscall smoke ok (console_write status=0x1, bytes=0; bad-number status=0x1)
+tyrne: starting cooperative scheduler
+tyrne: task B — waiting for IPC
+tyrne: task A -- sending IPC
+tyrne: task B — received IPC (label=0xaaaa); replying
+tyrne: task A — received reply (label=0xbbbb); done
+tyrne: all tasks complete
+tyrne: boot-to-end elapsed = 25092992 ns
+```
+
+Single run: **~25.09 ms boot-to-end** (release, no `-d` flags). The single-run number is anecdotal; the harness band (Source B) is the load-bearing claim.
+
+**What the release trace proves — the debug-gate, at runtime.** The B5 marker line is `tyrne: syscall smoke ok (console_write status=0x1, bytes=0; bad-number status=0x1)`, inserted between `timer ready` and `starting cooperative scheduler`. In a **release** build, `console_write` (syscall number `5`) returns **`status=0x1` (`BadSyscallNumber`, `bytes=0`)** — the [ADR-0031](../../../decisions/0031-initial-syscall-set.md) release **debug-gate** drops number `5` from the surface even for a capability holder, so the buffer is never touched and no greeting is emitted. The second `SVC` (a reserved-invalid number) likewise returns `0x1`. The `tyrne: hello from the syscall boundary (...)` line is therefore **absent** in release — its presence/absence is a direct, observable proof of the debug-gate. (The **debug** build emits it; see Source A′.)
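
The release/debug difference above comes down to whether number `5` survives the decode step. A minimal sketch of that gate, assuming a decode function that takes the build mode as a flag so that both outcomes can be exercised in a single test run. The enum, the assignment of numbers 1 through 4, and the `decode_status` helper are hypothetical; only number `5` (`console_write`), the invalid `0`, and the `0x0` / `0x1` status values come from the text.

```rust
/// Status words as they appear in the smoke traces.
pub const STATUS_OK: u64 = 0x0;
pub const STATUS_BAD_SYSCALL_NUMBER: u64 = 0x1;

/// Hypothetical five-entry syscall set; the order of the first four is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallNumber {
    Send,
    Recv,
    Yield,
    Terminate,
    /// Number 5: only part of the surface when the debug gate is open.
    ConsoleWrite,
}

/// Decodes a raw syscall number. `debug_gate_open` models the build-mode gate;
/// how the real kernel selects it is not shown here.
pub fn decode(raw: u64, debug_gate_open: bool) -> Option<SyscallNumber> {
    match raw {
        1 => Some(SyscallNumber::Send),
        2 => Some(SyscallNumber::Recv),
        3 => Some(SyscallNumber::Yield),
        4 => Some(SyscallNumber::Terminate),
        5 if debug_gate_open => Some(SyscallNumber::ConsoleWrite),
        // 0, a gated-off 5, and everything out of range are rejected alike.
        _ => None,
    }
}

/// Status produced by the decode step alone (handlers may still fail later).
pub fn decode_status(raw: u64, debug_gate_open: bool) -> u64 {
    match decode(raw, debug_gate_open) {
        Some(_) => STATUS_OK,
        None => STATUS_BAD_SYSCALL_NUMBER,
    }
}
```

In this sketch a gated-off `5` is indistinguishable from any other invalid number, which matches the release trace reporting `0x1` for both `SVC`s. A caller could pass `cfg!(debug_assertions)` as the flag, though tying the gate to that setting is an assumption.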
+
+Two other lines shifted numerically vs B4:
+
+- **`pmm initialized`** now reports **166 reserved** (B4: 165) and **32602 available** (B4: 32603). The +1 reserved frame is the grown kernel image (the syscall module's `.text`/`.rodata`) pushing the boot reservation set up by one 4 KiB frame.
+- **bootstrap AS root** shifted `0x40091000` → `0x40092000` (one frame higher) for the same reason.
+
+Every other line is byte-stable from the B4 trace through `tyrne: all tasks complete`.
+
+#### Source A′ — single-run **debug** smoke trace (the syscall round-trip, emitting)
+
+```text
+tyrne: timer ready (62500000 Hz, resolution 16 ns)
+tyrne: hello from the syscall boundary (console_write via SVC)
+tyrne: syscall smoke ok (console_write status=0x0, bytes=63; bad-number status=0x1)
+tyrne: starting cooperative scheduler
+```
+
+In a **debug** build the debug-gate admits number `5`, so the first `SVC` runs the full `console_write` path: capability check passes → `copy_from_user` validates the 63-byte `.rodata` buffer against the active-AS window → the console emits `tyrne: hello from the syscall boundary (console_write via SVC)` → returns **`status=0x0`, `bytes=63`**. The second `SVC` (reserved-invalid number) returns `0x1`. This is the positive, functional proof of the EL1-self-`SVC` round-trip that the release build (correctly) gates off. (Debug build PMM banner: `170 reserved`, AS root `0x40096000` — the unoptimised image is larger; this is build-profile size, not a kernel change.)
+
+#### Source B — perf-harness band (release, 20 iterations, canonical)
+
+Recorded in [`docs/analysis/reports/perf-baseline-2026-05-29-B5-closure.md`](../../reports/perf-baseline-2026-05-29-B5-closure.md).
20/20 valid, wall-clock 102 s, HEAD `afeed10`:
+
+| Statistic | ns | ms |
+|-----------|----:|----:|
+| min | 17,334,000 | 17.334 |
+| p10 | 17,645,008 | 17.645 |
+| p50 | 20,300,000 | 20.300 |
+| p90 | 24,706,000 | 24.706 |
+| p99 | 26,265,008 | 26.265 |
+| max | 26,265,008 | 26.265 |
+| mean | 21,065,050 | 21.065 |
+| stddev | 2,696,816 | 2.697 |
+
+`p99 == max` by nearest-rank construction for `n < 100` (see the [report's methodology note](../../reports/perf-baseline-2026-05-29-B5-closure.md#methodology)).
+
+**Δ vs B4 closure (the 2026-05-28-recorded band):**
+
+| Statistic | B4 closure doc (ms) | B5 closure (ms) | Δ ms | Δ % |
+|-----------|---:|---:|---:|---:|
+| p10 | 15.641 | 17.645 | +2.004 | +12.8 % |
+| p50 | 17.587 | 20.300 | +2.713 | +15.4 % |
+| p90 | 19.150 | 24.706 | +5.556 | +29.0 % |
+| stddev | 1.429 | 2.697 | +1.268 | +88.7 % |
+
+The naive cross-session delta is **+2.0 / +2.7 / +5.6 ms** with a near-doubled stddev. Two distinct effects are tangled in it — a real B5 cost and a noisier host this session — and the §Measurement same-host control separates them.
+
+## Hotspot
+
+The boot-to-end increase is dominated by **one new one-time-at-boot cost: the `syscall_boundary_smoke` EL1 kernel-stub running at boot** — two `SVC` exception round-trips plus the first-time cold TCG translation of the new syscall path, amplified by QEMU TCG's expensive exception emulation. The same-host control (§Measurement) confirms this is a **real, deterministic ~+2.9 ms** B5 cost, not host jitter.
+
+### Hotspot 1 — the boot-time `SVC` smoke (dominant, deterministic)
+
+`syscall_boundary_smoke` runs once at `kernel_entry`, between the `timer ready` and `starting cooperative scheduler` banners. It grants a debug-console capability into `SYSCALL_STUB_TABLE` (a few cheap cap-table ops) and then issues **two `SVC #0`** instructions. Each `SVC`:
+
+1.
**Takes a synchronous exception at the current-EL `VBAR_EL1 + 0x200` vector** (the EL1-stub executes at EL1, so it takes the current-EL-with-SPx slot, not the lower-EL `+0x400` EL0 slot — see [ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation)). Under QEMU TCG, taking an exception is a heavyweight operation: QEMU ends the in-flight translation block, computes the vector, switches EL/PSTATE, and translates the vector code cold. +2. **Saves the full 272-byte register frame** (`x0`–`x30` + `SP_EL0` + `ELR_EL1` + `SPSR_EL1`; 17 `stp`/`mrs` pairs in `tyrne_sync_trampoline`), decodes `ESR_EL1.EC`, and `bl`s into `syscall_entry` → `tyrne_kernel::syscall::dispatch`. +3. **Runs the dispatcher** (number decode → handler). In the **release** smoke both `SVC`s decode to `BadSyscallNumber` (number `5` is debug-gated off; the second is reserved-invalid), so the dispatch body is minimal — but the *first* execution of the trampoline + `syscall_entry` + the decode path is **cold-translated by TCG** on first entry. +4. **Restores the frame and `ERET`s** — another EL transition QEMU TCG emulates. + +The dominant cost is the **two `SVC` exception transitions + the cold TCG translation of the trampoline / dispatch path under QEMU TCG**. An exception under TCG is community-benchmarked as one of the more expensive emulated events (it breaks TB chaining and forces cold translation at the vector), and this is the **first** code in the kernel that takes a synchronous `SVC` exception at runtime (B1's IRQ path is asynchronous and, in the v1 demo, never fires — no `arm_deadline` caller). So the cost surfaces here for the first time. It is **one-time-at-boot** (the smoke runs once), and in **release** it does *not* even include the `console_write` copy-user + 63-byte console emit (those are debug-gated off) — the cost is almost entirely the two exception round-trips + their cold translation. 
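
The frame arithmetic in step 2 can be checked mechanically. A sketch assuming a `#[repr(C)]` struct of 64-bit slots; the `TrapFrame` name and the field order are hypothetical (the trampoline's actual layout is not shown here), but the slot count follows from the register list: 31 general-purpose registers plus `SP_EL0`, `ELR_EL1` and `SPSR_EL1` give 34 slots of 8 bytes, that is 17 pairs of 16 bytes.

```rust
use core::mem::size_of;

/// Hypothetical saved-register frame; field order is an assumption of this sketch.
#[repr(C)]
pub struct TrapFrame {
    /// General-purpose registers x0..=x30.
    pub x: [u64; 31],
    /// User stack pointer at the time of the trap.
    pub sp_el0: u64,
    /// Return address for `ERET`.
    pub elr_el1: u64,
    /// Saved processor state for `ERET`.
    pub spsr_el1: u64,
}

/// Frame size in bytes: 34 slots of 8 bytes.
pub const FRAME_BYTES: usize = size_of::<TrapFrame>();
/// Number of 16-byte register pairs needed to save or restore the frame.
pub const FRAME_PAIRS: usize = FRAME_BYTES / 16;

// Compile-time checks: the frame is 272 bytes and a multiple of 16, so a stack
// pointer that was 16-byte aligned before the save stays aligned after it.
const _: () = assert!(FRAME_BYTES == 272);
const _: () = assert!(FRAME_BYTES % 16 == 0);
```

Keeping the size check as a `const` assertion means a change to the frame that breaks the 16-byte multiple fails the build rather than surfacing as a stack-alignment fault at the first trap.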
+ +> **Attribution caveat (honest).** The split between "TCG exception-transition overhead" and "cold first-time translation of the syscall path" is **not separately instrumented** this cycle — both are bundled into the single boot-to-end delta, and isolating them would need `-icount` or per-phase `now_ns()` probes around the smoke. The same-host control (§Measurement) establishes the **aggregate** is real and deterministic (~+2.9 ms); the finer split is a candidate for a future hypothesis-driven cycle if it ever becomes load-bearing (it will not until real hardware, where both costs collapse to ~µs). + +**Real-hardware projection:** a real `SVC` on a Cortex-A72 is **tens of cycles**; the trampoline's 17-register save/restore is a few dozen more; the cold I-cache fill of the syscall path is **microseconds** at the first touch and zero thereafter. The whole boot smoke is on the order of **single-digit microseconds** on real hardware. The ~+2.9 ms QEMU figure is a software-MMU/exception-emulation artefact, not a kernel performance defect — the same pattern every prior milestone showed (B2 MMU activation +1.5 ms, B3 PMM +6–7 ms, B4 loader +5.3 ms, all QEMU-TCG-only). + +### Hotspot 2 — first-time TCG translation of the larger `.text` (small) + +The post-B5 kernel image is +1,524 bytes `.text` larger. The boot-to-end window now includes more first-time-translated code (the syscall module + the trampoline), but only the code paths actually *executed at boot* are translated — in release that is the trampoline + the decode→`BadSyscallNumber` path, not the (un-run) `send`/`recv`/`console_write` handler bodies. Every iteration pays the full translation cost because the harness wraps each iteration in a fresh QEMU process. This is a minor contributor folded into Hotspot 1's cold-translation share. + +**Real-hardware projection:** zero — translation cost is a TCG artefact. 
+
+### Hotspot 3 — UART output bandwidth (unchanged dominant on real hardware)
+
+The B5 release boot emits one additional banner line (`tyrne: syscall smoke ok (...)`). The `-d int,unimp,guest_errors` PL011 count rose from B4's 629 to B5's **712** (release) — the Δ is the new banner-line UART bytes (one `PL011 data written to disabled UART` warning per byte while QEMU's PL011 rides reset state). Bounded (< 1 ms aggregate). On *real* hardware UART output bandwidth remains the projected dominant boot-to-end cost (a PL011 flushing the multi-line trace at a real baud rate dwarfs the µs-scale syscall smoke) — the real-hardware baseline, once it exists, will be UART-bound, not syscall-bound.
+
+## Proposal
+
+**None this cycle.** This artefact records the baseline for B6+ regression checks; it does not propose an optimisation. The single new boot cost (the `SVC` smoke) is a deliberate B5 acceptance-criterion exerciser, not an inefficiency.
+
+**Rejected proposals (with reasoning):**
+
+- *"Drop the boot `SVC` smoke to recover the ~2.9 ms."* Rejected — the smoke is the **runtime proof of B5's acceptance criterion #7** (the EL1-self-`SVC` round-trip through the shared trampoline/dispatch mechanism, per [ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation)). Removing it to save a QEMU-TCG-only boot cost would delete the one piece of evidence that the dispatcher works end-to-end at runtime. The cost is one-time and far smaller on real hardware; the master plan's anti-pattern "rewriting the design under the banner of performance" applies.
+- *"Gate the smoke behind a `cfg`/feature so release boots skip it."* Rejected for v1 — the release smoke is itself load-bearing (it proves the debug-gate at runtime: `console_write status=0x1` in release). The cost is bounded and one-time; B6 replaces the stub smoke with a real EL0 task anyway, at which point the stub smoke retires naturally.
+- *"`-icount`-instrument the smoke to split exception-vs-translation cost."* Rejected as premature — the aggregate is proven real by the same-host control; the finer split has no consumer until real hardware (where both collapse to µs). Deferred to a future hypothesis-driven cycle if ever needed. + +## Measurement + +No optimisation proposal is under measurement — but the cycle **did** run one decisive measurement: a **same-host, same-session, back-to-back control** to test whether the boot-to-end delta versus B4 is a real B5 cost or this session's host jitter. The B4 closure binary ([`3ab029f`](https://github.com/HodeTech/Tyrne/commit/3ab029f)) was rebuilt in an isolated git worktree and re-measured with the identical 20-iteration release harness on the same host, immediately alongside the B5 measurement. + +### Control result + +| Statistic (ms) | **B4 binary** (`3ab029f`, this session) | **B5 binary** (`afeed10`, this session) | Δ (B5 − B4, same session) | B4 binary as recorded 2026-05-28 | +|-----------|---:|---:|---:|---:| +| min | 13.099 | 17.334 | +4.235 | 15.625 | +| p10 | 14.652 | 17.645 | **+2.993** | 15.641 | +| p50 | 17.376 | 20.300 | **+2.924** | 17.587 | +| p90 | 19.680 | 24.706 | +5.026 | 19.150 | +| stddev | 2.155 | 2.697 | +0.542 | 1.429 | + +Two conclusions, both load-bearing: + +1. **The +2.9 ms is real B5 code, not host jitter.** Back-to-back on the same host, the B5 binary is **+2.99 ms at p10 and +2.92 ms at p50** over the B4 binary — and the *lower* percentiles (the most jitter-resistant comparators) carry the delta cleanly. A deterministic boot cost shifts the floor; that is exactly the signature here. This is the boot `SVC` smoke (Hotspot 1). +2.
**The host is genuinely noisier this session — but that is *not* what the B4→B5 delta is.** The B4 binary's p50 reproduced almost exactly across sessions (**17.376** this session vs **17.587** recorded 2026-05-28; Δ −0.21 ms), proving the measurement is sound and host-state-stable at p50 — so the naive cross-session +2.7 ms p50 delta is *also* real B5 cost, corroborating conclusion 1. Separately, the B4 binary's **stddev rose 1.429 → 2.155** across sessions — *that* part is host state (this session has more scheduler jitter), and it accounts for the inflated upper-tail / p90 spread on top of the deterministic floor shift. B5 adds a further small variance bump (2.155 → 2.697) consistent with the variable cold-translation cost of the new path. + +This control is the cycle's most valuable output: the initial read of the raw band ("stddev nearly doubled — probably host jitter") was **wrong**, and the same-host control caught it. The B5 boot smoke adds a real, reproducible ~+2.9 ms QEMU-TCG boot cost; it is benign (one-time, TCG-amplified, ~µs on real hardware) but it is *not* noise, and recording it honestly is what makes this baseline trustworthy for B6's comparison. + +> **Baseline-trustworthiness note for B6.** Because this session's host is noisier than B4's (stddev 2.155 vs 1.429 even for the *same* B4 binary), B6's perf review should treat the **p10 / p50** of this B5 band (17.645 / 20.300 ms) as the stable comparators and re-run a same-host B5 control if B6's session differs, rather than reading the p90 (24.706 ms) as a fixed reference. The min/p10 floor shift is the reliable B5-cost signal. + +## Regression check + +- `cargo host-test`: **339 / 339 pass** (43 hal + 240 kernel + 53 test-hal + 3 doc-tests; +53 vs B4 closure, all in `tyrne-kernel`). 0 failed. Reproduced live at HEAD `afeed10`. +- `cargo fmt --check`: clean. +- `cargo host-clippy` (`clippy --all-targets -D warnings`): clean. 
+- `cargo kernel-clippy` (`-D warnings`): clean — the kernel crate's panic-free discipline confirms the dispatcher + handlers have no `panic!`/`unwrap`/`expect` on the register-supplied path. +- `cargo kernel-build` (release): clean. +- `cargo +nightly miri test --workspace --exclude tyrne-bsp-qemu-virt`: **339 / 339 pass, 0 UB** under Stacked Borrows (43 + 240 + 53 + 3). **Run locally this closure** (Miri is installed on this host's pinned `nightly-2026-01-15`) — the first closure to do so; B2–B4 recorded the CI-gate Miri result. The syscall module's two `unsafe` sites are in the **BSP** crate (excluded from this Miri run as it is `no_std`/bare-metal); the *kernel-side* `copy_from_user`/`copy_to_user` `unsafe` ([UNSAFE-2026-0030](../../../audits/unsafe-log.md#unsafe-2026-0030--validated-copy-fromto-user-byte-move-via-coreptrcopy_nonoverlapping)) **is** in `tyrne-kernel` and Miri-exercised via the `user_access` host tests (permissive-provenance pattern) — clean. +- **QEMU smoke trace:** verbatim in §"Source A" (release) + §"Source A′" (debug). Full demo through `tyrne: all tasks complete` in both. `-d int,unimp,guest_errors`: **release 712 PL011 disabled-UART warnings + 2 `SVC` exceptions**, **debug 776 PL011 + 2 `SVC`**. The **2 `SVC` exceptions are new but expected** (the boot smoke) — both taken at the current-EL `+0x200` vector (`Taking exception 2 [SVC] ... from EL1 to EL1 ... with ESR 0x15/0x56000000` = EC `0x15` SVC64), each `ERET`ing cleanly. **Zero Translation faults, zero Permission faults, zero unimplemented/unallocated events.** The Δ from B4 (629 PL011, 0 SVC) is +83 PL011 (the new banner-line bytes) + the 2 expected SVCs; **no new fault class** — the syscall trap path enters and returns cleanly. 
+- **Security cross-reference:** see the [B5 security review](../security-reviews/2026-05-29-B5-syscall-boundary.md) — **Approve**, all eight axes pass; the EL0→EL1 boundary is panic-free, capability-gated, and copy-user-validated; UNSAFE-2026-0029 / 0030 are policy-conformant (second-reviewer-signed). No security-sensitive path regressed by perf-relevant changes (there were none — B5 added no optimisation). +- **`unsafe` diff:** the period added **two** audit entries — [UNSAFE-2026-0029](../../../audits/unsafe-log.md#unsafe-2026-0029--svc-sync-trap-trampoline--syscall_entry-register-frame-access) (the `SVC` trampoline asm + `syscall_entry` frame access; BSP) and [UNSAFE-2026-0030](../../../audits/unsafe-log.md#unsafe-2026-0030--validated-copy-fromto-user-byte-move-via-coreptrcopy_nonoverlapping) (the validated copy-from/to-user move; kernel, with the disjointness-correction Amendment). Total audit-log entries: **30** (0001–0030; 0012 `Removed` → **29 Active**). T-020 added **zero** `unsafe` (the `IpcError` split + the hand-written redacting `Debug` impls are safe Rust). Neither entry is performance-driven. + +**Drift flagged (non-blocking, handed to the business retro's §Adjustments):** + +- **`current.md` / mid-arc test-count drift.** `current.md` cited a mid-arc **236** (kernel suite at the T-021 review-round-1 snapshot) and the standalone security review cited **240** (the kernel crate's final count). The live workspace total at HEAD `afeed10` is **339** (43 + 240 + 53 + 3 doc-tests). The closure refreshes `current.md` to the live figures. 
+ +## Verdict + +**Baseline — no proposal under measurement.** + +The B5 closure baseline records a **+1,524-byte `.text` / +296-byte `.rodata` / +2,272-byte `.bss`** image-footprint growth (total ~88.0 KiB, +4.76 % vs B4 — the smallest non-refactor milestone growth in Phase B, reflecting the syscall boundary's thin-validator design) and a **real, deterministic ~+2.9 ms p10/p50 boot-to-end increase under QEMU TCG**, proven by a same-host back-to-back control to be **B5 code, not host jitter**. The increase is the `syscall_boundary_smoke` EL1-stub running at boot — two `SVC` exception round-trips + the cold TCG translation of the new syscall path, amplified by QEMU TCG's expensive exception emulation. It is **one-time-at-boot**, projected at **single-digit microseconds on a real Cortex-A72**, and not a steady-state or per-syscall regression (the v1 IPC demo's steady-state is unchanged — the demo tasks never cross the syscall boundary in B5). The footprint growth is the kernel `syscall` module + the BSP trampoline, well within [ADR-0031](../../../decisions/0031-initial-syscall-set.md)'s minimal-surface budget. + +This is a **measured non-change** in the master-plan's sense (a re-baseline with no optimisation proposed) — but the same-host control elevates it above a pure re-baseline: it corrected an initial mis-attribution of the delta to host jitter and pinned the ~+2.9 ms as a benign, deterministic, TCG-only boot cost. Cite the band (p10 / p50 / p90 = 17.645 / 20.300 / 24.706 ms; release; 20-iteration harness; QEMU 10.2.2 on this Darwin/x86_64 host) when comparing B6 against this snapshot — treating **p10 / p50** as the stable cross-session comparators (this session's host is noisier than B4's; see the §Measurement trustworthiness note). 
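The band comparison behind this verdict can be sketched as follows. This is a hedged illustration, not the code in `tools/perf-harness.sh`: the nearest-rank percentile rule and the names `percentile`, `band`, `Band`, and `floor_shift` are assumptions; only the numeric bands are taken from the control table in this report.

```rust
/// Nearest-rank percentile over an ascending-sorted slice (milliseconds).
/// ASSUMPTION: nearest-rank; the real harness may use a different rule.
fn percentile(sorted: &[f64], pct: usize) -> f64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    let n = sorted.len();
    let rank = (pct * n + 99) / 100; // integer ceil(pct * n / 100)
    sorted[rank.clamp(1, n) - 1]
}

/// The three-point band the report cites, in milliseconds.
#[derive(Debug, Clone, Copy)]
struct Band {
    p10: f64,
    p50: f64,
    p90: f64,
}

/// Sort the samples in place and extract the p10 / p50 / p90 band.
fn band(samples: &mut [f64]) -> Band {
    samples.sort_by(|a, b| a.total_cmp(b));
    Band {
        p10: percentile(samples, 10),
        p50: percentile(samples, 50),
        p90: percentile(samples, 90),
    }
}

/// Floor shift between a control band and a candidate band: (p10 delta, p50 delta).
/// The lower percentiles are used because they are the least sensitive to host jitter.
fn floor_shift(control: Band, candidate: Band) -> (f64, f64) {
    (candidate.p10 - control.p10, candidate.p50 - control.p50)
}

fn main() {
    // Same-session bands copied from the control table in this report.
    let b4 = Band { p10: 14.652, p50: 17.376, p90: 19.680 };
    let b5 = Band { p10: 17.645, p50: 20.300, p90: 24.706 };

    let (d10, d50) = floor_shift(b4, b5);
    println!("floor shift: p10 {d10:+.3} ms, p50 {d50:+.3} ms");
    // The p90 delta is printed for context only; it mixes in host jitter.
    println!("upper tail:  p90 {:+.3} ms", b5.p90 - b4.p90);
}
```

A B6 comparison would feed both the B5 control samples and the B6 samples from the same session through `band` before calling `floor_shift`, rather than reusing the recorded numbers across sessions.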
+ +### B4 closure §Forward-flagged items — closure status + +The [2026-05-28 B4 closure baseline](2026-05-28-B4-closure.md) carried forward-flags; status at B5 closure: + +| B4 forward-flag | Status (2026-05-29) | Note | +|---|---|---| +| Real-hardware perf measurement | **Still trigger-deferred — reinforced** | The QEMU-TCG band remains informational; this cycle's same-host control shows the per-milestone signal (~+2.9 ms) is now comparable to the session-to-session host-jitter noise (stddev 1.4→2.2 across sessions for the *same* binary). The harness is approaching its resolving floor for small milestones; real hardware is increasingly the only instrument that can measure sub-ms milestone costs. **Trigger:** the first Raspberry Pi 4 BSP. | +| Per-`cap_map` TLB-flush batching (loader) | **Still trigger-deferred** | B5 does not enter a loader-produced AS (running gates on B6); unchanged. **Trigger:** the first B5/B6 task that enters a loader-produced AS. | +| Loader cost linear in image size | **Still trigger-deferred** | B5 added no loader caller. **Trigger:** B6's first real `userland/hello` binary. | +| Activation-hook differ-path cost | **Still trigger-deferred** | All demo tasks still share the bootstrap AS; the activation hook still short-circuits. **Trigger:** the first B6+ task that owns and runs in a non-bootstrap AS. | +| `AddressSpaceArena` slot-count (N=8) | **Still trigger-deferred** | Still 2 / 8 slots at boot (bootstrap + the loader's AS). **Trigger:** first multi-task userspace AS pressure (B6). | + +### New B5 forward-flagged items + +- **Boot `SVC` smoke retires with B6.** The ~+2.9 ms boot cost is the EL1-stub smoke; B6 replaces it with a real EL0 task taking the `+0x400` vector. **Trigger:** B6's first EL0 task — at which point a fresh baseline captures the real EL0↔EL1 round-trip cost (and the smoke's stub cost disappears). 
+- **TCG exception-emulation cost is now a measured boot contributor.** The first synchronous-exception runtime cost in the project; B6's EL0 round-trip + any future preemption/IRQ-dispatch arc will add more exception transitions. **Trigger:** B6 EL0 round-trip; later, the first `arm_deadline` caller (the long-deferred IRQ-dispatch path). +- **Harness resolving floor for small milestones.** Recorded above (B4 forward-flag row 1) — the per-milestone signal and the session noise are now comparable for a small milestone. Not actionable until real hardware. diff --git a/docs/analysis/reviews/performance-optimization-reviews/README.md b/docs/analysis/reviews/performance-optimization-reviews/README.md index 04dec10..ce87e28 100644 --- a/docs/analysis/reviews/performance-optimization-reviews/README.md +++ b/docs/analysis/reviews/performance-optimization-reviews/README.md @@ -28,5 +28,6 @@ A dated file `YYYY-MM-DD-.md` in this folder, following the shape in [` | 2026-05-09 | B2 closure baseline — post-T-016 footprint (`.text +364` / `.rodata +16` / `.bss +17,952` — dominated by `.boot_pt` 16 KiB reservation); first release-build harness band p10/p50/p90 = 4.262 / 4.642 / 6.456 ms; `-d guest_errors` 379 events (all pre-existing PL011 noise) | [2026-05-09-B2-closure.md](2026-05-09-B2-closure.md) | | 2026-05-14 | B3 closure baseline — post-T-017 + T-018 footprint (`.text +1,624` / `.rodata +592` / `.bss +1,872`); release-build harness band p10/p50/p90 = 10.311 / 11.884 / 13.823 ms (+6 to +7 ms vs B2 — pure QEMU TCG translation overhead from new code paths; real-hardware projection sub-5 ms); `-d guest_errors` 526 events (all pre-existing PL011 noise; zero non-PL011) | [2026-05-14-B3-closure.md](2026-05-14-B3-closure.md) | | 2026-05-28 | B4 closure baseline — post-T-019 + master-review remediation footprint (`.text +9,116` / `.rodata +1,024` / `.bss +6,240`; total ~84.0 KiB, +25 % vs B3); release-build harness band p10/p50/p90 = 15.641 / 17.587 / 19.150 ms (+5.33 to +5.70 ms 
vs B3 — pure QEMU TCG software-MMU overhead from T-019's first post-bootstrap `cap_map` page-table walks + TLB flushes; real-hardware projection ~40 µs); `-d guest_errors` 629 events (all pre-existing PL011 noise; zero fault classes; loader's two real `cap_map` walks fault-clean) | [2026-05-28-B4-closure.md](2026-05-28-B4-closure.md) | +| 2026-05-29 | B5 closure baseline — post-T-020/T-021 footprint (`.text` +1,524 / `.rodata` +296 / `.bss` +2,272; ~88.0 KiB, +4.8 % vs B4); release band p10/p50/p90 = 17.645 / 20.300 / 24.706 ms; **same-host control** proves a real ~+2.9 ms one-time boot `SVC`-smoke cost (not host jitter) — the harness is nearing its resolving floor for small milestones; 712 (release) / 776 (debug) guest-errors (PL011 + 2 expected `SVC`, zero new fault class) | [2026-05-29-B5-closure.md](2026-05-29-B5-closure.md) | > First full hypothesis-driven cycle is now infrastructure-unblocked — T-009 + T-012 lit up `now_ns()` at EL1 and provide the measurement primitive IPC round-trip latency needs. The B1 closure baseline above records the static-only metrics; future hypothesis-driven cycles will add IPC round-trip wall-clock measurement, stack high-water-mark probes, and `TrapFrame` slimming for ack-and-ignore IRQ handlers. 
diff --git a/docs/analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md b/docs/analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md new file mode 100644 index 0000000..6cc4a61 --- /dev/null +++ b/docs/analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md @@ -0,0 +1,111 @@ +# Security review 2026-05-29 — B5 syscall boundary (T-020 + T-021) + +- **Change:** the B5 syscall-boundary arc on `main` — [T-020 syscall error taxonomy](../../tasks/phase-b/T-020-syscall-error-taxonomy.md) (the `IpcError::InvalidCapability` → `StaleHandle` / `WrongObjectKind` / `MissingRight` split + `Capability` / `CapObject` `Debug` redaction, K3-9) and [T-021 EL0→EL1 SVC dispatch](../../tasks/phase-b/T-021-syscall-dispatch.md) (the `SVC` trap trampoline + panic-free dispatcher + validated copy-from/to-user + the debug-console capability + `SyscallError`), merged together via **PR #34** ([merge `f98e1af`](https://github.com/HodeTech/Tyrne/pull/34)), preceded by [ADR-0030](../../../decisions/0030-syscall-abi.md) (syscall ABI + the K2-5 taxonomy split) and [ADR-0031](../../../decisions/0031-initial-syscall-set.md) (initial syscall set), both `Accepted` 2026-05-29. Period under review: the T-020/T-021 arc (commits `476710b` ADR propose → `d448540`/`2c713c0`/`1a7deab` review-round follow-ups → `f98e1af` merge). +- **Reviewer:** @cemililik (+ Claude Opus 4.8 (1M context) agent acting adversarial across the eight axes the [security-review master plan](master-plan.md) defines). +- **Separation from code review:** T-021 went through **three review-rounds on PR #34** — sourcery-ai + coderabbitai + an explicit **adversarial multi-agent pass** that raised 17 candidate findings (3 false positives; 14 confirmed, all reducing to documented-B6-latent / test-gap / nit — none a live B5 defect) — plus the per-round fixes (overlap-safe copy-user, the const-generic payload guard, the discharge-scope + MD028 + audit-amendment corrections). 
This artefact is the **standalone security pass** scoped to the B5 syscall boundary, performed with a fresh checklist after the merge. Per the user's sequencing it is conducted **ahead of** the business + performance legs; it is therefore a **change-scoped** security review (it does *not* claim closure semantics and is exempt from the [master-plan §Closure-trio smoke gate](master-plan.md)), but it **can serve as the security leg of the eventual B5 closure trio**, and the verbatim QEMU smoke evidence is included below (axis 4) since the change produced it. +- **Unsafe audit cross-reference:** [UNSAFE-2026-0029](../../../audits/unsafe-log.md#unsafe-2026-0029--svc-sync-trap-trampoline--syscall_entry-register-frame-access) (**new**; the `SVC` sync trap trampoline asm + `syscall_entry`'s `*mut SyscallTrapFrame` reads/writes — full register-file frame, `ESR_EL1.EC` decode, result write-back; distinct from the IRQ path's UNSAFE-2026-0020; security-sensitive → second-reviewer per unsafe-policy §Review.4) and [UNSAFE-2026-0030](../../../audits/unsafe-log.md#unsafe-2026-0030--validated-copy-fromto-user-byte-move-via-coreptrcopy_nonoverlapping) (**new**; the validated copy-from/to-user byte move, with the append-only **disjointness-correction Amendment** carrying commit `2c713c0` — the operation is `core::ptr::copy`, the soundness basis is the user/kernel disjointness invariant). T-020 introduced **no** new `unsafe` (the `IpcError` split + the hand-written redacting `Debug` impls are safe Rust). Entries 0001..0030 (0012 `Removed`) re-verified against the merged source: append-only discipline holds; the only in-PR refinement was finalising 0030's Amendment before merge (SHA + over-claim correction). + +> **Standalone scope note.** This review covers the *change* (the syscall boundary built by T-020 + T-021), not a milestone closure. B5's business retrospective + performance baseline are deferred (the maintainer chose "security review first"). 
When those land, this artefact is the trio's security leg; the closure-trio smoke gate is then satisfied by the business retro's verbatim trace (the axis-4 evidence here is a preview). + +--- + +## 1. Capability correctness + +Adversarial frame: *can a caller perform a syscall's privileged effect without holding the capability it should require? Does the new syscall surface (or the IpcError split) widen any check, open an authority-leak path, or break a revocation property?* + +- **Every object-naming syscall is capability-gated, before any observable effect ([P1 / P4](../../../standards/architectural-principles.md)).** OK — the v1 set ([ADR-0031](../../../decisions/0031-initial-syscall-set.md)) is uniformly gated: + - `send` (1) / `recv` (2) validate the endpoint capability **inside** [`ipc_send`](../../../../kernel/src/ipc/mod.rs) / `ipc_recv` via `validate_ep_cap` (resolve → `CapObject::Endpoint` kind → `SEND`/`RECV` right), unchanged from the cap-gated IPC primitives the A4/B0 reviews cleared. + - `console_write` (5) validates a **debug-console capability** in [`dispatch::validate_debug_console_cap`](../../../../kernel/src/syscall/dispatch.rs): `table.lookup(cons_cap)` → `cap.kind() == CapKind::DebugConsole` → `cap.rights().contains(CapRights::CONSOLE_WRITE)`. This runs **before** the range check and **before any `Console::write_bytes`** — a failed check returns a typed `SyscallError::Cap(_)` with the console untouched (the [P1/P4] authority gate, present in *all* builds). + - `task_yield` (3) / `task_exit` (4) take **no object-capability argument** — they act only on the caller's own task via the trusted current-task identity, which is the caller's inherent authority over its own execution thread (ADR-0031's "trust-boundary check is *is there a valid current task?*, plus the kernel never letting the caller name a *different* task"). No ambient authority over another object. 
+ - **Adversarial probe — was `console_write` ever ambient authority?** It was, in the *first* ADR-0030/0031 draft (a P1/P4 violation), and was caught by the same-day maintainer review and folded into the ADR bodies **before Accept** (recorded in the 2026-05-29 `current.md` banner). The shipped code gates it on the debug-console cap. Verified by `dispatch::tests::console_write_with_no_cap_returns_cap_invalid_handle_no_output` (+ `…_wrong_kind…`, `…_without_write_right…`), each asserting the console captured **zero** bytes on a failed check. +- **The new capability surface is the narrowest possible addition, with no widening.** OK — `console_write` introduces exactly one new `CapObject::DebugConsole` (a **unit** variant — the debug console is a singleton with no per-instance kernel-object storage, so it carries no handle, the smallest object addition ADR-0031 names) + one new right `CapRights::CONSOLE_WRITE` (`1 << 7`, added to `KNOWN_BITS` so `from_raw` masks it correctly). The check requires *both* (right kind **and** the `CONSOLE_WRITE` right) — the narrowest sufficient authority; a debug-console cap minted with `CapRights::empty()` is correctly rejected (`…_without_write_right…` test). No existing check was widened. +- **`CapHandle::from_raw` (the new ABI-decode constructor) cannot forge authority.** OK — the syscall ABI decodes a userspace-supplied register word into a `CapHandle` via `CapHandle::from_raw(index, generation)` ([`cap/table.rs`](../../../../kernel/src/cap/table.rs)). The reconstructed handle is **not trusted**: every `CapHandle` is validated against the live slot's generation by `CapabilityTable::lookup` before use, so a forged / stale / out-of-range `(index, generation)` simply fails lookup with `CapError::InvalidHandle` → `SyscallError::Cap(InvalidHandle)`; it can never alias a live capability. 
The capability table remains per-subject and unforgeable ([ADR-0014](../../../decisions/0014-capability-representation.md)); `from_raw` only re-materialises a handle *into the caller's own table*. **Adversarial probe — can the null-handle sentinel collide with a live handle?** No — `encode_cap_handle` packs `(generation: u32) << 16 | (index: u16)` (bits 0..48); the sentinel `NULL_CAP_HANDLE = u64::MAX` sets bits 48..64. A `const _: () = assert!(NULL_CAP_HANDLE > (((u32::MAX as u64) << 16) | (u16::MAX as u64)))` in [`abi.rs`](../../../../kernel/src/syscall/abi.rs) locks this at build time (added in the review-round), so a future `CapHandle` widening that could collide fails the build. +- **Capability transfer through `send` is move-only and authority-preserving.** OK — `send`'s `x5` carries an optional transfer handle (null sentinel = none). It is moved out of the sender's table via `ipc_send`'s `cap_take` (a `lookup`-then-`cap_take` ordering that enforces the `TRANSFER` right *before* the irreversible move) and installed into the receiver's table via `insert_root` on the matching `recv`, surfacing in `x6`. The dispatch-level round-trip is pinned by `dispatch::tests::send_with_transfer_cap_then_recv_returns_cap_in_x6` (added in review-round 1; **mutation-verified non-vacuous** — breaking the x6 pack makes it fail). A stale transfer handle returns `Ipc(InvalidTransferCap)` (`send_with_stale_transfer_handle_returns_invalid_transfer_cap`). No accidental cloning; no authority the sender did not hold. +- **The K2-5 `IpcError` split (T-020) is a granularity *increase*, not an authority change, and is safe to reveal.** OK — `InvalidCapability` → `StaleHandle` / `WrongObjectKind` / `MissingRight`, validation order resolve → type-check → authority-check (mirroring `CapError`'s `InvalidHandle`/`WrongKind`/`InsufficientRights`). 
The three variants reveal only *which check failed* on the caller's **own** per-subject unforgeable handle — no enumeration of another subject's caps, no forgery aid, no confused-deputy lever ([ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md)). `SyscallError` composes `CapError`/`IpcError` via `From` (no flattening), so the userspace error space stays in agreement with the in-kernel one. The order is observable only for a cap that is *both* wrong-kind and missing-right; the dedicated `send/recv/notify` `WrongObjectKind` tests (T-020) use exactly such a cap to pin kind-before-rights. +- **`task_exit`/`task_yield` confer no authority over another task; the B5 stand-in does not over-grant.** OK — `dispatch` returns a `SyscallEffect::Terminate(code)` / `Reschedule` *directive*; it never names or mutates another task. The BSP B5 stand-in (`syscall_entry`) writes `Ok` and returns (no real EL0 task to terminate/reschedule yet) — it grants nothing. Real termination/yield (and the authority question of *which* task) lands in B6 with a real EL0 scheduler task. + +## 2. Trust boundaries + +Adversarial frame: *this is the milestone that builds the userspace → kernel boundary. What untrusted input crosses it, and is every byte/word validated before use? What does the +0x200 proxy prove vs. defer?* + +- **B5 lands the boundary *mechanism*; the real EL0 privilege transition is deferred to B6 — explicitly, and smoke-bounded.** **Flagged (forward; non-blocking; the central scope statement).** The shared `tyrne_sync_trampoline` is installed at **both** `VBAR_EL1+0x200` (current-EL) and `+0x400` (lower-EL AArch64). In B5 the only `SVC` comes from an **EL1 kernel-stub** (`syscall_boundary_smoke`), which — executing at the *current* EL — takes the `+0x200` vector, **not** the lower-EL `+0x400` vector (an `SVC` issued at EL1 cannot take the lower-EL slot). 
So B5 runtime-proves the *shared* save → decode → dispatch → `ERET` mechanism and the dispatcher logic; it does **not** prove the `+0x400` vector entry, the EL0↔EL1 privilege transition, or copy-user against a *separate* userspace `TTBR0_EL1` AS. Those are **B6's** runtime verification ([ADR-0030 §Simulation row-to-verification mapping](../../../decisions/0030-syscall-abi.md#simulation)). The `+0x400` handler is installed now so B6 adds only the EL0 task, not new trap plumbing. **The three B6 carry-forward gates are tracked in [phase-b.md §B6 "T-021 carry-forward gates"](../../../roadmap/phases/phase-b.md#milestone-b6--first-userspace-hello)** (see Verdict). +- **Every untrusted register word crossing the boundary is decoded into a typed structure before use — no raw trust.** OK — `syscall_entry` reads `x8` + `x0`–`x5` from the saved `SyscallTrapFrame` and hands them to `dispatch` as `SyscallArgs`. The number is decoded by `SyscallNumber::decode` (a total `match`; `0` reserved-invalid, out-of-range, and the debug-gated `5`-in-release all → `None` → `BadSyscallNumber`). Argument words become typed values: `Message { label, params[3] }` from `x1`–`x4`; `CapHandle` from `x0` (and the optional transfer handle from `x5`) via the validated `from_raw` + `lookup`. There is **no structured-metadata parser** in the syscall path — the widest input is a four-word `Message` carried in registers, not a pointer-buffer (ADR-0031's deliberate "register-passing keeps the common path copy-free" choice), so there is no attacker-controllable parse surface in v1. 
+- **The one user pointer (`console_write`'s buffer) is range-validated against the active address space before the kernel touches a byte.** OK — `console_write` validates the **whole** `[ptr, ptr+len)` range up front via `UserAccessWindow::validate` (containment in the active-AS window + wrap rejection via `checked_add` + zero-length short-circuit) **before** any output, so a faulting buffer emits *nothing* (no partial output). It then copies + emits in bounded 256-byte chunks through a kernel stack buffer, each chunk re-validated by `copy_from_user`. The raw user pointer is **never** dereferenced before both the capability check and the range check succeed (`dispatch::tests::console_write_out_of_range_buffer_faults_without_output` asserts a faulting buffer with a *valid* cap emits zero bytes and returns `FaultAddress`). Buffer length (`x2`) is range-checked by the same `validate`. +- **The kernel never dereferences a raw userspace pointer outside a validated mapping.** OK — the only raw-pointer dereference in the syscall data path is the single `core::ptr::copy` inside `copy_from_user`/`copy_to_user`, gated by `validate`. Range validity is proven first; the v1 dereference relies on the bootstrap AS's identity map ([ADR-0027](../../../decisions/0027-kernel-virtual-memory-layout.md) §Decision outcome (a)), the same identity-mapping dependence every kernel-resident raw-pointer site shares (UNSAFE-2026-0025/0026/0027). The B6 forward path replaces the int-to-pointer deref with a per-page user-VA → kernel-VA translation and a per-task window — without changing the `copy_*` call-site contract (module docs + the UNSAFE-2026-0030 Amendment). 
+- **The soundness of the copy hinges on user/kernel *disjointness*, which holds structurally — and is the load-bearing invariant, not the copy primitive.** OK (with the deep finding recorded) — an *overlapping* `(user_ptr, kernel-slice)` pair is **UB regardless of `core::ptr::copy` vs `copy_nonoverlapping`**, because `copy_from_user`'s `dst: &mut [u8]` (resp. `copy_to_user`'s `src: &[u8]`) parameter is exclusive/shared, so an aliasing access through the exposed `user_ptr` violates that borrow. This was **verified empirically under Miri** during the review (`error: Undefined Behavior: not granting access to tag … strongly protected`). The actual soundness basis is therefore the **user/kernel disjointness invariant**: `user_ptr` names userspace memory while the kernel slice is a distinct allocation (v1 — `console_write`'s fresh 256-byte stack buffer) / a separate address space (B6); `validate` proves *bounds*, the address-space split proves *disjointness*. `core::ptr::copy` is retained as the conservative primitive; UNSAFE-2026-0030's Amendment records exactly this (an earlier over-claim that "`copy` makes aliasing safe" was corrected). All v1 callers are disjoint by construction. **Adversarial probe — can a safe caller construct an overlapping call?** Yes, in principle (`copy_from_user`/`copy_to_user` are `pub fn`s), but the *only* in-tree caller is `console_write` with a fresh kernel stack buffer; there is no external consumer (the BSP calls `dispatch`, not the `copy_*` helpers). Defence: documented disjointness invariant + Miri catches an overlapping call (so a future regression that introduced one would fail the gate). Re-flagged as the B6 per-task-window gate. +- **Cross-task IPC grants no authority the sender did not hold.** OK — unchanged from T-020/the IPC primitives: `ipc_send` enforces `TRANSFER` via a non-mutating `lookup` *before* the irreversible `cap_take`; `ipc_recv` pre-checks `ReceiverTableFull` before the state-moving `replace`. 
The syscall layer adds only the EL0 boundary, not new IPC authority semantics. + +## 3. Memory safety + +Adversarial frame: *can a hostile register value, or an unexpected control-flow sequence, cause either new `unsafe` region (the trampoline asm or the copy-user move) to violate its invariants? Are the two new entries policy-conformant?* + +- **UNSAFE-2026-0029 (SVC trampoline asm + `syscall_entry` frame access) is policy-conformant; the asm/`repr(C)` agreement is compile-time-guarded.** OK — the entry covers the hand-written `tyrne_sync_trampoline` ([`vectors.s`](../../../../bsp-qemu-virt/src/vectors.s)) saving the **full** `x0`–`x30` + `SP_EL0` + `ELR_EL1` + `SPSR_EL1` (272-byte frame), the `ESR_EL1.EC == SVC64 (0x15)` route, and `syscall_entry`'s reads/writes of the `*mut SyscallTrapFrame`. Enumerated invariants, each held: (a) **asm/Rust frame-layout agreement** — the `stp` offsets mirror `SyscallTrapFrame`'s `#[repr(C)]` field order, and a `const _: () = assert!(size_of::<SyscallTrapFrame>() == 272)` fails the build on drift (the same discipline as the 192-byte IRQ frame); (b) **frame-pointer validity** — `sp` constructed by the `stp` sequence immediately before the `bl`, exclusively owned for the call; (c) **no re-entrancy** — `SVC` is synchronous, exception entry masks `DAIF`, and the handler does not re-enable interrupts, so no peer mutates the frame or borrowed statics mid-handler (single-core cooperative); (d) **result-write touches only `x0`–`x7`** — `x8`–`x30` + `SP_EL0` + `ELR_EL1` + `SPSR_EL1` are restored to their trapped values, so the caller's preserved registers + return PSTATE/PC survive exactly; (e) **statics initialised before first `SVC`** (the smoke runs after the IPC statics are published, before `start()`). 
**Adversarial probe — I hand-mapped every `stp`/`ldp`:** `x0/x1@0x00 … x28/x29@0xE0, x30/SP_EL0@0xF0, ELR_EL1/SPSR_EL1@0x100`; `x0/x1` saved first (before `mrs` clobbers them) and restored **last** (they are scratch for the system-register restores); `sub sp,#272` keeps 16-byte alignment (272 = 17×16); `eret` on every SVC path; the non-SVC branch → `panic_entry`. Security-sensitive (the EL0→EL1 trust boundary asm) → second-reviewer per unsafe-policy §Review.4, satisfied. Smoke-verified (axis 4). +- **UNSAFE-2026-0030 (validated copy-from/to-user byte move) is policy-conformant; the disjointness basis is recorded honestly after an empirical correction.** OK — the entry covers the single `core::ptr::copy(src, dst, len)` in each of `copy_from_user` (read-from-user) and `copy_to_user` (write-to-user). Invariants: range validity (proven by `validate` *before* the `unsafe` block — never derefs on the failure paths), the v1 identity map, no single-core interleaving, and — the load-bearing one — **disjointness** of the userspace range and the kernel slice (axis 2). The Amendment (commit `2c713c0`) supersedes the original `copy_nonoverlapping` wording, marks the rejected-alternatives bullet that named it as historical, and corrects the over-claim — the merged record states `core::ptr::copy` is the conservative primitive and disjointness (not the primitive) is the soundness basis. **Adversarial probe — is there a reachable UB on the failure paths?** No: `validate` returns `FaultAddress` (out-of-range / wrap) or `Ok` (zero-length short-circuit, no deref) *before* the `unsafe` block; the dereference is reached only on a validated, non-empty, in-window range. Host-tested (in-range / out-of-range / overrun / zero-length / wrap) + Miri-clean under permissive provenance (the exposed-provenance int-to-pointer pattern matches the established `pmm.rs`/`task_loader.rs` test discipline). 
+- **The dispatcher's value-packing has no reachable panic / out-of-bounds, and the one indexing site is compile-time-bounded.** OK — `SyscallReturn::with_payload::<IDX>` is a **const-generic** with a `const { assert!(IDX < 7) }`, so an out-of-range payload index is a **compile error at the call site**, not a runtime panic (the review-round hardening — closes the unchecked-index nit). `SyscallNumber::decode` is a total match; `validate` uses `checked_add` (no overflow); the `console_write` chunk loop uses `checked_add` + `min` + `buf[..chunk]` with `chunk ≤ 256` (no OOB). All `panic!`/`unwrap`/`expect` in the syscall module are confined to `#[cfg(test)]` modules (verified by grep against the merged source). +- **No uninitialized memory exposed; no use-after-free; no aliasing violation.** OK — the `SyscallTrapFrame` writes every one of its 34 register slots in the trampoline (unlike the IRQ frame's deliberately-uninitialised `_reserved`, the syscall frame has no padding slot), so no stale stack bytes are exposed; the `console_write` chunk buffer is a fresh `[0u8; 256]` local. The momentary `&mut`/`&` borrows of the BSP statics in `syscall_entry` are scoped to the single `dispatch` call and do not cross a context switch (the data-plane syscalls do not switch; the control-plane ones return a directive *before* any switch — the [ADR-0021](../../../decisions/0021-raw-pointer-scheduler-ipc-bridge.md) discipline). **`cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` is clean (0 UB)** over the full 240-kernel + 53-test-hal + 43-hal suite — the authoritative verifier of the copy-user `unsafe` + the exposed-provenance pattern. + +## 4. Kernel-mode discipline + +Adversarial frame: *can any register-supplied input stall, panic, or deadlock the kernel through the syscall path? 
Is the dispatcher panic-free as B0's hardening pattern requires?* + +- **The dispatcher is panic-free on every untrusted input — the central T-021 property, independently verified.** OK — the hard rule (ADR-0030 / B0 hardening): no path may `panic!`/`unwrap`/`expect`/overflow on any register-supplied value. Traced end-to-end: a bad number (incl. `0` / out-of-range / debug-gated `5`-in-release) → `BadSyscallNumber`; a missing/stale/wrong-kind capability → typed `Cap(_)`/`Ipc(_)`; an out-of-bounds or wrapping pointer → `FaultAddress`; all as **values**, never traps. The provably-panic-capable sites — `SyscallReturn::with_payload` (now const-generic, OOB = compile error) and `ipc_send`'s `unreachable!()` (a *temporal* invariant safe under single-core masked-IRQ cooperative semantics — there is no yield between the peek and the commit) — are unreachable from untrusted input. The adversarial multi-agent pass + first-hand tracing reached the same conclusion: **no register-supplied input crashes the kernel.** +- **No allocation, no unbounded loop, minimal critical section in the syscall path.** OK — the dispatcher allocates nothing (no heap; the 256-byte chunk buffer is a fixed stack local). The only loop is `console_write`'s chunking, bounded by `len`, which is itself bounded by the validated window. The `SVC` handler runs with `DAIF` masked (exception entry) as a bounded save → decode → dispatch → restore → `ERET`; it is not an ISR and does not re-enable interrupts. No new kernel panic on the hot path (`cargo kernel-clippy -D warnings`, enforcing `#![deny(clippy::panic, clippy::unwrap_used, clippy::expect_used, clippy::arithmetic_side_effects)]`, is clean at HEAD). 
+- **The control-plane stand-ins (`task_yield`/`task_exit`) cannot hang the boot.** OK — `dispatch` returns a directive; the B5 BSP glue writes `Ok` (yield) / `Ok` defensively (exit — "does not return" is the ABI contract, but there is no EL0 context to drop in v1, so the stand-in resumes rather than wedging the boot if a stray stub `task_exit` ever fired). The dispatcher routing (number → directive) is host-tested; real yield/termination (and their scheduler interaction) is B6. +- **QEMU smoke (the runtime evidence; closure-trio gate previewed).** OK — the debug-build smoke (the change produced it) shows the EL1 kernel-stub issuing two `SVC #0`s, with the trace gaining exactly two lines after `timer ready` and before `starting cooperative scheduler`: + - `tyrne: hello from the syscall boundary (console_write via SVC)` — emitted **by** the `console_write` syscall (proving trap → save → cap-check → `copy_from_user` → console output → encode → `ERET`). + - `tyrne: syscall smoke ok (console_write status=0x0, bytes=63; bad-number status=0x1)` — the round-trip status (`console_write` Ok, 63 bytes written; the reserved-invalid number returned `BadSyscallNumber`). + + `-d int,unimp,guest_errors` shows **exactly two `SVC` exceptions** taken at the current-EL vector — `Taking exception 2 [SVC] … from EL1 to EL1 … with ESR 0x15/0x56000000` (EC = SVC64, the exact value the trampoline routes on), each followed by a clean `Exception return … EL1 to EL1` — plus only the pre-existing PL011-disabled-UART noise (**no** new Translation/Permission/unimp fault class). The full cooperative demo still runs to `tyrne: all tasks complete`. This is the verbatim trace the eventual closure-trio business retro will carry; recorded here as the axis-4 evidence. + +## 5. Cryptography + +Not applicable — no cryptographic primitives, no RNG, no key handling introduced or touched. 
The syscall set is `send`/`recv`/`console_write`/`task_yield`/`task_exit`; none performs hashing, encryption, signing, key derivation, or randomness. Unchanged since the B4 closure's N/A finding; the model's ADR-per-primitive + separate-security-review + redacted-key-types gates remain un-triggered. + +## 6. Secrets and logging + +Adversarial frame: *can the syscall surface (or the error taxonomy) leak kernel-internal identity through a diagnostic / userspace-reachable channel?* + +- **T-020's `Capability` / `CapObject` `Debug` redaction (K3-9) is the headline secrets-handling win — and it lands exactly where B5 needs it.** OK — the prior B4 review forward-flagged (X1-F5) that `Capability`'s `Debug` + the B5 ABI layer needed reconciling before a userspace-reachable log path existed. T-020 closes it: `Capability`'s hand-written `Debug` prints `rights` (authority bits — not unforgeable) but redacts the named object as `object: ` (via `format_args!`, so no surrounding quotes); `CapObject`'s hand-written `Debug` prints the *kind* (benign diagnostic) but redacts the wrapped typed handle (slot index + generation). Both are pinned by `cap::tests::debug_redacts_named_object_but_keeps_rights` + `capobject_debug_redacts_handle_but_shows_kind` (which assert the index `0xAB = 171` does **not** appear). So a `Capability` or `CapObject` formatted into the now-existing `console_write` userspace-reachable log path cannot leak kernel-internal object identity. **Adversarial probe — is the wrapped handle's *inner* `Debug` still derived?** Yes — `TaskHandle`/`EndpointHandle`/etc. keep their derived `Debug` for kernel-internal traces (scheduler, arena) where the slot/generation is the useful information and never crosses to userspace; the redaction is applied at `CapObject` (the type a capability carries toward a log boundary), which is the correct cut point. 
+- **The `SyscallError` status encoding leaks no secret — and the taxonomy split's "which check failed" disclosure is the explicit, justified trade.** OK — the stable status word is an error *code* (`0`=Ok; `1`–`3` top-level; `0x10x` Cap; `0x20x` Ipc). It carries no capability bits, no handle, no object contents. The `IpcError`/`CapError` split reveals *which* validation step failed (stale vs wrong-kind vs missing-right), which ADR-0030 §"Security of the taxonomy split" accepts explicitly: the facts revealed are about the caller's **own** per-subject unforgeable handles and aid no forgery / enumeration / confused-deputy attack. The `SyscallError` derived `Debug` shows the variant + composed inner error — no secret. +- **No trapped register state leaks via `Debug` or logging.** OK — the BSP `SyscallTrapFrame` has **private fields and no derived `Debug`** (a deliberate choice — it holds a full register snapshot; keeping it un-`pub` + un-`Debug` avoids exposing or accidentally logging trapped register contents). The `console_write` smoke banner emits status codes + a byte count (no secret); the emitted bytes are the user buffer itself (a `.rodata` greeting in the smoke). No `panic!`/log site in the syscall path embeds capability state, table contents, or frame contents. + +## 7. Dependencies + +- **Workspace remains zero-extern.** OK — T-020 + T-021 add no crate. The `CapRights` bitfield stays hand-rolled (the new `CONSOLE_WRITE` bit is one more `const`, not a `bitflags` dependency); the syscall module is pure `core` (`core::ptr::copy`, `core::cmp::min`, `core::arch::asm!`). `Cargo.lock` is unchanged (four in-tree workspace crates, no `source`/`checksum`). ADR-0006's zero-extern stance — the strongest supply-chain position — is preserved. No `add-dependency` invocation; `cargo-vet`/`cargo-audit` remain dormant-because-no-ops. + +## 8. Threat-model impact + +Adversarial frame: *does the syscall boundary reshape what the system defends against? 
Are all gaps reconciled with [`security-model.md`](../../../architecture/security-model.md) and honestly documented?* + +- **This is the milestone that *builds* the userspace → kernel boundary (model boundary 1) — but the EL0 transition it guards opens in B6.** OK (with the boundary forward-built). The change lands the boundary *machinery*: a panic-free, capability-gated, validated-copy-user dispatcher reachable through `SVC`. In v1 the only caller is the trusted EL1 kernel-stub (the `+0x200` proxy); the *untrusted* EL0 caller (model adversary #1, malicious/compromised userspace) does not exist until B6. So the boundary code is forward-built and smoke-proven via the proxy; the first attacker-observable crossing is B6 (the `+0x400` vector + the EL0↔EL1 transition + copy-user against a separate userspace `TTBR0_EL1`). +- **The panic-free dispatcher + capability gating + redacted `Debug` are the substrate the B6 EL0 threat model rests on.** OK — when B6's untrusted EL0 task arrives, the properties this change establishes are exactly what defends against it: no register-supplied input crashes the kernel; every object-naming syscall is capability-gated; the kernel never derefs an unvalidated user pointer; a logged capability cannot leak kernel-internal identity. The K2-5 granular `IpcError` gives a userspace binding a handleable error space from the start. +- **Reconciliation against security-model.md §Invariants** — each re-confirmed for the B5 surface: + - "No privileged operation without the authorizing capability" — OK (§1: `console_write` debug-console-cap-gated; `send`/`recv` endpoint-cap-gated; `task_yield`/`task_exit` act only on the caller's own task). + - "No ambient authority" — OK (§1; the `console_write` ambient-authority slip in the first ADR draft was caught and fixed before Accept). + - "Capabilities unforgeable / per-subject" — OK (§1; `from_raw` re-materialises into the caller's own table and is validated by `lookup`; the null sentinel cannot collide). 
+ - "The kernel never dereferences raw userspace pointers" — OK (§2: `copy_*` validate before deref; disjointness invariant). + - "`unsafe` is audited" — OK (§3: UNSAFE-2026-0029/0030 land under the Operation/Invariants/Rejected-alternatives shape; 0029 second-reviewer-signed). + - "Bounded kernel state / no unbounded allocation / panic-free on untrusted input" — OK (§4: typed `SyscallError` on every path; bounded chunk loop; no heap). + - "Fault containment does not leak authority" — **partially exercised; the dispatcher is panic-free, but full fault containment is Phase E.** A *dispatcher* failure returns a typed error (done). But a crashing EL0 task's fault (illegal instruction, unmapped deref) still routes to `panic_entry` → halt — the supervisor-endpoint `TaskFault` delivery is Phase E / flag K3-4 (recorded in [phase-b.md §B5 flags](../../../roadmap/phases/phase-b.md#flags-to-resolve-during-b5)). Non-blocking for B5 (no EL0 task to crash yet); confirmed-deferred per the B5 flag. +- **Phase-deferred placeholders unchanged.** ADR-0033 (high-half migration — gates the per-task `TTBR0_EL1` swap + kernel mappings in the userspace AS, the prerequisite for ever running a real EL0 task; the slot is reserved in [ADR-0027 §Decision outcome](../../../decisions/0027-kernel-virtual-memory-layout.md), no file yet) and ADR-0034 (kernel-image / per-section permissions — gates the first attacker-observable EL0 execution) remain slot-reserved. +- **`security-model.md` SMMUv3-CI-gate staleness (post-ADR-0036) — reconciled in this pass.** The B4 closure forward-flagged that [`security-model.md`](../../../architecture/security-model.md) §threat-model #7 + §Open questions still described QEMU `virt` as "launched with SMMUv3 and used in CI," contradicting [ADR-0036](../../../decisions/0036-qemu-virt-gicv2-no-iommu-v1.md) (QEMU `virt` is GICv2 / no-IOMMU in v1). 
This was N/A as a v1 *defect* (no bus-master driver exists, so the DMA boundary is inactive) but a live doc-contradiction with an Accepted ADR. **Closed here:** both sentences now state the v1 GICv2/no-IOMMU reality, point at ADR-0036, and reframe the SMMU-in-CI gate as a future IOMMU-equipped-target (Jetson Orin) item — preserving the model's IOMMU intent, correcting only the stale QEMU-`virt`-has-SMMUv3-in-CI claim. (This is the one finding in this review actionable outside a phase gate; the rest are correctly B6 / Phase-E deferred.) + +--- + +## Verdict + +**Approve.** + +All eight applicable axes pass (cryptography N/A). The B5 syscall boundary (T-020 + T-021 / PR #34) builds the most security-sensitive surface in the system — the EL0→EL1 trap/dispatch path — and does so with the properties the boundary exists to provide: a **panic-free dispatcher** (no register-supplied input crashes the kernel — independently verified by tracing + an adversarial multi-agent pass), **uniform capability gating** of every object-naming syscall (the `console_write` debug-console cap closes the ambient-authority slip caught before ADR Accept; `send`/`recv` gate inside the IPC primitives; `task_yield`/`task_exit` act only on the caller's own task), **validated copy-from/to-user** that never dereferences an unvalidated user pointer (with the soundness basis honestly recorded as the user/kernel disjointness invariant after an empirical Miri probe disproved an earlier "overlap-tolerant" over-claim), and **T-020's `Capability`/`CapObject` `Debug` redaction** that closes the K3-9 secrets-leak path exactly where B5 first creates a userspace-reachable log channel. The two new `unsafe` entries are policy-conformant — UNSAFE-2026-0029 (trampoline asm) carries a hand-verified asm/`repr(C)` agreement with a compile-time size guard and the required second-reviewer sign-off; UNSAFE-2026-0030 (copy-user) records the disjointness basis via an append-only Amendment. 
`cargo fmt`/`host-clippy`/`kernel-clippy`/`kernel-build` are clean; **host-test 240**; `cargo test --release` green (the debug-gate release path); **`cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` clean (0 UB)**; the QEMU smoke proves the `+0x200` proxy round-trip with no new fault class and the demo intact. + +Crucially, **B5 builds the boundary mechanism but does not open the real EL0 transition** — the `+0x400` vector is installed and smoke-bounded via the EL1-stub `+0x200` proxy, but the EL0↔EL1 privilege transition and copy-user against a separate userspace `TTBR0_EL1` are B6's runtime verification. No new attacker-observable surface is reachable in v1. + +### Forward-flagged items (carry-forward; non-blocking) + +- **B6 carry-forward gates (must close before a real EL0 task runs)** — tracked in [phase-b.md §B6 "T-021 carry-forward gates"](../../../roadmap/phases/phase-b.md#milestone-b6--first-userspace-hello): **(1)** `console_write`'s user window must become **per-task** (derived from the EL0 task's mapped region, not the whole RAM extent) and the int-to-pointer deref must become a per-page user-VA → kernel-VA translation, returning `FaultAddress` (never panic) on failure — *the single most important gate*: an EL0 holder of a debug-console cap could otherwise read arbitrary kernel memory; **(2)** `SP_EL1` must be initialised for the `+0x400` entry in the EL0 context-init; **(3)** the `SYSCALL_STUB_TABLE` must be swapped for the scheduler's current-task capability table (fail-closed if forgotten). +- **Fault containment (K3-4 / Phase E)** — the dispatcher is panic-free, but a crashing EL0 task's *non-`SVC`* fault still halts (no supervisor endpoint). Confirmed-deferred per the B5 flag; opens with the first real driver task. +- **`ipc_send`'s `unreachable!()` under preemption** — a temporal invariant, safe under single-core masked-IRQ cooperative semantics; becomes a release panic-from-userspace only under future preemption/SMP. 
Harden to `Err(QueueFull)` when preemption lands (ADR-0032 / note C3-009). +- **`cap_map`/`cap_unmap` per-operation rights gap** — carried unchanged from B3/B4; the kind-only check is the v1 design (no `MAP`/`UNMAP` bit exists). Trigger: the B5+ ADR pairing `CapRights::{MAP, UNMAP, ACTIVATE}` with `CapKind::MemoryRegion`. +- **`security-model.md` SMMUv3-CI-gate staleness (post-ADR-0036) — RECONCILED in this pass** (no longer carried forward): the two stale "QEMU `virt` has SMMUv3 / is the CI gate" claims (§threat-model #7, §Open questions) now state the v1 GICv2/no-IOMMU reality and point at ADR-0036. See §8. + +This standalone security pass can serve as the **security leg of the B5 closure trio**; the business retrospective + performance baseline are the remaining legs (the maintainer chose "security review first"). diff --git a/docs/analysis/reviews/security-reviews/README.md b/docs/analysis/reviews/security-reviews/README.md index 17fd3f4..5063d6b 100644 --- a/docs/analysis/reviews/security-reviews/README.md +++ b/docs/analysis/reviews/security-reviews/README.md @@ -38,3 +38,4 @@ A security review is a **separate pass** from the code review — it is performe | 2026-05-09 | B2 closure consolidated pass (T-016 MMU activation + identity-mapped kernel + `MapperFlush` flush-token discipline; ADR-0027 + ADR-0009 §Revision rider; UNSAFE-2026-0022/0023/0024/0025 introduced + 0023/0024 bootstrap-Amendments + 0022/0023/0024 smoke-verification Amendments) | Approve — eight axes pass; MMU on with identity-only layout; one new forward-flagged item (UNSAFE-2026-0025 per-call `Mmu::map`/`unmap` smoke verification — gates on first B3+ post-bootstrap caller) | [2026-05-09-B2-closure.md](2026-05-09-B2-closure.md) | | 2026-05-14 | B3 closure consolidated pass (T-017 PMM + T-018 `AddressSpace` kernel object + cap-gated wrappers + activation-on-context-switch; ADR-0035 + ADR-0028; UNSAFE-2026-0026 introduced; UNSAFE-2026-0014 5th Amendment for activation hook + BSP closure; 
UNSAFE-2026-0025 body-correction Amendment for `MmuError::BlockMapped` variant split) | Approve — eight axes pass; AS kernel-object scaffold lands without widening attack surface; PR #28 five-round arc closed three load-bearing memory-safety items pre-merge; one forward-flagged item (`cap_map`/`cap_unmap` per-op rights gap — deferred to B5+ ADR pairing `CapRights::{MAP,UNMAP,ACTIVATE}` with `CapKind::MemoryRegion`) | [2026-05-14-B3-closure.md](2026-05-14-B3-closure.md) | | 2026-05-28 | B4 closure consolidated pass (T-019 task loader + ADR-0029; master-review PR #32 remediation; UNSAFE-2026-0027 introduced + 4 boundary-hardening Amendments; UNSAFE-2026-0028 introduced via MR-011 / X3-001 audit-trail completion; UNSAFE-2026-0025/0026 `Pending QEMU smoke verification` notes lifted — T-019 first runtime exerciser; MR-005 d8–d15 `ContextSwitch` contract + ADR-0020 rider) | Approve — eight axes pass; capability-gated `load_image` produces a `LoadedImage` but mints no runnable task / opens no EL0 boundary (B5/B6); reconciles with master-review security PASS + audit-log-in-sync; MR-009 (Miri-as-Phase-B-exit-prerequisite) closed in-branch — Miri is a blocking CI job and is now written into the phase-b.md exit bar, so all 24 findings are resolved | [2026-05-28-B4-closure.md](2026-05-28-B4-closure.md) | +| 2026-05-29 | B5 syscall boundary (T-020 `IpcError` split + `Capability`/`CapObject` `Debug` redaction K3-9; T-021 EL0→EL1 SVC dispatch — trampoline + panic-free dispatcher + validated copy-from/to-user + debug-console cap + `SyscallError`; ADR-0030/0031; PR #34 merge `f98e1af`; UNSAFE-2026-0029 trampoline asm + UNSAFE-2026-0030 copy-user introduced) | Approve — eight axes pass; panic-free, uniformly capability-gated dispatcher; copy-user never derefs an unvalidated user pointer (soundness basis = user/kernel disjointness, Miri-confirmed); K3-9 redaction closes the secrets-leak path; both new `unsafe` entries policy-conformant (0029 second-reviewer-signed). 
B5 builds the boundary *mechanism* but the real EL0 `+0x400` transition is B6; three B6 carry-forward gates tracked in phase-b.md §B6. Standalone pass — serves as the security leg of the eventual B5 closure trio | [2026-05-29-B5-syscall-boundary.md](2026-05-29-B5-syscall-boundary.md) | diff --git a/docs/architecture/boot.md b/docs/architecture/boot.md index 340ffe2..e43f618 100644 --- a/docs/architecture/boot.md +++ b/docs/architecture/boot.md @@ -190,7 +190,7 @@ Properties the boot flow maintains. These are the claims a reader can rely on an - **EL3 → EL2 → EL1 chain.** v1 hardware targets do not boot at EL3; if a future BSP requires it, a follow-up task adds the EL3→EL2 transition on top of the existing EL2→EL1 logic per ADR-0024 §Open questions. - **DTB parsing and `BootInfo`.** The kernel's typed boot-info contract, probably introduced with Pi 4 support. - **Multi-core start.** PSCI `CPU_ON` for secondary cores. -- **High-half kernel migration.** v1 maps the kernel identity-only via `TTBR0_EL1`; the future ADR-0033 placeholder (per [ADR-0027 §Decision outcome (a)](../decisions/0027-kernel-virtual-memory-layout.md)) introduces the high-half mapping when B5 surfaces the per-task `TTBR0_EL1` swap. +- **High-half kernel migration.** v1 maps the kernel identity-only via `TTBR0_EL1`; the future ADR-0033 placeholder (per [ADR-0027 §Decision outcome (a)](../decisions/0027-kernel-virtual-memory-layout.md)) introduces the high-half mapping when B6 surfaces the per-task `TTBR0_EL1` swap (B5 closed without it). - **Guard-page stacks.** With the MMU now active (T-016), guard-page stacks become reachable — pending a follow-on B-phase task that remaps a stack region's bottom page as invalid. - **Measured boot / attestation.** Hardware-dependent; deferred per [ADR-0012](../decisions/0012-boot-flow-qemu-virt.md). 
diff --git a/docs/architecture/memory-management.md b/docs/architecture/memory-management.md index 66a5105..4068326 100644 --- a/docs/architecture/memory-management.md +++ b/docs/architecture/memory-management.md @@ -79,7 +79,7 @@ v1's `TCR_EL1` value commits to the layout shape: | `IPS` | bits 34:32 | 0b010 | 40-bit Intermediate Physical Address — matches QEMU virt + Cortex-A72 | | `AS` | bit 36 | 0 | 8-bit ASID field; v1 uses ASID=0 globally | -The ADR-0033 placeholder (the future high-half ADR — slot reserved in [ADR-0027 §Decision outcome (a)](../decisions/0027-kernel-virtual-memory-layout.md), not yet a real ADR file) flips `EPD1=1 → 0` and populates `TTBR1_EL1` when B5 needs per-task `TTBR0_EL1` swap; the rest of `TCR_EL1` stays byte-stable across that transition because the v1 settings already commit to the high-half-friendly shape. +The ADR-0033 placeholder (the future high-half ADR — slot reserved in [ADR-0027 §Decision outcome (a)](../decisions/0027-kernel-virtual-memory-layout.md), not yet a real ADR file) flips `EPD1=1 → 0` and populates `TTBR1_EL1` when B6 needs per-task `TTBR0_EL1` swap (B5 closed without it); the rest of `TCR_EL1` stays byte-stable across that transition because the v1 settings already commit to the high-half-friendly shape. ### Page-table entry encoding (block descriptor at L2) diff --git a/docs/architecture/security-model.md b/docs/architecture/security-model.md index a68cfbb..40f7623 100644 --- a/docs/architecture/security-model.md +++ b/docs/architecture/security-model.md @@ -57,7 +57,7 @@ These are *out of scope* in the current model. Being explicit means that when a - **Social engineering of contributors or operators.** Out of scope — human process, not system property. - **Rowhammer and other DRAM-level attacks.** Out of scope pending targeted mitigations. 
- **Denial of service through resource exhaustion at the deployment level.** A bad actor flooding a network port or saturating CPU from outside is a deployment concern (firewall, rate-limit upstream). The kernel's *internal* bounds against local resource exhaustion are a different matter and are in scope — see *Bounded kernel resources* below. -- **Peripheral DMA on boards without an IOMMU.** Raspberry Pi 4 has no SMMU: any bus-master device that the kernel has enabled can, in principle, read or write arbitrary DRAM. QEMU `virt` can be launched with SMMUv3 and is used in CI to catch driver-side misbehaviour against SMMU semantics. Jetson Orin has an SMMU and is in scope once that port lands. Until an ADR brings a no-IOMMU board into the model with explicit mitigations (physical-contract trust, driver constraint, device disablement), such boards trust their bus masters implicitly and release notes record this per target. +- **Peripheral DMA on boards without an IOMMU.** Raspberry Pi 4 has no SMMU: any bus-master device that the kernel has enabled can, in principle, read or write arbitrary DRAM. Tyrne's v1 QEMU `virt` target is **GICv2 with no IOMMU** (per [ADR-0036](../decisions/0036-qemu-virt-gicv2-no-iommu-v1.md)) — QEMU `virt` exposes an SMMUv3 only when explicitly launched with `iommu=smmuv3`, which Tyrne does not do, so there is **no** SMMU-in-CI gate today; an SMMU-driven CI gate against driver misbehaviour is a future item for an IOMMU-equipped target. Jetson Orin has an SMMU and is in scope once that port lands. Until an ADR brings a no-IOMMU board into the model with explicit mitigations (physical-contract trust, driver constraint, device disablement), such boards trust their bus masters implicitly and release notes record this per target. ### Trust boundaries @@ -325,7 +325,7 @@ Each of these is a future ADR. - First cryptographic primitives: hash function, signature scheme, AEAD — per-primitive ADR when the need arises. 
 - Measured / secure boot design for Tier 2 hardware (Pi 4, Pi 5) and Tier 3 (Jetson).
 - Resource-exhaustion DoS policy within the scheduler (quotas, priority inheritance, deadline inheritance).
-- **IOMMU / SMMU policy per target.** Raspberry Pi 4 has no SMMU — do we accept implicit trust of all enabled bus masters, refuse to enable DMA-capable devices and force PIO, or gate DMA-capable devices behind a deployment-time opt-in? QEMU `virt` has SMMUv3 and should be the CI gate that catches driver regressions against IOMMU expectations. Jetson Orin has an SMMU and adopts the same model when its port lands. ADR required before the first driver that enables bus-master DMA.
+- **IOMMU / SMMU policy per target.** Raspberry Pi 4 has no SMMU — do we accept implicit trust of all enabled bus masters, refuse to enable DMA-capable devices and force PIO, or gate DMA-capable devices behind a deployment-time opt-in? Tyrne's v1 QEMU `virt` target is GICv2 / no-IOMMU ([ADR-0036](../decisions/0036-qemu-virt-gicv2-no-iommu-v1.md)); an SMMU-driven CI gate against IOMMU-expectation regressions is a future item for an IOMMU-equipped target (e.g. Jetson Orin, which has an SMMU and adopts this model when its port lands), **not** a v1 QEMU-`virt` capability. ADR required before the first driver that enables bus-master DMA.
 - **Concrete bounds** for the quotas under *Bounded kernel resources*: numeric defaults for each, per-target tuning policy, and how upgrades change them without invalidating running systems.
 - **Cross-table capability derivation tree (CDT).** Whether IPC-transferred capabilities should retain a parent-child link to the sender's entry so that the sender can revoke the copy post-transfer, and — if so — how per-task-table CDT storage scales. seL4's answer is a whole-system CDT; Phase B needs to decide before the first multi-task system uses transfer as a revoke-retained grant. See the v1 qualification on *Revocation is transitive* above.
 - **Early IRQ masking in BSP reset vectors.** ✅ **Closed by [T-013](../analysis/tasks/phase-b/T-013-el-drop-to-el1.md) (Done 2026-04-27, ADR-0024, [UNSAFE-2026-0017](../audits/unsafe-log.md)).** The `_start` symbol in [`bsp-qemu-virt/src/boot.s`](../../bsp-qemu-virt/src/boot.s) now begins with `msr daifset, #0xf` as the **literal first instruction** before stack/BSS setup, and the [BSP boot checklist §1a](../standards/bsp-boot-checklist.md) records "mask DAIF first" as a standard reset-vector prologue every future BSP must observe. The previous "per-platform accident" framing is retired; DAIF masking is now a structural property of every Tyrne reset vector, with the audit trail captured under UNSAFE-2026-0017's first Amendment block. Future BSPs (`bsp-pi4`, etc.) inherit the rule via the boot checklist.
diff --git a/docs/architecture/task-loader.md b/docs/architecture/task-loader.md
index 7ad09c0..8392b87 100644
--- a/docs/architecture/task-loader.md
+++ b/docs/architecture/task-loader.md
@@ -14,11 +14,11 @@ The loader is also the first runtime exerciser of three audit-log entries that p

 T-019 produces a [`LoadedImage`](../../kernel/src/obj/task_loader.rs) **descriptor**, not a `CapHandle{CapObject::Task(...)}` (a runnable task cap). The reasons are architectural, not implementation laziness:

-- The current [`Task`](../../kernel/src/obj/task.rs) struct carries `id: u32` + `address_space_handle: AddressSpaceHandle` only — there is **no** PC/SP context register file on it, so `ContextSwitch::init_context` cannot consume a `LoadedImage` until B5 adds a per-task context surface.
-- The loader's new address space contains **only** the image + stack mappings. No kernel mappings are installed — an EL1 exception taken while the userspace AS is active would translation-fault on the kernel-side vector fetch. The kernel-in-userspace-AS problem is the future [ADR-0033 high-half migration placeholder](../decisions/0027-kernel-virtual-memory-layout.md)'s responsibility, gated on B5 surfacing per-task `TTBR0_EL1` swap.
+- The current [`Task`](../../kernel/src/obj/task.rs) struct carries `id: u32` + `address_space_handle: AddressSpaceHandle` only — there is **no** PC/SP context register file on it, so `ContextSwitch::init_context` cannot consume a `LoadedImage` until B6 adds a per-task EL0 context surface (B5 landed the syscall ABI, not the EL0 context).
+- The loader's new address space contains **only** the image + stack mappings. No kernel mappings are installed — an EL1 exception taken while the userspace AS is active would translation-fault on the kernel-side vector fetch. The kernel-in-userspace-AS problem is the future [ADR-0033 high-half migration placeholder](../decisions/0027-kernel-virtual-memory-layout.md)'s responsibility, gated on B6 surfacing per-task `TTBR0_EL1` swap (B5 closed via the syscall boundary without it — the real EL0 round-trip is B6).
 - The syscall entry path that lets a userspace task make its first kernel call is ADR-0030 / ADR-0031 work, also B5.

-The `task_create_from_image` wrapper that turns a `LoadedImage` into a runnable task cap lands with B5 (syscall ABI per ADR-0030) and B6 (first userspace "hello") per [phase-b §B4 §Revision-notes](../roadmap/phases/phase-b.md#milestone-b4--task-loader).
+The `task_create_from_image` wrapper that turns a `LoadedImage` into a runnable task cap lands with B6 (first userspace "hello"), building on B5's now-landed syscall ABI (ADR-0030 / ADR-0031), per [phase-b §B4 §Revision-notes](../roadmap/phases/phase-b.md#milestone-b4--task-loader).

 ## Pipeline (one §Simulation row at a time)

diff --git a/docs/roadmap/current.md b/docs/roadmap/current.md
index ae99007..d337a14 100644
--- a/docs/roadmap/current.md
+++ b/docs/roadmap/current.md
@@ -4,6 +4,8 @@ A short pointer file updated as work progresses.
For the full plan see [`phases/ --- +> **2026-05-29 update — B5 CLOSED via the closure trio; B6 (first userspace "hello") is next.** The B5 milestone (Syscall boundary) is formally **Closed** via its closure trio — [security review](../analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md) (**Approve**, eight axes) + [business retrospective](../analysis/reviews/business-reviews/2026-05-29-B5-closure.md) + [performance baseline](../analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md) — **the canonical source for B5's closing metrics** (not duplicated here). [ADR-0030](../decisions/0030-syscall-abi.md) (syscall ABI + the K2-5 `IpcError` split) + [ADR-0031](../decisions/0031-initial-syscall-set.md) (the five-syscall v1 set) Accepted; **T-020** (error taxonomy + `Capability`/`CapObject` `Debug` redaction) + **T-021** (EL0→EL1 `SVC` dispatch) merged together via [PR #34](https://github.com/HodeTech/Tyrne/pull/34) (merge `f98e1af`) and are now **Done**. Headline (reproduced live at HEAD `afeed10`): **339 host tests** (43 hal + 240 kernel + 53 test-hal + 3 doc; +53 kernel) + `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` clean (**0 UB, run locally** — first closure to do so); fmt / host-clippy / kernel-clippy / kernel-build clean; release ELF `.text 34,648 / .rodata 4,856 / .bss 50,592` (~88.0 KiB, +4.8 % vs B4); release perf band p10/p50/p90 = **17.645 / 20.300 / 24.706 ms** — a **same-host control** (the B4 binary `3ab029f` rebuilt + re-measured this session) proves the ~+2.9 ms vs B4 is a real, one-time boot `SVC`-smoke cost, **not** host jitter. QEMU smoke clean to `tyrne: all tasks complete`; `-d int,unimp,guest_errors` = **712 (release) / 776 (debug)** pre-existing PL011 warnings + **2 expected `SVC` exceptions** (EL1→EL1, `ESR 0x15/0x56000000` = SVC64, clean `ERET`), zero new fault class. Audit log **30 entries (29 Active)** — UNSAFE-2026-0029 (`SVC` trampoline) + 0030 (copy-user, disjointness-corrected). 
**B5 builds the boundary *mechanism* (the EL1-stub `+0x200` proxy); the real EL0 `+0x400` round-trip + the three [T-021 carry-forward gates](phases/phase-b.md#milestone-b6--first-userspace-hello) are B6.** This banner supersedes the In-Review banner below. +> > **2026-05-29 update — T-021 (EL0→EL1 SVC dispatch) implemented; In Review on [PR #34](https://github.com/HodeTech/Tyrne/pull/34).** The security-critical hardware-facing half of B5 has landed. **PR #34** (base `main`, branch `t-021-syscall-dispatch`, 9 commits) bundles **T-020 + T-021** in one review per the maintainer's call — the commit map in the PR body delineates the two tasks (T-020 = the `IpcError` split + `Debug` redaction + ADR-0030/0031; T-021 = the SVC dispatch). A new architecture-agnostic, **panic-free** kernel `syscall` module ([`kernel/src/syscall/`](../../kernel/src/syscall/)) instantiates [ADR-0030](../decisions/0030-syscall-abi.md) (register ABI + `SyscallError` composing `CapError`/`IpcError` with a stable numeric status encoding, `0 = Ok`) and [ADR-0031](../decisions/0031-initial-syscall-set.md) (the five-syscall v1 set): `error.rs` (`SyscallError`), `abi.rs` (number decode + register packing + the null-handle sentinel), `user_access.rs` (`UserAccessWindow` + validated `copy_from_user`/`copy_to_user`), `dispatch.rs` (the dispatcher + handlers + the **debug-console capability** check). The BSP gained the **`SVC` sync trap trampoline** ([`vectors.s`](../../bsp-qemu-virt/src/vectors.s) `tyrne_sync_trampoline`) installed at **both** `VBAR_EL1+0x200` (the current-EL path B5 exercises) and `+0x400` (the real-EL0 path B6 verifies), saving the full `x0`–`x30`+`SP_EL0` frame and routing `ESR_EL1.EC == SVC64` to a Rust [`syscall_entry`](../../bsp-qemu-virt/src/syscall.rs). New cap surface: `CapObject::DebugConsole` (singleton, no handle) + `CapRights::CONSOLE_WRITE` (bit 7) + `CapHandle::from_raw` (ABI decode). 
The `console_write` syscall carries **two independent gates** — the capability check (all builds) and the release **debug-gate** (`cfg!(debug_assertions)`; number `5` → `BadSyscallNumber` in release). **Gates:** fmt / host-clippy / kernel-clippy / kernel-build clean; **host tests 236** (was 196, +40 syscall tests); `cargo test --release` green; `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` clean. **QEMU smoke (debug):** an EL1 kernel-stub issues two `SVC #0`s — `console_write` (emits a greeting via the syscall path; status `0x0`, 63 bytes) + a reserved-invalid number (status `0x1` = `BadSyscallNumber`); `-d int,unimp,guest_errors` shows exactly two `SVC` exceptions at the current-EL vector (`ESR 0x15/0x56000000` = SVC64, `EL1→EL1`), clean `ERET`, no new fault class; the cooperative demo still runs to `tyrne: all tasks complete`. Two new audit entries: [UNSAFE-2026-0029](../audits/unsafe-log.md#unsafe-2026-0029--svc-sync-trap-trampoline--syscall_entry-register-frame-access) (trap-frame asm) + [UNSAFE-2026-0030](../audits/unsafe-log.md#unsafe-2026-0030--validated-copy-fromto-user-byte-move-via-coreptrcopy_nonoverlapping) (copy-user). **Security-relevant — flagged for explicit security review.** This banner supersedes the 2026-05-29 B5-opened banner below. > > **2026-05-29 update — B5 opened: ADR-0030/0031 Accepted; T-020 (error taxonomy + Debug redaction) In Review; T-021 (SVC dispatch) Ready.** The B5 syscall boundary is now active on branch `t-020-syscall-error-taxonomy` (off `main`). 
[ADR-0030](../decisions/0030-syscall-abi.md) settles the syscall ABI (`x8` = number, `x0`–`x5` args, `x0` status + `x1`–`x7` payload, `SVC #0`) and the **K2-5** taxonomy split of `IpcError::InvalidCapability` → `StaleHandle` / `MissingRight` / `WrongObjectKind`; [ADR-0031](../decisions/0031-initial-syscall-set.md) fixes the five-syscall v1 set (`send` / `recv` / `console_write` / `task_yield` / `task_exit`; number `0` reserved-invalid; every object-naming syscall capability-gated per [P1/P4](../standards/architectural-principles.md)). Both **Accepted 2026-05-29** (Propose → careful-re-read + maintainer-review Accept). **[T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** — the pure-Rust foundation (the `IpcError` split + `Capability`/`CapObject` `Debug` redaction, K3-9) — is **In Review** (implementation complete, all gates green incl. Miri; kernel host tests 196); **[T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md)** — the EL0→EL1 `SVC` trampoline + panic-free dispatcher + copy-from/to-user, the security-critical hardware-facing half — is opened **Ready**, deferred to its own arc per CLAUDE.md §6. A same-day maintainer review surfaced two items, folded into the ADR bodies **before Accept** (the ADRs were re-drafted at `Proposed` and Accepted in a separate commit, so no Accepted body was edited post-Accept): (a) the B5 kernel-stub `SVC` exercises the **current-EL** `VBAR_EL1+0x200` vector, not the lower-EL `+0x400` (EL0) vector — so the real-EL0 round-trip is runtime-verified in **B6**, not B5; (b) `console_write` is **capability-gated** on a debug-console capability (it was ambient authority, a P1/P4 violation). This banner supersedes the 2026-05-28 banner below. @@ -57,15 +59,18 @@ A short pointer file updated as work progresses. For the full plan see [`phases/ --- -- **Active phase:** B — opened 2026-04-21. 
**B0 closed 2026-04-27**; **B1 closed 2026-05-07**; **B2 closed 2026-05-09**; **B3 closed 2026-05-14** via PR #29's closure trio (business + security + performance baseline; merge commit `b425dc1`). All four closures lifted `Done` after a verbatim QEMU smoke trace + clean `-d guest_errors` count per the [business master-plan §Acceptance criteria](../analysis/reviews/business-reviews/master-plan.md#acceptance-criteria) rule. **The 2026-04-28 implementation-complete claim for B1 was rolled back on 2026-05-06 by the smoke regression and re-issued 2026-05-07 as a smoke-verified Done** — that remains the only re-open arc to date; B2 and B3 both closed cleanly on first attempt. -- **Active milestone:** **B5 — Syscall boundary (opens next).** B4 (Task loader) was formally **Closed 2026-05-28** via its closure trio (see the top banner + the [B4 closure retrospective](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md)). B5 per [phase-b.md §B5](phases/phase-b.md): ADR-0030 (syscall ABI + `IpcError::InvalidCapability` split into `StaleHandle` / `MissingRight` / `WrongObjectKind`) + ADR-0031 (initial syscall set: `send`, `recv`, `console_write`, `task_yield`, `task_exit`), then EL0→EL1 SVC dispatch, a panic-free syscall dispatcher, validated copy-from/to-user through the active AS, and `Capability` `Debug` redaction. B5 is the prerequisite for the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper (B4 §3) that turns a `LoadedImage` into a runnable `CapHandle{CapObject::Task(...)}`, then B6 (first userspace "hello"). -- **Active task:** **T-021 — EL0→EL1 SVC dispatch, In Review** on branch `t-021-syscall-dispatch` (off the `t-020-syscall-error-taxonomy` HEAD so it carries the T-020 `IpcError` split + `Debug` redaction it depends on; stacks on T-020's PR). 
Implementation complete, all gates green (fmt / host-clippy / kernel-clippy / kernel-build / host-test 236 / `test --release` / Miri); QEMU smoke shows the EL1-kernel-stub `SVC` round-trip (see the top banner). Lands the kernel `syscall` module + the BSP `SVC` trampoline + the debug-console capability + two audit entries (UNSAFE-2026-0029 / 0030). **Last task Done: T-019 — Task loader, 2026-05-16** (PR #31, merge `7f876af`; branch `t-019-task-loader` retired): `pub fn load_image(...) -> Result` in `kernel/src/obj/task_loader.rs` produces a `LoadedImage { as_cap, entry_va, stack_top_va, image_bytes, stack_bytes }` of a freshly populated userspace AS — **not** a runnable `CapHandle{CapObject::Task(...)}` (runnability gates on B5/B6); 10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry. -- **In review:** **T-021 — EL0→EL1 SVC dispatch** (on `t-021-syscall-dispatch`) **and T-020 — Syscall error taxonomy** (the `IpcError::InvalidCapability` → `StaleHandle`/`MissingRight`/`WrongObjectKind` split + `Capability`/`CapObject` `Debug` redaction, on `t-020-syscall-error-taxonomy`). Both implementation-complete with all gates green incl. Miri; T-021 stacks on T-020. Awaiting maintainer review/merge. -- **In progress:** none (T-021 implementation complete → In Review). -- **Working branch:** `t-021-syscall-dispatch` (based on `t-020-syscall-error-taxonomy` HEAD, itself off `main`; ADRs Accepted). Pushed to `origin`; **[PR #34](https://github.com/HodeTech/Tyrne/pull/34)** open against `main` carries **both T-020 + T-021** (9 commits) in one combined review — the maintainer chose the bundled-PR shape over stacked PRs. (`t-020-syscall-error-taxonomy` is **not** separately pushed; its commits ride in PR #34.) No rebase pending. 
-- **Last completed milestone:** **B4 — Task loader, Closed 2026-05-28** via the closure trio ([business](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md) + [security](../analysis/reviews/security-reviews/2026-05-28-B4-closure.md) Approve + [performance](../analysis/reviews/performance-optimization-reviews/2026-05-28-B4-closure.md) baseline). Required task Done: T-019 (2026-05-16, PR #31 `7f876af`). The trio is the **canonical source for B4's closing metrics**; headline: **286** host tests, QEMU smoke clean (629 guest-errors, all pre-existing PL011, zero fault classes), release perf band 15.641 / 17.587 / 19.150 ms, audit log 28 entries. The 2026-05-22 master review + PR #32 remediation (all 24 Blocker+Major findings resolved, MR-009 closed at this closure) landed in this period. **Previous closures:** **B3** 2026-05-14 (PR #29 `b425dc1`); **B2** 2026-05-09; **B1** 2026-05-07 (PR #15 `e9fa019` + PR #16 `95b15aa`); **B0** 2026-04-27 (PR #9 `9a66e8b`). -- **Last completed tasks:** **T-019 — Done 2026-05-16, merged to `main` via PR #31** (branch `t-019-task-loader`, merge commit `7f876af`) — Task loader: `load_image` produces a `LoadedImage` descriptor of a freshly populated userspace AS (10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry); does **not** mint a runnable `TaskCap` (B5/B6 prerequisite). **Earlier:** **T-018 — Done 2026-05-11, live on `main` 2026-05-14 via PR #28** (branch `t-018-address-space-kernel-object`, merge commit `47b0a86`). 
T-018 implementation: [`AddressSpace`](../../kernel/src/mm/address_space.rs) kernel-object struct + per-type [`AddressSpaceArena`](../../kernel/src/mm/address_space.rs) (ADR-0016 pattern); `CapKind::AddressSpace` + `CapObject::AddressSpace(AddressSpaceHandle)` variants in [`kernel/src/cap/mod.rs`](../../kernel/src/cap/mod.rs); capability-gated wrappers `cap_create_address_space` / `cap_map` / `cap_unmap` with step-by-step preflights (DERIVE rights → no-widening → depth preflight → arena/cap-table capacity → PMM alloc → arena commit → `cap_derive` cap-table insert); `Task` struct extension with `address_space_handle`; activation-on-context-switch hook threaded through `yield_now` / `start` / `ipc_recv_and_yield` / `ipc_send_and_yield` (closure-as-parameter, fires only when outgoing and incoming task ASes differ — short-circuits in v1's bootstrap-shared topology); BSP wiring in [`bsp-qemu-virt/src/main.rs`](../../bsp-qemu-virt/src/main.rs) wraps the already-live bootstrap root via the new `QemuVirtAddressSpace::from_existing_root` `pub unsafe fn` companion. Cross-cutting additions during the review-round arc: `MmuError::BlockMapped` variant (commit `8b9f52e`) so unmap into a bootstrap block descriptor surfaces a distinct typed error from `AlreadyMapped`; `CapabilityTable::depth_of` `pub(crate)` preflight helper closing the PMM-leak path; UNSAFE-2026-0014 fifth Amendment scope-extends the umbrella to the activation hook + BSP-side activation closure (zero new audit entries — additive scope on the existing `&mut Scheduler` momentary-borrow umbrella). Smoke trace gains one new line `tyrne: address-space-arena ready (1 / 8 slots used; bootstrap AS root = 0x4008d000)` immediately after `tyrne: pmm initialized (...)` and before `tyrne: timer ready (...)`. Full demo runs to `tyrne: all tasks complete`; `-d int,unimp,guest_errors` reports only the pre-existing PL011-disabled-UART noise (unchanged baseline). 
**Earlier:** T-017 — Done 2026-05-10 (PR #27, branch `t-017-physical-memory-manager`) — Physical Memory Manager (`Pmm` bitmap allocator + `FrameProvider` trait + UNSAFE-2026-0026 zero-fill audit). **Earlier:** T-016 — Done 2026-05-08 (branch `t-016-mmu-activation`) — MMU activation, VMSAv8 descriptor encoders, `MapperFlush` flush-token, UNSAFE-2026-0022 / 0023 / 0024 / 0025 introduced. **Earlier:** T-015 — Done 2026-05-07 (PR #17, branch `t-015-endpoint-rollback-cancel-recv`) — `ipc_cancel_recv` recovery primitive + symmetric scheduler+endpoint rollback in `ipc_recv_and_yield`'s Phase 2 Deadlock branch (ADR-0032). **Earlier:** T-014 (2026-05-07 via PR #15), T-012 (2026-04-28 via PR #10), T-013 (2026-04-27 via PR #9). +- **Active phase:** B — opened 2026-04-21. **B0 closed 2026-04-27**; **B1 closed 2026-05-07**; **B2 closed 2026-05-09**; **B3 closed 2026-05-14** via PR #29's closure trio (merge `b425dc1`); **B4 closed 2026-05-28** via its closure trio (T-019, PR #31 `7f876af`); **B5 closed 2026-05-29** via its closure trio (T-020 + T-021, PR #34 `f98e1af`). All six closures lifted `Done` after a verbatim QEMU smoke trace + clean `-d guest_errors` count per the [business master-plan §Acceptance criteria](../analysis/reviews/business-reviews/master-plan.md#acceptance-criteria) rule. **The 2026-04-28 implementation-complete claim for B1 was rolled back on 2026-05-06 by the smoke regression and re-issued 2026-05-07 as a smoke-verified Done** — that remains the only re-open arc to date; B2 and B3 both closed cleanly on first attempt. +- **Active milestone:** **B6 — First userspace "hello".** B5 (Syscall boundary) was formally **Closed 2026-05-29** via its closure trio (see the top banner + the [B5 business retrospective](../analysis/reviews/business-reviews/2026-05-29-B5-closure.md)). 
B6 per [phase-b.md §B6](phases/phase-b.md#milestone-b6--first-userspace-hello): a real EL0 task — loaded by the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) bridge (B4 §3, the `LoadedImage` → runnable `CapHandle{CapObject::Task(...)}` wrapper) — runs in its own AS, makes a `console_write` syscall through the lower-EL `VBAR_EL1+0x400` vector (the real EL0↔EL1 round-trip B5's `+0x200` proxy could not prove), and exits via `task_exit`. B6 **must** close the three [T-021 carry-forward gates](phases/phase-b.md#milestone-b6--first-userspace-hello) (per-task `console_write` window + per-page user-VA translation; `SP_EL1` init for `+0x400`; `SYSCALL_STUB_TABLE` → current-task table) before a real EL0 task runs, and pairs with the **ADR-0033 high-half** placeholder opening. B6 is the **Phase-B-closing** milestone (its review doubles as the Phase B retrospective). +- **Active task:** none — B5 closed via the closure trio. **Next to open:** the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper (the `LoadedImage` → runnable `TaskCap` bridge), paired with the **ADR-0033 high-half** ADR opening; then `userland/hello` + the `tyrne-user` safe-wrapper crate. **Last tasks Done: T-020 + T-021 — 2026-05-29** (PR #34, merge `f98e1af`): T-020 split `IpcError::InvalidCapability` → `StaleHandle`/`WrongObjectKind`/`MissingRight` + redacted `Capability`/`CapObject` `Debug` (zero new `unsafe`); T-021 landed the architecture-agnostic kernel `syscall` module (`SyscallError` / ABI / `UserAccessWindow` / dispatcher) + the BSP `tyrne_sync_trampoline` (current-EL `+0x200` + lower-EL `+0x400`) + `CapObject::DebugConsole` + `CapRights::CONSOLE_WRITE` + `CapHandle::from_raw`, exercised at B5 by an EL1-kernel-stub `SVC` (current-EL path; the real EL0 `+0x400` round-trip deferred to B6) — UNSAFE-2026-0029 / 0030. 
+- **In review:** none — T-020 + T-021 merged via PR #34 (`f98e1af`); the B5 closure trio (security + business + performance) is complete. +- **In progress:** none. +- **Working branch:** `sec-review-b5-syscall-boundary` (off `main` at `f98e1af`) carries the B5 security review (`c424dcb`) + the `security-model.md` SMMUv3 reconcile (`afeed10`) + this closure trio (business + performance). **[PR #34](https://github.com/HodeTech/Tyrne/pull/34)** (T-020 + T-021, 9 commits) **merged to `main` 2026-05-29** (`f98e1af`); branches `t-020-syscall-error-taxonomy` / `t-021-syscall-dispatch` retired. +- **Last completed milestone:** **B5 — Syscall boundary, Closed 2026-05-29** via the closure trio ([security](../analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md) Approve + [business](../analysis/reviews/business-reviews/2026-05-29-B5-closure.md) + [performance](../analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md)). Tasks Done: T-020 + T-021 (PR #34 `f98e1af`). The trio is the **canonical source for B5's closing metrics**; headline: **339** host tests (incl. local Miri 0 UB), QEMU smoke clean (712 release / 776 debug guest-errors, all pre-existing PL011, + 2 expected `SVC` exceptions, zero new fault class), release perf band 17.645 / 20.300 / 24.706 ms (same-host control proves a real ~+2.9 ms one-time boot `SVC`-smoke cost, not host jitter), audit log 30 entries (29 Active). **Previous closures:** **B4** 2026-05-28 (closure trio; T-019 PR #31 `7f876af`); **B3** 2026-05-14 (PR #29 `b425dc1`); **B2** 2026-05-09; **B1** 2026-05-07 (PR #15 `e9fa019` + PR #16 `95b15aa`); **B0** 2026-04-27 (PR #9 `9a66e8b`). 
+- **Last completed tasks:** **T-020 + T-021 — Done 2026-05-29 via PR #34 (`f98e1af`)** — the B5 syscall boundary: T-020 (the `IpcError::InvalidCapability` → `StaleHandle`/`WrongObjectKind`/`MissingRight` split + `Capability`/`CapObject` `Debug` redaction, zero new `unsafe`) + T-021 (the architecture-agnostic kernel `syscall` module — `SyscallError` / ABI / `UserAccessWindow` / dispatcher — + the BSP `tyrne_sync_trampoline` at `+0x200`/`+0x400` + `CapObject::DebugConsole` + `CapRights::CONSOLE_WRITE` + `CapHandle::from_raw`; two audit entries UNSAFE-2026-0029 / 0030; the real EL0 `+0x400` round-trip deferred to B6). **Earlier:** **T-019 — Done 2026-05-16, merged to `main` via PR #31** (branch `t-019-task-loader`, merge commit `7f876af`) — Task loader: `load_image` produces a `LoadedImage` descriptor of a freshly populated userspace AS (10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry); does **not** mint a runnable `TaskCap` (B5/B6 prerequisite). **Earlier:** **T-018 — Done 2026-05-11, live on `main` 2026-05-14 via PR #28** (branch `t-018-address-space-kernel-object`, merge commit `47b0a86`). 
T-018 implementation: [`AddressSpace`](../../kernel/src/mm/address_space.rs) kernel-object struct + per-type [`AddressSpaceArena`](../../kernel/src/mm/address_space.rs) (ADR-0016 pattern); `CapKind::AddressSpace` + `CapObject::AddressSpace(AddressSpaceHandle)` variants in [`kernel/src/cap/mod.rs`](../../kernel/src/cap/mod.rs); capability-gated wrappers `cap_create_address_space` / `cap_map` / `cap_unmap` with step-by-step preflights (DERIVE rights → no-widening → depth preflight → arena/cap-table capacity → PMM alloc → arena commit → `cap_derive` cap-table insert); `Task` struct extension with `address_space_handle`; activation-on-context-switch hook threaded through `yield_now` / `start` / `ipc_recv_and_yield` / `ipc_send_and_yield` (closure-as-parameter, fires only when outgoing and incoming task ASes differ — short-circuits in v1's bootstrap-shared topology); BSP wiring in [`bsp-qemu-virt/src/main.rs`](../../bsp-qemu-virt/src/main.rs) wraps the already-live bootstrap root via the new `QemuVirtAddressSpace::from_existing_root` `pub unsafe fn` companion. Cross-cutting additions during the review-round arc: `MmuError::BlockMapped` variant (commit `8b9f52e`) so unmap into a bootstrap block descriptor surfaces a distinct typed error from `AlreadyMapped`; `CapabilityTable::depth_of` `pub(crate)` preflight helper closing the PMM-leak path; UNSAFE-2026-0014 fifth Amendment scope-extends the umbrella to the activation hook + BSP-side activation closure (zero new audit entries — additive scope on the existing `&mut Scheduler` momentary-borrow umbrella). Smoke trace gains one new line `tyrne: address-space-arena ready (1 / 8 slots used; bootstrap AS root = 0x4008d000)` immediately after `tyrne: pmm initialized (...)` and before `tyrne: timer ready (...)`. Full demo runs to `tyrne: all tasks complete`; `-d int,unimp,guest_errors` reports only the pre-existing PL011-disabled-UART noise (unchanged baseline). 
**Earlier:** T-017 — Done 2026-05-10 (PR #27, branch `t-017-physical-memory-manager`) — Physical Memory Manager (`Pmm` bitmap allocator + `FrameProvider` trait + UNSAFE-2026-0026 zero-fill audit). **Earlier:** T-016 — Done 2026-05-08 (branch `t-016-mmu-activation`) — MMU activation, VMSAv8 descriptor encoders, `MapperFlush` flush-token, UNSAFE-2026-0022 / 0023 / 0024 / 0025 introduced. **Earlier:** T-015 — Done 2026-05-07 (PR #17, branch `t-015-endpoint-rollback-cancel-recv`) — `ipc_cancel_recv` recovery primitive + symmetric scheduler+endpoint rollback in `ipc_recv_and_yield`'s Phase 2 Deadlock branch (ADR-0032). **Earlier:** T-014 (2026-05-07 via PR #15), T-012 (2026-04-28 via PR #10), T-013 (2026-04-27 via PR #9). - **Last reviews:** + - [B5 closure security review (2026-05-29)](../analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md) — Approve, eight axes; the EL0→EL1 boundary is panic-free, capability-gated, copy-user-validated; UNSAFE-2026-0029/0030 policy-conformant; reconciled the `security-model.md` SMMUv3-CI staleness vs ADR-0036 + - [B5 closure business retrospective (2026-05-29)](../analysis/reviews/business-reviews/2026-05-29-B5-closure.md) — Syscall boundary (T-020 + T-021); one-day milestone via the pure-Rust/hardware-boundary split; adversarial pass + Miri corrected a copy-user soundness over-claim + - [B5 closure performance baseline (2026-05-29)](../analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md) — same-host control proves a real ~+2.9 ms one-time boot `SVC`-smoke cost (not host jitter); band p10/p50/p90 = 17.645 / 20.300 / 24.706 ms - [B4 closure retrospective (2026-05-28)](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md) — Task loader (T-019) + the 2026-05-22 master-review interlude + PR #32 remediation (23/24 verified findings closed) - [B4 closure consolidated security review (2026-05-28)](../analysis/reviews/security-reviews/2026-05-28-B4-closure.md) — Approve, eight axes pass - [B4 
closure performance baseline (2026-05-28)](../analysis/reviews/performance-optimization-reviews/2026-05-28-B4-closure.md) — re-baseline; release band p10/p50/p90 = 15.641 / 17.587 / 19.150 ms @@ -95,8 +100,8 @@ A short pointer file updated as work progresses. For the full plan see [`phases/ - [ADR-0026 — Idle dispatch via separate fallback slot](../decisions/0026-idle-dispatch-fallback.md) — `Accepted` (2026-05-06). Supersedes ADR-0022's *idle-task-location* axis only (Option A → Option B: dedicated `Scheduler::idle: Option` slot, dispatched via `ready.dequeue().or(s.idle)` only when the ready queue is empty). ADR-0022's *typed-error* axis (Option G — `SchedError::Deadlock` + `IpcError::PendingAfterResume` + `start`'s panic) stands. Implemented by T-014 (Done 2026-05-07). Includes a queue-state simulation table that ADR-0022 lacked; this discipline (simulation tables on multi-step state-machine ADRs) is the central learning of the [B1 smoke-regression arc](../analysis/reviews/business-reviews/2026-05-06-B1-smoke-regression.md). - [ADR-0032 — Endpoint state rollback + `ipc_cancel_recv` primitive](../decisions/0032-endpoint-rollback-and-cancel-recv.md) — `Accepted` (2026-05-07). Adds a recovery primitive that reverses an `Idle → RecvWaiting` transition, called by `ipc_recv_and_yield`'s Phase 2 Deadlock branch so both *scheduler* and *endpoint* state restore to pre-call shape on `SchedError::Deadlock`. Kernel-internal in v1 (no userspace caller); future consumers are the userspace-driven endpoint destroy drain (B2+), multi-waiter wake (ADR-0019 §Open questions), and preemption-rollback (B5+). Implemented by T-015 (Done 2026-05-07). Includes a Phase-2 Deadlock simulation table; ADR-0017 §Revision notes rider records the additive recovery primitive (user-observable surface unchanged). 
The Accept commit is the first project-side application of [`write-adr` skill](../../.agents/skills/write-adr/SKILL.md) step 10's *careful re-read* discipline as a separate diff from the Propose commit. - [ADR-0027 — Kernel virtual memory layout (B2 — identity-mapped MMU activation)](../decisions/0027-kernel-virtual-memory-layout.md) — **`Accepted` (2026-05-08)**. B2 commits to identity-only mapping (kernel in `TTBR0_EL1`; `TTBR1_EL1` reserved with `EPD1=1` for future high-half ADR-0033 placeholder when B5 surfaces per-task `TTBR0_EL1` swap), 4 KiB granule + 48-bit VA + 4-level translation, MAIR indices 0/1 for device-nGnRnE / normal-cached, four bootstrap page-table frames in a new `.boot_pt` section, and a typed [`MapperFlush`](../../hal/src/mmu/mod.rs) flush-token discipline at the `Mmu` trait surface (additive change to `map`/`unmap` return types, recorded in ADR-0009 §Revision notes rider via T-016). Includes a five-row §Simulation table walking the SCTLR.M=1 transition (Steps 0–4). **First non-recovery-primitive state-machine ADR drafted under [`write-adr` skill §Simulation](../../.agents/skills/write-adr/SKILL.md) discipline** — ADR-0026's table was the empirical retro-source; ADR-0032's table was the first application but its subject is a recovery primitive; ADR-0027 is the first productive-design state machine to use the rule. Implementation: T-016 (Draft, opens with the Propose commit). Accept landed as a separate commit (`bb0a6ba`) per `write-adr` §10. 
-- **Next task to open:** the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper (B4 §3) that turns a `LoadedImage` into a runnable `CapHandle{CapObject::Task(...)}`, then **B6 (first userspace "hello")** — where the real EL0 round-trip through the lower-EL `VBAR_EL1+0x400` vector (EL0↔EL1 transition + copy-user against a separate userspace `TTBR0_EL1`) is finally runtime-verified (T-021's B5 proxy only drove the current-EL `+0x200` path; the `+0x400` handler is installed but unexercised). B6 also grants the debug-console capability to the first EL0 task and wires the real `task_yield` / `task_exit` semantics. The [phase-b.md §B6](phases/phase-b.md#milestone-b6--first-userspace-hello) plan describes the milestone shape. **T-020 + T-021 (both In Review) must merge to `main` first.**
-- **Next review trigger:** **B5 closure trio** — produced when the first B5 milestone reaches `In Review`. (The B4 closure trio fired 2026-05-28.) Possible interim triggers: a mini-retro if EL0/syscall bring-up surfaces a learning worth capturing mid-arc; a maintainer-initiated review or a second on-demand full-tree master review if the corpus drifts again before B5 closes. Forward-flag audit notes: UNSAFE-2026-0025 / 0026's `Pending QEMU smoke verification` notes were lifted by T-019 (first post-bootstrap `cap_map` / `cap_create_address_space` runtime exerciser); UNSAFE-2026-0019 / 0020 / 0021 continue to gate on the first deadline-arming caller (B5+).
+- **Next task to open:** the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper (B4 §3) that turns a `LoadedImage` into a runnable `CapHandle{CapObject::Task(...)}`, then **B6 (first userspace "hello")** — where the real EL0 round-trip through the lower-EL `VBAR_EL1+0x400` vector (EL0↔EL1 transition + copy-user against a separate userspace `TTBR0_EL1`) is finally runtime-verified (T-021's B5 proxy only drove the current-EL `+0x200` path; the `+0x400` handler is installed but unexercised). B6 also grants the debug-console capability to the first EL0 task and wires the real `task_yield` / `task_exit` semantics. The [phase-b.md §B6](phases/phase-b.md#milestone-b6--first-userspace-hello) plan describes the milestone shape. **T-020 + T-021 merged to `main` 2026-05-29 (PR #34, `f98e1af`) and B5 is Closed via this trio — so `task_create_from_image` is the first B6 task to open.**
+- **Next review trigger:** **B6 closure trio + Phase B retrospective** — produced when B6 (the first real userspace "hello") reaches `In Review`. (The B5 closure trio fired 2026-05-29.) Possible interim triggers: a mini-retro if EL0/syscall bring-up surfaces a learning worth capturing mid-arc; a maintainer-initiated review or a second on-demand full-tree master review if the corpus drifts again before B6 closes. Forward-flag audit notes: UNSAFE-2026-0025 / 0026's `Pending QEMU smoke verification` notes were lifted by T-019 (first post-bootstrap `cap_map` / `cap_create_address_space` runtime exerciser); UNSAFE-2026-0019 / 0020 / 0021 continue to gate on the first deadline-arming caller (B5+).

 ## Notes

diff --git a/docs/roadmap/phases/phase-b.md b/docs/roadmap/phases/phase-b.md
index 034aaba..460fc50 100644
--- a/docs/roadmap/phases/phase-b.md
+++ b/docs/roadmap/phases/phase-b.md
@@ -201,7 +201,7 @@ Load a userspace binary into an address space. For B4 the binary is statically e

 Traps from EL0 into EL1 via `SVC` (or the chosen mechanism).
Syscall dispatch validates the caller's capabilities. Establish the initial syscall set and the calling convention. -**Status (2026-05-29): implementation in review.** [ADR-0030](../../decisions/0030-syscall-abi.md) (syscall ABI + the K2-5 `IpcError` taxonomy split) and [ADR-0031](../../decisions/0031-initial-syscall-set.md) (the five-syscall v1 set) are **Accepted**. **[T-020](../../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** (the `IpcError::InvalidCapability` → `StaleHandle`/`MissingRight`/`WrongObjectKind` split + `Capability`/`CapObject` `Debug` redaction — sub-breakdown §6) is **In Review**. **[T-021](../../analysis/tasks/phase-b/T-021-syscall-dispatch.md)** (the EL0→EL1 `SVC` trap trampoline + panic-free dispatcher + copy-from/to-user + the debug-console capability + `SyscallError` — sub-breakdown §§3–5 + 7) is **In Review** on branch `t-021-syscall-dispatch`: a new architecture-agnostic kernel `syscall` module + the BSP `SVC` sync trampoline (installed at `VBAR_EL1+0x200` and `+0x400`), exercised at B5 by an EL1 kernel-stub `SVC` (current-EL `+0x200` path) with all gates green (host tests 236; `test --release`; Miri clean; QEMU smoke shows the round-trip + `console_write` emitted bytes). The **real EL0 `+0x400` round-trip** is carried to B6 per the acceptance criteria below. Remaining after T-020/T-021 merge: the deferred [`task_create_from_image`](#milestone-b4--task-loader) wrapper, then B6. +**Status (2026-05-29): CLOSED via the closure trio.** B5 (syscall boundary) is formally **Closed** — [security review](../../analysis/reviews/security-reviews/2026-05-29-B5-syscall-boundary.md) (**Approve**, eight axes) + [business retrospective](../../analysis/reviews/business-reviews/2026-05-29-B5-closure.md) + [performance baseline](../../analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md), the **canonical source for B5's closing metrics** (339 host tests incl. 
local Miri 0 UB; release band p10/p50/p90 = 17.645/20.300/24.706 ms — a same-host control proves a real ~+2.9 ms one-time boot `SVC`-smoke cost, not host jitter; audit log 30 entries / 29 Active; QEMU smoke clean + 2 expected `SVC` exceptions, zero new fault class). T-020 + T-021 merged via [PR #34](https://github.com/HodeTech/Tyrne/pull/34) (`f98e1af`). **The real EL0 `+0x400` round-trip + the three T-021 carry-forward gates are carried to B6** (the §B6 acceptance criteria + "T-021 carry-forward gates" subsection below). **Pre-closure status (preserved as the design record):** implementation in review. [ADR-0030](../../decisions/0030-syscall-abi.md) (syscall ABI + the K2-5 `IpcError` taxonomy split) and [ADR-0031](../../decisions/0031-initial-syscall-set.md) (the five-syscall v1 set) are **Accepted**. **[T-020](../../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** (the `IpcError::InvalidCapability` → `StaleHandle`/`MissingRight`/`WrongObjectKind` split + `Capability`/`CapObject` `Debug` redaction — sub-breakdown §6) is **In Review**. **[T-021](../../analysis/tasks/phase-b/T-021-syscall-dispatch.md)** (the EL0→EL1 `SVC` trap trampoline + panic-free dispatcher + copy-from/to-user + the debug-console capability + `SyscallError` — sub-breakdown §§3–5 + 7) is **In Review** on branch `t-021-syscall-dispatch`: a new architecture-agnostic kernel `syscall` module + the BSP `SVC` sync trampoline (installed at `VBAR_EL1+0x200` and `+0x400`), exercised at B5 by an EL1 kernel-stub `SVC` (current-EL `+0x200` path) with all gates green (host tests 236; `test --release`; Miri clean; QEMU smoke shows the round-trip + `console_write` emitted bytes). The **real EL0 `+0x400` round-trip** is carried to B6 per the acceptance criteria below. Remaining after T-020/T-021 merge: the deferred [`task_create_from_image`](#milestone-b4--task-loader) wrapper, then B6. ### Sub-breakdown @@ -233,6 +233,28 @@ Traps from EL0 into EL1 via `SVC` (or the chosen mechanism). 
Syscall dispatch va A real userspace task, loaded by B4, running in EL0 in its own address space, makes a `console_write` syscall, and exits cleanly via `task_exit`. +### B6 opening sequence & prerequisites + +> Added at B5 closure (2026-05-29) as the careful pre-B6 plan. The *decisions* named below are settled by their ADRs when B6 opens; this section fixes the **order and the dependency rationale**, not the decisions. The §Sub-breakdown that follows is the original B6 task list; steps 5–7 below map onto it. + +**The gating prerequisite — the kernel must stay reachable from every task's active translation.** Today the loader's userspace AS holds **only** image + stack ([`task_loader.rs`](../../../kernel/src/obj/task_loader.rs)): no kernel mappings. The moment a real EL0 task issues an `SVC` (or takes any exception), the CPU vectors to `VBAR_EL1` and *fetches* the trampoline instruction — which lives at a kernel PA **not mapped in that task's `TTBR0_EL1`** → translation-fault on the vector fetch, unrecoverable. (B5's smoke worked only because the EL1 stub runs in the *bootstrap* AS, where the kernel is identity-mapped.) **Nothing in B6 runs until this is solved** (`TCR_EL1.EPD1 = 1` today — TTBR1 disabled). + +**ADRs that open B6** (per [ADR-0025 §Rule 1](../../decisions/0025-adr-governance-amendments.md), the first implementation task opens in the same commit as the ADR's *Dependency chain*): + +1. **ADR-0033 — kernel reachable from every AS.** Settle: full **high-half** (kernel → `TTBR1_EL1`, `EPD1 = 0`; per-task userspace → `TTBR0_EL1`) vs. **map-the-kernel-into-each-`TTBR0`**. The [ADR-0027 §Decision-outcome](../../decisions/0027-kernel-virtual-memory-layout.md) cost analysis (Option C vs D — high-half costs a linker-script + jump-to-high-half dance + identity teardown + ~4 `unsafe` entries; the map-into-each-AS path trades that for per-AS kernel-mapping bookkeeping) is the starting point. 
High-half is the named, standard shape; the ADR must also settle `TCR_EL1.A1`/ASID per ADR-0027 §"ASID". +2. **EL0 task context decision** (folded into ADR-0033 or a sibling ADR). [`Task`](../../../kernel/src/obj/task.rs) carries only `id + address_space_handle` today; running EL0 needs an entry context (`ELR_EL1` = entry, `SPSR_EL1` = EL0t, `SP_EL0` = stack top) + a per-task `SP_EL1` kernel stack + an *enter-EL0 / `ERET`-into-EL0* path distinct from the cooperative [`ContextSwitch`](../../../hal/src/context_switch.rs) (which saves only the kernel callee-saved set). The syscall trampoline already provides the *return*-to-EL0 half; the *first-entry* half is new. +3. **ADR-0034 — kernel-image section permissions** (optional in B6). Its trigger (first attacker-observable EL0 execution) fires at B6, but it is hardening, **not** a functional blocker (the v1 `hello` is code-only, mapped `USER | EXECUTE`). Decide in B6 whether to harden now or defer. + +**Dependency-ordered task sequence** (each rides on the prior): + +1. **ADR-0033 + the kernel-in-every-AS implementation**, plus the per-task `TTBR0_EL1` swap on context switch going live (the T-018 activation differ-path that short-circuits in v1). +2. **EL0 task context register file + the enter-EL0 path + per-task `SP_EL1`** (closes [T-021 carry-forward gate #2](#milestone-b6--first-userspace-hello)). +3. **`task_create_from_image`** — `LoadedImage` → runnable `CapHandle{CapObject::Task(...)}` (composes steps 1 + 2; the deferred [§B4 §3](#milestone-b4--task-loader) bridge). +4. **Close the remaining T-021 carry-forward gates:** the per-task `console_write` window + per-page user-VA → kernel-VA translation returning `FaultAddress` (**gate #1 — security-critical**; without it an EL0 debug-console-cap holder reads arbitrary kernel memory), and `SYSCALL_STUB_TABLE` → the scheduler's current-task table (**gate #3**). +5. 
**`tyrne-user` crate** (safe wrappers) + **`userland/hello/` crate** + the `cargo build → objcopy -O binary → include_bytes!` pipeline + the shared `userland-layout` source-of-truth ([ADR-0029 §"Build pipeline (B6)"](../../decisions/0029-initial-userspace-image-format.md)). *(§Sub-breakdown items 1 + 3.)* +6. **Wire-up + QEMU smoke** *(§Sub-breakdown items 2 + 4)*: a true EL0 task takes the lower-EL `+0x400` vector, the dispatcher copies `console_write` from the task's `TTBR0_EL1`, `ERET` returns to EL0, and `task_exit` terminates it — the EL0↔EL1 round-trip B5's `+0x200` proxy could not prove. +7. **Closure = the Phase B retrospective** *(§Sub-breakdown items 5–7)*: the guide, the first hypothesis-driven performance cycle (real EL0 round-trip / IPC / context-switch vs the A6 baseline — and a same-host control, given the [B5 perf leg](../../analysis/reviews/performance-optimization-reviews/2026-05-29-B5-closure.md)'s finding that the harness is nearing its resolving floor), and a security review of the now-attacker-observable boundary. + ### Sub-breakdown 1. **Userspace "hello" program** — a minimal `no_std, no_main` binary living in `userland/hello/` (new crate) that calls the syscall ABI directly. @@ -287,11 +309,11 @@ When B6 is Done, run a business review. Phase C becomes active after that review | ADR-0027 | Kernel virtual memory layout (B2 — identity-mapped MMU activation) | B2 (**Accepted 2026-05-08**) | was ADR-0025 in the pre-2026-04-27 plan; renumbered down by 2 because ADR-0025 (governance) and ADR-0026 (T-012 reservation) consumed slots. Drives [T-016](../../analysis/tasks/phase-b/T-016-mmu-activation.md) (Draft 2026-05-08; moves to In Progress with this Accept). First ADR to apply [`write-adr` skill §Simulation](../../../.agents/skills/write-adr/SKILL.md) discipline forward (rather than retro-extracted as for ADR-0026 / ADR-0032). Accept landed as a separate commit per `write-adr` §10. 
Companion architecture doc: [`docs/architecture/memory-management.md`](../../architecture/memory-management.md). | | ADR-0028 | Address-space data structure (B3 — kernel-object + capability-gated `Mmu::map` wrappers + activation-on-context-switch) | B3 (**Accepted 2026-05-11**) | was ADR-0026 in the pre-2026-04-27 plan. Drives [T-018 (Draft 2026-05-11; moves to In Progress with the same-day Accept)](../../analysis/tasks/phase-b/T-018-address-space-kernel-object.md). Chosen shape: **Option A — Generic `AddressSpace` wrapping `M::AddressSpace` inline; per-type `AddressSpaceArena`**. Reuses [ADR-0016](../../decisions/0016-kernel-object-storage.md)'s per-type fixed-size-block arena pattern; propagates the existing `M: Mmu` generic axis from [ADR-0019](../../decisions/0019-scheduler-shape.md) / [ADR-0020](../../decisions/0020-cpu-trait-v2-context-switch.md); zero new `unsafe` audit-log entries (the activation borrow rides UNSAFE-2026-0014's existing umbrella); zero HAL trait surface change (post-T-016 [`Mmu`](../../../hal/src/mmu/mod.rs) trait stays stable). Includes the §Simulation table walking bootstrap-AS wrap / create / map / activation-on-context-switch state transitions per [`write-adr` skill §Simulation](../../../.agents/skills/write-adr/SKILL.md). | | ADR-0029 | Initial userspace image format | B4 | was ADR-0027 | -| ADR-0030 | Syscall ABI (includes `IpcError` taxonomy per K2-5) | B5 | was ADR-0028; scope still enlarged to cover error taxonomy | -| ADR-0031 | Initial syscall set | B5 | was ADR-0029 | +| ADR-0030 | Syscall ABI (includes `IpcError` taxonomy per K2-5) | B5 (**Accepted 2026-05-29**) | was ADR-0028. Settles the register convention (`x8`=number, `x0`–`x5` args, `SVC #0`, `x0`=status) + the dedicated-status-register encoding + `SyscallError` composition + the K2-5 `IpcError` split; drives [T-020](../../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md) + [T-021](../../analysis/tasks/phase-b/T-021-syscall-dispatch.md) (merged PR #34, `f98e1af`). 
| +| ADR-0031 | Initial syscall set | B5 (**Accepted 2026-05-29**) | was ADR-0029. Fixes the five-syscall v1 set (`send` / `recv` / `task_yield` / `task_exit` / `console_write`; `0` reserved-invalid); numbers `1`–`5` are a fixed ABI decision regression-verified by T-021's host tests, not chosen by the dispatcher. | | ADR-0032 | Endpoint state rollback on `ipc_recv_and_yield` Deadlock + `ipc_cancel_recv` primitive | B2 prep (**Accepted 2026-05-07**) | drove [T-015 (Done 2026-05-07)](../../analysis/tasks/phase-b/T-015-endpoint-rollback-cancel-recv.md) via PR #17. Surfaced as Track A non-blocker in the [2026-05-06 comprehensive review](../../analysis/reviews/code-reviews/2026-05-06-full-tree-comprehensive.md) and a forward-flagged item in the [2026-05-07 B1 closure security review](../../analysis/reviews/security-reviews/2026-05-07-B1-closure.md). Closed before B-phase task lands the first userspace-driven endpoint destroy. ADR-0017 §Revision notes rider records the additive recovery primitive (user-observable surface unchanged). | -| ADR-0033 | Kernel high-half migration | B5+ (placeholder; named-but-unallocated) | named in [ADR-0027](../../decisions/0027-kernel-virtual-memory-layout.md) §Decision outcome (Option D) as the future home of the `TTBR0_EL1`-swap discipline that arrives with userspace. No file today; opens with the first B5 task whose userspace requires per-task address-space switching. Mirrors the slot-naming pattern of ADR-0028 / 0029 / 0030 / 0031. | -| ADR-0034 | Kernel-image section permissions (.text RX / .rodata R / .bss/.data RW) | B-late (placeholder; named-but-unallocated) | named in [ADR-0027 §Decision outcome (a)](../../decisions/0027-kernel-virtual-memory-layout.md) as the future home of finer-grained kernel-image permissions. v1 maps the entire 128 MiB RAM range as kernel R/W/X via 2 MiB blocks; T-016 §Out of scope and [`memory-management.md` §"v1 layout"](../../architecture/memory-management.md) defer the re-map. 
Opens with the first B-phase task whose threat model includes a kernel R/W of `.text` as a meaningful surface — likely paired with the B5+ first userspace destroy that introduces an attacker-controlled execution context. | +| ADR-0033 | Kernel high-half migration (kernel reachable from every task AS) | **B6 (placeholder; opens with B6)** | named in [ADR-0027](../../decisions/0027-kernel-virtual-memory-layout.md) §Decision outcome (Option D) as the future home of the `TTBR0_EL1`-swap discipline that arrives with userspace. No file today; **B5 closed via the syscall boundary without surfacing the per-task swap** (B5's `SVC` proxy ran in the bootstrap AS), so the trigger is now **B6** — the first milestone whose userspace AS must keep the kernel reachable so an EL0 task's `SVC` vector fetch translates (see [§B6 opening sequence](#b6-opening-sequence--prerequisites)). The gating B6 prerequisite. Mirrors the slot-naming pattern of ADR-0028 / 0029 / 0030 / 0031. | +| ADR-0034 | Kernel-image section permissions (.text RX / .rodata R / .bss/.data RW) | B-late (placeholder; named-but-unallocated) | named in [ADR-0027 §Decision outcome (a)](../../decisions/0027-kernel-virtual-memory-layout.md) as the future home of finer-grained kernel-image permissions. v1 maps the entire 128 MiB RAM range as kernel R/W/X via 2 MiB blocks; T-016 §Out of scope and [`memory-management.md` §"v1 layout"](../../architecture/memory-management.md) defer the re-map. Opens with the first B-phase task whose threat model includes a kernel R/W of `.text` as a meaningful surface — likely **B6** — the first attacker-observable EL0 execution context (the v1 `hello` is code-only mapped `USER\|EXECUTE`, so ADR-0034 is hardening, not a B6 functional blocker; decide in B6 whether to harden now or defer). 
| | ADR-0035 | Physical Memory Manager (B3 prerequisite — bitmap allocator) | B3 (**Accepted 2026-05-09**) | new — drove the realisation that B3's "Address space abstraction" milestone has a foundational prerequisite (a real `FrameProvider` impl over physical RAM) which deserves its own ADR rather than being absorbed into ADR-0028 (address-space data structure). Drives [T-017 (Draft 2026-05-09; moves to In Progress with this Accept)](../../analysis/tasks/phase-b/T-017-physical-memory-manager.md). Bitmap allocator with hint pointer; 4 KiB metadata for QEMU virt's 32 K frames; reservation-list at init + cached for `free_frame` defensive validation per the §Simulation §Step 2 Critical row; forward-portable to high-half kernel without algorithm rewrite. Includes the §Simulation table walking init / alloc / free / exhaustion / recovery state transitions per [`write-adr` skill §Simulation](../../../.agents/skills/write-adr/SKILL.md). Accept landed as a separate commit per `write-adr` §10 after a careful re-read pass that surfaced and corrected three substantive drafting issues (broken anchor, safe-Rust-vs-`unsafe` zeroing contradiction, muddled "undefined-vs-error" wording in §Simulation row 2; the row-2 fix tightened the Pmm struct contract to add a cached reserved-range list for defensive `free_frame` validation, propagated to T-017). | | ADR-0036 | QEMU virt is GICv2 / no-IOMMU in v1 (corrects ADR-0004 / 0006 / 0012) | post-B1 (**Accepted 2026-05-22**) | new — surfaced by the [2026-05-22 full-tree master review](../../analysis/reviews/master-review/2026-05-22-152729/consolidated.md): the foundational ADRs carried GICv3 / SMMUv3 statements that do not match the GICv2, no-IOMMU reality of QEMU `virt` that B1's GIC work (above) actually assumed. 
**Corrects** (append-only redirect rider; does **not** supersede) [ADR-0004](../../decisions/0004-target-platforms.md) / [ADR-0006](../../decisions/0006-workspace-layout.md) / [ADR-0012](../../decisions/0012-boot-flow-qemu-virt.md). Ratifies the GICv2 fact stated in the B1 milestone. |
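
The ADR-0030 / ADR-0031 rows above describe a register convention (`x8` carries the syscall number, `x0`–`x5` carry arguments) and a five-syscall v1 set with `0` reserved-invalid. As a rough illustration only, a panic-free decode step might look like the sketch below. The type names `TrapFrame`, `SyscallNumber`, and `DecodeError` are hypothetical, and the specific number-to-name assignment is an assumption: the source fixes the range `1`–`5` but this excerpt does not state which number maps to which call.

```rust
/// Hypothetical v1 syscall identifiers. The set and the range 1..=5 come from
/// the ADR-0031 row; the exact number-to-name assignment here is an ASSUMPTION.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallNumber {
    Send = 1,
    Recv = 2,
    TaskYield = 3,
    TaskExit = 4,
    ConsoleWrite = 5,
}

/// Hypothetical decode error. The real `SyscallError` composition is settled
/// by ADR-0030 and is not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecodeError {
    /// Number 0 is reserved-invalid.
    ReservedInvalid,
    /// Any number outside the v1 set.
    Unknown(u64),
}

/// Hypothetical register snapshot handed from the `SVC` trampoline to the
/// dispatcher: `x8` is the syscall number, `args` holds `x0`..`x5`.
#[derive(Debug, Clone, Copy, Default)]
pub struct TrapFrame {
    pub x8: u64,
    pub args: [u64; 6],
}

/// Decode the syscall number without indexing or unwrapping, so no input
/// value can cause a panic.
pub fn decode(frame: &TrapFrame) -> Result<SyscallNumber, DecodeError> {
    match frame.x8 {
        0 => Err(DecodeError::ReservedInvalid),
        1 => Ok(SyscallNumber::Send),
        2 => Ok(SyscallNumber::Recv),
        3 => Ok(SyscallNumber::TaskYield),
        4 => Ok(SyscallNumber::TaskExit),
        5 => Ok(SyscallNumber::ConsoleWrite),
        n => Err(DecodeError::Unknown(n)),
    }
}
```

An exhaustive `match` with a catch-all arm, rather than a table lookup by index, is one simple way to keep the decode path total over every possible `x8` value; whether the real dispatcher takes this shape is not stated in the diff.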