From 476710b2408cf2cb259addb889ae66025ae8a9ca Mon Sep 17 00:00:00 2001
From: Cemil ILIK
Date: Fri, 29 May 2026 07:03:41 +0300
Subject: [PATCH 01/12] =?UTF-8?q?docs(adr):=20propose=20ADR-0030/0031=20?=
 =?UTF-8?q?=E2=80=94=20syscall=20ABI=20+=20initial=20syscall=20set?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ADR-0030 settles the EL0->EL1 syscall calling convention (x8 = number, x0-x5 args, x0 status + x1-x7 payload, SVC #0), the dedicated-status error encoding, and the K2-5 split of IpcError::InvalidCapability into StaleHandle / WrongObjectKind / MissingRight (with the per-subject-cap security argument and the arena-staleness ordering caveat).

ADR-0031 fixes the v1 syscall set (send, recv, console_write [capability-gated + release debug-gated], task_yield, task_exit), reserves number 0 as invalid, and pins each call's register layout; every object-naming syscall performs a capability check (P1/P4).

Opens T-020 (error taxonomy + Capability/CapObject Debug redaction — the pure-Rust foundation, In Progress) and T-021 (SVC trap trampoline + panic-free dispatcher + copy-from/to-user — Ready, the security-critical hardware-facing half) to ground both ADRs' dependency chains per ADR-0025 Rule 1.

Both ADRs land at Proposed; Accept follows in a separate commit.

Refs: ADR-0030, ADR-0031
Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/analysis/tasks/phase-b/README.md         |   2 +
 .../phase-b/T-020-syscall-error-taxonomy.md   |  77 ++++++
 .../tasks/phase-b/T-021-syscall-dispatch.md   |  60 +++++
 docs/decisions/0030-syscall-abi.md            | 221 ++++++++++++++++++
 docs/decisions/0031-initial-syscall-set.md    | 163 +++++++++++++
 docs/decisions/README.md                      |   4 +-
 6 files changed, 526 insertions(+), 1 deletion(-)
 create mode 100644 docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md
 create mode 100644 docs/analysis/tasks/phase-b/T-021-syscall-dispatch.md
 create mode 100644 docs/decisions/0030-syscall-abi.md
 create mode 100644 docs/decisions/0031-initial-syscall-set.md

diff --git a/docs/analysis/tasks/phase-b/README.md b/docs/analysis/tasks/phase-b/README.md
index c49630d..dd95288 100644
--- a/docs/analysis/tasks/phase-b/README.md
+++ b/docs/analysis/tasks/phase-b/README.md
@@ -19,5 +19,7 @@ Tasks belonging to [Phase B — Real userspace](../../../roadmap/phases/phase-b.
 | [T-017](T-017-physical-memory-manager.md) | Physical Memory Manager (PMM): bitmap allocator + reservation tracking + `FrameProvider` impl (implements ADR-0035) | B3 | Done (2026-05-10) |
 | [T-018](T-018-address-space-kernel-object.md) | `AddressSpace` kernel object + capability-gated `Mmu::map`/`unmap` wrappers + activation-on-context-switch (implements ADR-0028) | B3 | Done (2026-05-11; live on `main` 2026-05-14 via PR #28) |
 | [T-019](T-019-task-loader.md) | Task loader: embedded raw-flat userspace image → `LoadedImage` metadata (implements ADR-0029) | B4 | Done (2026-05-16 via PR #31 merge) |
+| [T-020](T-020-syscall-error-taxonomy.md) | Syscall error taxonomy: split `IpcError::InvalidCapability` + redact `Capability` `Debug` (implements ADR-0030 K2-5 / K3-9) | B5 | In Progress |
+| [T-021](T-021-syscall-dispatch.md) | EL0→EL1 `SVC` dispatch: trap trampoline + panic-free dispatcher + copy-from/to-user (implements ADR-0030 / ADR-0031) | B5 | Ready |
 
 Tasks are added here as they become active. See [`../../../roadmap/phases/phase-b.md`](../../../roadmap/phases/phase-b.md) for the full phase plan.
diff --git a/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md new file mode 100644 index 0000000..5f2429e --- /dev/null +++ b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md @@ -0,0 +1,77 @@ +# T-020 — Syscall error taxonomy: split `IpcError::InvalidCapability` + redact `Capability` `Debug` + +- **Phase:** B +- **Milestone:** B5 — Syscall boundary (this task is B5's pure-Rust foundation: the userspace-facing error taxonomy + capability-Debug redaction that the dispatcher in [T-021](T-021-syscall-dispatch.md) builds on; [ADR-0030](../../../decisions/0030-syscall-abi.md) settles the taxonomy) +- **Status:** In Progress +- **Created:** 2026-05-29 +- **Author:** @cemililik (+ Claude Opus 4.8 agent) +- **Dependencies:** [ADR-0030](../../../decisions/0030-syscall-abi.md) — must be `Accepted` before code lands (settles the `StaleHandle` / `MissingRight` / `WrongObjectKind` split and the §"Security of the taxonomy split" rationale). No prior task gates this; it is pure-Rust over the existing `kernel/src/ipc` + `kernel/src/cap` surfaces. +- **Informs:** Grounds [ADR-0030 §Dependency chain step 1](../../../decisions/0030-syscall-abi.md#dependency-chain) and discharges [ADR-0030 §Simulation row 3](../../../decisions/0030-syscall-abi.md#simulation). Unblocks [T-021](T-021-syscall-dispatch.md), whose dispatcher composes `SyscallError` from the now-granular `IpcError` and must never log an unredacted `Capability`. Closes the [2026-04-21 Phase-A code review](../../reviews/code-reviews/2026-04-21-tyrne-to-phase-a.md)'s `InvalidCapability`-collapse follow-up (K2-5) and the security review §6 redaction item (K3-9). +- **ADRs required:** [ADR-0030](../../../decisions/0030-syscall-abi.md). 
Adds an additive §Revision-notes rider to [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) (the IPC primitive set whose error taxonomy is refined — **not** superseded; the three-primitive surface is unchanged). No supersession. + +--- + +## User story + +As a future userspace caller (and the [T-021](T-021-syscall-dispatch.md) dispatcher that serves it), I want IPC capability failures to be reported as three distinct, handleable errors — a stale handle, a wrong-kind object, a missing right — instead of one collapsed `InvalidCapability`, and I want a `Capability`'s `Debug` output to never leak the kernel object it names, so the syscall error space is honest and a userspace-reachable log path cannot disclose capability internals. + +## Context + +[ADR-0030](../../../decisions/0030-syscall-abi.md) bundles the **K2-5** error-taxonomy decision with the syscall ABI so the in-kernel and userspace error spaces agree from the start. Today [`IpcError::InvalidCapability`](../../../../kernel/src/ipc/mod.rs) collapses three failure modes the [error-handling standard §"design checklist"](../../../standards/error-handling.md) says a caller would handle differently. This task performs the split — pure-Rust, host-testable, with no dependency on the (yet-unwritten) trap trampoline — so the error space is exercised by the existing IPC test suite well ahead of the first syscall. + +The same milestone's security item (**K3-9**, B5 sub-item 6, [security review §6](../../reviews/security-reviews/2026-04-21-tyrne-to-phase-a.md)) requires `Capability`'s derived `Debug` to be redacted before any userspace-reachable log path (`console_write`, T-021) can format one. The redaction is equally pure-Rust and is bundled here because it touches the same `kernel/src/cap` surface and shares the "make the userspace-observable error/diagnostic surface safe before the dispatcher exists" framing. 
+ +This task deliberately **excludes** the trap trampoline, the dispatcher, `SyscallError`, and copy-from/to-user — those are [T-021](T-021-syscall-dispatch.md). Splitting the milestone keeps the security-critical hardware-facing boundary in its own task with its own review, per CLAUDE.md §6 ("do not dump entire subsystems in a single pass"). + +## Acceptance criteria + +- [ ] `IpcError::InvalidCapability` is removed and replaced by `IpcError::StaleHandle`, `IpcError::MissingRight`, `IpcError::WrongObjectKind`, each with a doc-comment describing its distinct meaning per [ADR-0030](../../../decisions/0030-syscall-abi.md). `IpcError` stays `#[non_exhaustive]`. +- [ ] Every production site is mapped to the correct variant: `validate_ep_cap` / `validate_notif_cap` (`ipc/mod.rs`) and `resolve_ep_cap` (`sched/mod.rs`) resolve in the order `StaleHandle → WrongObjectKind → MissingRight`; arena `get`/`get_mut` staleness failures map to `StaleHandle`. +- [ ] Every existing test asserting `InvalidCapability` is updated to its correct post-split variant (rights failures → `MissingRight`; stale-handle/destroyed-object → `StaleHandle`), and the `sched` bridge test is updated. +- [ ] New host tests pin each new variant on a path that does **not** already prove it — at minimum a `WrongObjectKind` test for an endpoint operation and for `ipc_notify` (a wrong-kind cap), and a `StaleHandle` test for `ipc_send`/`ipc_recv` against a destroyed endpoint. +- [ ] `Capability`'s `Debug` impl is custom (not derived) and prints `Capability { rights: , object: }` — `rights` visible, the named object redacted — per [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split) and the K3-9 redaction requirement. A host test pins that the output contains the rights and the literal `` and does **not** contain the object's handle. 
+- [ ] All gates green: `cargo fmt --all -- --check`, `cargo host-test`, `cargo host-clippy`, `cargo kernel-clippy`, `cargo kernel-build`, and `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt`. +- [ ] Docs updated: [`docs/architecture/ipc.md`](../../../architecture/ipc.md) §"`IpcError` taxonomy", [`docs/architecture/security-model.md`](../../../architecture/security-model.md) redaction rule broadened to capabilities, [`docs/glossary.md`](../../../glossary.md) syscall terms, the ADR index, and the [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) §Revision-notes rider. + +## Out of scope + +- The EL0→EL1 `SVC` trap trampoline, the panic-free dispatcher, `SyscallError`, and copy-from/to-user — all [T-021](T-021-syscall-dispatch.md). +- Splitting `IpcError::InvalidTransferCap` (note C3-008) — deferred to a future ADR when a userspace transfer consumer needs the `TransferCapHasChildren` distinction. +- Redacting `CapObject` / `CapHandle` / `SlotId` `Debug` impls themselves — the redaction is at the `Capability` boundary, where the rights+object pairing is the sensitive unit; the handle types remain `Debug` for kernel-internal diagnostics that never cross to userspace. +- Any userspace crate, EL0 context, or real syscall invocation — Phase B6. + +## Approach + +### Error split (`kernel/src/ipc/mod.rs`) + +Replace the single `InvalidCapability` variant with the three new ones (doc-commented). Rewrite `validate_ep_cap` and `validate_notif_cap` to the `lookup→StaleHandle`, `kind→WrongObjectKind`, `rights→MissingRight` order (resolve, then type-check, then authority-check — matching `CapError`'s `InvalidHandle`/`WrongKind`/`InsufficientRights` shape). Map the four arena `get`/`get_mut` staleness sites (`ipc_send`/`ipc_recv`/`ipc_notify`/`ipc_cancel_recv`) to `StaleHandle`. Update each operation's `# Errors` doc section to name the three variants. 
+ +### Scheduler bridge (`kernel/src/sched/mod.rs`) + +`resolve_ep_cap` maps `lookup→StaleHandle` and `kind→WrongObjectKind` (it performs no rights check; rights are validated inside `ipc_send`/`ipc_recv`). `SchedError::Ipc(IpcError)` + its `From` impl propagate the split transparently through `?`; the only test to update is the bridge's send-error-preserves-state test (a correct-kind endpoint cap lacking `SEND` → `MissingRight`). + +### Capability `Debug` redaction (`kernel/src/cap/mod.rs`) + +Remove `#[derive(Debug)]` from `Capability`; add a hand-written `impl core::fmt::Debug` that emits `rights` and a `` placeholder for `object`. `EndpointState` (which embeds `Option`) derives only `Default`, so no cascade. Update the struct doc-comment to describe the redaction instead of "Debug is derived … exposes typed handles". + +### Simulation + +This task is a refactor of an existing state machine, not a new one; the relevant state-machine simulation is [ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation). This task **discharges row 3** of that table (the IPC-error-taxonomy mapping) via the per-variant host tests; the row-to-verification mapping is recorded in §Review history on completion. + +### Error handling + +Per [error-handling standard §2](../../../standards/error-handling.md): the enum stays `#[non_exhaustive]`, derives `Debug, Copy, Clone, Eq, PartialEq`, and each new variant is a distinct handleable case. No `From` impl changes (the split is within `IpcError`; `SchedError::Ipc`/`SyscallError::Ipc` wrap it unchanged). + +## Definition of done + +All acceptance criteria checked; gates green (incl. Miri — a [Phase-B exit prerequisite](../../../roadmap/phases/phase-b.md) with weight on `sched`/`ipc`); docs updated; ADR-0017 rider added; `current.md` reflects T-020 Done and B5 in progress. **Security-relevant** (capabilities + IPC): flagged for explicit review per CLAUDE.md. 
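Reviewer sketch (not part of the patch): one possible shape for the hand-written `Debug` impl described in the Approach section above. The field types `CapRights` and `ObjectHandle`, and the placeholder text `<redacted>`, are assumptions made for illustration; the patch text does not show the exact literal or the real field types.

```rust
use core::fmt;

/// Hypothetical stand-in for the kernel's rights type.
#[derive(Debug, Copy, Clone)]
pub struct CapRights(pub u8);

/// Hypothetical stand-in for the handle naming a kernel object.
#[derive(Copy, Clone)]
pub struct ObjectHandle(pub u32);

/// A capability pairs rights with the object they apply to.
pub struct Capability {
    pub rights: CapRights,
    pub object: ObjectHandle,
}

// Hand-written instead of derived: the rights stay visible for diagnostics,
// while the object identity is replaced by a fixed placeholder.
impl fmt::Debug for Capability {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Capability")
            .field("rights", &self.rights)
            // `<redacted>` is an assumed placeholder string.
            .field("object", &format_args!("<redacted>"))
            .finish()
    }
}
```

A host test in the spirit of the acceptance criterion would format a value with `{:?}` and check that the rights appear while the handle value does not.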
+ +## Design notes + +- The validation **order change** (kind-before-rights) is observable only for a capability that is both wrong-kind *and* missing-right; all existing rights-failure tests use correct-kind caps and remain `MissingRight`. Documented in [ADR-0030 §"The K2-5 `IpcError` split"](../../../decisions/0030-syscall-abi.md#the-k2-5-ipcerror-split-lands-now-in-t-020). +- The security argument for revealing the failure mode (per-subject, unforgeable handles ⇒ no forgery/enumeration aid) is in [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split). The redaction keeps the *object identity* hidden even as the *failure mode* becomes visible — the two are independent surfaces. +- Redaction approach is a custom `impl Debug`, not a `Redacted` wrapper, matching the codebase's direct-impl style and avoiding a cascading wrapper refactor; no code structurally depends on `Capability: Debug`. + +## Review history + +- _(filled on close)_ diff --git a/docs/analysis/tasks/phase-b/T-021-syscall-dispatch.md b/docs/analysis/tasks/phase-b/T-021-syscall-dispatch.md new file mode 100644 index 0000000..42d53d6 --- /dev/null +++ b/docs/analysis/tasks/phase-b/T-021-syscall-dispatch.md @@ -0,0 +1,60 @@ +# T-021 — EL0→EL1 `SVC` dispatch: trap trampoline, panic-free dispatcher, copy-from/to-user + +- **Phase:** B +- **Milestone:** B5 — Syscall boundary (this task is B5's trap/dispatch implementation — the EL0→EL1 `SVC` path that instantiates [ADR-0030](../../../decisions/0030-syscall-abi.md)'s convention and [ADR-0031](../../../decisions/0031-initial-syscall-set.md)'s syscall set) +- **Status:** Ready +- **Created:** 2026-05-29 +- **Author:** @cemililik (+ Claude Opus 4.8 agent) +- **Dependencies:** [ADR-0030](../../../decisions/0030-syscall-abi.md) + [ADR-0031](../../../decisions/0031-initial-syscall-set.md) (both `Accepted`); [T-020](T-020-syscall-error-taxonomy.md) (the granular `IpcError` + redacted `Capability` `Debug` 
the dispatcher composes/relies on); [T-012](T-012-exception-and-irq-infrastructure.md) (the `VBAR_EL1` vector table the EL0-sync vector slots into); [T-013](T-013-el-drop-to-el1.md) (EL drop to EL1). +- **Informs:** Closes [ADR-0030 §Dependency chain steps 2–5](../../../decisions/0030-syscall-abi.md#dependency-chain) and [ADR-0031 §Dependency chain steps 2–5](../../../decisions/0031-initial-syscall-set.md#dependency-chain), and discharges every [ADR-0031 §Simulation](../../../decisions/0031-initial-syscall-set.md#simulation) row + [ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation) rows 0/1/2/4/5. Unblocks Phase B6 (first userspace "hello"); the deferred `task_create_from_image` wrapper ([phase-b §B4 §Revision-notes](../../../roadmap/phases/phase-b.md#milestone-b4--task-loader)) composes on top. +- **ADRs required:** [ADR-0030](../../../decisions/0030-syscall-abi.md), [ADR-0031](../../../decisions/0031-initial-syscall-set.md). Will introduce at least one new `UNSAFE-YYYY-NNNN` audit entry for the trap-frame save/restore asm (per [unsafe-policy](../../../standards/unsafe-policy.md)). + +--- + +## User story + +As the kernel, I want a userspace `SVC #0` to land in the EL1 sync vector, save the caller's registers, decode the syscall number, validate the caller's capabilities, perform the operation through an existing kernel primitive, encode a typed result, and `ERET` back to EL0 — **never panicking on any untrusted input** — so that EL0 code can call the kernel safely and a bad number / missing capability / out-of-bounds pointer returns a typed `SyscallError` instead of taking down the kernel. + +## Context + +[T-020](T-020-syscall-error-taxonomy.md) landed the pure-Rust foundation (the granular `IpcError`, the redacted `Capability` `Debug`). 
This task lands the **hardware-facing** half of B5 and is deliberately a separate task: the EL0→EL1 trap is the single most security-sensitive boundary in the system, involves hand-written register-save asm and `unsafe`, and warrants its own focused review rather than being bundled with the error-taxonomy refactor (CLAUDE.md §6). + +A structural constraint shapes this task's *runtime* verification, and the vector path it can actually exercise. A **real** EL0 task cannot yet take the trap, because the loaded userspace address space holds only image + stack (no kernel mappings, so the EL1 vector fetch would translation-fault) and the `Task` struct carries no EL0 context register file — both gated on the [ADR-0033 high-half placeholder](../../../decisions/0027-kernel-virtual-memory-layout.md) and Phase B6. + +Crucially, the only `SVC` this milestone can drive comes from an **EL1 kernel-stub**, and an `SVC` issued at EL1 takes the **current-EL-with-SPx** sync vector at `VBAR_EL1 + 0x200` — **not** the lower-EL (EL0) sync vector at `+0x400`. So B5's acceptance criterion #7 proves the *shared* dispatcher / trap-frame / `ERET` mechanism via the `0x200` self-`SVC` path; it does **not** prove the `0x400` vector entry, the EL0↔EL1 privilege transition, or copy-user against a separate userspace `TTBR0_EL1` AS. Those are runtime-verified in **B6** with the first real EL0 task, per [ADR-0030 §Simulation row-to-verification mapping](../../../decisions/0030-syscall-abi.md#simulation). This task therefore installs the dispatcher at *both* the `0x200` and `0x400` sync slots (the handler is privilege-entry-agnostic) but only the `0x200` path runs at B5; host tests carry the rest of the dispatcher's correctness. + +## Acceptance criteria + +- [ ] The Rust dispatcher is installed at **both** sync exception-vector slots — current-EL-with-SPx (`VBAR_EL1 + 0x200`, the EL1 self-`SVC` path B5 exercises) and lower-EL-AArch64 (`VBAR_EL1 + 0x400`, the real EL0 path verified in B6). 
The vector entry saves `x0`–`x30` + `SP_EL0` to a trap frame and, on `ESR_EL1.EC == SVC64`, routes to the dispatcher; other sync causes route to the existing fault path (out of scope here). +- [ ] A panic-free dispatcher decodes `x8`: number `0` and any number outside the v1 set return `SyscallError::BadSyscallNumber`; numbers `1`–`5` dispatch to handlers for `send` / `recv` / `task_yield` / `task_exit` / `console_write` per [ADR-0031](../../../decisions/0031-initial-syscall-set.md). No path can `panic!`/`unwrap`/`expect` on register-supplied input. +- [ ] **Every object-naming syscall performs a capability check** ([P1 / P4](../../../standards/architectural-principles.md)): `send`/`recv` validate the endpoint cap; `console_write` validates a **debug-console capability** (its `x0` arg) — a new `CapObject` kind introduced here — before any output. `task_yield`/`task_exit` act only on the trusted current-task identity (no object-cap argument). +- [ ] `SyscallError` (per [ADR-0030](../../../decisions/0030-syscall-abi.md)) lands with `From` / `From` impls and a stable numeric status encoding host-tested against the fixed [ADR-0031](../../../decisions/0031-initial-syscall-set.md) numbers; `0` is reserved for `Ok`. +- [ ] `copy_from_user` / `copy_to_user` validate the byte range against the **active** address space and never dereference a raw user pointer outside a validated mapping; `console_write`'s buffer goes through `copy_from_user` **after** its capability check passes. +- [ ] `console_write` carries **two independent gates**: the capability check above (all builds) and the release debug-gate — absent (returns `BadSyscallNumber`) in non-debug builds (mechanism chosen here, recorded in §Design notes). +- [ ] Host ABI encode/decode tests cover: number decode (incl. 
`0`/out-of-range), the debug-console **capability-check-fails** path, `From`/`From` round-trips, `RecvOutcome`+`Message`+`Option` register packing, and copy-from/to-user range validation (in-range, out-of-range, zero-length, wrap). +- [ ] QEMU smoke: an EL1 kernel-stub issues an `SVC` (taking the current-EL `0x200` sync vector) and the trace shows the round-trip (and, for `console_write` with a granted debug-console cap, the emitted bytes). New `UNSAFE-YYYY-NNNN` audit entry for the trap-frame asm. **(The real EL0 `0x400` round-trip is B6's smoke, not this task's.)** +- [ ] All gates green incl. `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt`. + +## Out of scope + +- A real EL0 task taking the trap, the per-task EL0 context register file, kernel mappings in the userspace AS, and therefore the **runtime proof of the lower-EL `0x400` vector + EL0↔EL1 transition + userspace-AS copy-user** — Phase B6 + the [ADR-0033 high-half placeholder](../../../decisions/0027-kernel-virtual-memory-layout.md). (This task installs the `0x400` handler but only runtime-exercises the `0x200` current-EL path.) +- Granting the debug-console capability to a userspace task (this task defines the cap kind + the check; the grant-at-load wiring is B6) — Phase B6. +- The `tyrne-user` safe wrapper crate and the `userland/hello` binary — Phase B6. +- `notify` / capability-management / address-space syscalls — not in the [ADR-0031](../../../decisions/0031-initial-syscall-set.md) v1 set. +- Full fault containment / supervisor endpoint (a crashing task's parent observes the fault) — Phase E per [phase-b §B5 flag K3-4](../../../roadmap/phases/phase-b.md#flags-to-resolve-during-b5). 
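Reviewer sketch (not part of the patch): the four copy-from/to-user range cases listed in the acceptance criteria above (in-range, out-of-range, zero-length, wrap) can be captured by a pure bounds check like the one below. `UserWindow` and `validate_user_range` are hypothetical names, and modelling the active address space as a single half-open window is a simplifying assumption; the task text says the real check walks the active translation. Accepting a zero-length range without inspecting the pointer is likewise an assumed policy, not something the patch states.

```rust
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum SyscallError {
    FaultAddress,
}

/// Hypothetical model of the user-accessible mapping: the half-open
/// address window `[base, limit)`.
#[derive(Copy, Clone)]
pub struct UserWindow {
    pub base: u64,
    pub limit: u64,
}

/// Returns `Ok(())` only if every byte of `[ptr, ptr + len)` lies inside `w`.
pub fn validate_user_range(w: UserWindow, ptr: u64, len: u64) -> Result<(), SyscallError> {
    // Assumed policy: a zero-length copy touches no memory, so it is accepted.
    if len == 0 {
        return Ok(());
    }
    // Wrap case: `ptr + len` must not overflow the address space.
    let end = ptr.checked_add(len).ok_or(SyscallError::FaultAddress)?;
    // Out-of-range case: the range must sit entirely inside the window.
    if ptr < w.base || end > w.limit {
        return Err(SyscallError::FaultAddress);
    }
    Ok(())
}
```

Using `checked_add` keeps the check panic-free on register-supplied input, which matches the dispatcher's stated no-`panic!` requirement.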
+ +## Approach + +_(Settled at the ADR level; detailed approach filled when the task moves to In Progress.)_ The vector entry mirrors [T-012](T-012-exception-and-irq-infrastructure.md)'s trampoline discipline (save GPRs to a frame, call Rust, restore, `ERET`); the dispatcher is a `match` over the decoded number into thin handlers over `ipc_send`/`ipc_recv`/`yield_now`/console/terminate; copy-from/to-user walks the active translation to bound-check before any access. The §Simulation tables in [ADR-0030](../../../decisions/0030-syscall-abi.md#simulation) and [ADR-0031](../../../decisions/0031-initial-syscall-set.md#simulation) are the row-by-row spec; this task discharges all rows except ADR-0030 row 3 (T-020's). + +## Definition of done + +All acceptance criteria checked; gates green (incl. Miri); audit-log entry added; `current.md` updated; **security-relevant** — flagged for explicit security review per CLAUDE.md. + +## Design notes + +- _(Filled when the task moves to In Progress — debug-gate mechanism choice, trap-frame layout, copy-user bound-check strategy.)_ + +## Review history + +- _(filled on close)_ diff --git a/docs/decisions/0030-syscall-abi.md b/docs/decisions/0030-syscall-abi.md new file mode 100644 index 0000000..a98059e --- /dev/null +++ b/docs/decisions/0030-syscall-abi.md @@ -0,0 +1,221 @@ +# 0030 — Syscall ABI and userspace error taxonomy + +- **Status:** Proposed +- **Date:** 2026-05-29 +- **Deciders:** @cemililik + +## Context + +[Phase B § B5 — Syscall boundary](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) opens the first synchronous entry path from EL0 (userspace) into EL1 (kernel). 
The EL-drop machinery ([ADR-0024](0024-el-drop-policy.md), T-013) and the exception-vector table ([T-012](../analysis/tasks/phase-b/T-012-exception-and-irq-infrastructure.md), [`exceptions.md`](../architecture/exceptions.md)) already exist; what is missing is the **contract** a userspace binary and the kernel agree on when control crosses that boundary: which register carries the syscall number, which registers carry arguments, how a result and an error are returned, and what the userspace-facing error space looks like. + +This ADR must be settled **before** any dispatcher or trap trampoline is written, because every later artefact rides on it: the EL0-side `SVC` stub in the future `tyrne-user` crate (B6), the EL1-side register save/restore frame, the dispatcher's argument decode, and the host-side ABI encoder/decoder tests. Choosing the convention after the trampoline lands means re-churning all of them. The same front-loading discipline that [ADR-0027](0027-kernel-virtual-memory-layout.md) (MMU activation) and [ADR-0029](0029-initial-userspace-image-format.md) (image format) applied to their boundaries applies here. + +A second, tightly-coupled question lands in this ADR by deliberate bundling (the **K2-5** roadmap item): the **userspace error taxonomy**. Today [`IpcError::InvalidCapability`](../../kernel/src/ipc/mod.rs) collapses three distinct failure modes — a stale/absent handle, a wrong-kind object, and a missing right — behind one variant. The [2026-04-21 Phase-A code review](../analysis/reviews/code-reviews/2026-04-21-tyrne-to-phase-a.md) flagged this as a future improvement; the [error-handling standard §"Error-type design checklist"](../standards/error-handling.md) says each variant must "represent a distinct case a caller could handle differently." 
The moment a syscall returns these errors to userspace is the moment the distinction becomes load-bearing — a language binding wants to map a stale handle (use-after-free; do not retry) differently from a missing right (permission denied; acquire the right) differently from a wrong-kind handle (a type error in the caller). Designing the syscall error space and splitting the in-kernel `IpcError` **in the same ADR** keeps the two spaces in agreement "from the start", which is exactly what the [phase-b §B5 sub-breakdown](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) requires. + +The stakes of getting the ABI wrong are bounded but cross-cutting: the register convention is the widest interface in the system (every syscall, every userspace binding) and is hard to change once binaries depend on it. The stakes of getting the taxonomy wrong are lower (`IpcError` is `#[non_exhaustive]`, so it can grow), but the security review's redaction concern (the companion `Capability` `Debug` redaction — B5 sub-item 6 / K3-9 — landed in [T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md) and discussed under §"Security of the taxonomy split" below) means the error space is part of the userspace-observable surface and deserves a deliberate decision rather than incremental drift. + +## Decision drivers + +- **AAPCS64 alignment.** Userspace stubs and the kernel dispatcher are both compiled by the same Rust/LLVM aarch64 backend. A convention that reuses the procedure-call register roles (`x0`–`x7` argument/result, `x8` indirect-result/syscall slot by Linux precedent, `x19`–`x29` callee-saved) minimises impedance: the EL0 stub can be a thin `asm!` wrapper and the EL1 side can hand the saved frame to a normal Rust function. Fighting AAPCS64 costs hand-written shuffling on both sides. 
+- **Panic-free return of every error.** The [dispatcher must be panic-free on every untrusted input](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) (B0's hardening pattern). The error-return encoding must therefore be able to represent *every* failure as a value in a register — never as a trap, never as a sentinel that aliases a valid result. This rules out "return `-1` and read a thread-local errno later" style schemes that need extra state. +- **Agreement between in-kernel and userspace error spaces.** The kernel already has granular per-module error enums ([`CapError`](../../kernel/src/cap/mod.rs) with `InvalidHandle` / `InsufficientRights` / `WrongKind`; [`IpcError`](../../kernel/src/ipc/mod.rs); `AddressSpaceError`; `LoadError`; `SchedError`). The syscall error space should *compose* from these via `From` impls per the [error-handling standard §3 / §7](../standards/error-handling.md), not re-invent a parallel flat space that drifts out of sync. The `IpcError` collapse is the one place where the in-kernel space is *less* granular than userspace needs, so it is split here. +- **Result/value disambiguation.** A capability kernel routinely returns a fresh `CapHandle` (an opaque `u32`-ish index) or a byte count from a syscall. A Linux-style "negative = -errno, non-negative = value" scheme forces every such return through a signedness reinterpretation and steals the high bit from the value space. A capability handle or a length that happens to look like `-EFAULT` is a latent confusion. A *dedicated status register* removes that ambiguity at the cost of one register. +- **Argument width.** The widest v1 syscall (`send`) needs an endpoint handle + a message label + three payload words + an optional transfer handle = up to six argument words. The convention must carry at least six arguments without spilling to the stack, because copy-from-user of a stack-spilled argument block is exactly the unvalidated-pointer hazard B5 sub-item 5 exists to prevent. 
+- **Forward-portability.** The convention is chosen for aarch64 but should not bake in anything that a second architecture (a future RISC-V or x86-64 port) could not mirror with its own register file. "Syscall number in a general register, args in argument registers, status + payload in result registers, a synchronous trap instruction" is a shape every architecture can express; "syscall number in the `SVC` 16-bit immediate" is aarch64-specific. +- **No information leak that aids forgery.** Splitting `InvalidCapability` reveals *which* validation step failed. For a capability system this is safe (see §Decision outcome → "Security of the taxonomy split"), but the decision must say *why* explicitly rather than assume it. + +## Considered options + +### Syscall-number location + +1. **Number in `x8`, args in `x0`–`x5`, `SVC #0`** (Linux-aarch64 shape). A general register carries the syscall number; the `SVC` immediate is always `0`. +2. **Number in the `SVC` immediate (`SVC #n`), args in `x0`–`x7`.** The trap instruction itself encodes the syscall. +3. **Hybrid: small "syscall class" in the `SVC` immediate, sub-number in `x8`.** Two-level dispatch. + +### Result / error encoding + +A. **Dedicated status register: `x0` = status word (0 = `Ok`, non-zero = stable error code), `x1`+ = payload.** Result and error never alias. +B. **Signed `x0`: negative = `-errno`, non-negative = value** (Linux). One register carries both. +C. **Condition-flag based: `PSTATE.C` set on error, `x0` = value-or-code.** The carry bit discriminates. + +## Decision outcome + +Chosen options: **Syscall-number Option 1** (`x8` = number, `x0`–`x5` = args, `SVC #0`) and **error-encoding Option A** (dedicated status register `x0`, payload in `x1`–`x7`), plus the **K2-5 `IpcError` split** and the **`SyscallError` composition type**. 
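Reviewer sketch (not part of the patch): the dedicated-status choice (Option A) means a payload value can never be mistaken for an error. A minimal illustration follows, assuming a hypothetical `TrapFrame` holding `x0`..`x30` and hypothetical numeric codes; the ADR explicitly leaves the real non-zero encoding to the T-021 host ABI tests.

```rust
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum SyscallError {
    BadSyscallNumber,
    BadArgument,
    FaultAddress,
}

impl SyscallError {
    /// Hypothetical status codes. Only "non-zero and stable" is fixed by the
    /// ADR; these particular values are assumptions.
    pub const fn code(self) -> u64 {
        match self {
            SyscallError::BadSyscallNumber => 1,
            SyscallError::BadArgument => 2,
            SyscallError::FaultAddress => 3,
        }
    }
}

/// Hypothetical saved register file: `x[0]` is `x0`, up to `x[30]` for `x30`.
pub struct TrapFrame {
    pub x: [u64; 31],
}

/// Writes a syscall result into the return registers:
/// `x0` = status (0 = Ok), `x1`..`x7` = payload on success.
pub fn write_result(frame: &mut TrapFrame, result: Result<[u64; 7], SyscallError>) {
    match result {
        Ok(payload) => {
            frame.x[0] = 0;
            frame.x[1..8].copy_from_slice(&payload);
        }
        // On error the payload registers are left as they were; the
        // convention treats them as undefined.
        Err(e) => frame.x[0] = e.code(),
    }
}
```

With this shape, a payload word of `u64::MAX` (which a signed-`x0` scheme would read as a negative errno) comes back with status `0` and is simply a value.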
+ +### Register calling convention (v1) + +| Register | On entry (EL0 → EL1) | On return (EL1 → EL0) | +|----------|----------------------|------------------------| +| `x8` | Syscall number (see [ADR-0031](0031-initial-syscall-set.md)) | clobbered | +| `x0`–`x5` | Argument words 0–5 (syscall-specific; see ADR-0031) | `x0` = **status** (`0` = `Ok`; non-zero = `SyscallError` code), `x1`–`x7` = return payload (syscall-specific; v1 uses at most `x1`–`x6`, for `recv`) | +| `x6`, `x7` | reserved (must be ignored by the kernel in v1) | clobbered | +| `x9`–`x18` | caller-saved scratch (AAPCS64) | clobbered | +| `x19`–`x29`, `SP_EL0`, `x30` (LR) | preserved by the kernel across the trap | preserved | +| `PSTATE` | — | restored from `SPSR_EL1` by `ERET` | + +The trap instruction is **`SVC #0`**. The `SVC` immediate is not used to carry information in v1 — keeping it `0` leaves the immediate free for a future fast-path class split (Option 3) without re-encoding existing stubs. The kernel reads the syscall number from `x8`, the arguments from `x0`–`x5`, validates the caller's capabilities, performs the operation, writes `x0` (status) and `x1`–`x7` (payload), and `ERET`s. No syscall reads or writes userspace memory implicitly: any pointer passed in an argument register is validated by copy-from/to-user against the **active** address space (B5 sub-item 5, T-021) before the kernel touches the bytes. + +The concrete per-syscall argument and return-register layout, and the concrete syscall numbers, are settled by [ADR-0031](0031-initial-syscall-set.md); this ADR fixes only the *convention* those layouts instantiate. + +### Error-return encoding and the `SyscallError` space + +`x0` is the **status word**: `0` means `Ok` (read the payload from `x1`–`x7`, syscall-specific); any non-zero value is a stable `SyscallError` discriminant and the payload registers are undefined. 
The kernel-side error type is: + +```rust +#[non_exhaustive] +#[derive(Copy, Clone, Debug, Eq, PartialEq)] +pub enum SyscallError { + BadSyscallNumber, // x8 named no syscall in the v1 set + BadArgument, // an argument was structurally invalid (e.g. label width) + FaultAddress, // a user pointer fell outside the active address space + Cap(CapError), // capability-table failure (composed via From) + Ipc(IpcError), // IPC failure (composed via From) + // address-space / loader variants land with their first syscall consumer +} +``` + +`SyscallError` is built by `From<CapError>` / `From<IpcError>` impls per the [error-handling standard §7 "preserve root cause"](../standards/error-handling.md) — the dispatcher does not collapse distinct IPC failures into a generic "internal error". The flat numeric encoding (which non-zero integer each variant maps to) is a stable contract pinned by host ABI tests **when the dispatcher lands (T-021)**; this ADR fixes that the encoding *exists*, is stable, is composed (not re-invented), and that `0` is reserved for `Ok`. `SyscallError` is **not** introduced as dead code ahead of its first producer: the type lands in [T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md) alongside the dispatcher that constructs it, consistent with the codebase's no-speculative-surface discipline ([`CapRights::from_raw`](../../kernel/src/cap/rights.rs) §"Forward-API note C1-007" is the analogous "land it with its first ABI consumer" precedent). + +### The K2-5 `IpcError` split (lands now, in T-020) + +The one in-kernel error that is *less* granular than the userspace error space needs is split immediately, because it is pure-Rust and host-testable without any trampoline: + +`IpcError::InvalidCapability` → **`IpcError::StaleHandle`** + **`IpcError::MissingRight`** + **`IpcError::WrongObjectKind`**. 
+ +The mapping at each validation site: + +- **`StaleHandle`** — the capability handle did not resolve in the table (`lookup` failed), or the named kernel object was destroyed (arena `get`/`get_mut` returned `None` after a generation bump). "The reference is dead; re-acquire, do not retry." +- **`WrongObjectKind`** — the capability resolved but names the wrong kind of object for the operation (e.g. a `Notification` cap handed to `ipc_send`). "Programming error in the caller; you passed the wrong handle." +- **`MissingRight`** — the capability resolved and is the right kind, but does not carry the right the operation requires (`SEND` / `RECV` / `NOTIFY`). "You hold the object but lack authority; obtain the right." + +The validation **order** is `StaleHandle → WrongObjectKind → MissingRight` (resolve → type-check → authority-check). This is the natural diagnostic precedence — a wrong-kind handle is a more fundamental error than a missing right — and it matches [`CapError`](../../kernel/src/cap/mod.rs)'s existing `InvalidHandle` / `WrongKind` / `InsufficientRights` shape so the two spaces read the same way. Re-ordering is observable only for a capability that is *both* wrong-kind and missing-right; all existing rights-failure tests use correct-kind capabilities and therefore continue to return `MissingRight` unchanged. + +There are **two distinct `StaleHandle` sources with different precedence**. The first — a *cap-table* `lookup` miss — is the resolve step and ranks first, ahead of kind and rights. The second — an *arena* staleness miss (the cap resolves and is the right kind, but the named kernel object was destroyed and `arena.get` returns `None`) — is structurally a *later* check: it needs an already-resolved, kind-checked, **rights-authorized** handle to index the arena, so in the operation bodies (`ipc_send` / `ipc_recv` / `ipc_notify` / `ipc_cancel_recv`) it runs *after* the rights check. 
Consequently a capability that is the right kind, lacks the right, **and** names a destroyed object reports `MissingRight`, not `StaleHandle`. This is intentional and harmless: both facts concern the caller's own handle, and "you don't hold the right" is the more actionable answer; the strict `StaleHandle`-first total order applies to the *resolve* step, not to the post-authorization arena-liveness re-check. + +`IpcError::InvalidTransferCap` is **not** split in this ADR. Its collapse (`InvalidHandle` vs `HasChildren`, documented as note C3-008 in [`ipc/mod.rs`](../../kernel/src/ipc/mod.rs)) is a separate, transfer-side distinction; `IpcError` is `#[non_exhaustive]`, so a `TransferCapHasChildren` variant can be added by a later ADR without a breaking change when a userspace transfer consumer needs it. Splitting it now would be speculative. + +### Security of the taxonomy split + +Collapsing the three failure modes was originally defended as attacker-resistance — the caller cannot learn *which* check failed. Splitting reverses that, so the decision must justify why it is safe: + +A capability table is **per-subject and unforgeable** ([ADR-0014](0014-capability-representation.md)). A task only ever sees handles into its **own** table; generation tags prevent a stale handle from aliasing a live one. The three new variants therefore reveal only facts about handles the caller already possesses and controls: that *its own* handle is stale, names the wrong kind, or lacks a right. None of this helps forge a capability, enumerate another subject's caps, or mount a confused-deputy attack — the information is about the caller's own authority, not the kernel's secrets. The diagnostic value (a genuine [error-handling §checklist](../standards/error-handling.md) "handleable distinction") therefore dominates the negligible residual leak. This is the explicit trade ADR-0030 accepts. 
(The *object identity* a capability names — the slot index / generation — remains redacted from `Capability`'s `Debug` impl per B5 sub-item 6 / K3-9, landed in T-020; the taxonomy split does not expose it.) + +### Simulation + +A worst-case EL0 `SVC` handshake through the chosen convention (`(state-pre, action, state-post, observable)`): + +| Step | State pre | Action | State post | Observable effect | +|------|-----------|--------|------------|-------------------| +| 0 | EL0 task running; `x8`=NR, `x0`–`x5`=args | `SVC #0` | `ELR_EL1`←PC+4, `SPSR_EL1`←PSTATE, `PSTATE.EL`←1, `PC`←`VBAR_EL1`+0x400 (lower-EL AArch64 sync) | trap into EL1 sync vector; **no kernel state mutated yet** | +| 1 | EL1 sync vector; user GPRs live | save `x0`–`x30` + `SP_EL0` to the trap frame; read `ESR_EL1.EC`; `0b010101` (SVC64) → syscall dispatch, else → fault routing (B5 out-of-scope) | trap frame holds user regs; Rust dispatcher entered with `(nr=x8, args=x0..x5)` | dispatcher runs at EL1 on the kernel stack | +| 2 | dispatcher; `nr` not in the v1 set | `decode(nr)` → `None` | `x0`←`SyscallError::BadSyscallNumber` code; payload registers undefined | **panic-free**; no capability touched; falls through to ERET | +| 3 | dispatcher; `nr`=`send`; `ep_cap` stale / wrong-kind / missing `SEND` | `ipc_send` → `Err(IpcError::{StaleHandle\|WrongObjectKind\|MissingRight})` → `From<IpcError>` → `SyscallError::Ipc(_)` | `x0`←non-zero status; payload undefined; **endpoint + caller-table state unchanged** (pre-flight returns before mutation) | typed error, no panic, observable state byte-identical to pre-call | +| 4 | dispatcher; `nr`=`console_write(cons_cap, ptr, len)`; `cons_cap` valid but `[ptr,ptr+len)` not mapped in the active AS | capability check on `cons_cap` passes (debug-console cap, per [ADR-0031](0031-initial-syscall-set.md)); then `copy_from_user` validates the range against the active translation → out of range | `x0`←`SyscallError::FaultAddress`; **no raw deref of `ptr`** | panic-free; 
capability-gated; the kernel never dereferences an unvalidated user pointer | +| 5 | dispatcher; `nr`=`send`; ok | `ipc_send` → `Ok(outcome)`; restore frame | `x0`←`0` (`Ok`); `x1`←outcome encoding; `ERET` | return to EL0; `SPSR_EL1`/`ELR_EL1` restore PSTATE+PC; results in `x0`/`x1` | + +#### Simulation row-to-verification mapping + +Per the [`write-adr` skill §Procedure step 5 sub-bullet](../../.agents/skills/write-adr/SKILL.md), each row maps to a concrete verification artefact in an implementing task: + +- **Row 3 (IPC error taxonomy) → [T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** host tests: the per-variant `ipc_send`/`ipc_recv`/`ipc_notify`/`ipc_cancel_recv` tests pinning `StaleHandle` / `WrongObjectKind` / `MissingRight`, plus the `From<IpcError> for SyscallError` round-trip test (the latter lands with `SyscallError` in T-021). T-020 discharges this row **now**. +- **Rows 0, 1 (`SVC` trap + register frame save/restore) — split across two milestones.** The dispatcher and trap-frame asm are **shared** code that is privilege-entry-agnostic; the *only* difference between the two entry paths is which `VBAR_EL1` slot the handler is installed at and which mode `SPSR_EL1` restores. The table's row 0 describes the **real EL0 ABI** path — an `SVC` from EL0 takes the **lower-EL-AArch64** sync vector at `VBAR_EL1 + 0x400` and `ERET`s back to EL0. **A real EL0 task cannot take this trap until B6** (it needs kernel mappings in its address space so the vector fetch translates, plus an EL0-ready context register file — both gated on the [ADR-0033 high-half placeholder](0027-kernel-virtual-memory-layout.md)). So **rows 0/1 are runtime-verified in B6**, not B5. **T-021's B5-reachable proxy** is an **EL1 kernel-stub** that issues `SVC #0`, which — because it executes at the *current* EL — takes the **current-EL-with-SPx** sync vector at `VBAR_EL1 + 0x200`, **not** the lower-EL vector. 
T-021 therefore installs the same dispatcher at *both* the `0x200` and `0x400` sync slots, host-tests the dispatcher logic directly, and smoke-tests the shared trap-frame-save → decode → `ERET` mechanism via the `0x200` self-`SVC` path (new `UNSAFE-YYYY-NNNN` audit entry). What the `0x200` proxy does **not** prove — the `0x400` vector entry itself, the EL0↔EL1 privilege transition, and copy-user against a *separate* userspace `TTBR0_EL1` AS — is exactly what B6's real EL0 round-trip closes. +- **Row 2 (unknown number → `BadSyscallNumber`) → T-021**: host dispatcher decode test. +- **Row 4 (capability check + copy-from-user bounds) → T-021**: host copy-from/to-user range-validation tests + the debug-console capability check ([ADR-0031](0031-initial-syscall-set.md)). +- **Row 5 (success `ERET`) → T-021** for the `0x200` proxy round-trip (host ABI encode/decode + smoke); the real EL0 `0x400` round-trip → **B6**. + +T-020 discharges row 3 in this milestone. T-021 discharges rows 2/4 and the *mechanism* half of rows 0/1/5 (via the `0x200` current-EL proxy + host tests). The *real-EL0* half of rows 0/1/5 (the `0x400` lower-EL vector + privilege transition + userspace-AS copy-user) is runtime-verified in **B6**, when a real EL0 task first exists. No row is left without a named artefact, and no row's verification is over-claimed for the milestone it lands in (avoiding both the skill's "Simulation table without verification = documentation drift" anti-pattern and an over-stated B5 smoke). + +### Dependency chain + +For this decision to be **fully** in effect: + +```text +1. Split IpcError::InvalidCapability → StaleHandle / MissingRight / WrongObjectKind + across ipc/mod.rs + sched/mod.rs + tests; redact Capability's Debug (K3-9). — T-020 (opens with this ADR) +2. SyscallError composition type + From<CapError>/From<IpcError> impls. — T-021 (opens with this ADR) +3. EL0-sync exception-vector entry + user-register trap frame save/restore. — T-021 +4. 
Panic-free syscall dispatcher (x8 decode → handler → x0/x1.. encode). — T-021 +5. copy-from/to-user validated against the active address space. — T-021 +6. The concrete v1 syscall set + per-call register layout + numbers. — ADR-0031 (opens with this ADR) +7. Kernel mappings in the userspace AS + EL0-ready Task context register file + (so a real EL0 task can take the trap and the EL1 vector fetch translates). — ADR-0033 (high-half; + slot reserved, opens with + the first per-task TTBR0 swap) +8. tyrne-user safe syscall wrapper crate + first EL0 caller. — Phase B6 (deferred) +``` + +Steps 1–2 + 6 are grounded in tasks/ADRs opened in the same commit set as this ADR (T-020, T-021, ADR-0031), per [ADR-0025 §Rule 1](0025-adr-governance-amendments.md). Steps 3–5 are T-021's scope. Step 7 is the [ADR-0033 high-half placeholder](0027-kernel-virtual-memory-layout.md) (named-but-unallocated, the same forward-flag shape [ADR-0029](0029-initial-userspace-image-format.md) used for ADR-0034) — until it lands, the syscall path is exercised by an **EL1 kernel-stub caller** (B5 acceptance criterion #7), not a real EL0 task. Step 8 is the natural B6 work and is not opened today. + +## Consequences + +### Positive + +- **The widest interface in the system is settled once, before any code depends on it.** The EL0 stub, the EL1 frame, the dispatcher, the host ABI tests, and the `tyrne-user` crate all instantiate one written convention. No "decide the ABI by what the first trampoline happened to do" drift. +- **Result and error never alias.** The dedicated status register means a `CapHandle`, a byte count, or a `Message` word returned in `x1` can take any bit pattern — including ones that would look like `-EFAULT` under a signed-errno scheme — without ambiguity. This removes a whole class of latent confusion that Linux's convention carries. 
+- **The in-kernel and userspace error spaces agree from the start.** `SyscallError` composes from `CapError` / `IpcError` via `From`; the `IpcError` split removes the one place where the kernel was coarser than userspace needs. A binding can map every failure to a distinct, handleable cause. +- **The taxonomy split is immediately useful and immediately testable.** Splitting `InvalidCapability` is pure-Rust; T-020 lands it with host tests in this milestone, well ahead of the trampoline, so the error space is exercised by the existing IPC test suite before the first syscall exists. +- **AAPCS64 reuse keeps both sides thin.** The EL0 stub is a register-load + `SVC #0`; the EL1 side hands a saved frame to ordinary Rust. No bespoke marshalling. + +### Negative + +- **Six argument registers cap the v1 syscall arg width.** A syscall needing more than six word-sized arguments must pass a pointer to an argument block — which then needs copy-from-user validation. *Mitigation:* the v1 set ([ADR-0031](0031-initial-syscall-set.md)) is designed to fit in six registers (`send` is the widest at six). When a wider syscall surfaces (B6+), it uses the already-required copy-from-user path; no ABI change. +- **One register spent on status.** Option B (signed errno) would free `x0` to carry a value. *We accept this cost* because the disambiguation it buys (no value/-errno aliasing) is worth one register in a 31-register file, and a capability kernel returns opaque handles constantly. +- **The taxonomy split reveals which validation step failed.** A marginal information leak versus the prior collapse. *We accept this* on the §"Security of the taxonomy split" argument: the facts revealed are about the caller's own per-subject, unforgeable handles and aid no forgery or enumeration. The decision is explicit, not incidental. +- **`SyscallError`'s concrete numeric encoding is deferred to T-021.** This ADR fixes the convention but not the integers. 
*Mitigation:* the integers are a stable contract the moment T-021's host ABI tests pin them; deferring avoids committing numbers before the dispatcher's shape is concrete, and `0 = Ok` (the only number userspace branches on structurally) is fixed here. + +### Neutral + +- **`SVC #0` leaves the immediate free.** A future fast-path "syscall class" split (Option 3) can use the `SVC` immediate without re-encoding v1 stubs, because v1 stubs all use `#0`. +- **The convention is aarch64-shaped but architecture-portable in spirit.** "Number in a GPR, args in arg regs, status+payload in result regs, synchronous trap" is expressible on RISC-V (`ecall`, `a7`=nr, `a0`–`a5` args) and x86-64 (`syscall`, `rax`=nr) with the same structure; only the register names change. No second-architecture ADR is forced today. +- **`x6`/`x7` reserved, not used.** v1 ignores them; reserving rather than repurposing keeps room for a seventh/eighth argument or a future flags word without a convention break. + +## Pros and cons of the options + +### Syscall-number Option 1 — number in `x8`, `SVC #0` (chosen) + +- **Pro:** Matches Linux-aarch64, so the mental model and any borrowed tooling/disassembly intuition transfer; `x0`–`x7` stay free for arguments/results. +- **Pro:** The `SVC` immediate stays `0` and free for a future class split. +- **Pro:** Architecture-portable shape (a GPR carries the number). +- **Con:** Costs one register (`x8`) that a pure-immediate scheme would leave free for an argument. + +### Syscall-number Option 2 — number in the `SVC` immediate + +- **Pro:** Frees `x8`; the trap instruction self-documents the call in a disassembly. +- **Con:** The `SVC` immediate is 16 bits and **aarch64-specific** — a second-architecture port cannot mirror it (RISC-V `ecall` / x86-64 `syscall` carry no immediate), forcing a divergent convention later. +- **Con:** The number is baked into the instruction stream, so a syscall stub cannot compute its number at runtime (e.g. 
a generic `syscall(nr, ...)` shim) — every syscall needs its own hand-written `SVC #n`. + +### Syscall-number Option 3 — hybrid class + sub-number + +- **Pro:** Enables a fast-path class (e.g. a register-only `send`) without a table lookup. +- **Con:** Two-level dispatch is premature for a five-syscall v1 set; pure over-engineering today. *Kept available* by Option 1's free `SVC` immediate. + +### Error-encoding Option A — dedicated status register (chosen) + +- **Pro:** Result and error never alias; payload registers carry any bit pattern. +- **Pro:** Maps one-to-one onto Rust `Result` on both sides. +- **Con:** Spends one register on status. + +### Error-encoding Option B — signed `-errno` in `x0` + +- **Pro:** Frees a register; the universal Unix convention. +- **Con:** Steals the high bit / negative range from every value return; a handle or length that aliases `-errno` is a latent bug. Bad fit for a kernel that returns opaque handles constantly. + +### Error-encoding Option C — condition-flag (`PSTATE.C`) + +- **Pro:** Frees `x0` entirely for the value. +- **Con:** `PSTATE` is restored from `SPSR_EL1` by `ERET`; threading a result flag through the saved-PSTATE path is fragile and easy to clobber. Architecture-specific and error-prone. Rejected. + +## References + +- [Phase B §B5 — Syscall boundary](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) — milestone scope, including the K2-5 bundle and the panic-free-dispatcher requirement. +- [ADR-0024 — EL drop to EL1 policy](0024-el-drop-policy.md) — the EL0/EL1 privilege boundary this ABI crosses. +- [ADR-0031 — Initial syscall set](0031-initial-syscall-set.md) — the concrete v1 syscalls + per-call register layout that instantiate this convention. +- [ADR-0017 — IPC primitive set](0017-ipc-primitive-set.md) — the IPC operations whose error taxonomy is split here (see its §Revision notes rider). 
+- [ADR-0014 — Capability representation](0014-capability-representation.md) — per-subject unforgeable handles; the basis for the taxonomy-split security argument. +- [error-handling standard](../standards/error-handling.md) — §3/§7 `From`-composition, the design checklist's "handleable distinction" rule. +- [`docs/architecture/exceptions.md`](../architecture/exceptions.md) — the EL1 vector table the syscall vector slots into. +- [`docs/architecture/security-model.md`](../architecture/security-model.md) — the userspace→kernel trust boundary. +- [Linux aarch64 syscall ABI](https://www.kernel.org/doc/html/latest/arm64/) — `x8`=number, `x0`–`x5` args, `x0` return (Option 1 / Option B prior art). +- [ARM ARM §D1 "The AArch64 System Level Programmers' Model"](https://developer.arm.com/documentation/ddi0487/latest) — `SVC`, `ESR_EL1.EC`, `ELR_EL1` / `SPSR_EL1` / `ERET` semantics. +- [Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64)](https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst) — caller/callee-saved register roles. +- [seL4 manual §"System Calls"](https://sel4.systems/Info/Docs/seL4-manual-latest.pdf) — message-register / dedicated-status-word prior art for a capability kernel. diff --git a/docs/decisions/0031-initial-syscall-set.md b/docs/decisions/0031-initial-syscall-set.md new file mode 100644 index 0000000..c51f2fe --- /dev/null +++ b/docs/decisions/0031-initial-syscall-set.md @@ -0,0 +1,163 @@ +# 0031 — Initial syscall set (B-phase) + +- **Status:** Proposed +- **Date:** 2026-05-29 +- **Deciders:** @cemililik + +## Context + +[ADR-0030](0030-syscall-abi.md) settles *how* a syscall is made — the register calling convention (`x8` = number, `x0`–`x5` = arguments, `x0` = status, `x1`–`x7` = payload), the `SVC #0` trap, and the `SyscallError` space. This ADR settles *which* syscalls exist in the B-phase and the concrete per-call register layout each one instantiates. + +The set must be **small**. 
[Phase B § B5](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) names the floor and the ceiling in one sentence: "At minimum: `send`, `recv`, `console_write` (debug-gated), `task_yield`, `task_exit`. No more in v1." The reasoning is the same "smallest shape that works now" discipline ADR-0029 (raw-flat image) and ADR-0035 (bitmap PMM) applied: every syscall is a permanent piece of the userspace ABI surface and a panic-free dispatch path the kernel must keep correct forever. Adding a syscall is cheap to write and expensive to ever remove, so v1 ships exactly the calls B6's first "hello" userspace task needs to do useful work and exercise the boundary — and not one more. + +What B6's first userspace task must be able to do: print to the serial console (`console_write`), exit cleanly so the kernel reclaims it (`task_exit`), and — to prove the IPC path works end-to-end across the privilege boundary, not just kernel-internally — send and receive on an endpoint (`send` / `recv`). `task_yield` rounds out cooperative multitasking from EL0. Capability-management syscalls (`cap_copy` / `cap_derive` / `cap_revoke`), `notify`, and address-space `map`/`unmap` are deliberately **not** exposed in v1 — no v1 userspace consumer needs them, and the kernel-internal surfaces remain reachable only from EL1. + +The stakes: a syscall number, once a userspace binary depends on it, is part of a stable contract. Getting the *set* wrong (too large) is unused attack surface in the dispatcher; getting it too small blocks B6. Getting a per-call layout wrong means re-churning the `tyrne-user` wrapper. The decision is bounded because the numbers and layouts are pinned by host ABI tests when the dispatcher lands ([T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md)), so an error is caught mechanically rather than at runtime. 
+ +## Decision drivers + +- **B6 sufficiency.** The set must be exactly enough for B6's "hello from userspace" + clean exit + an IPC round-trip, and no more. See [phase-b §B6](../roadmap/phases/phase-b.md#milestone-b6--first-userspace-hello). +- **Panic-free dispatch surface.** Every syscall is a handler the dispatcher must keep panic-free on all untrusted input ([ADR-0030](0030-syscall-abi.md), B0 hardening). Fewer syscalls = smaller audited surface. +- **Register-budget fit.** Each call's arguments must fit in `x0`–`x5` (six words) and its results in `x0`–`x7` per [ADR-0030](0030-syscall-abi.md), without spilling to a stack-passed block that would need its own copy-from-user validation. The widest call drives the budget. +- **Reuse of existing kernel surfaces.** A syscall handler should be a thin validator + a call into an existing kernel primitive (`ipc_send` / `ipc_recv` / scheduler `yield_now` / the console HAL), not new subsystem logic. The syscall layer adds the EL0 boundary, not new capability semantics. +- **Defence-in-depth on the number space.** An uninitialised `x8` (zero) must not accidentally name a real syscall; a release build must not expose a debug-only console. +- **No capability authority widening.** The syscalls expose operations the caller's capabilities *already* authorise; the syscall layer is a gate, never a new grant. `console_write` is the one debug affordance and is gated accordingly. + +## Considered options + +1. **The phase-b floor set: `send`, `recv`, `console_write`, `task_yield`, `task_exit` (five).** Exactly what B6 needs. +2. **A larger "useful from day one" set** adding `notify`, `cap_copy` / `cap_derive` / `cap_revoke`, and address-space `map` / `unmap` — so userspace can manage its own capabilities and memory without a later ABI bump. +3. **An ultra-minimal set: `console_write` + `task_exit` (two).** The absolute floor to make B6's greeting + exit work, deferring even IPC from EL0 to a later milestone. 
+ +## Decision outcome + +Chosen option: **Option 1 — the five-syscall phase-b floor set.** + +It is the smallest set that lets B6's first userspace task both *do* something observable (`console_write`, `task_exit`) and *exercise the boundary that B5 exists to build* (`send` / `recv` cross the EL0→EL1 line, proving capability-gated IPC works from userspace, not just kernel-internally). `task_yield` makes cooperative multitasking reachable from EL0 with a near-zero-cost handler. Option 3 is too small — it would ship a syscall boundary that never carries an IPC message, leaving the most security-relevant path (capability-gated `send`/`recv` from untrusted EL0) unexercised until a later milestone, which defeats the point of building the boundary now. Option 2's extra calls have **no v1 consumer**; each would be unused dispatch surface to keep panic-free, and `IpcError`/`CapError` are `#[non_exhaustive]`, so the set can grow without breaking the ABI when a real consumer appears. + +### Syscall table (v1) + +Numbers instantiate [ADR-0030](0030-syscall-abi.md)'s convention: `x8` = number, arguments in `x0`–`x5`, `x0` = status (`0` = `Ok`), payload in `x1`–`x7`. **Number `0` is reserved-invalid** (an uninitialised `x8` must fault, not dispatch) and always returns `SyscallError::BadSyscallNumber`. The integers `1`–`5` below are a **fixed decision**, not tentative: as an Accepted ABI ADR, this table *is* the contract. T-021's host tests **regression-verify** these numbers and layouts; they do not get to choose them. 
+ +| `x8` | Name | Arguments (`x0`…) | Returns (`x0`=status, then payload) | Capability checked | Backing primitive | +|------|------|-------------------|--------------------------------------|--------------------|-------------------| +| `0` | *(reserved-invalid)* | — | always `BadSyscallNumber` | — | — | +| `1` | `send` | `x0`=ep cap handle, `x1`=`msg.label`, `x2..x4`=`msg.params[0..3]`, `x5`=transfer cap handle (or the reserved **null-handle sentinel** = "no transfer") | `x1`=`SendOutcome` (`0`=Delivered, `1`=Enqueued) | endpoint cap (`SEND`) | [`ipc_send`](../../kernel/src/ipc/mod.rs) | +| `2` | `recv` | `x0`=ep cap handle | `x1`=`RecvOutcome` (`0`=Received, `1`=Pending), `x2`=`msg.label`, `x3..x5`=`msg.params[0..3]`, `x6`=transferred cap handle (or null sentinel if none) | endpoint cap (`RECV`) | [`ipc_recv`](../../kernel/src/ipc/mod.rs) | +| `3` | `task_yield` | — (args ignored) | *(no payload)* | self (current task) | scheduler `yield_now` | +| `4` | `task_exit` | `x0`=exit code | **does not return** to the caller | self (current task) | scheduler task-termination (B6) | +| `5` | `console_write` | `x0`=debug-console cap handle, `x1`=user VA of byte buffer, `x2`=length | `x1`=bytes written | debug-console cap (write) | console HAL `write_bytes`, via copy-from-user | + +Notes that bind the table: + +- **`send` / `recv` carry the `Message` in registers**, not via a user-pointer buffer. A `Message` is four words (`label` + three `params`); register-passing fits the [ADR-0030](0030-syscall-abi.md) budget (`send` uses `x0`–`x5` for args; `recv` returns in `x1`–`x6`) and avoids a copy-from/to-user round-trip on the common small-message path. When messages grow past the register budget (post-v1), a pointer-buffer variant lands without disturbing these numbers. 
The **null-handle sentinel** that means "no transfer" / "no cap received" is a reserved `CapHandle` value no live handle can take; its exact bit pattern is T-021's encoder detail (it must round-trip with `Option<CapHandle>`). +- **Every syscall that names a *separate* object is capability-gated, per [P1 / P4](../standards/architectural-principles.md).** `send` / `recv` check the endpoint capability (`SEND` / `RECV`); `console_write` checks a **debug-console capability** (its first argument, `x0`). `task_yield` / `task_exit` act on the **caller's own task** — the kernel identifies the caller from its trusted current-task pointer (set at dispatch, not a forgeable argument). This is the caller's inherent authority over its own execution thread, not ambient authority over another object, so these two take no object-capability argument; the trust-boundary check P4 demands is "is there a valid current task?" (always true on the syscall path) plus the kernel never letting the caller name a *different* task. No syscall reaches a privileged effect on another object without a capability. +- **`console_write` is capability-gated *and* debug-gated** — two independent gates. (1) **Capability gate (authority):** the caller must hold a debug-console capability (arg `x0`); the dispatcher validates it (resolves, kind = debug-console, carries the write right) before any output, returning a typed `SyscallError` otherwise — this is the [P1 / P4](../standards/architectural-principles.md)-mandated authority check, present in *all* builds. The concrete `CapObject` kind for the debug console and its grant-at-load wiring are [T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md) (object + check) and B6 (grant to the first userspace task). (2) **Debug gate (defence-in-depth):** in a non-debug build the dispatcher additionally treats number `5` as unknown and returns `BadSyscallNumber`, so the debug console is *absent* from the production syscall surface even for a holder of the capability. 
The exact debug-gate mechanism (`cfg!(debug_assertions)` arm vs. a Cargo feature) is T-021's implementation choice; the two-gate *contract* is fixed here. **`console_write` is the only syscall that takes a user pointer**: its handler validates `[ptr, ptr+len)` against the **active** address space via copy-from-user before touching a byte (B5 sub-item 5); it never dereferences the raw pointer. +- **`task_exit` does not return.** Control does not come back to the caller, so the ABI defines no return value for it. Its real semantics — mark the EL0 task terminated, drop its context, dispatch the next ready task — depend on the per-task EL0 context register file that does not exist until B6 (gated on the [ADR-0033 high-half placeholder](0027-kernel-virtual-memory-layout.md)). T-021 implements the dispatch and a kernel-stub stand-in; the real EL0-task termination lands with B6's first userspace task. +- **`task_yield` always succeeds in v1** (status `Ok`); it is a thin EL0-reachable wrapper over the scheduler's cooperative `yield_now`, acting on the caller's own task. 
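The number space and the release debug-gate described in the table and notes above can be sketched as a single decode function. This is a host-testable illustration, not T-021's dispatcher: the `Syscall` enum and `decode` signature are hypothetical, and the `debug_console` flag is a stand-in for whichever build-time mechanism (`cfg!(debug_assertions)` or a Cargo feature) T-021 selects.

```rust
/// The five v1 syscalls, numbered as in the syscall table.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub enum Syscall {
    Send = 1,
    Recv = 2,
    TaskYield = 3,
    TaskExit = 4,
    ConsoleWrite = 5,
}

/// Decode the value found in `x8`.
///
/// `debug_console` is an assumed parameter modelling the debug gate;
/// when it is false, number 5 is treated exactly like an unknown number.
/// A `None` result is what the dispatcher would report as
/// `SyscallError::BadSyscallNumber`.
pub fn decode(nr: u64, debug_console: bool) -> Option<Syscall> {
    match nr {
        1 => Some(Syscall::Send),
        2 => Some(Syscall::Recv),
        3 => Some(Syscall::TaskYield),
        4 => Some(Syscall::TaskExit),
        5 if debug_console => Some(Syscall::ConsoleWrite),
        // 0 (reserved-invalid), a gated-off 5, and every other value.
        _ => None,
    }
}
```

Taking the gate as a parameter rather than reading `cfg!` inside the function is only a convenience for this sketch: it lets one host test cover both the debug and release behaviour of number `5`.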
+ +### Simulation + +Representative invocations walking the [ADR-0030](0030-syscall-abi.md) convention (`(state-pre, action, state-post, observable)`): + +| Step | State pre | Action | State post | Observable effect | +|------|-----------|--------|------------|-------------------| +| 0 | caller: `x8`=5 (`console_write`), `x0`=valid debug-console cap, `x1`=buf VA, `x2`=len; **debug build**; buf mapped in active AS | dispatch → **cap check on `x0` passes** → copy-from-user validates `[buf,buf+len)` → console `write_bytes` | bytes emitted on serial | `x0`←`0` (`Ok`), `x1`←len; **no raw user-ptr deref** | +| 1 | caller: `x8`=5, `x0`=stale / wrong-kind / no-write debug-console cap | dispatch → **cap check on `x0` fails** before any output | unchanged | `x0`←typed `SyscallError` (`Cap`/`Ipc`-family); **console untouched — authority gate, all builds** | +| 2 | caller: `x8`=5, `x0`=valid cap, `x1`=buf VA, `x2`=len; **buf not mapped** in active AS | cap check passes; copy-from-user range check fails | unchanged | `x0`←`FaultAddress`; kernel never read the buffer | +| 3 | caller: `x8`=2 (`recv`), `x0`=ep cap; a sender already delivered `{label, params, cap}` | `ipc_recv` → `Ok(Received{msg, cap})`; install cap into caller table | endpoint `Idle`; cap in caller table | `x0`←`0`, `x1`←`0`(Received), `x2`←label, `x3..x5`←params, `x6`←new cap handle | +| 4 | caller: `x8`=4 (`task_exit`), `x0`=code | mark caller (the current task) terminated; dispatch next ready task | caller gone; scheduler runs another task | **no return** to caller; kernel reports termination (B6) | + +The second, independent **release debug-gate** (number `5` → `BadSyscallNumber` in non-debug builds, even for a capability holder) is not a separate row — it short-circuits dispatch ahead of row 0's cap check and is covered in the binding note above. 
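Row 3 of the table above (`recv` delivering a message and an optional capability) can be sketched as a pure register-packing function. Everything concrete here is an assumption made for illustration: the `Message` field types, modelling a cap handle as a bare `u64`, and above all the `NULL_HANDLE` value, whose real bit pattern is left to T-021.

```rust
/// Hypothetical null-handle sentinel; the real bit pattern is T-021's choice.
pub const NULL_HANDLE: u64 = u64::MAX;

/// Assumed shape of a message: one label word plus three parameter words.
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
pub struct Message {
    pub label: u64,
    pub params: [u64; 3],
}

/// Pack a successful `Received` outcome into return registers x0..=x6.
pub fn encode_received(msg: Message, cap: Option<u64>) -> [u64; 7] {
    [
        0,                          // x0: status = Ok
        0,                          // x1: RecvOutcome::Received
        msg.label,                  // x2: msg.label
        msg.params[0],              // x3
        msg.params[1],              // x4
        msg.params[2],              // x5
        cap.unwrap_or(NULL_HANDLE), // x6: transferred cap, or the sentinel
    ]
}

/// Userspace-side inverse for x6: the sentinel maps back to `None`.
pub fn decode_cap(x6: u64) -> Option<u64> {
    if x6 == NULL_HANDLE { None } else { Some(x6) }
}
```

The round-trip only holds if no live handle can equal the sentinel, which is the property the table notes require of the reserved value.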
+
+#### Simulation row-to-verification mapping
+
+Per the [`write-adr` skill §Procedure step 5 sub-bullet](../../.agents/skills/write-adr/SKILL.md), all rows are discharged by **[T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md)** (the dispatcher task), because every row is a trampoline/dispatch behaviour:
+
+- Row 0 (`console_write` happy path, cap check passes) → T-021 host copy-from-user test + the QEMU kernel-stub-SVC smoke trace showing the emitted bytes.
+- Row 1 (debug-console **capability check fails**) → T-021 host dispatcher test asserting a stale/wrong-kind/no-write cap yields a typed `SyscallError` with no console output (the [P1 / P4](../standards/architectural-principles.md) authority gate).
+- Row 2 (`FaultAddress`) → T-021 host copy-from-user out-of-range test.
+- Row 3 (`recv` register unpack) → T-021 host ABI encode/decode round-trip test over `RecvOutcome` + `Message` + `Option`.
+- Row 4 (`task_exit` no-return) → T-021 dispatcher test (kernel-stub stand-in); real EL0 termination → B6.
+- The release **debug-gate** → T-021 host dispatcher test asserting number `5` → `BadSyscallNumber` under `not(debug_assertions)`.
+
+The IPC *error-taxonomy* rows these syscalls inherit (a `send` to a stale/wrong-kind/no-`SEND` cap) are discharged by **[T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** per [ADR-0030 §Simulation row 3](0030-syscall-abi.md#simulation). The runtime EL0-vs-EL1 verification split (B5 kernel-stub via the current-EL `0x200` vector vs. B6 real EL0 via `0x400`) is recorded in [ADR-0030 §Simulation row-to-verification mapping](0030-syscall-abi.md#simulation).
+
+### Dependency chain
+
+For this decision to be **fully** in effect:
+
+```text
+1. Syscall calling convention + SyscallError space.               — ADR-0030 (opens with this ADR)
+2. Panic-free dispatcher decoding x8 → one of {1..5}, else
+   BadSyscallNumber; number 0 reserved-invalid.                   — T-021 (opens with this ADR)
+3. Handlers wiring each syscall to its backing primitive
+   (ipc_send/ipc_recv/yield_now/console write_bytes/terminate).   — T-021
+4. Debug-console capability kind (CapObject) + the dispatcher's
+   capability check for console_write (the P1/P4 authority gate). — T-021 (object + check)
+5. copy-from-user for console_write's buffer.                     — T-021
+6. The release debug-gate mechanism for console_write.            — T-021 (design-notes choice)
+7. EL0-ready Task context so task_exit/task_yield have a real EL0
+   task to terminate/reschedule.                                  — ADR-0033 (placeholder) + Phase B6
+8. Debug-console capability granted to the first userspace task.  — Phase B6
+9. tyrne-user safe wrappers exposing these five calls.            — Phase B6 (deferred)
+```
+
+Steps 1–6 are grounded in ADR-0030 + T-021, opened in the same commit set as this ADR per [ADR-0025 §Rule 1](0025-adr-governance-amendments.md). Steps 7–9 are explicit forward-flags (the [ADR-0033 high-half placeholder](0027-kernel-virtual-memory-layout.md) and B6), the same shape [ADR-0029](0029-initial-userspace-image-format.md) used for its deferred build-pipeline step. Until step 7 lands, the five syscalls are exercised by an EL1 kernel-stub caller (B5 acceptance criterion #7), not a real EL0 task.
+
+## Consequences
+
+### Positive
+
+- **The dispatch surface is minimal and fully audited.** Five real syscalls + one reserved-invalid number; every handler is a thin validator over an existing kernel primitive. Nothing to keep panic-free that no consumer needs.
+- **The boundary is exercised, not just built.** `send`/`recv` from EL0 prove capability-gated IPC across the privilege line — the highest-value B5 test — rather than deferring it.
+- **Every object-naming syscall is capability-gated; no ambient authority.** `send`/`recv` check the endpoint cap, `console_write` checks a debug-console cap, and `task_yield`/`task_exit` act only on the caller's own (trusted-current-task) identity — upholding [P1 / P4](../standards/architectural-principles.md) uniformly across the v1 set. +- **`0`-reserved + the release debug-gate are defence-in-depth on top of the capability gate.** An uninitialised syscall number faults; production builds drop `console_write` from the surface entirely even for a capability holder. +- **Register-passing keeps the common path allocation-free and copy-free.** Only `console_write` touches user memory, so only one handler carries the copy-from-user cost; `send`/`recv` stay register-only. +- **`#[non_exhaustive]` error spaces mean the set can grow safely.** Adding `notify` or `cap_*` later is additive — new numbers, new `From` paths — with no break to the v1 five. + +### Negative + +- **No userspace capability management in v1.** A v1 EL0 task cannot `cap_copy`/`derive`/`revoke` its own caps; it works only with the caps the loader/parent granted. *Mitigation:* no v1 userspace needs this; the kernel-internal cap operations remain available, and the syscalls land when a real consumer (a multi-task userspace service, post-B6) surfaces. +- **No `notify` from EL0.** An EL0 task cannot signal a notification. *We accept this* — v1's notification users are kernel-internal; the syscall is additive later. +- **`Message` is register-bound to four words.** A larger payload needs a pointer-buffer variant. *Mitigation:* the four-word `Message` is fixed by [ADR-0017](0017-ipc-primitive-set.md); a wider message is a separate ADR with its own syscall number, leaving these layouts intact. +- **`task_exit` semantics are only half-real in B5.** Until B6's EL0 context exists, `task_exit` is dispatcher-plumbing over a kernel-stub. 
*Mitigation:* the ABI shape ("does not return") is fixed now; the termination behaviour lands with the task it terminates. + +### Neutral + +- **Syscall numbers `0`–`5` are a fixed decision, not tentative.** As an Accepted ABI ADR, this table is the contract; T-021's host tests regression-verify the numbers and layouts, they do not choose them. +- **The release debug-gate *mechanism* is left to T-021** (a `cfg!(debug_assertions)` arm vs. a Cargo feature) — but the gate's *existence* and the capability check are both fixed decisions here. +- **A new `CapObject` kind (debug console) lands in T-021.** This is the smallest object addition that keeps `console_write` capability-gated; it is the first capability kind introduced by a syscall rather than by the kernel-object subsystem directly. +- **The set maps one-to-one onto the future `tyrne-user` crate's public API.** B6's wrapper crate exposes exactly these five. + +## Pros and cons of the options + +### Option 1 — five-syscall floor set (chosen) + +- **Pro:** Exactly B6's needs; smallest panic-free dispatch surface. +- **Pro:** Exercises the capability-gated IPC boundary from EL0 (the key B5 test). +- **Pro:** Register-only for four of five calls; one copy-from-user path. +- **Con:** No EL0 cap-management / `notify` in v1 (additive later; no v1 consumer). + +### Option 2 — larger "useful from day one" set + +- **Pro:** Userspace can manage its own caps + memory without a later ABI bump. +- **Con:** Every added call is unused dispatch surface that must be kept panic-free with **no v1 consumer** to validate it — speculative ABI. +- **Con:** Larger audited attack surface at the most security-sensitive boundary, for zero v1 benefit. +- **Con:** `#[non_exhaustive]` already makes growth non-breaking, so the "avoid a later bump" pro is moot. + +### Option 3 — ultra-minimal `console_write` + `task_exit` + +- **Pro:** Absolute smallest path to B6's greeting + exit. 
+- **Con:** Ships a syscall boundary that never carries an IPC message — the most security-relevant EL0→EL1 path (capability-gated `send`/`recv`) stays unexercised, defeating the purpose of building the boundary in B5. +- **Con:** `task_yield` is near-free to add and makes cooperative EL0 multitasking reachable; omitting it is false economy. + +## References + +- [Phase B §B5 — Syscall boundary](../roadmap/phases/phase-b.md#milestone-b5--syscall-boundary) — the floor/ceiling sentence this ADR implements. +- [Phase B §B6 — First userspace "hello"](../roadmap/phases/phase-b.md#milestone-b6--first-userspace-hello) — the consumer that justifies the set. +- [ADR-0030 — Syscall ABI and userspace error taxonomy](0030-syscall-abi.md) — the convention these calls instantiate. +- [ADR-0017 — IPC primitive set](0017-ipc-primitive-set.md) — `send`/`recv`/`notify` and the four-word `Message` shape. +- [ADR-0014 — Capability representation](0014-capability-representation.md) — the `CapHandle` the null-sentinel reserves against. +- [`docs/architecture/ipc.md`](../architecture/ipc.md) — the IPC operations `send`/`recv` wrap. +- [seL4 manual §"System Calls"](https://sel4.systems/Info/Docs/seL4-manual-latest.pdf) — minimal syscall-set prior art for a capability kernel. 
diff --git a/docs/decisions/README.md b/docs/decisions/README.md index 3b5a615..0201e09 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -58,11 +58,13 @@ Each ADR contains: | 0027 | [Kernel virtual memory layout (B2 — identity-mapped MMU activation)](0027-kernel-virtual-memory-layout.md) | Accepted | 2026-05-08 | | 0028 | [Address-space data structure (B3 — kernel-object + capability-gated `Mmu::map` wrappers + activation-on-context-switch)](0028-address-space-data-structure.md) | Accepted | 2026-05-11 | | 0029 | [Initial userspace image format (B4 — raw flat binary)](0029-initial-userspace-image-format.md) | Accepted | 2026-05-14 | +| 0030 | [Syscall ABI and userspace error taxonomy (B5)](0030-syscall-abi.md) | Proposed | 2026-05-29 | +| 0031 | [Initial syscall set (B5 — `send`/`recv`/`console_write`/`task_yield`/`task_exit`)](0031-initial-syscall-set.md) | Proposed | 2026-05-29 | | 0032 | [Endpoint state rollback on `ipc_recv_and_yield` Deadlock + `ipc_cancel_recv` primitive](0032-endpoint-rollback-and-cancel-recv.md) | Accepted | 2026-05-07 | | 0035 | [Physical Memory Manager (B3 prerequisite — bitmap allocator)](0035-physical-memory-manager.md) | Accepted | 2026-05-09 | | 0036 | [QEMU virt is GICv2 / no-IOMMU in v1; corrects GICv3/SMMUv3 in ADR-0004/0006/0012](0036-qemu-virt-gicv2-no-iommu-v1.md) | Accepted | 2026-05-22 | -> **Numbering gaps.** Slots **0030**, **0031**, **0033**, **0034** are intentionally reserved, not missing: 0030 (syscall ABI) and 0031 (initial syscall set / MMU follow-ups) are reserved for B5 per the `phase-b.md` §B5 ADR ledger; 0033 (high-half migration) and 0034 (kernel-image section permissions) are named-but-unallocated placeholders forward-flagged in ADR-0027/0028/0029. No files exist for these yet; they open when the corresponding work surfaces. ADR numbers are stable history and are never renumbered. 
+> **Numbering gaps.** Slots **0033** and **0034** are intentionally reserved, not missing: 0033 (high-half migration) and 0034 (kernel-image section permissions) are named-but-unallocated placeholders forward-flagged in ADR-0027/0028/0029. No files exist for these yet; they open when the corresponding work surfaces. (Slots **0030** (syscall ABI) and **0031** (initial syscall set) were filed on 2026-05-29 for B5 — `Proposed`, pending Accept — and are no longer gaps.) ADR numbers are stable history and are never renumbered. ## Creating a new ADR From 93d5960985760d7d04542e4cf89e25772b3071a9 Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 07:04:26 +0300 Subject: [PATCH 02/12] docs(adr): accept ADR-0030/0031 after careful re-read + maintainer review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Flip ADR-0030 and ADR-0031 Proposed -> Accepted in a commit separate from the propose draft, per write-adr skill section 10. The careful re-read plus a same-day maintainer review surfaced and corrected several drafting issues *before* this Accept — all folded into the proposed bodies above, so the Accepted text is correct from the start (no post-Accept body edit): - an SVC from a B5 EL1 kernel-stub takes the current-EL (VBAR_EL1+0x200) sync vector, not the lower-EL (+0x400) EL0 vector, so the real EL0 round-trip is runtime-verified in B6, not B5; - console_write is capability-gated on a debug-console capability (it was ambient authority, a P1/P4 violation); the release debug-gate is a separate, independent defense-in-depth gate; - the syscall numbers 1..5 are a fixed decision (tests regression-verify them), and the payload registers are x1..x7. Adds the additive ADR-0017 revision rider recording that the IpcError taxonomy is refined (not superseded) and the three-primitive surface is unchanged. 
Refs: ADR-0030, ADR-0031 Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/decisions/0017-ipc-primitive-set.md | 1 + docs/decisions/0030-syscall-abi.md | 2 +- docs/decisions/0031-initial-syscall-set.md | 2 +- docs/decisions/README.md | 6 +++--- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/decisions/0017-ipc-primitive-set.md b/docs/decisions/0017-ipc-primitive-set.md index 5389341..035f24b 100644 --- a/docs/decisions/0017-ipc-primitive-set.md +++ b/docs/decisions/0017-ipc-primitive-set.md @@ -214,6 +214,7 @@ The `notify` operation is non-blocking: it ORs bits into the `Notification` word - **2026-04-27 — pointer to architecture doc.** [T-008](../analysis/tasks/phase-b/T-008-architecture-docs.md) created [`docs/architecture/ipc.md`](../architecture/ipc.md), which synthesises this ADR (three-primitive set, endpoint state machine, capability-transfer pre-flight) with [ADR-0021](0021-raw-pointer-scheduler-ipc-bridge.md) (the scheduler-bridge wrappers). The ADR body is unchanged; this rider provides the bidirectional cross-reference T-008's DoD asks for ("ADRs cited from architecture docs are the same ADRs whose §References sections cite the new architecture docs"). - **2026-05-07 — `ipc_cancel_recv` recovery primitive added (ADR-0032 / T-015).** [ADR-0032](0032-endpoint-rollback-and-cancel-recv.md) introduces `ipc_cancel_recv(ep_arena, queues, ep_cap, table)` — a fourth function in `kernel/src/ipc/mod.rs` that reverses an `Idle → RecvWaiting` transition for the calling task. **It is a recovery primitive, not an extension of the user-observable IPC surface this ADR enumerated.** The user-observable set remains `send` / `recv` / `notify`; `cancel_recv` is consumed exclusively by the scheduler bridge's `ipc_recv_and_yield` Deadlock-rollback branch in v1 (kernel-internal). 
When userspace destroy paths land (Phase B2+), they may invoke it as a "drain receivers" sweep, and the future syscall-ABI ADR (currently pencilled as ADR-0030) decides whether to expose it directly. The `EndpointState` machine itself is unchanged — `cancel_recv` is a single-edge `RecvWaiting → Idle` reverse of an existing arc, not a new state. ADR-0017's *Decision outcome* (three-primitive set) is therefore not superseded; this rider records the additive recovery primitive that lands alongside it. - **2026-05-22 — "ADR-0030" forward-reference is a reserved slot, not yet filed.** The "pencilled as ADR-0030" syscall-ABI reference in the 2026-05-07 rider above is a **reserved slot number** (the `phase-b.md` §B5 ADR ledger formally reserves ADR-0030 for the syscall ABI and ADR-0031 for the initial syscall set), not a claim that a file exists. No `docs/decisions/0030-*.md` file exists yet; the slot opens with B5 userspace work. If the syscall ABI eventually lands under a different number, this reference is the one to update. (Mirrors the ADR-0033/0034 named-but-unallocated placeholder pattern.) +- **2026-05-29 — ADR-0030 filed (Accepted); `IpcError` taxonomy refined (additive split, not supersession).** [ADR-0030](0030-syscall-abi.md) (syscall ABI + userspace error taxonomy) is now `Accepted`, filling the reserved slot the 2026-05-22 rider flagged. As its **K2-5** bundle it splits `IpcError::InvalidCapability` into `StaleHandle` / `MissingRight` / `WrongObjectKind` ([T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)). This **refines the error taxonomy** of the IPC operations this ADR enumerated; it does **not** change the three-primitive surface (`send` / `recv` / `notify`) or the endpoint state machine — ADR-0017's *Decision outcome* stands. [ADR-0031](0031-initial-syscall-set.md) (initial syscall set) chooses **not** to expose `notify` (nor `cancel_recv`) from EL0 in v1; the kernel-internal `ipc_notify` / `ipc_cancel_recv` are unchanged. 
Both ADRs are additive to this one. ## References diff --git a/docs/decisions/0030-syscall-abi.md b/docs/decisions/0030-syscall-abi.md index a98059e..264453d 100644 --- a/docs/decisions/0030-syscall-abi.md +++ b/docs/decisions/0030-syscall-abi.md @@ -1,6 +1,6 @@ # 0030 — Syscall ABI and userspace error taxonomy -- **Status:** Proposed +- **Status:** Accepted - **Date:** 2026-05-29 - **Deciders:** @cemililik diff --git a/docs/decisions/0031-initial-syscall-set.md b/docs/decisions/0031-initial-syscall-set.md index c51f2fe..e81c59c 100644 --- a/docs/decisions/0031-initial-syscall-set.md +++ b/docs/decisions/0031-initial-syscall-set.md @@ -1,6 +1,6 @@ # 0031 — Initial syscall set (B-phase) -- **Status:** Proposed +- **Status:** Accepted - **Date:** 2026-05-29 - **Deciders:** @cemililik diff --git a/docs/decisions/README.md b/docs/decisions/README.md index 0201e09..319f57f 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -58,13 +58,13 @@ Each ADR contains: | 0027 | [Kernel virtual memory layout (B2 — identity-mapped MMU activation)](0027-kernel-virtual-memory-layout.md) | Accepted | 2026-05-08 | | 0028 | [Address-space data structure (B3 — kernel-object + capability-gated `Mmu::map` wrappers + activation-on-context-switch)](0028-address-space-data-structure.md) | Accepted | 2026-05-11 | | 0029 | [Initial userspace image format (B4 — raw flat binary)](0029-initial-userspace-image-format.md) | Accepted | 2026-05-14 | -| 0030 | [Syscall ABI and userspace error taxonomy (B5)](0030-syscall-abi.md) | Proposed | 2026-05-29 | -| 0031 | [Initial syscall set (B5 — `send`/`recv`/`console_write`/`task_yield`/`task_exit`)](0031-initial-syscall-set.md) | Proposed | 2026-05-29 | +| 0030 | [Syscall ABI and userspace error taxonomy (B5)](0030-syscall-abi.md) | Accepted | 2026-05-29 | +| 0031 | [Initial syscall set (B5 — `send`/`recv`/`console_write`/`task_yield`/`task_exit`)](0031-initial-syscall-set.md) | Accepted | 2026-05-29 | | 0032 | [Endpoint state rollback 
on `ipc_recv_and_yield` Deadlock + `ipc_cancel_recv` primitive](0032-endpoint-rollback-and-cancel-recv.md) | Accepted | 2026-05-07 | | 0035 | [Physical Memory Manager (B3 prerequisite — bitmap allocator)](0035-physical-memory-manager.md) | Accepted | 2026-05-09 | | 0036 | [QEMU virt is GICv2 / no-IOMMU in v1; corrects GICv3/SMMUv3 in ADR-0004/0006/0012](0036-qemu-virt-gicv2-no-iommu-v1.md) | Accepted | 2026-05-22 | -> **Numbering gaps.** Slots **0033** and **0034** are intentionally reserved, not missing: 0033 (high-half migration) and 0034 (kernel-image section permissions) are named-but-unallocated placeholders forward-flagged in ADR-0027/0028/0029. No files exist for these yet; they open when the corresponding work surfaces. (Slots **0030** (syscall ABI) and **0031** (initial syscall set) were filed on 2026-05-29 for B5 — `Proposed`, pending Accept — and are no longer gaps.) ADR numbers are stable history and are never renumbered. +> **Numbering gaps.** Slots **0033** and **0034** are intentionally reserved, not missing: 0033 (high-half migration) and 0034 (kernel-image section permissions) are named-but-unallocated placeholders forward-flagged in ADR-0027/0028/0029. No files exist for these yet; they open when the corresponding work surfaces. (Slots **0030** (syscall ABI) and **0031** (initial syscall set) were filed and `Accepted` on 2026-05-29 for B5 and are no longer gaps.) ADR numbers are stable history and are never renumbered. ## Creating a new ADR From d20e6d0406e2fdbc79435a4699a354bfdcb2ecd1 Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 06:22:28 +0300 Subject: [PATCH 03/12] feat(ipc): split IpcError::InvalidCapability into three typed variants Per ADR-0030's K2-5 bundle, replace the collapsed IpcError::InvalidCapability with StaleHandle / WrongObjectKind / MissingRight so the in-kernel and the future userspace error spaces agree and each failure is a distinct, handleable case. 
Validation now resolves in the order resolve -> type-check -> authority (kind before rights), matching CapError's InvalidHandle/WrongKind/ InsufficientRights shape, across validate_ep_cap, validate_notif_cap, and sched::resolve_ep_cap; the four arena-staleness sites map to StaleHandle. Revealing which check failed is safe for a per-subject, unforgeable capability table (ADR-0030 security argument). Remaps the existing rights/stale test assertions and adds 5 new tests pinning each variant (incl. wrong-kind-with- right, proving kind-before-rights, and a destroyed-endpoint StaleHandle). InvalidTransferCap is intentionally left intact (note C3-008). Updates docs/architecture/ipc.md taxonomy section. Security-relevant (capabilities + IPC). fmt / host-test (194 kernel) / host-clippy / kernel-clippy / kernel-build / miri (no UB) all green. Refs: ADR-0030 Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/architecture/ipc.md | 6 +- kernel/src/ipc/mod.rs | 231 ++++++++++++++++++++++++++++++++------- kernel/src/sched/mod.rs | 24 ++-- 3 files changed, 210 insertions(+), 51 deletions(-) diff --git a/docs/architecture/ipc.md b/docs/architecture/ipc.md index d11b447..e80b0e8 100644 --- a/docs/architecture/ipc.md +++ b/docs/architecture/ipc.md @@ -93,13 +93,17 @@ The "park in endpoint state" step is what makes the transfer atomic: even if the ### `IpcError` taxonomy ```text -IpcError::InvalidCapability // ep_cap stale, wrong kind, or lacks the required right +IpcError::StaleHandle // cap did not resolve, or its object was destroyed +IpcError::WrongObjectKind // cap resolved but names the wrong kind of object for the op +IpcError::MissingRight // cap is the right kind but lacks SEND / RECV / NOTIFY IpcError::QueueFull // a previous sender/receiver still occupies the endpoint IpcError::InvalidTransferCap // transfer-handle stale or lacks TRANSFER IpcError::ReceiverTableFull // pre-flight: receiver's cap table has no free slot IpcError::PendingAfterResume // scheduler-bridge 
invariant violation; see scheduler.md ``` +The first three variants — `StaleHandle` / `WrongObjectKind` / `MissingRight` — replace the former single `InvalidCapability` per [ADR-0030](../decisions/0030-syscall-abi.md)'s **K2-5** split, so the syscall error space and the in-kernel error space agree once userspace can call IPC. Capability validation resolves in the order **resolve → type-check → authority-check** (`StaleHandle` → `WrongObjectKind` → `MissingRight`), mirroring [`CapError`](../../kernel/src/cap/mod.rs)'s `InvalidHandle` / `WrongKind` / `InsufficientRights`. Revealing *which* check failed is safe — a capability table is per-subject and unforgeable, so the failure mode is a fact about the caller's own handle, not a forgery aid (see [ADR-0030 §"Security of the taxonomy split"](../decisions/0030-syscall-abi.md)). The companion redaction of `Capability`'s `Debug` (K3-9) keeps the *object identity* hidden even as the failure mode becomes visible. + The enum is annotated `#[non_exhaustive]`, which deliberately *opens* it for future extension: external matches must include a wildcard arm, so new variants can be added without silently breaking callers. `PendingAfterResume` is special among the variants — it is produced *only* by the scheduler bridge's resume path, never by the bare `ipc_recv` primitive, and it indicates a kernel-internal invariant violation rather than a userspace-reachable error. ADR-0022 §Revision notes (second rider) records why the typed return replaces a `debug_assert!` that was untestable. ### The scheduler-bridge wrappers diff --git a/kernel/src/ipc/mod.rs b/kernel/src/ipc/mod.rs index 1311e83..1d3724c 100644 --- a/kernel/src/ipc/mod.rs +++ b/kernel/src/ipc/mod.rs @@ -80,12 +80,37 @@ pub struct Message { } /// Errors returned by IPC operations. 
+/// +/// The capability-validation failure modes are split into three distinct, +/// handleable variants per [ADR-0030][adr-0030]'s K2-5 bundle (replacing the +/// former single `InvalidCapability`). Validation resolves in the order +/// `StaleHandle → WrongObjectKind → MissingRight` (resolve, then type-check, +/// then authority-check), mirroring [`CapError`][caperr]'s +/// `InvalidHandle` / `WrongKind` / `InsufficientRights` shape so the in-kernel +/// and userspace error spaces read the same way. Revealing *which* check +/// failed is safe for a per-subject, unforgeable capability table — see +/// [ADR-0030 §"Security of the taxonomy split"][adr-0030]. +/// +/// [adr-0030]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0030-syscall-abi.md +/// [caperr]: crate::cap::CapError #[non_exhaustive] #[derive(Copy, Clone, Debug, Eq, PartialEq)] pub enum IpcError { - /// The endpoint or notification capability is invalid, stale, or the - /// caller lacks the required right (`SEND`, `RECV`, or `NOTIFY`). - InvalidCapability, + /// The capability handle did not resolve in the caller's table, or the + /// kernel object it named has been destroyed (the arena slot's + /// generation moved past the handle's). The reference is dead — the + /// caller should re-acquire it, not retry. + StaleHandle, + /// The capability resolved but names the wrong kind of object for this + /// operation (e.g. a `Notification` capability handed to `ipc_send`, or + /// an `Endpoint` capability handed to `ipc_notify`). A programming + /// error in the caller: the wrong handle was passed. + WrongObjectKind, + /// The capability resolved and is the right kind, but does not carry the + /// right this operation requires (`SEND` for `ipc_send`, `RECV` for + /// `ipc_recv` / `ipc_cancel_recv`, `NOTIFY` for `ipc_notify`). The caller + /// holds the object but lacks the authority — it must obtain the right. 
+ MissingRight, /// The endpoint's waiter queue is at capacity (depth 1 in v1): a second /// blocked sender arrived while the first is still pending, or a second /// receiver registered before the first was served. @@ -269,7 +294,9 @@ impl IpcQueues { /// /// # Errors /// -/// - [`IpcError::InvalidCapability`] — `ep_cap` is stale or lacks `SEND`. +/// - [`IpcError::StaleHandle`] — `ep_cap` did not resolve, or its endpoint +/// was destroyed; [`IpcError::WrongObjectKind`] — `ep_cap` does not name an +/// endpoint; [`IpcError::MissingRight`] — `ep_cap` lacks `SEND`. /// - [`IpcError::InvalidTransferCap`] — `transfer` handle is stale. /// - [`IpcError::QueueFull`] — a previous send is still pending (or a /// delivery for a waiting receiver is uncollected). @@ -298,7 +325,7 @@ pub fn ipc_send( // Confirm the endpoint handle is still live in the arena. ep_arena .get(ep_handle.slot()) - .ok_or(IpcError::InvalidCapability)?; + .ok_or(IpcError::StaleHandle)?; // Pre-flight: queue-full check. Peek state non-destructively before any // cap manipulation so that a QueueFull return leaves both the endpoint @@ -360,7 +387,9 @@ pub fn ipc_send( /// /// # Errors /// -/// - [`IpcError::InvalidCapability`] — `ep_cap` is stale or lacks `RECV`. +/// - [`IpcError::StaleHandle`] — `ep_cap` did not resolve, or its endpoint +/// was destroyed; [`IpcError::WrongObjectKind`] — `ep_cap` does not name an +/// endpoint; [`IpcError::MissingRight`] — `ep_cap` lacks `RECV`. /// - [`IpcError::ReceiverTableFull`] — the receiver's table has no free slot /// for the capability carried with the pending message. Free a slot first. /// - [`IpcError::QueueFull`] — a receiver is already registered on this endpoint. @@ -374,7 +403,7 @@ pub fn ipc_recv( ep_arena .get(ep_handle.slot()) - .ok_or(IpcError::InvalidCapability)?; + .ok_or(IpcError::StaleHandle)?; // Pre-flight: if the pending state carries a capability, ensure the // receiver's table has room before committing the state transition. 
This @@ -441,7 +470,10 @@ pub fn ipc_recv( /// /// # Errors /// -/// [`IpcError::InvalidCapability`] — `notif_cap` is stale or lacks `NOTIFY`. +/// [`IpcError::StaleHandle`] — `notif_cap` did not resolve, or its +/// notification was destroyed; [`IpcError::WrongObjectKind`] — `notif_cap` +/// does not name a notification; [`IpcError::MissingRight`] — `notif_cap` +/// lacks `NOTIFY`. pub fn ipc_notify( notif_arena: &mut NotificationArena, notif_cap: CapHandle, @@ -451,7 +483,7 @@ pub fn ipc_notify( let notif_handle = validate_notif_cap(caller_table, notif_cap)?; let notif = notif_arena .get_mut(notif_handle.slot()) - .ok_or(IpcError::InvalidCapability)?; + .ok_or(IpcError::StaleHandle)?; notif.set(bits); Ok(()) } @@ -520,9 +552,10 @@ pub fn ipc_notify( /// /// # Errors /// -/// [`IpcError::InvalidCapability`] — `ep_cap` is stale, refers to a -/// non-endpoint object, or lacks `RECV`. The endpoint state is not -/// touched on this error. +/// [`IpcError::StaleHandle`] — `ep_cap` did not resolve, or its endpoint +/// was destroyed; [`IpcError::WrongObjectKind`] — `ep_cap` refers to a +/// non-endpoint object; [`IpcError::MissingRight`] — `ep_cap` lacks `RECV`. +/// The endpoint state is not touched on any of these errors. /// /// [adr-0032]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0032-endpoint-rollback-and-cancel-recv.md /// [ADR-0017]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0017-ipc-primitive-set.md @@ -537,7 +570,7 @@ pub fn ipc_cancel_recv( ep_arena .get(ep_handle.slot()) - .ok_or(IpcError::InvalidCapability)?; + .ok_or(IpcError::StaleHandle)?; let state = queues.state_of(ep_handle); if matches!(state, EndpointState::RecvWaiting) { @@ -548,37 +581,38 @@ pub fn ipc_cancel_recv( // ── Helpers ───────────────────────────────────────────────────────────────── +// Resolve → type-check → authority-check, in that order, per ADR-0030's +// `StaleHandle → WrongObjectKind → MissingRight` taxonomy. 
The order is +// observable only for a capability that is *both* the wrong kind and +// missing the right; checking kind first reports the more fundamental +// error ("you passed the wrong handle") ahead of the authority error. fn validate_ep_cap( table: &CapabilityTable, ep_cap: CapHandle, required: CapRights, ) -> Result { - let cap = table - .lookup(ep_cap) - .map_err(|_| IpcError::InvalidCapability)?; + let cap = table.lookup(ep_cap).map_err(|_| IpcError::StaleHandle)?; + let CapObject::Endpoint(handle) = cap.object() else { + return Err(IpcError::WrongObjectKind); + }; if !cap.rights().contains(required) { - return Err(IpcError::InvalidCapability); - } - match cap.object() { - CapObject::Endpoint(h) => Ok(h), - _ => Err(IpcError::InvalidCapability), + return Err(IpcError::MissingRight); } + Ok(handle) } fn validate_notif_cap( table: &CapabilityTable, notif_cap: CapHandle, ) -> Result { - let cap = table - .lookup(notif_cap) - .map_err(|_| IpcError::InvalidCapability)?; + let cap = table.lookup(notif_cap).map_err(|_| IpcError::StaleHandle)?; + let CapObject::Notification(handle) = cap.object() else { + return Err(IpcError::WrongObjectKind); + }; if !cap.rights().contains(CapRights::NOTIFY) { - return Err(IpcError::InvalidCapability); - } - match cap.object() { - CapObject::Notification(h) => Ok(h), - _ => Err(IpcError::InvalidCapability), + return Err(IpcError::MissingRight); } + Ok(handle) } /// Take the cap at `handle` (if any) out of `table` for in-flight transfer. @@ -899,7 +933,8 @@ mod tests { None ) .unwrap_err(), - IpcError::InvalidCapability + // Correct kind (endpoint), missing the SEND right → MissingRight. + IpcError::MissingRight ); } @@ -911,7 +946,117 @@ mod tests { let (_, ep_cap) = setup_ep(&mut table, &mut ep_arena, CapRights::SEND); assert_eq!( ipc_recv(&mut ep_arena, &mut queues, ep_cap, &mut table).unwrap_err(), - IpcError::InvalidCapability + // Correct kind (endpoint), missing the RECV right → MissingRight. 
+ IpcError::MissingRight + ); + } + + // ── error taxonomy: WrongObjectKind + StaleHandle (ADR-0030 K2-5) ───────── + // + // These pin the two split variants that the pre-existing rights-failure + // tests above (now `MissingRight`) do not reach. The `WrongObjectKind` + // tests deliberately give the cap the operation's right too, proving the + // kind check runs *before* the rights check (ADR-0030 ordering: a + // wrong-kind cap fails with `WrongObjectKind` even when it carries the + // right). `StaleHandle` is exercised on both the table-lookup-miss path + // (a dropped cap handle) and the arena-staleness path (a destroyed + // endpoint whose cap still resolves in the table). + + #[test] + fn send_with_wrong_object_kind_returns_wrong_object_kind() { + let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + // A Task cap that even carries SEND — but it is not an endpoint. + let cap_h = table + .insert_root(Capability::new(CapRights::SEND, task_object(1))) + .unwrap(); + assert_eq!( + ipc_send( + &mut ep_arena, + &mut queues, + cap_h, + &mut table, + test_msg(0), + None + ) + .unwrap_err(), + IpcError::WrongObjectKind + ); + } + + #[test] + fn recv_with_wrong_object_kind_returns_wrong_object_kind() { + let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + // A Task cap that even carries RECV — but it is not an endpoint. + let cap_h = table + .insert_root(Capability::new(CapRights::RECV, task_object(1))) + .unwrap(); + assert_eq!( + ipc_recv(&mut ep_arena, &mut queues, cap_h, &mut table).unwrap_err(), + IpcError::WrongObjectKind + ); + } + + #[test] + fn notify_with_wrong_object_kind_returns_wrong_object_kind() { + let mut table = CapabilityTable::new(); + let mut notif_arena = NotificationArena::default(); + // A Task cap that even carries NOTIFY — but it is not a notification. 
+ let cap_h = table + .insert_root(Capability::new(CapRights::NOTIFY, task_object(2))) + .unwrap(); + assert_eq!( + ipc_notify(&mut notif_arena, cap_h, &table, 0xFF).unwrap_err(), + IpcError::WrongObjectKind + ); + } + + #[test] + fn send_with_dropped_cap_handle_returns_stale_handle() { + // Table-lookup-miss path: the cap handle no longer resolves. + let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + let (_, ep_cap) = setup_ep(&mut table, &mut ep_arena, all_ep_rights()); + table.cap_drop(ep_cap).unwrap(); + assert_eq!( + ipc_send( + &mut ep_arena, + &mut queues, + ep_cap, + &mut table, + test_msg(0), + None + ) + .unwrap_err(), + IpcError::StaleHandle + ); + } + + #[test] + fn send_to_destroyed_endpoint_returns_stale_handle() { + // Arena-staleness path: the cap still resolves (right kind + right), + // but its endpoint object was destroyed, so the arena `get` fails. + use crate::obj::endpoint::destroy_endpoint; + let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + let (ep_handle, ep_cap) = setup_ep(&mut table, &mut ep_arena, all_ep_rights()); + destroy_endpoint(&mut ep_arena, ep_handle).unwrap(); + assert_eq!( + ipc_send( + &mut ep_arena, + &mut queues, + ep_cap, + &mut table, + test_msg(0), + None + ) + .unwrap_err(), + IpcError::StaleHandle ); } @@ -1098,7 +1243,8 @@ mod tests { let cap_h = table.insert_root(cap).unwrap(); assert_eq!( ipc_notify(&mut notif_arena, cap_h, &table, 0xFF).unwrap_err(), - IpcError::InvalidCapability + // Correct kind (notification), missing the NOTIFY right → MissingRight. + IpcError::MissingRight ); } @@ -1108,10 +1254,12 @@ mod tests { // `stale_queue_state_reset_on_slot_reuse`. 
A cap whose underlying // notification was destroyed (and a new one re-allocated in the same // slot with a bumped generation) must make `ipc_notify` return - // `InvalidCapability` via the arena `get_mut(...).ok_or(...)` branch — - // the realistic adversarial case where the cap's rights check still - // passes but the handle is stale. The endpoint side already pins this; - // this closes the notification-side gap at the `ipc_notify` boundary. + // `StaleHandle` via the arena `get_mut(...).ok_or(...)` branch — + // the realistic adversarial case where the cap's rights/kind checks + // still pass but the handle is stale. The endpoint side already pins + // this; this closes the notification-side gap at the `ipc_notify` + // boundary. (Post-ADR-0030: this path now returns the granular + // `StaleHandle` rather than the former collapsed `InvalidCapability`.) let mut table = CapabilityTable::new(); let mut notif_arena = NotificationArena::default(); @@ -1125,12 +1273,12 @@ mod tests { // The cap still satisfies the rights/kind check (it was minted with // NOTIFY), so the failure must come from the arena staleness lookup, - // not the rights gate — proving the `ok_or(InvalidCapability)` mapping + // not the rights gate — proving the `ok_or(StaleHandle)` mapping // at the IPC boundary fires. (No cap_drop is even needed to provoke it.) 
assert_eq!( ipc_notify(&mut notif_arena, notif_cap, &table, 0xFF).unwrap_err(), - IpcError::InvalidCapability, - "ipc_notify on a stale notification handle must return InvalidCapability" + IpcError::StaleHandle, + "ipc_notify on a stale notification handle must return StaleHandle" ); // Re-allocating reuses the slot with a bumped generation; the stale @@ -1138,7 +1286,7 @@ mod tests { let _new_handle = create_notification(&mut notif_arena, Notification::new(1)).unwrap(); assert_eq!( ipc_notify(&mut notif_arena, notif_cap, &table, 0xFF).unwrap_err(), - IpcError::InvalidCapability, + IpcError::StaleHandle, "stale cap must still fail after the slot is reused by a new notification" ); } @@ -1428,7 +1576,8 @@ mod tests { let (_, ep_cap) = setup_ep(&mut table, &mut ep_arena, CapRights::SEND); assert_eq!( ipc_cancel_recv(&mut ep_arena, &mut queues, ep_cap, &table).unwrap_err(), - IpcError::InvalidCapability, + // Correct kind (endpoint), missing the RECV right → MissingRight. + IpcError::MissingRight, ); } diff --git a/kernel/src/sched/mod.rs b/kernel/src/sched/mod.rs index eeb52eb..0c28a96 100644 --- a/kernel/src/sched/mod.rs +++ b/kernel/src/sched/mod.rs @@ -371,12 +371,16 @@ impl Scheduler { caller_table: &CapabilityTable, ep_cap: CapHandle, ) -> Result<EndpointHandle, SchedError> { + // Resolve → type-check, mapping each failure to ADR-0030's granular + // variant: a missing handle is `StaleHandle`, a non-endpoint object is + // `WrongObjectKind`. (The rights check lives inside `ipc_send` / + // `ipc_recv`, which surface `MissingRight`.) 
let cap = caller_table .lookup(ep_cap) - .map_err(|_| SchedError::Ipc(IpcError::InvalidCapability))?; + .map_err(|_| SchedError::Ipc(IpcError::StaleHandle))?; match cap.object() { CapObject::Endpoint(h) => Ok(h), - _ => Err(SchedError::Ipc(IpcError::InvalidCapability)), + _ => Err(SchedError::Ipc(IpcError::WrongObjectKind)), } } @@ -1232,9 +1236,10 @@ pub unsafe fn ipc_recv_and_yield( // `Idle → RecvWaiting` transition so the caller observes the // same endpoint state it had before the bridge was called. // Phase 1 just validated `ep_cap` with RECV; the cancel cannot - // surface a new InvalidCapability under v1's single-thread - // cooperative invariant, so the result is asserted in debug - // and discarded in release. + // surface a new capability error (the ADR-0030 split variants + // StaleHandle / WrongObjectKind / MissingRight) under v1's + // single-thread cooperative invariant, so the result is asserted + // in debug and discarded in release. // SAFETY: caller contract — `ep_arena`, `queues`, `caller_table` // valid + distinct for this momentary block; `ep_arena` and // `queues` are exclusively owned (`&mut` reborrows below), @@ -2790,8 +2795,9 @@ mod tests { fn ipc_send_and_yield_send_error_preserves_scheduler_state() { // Setup: h0 (current), h1 (Ready in queue). The endpoint cap // grants RECV but not SEND, so `ipc_send` fails its rights check - // with `IpcError::InvalidCapability`. The bridge must surface - // this typed error and leave the scheduler exactly as it was. + // with `IpcError::MissingRight` (ADR-0030: correct kind, missing + // right). The bridge must surface this typed error and leave the + // scheduler exactly as it was. // Symmetric to T-007's `ipc_recv_and_yield_returns_deadlock_…` // state-restore guarantee. 
let cpu = FakeCpu::new(); @@ -2859,8 +2865,8 @@ mod tests { }; assert!( - matches!(result, Err(SchedError::Ipc(IpcError::InvalidCapability))), - "expected Err(Ipc(InvalidCapability)), got {result:?}" + matches!(result, Err(SchedError::Ipc(IpcError::MissingRight))), + "expected Err(Ipc(MissingRight)), got {result:?}" ); // Scheduler state must be untouched. assert_eq!(sched.current, prior_current); From 324457afd055fbea692d4eeb579618e17ea212fc Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 06:22:32 +0300 Subject: [PATCH 04/12] feat(cap): redact Capability and CapObject Debug to hide object identity Per ADR-0030 "Security of the taxonomy split" / B5 sub-item 6 (K3-9, security review section 6): a userspace-reachable log path (the future console_write syscall) must never disclose the kernel object a capability names. Replace the derived Debug on Capability with a hand-written impl that shows rights but prints the object as , and redact CapObject likewise (kind-only Debug, hiding the wrapped slot index + generation). The individual kernel- object handle types keep their derived Debug for kernel-internal diagnostics (they never cross to userspace; T-021's console_write review gates that). Two host tests pin both redaction layers. Broadens security-model.md's "no unredacted Debug/Display" rule to capabilities. The CapObject redaction was folded in from an adversarial self-review that flagged it as a latent defense-in-depth gap (no current production formatter, but conservative per CLAUDE.md rule 1). Security-relevant. 
Refs: ADR-0030 Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/architecture/security-model.md | 3 +- kernel/src/cap/mod.rs | 109 +++++++++++++++++++++++++++- 2 files changed, 107 insertions(+), 5 deletions(-) diff --git a/docs/architecture/security-model.md b/docs/architecture/security-model.md index 298fa8d..a68cfbb 100644 --- a/docs/architecture/security-model.md +++ b/docs/architecture/security-model.md @@ -146,6 +146,7 @@ Two capability properties are load-bearing: - **Unforgeability.** Capability tokens are entirely inside the kernel's memory. Userspace never sees raw bits; it references its own table by handle. - **Non-transferability without consent.** A capability can be moved (via IPC) or duplicated (via a duplicate authority), but never silently copied or leaked. +- **Redacted in diagnostics.** A `Capability`'s `Debug` impl shows its rights but **redacts the kernel object it names** (the typed handle's slot index + generation), so a userspace-reachable log path — such as the debug `console_write` syscall — can never disclose kernel-internal object identity. Capabilities follow the same "no unredacted `Debug`/`Display`" discipline cryptographic keys do (see §Cryptography). Per [ADR-0030](../decisions/0030-syscall-abi.md) (K3-9 / security review §6). #### Capability operations @@ -253,7 +254,7 @@ The kernel has **no cryptographic primitives** in its initial form. This is deli - An ADR per primitive or primitive family. - No roll-your-own. Only well-known algorithms from audited crates. - A [security-review](../standards/security-review.md) pass on every introduction, separately from code review. -- Keys are first-class types that do not implement unredacted `Debug` or `Display`. See [logging-and-observability.md](../standards/logging-and-observability.md). +- Keys are first-class types that do not implement unredacted `Debug` or `Display` — the same discipline capabilities follow (see §Capabilities, "Redacted in diagnostics"). 
See [logging-and-observability.md](../standards/logging-and-observability.md). ### Supply chain diff --git a/kernel/src/cap/mod.rs b/kernel/src/cap/mod.rs index 3a11338..fe5e095 100644 --- a/kernel/src/cap/mod.rs +++ b/kernel/src/cap/mod.rs @@ -85,7 +85,16 @@ pub enum CapKind { /// T-018 (per [ADR-0028][adr-0028]). /// /// [adr-0028]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0028-address-space-data-structure.md -#[derive(Copy, Clone, Debug, Eq, PartialEq)] +/// +/// `Debug` is a **hand-written, redacting** impl (not derived): it prints the +/// object *kind* but redacts the wrapped typed handle (slot index + +/// generation). This is the same kernel-internal-identity hazard the +/// [`Capability`] redaction (K3-9 / [ADR-0030][adr-0030cap]) addresses — +/// closed here at the source so a `CapObject` formatted directly (e.g. into a +/// future error or a userspace-reachable log) cannot leak the handle either. +/// +/// [adr-0030cap]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0030-syscall-abi.md +#[derive(Copy, Clone, Eq, PartialEq)] pub enum CapObject { /// Capability naming a [`Task`][crate::obj::Task] kernel object. Task(TaskHandle), @@ -113,6 +122,21 @@ impl CapObject { } } +impl core::fmt::Debug for CapObject { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + // Show the kind (benign, useful for diagnostics) but redact the + // wrapped typed handle (slot index + generation) — kernel-internal + // identity per K3-9 / ADR-0030. The wrapped handle types keep their + // derived `Debug` for kernel-internal traces (scheduler, arena) where + // the slot/generation is the useful information and never crosses to + // userspace; `CapObject` is redacted because it is the type a + // capability (or a future error) carries toward a log boundary. + f.debug_struct("CapObject") + .field("kind", &self.kind()) + .finish() + } +} + /// A capability. /// /// Deliberately **not** `Copy` and **not** `Clone`. 
Duplication happens @@ -120,14 +144,34 @@ impl CapObject { /// to hold the [`CapRights::DUPLICATE`] authority on the source. The /// Rust type system enforces the move-only discipline by construction. /// -/// `Debug` is derived so that test assertions can format capabilities; -/// the derived impl exposes typed handles but no other unforgeable bits. -#[derive(Debug)] +/// `Debug` is a **hand-written, redacting** impl (not derived): it prints +/// the `rights` (authority bits — useful for diagnostics and not +/// unforgeable) but redacts the named object as ``. The object +/// names a kernel object by typed handle (slot index + generation), which +/// is kernel-internal identity that must never leak across a +/// userspace-reachable log path such as the future `console_write` syscall. +/// Per [ADR-0030][adr-0030] §"Security of the taxonomy split" and B5 +/// sub-item 6 (K3-9 — security review §6). +/// +/// [adr-0030]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0030-syscall-abi.md pub struct Capability { rights: CapRights, object: CapObject, } +impl core::fmt::Debug for Capability { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + // Redact the named object; keep rights visible. `format_args!` + // (rather than a `&str`) emits the placeholder without surrounding + // quotes, so the output reads `object: `, not + // `object: ""`. + f.debug_struct("Capability") + .field("rights", &self.rights) + .field("object", &format_args!("")) + .finish() + } +} + impl Capability { /// Construct a capability with the given rights over `object`. The /// [`CapKind`] is derived from the `object`'s variant, so @@ -192,3 +236,60 @@ pub enum CapError { /// `AddressSpaceError::CapError(_)` passthrough. 
WrongKind, } + +#[cfg(test)] +mod tests { + use super::{CapObject, CapRights, Capability}; + use crate::obj::TaskHandle; + + #[test] + fn debug_redacts_named_object_but_keeps_rights() { + // K3-9 (ADR-0030 §"Security of the taxonomy split"): a `Capability`'s + // `Debug` must not leak the kernel object it names — no kind, no slot + // index, no generation — but may show the (non-unforgeable) rights. + let cap = Capability::new( + CapRights::SEND | CapRights::RECV, + CapObject::Task(TaskHandle::test_handle(0xAB, 7)), + ); + let shown = format!("{cap:?}"); + + // The named object is redacted. + assert!( + shown.contains("object: "), + "object must be redacted, got: {shown}" + ); + assert!( + !shown.contains("Task"), + "object kind must not leak, got: {shown}" + ); + assert!( + !shown.contains("171"), + "handle index (0xAB = 171) must not leak, got: {shown}" + ); + // Rights stay visible for diagnostics (`CapRights` derives `Debug`). + assert!( + shown.contains("rights"), + "rights field must be shown, got: {shown}" + ); + } + + #[test] + fn capobject_debug_redacts_handle_but_shows_kind() { + // Defense-in-depth: even formatting a bare `CapObject` (not wrapped in + // a `Capability`) must not leak the handle's slot index / generation. + let obj = CapObject::Task(TaskHandle::test_handle(0xAB, 7)); + let shown = format!("{obj:?}"); + + // The kind is shown (benign, useful for diagnostics)... + assert!(shown.contains("Task"), "kind should be shown, got: {shown}"); + // ...but the wrapped handle's identity is redacted. 
+ assert!( + !shown.contains("171"), + "handle index (0xAB = 171) must not leak, got: {shown}" + ); + assert!( + !shown.contains("SlotId") && !shown.contains("generation"), + "handle internals must not leak, got: {shown}" + ); + } +} From 4777f9ac9ab0f914581878f64607e2949016cb4f Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 07:06:13 +0300 Subject: [PATCH 05/12] test(ipc): pin StaleHandle + WrongObjectKind on ipc_cancel_recv Add two tests so ipc_cancel_recv pins all three split variants (it already had MissingRight): a Task cap carrying RECV proves the kind-before-rights ordering (WrongObjectKind), and a cap whose endpoint was destroyed exercises the arena-staleness branch (StaleHandle). This makes ADR-0030's row-3 verification mapping accurate for cancel_recv (it previously over-claimed cancel coverage). Kernel host tests 194 -> 196. Refs: ADR-0030 Co-Authored-By: Claude Opus 4.8 (1M context) --- kernel/src/ipc/mod.rs | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/kernel/src/ipc/mod.rs b/kernel/src/ipc/mod.rs index 1d3724c..548bb3d 100644 --- a/kernel/src/ipc/mod.rs +++ b/kernel/src/ipc/mod.rs @@ -1597,6 +1597,38 @@ mod tests { assert!(matches!(outcome, RecvOutcome::Pending)); } + #[test] + fn cancel_recv_with_wrong_object_kind_returns_wrong_object_kind() { + // Symmetric to the send/recv wrong-kind tests: a cap that carries + // RECV but names a Task (not an endpoint) fails kind-before-rights. 
+ let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + let cap_h = table + .insert_root(Capability::new(CapRights::RECV, task_object(1))) + .unwrap(); + assert_eq!( + ipc_cancel_recv(&mut ep_arena, &mut queues, cap_h, &table).unwrap_err(), + IpcError::WrongObjectKind + ); + } + + #[test] + fn cancel_recv_to_destroyed_endpoint_returns_stale_handle() { + // Arena-staleness path for cancel: the cap resolves with RECV, but its + // endpoint object was destroyed, so the arena `get` fails. + use crate::obj::endpoint::destroy_endpoint; + let mut table = CapabilityTable::new(); + let mut ep_arena = EndpointArena::default(); + let mut queues = IpcQueues::new(); + let (ep_handle, ep_cap) = setup_ep(&mut table, &mut ep_arena, all_ep_rights()); + destroy_endpoint(&mut ep_arena, ep_handle).unwrap(); + assert_eq!( + ipc_cancel_recv(&mut ep_arena, &mut queues, ep_cap, &table).unwrap_err(), + IpcError::StaleHandle + ); + } + // ── reset_if_stale_generation guard tests (T-011) ───────────────────────── // // `IpcQueues::reset_if_stale_generation` (used by `state_of` / From 1df1b522f87101a5ebd8303523b984ed3e2bc222 Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 07:09:07 +0300 Subject: [PATCH 06/12] docs(roadmap): T-020 In Review; narrow B5 acceptance to current-EL proxy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address the remaining maintainer-review findings (the ADR append-only fix landed via the propose/accept rebase; this commit covers the rest): - Major: phase-b §B5 acceptance over-promised a real EL0->EL1 round-trip, which ADR-0030 shows is impossible at B5 (an EL1 kernel-stub SVC takes the current-EL 0x200 vector, not the lower-EL 0x400 EL0 vector). Narrow B5 to "dispatch mechanism verified via the current-EL kernel-stub" and move the real EL0 0x400 round-trip to the B6 acceptance criteria. 
- Minor: current.md banner said "In Progress" while the fields said "In Review"; fix the banner and the two broken T-021 links (../). - Move T-020 to In Review in the task index + task doc; record the maintainer-review round and the row-to-verification mapping (now incl. the two new cancel_recv variant tests) in T-020's review history. - Add EL0/EL1, SVC, Syscall, and Syscall ABI glossary entries and note the taxonomy split on the ipc.md architecture status row. Refs: ADR-0030, ADR-0031 Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/analysis/tasks/phase-b/README.md | 2 +- .../phase-b/T-020-syscall-error-taxonomy.md | 25 +++++++++++-------- docs/architecture/README.md | 2 +- docs/glossary.md | 8 ++++++ docs/roadmap/current.md | 12 +++++---- docs/roadmap/phases/phase-b.md | 9 ++++--- 6 files changed, 36 insertions(+), 22 deletions(-) diff --git a/docs/analysis/tasks/phase-b/README.md b/docs/analysis/tasks/phase-b/README.md index dd95288..b430231 100644 --- a/docs/analysis/tasks/phase-b/README.md +++ b/docs/analysis/tasks/phase-b/README.md @@ -19,7 +19,7 @@ Tasks belonging to [Phase B — Real userspace](../../../roadmap/phases/phase-b. 
| [T-017](T-017-physical-memory-manager.md) | Physical Memory Manager (PMM): bitmap allocator + reservation tracking + `FrameProvider` impl (implements ADR-0035) | B3 | Done (2026-05-10) | | [T-018](T-018-address-space-kernel-object.md) | `AddressSpace` kernel object + capability-gated `Mmu::map`/`unmap` wrappers + activation-on-context-switch (implements ADR-0028) | B3 | Done (2026-05-11; live on `main` 2026-05-14 via PR #28) | | [T-019](T-019-task-loader.md) | Task loader: embedded raw-flat userspace image → `LoadedImage` metadata (implements ADR-0029) | B4 | Done (2026-05-16 via PR #31 merge) | -| [T-020](T-020-syscall-error-taxonomy.md) | Syscall error taxonomy: split `IpcError::InvalidCapability` + redact `Capability` `Debug` (implements ADR-0030 K2-5 / K3-9) | B5 | In Progress | +| [T-020](T-020-syscall-error-taxonomy.md) | Syscall error taxonomy: split `IpcError::InvalidCapability` + redact `Capability`/`CapObject` `Debug` (implements ADR-0030 K2-5 / K3-9) | B5 | In Review | | [T-021](T-021-syscall-dispatch.md) | EL0→EL1 `SVC` dispatch: trap trampoline + panic-free dispatcher + copy-from/to-user (implements ADR-0030 / ADR-0031) | B5 | Ready | Tasks are added here as they become active. See [`../../../roadmap/phases/phase-b.md`](../../../roadmap/phases/phase-b.md) for the full phase plan. 
diff --git a/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md index 5f2429e..40e6b1c 100644 --- a/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md +++ b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md @@ -2,7 +2,7 @@ - **Phase:** B - **Milestone:** B5 — Syscall boundary (this task is B5's pure-Rust foundation: the userspace-facing error taxonomy + capability-Debug redaction that the dispatcher in [T-021](T-021-syscall-dispatch.md) builds on; [ADR-0030](../../../decisions/0030-syscall-abi.md) settles the taxonomy) -- **Status:** In Progress +- **Status:** In Review - **Created:** 2026-05-29 - **Author:** @cemililik (+ Claude Opus 4.8 agent) - **Dependencies:** [ADR-0030](../../../decisions/0030-syscall-abi.md) — must be `Accepted` before code lands (settles the `StaleHandle` / `MissingRight` / `WrongObjectKind` split and the §"Security of the taxonomy split" rationale). No prior task gates this; it is pure-Rust over the existing `kernel/src/ipc` + `kernel/src/cap` surfaces. @@ -25,19 +25,19 @@ This task deliberately **excludes** the trap trampoline, the dispatcher, `Syscal ## Acceptance criteria -- [ ] `IpcError::InvalidCapability` is removed and replaced by `IpcError::StaleHandle`, `IpcError::MissingRight`, `IpcError::WrongObjectKind`, each with a doc-comment describing its distinct meaning per [ADR-0030](../../../decisions/0030-syscall-abi.md). `IpcError` stays `#[non_exhaustive]`. -- [ ] Every production site is mapped to the correct variant: `validate_ep_cap` / `validate_notif_cap` (`ipc/mod.rs`) and `resolve_ep_cap` (`sched/mod.rs`) resolve in the order `StaleHandle → WrongObjectKind → MissingRight`; arena `get`/`get_mut` staleness failures map to `StaleHandle`. 
-- [ ] Every existing test asserting `InvalidCapability` is updated to its correct post-split variant (rights failures → `MissingRight`; stale-handle/destroyed-object → `StaleHandle`), and the `sched` bridge test is updated. -- [ ] New host tests pin each new variant on a path that does **not** already prove it — at minimum a `WrongObjectKind` test for an endpoint operation and for `ipc_notify` (a wrong-kind cap), and a `StaleHandle` test for `ipc_send`/`ipc_recv` against a destroyed endpoint. -- [ ] `Capability`'s `Debug` impl is custom (not derived) and prints `Capability { rights: , object: }` — `rights` visible, the named object redacted — per [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split) and the K3-9 redaction requirement. A host test pins that the output contains the rights and the literal `` and does **not** contain the object's handle. -- [ ] All gates green: `cargo fmt --all -- --check`, `cargo host-test`, `cargo host-clippy`, `cargo kernel-clippy`, `cargo kernel-build`, and `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt`. -- [ ] Docs updated: [`docs/architecture/ipc.md`](../../../architecture/ipc.md) §"`IpcError` taxonomy", [`docs/architecture/security-model.md`](../../../architecture/security-model.md) redaction rule broadened to capabilities, [`docs/glossary.md`](../../../glossary.md) syscall terms, the ADR index, and the [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) §Revision-notes rider. +- [x] `IpcError::InvalidCapability` is removed and replaced by `IpcError::StaleHandle`, `IpcError::MissingRight`, `IpcError::WrongObjectKind`, each with a doc-comment describing its distinct meaning per [ADR-0030](../../../decisions/0030-syscall-abi.md). `IpcError` stays `#[non_exhaustive]`. 
+- [x] Every production site is mapped to the correct variant: `validate_ep_cap` / `validate_notif_cap` (`ipc/mod.rs`) and `resolve_ep_cap` (`sched/mod.rs`) resolve in the order `StaleHandle → WrongObjectKind → MissingRight`; arena `get`/`get_mut` staleness failures map to `StaleHandle`. +- [x] Every existing test asserting `InvalidCapability` is updated to its correct post-split variant (rights failures → `MissingRight`; stale-handle/destroyed-object → `StaleHandle`), and the `sched` bridge test is updated. +- [x] New host tests pin each new variant on a path that does **not** already prove it — `WrongObjectKind` for `ipc_send` / `ipc_recv` / `ipc_notify` (wrong-kind caps that carry the right, proving kind-before-rights), and `StaleHandle` for `ipc_send` on both a dropped cap handle and a destroyed endpoint. +- [x] `Capability`'s `Debug` impl is custom (not derived) and prints `Capability { rights: , object: }` — `rights` visible, the named object redacted — per [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split) and the K3-9 redaction requirement. **`CapObject` is also redacted** (kind-only `Debug`, hiding the wrapped handle) — closing the defense-in-depth gap the adversarial review raised. Two host tests pin both layers. +- [x] All gates green: `cargo fmt --all -- --check`, `cargo host-test` (194 kernel), `cargo host-clippy`, `cargo kernel-clippy`, `cargo kernel-build`, and `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` (no UB). +- [x] Docs updated: [`docs/architecture/ipc.md`](../../../architecture/ipc.md) §"`IpcError` taxonomy", [`docs/architecture/security-model.md`](../../../architecture/security-model.md) redaction rule broadened to capabilities, [`docs/glossary.md`](../../../glossary.md) syscall terms, the ADR index, and the [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) §Revision-notes rider. 
## Out of scope - The EL0→EL1 `SVC` trap trampoline, the panic-free dispatcher, `SyscallError`, and copy-from/to-user — all [T-021](T-021-syscall-dispatch.md). - Splitting `IpcError::InvalidTransferCap` (note C3-008) — deferred to a future ADR when a userspace transfer consumer needs the `TransferCapHasChildren` distinction. -- Redacting `CapObject` / `CapHandle` / `SlotId` `Debug` impls themselves — the redaction is at the `Capability` boundary, where the rights+object pairing is the sensitive unit; the handle types remain `Debug` for kernel-internal diagnostics that never cross to userspace. +- Redacting the **individual kernel-object handle types** (`TaskHandle` / `EndpointHandle` / `NotificationHandle` / `AddressSpaceHandle` / `SlotId`) and the userspace-facing `CapHandle` — these keep their derived `Debug` for kernel-internal diagnostics (scheduler dispatch traces, arena bookkeeping, test-failure messages) where the slot/generation is the useful information and never crosses to userspace. (`CapObject` *is* redacted in this task — see §Design notes — because it is the type a `Capability` carries toward a log boundary. The deeper per-handle redaction, if ever wanted, is a separate kernel-Debug-hygiene decision; T-021's `console_write` review must confirm no kernel-object handle is formatted into the userspace-reachable path.) - Any userspace crate, EL0 context, or real syscall invocation — Phase B6. ## Approach @@ -64,14 +64,17 @@ Per [error-handling standard §2](../../../standards/error-handling.md): the enu ## Definition of done -All acceptance criteria checked; gates green (incl. Miri — a [Phase-B exit prerequisite](../../../roadmap/phases/phase-b.md) with weight on `sched`/`ipc`); docs updated; ADR-0017 rider added; `current.md` reflects T-020 Done and B5 in progress. **Security-relevant** (capabilities + IPC): flagged for explicit review per CLAUDE.md. +All acceptance criteria checked; gates green (incl. 
Miri — a [Phase-B exit prerequisite](../../../roadmap/phases/phase-b.md) with weight on `sched`/`ipc`); docs updated; ADR-0017 rider added; `current.md` reflects T-020 `In Review` (implementation complete, awaiting maintainer merge) and B5 in progress. **Security-relevant** (capabilities + IPC): flagged for explicit review per CLAUDE.md. ## Design notes - The validation **order change** (kind-before-rights) is observable only for a capability that is both wrong-kind *and* missing-right; all existing rights-failure tests use correct-kind caps and remain `MissingRight`. Documented in [ADR-0030 §"The K2-5 `IpcError` split"](../../../decisions/0030-syscall-abi.md#the-k2-5-ipcerror-split-lands-now-in-t-020). - The security argument for revealing the failure mode (per-subject, unforgeable handles ⇒ no forgery/enumeration aid) is in [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split). The redaction keeps the *object identity* hidden even as the *failure mode* becomes visible — the two are independent surfaces. - Redaction approach is a custom `impl Debug`, not a `Redacted` wrapper, matching the codebase's direct-impl style and avoiding a cascading wrapper refactor; no code structurally depends on `Capability: Debug`. +- **`CapObject` redaction (folded in from the adversarial review).** The first redaction pass touched only `Capability`. An adversarial self-review flagged that `CapObject` (and the handle types it wraps) still derived `Debug`, so a `CapObject` formatted directly — now or in a future error/log — would leak the slot index + generation the `Capability` layer hides. Verified there is **no** current production formatter of `CapObject` (the only capability-type `Debug` formatter in the tree is the redacting `Capability::Debug`; `IpcError`/`SchedError` carry no capability payload; `EndpointState` has no `Debug`), so this was a *latent* gap, not a live leak. 
Per CLAUDE.md rule 1 ("when in doubt, choose the more conservative option") the gap was closed at the source: `CapObject` now has a kind-only redacting `Debug`. The individual handle types keep their derived `Debug` (kernel-internal diagnostics) — see §Out of scope. ## Review history -- _(filled on close)_ +- **2026-05-29 — Implementation complete; `In Progress → In Review`.** Landed the K2-5 `IpcError` split (`StaleHandle` / `WrongObjectKind` / `MissingRight`, validation reordered to resolve→type-check→authority across `validate_ep_cap` / `validate_notif_cap` / `sched::resolve_ep_cap`; 4 arena-staleness sites → `StaleHandle`) and the K3-9 redaction (`Capability` + `CapObject` `Debug` redacted). Tests: 6 existing assertions remapped + 8 new variant/redaction tests (kernel suite **187 → 196**). **Row-to-verification mapping** ([ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation)): row 3 (IPC error taxonomy) → `send_with_wrong_object_kind_returns_wrong_object_kind`, `recv_with_wrong_object_kind_returns_wrong_object_kind`, `notify_with_wrong_object_kind_returns_wrong_object_kind`, `cancel_recv_with_wrong_object_kind_returns_wrong_object_kind`, `send_with_dropped_cap_handle_returns_stale_handle`, `send_to_destroyed_endpoint_returns_stale_handle`, `cancel_recv_to_destroyed_endpoint_returns_stale_handle`, `notify_with_stale_handle_after_slot_reuse_fails`, plus the remapped `*_without_*_right_fails` (`MissingRight`) tests and the `sched` bridge `ipc_send_and_yield_send_error_preserves_scheduler_state` test — so all four operations (`ipc_send` / `ipc_recv` / `ipc_notify` / `ipc_cancel_recv`) pin all three variants; the K3-9 redaction → `cap::tests::debug_redacts_named_object_but_keeps_rights` + `capobject_debug_redacts_handle_but_shows_kind`. Gates all green: `fmt`, `host-test` (196 kernel / 43 hal / 53 test-hal), `host-clippy`, `kernel-clippy`, `kernel-build`, `miri` (no UB; Stacked Borrows). 
+- **2026-05-29 — Adversarial multi-lens self-review** (security / correctness / completeness / design). Verdict: **no Blocker/Major findings**; the split's reordering is consistent across the direct and scheduler-bridge paths, the `format_args!` redaction is sound, and `EndpointState` cannot leak. One Minor defense-in-depth finding (`CapObject`/handle `Debug` derives) was folded in for `CapObject` (see §Design notes); one Nit (test-module `#[allow]` pragma) was assessed and intentionally not applied (the test uses only `format!`/`assert!`, triggering none of the forbidden pragmas). +- **2026-05-29 — Maintainer review (post-rebase).** A maintainer review of the ADR/task arc raised: (Major) the same-day ADR corrections had been folded into the *Accepted* bodies (append-only concern) — resolved by rebasing the branch so the corrections land in the `Proposed` draft and Accept is a separate clean commit (no Accepted body edited post-Accept); (Major) the phase-b §B5 acceptance criterion over-promised a real EL0 round-trip — narrowed to the current-EL kernel-stub mechanism, with the real `0x400` EL0 round-trip moved to §B6; (Minor) `current.md` banner/links and (Minor) the ADR-0030 row-3 mapping over-claiming `cancel_recv` coverage — the latter resolved by adding `cancel_recv_with_wrong_object_kind_*` + `cancel_recv_to_destroyed_endpoint_*` tests; (Nit) the arena-staleness ordering caveat added to ADR-0030 §K2-5. No code-correctness or security bug was found by either review. diff --git a/docs/architecture/README.md b/docs/architecture/README.md index eeaf380..312d31b 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -15,7 +15,7 @@ The architecture is being written in phases. Many documents listed below are pla | [`hal.md`](hal.md) | Hardware Abstraction Layer: trait surfaces, board support packages, portability. | Accepted | | [`boot.md`](boot.md) | Boot flow from reset vector through kernel init to first userspace task. 
| Accepted (v0.0.1 — QEMU virt; T-013 EL drop landed) | | [`scheduler.md`](scheduler.md) | Cooperative FIFO scheduler: ready queue, idle task, raw-pointer IPC bridge, ContextSwitch trait. | Accepted (v0.0.1 — single-core, no preemption) | -| [`ipc.md`](ipc.md) | Inter-process communication: synchronous send/recv, endpoint state machine, capability transfer, scheduler-bridge wrappers. | Accepted (v0.0.1 — depth-1 endpoints) | +| [`ipc.md`](ipc.md) | Inter-process communication: synchronous send/recv, endpoint state machine, capability transfer, scheduler-bridge wrappers. | Accepted (v0.0.1 — depth-1 endpoints; `IpcError` taxonomy split per ADR-0030) | | [`exceptions.md`](exceptions.md) | Exception vector table, IRQ dispatch, GIC v2 driver, generic-timer IRQ wiring, idle WFI activation. | Accepted (v0.0.1 — T-012 Done 2026-04-28 via PR #10 merge; design + implementation match; maintainer-side QEMU smoke verification of the deliberate-deadline path remains pre-B1-closure work) | | [`memory-management.md`](memory-management.md) | Physical + virtual memory, MMU/paging, allocators, address-space objects, task loader. | Accepted (v0.0.1 — MMU/PMM/AddressSpace/loader; T-016..T-019) | | [`task-loader.md`](task-loader.md) | Task loader: raw-flat image → populated address space; rollback contract; audit-log surface. | Accepted (v0.0.1 — T-019) | diff --git a/docs/glossary.md b/docs/glossary.md index c171bd7..1e1a708 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -36,6 +36,8 @@ Terminology used throughout Tyrne. Entries are alphabetical. If a term appears i **Cooperative scheduling.** A scheduling model in which the CPU is only taken from a running task when that task voluntarily yields. Tyrne v1 is cooperative and single-core; preemption arrives later in Phase B / Phase C. +**EL0 / EL1 (Exception Levels).** ARM aarch64 privilege levels. EL0 is unprivileged (userspace); EL1 is the kernel/OS level. 
A task runs application code at EL0 and traps into the kernel at EL1 via an `SVC` (syscall) or an exception/interrupt. Tyrne drops to EL1 at boot ([ADR-0024](decisions/0024-el-drop-policy.md)) and exposes the EL0→EL1 syscall boundary in Phase B5 ([ADR-0030](decisions/0030-syscall-abi.md)). The higher levels (EL2 hypervisor, EL3 secure monitor) are not used by the kernel. See also *Syscall*, *SVC*. + **Endpoint.** In seL4-style IPC, a kernel object used to rendezvous senders and receivers. Possessing a capability to an endpoint is what grants the right to send or receive. **Generation tag.** The counter stored alongside every arena slot that detects stale handles. When a slot is freed and reused, its generation increments; a handle carries the generation it was issued with, so lookup can distinguish "same slot, new object" from "same slot, same object". See [ADR-0016](decisions/0016-kernel-object-storage.md). @@ -86,6 +88,12 @@ Terminology used throughout Tyrne. Entries are alphabetical. If a term appears i **StaticCell.** A BSP helper in [bsp-qemu-virt](../bsp-qemu-virt/src/main.rs) that wraps `UnsafeCell>` to provide write-once-at-boot, share-afterwards static storage for kernel state. It exposes `as_mut_ptr` so callers can derive raw pointers without materialising a `&mut` (see [ADR-0021](decisions/0021-raw-pointer-scheduler-ipc-bridge.md)). +**SVC (Supervisor Call).** The ARM aarch64 instruction a lower exception level uses to synchronously trap into a higher one. An `SVC` from EL0 takes the lower-EL synchronous exception vector at EL1; an `SVC` issued at EL1 takes the *current-EL* vector instead. Tyrne's syscall ABI uses `SVC #0` with the syscall number in `x8` ([ADR-0030](decisions/0030-syscall-abi.md)). See also *Syscall*, *EL0 / EL1*. + +**Syscall (system call).** The synchronous kernel entry point from userspace (EL0) into the kernel (EL1), made via an `SVC` instruction. 
The dispatcher validates the caller's capabilities and performs the operation, returning a typed result — and is panic-free on every untrusted input. Tyrne's v1 set is `send` / `recv` / `console_write` / `task_yield` / `task_exit` ([ADR-0031](decisions/0031-initial-syscall-set.md)). See also *Syscall ABI*, *SVC*. + +**Syscall ABI.** The register-level contract for a syscall: which register carries the syscall number (`x8`), which carry arguments (`x0`–`x5`), and how the status + payload return (`x0` = status word with `0` = `Ok`, `x1`–`x7` = payload). Fixed by [ADR-0030](decisions/0030-syscall-abi.md). A hardware-specific instance of an *ABI*. + **TCB (Trusted Computing Base).** The set of components that must be correct for the system's security guarantees to hold — code whose compromise would compromise everything. Tyrne keeps the TCB deliberately small by running drivers, filesystems, and network stacks in userspace rather than in the kernel, so that adding a feature does not enlarge the trusted core unless it strictly must; the README frames this as "the entire trusted computing base can be audited line by line." The boundary of the TCB is drawn in [architecture/security-model.md](architecture/security-model.md). See also *Microkernel* and *Trust boundary*. **Trust boundary.** A line in the system at which assumptions about integrity, confidentiality, or availability change. Crossing a trust boundary should require an explicit capability check. Trust boundaries are drawn in [architecture/security-model.md](architecture/security-model.md). diff --git a/docs/roadmap/current.md b/docs/roadmap/current.md index 973f8c7..477cc4c 100644 --- a/docs/roadmap/current.md +++ b/docs/roadmap/current.md @@ -4,6 +4,8 @@ A short pointer file updated as work progresses. 
For the full plan see [`phases/ --- +> **2026-05-29 update — B5 opened: ADR-0030/0031 Accepted; T-020 (error taxonomy + Debug redaction) In Review; T-021 (SVC dispatch) Ready.** The B5 syscall boundary is now active on branch `t-020-syscall-error-taxonomy` (off `main`). [ADR-0030](../decisions/0030-syscall-abi.md) settles the syscall ABI (`x8` = number, `x0`–`x5` args, `x0` status + `x1`–`x7` payload, `SVC #0`) and the **K2-5** taxonomy split of `IpcError::InvalidCapability` → `StaleHandle` / `MissingRight` / `WrongObjectKind`; [ADR-0031](../decisions/0031-initial-syscall-set.md) fixes the five-syscall v1 set (`send` / `recv` / `console_write` / `task_yield` / `task_exit`; number `0` reserved-invalid; every object-naming syscall capability-gated per [P1/P4](../standards/architectural-principles.md)). Both **Accepted 2026-05-29** (Propose → careful-re-read + maintainer-review Accept). **[T-020](../analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md)** — the pure-Rust foundation (the `IpcError` split + `Capability`/`CapObject` `Debug` redaction, K3-9) — is **In Review** (implementation complete, all gates green incl. Miri; kernel host tests 196); **[T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md)** — the EL0→EL1 `SVC` trampoline + panic-free dispatcher + copy-from/to-user, the security-critical hardware-facing half — is opened **Ready**, deferred to its own arc per CLAUDE.md §6. A same-day maintainer review surfaced two items, folded into the ADR bodies **before Accept** (the ADRs were re-drafted at `Proposed` and Accepted in a separate commit, so no Accepted body was edited post-Accept): (a) the B5 kernel-stub `SVC` exercises the **current-EL** `VBAR_EL1+0x200` vector, not the lower-EL `+0x400` (EL0) vector — so the real-EL0 round-trip is runtime-verified in **B6**, not B5; (b) `console_write` is **capability-gated** on a debug-console capability (it was ambient authority, a P1/P4 violation). This banner supersedes the 2026-05-28 banner below. 
+ > **2026-05-28 update — B4 CLOSED via the closure trio; B5 (syscall boundary) is next.** The B4 milestone (Task loader) is formally **Closed** via its closure trio — [business retrospective](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md) + [security review](../analysis/reviews/security-reviews/2026-05-28-B4-closure.md) (**Approve**) + [performance baseline](../analysis/reviews/performance-optimization-reviews/2026-05-28-B4-closure.md) — **which is the canonical source for B4's closing metrics** (not duplicated here). The period also included the 2026-05-22 full-tree [master review](../analysis/reviews/master-review/2026-05-22-152729/consolidated.md) (APPROVE the kernel; findings clustered in CI/doc/ADR, 0 kernel-correctness/security Blockers) and remediation PR #32 which — with MR-009 closed at this closure — resolved **all 24** verified Blocker+Major findings. Headline: gates green at HEAD `3ab029f` (**286** host tests; QEMU smoke clean, 629 guest-errors all pre-existing PL011; release perf band 15.641 / 17.587 / 19.150 ms). This banner supersedes the 2026-05-16 banner below. **Next:** B5 — ADR-0030 (syscall ABI) + ADR-0031 (initial syscall set), then EL0→EL1 SVC dispatch. > > **2026-05-16 update — T-019 merged; B4 implementation-complete; closure trio pending.** PR #31 merged into `main` at commit `7f876af` ("Merge pull request #31 from cemililik/t-019-task-loader"), landing T-019 (task loader) on `main`. The branch arc continued past the review-round-4 commit named in the 2026-05-15 banner below with two further follow-up commits: `5078944` (review-round 5 — added one PMM host test, taking the suite to **260/260**) and `eb14c51` (review-round 6 — 5 valid findings). T-019 status flips `In Review → Done` (`date_done: 2026-05-16`). **Host-test count at HEAD: 260/260** (42 hal + 175 kernel + 43 test-hal); the 2026-05-15 banner's "259/259" was accurate when written, before the round-5 PMM test landed. 
B4 is now **implementation-complete**; the **B4 closure trio (business + security + performance reviews) has NOT yet fired** and is the next review trigger (the maintainer sequences it separately). This banner resolves the pre-merge "In Review" state recorded below — that banner is retained as a point-in-time record. @@ -55,10 +57,10 @@ A short pointer file updated as work progresses. For the full plan see [`phases/ - **Active phase:** B — opened 2026-04-21. **B0 closed 2026-04-27**; **B1 closed 2026-05-07**; **B2 closed 2026-05-09**; **B3 closed 2026-05-14** via PR #29's closure trio (business + security + performance baseline; merge commit `b425dc1`). All four closures lifted `Done` after a verbatim QEMU smoke trace + clean `-d guest_errors` count per the [business master-plan §Acceptance criteria](../analysis/reviews/business-reviews/master-plan.md#acceptance-criteria) rule. **The 2026-04-28 implementation-complete claim for B1 was rolled back on 2026-05-06 by the smoke regression and re-issued 2026-05-07 as a smoke-verified Done** — that remains the only re-open arc to date; B2 and B3 both closed cleanly on first attempt. - **Active milestone:** **B5 — Syscall boundary (opens next).** B4 (Task loader) was formally **Closed 2026-05-28** via its closure trio (see the top banner + the [B4 closure retrospective](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md)). B5 per [phase-b.md §B5](phases/phase-b.md): ADR-0030 (syscall ABI + `IpcError::InvalidCapability` split into `StaleHandle` / `MissingRight` / `WrongObjectKind`) + ADR-0031 (initial syscall set: `send`, `recv`, `console_write`, `task_yield`, `task_exit`), then EL0→EL1 SVC dispatch, a panic-free syscall dispatcher, validated copy-from/to-user through the active AS, and `Capability` `Debug` redaction. 
B5 is the prerequisite for the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper (B4 §3) that turns a `LoadedImage` into a runnable `CapHandle{CapObject::Task(...)}`, then B6 (first userspace "hello"). -- **Active task:** none — B4 closed via the 2026-05-28 closure trio; B5 prep / first ADR (ADR-0030 syscall ABI) opens next. Per [ADR-0025 §Rule 1](../decisions/0025-adr-governance-amendments.md), the implementation task opens in the same commit as the first B5 ADR's *Dependency chain* section. **Last task Done: T-019 — Task loader, 2026-05-16** (PR #31, merge `7f876af`; branch `t-019-task-loader` retired): `pub fn load_image(...) -> Result` in `kernel/src/obj/task_loader.rs` produces a `LoadedImage { as_cap, entry_va, stack_top_va, image_bytes, stack_bytes }` of a freshly populated userspace AS — **not** a runnable `CapHandle{CapObject::Task(...)}` (runnability gates on B5/B6); 10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry. DoD fully met (B5/B6 deferrals are §Out-of-scope, not unchecked DoD items). -- **In review:** none. (The B4 closure trio fired 2026-05-28 and is committed/awaiting maintainer review; it is not an in-flight task.) -- **In progress:** none. -- **Working branch:** none / B5 prep opens next. Development branches off `main` per the PR pattern; no task branch is currently active and no rebase is pending. +- **Active task:** **T-020 — Syscall error taxonomy (`IpcError` split + `Capability`/`CapObject` `Debug` redaction), In Review** on branch `t-020-syscall-error-taxonomy` (implementation complete, all gates green incl. Miri; awaiting maintainer review/merge). ADR-0030 (syscall ABI) + ADR-0031 (initial syscall set) Accepted 2026-05-29; [T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md) (EL0→EL1 SVC dispatch — the hardware-facing half) opened **Ready**, deferred to its own arc. 
**Last task Done: T-019 — Task loader, 2026-05-16** (PR #31, merge `7f876af`; branch `t-019-task-loader` retired): `pub fn load_image(...) -> Result` in `kernel/src/obj/task_loader.rs` produces a `LoadedImage { as_cap, entry_va, stack_top_va, image_bytes, stack_bytes }` of a freshly populated userspace AS — **not** a runnable `CapHandle{CapObject::Task(...)}` (runnability gates on B5/B6); 10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry. DoD fully met (B5/B6 deferrals are §Out-of-scope, not unchecked DoD items). +- **In review:** **T-020 — Syscall error taxonomy** (the `IpcError::InvalidCapability` → `StaleHandle`/`MissingRight`/`WrongObjectKind` split + `Capability`/`CapObject` `Debug` redaction) — implementation complete on branch `t-020-syscall-error-taxonomy`, all gates green (incl. Miri), awaiting maintainer review/merge. +- **In progress:** none. ([T-021](../analysis/tasks/phase-b/T-021-syscall-dispatch.md) is `Ready` but not started — the SVC trampoline lands in its own arc after T-020 merges.) +- **Working branch:** `t-020-syscall-error-taxonomy` (off `main` per the PR pattern; ADRs Accepted, implementation complete and In Review). No rebase pending. - **Last completed milestone:** **B4 — Task loader, Closed 2026-05-28** via the closure trio ([business](../analysis/reviews/business-reviews/2026-05-28-B4-closure.md) + [security](../analysis/reviews/security-reviews/2026-05-28-B4-closure.md) Approve + [performance](../analysis/reviews/performance-optimization-reviews/2026-05-28-B4-closure.md) baseline). Required task Done: T-019 (2026-05-16, PR #31 `7f876af`). The trio is the **canonical source for B4's closing metrics**; headline: **286** host tests, QEMU smoke clean (629 guest-errors, all pre-existing PL011, zero fault classes), release perf band 15.641 / 17.587 / 19.150 ms, audit log 28 entries.
The 2026-05-22 master review + PR #32 remediation (all 24 Blocker+Major findings resolved, MR-009 closed at this closure) landed in this period. **Previous closures:** **B3** 2026-05-14 (PR #29 `b425dc1`); **B2** 2026-05-09; **B1** 2026-05-07 (PR #15 `e9fa019` + PR #16 `95b15aa`); **B0** 2026-04-27 (PR #9 `9a66e8b`). - **Last completed tasks:** **T-019 — Done 2026-05-16, merged to `main` via PR #31** (branch `t-019-task-loader`, merge commit `7f876af`) — Task loader: `load_image` produces a `LoadedImage` descriptor of a freshly populated userspace AS (10-variant `LoadError`, leak-path-closure preflight chain, UNSAFE-2026-0027 byte-copy entry); does **not** mint a runnable `TaskCap` (B5/B6 prerequisite). **Earlier:** **T-018 — Done 2026-05-11, live on `main` 2026-05-14 via PR #28** (branch `t-018-address-space-kernel-object`, merge commit `47b0a86`). T-018 implementation: [`AddressSpace`](../../kernel/src/mm/address_space.rs) kernel-object struct + per-type [`AddressSpaceArena`](../../kernel/src/mm/address_space.rs) (ADR-0016 pattern); `CapKind::AddressSpace` + `CapObject::AddressSpace(AddressSpaceHandle)` variants in [`kernel/src/cap/mod.rs`](../../kernel/src/cap/mod.rs); capability-gated wrappers `cap_create_address_space` / `cap_map` / `cap_unmap` with step-by-step preflights (DERIVE rights → no-widening → depth preflight → arena/cap-table capacity → PMM alloc → arena commit → `cap_derive` cap-table insert); `Task` struct extension with `address_space_handle`; activation-on-context-switch hook threaded through `yield_now` / `start` / `ipc_recv_and_yield` / `ipc_send_and_yield` (closure-as-parameter, fires only when outgoing and incoming task ASes differ — short-circuits in v1's bootstrap-shared topology); BSP wiring in [`bsp-qemu-virt/src/main.rs`](../../bsp-qemu-virt/src/main.rs) wraps the already-live bootstrap root via the new `QemuVirtAddressSpace::from_existing_root` `pub unsafe fn` companion. 
Cross-cutting additions during the review-round arc: `MmuError::BlockMapped` variant (commit `8b9f52e`) so unmap into a bootstrap block descriptor surfaces a distinct typed error from `AlreadyMapped`; `CapabilityTable::depth_of` `pub(crate)` preflight helper closing the PMM-leak path; UNSAFE-2026-0014 fifth Amendment scope-extends the umbrella to the activation hook + BSP-side activation closure (zero new audit entries — additive scope on the existing `&mut Scheduler` momentary-borrow umbrella). Smoke trace gains one new line `tyrne: address-space-arena ready (1 / 8 slots used; bootstrap AS root = 0x4008d000)` immediately after `tyrne: pmm initialized (...)` and before `tyrne: timer ready (...)`. Full demo runs to `tyrne: all tasks complete`; `-d int,unimp,guest_errors` reports only the pre-existing PL011-disabled-UART noise (unchanged baseline). **Earlier:** T-017 — Done 2026-05-10 (PR #27, branch `t-017-physical-memory-manager`) — Physical Memory Manager (`Pmm` bitmap allocator + `FrameProvider` trait + UNSAFE-2026-0026 zero-fill audit). **Earlier:** T-016 — Done 2026-05-08 (branch `t-016-mmu-activation`) — MMU activation, VMSAv8 descriptor encoders, `MapperFlush` flush-token, UNSAFE-2026-0022 / 0023 / 0024 / 0025 introduced. **Earlier:** T-015 — Done 2026-05-07 (PR #17, branch `t-015-endpoint-rollback-cancel-recv`) — `ipc_cancel_recv` recovery primitive + symmetric scheduler+endpoint rollback in `ipc_recv_and_yield`'s Phase 2 Deadlock branch (ADR-0032). **Earlier:** T-014 (2026-05-07 via PR #15), T-012 (2026-04-28 via PR #10), T-013 (2026-04-27 via PR #9). - **Last reviews:** @@ -91,7 +93,7 @@ A short pointer file updated as work progresses. For the full plan see [`phases/ - [ADR-0026 — Idle dispatch via separate fallback slot](../decisions/0026-idle-dispatch-fallback.md) — `Accepted` (2026-05-06). 
Supersedes ADR-0022's *idle-task-location* axis only (Option A → Option B: dedicated `Scheduler::idle: Option` slot, dispatched via `ready.dequeue().or(s.idle)` only when the ready queue is empty). ADR-0022's *typed-error* axis (Option G — `SchedError::Deadlock` + `IpcError::PendingAfterResume` + `start`'s panic) stands. Implemented by T-014 (Done 2026-05-07). Includes a queue-state simulation table that ADR-0022 lacked; this discipline (simulation tables on multi-step state-machine ADRs) is the central learning of the [B1 smoke-regression arc](../analysis/reviews/business-reviews/2026-05-06-B1-smoke-regression.md). - [ADR-0032 — Endpoint state rollback + `ipc_cancel_recv` primitive](../decisions/0032-endpoint-rollback-and-cancel-recv.md) — `Accepted` (2026-05-07). Adds a recovery primitive that reverses an `Idle → RecvWaiting` transition, called by `ipc_recv_and_yield`'s Phase 2 Deadlock branch so both *scheduler* and *endpoint* state restore to pre-call shape on `SchedError::Deadlock`. Kernel-internal in v1 (no userspace caller); future consumers are the userspace-driven endpoint destroy drain (B2+), multi-waiter wake (ADR-0019 §Open questions), and preemption-rollback (B5+). Implemented by T-015 (Done 2026-05-07). Includes a Phase-2 Deadlock simulation table; ADR-0017 §Revision notes rider records the additive recovery primitive (user-observable surface unchanged). The Accept commit is the first project-side application of [`write-adr` skill](../../.agents/skills/write-adr/SKILL.md) step 10's *careful re-read* discipline as a separate diff from the Propose commit. - [ADR-0027 — Kernel virtual memory layout (B2 — identity-mapped MMU activation)](../decisions/0027-kernel-virtual-memory-layout.md) — **`Accepted` (2026-05-08)**. 
B2 commits to identity-only mapping (kernel in `TTBR0_EL1`; `TTBR1_EL1` reserved with `EPD1=1` for future high-half ADR-0033 placeholder when B5 surfaces per-task `TTBR0_EL1` swap), 4 KiB granule + 48-bit VA + 4-level translation, MAIR indices 0/1 for device-nGnRnE / normal-cached, four bootstrap page-table frames in a new `.boot_pt` section, and a typed [`MapperFlush`](../../hal/src/mmu/mod.rs) flush-token discipline at the `Mmu` trait surface (additive change to `map`/`unmap` return types, recorded in ADR-0009 §Revision notes rider via T-016). Includes a five-row §Simulation table walking the SCTLR.M=1 transition (Steps 0–4). **First non-recovery-primitive state-machine ADR drafted under [`write-adr` skill §Simulation](../../.agents/skills/write-adr/SKILL.md) discipline** — ADR-0026's table was the empirical retro-source; ADR-0032's table was the first application but its subject is a recovery primitive; ADR-0027 is the first productive-design state machine to use the rule. Implementation: T-016 (Draft, opens with the Propose commit). Accept landed as a separate commit (`bb0a6ba`) per `write-adr` §10. -- **Next task to open:** **B5 — Syscall boundary: ADR-0030 (syscall ABI) + ADR-0031 (initial syscall set).** ADR-0030 settles the register calling convention + error-return convention + the K2-5 `IpcError::InvalidCapability` split (`StaleHandle` / `MissingRight` / `WrongObjectKind`); ADR-0031 settles the initial syscall set (`send`, `recv`, `console_write`, `task_yield`, `task_exit` — no more in v1). Then EL0→EL1 SVC dispatch + a panic-free dispatcher + validated copy-from/to-user + `Capability` `Debug` redaction (K3-9). ADR numbers tentative per [ADR-0013](../decisions/0013-roadmap-and-planning.md). B5 is the prerequisite for the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper, then B6 (first userspace "hello"). The [phase-b.md §B5](phases/phase-b.md) plan describes the milestone shape. 
**(The B4-closure §Adjustments — MR-009's "Miri green = Phase-B exit prerequisite" line in phase-b.md and the `current.md` 260→286 count fix — were applied as part of this closure.)** +- **Next task to open:** **T-021 — EL0→EL1 SVC dispatch** (trap trampoline + panic-free dispatcher + copy-from/to-user + the debug-console capability + `SyscallError`), opened `Ready` and picked up after T-020 lands. ADR-0030 + ADR-0031 are now **Accepted** (no longer tentative — the syscall numbers `1`–`5` + number-`0`-reserved are a fixed contract per ADR-0031). B5's remaining shape after T-020/T-021: the deferred [`task_create_from_image`](phases/phase-b.md#milestone-b4--task-loader) wrapper, then B6 (first userspace "hello"), where the real EL0 round-trip through the lower-EL `VBAR_EL1+0x400` vector is finally runtime-verified (T-021's B5 proxy only drives the current-EL `+0x200` path). The [phase-b.md §B5](phases/phase-b.md) plan describes the milestone shape. - **Next review trigger:** **B5 closure trio** — produced when the first B5 milestone reaches `In Review`. (The B4 closure trio fired 2026-05-28.) Possible interim triggers: a mini-retro if EL0/syscall bring-up surfaces a learning worth capturing mid-arc; a maintainer-initiated review or a second on-demand full-tree master review if the corpus drifts again before B5 closes. Forward-flag audit notes: UNSAFE-2026-0025 / 0026's `Pending QEMU smoke verification` notes were lifted by T-019 (first post-bootstrap `cap_map` / `cap_create_address_space` runtime exerciser); UNSAFE-2026-0019 / 0020 / 0021 continue to gate on the first deadline-arming caller (B5+). ## Notes diff --git a/docs/roadmap/phases/phase-b.md b/docs/roadmap/phases/phase-b.md index f754242..4248bff 100644 --- a/docs/roadmap/phases/phase-b.md +++ b/docs/roadmap/phases/phase-b.md @@ -214,11 +214,11 @@ Traps from EL0 into EL1 via `SVC` (or the chosen mechanism). Syscall dispatch va ### Acceptance criteria - ADR-0030 and ADR-0031 Accepted. 
-- Syscall entry works from EL0 back to EL1 and back; register state is preserved correctly. -- Invalid syscalls (bad number, missing capability, out-of-bounds pointer) return typed errors without panicking. +- The syscall **dispatch mechanism** works and preserves register state: the dispatcher is installed at both the current-EL (`VBAR_EL1+0x200`) and lower-EL (`+0x400`) sync vectors, and the round-trip is exercised at B5 via an **EL1 kernel-stub `SVC`** that takes the current-EL `0x200` vector (per [ADR-0030 §Simulation](../../decisions/0030-syscall-abi.md#simulation), an `SVC` issued at EL1 cannot take the lower-EL vector). The **real EL0 round-trip through the `0x400` vector** — with the EL0↔EL1 privilege transition and copy-user against a separate userspace `TTBR0_EL1` — requires kernel mappings in the userspace AS + an EL0 context register file (gated on the ADR-0033 high-half placeholder) and is therefore a **B6 acceptance criterion**, not B5. +- Invalid syscalls (bad number, missing/stale/wrong-kind capability, out-of-bounds pointer) return typed errors without panicking; every object-naming syscall performs a capability check (P1/P4). - Copy-from-user never dereferences raw user pointers outside the validated mapping. -- `IpcError` variants are split per ADR-0030's taxonomy; all call sites and tests updated. -- `Capability` `Debug` output redacts security-sensitive fields. +- `IpcError` variants are split per ADR-0030's taxonomy (`StaleHandle` / `WrongObjectKind` / `MissingRight`); all call sites and tests updated. *(Done — T-020.)* +- `Capability` (and `CapObject`) `Debug` output redacts the named kernel object. *(Done — T-020.)* ### Flags to resolve during B5 @@ -244,6 +244,7 @@ A real userspace task, loaded by B4, running in EL0 in its own address space, ma ### Acceptance criteria - Userspace "hello from userspace" appears on the serial console after the kernel's greeting. 
+- **The real EL0→EL1 syscall round-trip is exercised at runtime** (carried over from B5): a true EL0 task takes the lower-EL sync vector (`VBAR_EL1+0x400`), the dispatcher copies the `console_write` buffer from the userspace `TTBR0_EL1` address space, and `ERET` returns to EL0 — the half B5's EL1 kernel-stub proxy could not prove (see [ADR-0030 §Simulation](../../decisions/0030-syscall-abi.md#simulation)). - Userspace can call `task_exit` cleanly; the kernel reports task termination. - Guide: `docs/guides/first-userspace.md` committed. - Performance review recording IPC round-trip and context-switch numbers against the A6 baseline. From 7b35ed67762655f1fb9d2b2cf4130b9a7221e48b Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 10:54:49 +0300 Subject: [PATCH 07/12] test(ipc): make wrong-kind tests actually prove kind-before-rights MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Second-round review found the four *_wrong_object_kind tests handed the cap the operation's own right, so they returned WrongObjectKind under *both* the chosen kind-first order and a hypothetical rights-first regression — i.e. ordering-agnostic, proving nothing (a rights-first flip would not fail them). Fix: each test now uses a wrong-kind cap that also LACKS the required right (CapRights::empty()), the only input that discriminates the order (WrongObjectKind under kind-first, MissingRight under rights-first), so a regression to rights-first now fails the tests. Updates the section comment and T-020 AC#4 to state what each test actually proves; corrects T-020's stale test counts (AC#6 194 -> 196; review history "8 new" -> "9 new"). (The stale-variant references in the Turkish technical-analysis IPC chapter were also refreshed on disk for local reference, but that tree is gitignored, so it is not part of this commit / the repo.) No production code change; fmt / host-test 196 / clippy / build / miri (no UB) all green. 
Refs: ADR-0030 Co-Authored-By: Claude Opus 4.8 (1M context) --- .../phase-b/T-020-syscall-error-taxonomy.md | 6 +-- kernel/src/ipc/mod.rs | 37 ++++++++++++------- 2 files changed, 26 insertions(+), 17 deletions(-) diff --git a/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md index 40e6b1c..3ed9208 100644 --- a/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md +++ b/docs/analysis/tasks/phase-b/T-020-syscall-error-taxonomy.md @@ -28,9 +28,9 @@ This task deliberately **excludes** the trap trampoline, the dispatcher, `Syscal - [x] `IpcError::InvalidCapability` is removed and replaced by `IpcError::StaleHandle`, `IpcError::MissingRight`, `IpcError::WrongObjectKind`, each with a doc-comment describing its distinct meaning per [ADR-0030](../../../decisions/0030-syscall-abi.md). `IpcError` stays `#[non_exhaustive]`. - [x] Every production site is mapped to the correct variant: `validate_ep_cap` / `validate_notif_cap` (`ipc/mod.rs`) and `resolve_ep_cap` (`sched/mod.rs`) resolve in the order `StaleHandle → WrongObjectKind → MissingRight`; arena `get`/`get_mut` staleness failures map to `StaleHandle`. - [x] Every existing test asserting `InvalidCapability` is updated to its correct post-split variant (rights failures → `MissingRight`; stale-handle/destroyed-object → `StaleHandle`), and the `sched` bridge test is updated. -- [x] New host tests pin each new variant on a path that does **not** already prove it — `WrongObjectKind` for `ipc_send` / `ipc_recv` / `ipc_notify` (wrong-kind caps that carry the right, proving kind-before-rights), and `StaleHandle` for `ipc_send` on both a dropped cap handle and a destroyed endpoint. 
+- [x] New host tests pin each new variant on a path that does **not** already prove it — `WrongObjectKind` for `ipc_send` / `ipc_recv` / `ipc_notify` / `ipc_cancel_recv` using a **wrong-kind cap that also lacks the operation's right** (the only input that discriminates the kind-before-rights order: `WrongObjectKind` under kind-first, `MissingRight` under a rights-first regression), and `StaleHandle` for `ipc_send` (dropped cap handle + destroyed endpoint) and `ipc_cancel_recv` (destroyed endpoint). - [x] `Capability`'s `Debug` impl is custom (not derived) and prints `Capability { rights: , object: }` — `rights` visible, the named object redacted — per [ADR-0030 §"Security of the taxonomy split"](../../../decisions/0030-syscall-abi.md#security-of-the-taxonomy-split) and the K3-9 redaction requirement. **`CapObject` is also redacted** (kind-only `Debug`, hiding the wrapped handle) — closing the defense-in-depth gap the adversarial review raised. Two host tests pin both layers. -- [x] All gates green: `cargo fmt --all -- --check`, `cargo host-test` (194 kernel), `cargo host-clippy`, `cargo kernel-clippy`, `cargo kernel-build`, and `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` (no UB). +- [x] All gates green: `cargo fmt --all -- --check`, `cargo host-test` (196 kernel / 43 hal / 53 test-hal), `cargo host-clippy`, `cargo kernel-clippy`, `cargo kernel-build`, and `cargo miri test --workspace --exclude tyrne-bsp-qemu-virt` (no UB). - [x] Docs updated: [`docs/architecture/ipc.md`](../../../architecture/ipc.md) §"`IpcError` taxonomy", [`docs/architecture/security-model.md`](../../../architecture/security-model.md) redaction rule broadened to capabilities, [`docs/glossary.md`](../../../glossary.md) syscall terms, the ADR index, and the [ADR-0017](../../../decisions/0017-ipc-primitive-set.md) §Revision-notes rider. ## Out of scope @@ -75,6 +75,6 @@ All acceptance criteria checked; gates green (incl. 
Miri — a [Phase-B exit pre ## Review history -- **2026-05-29 — Implementation complete; `In Progress → In Review`.** Landed the K2-5 `IpcError` split (`StaleHandle` / `WrongObjectKind` / `MissingRight`, validation reordered to resolve→type-check→authority across `validate_ep_cap` / `validate_notif_cap` / `sched::resolve_ep_cap`; 4 arena-staleness sites → `StaleHandle`) and the K3-9 redaction (`Capability` + `CapObject` `Debug` redacted). Tests: 6 existing assertions remapped + 8 new variant/redaction tests (kernel suite **187 → 196**). **Row-to-verification mapping** ([ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation)): row 3 (IPC error taxonomy) → `send_with_wrong_object_kind_returns_wrong_object_kind`, `recv_with_wrong_object_kind_returns_wrong_object_kind`, `notify_with_wrong_object_kind_returns_wrong_object_kind`, `cancel_recv_with_wrong_object_kind_returns_wrong_object_kind`, `send_with_dropped_cap_handle_returns_stale_handle`, `send_to_destroyed_endpoint_returns_stale_handle`, `cancel_recv_to_destroyed_endpoint_returns_stale_handle`, `notify_with_stale_handle_after_slot_reuse_fails`, plus the remapped `*_without_*_right_fails` (`MissingRight`) tests and the `sched` bridge `ipc_send_and_yield_send_error_preserves_scheduler_state` test — so all four operations (`ipc_send` / `ipc_recv` / `ipc_notify` / `ipc_cancel_recv`) pin all three variants; the K3-9 redaction → `cap::tests::debug_redacts_named_object_but_keeps_rights` + `capobject_debug_redacts_handle_but_shows_kind`. Gates all green: `fmt`, `host-test` (196 kernel / 43 hal / 53 test-hal), `host-clippy`, `kernel-clippy`, `kernel-build`, `miri` (no UB; Stacked Borrows). 
+- **2026-05-29 — Implementation complete; `In Progress → In Review`.** Landed the K2-5 `IpcError` split (`StaleHandle` / `WrongObjectKind` / `MissingRight`, validation reordered to resolve→type-check→authority across `validate_ep_cap` / `validate_notif_cap` / `sched::resolve_ep_cap`; 4 arena-staleness sites → `StaleHandle`) and the K3-9 redaction (`Capability` + `CapObject` `Debug` redacted). Tests: 6 existing assertions remapped + 9 new variant/redaction tests (kernel suite **187 → 196**: 5 `ipc` variant + 2 `cancel_recv` variant + 2 `cap` redaction). **Row-to-verification mapping** ([ADR-0030 §Simulation](../../../decisions/0030-syscall-abi.md#simulation)): row 3 (IPC error taxonomy) → `send_with_wrong_object_kind_returns_wrong_object_kind`, `recv_with_wrong_object_kind_returns_wrong_object_kind`, `notify_with_wrong_object_kind_returns_wrong_object_kind`, `cancel_recv_with_wrong_object_kind_returns_wrong_object_kind`, `send_with_dropped_cap_handle_returns_stale_handle`, `send_to_destroyed_endpoint_returns_stale_handle`, `cancel_recv_to_destroyed_endpoint_returns_stale_handle`, `notify_with_stale_handle_after_slot_reuse_fails`, plus the remapped `*_without_*_right_fails` (`MissingRight`) tests and the `sched` bridge `ipc_send_and_yield_send_error_preserves_scheduler_state` test — so all four operations (`ipc_send` / `ipc_recv` / `ipc_notify` / `ipc_cancel_recv`) pin all three variants; the K3-9 redaction → `cap::tests::debug_redacts_named_object_but_keeps_rights` + `capobject_debug_redacts_handle_but_shows_kind`. Gates all green: `fmt`, `host-test` (196 kernel / 43 hal / 53 test-hal), `host-clippy`, `kernel-clippy`, `kernel-build`, `miri` (no UB; Stacked Borrows). - **2026-05-29 — Adversarial multi-lens self-review** (security / correctness / completeness / design). Verdict: **no Blocker/Major findings**; the split's reordering is consistent across the direct and scheduler-bridge paths, the `format_args!` redaction is sound, and `EndpointState` cannot leak. 
One Minor defense-in-depth finding (`CapObject`/handle `Debug` derives) was folded in for `CapObject` (see §Design notes); one Nit (test-module `#[allow]` pragma) was assessed and intentionally not applied (the test uses only `format!`/`assert!`, triggering none of the forbidden pragmas). - **2026-05-29 — Maintainer review (post-rebase).** A maintainer review of the ADR/task arc raised: (Major) the same-day ADR corrections had been folded into the *Accepted* bodies (append-only concern) — resolved by rebasing the branch so the corrections land in the `Proposed` draft and Accept is a separate clean commit (no Accepted body edited post-Accept); (Major) the phase-b §B5 acceptance criterion over-promised a real EL0 round-trip — narrowed to the current-EL kernel-stub mechanism, with the real `0x400` EL0 round-trip moved to §B6; (Minor) `current.md` banner/links and (Minor) the ADR-0030 row-3 mapping over-claiming `cancel_recv` coverage — the latter resolved by adding `cancel_recv_with_wrong_object_kind_*` + `cancel_recv_to_destroyed_endpoint_*` tests; (Nit) the arena-staleness ordering caveat added to ADR-0030 §K2-5. No code-correctness or security bug was found by either review. diff --git a/kernel/src/ipc/mod.rs b/kernel/src/ipc/mod.rs index 548bb3d..5190723 100644 --- a/kernel/src/ipc/mod.rs +++ b/kernel/src/ipc/mod.rs @@ -955,11 +955,15 @@ mod tests { // // These pin the two split variants that the pre-existing rights-failure // tests above (now `MissingRight`) do not reach. The `WrongObjectKind` - // tests deliberately give the cap the operation's right too, proving the - // kind check runs *before* the rights check (ADR-0030 ordering: a - // wrong-kind cap fails with `WrongObjectKind` even when it carries the - // right). 
`StaleHandle` is exercised on both the table-lookup-miss path - // (a dropped cap handle) and the arena-staleness path (a destroyed + // tests use a wrong-kind cap that ALSO lacks the operation's right — + // the *only* input that discriminates the kind-before-rights ordering + // ADR-0030 §K2-5 specifies. Under the chosen order (kind → rights) the + // result is `WrongObjectKind`; under a hypothetical rights-first + // regression the same cap would return `MissingRight`. So a flip to + // rights-first would flip the asserted variant and fail these tests + // (a cap that *carries* the right would be ordering-agnostic and prove + // nothing). `StaleHandle` is exercised on both the table-lookup-miss + // path (a dropped cap handle) and the arena-staleness path (a destroyed // endpoint whose cap still resolves in the table). #[test] @@ -967,9 +971,11 @@ mod tests { let mut table = CapabilityTable::new(); let mut ep_arena = EndpointArena::default(); let mut queues = IpcQueues::new(); - // A Task cap that even carries SEND — but it is not an endpoint. + // A Task cap (wrong kind) that also lacks SEND. Kind-first → the + // result is WrongObjectKind; a rights-first order would return + // MissingRight, so this discriminates the ADR-0030 ordering. let cap_h = table - .insert_root(Capability::new(CapRights::SEND, task_object(1))) + .insert_root(Capability::new(CapRights::empty(), task_object(1))) .unwrap(); assert_eq!( ipc_send( @@ -990,9 +996,10 @@ mod tests { let mut table = CapabilityTable::new(); let mut ep_arena = EndpointArena::default(); let mut queues = IpcQueues::new(); - // A Task cap that even carries RECV — but it is not an endpoint. + // Wrong kind (Task) and lacks RECV → WrongObjectKind under kind-first, + // MissingRight under rights-first; the assertion pins kind-first. 
let cap_h = table - .insert_root(Capability::new(CapRights::RECV, task_object(1))) + .insert_root(Capability::new(CapRights::empty(), task_object(1))) .unwrap(); assert_eq!( ipc_recv(&mut ep_arena, &mut queues, cap_h, &mut table).unwrap_err(), @@ -1004,9 +1011,10 @@ mod tests { fn notify_with_wrong_object_kind_returns_wrong_object_kind() { let mut table = CapabilityTable::new(); let mut notif_arena = NotificationArena::default(); - // A Task cap that even carries NOTIFY — but it is not a notification. + // Wrong kind (Task) and lacks NOTIFY → WrongObjectKind under kind-first, + // MissingRight under rights-first; the assertion pins kind-first. let cap_h = table - .insert_root(Capability::new(CapRights::NOTIFY, task_object(2))) + .insert_root(Capability::new(CapRights::empty(), task_object(2))) .unwrap(); assert_eq!( ipc_notify(&mut notif_arena, cap_h, &table, 0xFF).unwrap_err(), @@ -1599,13 +1607,14 @@ mod tests { #[test] fn cancel_recv_with_wrong_object_kind_returns_wrong_object_kind() { - // Symmetric to the send/recv wrong-kind tests: a cap that carries - // RECV but names a Task (not an endpoint) fails kind-before-rights. + // Symmetric to the send/recv wrong-kind tests: a Task cap (wrong kind) + // that also lacks RECV. Kind-first → WrongObjectKind; a rights-first + // order would return MissingRight, so this discriminates the ordering. 
let mut table = CapabilityTable::new(); let mut ep_arena = EndpointArena::default(); let mut queues = IpcQueues::new(); let cap_h = table - .insert_root(Capability::new(CapRights::RECV, task_object(1))) + .insert_root(Capability::new(CapRights::empty(), task_object(1))) .unwrap(); assert_eq!( ipc_cancel_recv(&mut ep_arena, &mut queues, cap_h, &table).unwrap_err(), From 806c9660041929483c784e8c34087686512da350 Mon Sep 17 00:00:00 2001 From: Cemil ILIK Date: Fri, 29 May 2026 12:21:56 +0300 Subject: [PATCH 08/12] =?UTF-8?q?feat(syscalls):=20EL0=E2=86=92EL1=20SVC?= =?UTF-8?q?=20dispatch=20=E2=80=94=20trampoline,=20panic-free=20dispatcher?= =?UTF-8?q?,=20copy-user?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Land the security-critical hardware-facing half of B5 (T-021): the EL0→EL1 SVC trap path that instantiates ADR-0030's calling convention and ADR-0031's five-syscall v1 set. New architecture-agnostic, panic-free, host-tested kernel `syscall` module: - error.rs: SyscallError composing CapError/IpcError via From, with a stable numeric status encoding (0 = Ok; 1-3 top-level; 0x10x = Cap; 0x20x = Ipc). - abi.rs: SyscallNumber decode (release debug-gate on console_write via cfg!(debug_assertions)), the register frame types, value↔register packing for Message/outcomes, and the Option null-handle sentinel. - user_access.rs: UserAccessWindow + validated copy_from_user/copy_to_user (range-check-then-copy; wrap and zero-length handled; never derefs an unvalidated user pointer). - dispatch.rs: the panic-free dispatcher + per-syscall handlers + the debug-console capability check; control-plane syscalls (task_yield/exit) return a SyscallEffect directive rather than touching the scheduler. Capability surface: CapObject::DebugConsole (singleton, no handle) + CapRights::CONSOLE_WRITE (bit 7, added to KNOWN_BITS) + CapHandle::from_raw (ABI-decode constructor; reconstructed handles are validated by lookup). 
BSP (hardware-facing): tyrne_sync_trampoline in vectors.s installed at both VBAR_EL1+0x200 (current-EL, the B5 path) and +0x400 (lower-EL AArch64, the B6 EL0 path) — saves the full x0-x30 + SP_EL0 + ELR_EL1 + SPSR_EL1 frame, routes ESR_EL1.EC==SVC64 to a Rust syscall_entry, else to the existing panic path. SyscallTrapFrame (272 B, #[repr(C)], const-asserted to match the asm). kernel_entry runs an EL1 kernel-stub SVC smoke (console_write + bad-number). Gates: fmt / host-clippy / kernel-clippy / kernel-build clean; host tests 236 (+40); cargo test --release green (the debug-gate release-path tests); cargo miri test --workspace --exclude tyrne-bsp-qemu-virt clean (43+236+53). QEMU smoke (debug): two SVCs taken at the current-EL vector (ESR 0x15/SVC64, EL1→EL1), clean ERET; console_write emits its buffer via the syscall path (status 0x0, 63 bytes); a reserved-invalid number returns BadSyscallNumber (0x1); -d int,unimp,guest_errors shows no new fault class; the cooperative demo still runs to "tyrne: all tasks complete". The real EL0 +0x400 round-trip (EL0↔EL1 transition + copy-user against a separate userspace TTBR0_EL1) is wired but runtime-verified in B6 per ADR-0030 §Simulation. 
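Illustration only (not part of the change): a minimal sketch of the range-check-then-copy idea behind the copy-user helpers, written against a plain byte slice rather than raw user pointers. `Window`, `AccessError`, `checked_range` and `copy_from_window` are hypothetical names, not the API of `user_access.rs`, and treating a zero-length copy as an immediate success is an assumption of this sketch.

```rust
use std::ops::Range;

// Hypothetical stand-ins for illustration; NOT the kernel's real types.
#[derive(Debug, PartialEq, Eq)]
enum AccessError {
    OutOfWindow,
}

/// A permitted address range [base, base + len).
struct Window {
    base: usize,
    len: usize,
}

impl Window {
    /// Returns the byte range only if [addr, addr + len) lies wholly inside
    /// the window. `checked_add` rejects address arithmetic that would wrap.
    fn checked_range(&self, addr: usize, len: usize) -> Option<Range<usize>> {
        let end = addr.checked_add(len)?;
        let window_end = self.base.checked_add(self.len)?;
        if addr >= self.base && end <= window_end {
            Some(addr..end)
        } else {
            None
        }
    }
}

/// Validates the whole source range first, then copies. `mem` simulates the
/// address space, indexed from address 0.
fn copy_from_window(
    mem: &[u8],
    window: &Window,
    addr: usize,
    dst: &mut [u8],
) -> Result<usize, AccessError> {
    // Assumption of this sketch: a zero-length copy succeeds without
    // inspecting the address at all.
    if dst.is_empty() {
        return Ok(0);
    }
    let range = window
        .checked_range(addr, dst.len())
        .ok_or(AccessError::OutOfWindow)?;
    // `get` keeps the lookup panic-free even if `mem` is shorter than the window.
    let src = mem.get(range).ok_or(AccessError::OutOfWindow)?;
    dst.copy_from_slice(src);
    Ok(dst.len())
}
```

The point of the shape is that no byte is read until the full range has been validated, and that every failure surfaces as a typed error rather than a panic.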
Refs: ADR-0030, ADR-0031 Audit: UNSAFE-2026-0029, UNSAFE-2026-0030 Co-Authored-By: Claude Opus 4.8 (1M context) --- bsp-qemu-virt/src/main.rs | 123 +++ bsp-qemu-virt/src/syscall.rs | 206 +++++ bsp-qemu-virt/src/vectors.s | 116 ++- .../tasks/phase-b/T-021-syscall-dispatch.md | 67 +- docs/architecture/exceptions.md | 43 +- docs/audits/unsafe-log.md | 41 + docs/roadmap/current.md | 12 +- docs/roadmap/phases/phase-b.md | 10 +- kernel/src/cap/mod.rs | 23 + kernel/src/cap/rights.rs | 14 +- kernel/src/cap/table.rs | 28 +- kernel/src/lib.rs | 6 + kernel/src/syscall/abi.rs | 459 ++++++++++ kernel/src/syscall/dispatch.rs | 814 ++++++++++++++++++ kernel/src/syscall/error.rs | 250 ++++++ kernel/src/syscall/mod.rs | 52 ++ kernel/src/syscall/user_access.rs | 350 ++++++++ 17 files changed, 2583 insertions(+), 31 deletions(-) create mode 100644 bsp-qemu-virt/src/syscall.rs create mode 100644 kernel/src/syscall/abi.rs create mode 100644 kernel/src/syscall/dispatch.rs create mode 100644 kernel/src/syscall/error.rs create mode 100644 kernel/src/syscall/mod.rs create mode 100644 kernel/src/syscall/user_access.rs diff --git a/bsp-qemu-virt/src/main.rs b/bsp-qemu-virt/src/main.rs index dfdcc91..4345000 100644 --- a/bsp-qemu-virt/src/main.rs +++ b/bsp-qemu-virt/src/main.rs @@ -49,6 +49,7 @@ mod exceptions; mod gic; mod mmu; mod mmu_bootstrap; +mod syscall; use console::Pl011Uart; use cpu::QemuVirtCpu; @@ -384,6 +385,16 @@ static EP_CAP_A: StaticCell = StaticCell::new(); /// Task B's endpoint capability handle (index into `TABLE_B`). static EP_CAP_B: StaticCell = StaticCell::new(); +// ─── T-021 syscall-boundary smoke ───────────────────────────────────────────── + +/// The EL1 kernel-stub's capability table — the `caller_table` the syscall +/// dispatcher resolves capabilities in for the B5 `SVC` smoke (see +/// [`syscall::syscall_entry`]). 
In B5 the only `SVC` comes from a kernel-stub, +/// so it has a dedicated table holding a single debug-console capability; +/// B6 replaces this with the scheduler's current-task table once a real EL0 +/// task exists. Distinct from `TABLE_A` / `TABLE_B` (the IPC-demo tables). +static SYSCALL_STUB_TABLE: StaticCell = StaticCell::new(); + /// Task kernel-object arena — global per [ADR-0016]. Although the v1 demo /// never reads this arena after `create_task` has returned the two /// `TaskHandle`s, global storage is the uniform pattern established by @@ -653,6 +664,109 @@ fn task_a() -> ! { } } +// ─── T-021 syscall-boundary smoke ────────────────────────────────────────────── + +/// EL1 kernel-stub `SVC` smoke for the B5 syscall boundary ([T-021]). +/// +/// Issues two `SVC #0` traps **from EL1** — exercising the current-EL +/// `VBAR_EL1 + 0x200` sync vector and the full save → decode → dispatch → +/// `ERET` round-trip (an `SVC` issued at EL1 cannot take the lower-EL `+0x400` +/// vector; that real-EL0 path is B6's smoke per [ADR-0030 §Simulation]): +/// +/// 1. **`console_write`** (number `5`) through a granted debug-console +/// capability — the dispatcher's capability check passes, `copy_from_user` +/// validates the buffer against the active address space, and the bytes are +/// emitted on the serial console (the round-trip + emitted-bytes half of B5 +/// acceptance criterion #7). +/// 2. a **reserved-invalid number** (`0`) — the panic-free error path returns +/// `SyscallError::BadSyscallNumber` (status `0x1`) without touching any +/// capability. +/// +/// Runs after the IPC statics are published (the dispatcher's +/// [`SyscallContext`][tyrne_kernel::syscall::SyscallContext] borrows +/// `EP_ARENA` / `IPC_QUEUES`) and before `start()`. `task_yield` / `task_exit` +/// are not driven here — their dispatcher routing is host-tested; their real +/// EL0 semantics land in B6. 
+/// +/// [T-021]: https://github.com/HodeTech/Tyrne/blob/main/docs/analysis/tasks/phase-b/T-021-syscall-dispatch.md +/// [ADR-0030 §Simulation]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0030-syscall-abi.md +#[allow( + clippy::cast_possible_truncation, + reason = "Tyrne's BSP target is 64-bit aarch64; pointer/usize → u64 \ + register-word casts are lossless" +)] +fn syscall_boundary_smoke(console: &Pl011Uart) { + // Mint a debug-console capability into the kernel-stub's table. + // + // SAFETY: `SYSCALL_STUB_TABLE` lives in `.bss`; this is its single write, + // performed before any `SVC` issues. The momentary `&mut` for the + // `insert_root` drops before the trap. Audit: UNSAFE-2026-0010 (StaticCell) + // + UNSAFE-2026-0014 (momentary `&mut`). + let cons_cap = unsafe { + (*SYSCALL_STUB_TABLE.0.get()).write(CapabilityTable::new()); + let table = (*SYSCALL_STUB_TABLE.0.get()).assume_init_mut(); + table + .insert_root(Capability::new( + CapRights::CONSOLE_WRITE, + CapObject::DebugConsole, + )) + .expect("debug-console cap mint in empty table cannot fail") + }; + let cons_cap_word = tyrne_kernel::syscall::encode_cap_handle(Some(cons_cap)); + + // (1) console_write via SVC: x8 = 5, x0 = cap, x1 = buffer VA, x2 = length. + let greeting: &[u8] = b"tyrne: hello from the syscall boundary (console_write via SVC)\n"; + let ptr = greeting.as_ptr() as u64; + let len = greeting.len() as u64; + let status: u64; + let written: u64; + // SAFETY: `SVC #0` traps to the EL1 current-EL sync vector (+0x200), runs + // the panic-free dispatcher, and `ERET`s back here. The convention is + // x8 = number, x0..x2 = args; the handler writes x0 = status, x1 = bytes + // written, clobbers x0..x7, and preserves x8..x30 + SP_EL0. The emitted + // greeting bytes are the observable round-trip proof. Audit: UNSAFE-2026-0029. 
+ unsafe { + core::arch::asm!( + "svc #0", + in("x8") 5u64, + inout("x0") cons_cap_word => status, + inout("x1") ptr => written, + in("x2") len, + out("x3") _, + out("x4") _, + out("x5") _, + out("x6") _, + out("x7") _, + ); + } + + // (2) reserved-invalid number 0 → BadSyscallNumber, panic-free. + let bad_status: u64; + // SAFETY: same `SVC` trap mechanism; number 0 is reserved-invalid, so the + // dispatcher returns a typed `SyscallError::BadSyscallNumber` in x0 without + // touching any capability or panicking. Audit: UNSAFE-2026-0029. + unsafe { + core::arch::asm!( + "svc #0", + in("x8") 0u64, + out("x0") bad_status, + out("x1") _, + out("x2") _, + out("x3") _, + out("x4") _, + out("x5") _, + out("x6") _, + out("x7") _, + ); + } + + let mut w = FmtWriter(console); + let _ = writeln!( + w, + "tyrne: syscall smoke ok (console_write status={status:#x}, bytes={written}; bad-number status={bad_status:#x})" + ); +} + // ─── Boot entry ─────────────────────────────────────────────────────────────── // Reset entry (`_start`). See `boot.s` and `docs/architecture/boot.md`. @@ -1216,6 +1330,15 @@ pub extern "C" fn kernel_entry() -> ! { (*EP_CAP_B.0.get()).write(ep_cap_b); } + // ── Syscall-boundary smoke — T-021 ──────────────────────────────────────── + // + // Exercise the EL0→EL1 `SVC` trap → panic-free dispatcher → `ERET` + // round-trip via an EL1 kernel-stub (the current-EL `+0x200` vector). Runs + // here, after the IPC statics the dispatcher's context borrows are live, and + // before `start()` hands control to the cooperative demo. The real EL0 + // (`+0x400`) round-trip is B6's smoke. + syscall_boundary_smoke(console); + // ── Scheduler setup ─────────────────────────────────────────────────────── let mut sched = Scheduler::::new(); diff --git a/bsp-qemu-virt/src/syscall.rs b/bsp-qemu-virt/src/syscall.rs new file mode 100644 index 0000000..4a92138 --- /dev/null +++ b/bsp-qemu-virt/src/syscall.rs @@ -0,0 +1,206 @@ +//! 
BSP-side syscall glue: the `SVC` trap frame and the Rust entry the +//! `vectors.s` sync trampoline calls. +//! +//! The architecture-agnostic, panic-free dispatch logic lives in the kernel +//! ([`tyrne_kernel::syscall`]). This module owns only the **hardware-facing** +//! half: +//! +//! - [`SyscallTrapFrame`] — the `#[repr(C)]` mirror of the register frame the +//! `tyrne_sync_trampoline` in `vectors.s` saves (`x0`–`x30` + `SP_EL0` + +//! `ELR_EL1` + `SPSR_EL1`); its field order and offsets must match the asm +//! `stp` sequence byte-for-byte (a compile-time `size_of` guard catches drift). +//! - [`syscall_entry`] — reads the syscall number + arguments from the saved +//! frame, builds a [`SyscallContext`] from the BSP statics, calls +//! [`tyrne_kernel::syscall::dispatch`], and applies the returned +//! [`SyscallEffect`] by writing the status + payload back into the frame. +//! +//! ## B5 scope and the `0x200` / `0x400` split +//! +//! The shared trampoline is installed at **both** sync vector slots — current-EL +//! (`VBAR_EL1 + 0x200`) and lower-EL-AArch64 (`VBAR_EL1 + 0x400`) — because the +//! save → dispatch → `ERET` mechanism is privilege-entry-agnostic. In B5 the +//! only `SVC` comes from an **EL1 kernel-stub** (see `kernel_entry`'s syscall +//! smoke), which — executing at the *current* EL — takes the `0x200` vector, +//! **not** the lower-EL `0x400` vector. A real EL0 task taking the `0x400` +//! vector (with the EL0↔EL1 privilege transition and copy-user against a +//! separate userspace `TTBR0_EL1`) is verified at runtime in **B6**, per +//! [ADR-0030 §Simulation row-to-verification mapping][adr-0030]. The `0x400` +//! handler is installed now so B6 adds only the EL0 task, not new trap plumbing. +//! +//! `caller_table` is a dedicated **kernel-stub** capability table in B5 +//! ([`crate::SYSCALL_STUB_TABLE`]); B6 replaces it with the scheduler's +//! current-task table once a real EL0 task exists. +//! +//! 
Audit: UNSAFE-2026-0029 (the trap-frame asm + this entry's frame +//! reads/writes). +//! +//! [adr-0030]: https://github.com/HodeTech/Tyrne/blob/main/docs/decisions/0030-syscall-abi.md + +use tyrne_kernel::syscall::{ + dispatch, SyscallArgs, SyscallContext, SyscallEffect, UserAccessWindow, +}; + +/// Saved-register frame the `tyrne_sync_trampoline` in `vectors.s` populates +/// before branching into [`syscall_entry`] on an `SVC`. +/// +/// `#[repr(C)]` is **mandatory**: the field order and byte offsets must match +/// the asm `stp` sequence in `vectors.s` exactly. The frame is 272 bytes total +/// (`x0`–`x29` as 15 pairs, then `x30`/`SP_EL0`, then `ELR_EL1`/`SPSR_EL1`), +/// 16-byte SP-aligned. Unlike the IRQ [`TrapFrame`][crate::exceptions::TrapFrame] +/// (which saves only the AAPCS64 caller-saved set), the syscall frame saves the +/// **full** general-purpose register file plus `SP_EL0` so it is a complete +/// snapshot of the trapped context — the shape a real EL0 task (B6) and any +/// future preemption arc require. +/// +/// Fields are private: the only reader/writer is [`syscall_entry`] in this +/// module, and keeping the raw register snapshot un-`pub` avoids exposing +/// (or accidentally logging) trapped register contents elsewhere. +#[repr(C)] +pub struct SyscallTrapFrame { + // `x0`–`x29` saved as 15 consecutive pairs at offsets 0x00..0xF0. + x0_x1: [u64; 2], + x2_x3: [u64; 2], + x4_x5: [u64; 2], + x6_x7: [u64; 2], + x8_x9: [u64; 2], + x10_x11: [u64; 2], + x12_x13: [u64; 2], + x14_x15: [u64; 2], + x16_x17: [u64; 2], + x18_x19: [u64; 2], + x20_x21: [u64; 2], + x22_x23: [u64; 2], + x24_x25: [u64; 2], + x26_x27: [u64; 2], + x28_x29: [u64; 2], + /// `x30` (LR) at 0xF0 and `SP_EL0` at 0xF8. + x30_sp_el0: [u64; 2], + /// `ELR_EL1` (return address) at 0x100 and `SPSR_EL1` (saved PSTATE) at 0x108. + elr_spsr: [u64; 2], +} + +// The trampoline reserves exactly 272 bytes and writes through fixed offsets +// mirroring the field order above. 
A size/layout drift between the asm and this +// `#[repr(C)]` would corrupt saved registers on every syscall; this guard fails +// the build before that can ship. (Mirrors the `TrapFrame` 192-byte guard.) +const _: () = assert!(core::mem::size_of::() == 272); + +/// Length of the syscall copy-from/to-user window in B5: the whole +/// identity-mapped RAM extent the bootstrap address space covers. +/// +/// The B5 EL1 kernel-stub runs on the bootstrap AS, which identity-maps the +/// managed extent (per [ADR-0027 §Decision outcome (a)]), so the stub's buffer +/// — a `.rodata`-resident `&[u8]` in the kernel image — is in range. B6's real +/// EL0 task derives a tighter window from its own mapped region (see +/// [`UserAccessWindow`]'s module docs). The subtraction is a `const` (no runtime +/// arithmetic): `PMM_EXTENT_END > PMM_EXTENT_START` by construction. +const SYSCALL_USER_WINDOW_LEN: usize = crate::PMM_EXTENT_END - crate::PMM_EXTENT_START; + +/// Rust entry for the `SVC` sync trampoline (`vectors.s`). +/// +/// Reads the syscall number (`x8`) and arguments (`x0`–`x5`) from the saved +/// `frame`, dispatches through [`tyrne_kernel::syscall::dispatch`], and applies +/// the resulting [`SyscallEffect`] by writing the status (`x0`) and payload +/// (`x1`–`x7`) back into the frame. Returns to the trampoline, which restores +/// the (now result-bearing) frame and `ERET`s. +/// +/// # Safety +/// +/// `extern "C"` so the asm trampoline can `bl` it. `frame` is guaranteed valid +/// by the trampoline (constructed via `stp` immediately before the `bl`, on the +/// kernel stack); this function dereferences it only inside `unsafe` blocks. +/// +/// **Why `unsafe` is required.** The function reads and writes the saved +/// register frame through a raw `*mut SyscallTrapFrame` (the asm calling +/// convention passes a pointer, not a `&mut`), and it materialises momentary +/// references to the write-once BSP statics via `assume_init_{mut,ref}`. 
+/// **Invariants upheld.** (1) The four statics it reaches +/// (`EP_ARENA` / `IPC_QUEUES` / `SYSCALL_STUB_TABLE` / `CONSOLE`) are all +/// written before the syscall smoke issues any `SVC`; (2) v1 is single-core and +/// the `SVC` handler runs with interrupts masked (exception entry masks `DAIF`), +/// so no peer aliases them mid-call; (3) the momentary `&mut`s are scoped to the +/// single `dispatch` call and do not cross a context switch — the data-plane +/// syscalls do not switch and the control-plane ones return a directive *before* +/// any switch, honouring the [ADR-0021] discipline; (4) the frame writes touch +/// only `x0`–`x7`, leaving the trampoline's restore of `x8`–`x30` + `SP_EL0` + +/// `ELR_EL1` + `SPSR_EL1` intact. **Rejected alternatives.** Passing a `&mut +/// SyscallTrapFrame` from the asm is impossible (asm has no Rust references); +/// holding the BSP statics behind a lock would deadlock the interrupts-masked +/// handler with no soundness gain under single-core cooperative semantics. +/// +/// Audit: UNSAFE-2026-0029 (trap-frame asm + frame access) + UNSAFE-2026-0010 +/// (`StaticCell` pattern) + UNSAFE-2026-0014 (momentary `&mut` to kernel state). +#[unsafe(no_mangle)] +pub unsafe extern "C" fn syscall_entry(frame: *mut SyscallTrapFrame) { + // SAFETY: `frame` is valid per the trampoline contract above; read the + // syscall number (x8) and argument words (x0..x5) out of the saved frame. + // Audit: UNSAFE-2026-0029. + let args = unsafe { + let f = &*frame; + SyscallArgs { + number: f.x8_x9[0], + args: [ + f.x0_x1[0], f.x0_x1[1], f.x2_x3[0], f.x2_x3[1], f.x4_x5[0], f.x4_x5[1], + ], + } + }; + + // SAFETY: build the dispatch context from the write-once BSP statics. All + // four are initialised in `kernel_entry` before the syscall smoke runs; + // single-core + interrupts-masked-in-handler means no aliasing; the + // momentary `&mut`s drop at the end of the `dispatch` call and never cross a + // switch. 
Audit: UNSAFE-2026-0010 (StaticCell) + UNSAFE-2026-0014 (momentary + // `&mut` to kernel state) + UNSAFE-2026-0029 (the syscall arc). + let effect = unsafe { + let mut ctx = SyscallContext { + ep_arena: (*crate::EP_ARENA.0.get()).assume_init_mut(), + queues: (*crate::IPC_QUEUES.0.get()).assume_init_mut(), + caller_table: (*crate::SYSCALL_STUB_TABLE.0.get()).assume_init_mut(), + console: (*crate::CONSOLE.0.get()).assume_init_ref(), + user_window: UserAccessWindow::new(crate::PMM_EXTENT_START, SYSCALL_USER_WINDOW_LEN), + }; + dispatch(&mut ctx, args) + }; + + match effect { + SyscallEffect::Resume(r) => { + // SAFETY: write the status (x0) + payload (x1..x7) back into the + // saved frame; the trampoline restores them on `ERET`. Touches only + // x0..x7. Audit: UNSAFE-2026-0029. + unsafe { + let f = &mut *frame; + f.x0_x1[0] = r.status; // x0 = status + f.x0_x1[1] = r.payload[0]; // x1 + f.x2_x3[0] = r.payload[1]; // x2 + f.x2_x3[1] = r.payload[2]; // x3 + f.x4_x5[0] = r.payload[3]; // x4 + f.x4_x5[1] = r.payload[4]; // x5 + f.x6_x7[0] = r.payload[5]; // x6 + f.x6_x7[1] = r.payload[6]; // x7 + } + } + SyscallEffect::Reschedule => { + // task_yield. v1 B5 stand-in: there is no scheduler-resident EL0 + // task issuing this (the smoke runs the stub before `start()`), so + // the real `yield_now` wiring lands in B6 once the caller is an EL0 + // task. The dispatcher-level routing (number 3 → Reschedule) is + // host-tested; here we resume with `Ok` (x0 = 0) — task_yield + // "always succeeds in v1" per ADR-0031. + // SAFETY: write x0 only. Audit: UNSAFE-2026-0029. + unsafe { + (*frame).x0_x1[0] = tyrne_kernel::syscall::OK_STATUS; + } + } + SyscallEffect::Terminate(_code) => { + // task_exit. The ABI says "does not return", but v1 has no EL0 + // context register file to drop — real termination lands in B6. 
The + // dispatcher-level routing (number 4 → Terminate) is host-tested; + // here we defensively resume with `Ok` so a stray kernel-stub + // task_exit cannot wedge the boot before B6 wires real termination. + // SAFETY: write x0 only. Audit: UNSAFE-2026-0029. + unsafe { + (*frame).x0_x1[0] = tyrne_kernel::syscall::OK_STATUS; + } + } + } +} diff --git a/bsp-qemu-virt/src/vectors.s b/bsp-qemu-virt/src/vectors.s index 3da5554..1224f3b 100644 --- a/bsp-qemu-virt/src/vectors.s +++ b/bsp-qemu-virt/src/vectors.s @@ -27,8 +27,14 @@ * * Tyrne runs at EL1 with SPSel = 1 (per ADR-0024's EL drop + * SPSR_EL2 = 0x3c5 = EL1h). An IRQ taken from kernel code lands at - * +0x280; userspace doesn't exist in v1 so the lower-EL entries are - * unreachable. Sync/FIQ/SError on any class trampoline to a panic. + * +0x280. The two *sync* entries (+0x200 current-EL and +0x400 lower-EL + * AArch64) route to the SVC sync trampoline (T-021): on ESR_EL1.EC == + * SVC64 they save the full register frame and call the Rust syscall + * dispatcher; any other sync cause falls through to the panic path. + * In v1 only the +0x200 path fires (an EL1 kernel-stub `SVC`); the + * +0x400 (real EL0) path is wired now but exercised at runtime in B6. + * FIQ/SError on any class, and sync on the unused SP_EL0 / AArch32 + * categories, still trampoline to a panic. * * Each entry is one `b