Skip to content

v0.9 sub-issue #6: Stage 5 Guard (withdrawal / lifecycle / expiry watchers + clean teardown + audit emitter) #125

@devin-ai-integration

Description

@devin-ai-integration

v0.9 Sub-issue #6 — Stage 5 Guard

Part of v0.9 epic.

Implements v0.7 §4.3 + §5 + §6 + v0.8 Part B Stage 5 (Guard): the
withdrawal watcher, lifecycle watcher, expiry watcher, audit emitter
and clean teardown. After this sub-issue merges, lifectl run is
fully functional except for the integration tests + echo Provider +
docs (sub-issue 7).

Spec ref

  • docs/LIFE_RUNTIME_STANDARD.md §4.3 (withdrawal polling + revocation)
  • docs/LIFE_RUNTIME_STANDARD.md §5 (audit emission)
  • docs/LIFE_RUNTIME_STANDARD.md §6 (termination)
  • docs/LIFE_RUNTIME_STANDARD.md Part B §B.7 (withdrawal_poll,
    lifecycle_transition_observed)
  • docs/LIFE_LIFECYCLE_SPEC.md (lifecycle states)

Watchers

Each watcher runs as a daemon thread (or async task) started after
Stage 3's mount_succeeded event.

Withdrawal watcher (§4.3)

  • Polls life-package.withdrawal_endpoint at minimum every 24h.
    CLI flag --poll-interval-override <seconds> (test-only) reduces this
    for the conformance harness.
  • Each poll emits withdrawal_poll{endpoint, result, http_status}
    result is one of not_revoked / revoked / unreachable / malformed.
  • On revoked: trigger graceful teardown with reason withdrawal;
    emit unmount{reason: "withdrawal"}.
  • On 3 consecutive unreachable polls (≥ 72h gap): emit
    withdrawal_unreachable_warning{consecutive_failures} but do NOT
    unmount (per §4.3 — runtime continues serving but flags the user).

Lifecycle watcher

  • Polls the package's lifecycle/lifecycle.json::lifecycle_state at
    the same cadence as the withdrawal watcher.
  • Detects transitions:
    • active → superseded: emit lifecycle_transition_observed;
      teardown with reason lifecycle if the user has set
      ~/.config/dlrs/runtime.json::on_supersede = "unmount" (default
      "warn"); else flag user with a banner.
    • active → frozen: enter memorial read-only mode (Stage 4 starts
      refusing new turns; replays only). v0.9 implementation: emit a
      transition event + print a banner "this .life is now frozen
      in memorial state — read-only" and refuse further input. Memorial
      holographic echo (the Q8 D extension) is v0.10+.
    • active → withdrawn: teardown with reason lifecycle.

For pointer-mode .life, the lifecycle file may be remote (per
lifecycle spec). v0.9 reads from local cache only; warns but does NOT
auto-fetch (offline-first).

Expiry watcher

  • Reads life-package.expires_at once at mount time. Schedules a
    callback at that time (using threading.Timer or asyncio.call_at).
  • On fire: emit expiry_reached{expires_at} + teardown with reason
    expiry.

Teardown sequence

When any of the three watchers (or Ctrl-C / explicit lifectl quit)
triggers teardown:

  1. Stop accepting new turns at Stage 4 Run loop (the loop checks a
    shared Event flag every iteration).
  2. Wait for any in-flight invoke() to return (≤ 30s timeout; SIGTERM
    the subprocess host after that).
  3. Call teardown() on every bound LifeCapabilityProvider.
  4. SIGTERM all user_installed subprocess hosts; SIGKILL after 5s if
    they don't exit.
  5. Remove any temp-extracted files from the zip mount (no raw-asset
    leakage per §3.3).
  6. Emit unmount{reason} as the final event in the audit chain.
  7. lifectl process exits 0 (normal teardown) or non-zero (error).

Audit emitter (v0.4 hash chain)

runtime/audit/emitter.py (created in sub-issue 1, fleshed out here):

  • Hash-chained appender to audit/events.jsonl inside the runtime's
    per-mount data directory (~/.local/share/dlrs/mounts/<package_id>/).
    This is the runtime's local audit log, distinct from the .life's
    bundled audit/events.jsonl (which is read-only at runtime).
  • Each event {event_type, occurred_at, actor: "runtime/<version>", prev_hash, ...fields}.
  • prev_hash is sha256(prev_event_json_canonical) per v0.4 hash chain.
  • The first runtime event (mount_attempted) chains its prev_hash from
    the bundled audit log's tip (this is what links the runtime's session
    log back to the .life's issuer-emitted chain).

Module layout

runtime/guard/
├── __init__.py            # exports start_watchers(verify_result, assemble_result, on_teardown)
├── withdrawal_watcher.py
├── lifecycle_watcher.py
├── expiry_watcher.py
└── teardown.py            # signal-handlers + ordered teardown sequence

runtime/audit/
├── __init__.py
└── emitter.py             # full hash-chain implementation (was stub in sub1)

Audit events emitted

  • withdrawal_poll{endpoint, result, http_status} — every poll.
  • withdrawal_unreachable_warning{consecutive_failures} — at 3+ failures.
  • lifecycle_transition_observed{old_state, new_state, package_id}
    any state change.
  • expiry_reached{expires_at} — once at the scheduled time.
  • unmount{reason} — the terminal event.

CLI surface

lifectl run after this PR exits cleanly on:

  • Ctrl-C (SIGINT) → unmount{reason: "user_quit"}.
  • Withdrawal trigger → unmount{reason: "withdrawal"}.
  • Lifecycle transition to withdrawnunmount{reason: "lifecycle"}.
  • Expiry reached → unmount{reason: "expiry"}.
  • Stage exception → unmount{reason: "error"}.

lifectl run --poll-interval-override <s> flag added (test-only).

Tests

tools/test_runtime_guard.py:

  1. Withdrawal trigger: mock endpoint returns {revoked: true} on
    the second poll → unmount within 2 polls; unmount{reason: "withdrawal"}
    emitted.
  2. Withdrawal unreachable warning: mock endpoint always 503 → 3
    withdrawal_poll{result: "unreachable"} events + 1
    withdrawal_unreachable_warning event; runtime keeps serving.
  3. Lifecycle to withdrawn: rebuild package mid-run with state
    withdrawn; lifecycle watcher detects + unmounts.
  4. Lifecycle to frozen: detect transition; Run loop refuses further
    input; memorial banner shown.
  5. Expiry trigger: short-lived package (expires_at = now + 5s);
    timer fires → unmount with reason expiry.
  6. Ctrl-C: send SIGINT; runtime tears down within 30s + emits
    unmount{reason: "user_quit"}.
  7. Subprocess SIGTERM-then-SIGKILL: user_installed Provider
    subprocess refuses SIGTERM; runtime SIGKILLs after 5s; no zombie.
  8. Hash chain unbroken: full mount → 5+ turns → unmount; assert
    every event in the runtime's audit log links via prev_hash and the
    first event's prev_hash matches the .life's bundled audit-tip hash.

Acceptance

  • All 3 watchers implemented + tested
  • Clean teardown with no zombie subprocesses
  • Audit hash chain validates across stages
  • All 8 test cases pass
  • CI runtime-guard job green

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions