OpenDIKW · helebest · Jun 29, 2026 · Jun 29, 2026
diff --git a/.claude/skills/dikw-web-verify-frontend/SKILL.md b/.claude/skills/dikw-web-verify-frontend/SKILL.md
@@ -92,6 +92,50 @@ no-UI-framework, dark reader contrast, graph filters/legend/no-bloom, the
 markdown HTML allow-list, and the surface contracts. Items it marks "e2e: …" are
 already gated — for those, re-run that spec instead of eyeballing.
 
+## Step 2.5 — Measured perf + a11y (Chrome DevTools MCP)
+
+Turn the **eyeballed** a11y / contrast / perf items of the rubric into a *measured*
+pass against numbers, not vibes. Use the **`chrome-devtools-mcp`** plugin (already
+installed — `lighthouse_audit`, `performance_start_trace` / `performance_stop_trace`,
+`performance_analyze_insight`; skills `chrome-devtools-mcp:a11y-debugging` and
+`debug-optimize-lcp`). This is the verification step the Chrome MCP interaction pass
+(Step 1) can't give you. **Run it for the route(s) the diff touched**; skip a route the
+change can't affect.
+
+Two different tools — don't conflate them: **`lighthouse_audit` excludes performance**
+(its tool reference directs perf to the trace tools), so a11y comes from Lighthouse and
+Web Vitals come from a performance trace.
+
+1. Open the changed route in a Chrome DevTools MCP page at
+   `http://127.0.0.1:4321/#<route>` (reuse the running dev server).
+2. **Accessibility (+ best-practices) → `lighthouse_audit`.** Run it with the
+   **accessibility** and **best-practices** categories (**not** `performance` — the tool
+   excludes it). The `a11y-debugging` skill walks specific failures (semantic HTML, ARIA
+   labels, focus order, tap-target size, contrast ratios).
+3. **Web Vitals → a performance trace.** `performance_start_trace` (reload = true so the
+   load is captured) → exercise the route → `performance_stop_trace`; read CLS + LCP from
+   the trace, and use `performance_analyze_insight` on the LCP/CLS insight for detail.
+   (The `debug-optimize-lcp` skill covers this flow.)
+4. Score against this rubric (the budget is a floor, not a target):
+   - **Accessibility ≥ 0.9**, and **no new violation** vs `main` for the route — from the
+     Lighthouse pass. Treat a dropped score as a fail; fix the contrast / label / role and
+     re-audit. This backs the rubric's "contrast ≥ 4.5:1 body / 3:1 headings" with a number.
+   - **CLS ≤ 0.1** — from the trace. Already gated by `tests/e2e/perf.spec.ts` on the
+     primary routes; here it's a cross-check on the *changed* route, and the trace shows
+     *which* element shifted so a regression is fixable, not just flagged.
+   - **LCP** — from the trace; a **soft** budget: record it and flag a clear regression vs
+     `main`, but it's runner-dependent (annotated, not hard-gated, in `perf.spec.ts`).
+5. **Pixi `#graph` caveat (same root cause as Step 1's gotcha):** a background
+   DevTools MCP tab can stall `requestAnimationFrame`, so a performance trace of `#graph`
+   may capture a canvas that never animated. Trace graph perf only in a foreground page,
+   or skip the trace there and rely on `graph.spec.ts` for its render contract. The
+   Lighthouse accessibility audit (DOM-based) is unaffected — the node overlay exposes
+   stable button targets, so run a11y on `#graph` normally.
+
+These are **measured-locally** checks, not new CI gates (Lighthouse + trace timing is
+runner-dependent — the same reason `perf.spec.ts` gates only CLS). A ❌ here feeds Step
+4's loop like any other finding.
+
 ## Step 3 — (if the change touches core data shape) smoke the live contract
 
 If the change reads a different `/v1` field/shape, the mocked e2e suite can't
@@ -102,5 +146,6 @@ Skip when the change is purely presentational.
 ## Step 4 — Close the loop
 
 Any ❌ → fix the source, re-run the affected gate, re-verify the route. Only
-report the UI change done once behavior + rubric pass clean in both themes. This
-skill is Step 5 of the `dikw-web-delivery-workflow`.
+report the UI change done once behavior (Step 1), the rubric (Step 2), and the
+measured perf + a11y pass (Step 2.5) are clean in both themes. This skill is Step 5
+of the `dikw-web-delivery-workflow`.
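
Review note on the Step 2.5 rubric above: the scoring in item 4 can be sketched as a small pure function. This is a minimal illustration, not part of the patch. The `RouteMeasurement` shape, the `scoreRoute` name, and the 20% LCP regression ratio are all assumptions introduced here; the rubric itself only says "a clear regression", and none of this mirrors the actual `chrome-devtools-mcp` output format. The numbers are meant to be copied by hand from the Lighthouse pass and the performance trace.

```typescript
// Hypothetical shape: values a reviewer copies out of the Lighthouse
// accessibility pass and the performance trace. Not a real MCP payload.
interface RouteMeasurement {
  accessibilityScore: number; // Lighthouse accessibility category, 0..1
  violationIds: string[];     // ids of failing accessibility audits
  cls: number;                // Cumulative Layout Shift read from the trace
  lcpMs: number;              // Largest Contentful Paint from the trace, in ms
}

interface Finding {
  check: "a11y-floor" | "a11y-drop" | "a11y-new-violation" | "cls" | "lcp";
  severity: "fail" | "warn";
  detail: string;
}

const A11Y_FLOOR = 0.9;  // rubric: Accessibility >= 0.9
const CLS_CEILING = 0.1; // rubric: CLS <= 0.1
// Assumption: the rubric does not define "clear regression"; 20% is a placeholder.
const LCP_REGRESSION_RATIO = 1.2;

function scoreRoute(current: RouteMeasurement, main: RouteMeasurement): Finding[] {
  const findings: Finding[] = [];

  // Hard floor on the accessibility score.
  if (current.accessibilityScore < A11Y_FLOOR) {
    findings.push({
      check: "a11y-floor",
      severity: "fail",
      detail: `score ${current.accessibilityScore.toFixed(2)} is below ${A11Y_FLOOR}`,
    });
  }

  // A score that dropped relative to main is a fail even above the floor.
  if (current.accessibilityScore < main.accessibilityScore) {
    findings.push({
      check: "a11y-drop",
      severity: "fail",
      detail: `score dropped from ${main.accessibilityScore.toFixed(2)}`,
    });
  }

  // "No new violation": any failing audit id that main did not already have.
  const known = new Set(main.violationIds);
  for (const id of current.violationIds) {
    if (!known.has(id)) {
      findings.push({ check: "a11y-new-violation", severity: "fail", detail: id });
    }
  }

  // CLS is a hard budget on the changed route.
  if (current.cls > CLS_CEILING) {
    findings.push({
      check: "cls",
      severity: "fail",
      detail: `CLS ${current.cls} exceeds ${CLS_CEILING}`,
    });
  }

  // LCP is a soft budget: record it and warn on a clear regression only.
  if (current.lcpMs > main.lcpMs * LCP_REGRESSION_RATIO) {
    findings.push({
      check: "lcp",
      severity: "warn",
      detail: `LCP ${current.lcpMs}ms vs ${main.lcpMs}ms on main`,
    });
  }

  return findings;
}
```

Any `fail` finding would feed Step 4's loop like any other ❌; a `warn` on LCP is recorded but does not block, matching the soft budget in the rubric.
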
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,21 @@ file format introduced in `[0.0.1.0]` was dropped.
 
 ## [Unreleased]
 
+## [0.8.8] - 2026-06-29
+
+### Changed
+
+- **`dikw-web-verify-frontend` gains a measured perf + a11y pass (Step 2.5).** The
+  frontend-verify skill previously eyeballed the `docs/ui-checklist.md` a11y / contrast
+  / perf items. It now uses the already-installed `chrome-devtools-mcp`: `lighthouse_audit`
+  for **accessibility + best-practices** (the tool excludes performance), plus a
+  `performance_start_trace`/`stop_trace` for **Web Vitals**, scored to a rubric — **a11y
+  ≥ 0.9** with no new violation, **CLS ≤ 0.1** (cross-checking the `perf.spec.ts` gate),
+  **LCP** recorded as a soft budget — turning the qualitative items into numbers.
+  Locally-measured, not a new CI gate (Lighthouse + trace timing is runner-dependent).
+  The `#graph` Pixi route audits a11y normally but skips the background-tab perf trace.
+  See `docs/adr/0005-delivery-loop-hardening.md`.
+
 ## [0.8.7] - 2026-06-29
 
 ### Added

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -70,7 +70,7 @@ End-to-end loop from request to landed PR. Run autonomously for behavior changes
    - 3.1 Run `/codex:review --background` for an independent review pass.
    - 3.2 Evaluate the findings, decide which are valid, and fix.
 4. **Final pass.** Run `/code-review`, scored against `docs/review-rubric.md` (the project-specific principles), and resolve every finding before continuing.
-5. **Verify in the browser.** For UI changes, invoke the `dikw-web-verify-frontend` skill: navigate the changed routes via Chrome MCP, confirm a clean runtime console on real data, exercise the affected interactions, and run the `docs/ui-checklist.md` rubric in light + dark — confirm the change actually rendered as intended, not just that unit tests pass.
+5. **Verify in the browser.** For UI changes, invoke the `dikw-web-verify-frontend` skill: navigate the changed routes via Chrome MCP, confirm a clean runtime console on real data, exercise the affected interactions, run the `docs/ui-checklist.md` rubric in light + dark, and run the **measured perf + a11y pass** (Step 2.5 — Chrome DevTools MCP: `lighthouse_audit` for accessibility ≥ 0.9 with no new violation, and a `performance_start_trace`/`stop_trace` for CLS ≤ 0.1 + LCP) for the changed route — confirm the change actually rendered as intended, not just that unit tests pass.
 6. **Update markdown docs.** Walk `CLAUDE.md`, `README.md`, and the relevant `docs/*.md` against the diff; any contract, behavior, command, or doc index that drifted must be updated in the same change. Don't leave docs to "catch up later".
 7. **Create the PR.** Branch with a descriptive name, commit with `<type>(<scope>): <subject>` matching the project's existing convention (see recent `git log`), push, then `gh pr create`. CI auto-runs lint + format:check + typecheck + coverage + build + e2e + bundle budget + the `gate-integrity` reward-hacking gate (`check:gate`) + security scans (npm audit, gitleaks, Trivy, CodeQL). Bump `package.json.version` manually (standard 3-digit SemVer) when the change warrants it, and add an entry to `CHANGELOG.md` under the matching version heading. On merge to `main`, CI's `release` job auto-cuts a GitHub Release tagged `dikw-web-v<version>` from `package.json.version` (idempotent — only a version bump creates a new tag; notes come from the matching CHANGELOG section via `scripts/changelog-notes.mjs`), so a deliberate version bump is what publishes a release.
 8. **Monitor CI and PR comments; resolve as they surface, then merge.** After pushing, actively watch both signals — don't passively wait, and don't batch resolution to merge time.

diff --git a/docs/adr/0005-delivery-loop-hardening.md b/docs/adr/0005-delivery-loop-hardening.md
@@ -86,3 +86,30 @@ state protocol, container `--network none` isolation, per-token cost accounting
 out of scope. dikw-web's loop is interactive Claude Code with worktree isolation
 already available for background jobs and no destructive operations; that machinery
 would add complexity without proportional value here.
+
+---
+
+## Item 2 — trustworthy green signal (flaky e2e)
+
+Resolved outside this effort: the flaky `graph.spec.ts > renders a nonblank Pixi graph
+canvas` was root-cause-fixed in PR #140 (attach the Pixi canvas only inside the
+effect's `active` guard; the spec gates on `data-render-count >= 1`). `main`
+deliberately keeps `retries: 2` as a *general* backstop for timing-sensitive specs
+(no longer Pixi-specific), so the gate's `e2e-retries-raised` check guards that
+decision without forcing it to 1.
+
+## Item 3 — measured perf + a11y in `verify-frontend`
+
+The `dikw-web-verify-frontend` skill verified real-browser behavior + a clean console,
+then eyeballed the `docs/ui-checklist.md` a11y/contrast/perf items. Following Delba
+Oliveira's feedback-loops note ("many checks have criteria Claude can measure against:
+a performance budget, an accessibility checklist"), **Step 2.5** now uses the
+already-installed `chrome-devtools-mcp` against the changed route: `lighthouse_audit` for
+**accessibility + best-practices** (the tool deliberately excludes performance), plus a
+`performance_start_trace`/`stop_trace` for **Web Vitals**. Scored to a rubric: **a11y ≥
+0.9** with no new violation (Lighthouse), **CLS ≤ 0.1** (cross-checking the
+`perf.spec.ts` gate) and **LCP** as a soft budget (both from the trace). Kept a
+**locally-measured** step, not a new CI gate — Lighthouse + trace timing is
+runner-dependent, the same reason `perf.spec.ts` hard-gates only CLS. The `#graph`
+Pixi route audits a11y normally but skips the perf trace in a background tab
+(stalled `requestAnimationFrame`), mirroring the skill's existing Chrome-MCP caveat.
diff --git a/docs/ui-checklist.md b/docs/ui-checklist.md
@@ -63,9 +63,28 @@ no e2e are the ones the manual pass exists for.
 - [ ] **Contrast.** Normal article text ≥ 4.5:1; large headings ≥ 3:1;
   metadata/control text ≥ 3:1 against their background. _e2e: `theme.spec.ts`
   computes these — re-run it for reader changes rather than eyeballing._
+  _measured: also covered by the `lighthouse_audit` accessibility category in
+  `dikw-web-verify-frontend` Step 2.5 (DevTools MCP), which scores contrast +
+  labels + roles for the changed route._
 - [ ] **No console errors** in either theme (the e2e console gate covers mocked
   flows; the manual pass covers real-data rendering). See `tests/e2e/harness.ts`.
 
+## Measured perf + a11y (DevTools MCP)
+
+> These back the eyeballed items above with numbers. Run via
+> `dikw-web-verify-frontend` **Step 2.5** for the route the diff touched, with two
+> `chrome-devtools-mcp` tools: `lighthouse_audit` for **a11y** (it excludes performance)
+> and a `performance_start_trace`/`stop_trace` for **Web Vitals**. Measured locally, not
+> a CI gate (Lighthouse + trace timing is runner-dependent — the same reason
+> `perf.spec.ts` gates only CLS).
+
+- [ ] **Accessibility score ≥ 0.9** for the changed route, with **no new violation**
+  vs `main` (`lighthouse_audit`). A dropped score is a fail — fix and re-audit.
+- [ ] **CLS ≤ 0.1** on the changed route (performance trace). _e2e: `perf.spec.ts` gates
+  this on the primary routes; the trace here shows *which* element shifted._
+- [ ] **LCP** recorded from the trace; flag a clear regression vs `main` (soft budget —
+  annotated, not hard-gated, in `perf.spec.ts`).
+
 ## Graph (`#graph`)
 
 > Do **not** verify the Pixi canvas through Chrome MCP — a background MCP tab

diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "dikw-web",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "private": true,
   "type": "module",
   "engines": {