diff --git a/.claude/skills/dikw-web-verify-frontend/SKILL.md b/.claude/skills/dikw-web-verify-frontend/SKILL.md
index 6360660..2e72002 100644
--- a/.claude/skills/dikw-web-verify-frontend/SKILL.md
+++ b/.claude/skills/dikw-web-verify-frontend/SKILL.md
@@ -92,6 +92,50 @@ no-UI-framework, dark reader contrast, graph filters/legend/no-bloom,
 the markdown HTML allow-list, and the surface contracts. Items it marks "e2e: …"
 are already gated — for those, re-run that spec instead of eyeballing.
 
+## Step 2.5 — Measured perf + a11y (Chrome DevTools MCP)
+
+Turn the **eyeballed** a11y / contrast / perf items of the rubric into a *measured*
+pass against numbers, not vibes. Use the **`chrome-devtools-mcp`** plugin (already
+installed — `lighthouse_audit`, `performance_start_trace` / `performance_stop_trace`,
+`performance_analyze_insight`; skills `chrome-devtools-mcp:a11y-debugging` and
+`debug-optimize-lcp`). This is the verification step the Chrome MCP interaction pass
+(Step 1) can't give you. **Run it for the route(s) the diff touched**; skip a route the
+change can't affect.
+
+Two different tools — don't conflate them: **`lighthouse_audit` excludes performance**
+(its tool reference directs perf to the trace tools), so a11y comes from Lighthouse and
+Web Vitals come from a performance trace.
+
+1. Open the changed route in a Chrome DevTools MCP page at
+   `http://127.0.0.1:4321/#` (reuse the running dev server).
+2. **Accessibility (+ best-practices) → `lighthouse_audit`.** Run it with the
+   **accessibility** and **best-practices** categories (**not** `performance` — the tool
+   excludes it). The `a11y-debugging` skill walks specific failures (semantic HTML, ARIA
+   labels, focus order, tap-target size, contrast ratios).
+3. **Web Vitals → a performance trace.** `performance_start_trace` (reload = true so the
+   load is captured) → exercise the route → `performance_stop_trace`; read CLS + LCP from
+   the trace, and use `performance_analyze_insight` on the LCP/CLS insight for detail.
+   (The `debug-optimize-lcp` skill covers this flow.)
+4. Score against this rubric (the budget is a floor, not a target):
+   - **Accessibility ≥ 0.9**, and **no new violation** vs `main` for the route — from the
+     Lighthouse pass. Treat a dropped score as a fail; fix the contrast / label / role and
+     re-audit. This backs the rubric's "contrast ≥ 4.5:1 body / 3:1 headings" with a number.
+   - **CLS ≤ 0.1** — from the trace. Already gated by `tests/e2e/perf.spec.ts` on the
+     primary routes; here it's a cross-check on the *changed* route, and the trace shows
+     *which* element shifted so a regression is fixable, not just flagged.
+   - **LCP** — from the trace; a **soft** budget: record it and flag a clear regression vs
+     `main`, but it's runner-dependent (annotated, not hard-gated, in `perf.spec.ts`).
+5. **Pixi `#graph` caveat (same root cause as Step 1's gotcha):** a background
+   DevTools MCP tab can stall `requestAnimationFrame`, so a performance trace of `#graph`
+   may capture a canvas that never animated. Trace graph perf only in a foreground page,
+   or skip the trace there and rely on `graph.spec.ts` for its render contract. The
+   Lighthouse accessibility audit (DOM-based) is unaffected — the node overlay exposes
+   stable button targets, so run a11y on `#graph` normally.
+
+These are **measured-locally** checks, not new CI gates (Lighthouse + trace timing is
+runner-dependent — the same reason `perf.spec.ts` gates only CLS). A ❌ here feeds Step
+4's loop like any other finding.
+
 ## Step 3 — (if the change touches core data shape) smoke the live contract
 
 If the change reads a different `/v1` field/shape, the mocked e2e suite can't
@@ -102,5 +146,6 @@ Skip when the change is purely presentational.
 ## Step 4 — Close the loop
 
 Any ❌ → fix the source, re-run the affected gate, re-verify the route. Only
-report the UI change done once behavior + rubric pass clean in both themes. This
-skill is Step 5 of the `dikw-web-delivery-workflow`.
+report the UI change done once behavior (Step 1), the rubric (Step 2), and the
+measured perf + a11y pass (Step 2.5) are clean in both themes. This skill is Step 5
+of the `dikw-web-delivery-workflow`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 57ee876..db18ffc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,21 @@ file format introduced in `[0.0.1.0]` was dropped.
 
 ## [Unreleased]
 
+## [0.8.8] - 2026-06-29
+
+### Changed
+
+- **`dikw-web-verify-frontend` gains a measured perf + a11y pass (Step 2.5).** The
+  frontend-verify skill previously eyeballed the `docs/ui-checklist.md` a11y / contrast
+  / perf items. It now uses the already-installed `chrome-devtools-mcp`: `lighthouse_audit`
+  for **accessibility + best-practices** (the tool excludes performance), plus a
+  `performance_start_trace`/`stop_trace` for **Web Vitals**, scored to a rubric — **a11y
+  ≥ 0.9** with no new violation, **CLS ≤ 0.1** (cross-checking the `perf.spec.ts` gate),
+  **LCP** recorded as a soft budget — turning the qualitative items into numbers.
+  Locally measured, not a new CI gate (Lighthouse + trace timing is runner-dependent).
+  The `#graph` Pixi route audits a11y normally but skips the background-tab perf trace.
+  See `docs/adr/0005-delivery-loop-hardening.md`.
+
 ## [0.8.7] - 2026-06-29
 
 ### Added
diff --git a/CLAUDE.md b/CLAUDE.md
index 5dd68fb..a08443c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -70,7 +70,7 @@ End-to-end loop from request to landed PR. Run autonomously for behavior changes
    - 3.1 Run `/codex:review --background` for an independent review pass.
    - 3.2 Evaluate the findings, decide which are valid, and fix.
 4. **Final pass.** Run `/code-review`, scored against `docs/review-rubric.md` (the project-specific principles), and resolve every finding before continuing.
-5. **Verify in the browser.** For UI changes, invoke the `dikw-web-verify-frontend` skill: navigate the changed routes via Chrome MCP, confirm a clean runtime console on real data, exercise the affected interactions, and run the `docs/ui-checklist.md` rubric in light + dark — confirm the change actually rendered as intended, not just that unit tests pass.
+5. **Verify in the browser.** For UI changes, invoke the `dikw-web-verify-frontend` skill: navigate the changed routes via Chrome MCP, confirm a clean runtime console on real data, exercise the affected interactions, run the `docs/ui-checklist.md` rubric in light + dark, and run the **measured perf + a11y pass** (Step 2.5 — Chrome DevTools MCP: `lighthouse_audit` for accessibility ≥ 0.9 with no new violation, and a `performance_start_trace`/`stop_trace` for CLS ≤ 0.1 + LCP) for the changed route — confirm the change actually rendered as intended, not just that unit tests pass.
 6. **Update markdown docs.** Walk `CLAUDE.md`, `README.md`, and the relevant `docs/*.md` against the diff; any contract, behavior, command, or doc index that drifted must be updated in the same change. Don't leave docs to "catch up later".
 7. **Create the PR.** Branch with a descriptive name, commit with `(): ` matching the project's existing convention (see recent `git log`), push, then `gh pr create`. CI auto-runs lint + format:check + typecheck + coverage + build + e2e + bundle budget + the `gate-integrity` reward-hacking gate (`check:gate`) + security scans (npm audit, gitleaks, Trivy, CodeQL). Bump `package.json.version` manually (standard 3-digit SemVer) when the change warrants it, and add an entry to `CHANGELOG.md` under the matching version heading. On merge to `main`, CI's `release` job auto-cuts a GitHub Release tagged `dikw-web-v` from `package.json.version` (idempotent — only a version bump creates a new tag; notes come from the matching CHANGELOG section via `scripts/changelog-notes.mjs`), so a deliberate version bump is what publishes a release.
 8. **Monitor CI and PR comments; resolve as they surface, then merge.** After pushing, actively watch both signals — don't passively wait, and don't batch resolution to merge time.
diff --git a/docs/adr/0005-delivery-loop-hardening.md b/docs/adr/0005-delivery-loop-hardening.md
index 56e670d..00eca49 100644
--- a/docs/adr/0005-delivery-loop-hardening.md
+++ b/docs/adr/0005-delivery-loop-hardening.md
@@ -86,3 +86,30 @@ state protocol, container `--network none` isolation, per-token cost accounting
 out of scope. dikw-web's loop is interactive Claude Code with worktree isolation
 already available for background jobs and no destructive operations; that machinery
 would add complexity without proportional value here.
+
+---
+
+## Item 2 — trustworthy green signal (flaky e2e)
+
+Resolved outside this effort: the flaky `graph.spec.ts > renders a nonblank Pixi graph
+canvas` was root-cause-fixed in PR #140 (attach the Pixi canvas only inside the
+effect's `active` guard; the spec gates on `data-render-count >= 1`). `main`
+deliberately keeps `retries: 2` as a *general* backstop for timing-sensitive specs
+(no longer Pixi-specific), so the gate's `e2e-retries-raised` check guards that
+decision without forcing it to 1.
+
+## Item 3 — measured perf + a11y in `verify-frontend`
+
+The `dikw-web-verify-frontend` skill verified real-browser behavior + a clean console,
+then eyeballed the `docs/ui-checklist.md` a11y/contrast/perf items. Following Delba
+Oliveira's feedback-loops note ("many checks have criteria Claude can measure against:
+a performance budget, an accessibility checklist"), **Step 2.5** now uses the
+already-installed `chrome-devtools-mcp` against the changed route: `lighthouse_audit` for
+**accessibility + best-practices** (the tool deliberately excludes performance), plus a
+`performance_start_trace`/`stop_trace` for **Web Vitals**. Scored to a rubric: **a11y ≥
+0.9** with no new violation (Lighthouse), **CLS ≤ 0.1** (cross-checking the
+`perf.spec.ts` gate) and **LCP** as a soft budget (both from the trace). Kept as a
+**locally measured** step, not a new CI gate — Lighthouse + trace timing is
+runner-dependent, the same reason `perf.spec.ts` hard-gates only CLS. The `#graph` Pixi route audits a11y normally but skips the perf trace in a
+background tab (stalled `requestAnimationFrame`), mirroring the skill's existing
+Chrome-MCP caveat.
diff --git a/docs/ui-checklist.md b/docs/ui-checklist.md
index dd5282b..2abdeb4 100644
--- a/docs/ui-checklist.md
+++ b/docs/ui-checklist.md
@@ -63,9 +63,28 @@ no e2e are the ones the manual pass exists for.
 - [ ] **Contrast.** Normal article text ≥ 4.5:1; large headings ≥ 3:1;
       metadata/control text ≥ 3:1 against their background. _e2e: `theme.spec.ts`
       computes these — re-run it for reader changes rather than eyeballing._
+      _measured: also covered by the `lighthouse_audit` accessibility category in
+      `dikw-web-verify-frontend` Step 2.5 (DevTools MCP), which scores contrast +
+      labels + roles for the changed route._
 - [ ] **No console errors** in either theme (the e2e console gate covers mocked
       flows; the manual pass covers real-data rendering). See `tests/e2e/harness.ts`.
 
+## Measured perf + a11y (DevTools MCP)
+
+> These back the eyeballed items above with numbers. Run via
+> `dikw-web-verify-frontend` **Step 2.5** for the route the diff touched, with two
+> `chrome-devtools-mcp` tools: `lighthouse_audit` for **a11y** (it excludes performance)
+> and a `performance_start_trace`/`stop_trace` for **Web Vitals**. Measured locally, not
+> a CI gate (Lighthouse + trace timing is runner-dependent — the same reason
+> `perf.spec.ts` gates only CLS).
+
+- [ ] **Accessibility score ≥ 0.9** for the changed route, with **no new violation**
+      vs `main` (`lighthouse_audit`). A dropped score is a fail — fix and re-audit.
+- [ ] **CLS ≤ 0.1** on the changed route (performance trace). _e2e: `perf.spec.ts` gates
+      this on the primary routes; the trace here shows *which* element shifted._
+- [ ] **LCP** recorded from the trace; flag a clear regression vs `main` (soft budget —
+      annotated, not hard-gated, in `perf.spec.ts`).
+
 ## Graph (`#graph`)
 
 > Do **not** verify the Pixi canvas through Chrome MCP — a background MCP tab
diff --git a/package.json b/package.json
index 1bd5529..2db9578 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "dikw-web",
-  "version": "0.8.7",
+  "version": "0.8.8",
   "private": true,
   "type": "module",
   "engines": {
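For illustration, the Step 2.5 rubric can be written as a small scoring function. This is a sketch under stated assumptions, not part of the change: `chrome-devtools-mcp` reports results in its own formats, so the `RouteMeasurement` and `LayoutShift` shapes, the function names, and the 20% LCP regression threshold (the skill only says "a clear regression") are all hypothetical. The CLS helper follows the public Web Vitals definition, under which CLS is the largest session window of layout shifts (shifts less than 1 s apart, window capped at 5 s), not the sum of every shift on the page.

```typescript
// Hypothetical shapes: chrome-devtools-mcp reports results in its own formats.
export interface LayoutShift {
  startTime: number; // ms since navigation start
  value: number; // score of this single layout shift
  hadRecentInput: boolean; // shifts right after user input do not count toward CLS
}

export interface RouteMeasurement {
  a11yScore: number; // Lighthouse accessibility category score, 0..1
  a11yViolations: string[]; // ids of failing accessibility audits
  cls: number;
  lcpMs: number;
}

export interface Verdict {
  failures: string[]; // hard fails: these feed Step 4's loop
  warnings: string[]; // soft budget: record and flag only
}

const A11Y_MIN = 0.9;
const CLS_MAX = 0.1;
// Assumption: the skill only says "a clear regression"; 20% slower than main is a placeholder.
const LCP_REGRESSION_RATIO = 1.2;

// CLS per the Web Vitals definition: the largest session-window sum, where a
// window closes once shifts are 1 s apart or the window reaches 5 s.
export function clsFromShifts(shifts: LayoutShift[]): number {
  const counted = shifts
    .filter((s) => !s.hadRecentInput)
    .sort((a, b) => a.startTime - b.startTime);
  let best = 0;
  let windowSum = 0;
  let windowStart = 0;
  let lastTime = Number.NEGATIVE_INFINITY;
  for (const shift of counted) {
    const gapTooLong = shift.startTime - lastTime >= 1000;
    const windowTooLong = shift.startTime - windowStart >= 5000;
    if (gapTooLong || windowTooLong) {
      // Start a new session window at this shift.
      windowSum = 0;
      windowStart = shift.startTime;
    }
    windowSum += shift.value;
    lastTime = shift.startTime;
    best = Math.max(best, windowSum);
  }
  return best;
}

// Score one changed route against the same route measured on main.
export function scoreRoute(changed: RouteMeasurement, main: RouteMeasurement): Verdict {
  const failures: string[] = [];
  const warnings: string[] = [];
  if (changed.a11yScore < A11Y_MIN) {
    failures.push(`a11y score ${changed.a11yScore} is below ${A11Y_MIN}`);
  }
  if (changed.a11yScore < main.a11yScore) {
    failures.push(`a11y score dropped vs main (${main.a11yScore} -> ${changed.a11yScore})`);
  }
  const baseline = new Set(main.a11yViolations);
  for (const id of changed.a11yViolations) {
    if (!baseline.has(id)) failures.push(`new a11y violation: ${id}`);
  }
  if (changed.cls > CLS_MAX) {
    failures.push(`CLS ${changed.cls} exceeds ${CLS_MAX}`);
  }
  if (changed.lcpMs > main.lcpMs * LCP_REGRESSION_RATIO) {
    warnings.push(`LCP ${changed.lcpMs} ms vs ${main.lcpMs} ms on main`);
  }
  return { failures, warnings };
}
```

Hard failures (the 0.9 floor, a dropped score, a new violation, CLS over 0.1) map to the ❌ that feeds Step 4's loop, while LCP only ever produces a warning, matching its soft budget.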
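The contrast floors that the Lighthouse accessibility pass backs up (4.5:1 for body text, 3:1 for large headings and metadata/control text, per `docs/ui-checklist.md`) come from the WCAG contrast-ratio formula. A minimal sketch, assuming `#rrggbb` colors and the WCAG 2.x relative-luminance definition; the function names are hypothetical, and this is not the `theme.spec.ts` implementation.

```typescript
// Hypothetical helper names; thresholds mirror docs/ui-checklist.md.
type TextKind = "body" | "heading" | "meta";

// Convert one 8-bit sRGB channel to linear light (WCAG 2.x relative luminance).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a "#rrggbb" color, from 0 (black) to 1 (white).
export function relativeLuminance(hex: string): number {
  const match = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(match[1], 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
export function contrastRatio(fg: string, bg: string): number {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Body text needs 4.5:1; large headings and metadata/control text need 3:1.
export function meetsChecklist(fg: string, bg: string, kind: TextKind): boolean {
  const minimum = kind === "body" ? 4.5 : 3;
  return contrastRatio(fg, bg) >= minimum;
}
```

For example, `#767676` on white comes out at roughly 4.54:1 and just passes the body-text floor, while `#777777` (roughly 4.48:1) only clears the 3:1 heading floor.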