From 2df9faa4d0f3975643819ecbdce6db2b9e6ba0c6 Mon Sep 17 00:00:00 2001
From: Pulkit Pareek
Date: Wed, 13 May 2026 10:33:25 +0530
Subject: [PATCH 1/5] ADR-0004: split governance docs into pulkitpareek18/ZeroAuth-Governance
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Day 1's B06 was skipped in favor of keeping governance inline. On Day 3
of Week 1 we revisited because (a) the DPDP §8(7) breach-notification
procedure was unwritten and that's a regulatory-teeth gap, not a hygiene
one, (b) compliance mappings need an auditor-friendly surface separate
from the TypeScript repo, and (c) component threat-models for Week 2+
need a stable canonical URL before the verifier ships.

Created pulkitpareek18/ZeroAuth-Governance with the full B06 structure:
shared policy, canonical threat model, compliance mappings, ADR index,
release coordination, evidence-pack source checksums, CODEOWNERS with a
two-reviewer rule on /docs/shared/ and /docs/compliance/.

This repo's docs/threat_model.md is on a deprecation path; the canonical
in the governance repo was synced from it on 2026-05-13 and is now
authoritative.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 adr/0004-governance-in-separate-repo.md | 91 +++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 adr/0004-governance-in-separate-repo.md

diff --git a/adr/0004-governance-in-separate-repo.md b/adr/0004-governance-in-separate-repo.md
new file mode 100644
index 0000000..1d35b63
--- /dev/null
+++ b/adr/0004-governance-in-separate-repo.md
@@ -0,0 +1,91 @@
+# ADR-0004 — Split governance docs into a separate repo (`pulkitpareek18/ZeroAuth-Governance`)
+
+## Status
+
+Accepted
+
+## Context
+
+The dev suite's B06 build prompt (`04_development_suite/02_claude_code_dev/build_prompts/B06_governance_repo_bootstrap.md`) calls for a separate `zeroauth-governance` repo — "the first repo to bootstrap; everything else links to this."
+It would hold:
+
+- A shared security policy that every product repo's `CLAUDE.md` links to
+- The canonical cross-repo threat model
+- DPDP / IRDAI / RBI / MeitY compliance mappings
+- An ADR index across all repos
+- A release coordination matrix
+- Evidence-pack source checksums
+
+Through Day 2 of Week 1 we operated with governance content embedded inside this API repo: `CLAUDE.md` (constitution), `docs/threat_model.md`, `docs/api_contract.md`, `docs/error_codes.md`, and three ADRs in `adr/`. That covered most of B01's quality bar but explicitly skipped B06.
+
+On Day 3, we re-examined the decision and decided to execute B06 properly.
+
+The reasons we revisited:
+
+1. **The DPDP §8(7) breach-notification procedure was unwritten.** No document anywhere named which lawyer gets called, in what time window, with what information. That's a legal-teeth gap, not a hygiene gap. It has to land somewhere; writing it in a code repo would mix legal blast radius with engineering blast radius.
+2. **Compliance mappings have multiple-regulator scope.** A DPDP / IRDAI / RBI / MeitY mapping is read by auditors and a buyer's security team. Forcing them to clone a TypeScript repo to find it is friction at exactly the wrong moment in a pilot conversation.
+3. **The canonical threat model needs a stable URL** before repo #2 (verifier, B02, Week 2) exists. If the verifier's component threat model points at `pulkitpareek18/ZeroAuth/docs/threat_model.md`, the link rots the moment we split the verifier; if it points at a governance repo, the URL is stable forever.
+4. **Two-reviewer enforcement is easier with a dedicated repo.** Path-globbed CODEOWNERS in a code repo gets bypassed under deadline pressure ("just merge the policy change inline, fix it later"). A standalone repo where every PR is *by definition* a policy change makes the discipline mechanical.
+
+## Decision
+
+Create `pulkitpareek18/ZeroAuth-Governance` as a separate public GitHub repo with the structure from `governance_CLAUDE.md`:
+
+- `docs/shared/{security-policy, coding-standards, naming-conventions, incident-response, breach-notification}.md`
+- `docs/threat-model/{canonical, api, verifier, iot, sdk, dashboard}.md`
+- `docs/compliance/{dpdp, irdai, rbi, meity}-mapping.md` + `audit-format.md`
+- `adr-index/ALL.md`
+- `release-coordination/matrix.md` + `changelogs/`
+- `evidence-pack-sources/{CHECKSUMS, RELEASES}.md`
+- `CODEOWNERS` (two-reviewer rule on `/docs/shared/` and `/docs/compliance/`)
+- `.github/workflows/lint.yml` (markdownlint + link-check on every PR)
+
+The repo is **public**, CC-BY-4.0 licensed — same posture as the main `ZeroAuth` repo. The audit story benefits from open visibility.
+
+This repo (`pulkitpareek18/ZeroAuth`) keeps:
+
+- `CLAUDE.md` — the constitution for this repo, links to the canonical shared docs
+- `docs/api_contract.md` — API-specific contract (won't move)
+- `docs/error_codes.md` — API-specific (won't move)
+- `docs/threat_model.md` — **deprecated** in favor of `docs/threat-model/canonical.md` in the governance repo. We keep the file for now with a header pointing at the canonical, until Week 2 when we remove it entirely.
+- `adr/` — local ADRs. The governance repo's `adr-index/ALL.md` is the cross-repo index pointing here.
+
+## Consequences
+
+- **Positive — DPDP §8(7) procedure now exists.** Written down, with named counsel contacts (TODO entries where contacts aren't confirmed yet). Drillable. Reviewable.
+- **Positive — auditor-friendly surface.** A buyer's security team can clone one repo and read every policy without slogging through TypeScript. The W08 evidence-pack assembler from the operational suite reads from `evidence-pack-sources/CHECKSUMS.md` cleanly.
+- **Positive — stable URLs across the 8-week build.** When B02 (verifier, Week 2), B03 (IoT, Week 3), B04 (SDK, Week 5) split out, they all link to `github.com/pulkitpareek18/ZeroAuth-Governance/blob/main/docs/threat-model/canonical.md` — that URL doesn't move.
+- **Positive — two-reviewer rule is mechanical.** CODEOWNERS in the governance repo names both Pulkit and Amit on `/docs/shared/` and `/docs/compliance/`. Counsel review is enforced manually (counsel doesn't have GitHub access) by a note in the PR description before merge.
+- **Negative — two repos to clone on a fresh dev machine.** Mitigated: `scripts/setup-dev.sh` (TODO) will clone both side by side.
+- **Negative — cross-repo links rot more easily than same-repo links.** Mitigated by `markdown-link-check` CI on every PR in both repos.
+- **Negative — context switch when authoring a policy change that's tied to a code change.** Engineer has to open two PRs and link them. Acceptable cost — the discipline is the point.
+- **Neutral — `docs/threat_model.md` in this repo is in deprecation limbo.** It's still the most current text today; the governance repo's `canonical.md` was synced from it on 2026-05-13. By the end of Week 2, the canonical is authoritative and the file in this repo becomes a 1-line pointer.
+
+## Alternatives considered
+
+- **Option A — Stay collapsed.** Keep one repo, enforce two-reviewer via CODEOWNERS on path globs. **Rejected** because: (1) the DPDP §8(7) procedure deserves its own surface; (2) the audit story is materially weaker; (3) cross-repo link stability becomes a problem the moment B02 ships.
+- **Option C — Split only the regulator-facing pieces** (breach-notification + compliance) and keep security-policy / coding-standards / threat-model inline. **Rejected** because: a buyer's security team expects everything in one place. Splitting the policy surface in two creates confusion about which one is authoritative.
+- **Option D — Submodule the governance dir into product repos.** **Rejected** universally and on first principles — submodules are hated for good reason.
+- **Option E — Stay collapsed forever, accept the discipline gap.** **Rejected** — DPDP §8(7) is a regulatory requirement, not a discipline gap.
+
+## Cost of the change
+
+- One-time setup: ~3 hours (this session — Wed May 13 2026)
+- Per-policy-PR friction: estimated ~5 minutes extra (clone the governance repo, work there, link the PR back to the code PR if relevant)
+- CI cost: trivial (markdownlint + link-check, no Node compile / Jest)
+
+## Exit ramps (when to consolidate back, if ever)
+
+The governance repo doesn't get folded back into the API repo. The split is monotonic — once separated, stays separated. If something ever justifies re-collapsing, that's a new ADR superseding this one.
+
+## References
+
+- B06 build prompt: `zeroauth_prompt_suite/04_development_suite/02_claude_code_dev/build_prompts/B06_governance_repo_bootstrap.md`
+- Governance constitution: `zeroauth_prompt_suite/04_development_suite/02_claude_code_dev/CLAUDE_md/governance_CLAUDE.md`
+- New repo:
+- Canonical threat model (new home):
+- Brainstorm session on Day 3 (Wed May 13 2026) weighing collapsed vs separate repo: this conversation
+
+---
+
+LAST_UPDATED: 2026-05-13
+OWNER: Pulkit Pareek

From 876fac37612f44756fa0af69d8314ee53fdf55d8 Mon Sep 17 00:00:00 2001
From: Pulkit Pareek
Date: Wed, 13 May 2026 11:27:16 +0530
Subject: =?UTF-8?q?Seed=20qa-log/=20with=20DW01=20cadence=20?=
 =?UTF-8?q?=E2=80=94=20first=20dated=20entry=20+=20format?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The dev brainstorm's DW01 cadence prompt fires twice weekly (Tue + Thu
09:55 IST) and asks the engineer to run the four-demo battery
(printed-photo rejection, airplane mode, three-different-hashes,
hand-the-phone) and record results in /qa-log/YYYY-MM-DD.md.
The cadence had never been wired up; today seeds it.

None of the four demos can run today — the IoT firmware (B03 Week 3),
mobile SDK (B04 Week 5), liveness detection (B13 Week 3/5), offline
queue (B14 Week 4), and LSH bucket protocol (B10 Week 3+) are all
unbuilt. The seed entry honestly records every demo as `Blocked` rather
than faking pass entries (the brainstorm's whole point is that the
cadence catches missing work — faking it would defeat the purpose).

Surrogate smokes against components that DO exist today:
- API smoke against https://zeroauth.dev/v1/* — all 200
- Dashboard reachability /dashboard/{login,signup,overview} — all 200
- Playwright happy-path E2E — Green in CI on commit 0d1741d
- Jest + Vitest unit suites — 82 tests passing

Surrogate green does not lift HOLD on buyer-facing demo URLs. HOLD stays
in place until Demo 1–4 actually run Green, expected around Week 5 EOD
when B03/B04/B13/B14 all land.

Files added:
- qa-log/README.md — format spec, the four demos, the Blocked-period
  surrogate convention, the cadence
- qa-log/STATUS.md — current rollup (HOLD, with reason)
- qa-log/LATEST.md — pointer to the most recent dated entry
- qa-log/2026-05-13.md — the seed entry, today's run

The cadence target for Thursday 2026-05-14 is 09:55 IST. Today's entry
went up at ~11:30 IST because the cadence wasn't ready until task 3 of
today's EOD list got executed.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 qa-log/2026-05-13.md | 99 ++++++++++++++++++++++++++++++++++++++++++++
 qa-log/LATEST.md     |  9 ++++
 qa-log/README.md     | 95 ++++++++++++++++++++++++++++++++++++++++++
 qa-log/STATUS.md     | 37 +++++++++++++++++
 4 files changed, 240 insertions(+)
 create mode 100644 qa-log/2026-05-13.md
 create mode 100644 qa-log/LATEST.md
 create mode 100644 qa-log/README.md
 create mode 100644 qa-log/STATUS.md

diff --git a/qa-log/2026-05-13.md b/qa-log/2026-05-13.md
new file mode 100644
index 0000000..b0b826e
--- /dev/null
+++ b/qa-log/2026-05-13.md
@@ -0,0 +1,99 @@
+# QA Log — 2026-05-13
+
+**Run by:** Pulkit Pareek (alone — Amit not in this rehearsal)
+**Time:** 11:30 IST (the canonical slot is 09:55 Tue/Thu; this is the seed entry for Wed because the cadence had not been wired yet)
+**Build:**
+
+- API (`pulkitpareek18/ZeroAuth`): `2df9faa` on `dev` (production is `0d1741d` on `main`)
+- Governance (`pulkitpareek18/ZeroAuth-Governance`): `bad10e7` on `main`
+- IoT firmware: **not built** (B03 — Week 3)
+- Mobile SDK: **not built** (B04 — Week 5)
+- Liveness detection: **not built** (B13 — Week 3 / Week 5)
+- Offline queue: **not built** (B14 — Week 4)
+
+## Results — four-demo battery
+
+### Demo 1 — Printed photo rejection
+
+**Status:** Blocked
+**Note:** No IoT terminal hardware (Orange Pi 5 + Astra Pro Plus not ordered). No liveness detection code (B13 unbuilt). Unblocks when B03 + B13 ship in Week 3.
+
+### Demo 2 — Airplane mode authentication
+
+**Status:** Blocked
+**Note:** No IoT firmware + no offline queue. Unblocks when B14 ships in Week 4.
+
+### Demo 3 — Three-different-hashes for the same identity
+
+**Status:** Blocked
+**Note:** Three-mode LSH bucket protocol (B10) unbuilt. Unblocks when B10 ships (Week 3+).
+
+### Demo 4 — Hand-the-phone (impostor)
+
+**Status:** Blocked
+**Note:** No mobile SDK + no on-device liveness. Unblocks when B04 + B13 ship in Week 5.
+
+## Surrogate smoke (while battery is Blocked)
+
+Substituting with smoke tests against components that *do* exist today. These are NOT a substitute for the battery — they cover a different surface (the central API + dashboard, not the IoT terminal + mobile SDK) — but they prove that something is being smoked twice a week during the Blocked period.
+
+### S-1 — API reachability against production (`https://zeroauth.dev`)
+
+**Status:** Green
+**Method:** `curl` with `Authorization: Bearer za_live_…` (the live default key for tenant `2c648045-e32c-4943-9629-7ef9206aaac2`).
+
+| Endpoint | HTTP code | Notes |
+|---|---|---|
+| `GET /v1/audit` | 200 | Returns the tenant's audit events |
+| `GET /v1/devices` | 200 | Returns `{"devices":[],"environment":"live"}` (fresh tenant) |
+| `GET /v1/users` | 200 | Returns `{"users":[],"environment":"live"}` |
+| `GET /v1/verifications` | 200 | Returns empty array |
+| `GET /v1/attendance` | 200 | Returns empty array |
+| `GET /api/health` | 200 | App + DB + Redis all healthy |
+
+### S-2 — Dashboard reachability
+
+**Status:** Green
+**Method:** `curl -sS -o /dev/null -w "%{http_code}"` against the three SPA entry points.
+
+| URL | HTTP code |
+|---|---|
+| `/dashboard/login` | 200 |
+| `/dashboard/signup` | 200 |
+| `/dashboard/overview` | 200 |
+
+(Note: `/dashboard/overview` returns 200 because the SPA shell renders before the `RequireAuth` guard redirects unauthenticated users client-side. Server-side this is a static-asset response. Real auth-flow coverage is in S-3.)
+
+### S-3 — Playwright happy-path E2E
+
+**Status:** Green (most recently in CI on commit `0d1741d` at 12:09 UTC on 2026-05-12; not re-run today)
+**Method:** `cd dashboard && npm run e2e` against ephemeral Postgres in CI.
+**Spec:** `dashboard/e2e/happy-path.spec.ts` — signup → first-key reveal → overview → mint a second key → register a device → see audit events → sign out.
+**Note:** Local re-run not performed today (would have taken ~3 min and required a clean Postgres). The CI run from yesterday's PR #25 merge is authoritative.
+
+### S-4 — Unit + integration suites
+
+**Status:** Green
+**Method:** CI `validate` job on PR #25 (commit `0d1741d`).
+**Result:** All Jest tests passing (50). All Vitest dashboard tests passing (32). 82 total.
+
+## Rollup
+
+**Overall:** **HOLD**
+
+The HOLD is unambiguous and expected: it stays HOLD until the IoT firmware and mobile SDK exist. This is the baseline state of the cadence for Weeks 1–5. The surrogate smokes are an honesty signal that engineering is alive and producing testable artifacts, not a substitute for clearing demo URLs to buyers.
+
+## Escalations
+
+None today — every Blocked demo is blocked on planned work (B03/B04/B10/B13/B14), not on a regression. The blocking work items are tracked in `pulkitpareek18/ZeroAuth-Governance: adr-index/ALL.md` open list (ADR-0008 onwards) and in the dev brainstorm's 8-week build order.
+
+## Operator notes (out-of-band observations, not part of the formal rollup)
+
+- The fact that `/api/health` returned 200 is a real signal. The VPS-level smoke proves the live deployment is healthy on Day 3.
+- This is the first entry in the QA log. The format is provisional — adjust after the next 2–3 entries once the cadence has shaken out. Friday's W05 review should explicitly call out whether the surrogate-smoke section is useful or noise.
+- DW01 specifies 09:55 IST as the canonical slot. Today's entry is at 11:30 IST because the cadence wasn't wired yet. Starting Thursday (2026-05-14), aim for 09:55.
+- The brainstorm puts `/qa-log/` inside `zeroauth-iot/`. That repo doesn't exist yet, so the log lives in `pulkitpareek18/ZeroAuth: qa-log/` for now. When B03 ships and `zeroauth-iot` materializes, decide whether to migrate the directory or keep it in the API repo and link from IoT.
+
+---
+
+LAST_UPDATED: 2026-05-13
diff --git a/qa-log/LATEST.md b/qa-log/LATEST.md
new file mode 100644
index 0000000..b73995b
--- /dev/null
+++ b/qa-log/LATEST.md
@@ -0,0 +1,9 @@
+# Latest QA Run
+
+→ [`2026-05-13.md`](2026-05-13.md)
+
+**Rollup:** HOLD (every demo Blocked; surrogate smokes green)
+**Date:** 2026-05-13
+**Next run:** Thursday 2026-05-14 at 09:55 IST
+
+(This file is overwritten on every run. For history, see the dated files in this directory.)
diff --git a/qa-log/README.md b/qa-log/README.md
new file mode 100644
index 0000000..00a3b6e
--- /dev/null
+++ b/qa-log/README.md
@@ -0,0 +1,95 @@
+# QA Log
+
+Twice-weekly engineering QA records, written every Tuesday + Thursday at 09:55 IST per the dev brainstorm's DW01 cadence prompt (`zeroauth_prompt_suite/04_development_suite/03_cowork_dev/DW01_demo_battery.md`). Records are append-only — once a dated file lands here it does not get edited later (corrections go in the next run's file with a back-reference).
+
+## What it is
+
+The four-demo battery is ZeroAuth's smoke test before any buyer-facing demo URL is shared. The four demos:
+
+| # | Demo | Pass criterion |
+|---|---|---|
+| 1 | **Printed photo rejection** | Hold a printed photo up to the IoT terminal. Reject within 2 seconds. |
+| 2 | **Airplane mode authentication** | Set device to airplane mode. Authenticate. UI shows "Authenticated (offline)" + on-device audit ID. |
+| 3 | **Three-different-hashes** | Authenticate three times with the same fingerprint. The three on-screen hashes are visibly different. |
+| 4 | **Hand-the-phone (impostor)** | Hand the device to a different person who attempts to authenticate as Pulkit. Authentication fails. |
+
+Each demo records: **Green** (pass) / **Yellow** (passes but with caveats) / **Red** (fail) / **Blocked** (cannot run yet; see "Blocked status") + a one-sentence note.
+
+## Files
+
+- `README.md` — this file
+- `STATUS.md` — current rollup: `GREEN` / `YELLOW` / `HOLD`. Updated after every run. `HOLD` means: do not share new buyer-facing demo URLs until the next Green run.
+- `LATEST.md` — one-line pointer to the most recent dated entry
+- `YYYY-MM-DD.md` — one file per run
+
+## Format of a dated entry
+
+```text
+# QA Log — YYYY-MM-DD
+
+**Run by:**
+**Build:**
+- API:
+- IoT firmware:
+- Mobile SDK:
+
+## Results
+
+### Demo 1 — Printed photo rejection
+**Status:** Green | Yellow | Red | Blocked
+**Note:**
+
+### Demo 2 — Airplane mode authentication
+**Status:** ...
+**Note:** ...
+
+### Demo 3 — Three-different-hashes
+**Status:** ...
+**Note:** ...
+
+### Demo 4 — Hand-the-phone
+**Status:** ...
+**Note:** ...
+
+## Rollup
+**Overall:** GREEN | YELLOW | HOLD
+
+## Escalations
+
+```
+
+## Blocked status
+
+Until the IoT firmware (B03), mobile SDK (B04), liveness detection (B13), offline queue (B14), and demo wrappers (B15–B18) ship, the four demos cannot be run. The entry status during this period is `Blocked` per demo, and the rollup is `HOLD`. **The cadence still fires.** A `Blocked` log entry is more honest than no entry — it documents that the discipline is alive and what's gating it.
+
+When B03/B04/B13/B14 ship, the format remains identical; the `Blocked` statuses transition to Green/Yellow/Red on the next run.
+
+## Surrogate smoke during the Blocked period
+
+While the four-demo battery can't run, we substitute with smoke tests against the components that *do* exist today:
+
+- **API smoke** — `curl` the live `/v1/audit`, `/v1/devices`, `/v1/users`, `/v1/verifications` endpoints with a test API key. Expected: all 200.
+- **Dashboard smoke** — load `https://zeroauth.dev/dashboard/login`, log in with a known test tenant, navigate every page.
+- **E2E happy path** — run `cd dashboard && npm run e2e` (the Playwright spec at `dashboard/e2e/happy-path.spec.ts`).
+
+These appear in dated entries under a "Surrogate smoke (while battery is Blocked)" heading. They are NOT a substitute for the battery — they cover a different surface — but they establish that *something* is being smoked twice a week.
+
+## How to run
+
+Today (during Blocked period):
+
+1. Copy the most recent `YYYY-MM-DD.md` file to use as a template. Leave the original untouched (entries are append-only).
+2. Name the copy with today's date. Update the Build block (`git rev-parse --short HEAD` for the API).
+3. Run the surrogate smokes. Record results.
+4. For each of the four demos, record `Blocked` with the blocking work item.
+5. Update `STATUS.md` (will stay `HOLD` during Blocked period).
+6. Update `LATEST.md` pointer.
+7. Commit: `git add qa-log/ && git commit -m "QA log — YYYY-MM-DD (Blocked + surrogate smoke green)"`.
+
+Once the demos are runnable: follow the same steps but record real Green/Yellow/Red against demos 1–4 instead of `Blocked`.
+
+## Chain hooks
+
+- DW10 (engineering Friday annex) summarises the week's QA log into the W05 packet
+- W05 (Friday review packet) reads DW10's annex
+- Before any buyer-facing demo URL is shared, `STATUS.md` is checked for freshness
diff --git a/qa-log/STATUS.md b/qa-log/STATUS.md
new file mode 100644
index 0000000..52de81c
--- /dev/null
+++ b/qa-log/STATUS.md
@@ -0,0 +1,37 @@
+# QA Battery — Current Status
+
+**Status:** **HOLD**
+
+**Last updated:** 2026-05-13 (after the seed entry)
+**Last run:** [`2026-05-13.md`](2026-05-13.md)
+**Next scheduled run:** Thursday 2026-05-14 at 09:55 IST (per DW01 cadence)
+
+## Why HOLD
+
+All four demos are currently `Blocked` because their underlying components do not exist yet (IoT firmware = B03 Week 3, mobile SDK = B04 Week 5, liveness = B13 Week 3/5, offline queue = B14 Week 4, LSH protocol = B10 Week 3+).
+
+**HOLD means:** do not share new buyer-facing demo URLs until the next `GREEN` run.
+
+This is the expected baseline of the QA log during Weeks 1–5 of the 8-week build sprint. HOLD here is not a regression signal — it's the honest representation of "the demo battery cannot run yet."
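As a rough illustration of the status rules described in this log, the four per-demo results could be folded into the `STATUS.md` value as follows. This is a minimal sketch: the `rollup` function and its types are hypothetical, and treating any `Red` as `HOLD` and any `Yellow` as `YELLOW` is an assumption, since the log itself only states that all-`Blocked` means `HOLD` and that four passing demos mean `GREEN`.

```typescript
// Hypothetical types mirroring the statuses named in qa-log/README.md.
type DemoStatus = "Green" | "Yellow" | "Red" | "Blocked";
type Rollup = "GREEN" | "YELLOW" | "HOLD";

// Assumed rule set (only the all-Blocked and all-Green cases come from the log):
//   - any Red or Blocked demo  -> HOLD
//   - four Greens              -> GREEN
//   - otherwise (some Yellow)  -> YELLOW
function rollup(demos: readonly DemoStatus[]): Rollup {
  if (demos.length !== 4) {
    throw new Error(`expected 4 demo results, got ${demos.length}`);
  }
  if (demos.some((s) => s === "Red" || s === "Blocked")) {
    return "HOLD";
  }
  return demos.every((s) => s === "Green") ? "GREEN" : "YELLOW";
}
```

Under these assumptions the seed entry, with all four demos `Blocked`, rolls up to `HOLD`, which matches the status recorded for 2026-05-13.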
+
+## When HOLD lifts
+
+HOLD lifts to `GREEN` when:
+
+1. B03 + B13 ship → Demos 1, 3, 4 become runnable on mock hardware
+2. B14 ships → Demo 2 becomes runnable
+3. B04 ships → Demo 4 fully runnable on real mobile
+4. All four demos pass on a single run
+
+Target: Week 5 EOD per the 8-week build order (`zeroauth_prompt_suite/04_development_suite/00_dev_brainstorm/01_dev_brainstorm.md` part 4).
+
+## Surrogate smoke status (today)
+
+While battery is HOLD, surrogate smokes against the components that *do* exist:
+
+- API smoke against `https://zeroauth.dev/v1/*`: **Green** (today)
+- Dashboard reachability `/dashboard/{login,signup,overview}`: **Green** (today)
+- Playwright happy-path E2E: **Green** (last CI run on commit `0d1741d`, 2026-05-12)
+- Jest + Vitest unit suites: **Green** (last CI run on commit `0d1741d`, 2026-05-12)
+
+Surrogate green does NOT lift the HOLD on demo URLs. It only signals that "engineering is healthy" for the W05 weekly review.

From b263dd50f2946858ae1aa502290fee441d3cff44 Mon Sep 17 00:00:00 2001
From: Pulkit Pareek
Date: Wed, 13 May 2026 11:38:23 +0530
Subject: =?UTF-8?q?Retroactive=20security=20review=20of=20PR?=
 =?UTF-8?q?=20#22=20=E2=80=94=203=20Medium=20/=203=20Low=20/=201=20Info?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #22 (merged as 0c325fb, live at 0d1741d) touched all four
security-reviewer trigger surfaces — auth, crypto, audit, tenant
boundaries — and merged without the subagent running. CLAUDE.md mandates
the subagent on any change to these surfaces. Day 3 discipline-debt
clearance.

Subagent (acdae2de12c322caa) reviewed the diff 69fd27e..0c325fb. Net
risk: Medium. No Critical. No tenant API key rotation needed.

Mediums to land this week:
- F-1: console JWT in localStorage; docs/threat_model.md A-09 claims
  "client memory" — reconcile docs to code or migrate to httpOnly
  cookies.
- F-2: email enumeration via 409 on /api/console/signup — return uniform
  202 + send verification email out-of-band.
- F-3: console-initiated audit rows show actor_type='api_key' with
  actor_id=NULL because the new console handlers don't plumb the
  operator email into recordAuditEvent. Forensic gap, not exploit.

Lows (F-4 per-tenant write limit, F-5 jti+aud, F-6 limit validation) and
the Info (F-7 machine code mixed with human strings) tracked together in
issue #26 — Pulkit splits into per-fix PRs as he gets to them.

Things checked + clean: tenant scoping (A-01 holds), tenant inference
from body silently ignored (A-10 holds), no dangerouslySetInnerHTML
anywhere in dashboard, no plaintext secrets in log lines, JWT never in
URLs, Helmet CSP + trust proxy correct behind Caddy.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 qa-log/security-review-pr22.md | 105 +++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 qa-log/security-review-pr22.md

diff --git a/qa-log/security-review-pr22.md b/qa-log/security-review-pr22.md
new file mode 100644
index 0000000..b58706e
--- /dev/null
+++ b/qa-log/security-review-pr22.md
@@ -0,0 +1,105 @@
+# Security review — PR #22 (retroactive)
+
+**Reviewer:** `security-reviewer` subagent (agentId `acdae2de12c322caa`)
+**Date:** 2026-05-13
+**Diff range:** `git diff 69fd27e..0c325fb` (8,652 lines across 46 files)
+**Production state at review time:** `main` @ `0d1741d` (live on `https://zeroauth.dev`); `dev` @ `876fac3`
+**Reason for retroactive review:** PR #22 touched all four security-reviewer trigger surfaces (auth, crypto, audit, tenant boundaries) and merged without the subagent running. CLAUDE.md mandates the subagent on any change to these surfaces — discipline-debt clearance, not a pre-merge gate. Tracking issue: [#26](https://github.com/pulkitpareek18/ZeroAuth/issues/26).
+
+---
+
+## Summary
+
+Net risk is **Medium**. Tenant scoping is correctly enforced — the PR's most load-bearing security property holds, and there's even a regression test for A-10. The actual issues are (a) a documented-vs-implemented drift on JWT storage that materially changes the XSS blast radius, (b) email enumeration on signup, and (c) missing rate-limit + actor-attribution on the new authenticated write paths. **No Critical. No need to rotate keys.**
+
+## Findings
+
+### F-1 — Console JWT stored in localStorage; threat model A-09 claims "client memory"
+
+- **Severity:** Medium
+- **Threat-model mapping:** A-09 (drift)
+- **Location:** `dashboard/src/lib/api.ts:14,29-46`; `docs/threat_model.md:105`
+- **Description:** `api.ts` persists the console JWT to `localStorage['zeroauth.console_token']`, but `docs/threat_model.md` A-09 explicitly asserts the token "lives in client memory and is replayed on every API call." That's wrong as shipped. localStorage is readable by any script with execution capability on the SPA origin (CSP-bypassing extensions, future supply-chain compromise of a dashboard dep, future innerHTML mistake). Mitigation (d) "short-lived (24h)" is the only thing standing between a one-shot XSS and 24h of cross-tenant API access. The governance repo's `docs/threat-model/dashboard.md` documents the localStorage choice but the API repo's `docs/threat_model.md` doesn't reflect that — pick one and reconcile.
+- **Reproduction:** Open the live dashboard, sign in, run `localStorage.getItem('zeroauth.console_token')` in DevTools — returns the JWT.
+- **Recommended remediation:** Either (a) move the JWT to an HttpOnly, SameSite=Strict, Secure cookie set by `/api/console/login`+`/signup` and rely on cookie auth for `/api/console/*`, or (b) keep localStorage but update `docs/threat_model.md` A-09 to reflect reality, shorten the JWT to ~2h with silent refresh, and add a `jti` allow-list so logout server-side invalidates. Run the `threat-model-update` skill.
+- **Verification after fix:** A-09 row matches the code; logout from tab A invalidates the token in tab B; the jest suite covers token revocation.
+
+### F-2 — Email enumeration on `/api/console/signup`
+
+- **Severity:** Medium
+- **Threat-model mapping:** A-05
+- **Location:** `src/routes/console.ts:132-137`
+- **Description:** The 409 `email_taken` distinguishes "registered" from "available" addresses. Combined with `authLimiter` capped at 10/15min/IP, a botnet can enumerate the tenant directory (high signal for spear-phishing and stuffed-credential targeting). A-05 calls this out but the implementation still emits the distinguishing 409.
+- **Reproduction:** `curl -X POST .../api/console/signup -d '{"email":"target@bank.in","password":"AlsoLong1!"}'` — 409 reveals account presence vs 201 / 400.
+- **Recommended remediation:** Return an opaque 202 ("If the email is new, a verification link has been sent") regardless of pre-existing account, then send the verification email on a worker. Interim: return a uniform 400 `invalid_request` and log the duplicate server-side.
+- **Verification after fix:** Black-box test: response shape for `existing@x.com` is byte-identical to `fresh@x.com`.
+
+### F-3 — Audit log not written with console operator attribution
+
+- **Severity:** Medium
+- **Threat-model mapping:** A-01 (forensic gap), audit-log completeness rule
+- **Location:** `src/routes/console.ts:454-501, 522-575`; `src/services/platform.ts:154-164, 232-245, 292-302, 381-391`
+- **Description:** Every state-changing console route (POST `/devices`, PATCH `/devices/:id`, POST `/users`, PATCH `/users/:id`) writes an audit row inside `platform.ts`, but with `actor_type: 'api_key'` and `actor_id: undefined` because the console routes never pass an `actorId`. Console actions therefore appear in `audit_events` as anonymous `api_key` operations with `actor_id IS NULL`. This strictly satisfies the constitution ("Never expose admin actions without an audit row"), but the row is mislabelled and unattributable to the human operator email in the JWT, which degrades A-01 forensics.
+- **Reproduction:** Authenticate to the console, POST `/api/console/devices`, then `SELECT actor_type, actor_id FROM audit_events ORDER BY created_at DESC LIMIT 1;` — `('api_key', NULL)`.
+- **Recommended remediation:** Add a 4th argument to console-side calls: `createDevice(tenantId, env, input, { actorType: 'console', actorEmail: req.console.email })`, and have `recordAuditEvent` accept and store the email in `metadata.actor_email`. The `actorType` enum already includes `'console'` (used by signup).
+- **Verification after fix:** A jest test asserts that a POST `/api/console/devices` results in an audit row with `actor_type='console'` and `metadata.actor_email='dev@example.com'`.
+
+### F-4 — No per-tenant rate-limit on authenticated console write routes
+
+- **Severity:** Low
+- **Threat-model mapping:** A-05 (extension)
+- **Location:** `src/routes/console.ts:64-74, 254-310, 454-575`
+- **Description:** Only `/signup` and `/login` carry the 10/15min limiter. POST `/keys`, POST `/devices`, POST `/users`, DELETE `/keys/:id` rely solely on the global 300/15min limiter from `src/app.ts:50`. A stolen JWT can mint 300 API keys before the global limiter throttles. The 10-active-keys-per-tenant guard on POST `/keys` helps but doesn't apply to `/devices` or `/users`.
+- **Recommended remediation:** Add a per-tenant write limiter (e.g. 60 writes/15min keyed on `req.console.tenantId`) to the four mutating routes.
+
+### F-5 — Console JWT lacks `jti` and audience claim
+
+- **Severity:** Low
+- **Threat-model mapping:** A-09
+- **Location:** `src/routes/console.ts:78-90`
+- **Description:** Tokens have no `jti` and no `aud`. If a console session is suspected of compromise, the only mitigation is suspending the tenant outright.
+- **Recommended remediation:** Add `jti: crypto.randomUUID()` and `aud: 'zeroauth-console'`, and verify `aud` in `verifyConsoleToken`. Track a small revoked-jti set in Redis (already wired into compose). + +### F-6 — `parseInt` on `?limit=` without guard + +- **Severity:** Low +- **Location:** `src/routes/console.ts:407, 442, 510, 585, 609` +- **Description:** `parseInt(String(req.query.limit), 10)` returns `NaN` for `?limit=abc`. `sanitizeLimit` in `platform.ts` likely catches it, but the routes should reject early with 400. +- **Recommended remediation:** Wrap parsing in `Number.isInteger(parsed) && parsed > 0 && parsed <= 1000`, else 400 `invalid_limit`. + +### F-7 — Error machine-code field carries human strings in 2 handlers + +- **Severity:** Informational +- **Location:** `src/routes/console.ts:122, 204` +- **Description:** Violates the `{ error: '', message: '' }` convention in CLAUDE.md. Other routes in the same file follow it correctly. + +## Things checked + clean (no finding) + +- **Tenant scoping:** every `pool.query` in `platform.ts` includes `tenant_id = $1` (and `environment = $2` where applicable). No string concatenation; all `pg` parameterised. **A-01 holds.** +- **Tenant inference from request body:** `console.ts` reads `tenantId` exclusively from `(req as any).console.tenantId`. The body's `tenantId` is silently ignored. The `tests/console-proxy.test.ts:101-110, 156-164` cases prove this. **A-10 holds.** +- **XSS sinks:** No `dangerouslySetInnerHTML`, `eval`, or `document.write` anywhere under `dashboard/src/`. +- **JWT in URLs:** No URL contains the JWT (always in `Authorization` header), so no Referer leakage. +- **Secret leakage in logs:** No plaintext password, JWT, or full API key in any `logger.*` call. Only `tenantId`, `email`, `keyPrefix`, `environment`. The signup response does include the full API key one time, which is intentional. 
+- **`jwt.verify` algorithm:** uses HS256 implicitly via jsonwebtoken's default + the `issuer` option constraint. Acceptable for now; will become a finding when RS256 lands. +- **Bcrypt timing:** `authenticateTenant` calls `verifyPassword` only on a matched row — there's a small timing oracle for "email exists vs not", aligns with F-2 rather than a separate finding. +- **Helmet + trust proxy:** CSP and `trust proxy 1` set correctly in `src/app.ts` for rate-limit IP keying behind Caddy. + +## Recommendations beyond findings + +1. Run `threat-model-update` skill to reconcile A-09 with the localStorage implementation, then file an ADR `0006-console-jwt-cookie-vs-localstorage.md` (numbering after counsel-engagement ADR-0005). +2. Add `actor_type='console'` + `actor_email` plumbing through `platform.ts`. Touches every audit-row writer; do as one commit. +3. Add Redis-backed `jti` revocation list — also unblocks "log out everywhere" UX. +4. While in `console.ts`, add zod via the `dep-add` skill; the manual `if (!field)` checks are now repeated 8 times. +5. Add a CSP `report-uri` (open item in threat model) so future localStorage-readers raise a signal. + +## Tests that would have caught real findings + +- **F-1:** `dashboard/src/lib/api.test.ts` should assert "no JWT is found in `localStorage` after `logout()`" — already passes, but extend with "no JWT is ever in `sessionStorage` either" once cookie migration happens. +- **F-2:** `tests/console-proxy.test.ts` should add: response body for signup with an existing email is byte-identical to signup with a fresh email (both 202). +- **F-3:** Add: POST `/api/console/devices` with JWT email `dev@example.com` produces audit row with `actor_type='console'` and `metadata.actor_email='dev@example.com'`. +- **F-4:** Add a rate-limit test for `/api/console/devices` POST (61st request → 429), guarded by `NODE_ENV !== 'test'`-aware fixture. 
+- **F-5:** Add: revoking a `jti` makes subsequent requests with that token return 401 `session_revoked`. + +--- + +LAST_UPDATED: 2026-05-13 From edfef730b1eb11ebfd3e158accfc42d0a24cbc33 Mon Sep 17 00:00:00 2001 From: Pulkit Pareek Date: Wed, 13 May 2026 12:52:57 +0530 Subject: [PATCH 4/5] =?UTF-8?q?Plan=20mode:=20B02=20verifier=20service=20s?= =?UTF-8?q?plit-out=20=E2=80=94=20design=20doc?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CLAUDE.md mandates plan mode for any change to src/services/zkp.ts. B02 is Week 2 Day 1 work; starting plan mode three days early on Day 3 of Week 1 so Thursday morning opens with a committed plan. The design doc lays out three options: - Plan A — full B02: new pulkitpareek18/ZeroAuth-Verifier Rust repo with arkworks Groth16, axum HTTP shell, SQLite WAL append-only audit with hash chain, reproducible docker buildx. Recommended. ~3 days of work (Thu + Fri + Mon Week 2 morning if slips). - Plan B — TypeScript workspace inside the existing API repo: peel snarkjs into verifier/ with its own package.json. ~1 day. Lower security wins, faster delivery. - Plan C — defer B02 to Week 2 Day 1 as the brainstorm says; spend Thu/Fri closing PR #22 Mediums (issue #26) and W05 prep. The doc spells out the migration order for Plan A (Thursday scaffold + verifier-core + verify HTTP path; Friday audit log + hash chain + reproducible build + integration), the threat-model deltas (canonical A-02 mitigation moves to verifier; new A-V01 through A-V05 in governance/docs/threat-model/verifier.md), test strategy (unit + property + negative + hash-chain + reproducible-build + API regression + E2E), risks, non-goals, and the eight decisions Pulkit + Amit need to make at the W05 Friday review. Default if no decision is made by EOD Wednesday: Plan C (defer).
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/design/verifier-service-split.md | 440 ++++++++++++++++++++++++++ 1 file changed, 440 insertions(+) create mode 100644 docs/design/verifier-service-split.md diff --git a/docs/design/verifier-service-split.md b/docs/design/verifier-service-split.md new file mode 100644 index 0000000..c35cb94 --- /dev/null +++ b/docs/design/verifier-service-split.md @@ -0,0 +1,440 @@ +# B02 — Verifier service split-out · Design doc (plan mode) + +> **Author:** Pulkit Pareek +> **Date:** 2026-05-13 (Day 3 of Week 1) +> **Status:** PROPOSED — awaiting Pulkit's pick between Plan A (Rust, separate repo) and Plan B (TypeScript workspace) before any code is written +> **Mandate:** plan mode required per `CLAUDE.md` for any change to `src/services/zkp.ts` +> **Reviewers:** Pulkit (technical), Amit (governance + go-to-market implications) +> **Pairs with:** `B02_verifier_service_bootstrap.md` (the build prompt), `governance: docs/threat-model/verifier.md` (the component threat model stub that gets fleshed out by this work) + +--- + +## 1. Why this exists + +Today, ZKP verification lives inline inside the central API repo at [`src/services/zkp.ts`](../../src/services/zkp.ts) — 209 lines of TypeScript that load `snarkjs` dynamically, hold the verification key in module state, and are called straight from the Express request handlers. That's correct for the v0 / pre-pilot world (today). It is **not** what we want when: + +- A buyer's security team asks for the verifier's blast radius. Today the answer is "as broad as the Node process — the same heap holds the verifier, the audit log writer, the API key cache, and the SAML demo gate." That's not a defensible answer in a regulated industry. +- The trusted setup files (`identity_proof.wasm`, `identity_proof.zkey`, `identity_proof.vkey.json`) need to be cryptographically pinned to a specific service build. Today they're loaded from disk by `initZKP()` with no provenance check.
+- The B19 load test (planned Week 6) needs a stable HTTP target with predictable latency. The current inline call goes through Express middleware + tenant auth + audit-log-write before the actual `groth16.verify` — load characteristics are dominated by everything *except* the verifier. +- The `cryptographer-reviewer` subagent's scope (per `CLAUDE.md` standing instruction §5) covers `src/services/zkp.ts`. Every change to the API repo's other files creates ambiguity about whether the cryptographer needs to look at it. Splitting the verifier into its own repo gives the cryptographer a tight, well-defined surface. + +B02 in the dev brainstorm puts this work in Week 2 Day 1 — i.e. Monday 2026-05-18. We're starting plan mode three days early (Wednesday Day 3 of Week 1) so Pulkit walks into Monday with a committed plan instead of a blank page. + +## 2. Current state — what we're peeling + +### 2.1 The file + +[`src/services/zkp.ts`](../../src/services/zkp.ts), 209 lines. Public surface: + +| Function | Used by | What it does | +|---|---|---| +| `initZKP()` | `src/server.ts:6` startup | Dynamically imports `snarkjs`, loads `verificationKey` from `config.zkp.vkeyPath` | +| `verifyBiometricProof(req)` | `src/routes/zkp.ts`, `src/routes/v1/zkp.ts` | Orchestrator: validates timestamp window (5 min), nonce format (UUIDv4), publicSignals shape (3 elements), then calls `verifyProofOffChain` and optionally `verifyProofOnChain` | +| `verifyProofOffChain(proof, pub)` | internal | Pure `snarkjs.groth16.verify` call | +| `getCircuitInfo()` | `src/routes/zkp.ts`, `src/routes/v1/zkp.ts` | Reads config: wasmPath, vkeyAvailable, verifyOnChain | +| `isZKPReady()` | `src/routes/health.ts:3` | Health check — is `snarkjs` imported | + +### 2.2 What's wrong with this surface, today + +1. **Module-state singletons** (`snarkjs`, `verificationKey`). Process restart re-loads from disk; no provenance check on the vkey file. 
If an attacker overwrites the vkey on disk between deploys, the next restart silently accepts the modified key. +2. **The fallback mode is dangerous.** When `verificationKey` is missing, `verifyBiometricProof` falls back to `isValidProofStructure` — a shape check that returns `true` for any well-formed Groth16 envelope. This is intentional for dev-without-compiled-circuit, but on production it would mean "no vkey = all proofs valid". `src/services/zkp.ts:124-128` logs a `warn` but doesn't refuse to serve. Open finding. +3. **Replay window not bound to issued nonces** (per threat-model A-02). Today the nonce is format-checked but not cross-referenced against an `issued_nonces` table — within the 5-min window, the same proof can be replayed. The dev brainstorm's A-02 explicitly calls this a "high residual risk" item. +4. **No verifier-local audit log.** Audit events about verifications are written to the API's Postgres `audit_events` table (good — tenant-scoped, retained 7y), but the *verifier itself* keeps no append-only local log. If the API's Postgres is compromised, an attacker can rewrite the audit history. The cryptographer-reviewer's mitigation in the brainstorm is "verifier has its own append-only SQLite with hash chain, independent of Postgres." +5. **`snarkjs` is a JavaScript implementation of Groth16.** It's correct + widely used, but the surface area of its dep tree (transitive: `ffjavascript`, `web-worker`, `@iden3/...`) is large and Node-only. The cryptographer-reviewer subagent's instructions specifically call out that audit-class verifier code prefers Rust + `arkworks` (~10 transitive deps, all audited).
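
Item 3 above (the nonce is format-checked but never bound to an issued set) can be illustrated with a small single-use store. This is a minimal sketch, assuming an in-memory map standing in for the `issued_nonces` table; `IssuedNonces`, `issue`, and `consume` are hypothetical names and are not taken from `src/services/zkp.ts`.

```typescript
// Hypothetical in-memory stand-in for an `issued_nonces` table.
type NonceRecord = { issuedAtMs: number; consumed: boolean };

// The 5-minute replay window described for verifyBiometricProof.
const WINDOW_MS = 5 * 60 * 1000;

class IssuedNonces {
  private readonly records = new Map<string, NonceRecord>();

  // Record a nonce at the moment the server hands it to a client.
  issue(nonce: string, nowMs: number): void {
    this.records.set(nonce, { issuedAtMs: nowMs, consumed: false });
  }

  // Returns true at most once per issued nonce, and only inside the window.
  // Unknown, expired, or already-consumed nonces are all refused.
  consume(nonce: string, nowMs: number): boolean {
    const record = this.records.get(nonce);
    if (record === undefined || record.consumed) return false;
    if (nowMs - record.issuedAtMs > WINDOW_MS) return false;
    record.consumed = true;
    return true;
  }
}
```

With a store like this, a replayed proof fails on its second `consume` call even though its timestamp is still inside the window. A durable version would need the check-and-mark step to be atomic in the database.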
+ +### 2.3 The five callers + +```text +src/server.ts — calls initZKP() at boot, before app.listen() +src/routes/zkp.ts — legacy /api/auth/zkp/verify (still served) +src/routes/v1/zkp.ts — /v1/auth/zkp/verify (the canonical surface) +src/routes/v1/zkp.ts — /v1/auth/zkp/circuit-info (read-only metadata) +src/routes/health.ts — GET /api/health includes isZKPReady() +``` + +The migration must preserve every one of those routes' externally-observable behavior. `tests/zkp.test.ts` is the regression net (run against the existing inline implementation today, must stay green after the split). + +## 3. The fork in the road — Plan A vs Plan B + +This is the decision Pulkit needs to make before Thursday morning. I lay out both honestly. **I recommend Plan A** for reasons in §3.3, but Plan B is defensible. + +### 3.1 Plan A — full B02 (Rust verifier in its own repo) + +**Repo:** new `pulkitpareek18/ZeroAuth-Verifier` (public, MIT, Rust). + +**What gets built:** + +- Rust binary, listens on `:3001` (loopback only — never internet-exposed). +- `POST /verify` — accepts `{ proof, public_signals, tenant_id, environment, circuit_version, correlation_id }`, returns `{ verified: bool, verifier_audit_id: string, latency_ms: number, circuit_version: string }`. +- `GET /health` — version + readiness. +- `GET /metrics` — Prometheus, fields redacted of any tenant-identifying data. +- Cargo workspace, two crates: `verifier-core` (the Groth16 logic) and `verifier-service` (axum HTTP shell). +- SQLite WAL-mode database `audit.db`, append-only via SQL triggers blocking UPDATE + DELETE. Schema: one table `verifier_events` with hash chain (see §4.4). +- Reproducible Docker build via `docker buildx build --provenance=true --sbom=true`. Build twice on clean machines → identical image digest. +- Three founding ADRs in the new repo: 0001 verifier architecture, 0002 Groth16/BN254 (acknowledging that the existing circuit uses `bn128`, which is `snarkjs`'s name for the same BN254 curve), 0003 SQLite append-only.
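
The hash-chained audit table in the list above can be sketched as follows. This is an illustrative TypeScript model only (Plan A itself would implement it in Rust). It assumes a reduced column set, assumes that "the row excluding `entry_hash`" includes `prev_hash`, and uses the hypothetical helper names `canonicalize`, `entryHash`, `append`, and `verifyChain`.

```typescript
import { createHash } from 'node:crypto';

// Illustrative subset of the verifier_events columns; the real table has more.
type AuditRow = { id: string; tenant_id: string; verified: 0 | 1; created_at: string };
type ChainedRow = AuditRow & { prev_hash: string; entry_hash: string };

// Chain anchor: 64 zero characters, as used for the first prev_hash.
const GENESIS_PREV_HASH = '0'.repeat(64);

// Canonical form for a flat row: JSON with sorted keys and no whitespace.
// An array replacer makes JSON.stringify emit keys in the array's order.
function canonicalize(row: Record<string, string | number>): string {
  return JSON.stringify(row, Object.keys(row).sort());
}

// entry_hash = sha256(canonical(row without entry_hash) || prev_hash)
function entryHash(row: AuditRow, prevHash: string): string {
  const body = canonicalize({ ...row, prev_hash: prevHash });
  return createHash('sha256').update(body + prevHash, 'utf8').digest('hex');
}

// Append a row, linking it to the previous entry (or to the genesis value).
function append(chain: ChainedRow[], row: AuditRow): ChainedRow[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.entry_hash : GENESIS_PREV_HASH;
  return [...chain, { ...row, prev_hash: prevHash, entry_hash: entryHash(row, prevHash) }];
}

// Recompute every link; any edited column or broken pointer fails the check.
function verifyChain(chain: ChainedRow[]): boolean {
  let expectedPrev = GENESIS_PREV_HASH;
  for (const entry of chain) {
    const { prev_hash, entry_hash, ...row } = entry;
    if (prev_hash !== expectedPrev) return false;
    if (entryHash(row, prev_hash) !== entry_hash) return false;
    expectedPrev = entry_hash;
  }
  return true;
}
```

Because each `entry_hash` feeds the next row's `prev_hash`, rewriting one historical row forces an attacker to recompute every later row, which is what makes the log tamper-evident rather than merely append-only.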
+ +**What the API repo keeps:** + +- `src/services/zkp.ts` shrinks to ~40 lines — just an HTTP client to the verifier service. +- The five callers remain unchanged. +- A new config `config.zkp.verifierUrl` (defaults `http://localhost:3001`). +- The dev `docker-compose.yml` adds a `verifier` service. + +**Crate selection** (per B02 quality bar, minimal + audited): + +| Crate | Why | Pinned to | ADR scope | +|---|---|---|---| +| `ark-groth16` + `ark-bn254` + `ark-ff` | Groth16 verifier over BN254 | 0.5.x | first use → one bundle ADR is acceptable per B02 §1 | +| `axum` + `tower` + `tower-http` | HTTP server | 0.7.x | bundled | +| `tracing` + `tracing-subscriber` | Structured logs | 0.1.x | bundled | +| `serde` + `serde_json` | (de)serialization | 1.x | bundled | +| `rusqlite` + `r2d2_sqlite` | SQLite with connection pool | 0.30 / 0.22 | bundled | +| `sha2` | Hash chain | 0.10.x | bundled | +| `hex` + `uuid` | small utilities | latest | bundled | +| `proptest` (dev) | property tests for the verifier | 1.x | bundled | + +`unsafe` blocks: **zero** allowed without a per-block ADR. + +**Effort estimate:** 2.5–3.5 days of focused work for the bootstrap quality bar. Achievable Thu (Day 4) + Fri (Day 5) + Monday morning if it slips. + +### 3.2 Plan B — TypeScript split into a sub-workspace (the pragmatic shortcut) + +**Repo:** stays in `pulkitpareek18/ZeroAuth`. New directory `verifier/` becomes a separate npm workspace. + +**What gets built:** + +- `verifier/package.json` — own dependencies (`snarkjs`, `express`, `pg`, etc.) — fully isolated from the API repo's deps. +- `verifier/src/index.ts` — small Express server on `:3001`, single `POST /verify` route. +- Same SQLite audit log + hash chain as Plan A. +- Dockerfile stage `verifier-build` is added; production image grows by ~80 MB. + +**Effort estimate:** 1–1.5 days. Achievable Thursday alone. + +### 3.3 Which plan and why + +**Recommendation: Plan A.** + +Three reasons in priority order: + +1.
**The cryptographer-reviewer subagent's standing instructions** (per `CLAUDE.md` §5) effectively assume Rust + arkworks. The reviewer's competence is calibrated against the arkworks API; reviewing a snarkjs split adds a calibration layer. +2. **The "no outbound network calls" constraint** (B02 §Constraints) is harder to enforce in Node — every transitive npm dep could opt into a fetch call. In Rust, an `axum`-only service with `default-features = false` on everything else has a much smaller "outbound by accident" surface. +3. **The reproducible build constraint** is feasible in both languages but trivial in Rust + buildx + cargo-lock vs gymnastic in Node (npm install non-determinism, transitive native modules). + +**Counter-argument for Plan B:** time. If the demo battery is still HOLD by Friday and we have no signed buyer, spending 3 days on a Rust rewrite when a 1-day Node split would buy 80% of the security wins is suboptimal. **Compromise:** ship Plan B first, treat it as the "v0 split" that gets the routing surface right, and migrate to Plan A (Rust) in Week 4 once the IoT firmware is the dominant work. This means writing the design doc twice — once now (Plan B), once in Week 4 (Plan A). Real cost: ~1 extra day of design work + the throwaway Node code. + +**My pick:** Plan A. The brainstorm framed this as Week 2 Day 1 specifically because the first SOW conversations are 4 weeks out — there's just barely enough runway to get the verifier into the Rust-on-arkworks shape that pilot buyers will expect. Slipping to Plan B now means slipping again in Week 4, which is when the IoT firmware also lands; double-loading week 4 is the worst time. + +But — and this matters — **Pulkit is the only engineer.** If Pulkit's Rust capacity is limited (the brainstorm doesn't claim Pulkit is a Rust expert; it claims Claude Code can scaffold Rust), there's a real risk that the Rust path eats 5 days instead of 3. Pulkit decides. 
+ +### 3.4 Decision needed today + +I need one of: + +- **A.** "Go Plan A (Rust separate repo)." → tomorrow I scaffold the Rust crate. +- **B.** "Go Plan B (TypeScript workspace)." → tomorrow I peel the Node code into `verifier/`. +- **C.** "Hold — start B02 next week as the brainstorm says, do something else Thursday." → I roll Thursday into closing PR #22's three Mediums (issue [#26](https://github.com/pulkitpareek18/ZeroAuth/issues/26)) and the W05 review prep. + +If no decision by EOD Wednesday, default = C (defer). + +--- + +## 4. The plan (Plan A) + +The rest of this doc assumes Plan A. If we pick B, I produce a separate, shorter doc. + +### 4.1 Repo layout + +```text +zeroauth-verifier/ +├── CLAUDE.md ← constitution; references governance: docs/shared/* +├── README.md +├── LICENSE ← MIT (matches API repo) +├── Cargo.toml ← workspace +├── Cargo.lock ← committed +├── verifier-core/ +│ ├── Cargo.toml +│ └── src/ +│ ├── lib.rs ← public API: verify_proof(), VerificationKey +│ ├── groth16.rs ← arkworks wrapping +│ ├── circuit_loader.rs ← load + checksum the vkey at startup +│ └── errors.rs +├── verifier-service/ +│ ├── Cargo.toml +│ └── src/ +│ ├── main.rs ← axum boot +│ ├── routes/ +│ │ ├── verify.rs ← POST /verify +│ │ ├── health.rs ← GET /health +│ │ └── metrics.rs ← GET /metrics +│ ├── audit/ +│ │ ├── schema.rs ← SQL migrations +│ │ ├── writer.rs ← append-only writer with hash chain +│ │ └── verify_chain.rs ← reconstruct + validate chain +│ └── config.rs +├── circuits/ ← symlink or copy of the trusted setup files +│ ├── identity_proof.vkey.json +│ └── CHECKSUMS.txt ← SHA-256 of every trusted-setup file +├── tests/ +│ ├── verify_integration.rs +│ ├── audit_append_only.rs ← negative test: UPDATE/DELETE fail +│ ├── hash_chain.rs ← reproducible reconstruction +│ └── property/ ← proptest fuzzing of proof structure +├── Dockerfile ← multi-stage, --provenance=true +├── docker-compose.yml ← dev-only, for local API↔verifier +├── adr/ +│ ├── 
0001-verifier-architecture.md +│ ├── 0002-groth16-bn254-not-plonk.md +│ └── 0003-sqlite-append-only.md +└── .github/workflows/ + ├── ci.yml ← cargo test --release + clippy + └── reproducible-build.yml ← builds twice, asserts image digest match +``` + +### 4.2 HTTP shape + +`POST /verify` — request body: + +```json +{ + "proof": { + "pi_a": ["...", "...", "1"], + "pi_b": [["...","..."],["...","..."],["1","0"]], + "pi_c": ["...", "...", "1"], + "protocol": "groth16", + "curve": "bn128" + }, + "public_signals": ["...", "...", "..."], + "tenant_id": "uuid", + "environment": "live|test", + "circuit_version": "v1", + "correlation_id": "uuid" +} +``` + +Response 200: + +```json +{ + "verified": true, + "verifier_audit_id": "uuid", + "latency_ms": 12, + "circuit_version": "v1" +} +``` + +Response 400 on malformed input; 503 on key-not-loaded; 500 only on unexpected internal panic (which should never happen — every panic site is an `expect` with a documented invariant). + +**No tenant data in the response.** Just the boolean verdict + an opaque audit reference + latency for observability. 
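
The status-code contract above could be consumed on the API side roughly like this. It is a hedged sketch: the field names mirror the JSON response shown earlier, while `VerifierOutcome`, `isVerifyResponseBody`, `interpretVerifierResponse`, and the `reason` strings are hypothetical and not part of the existing `src/services/zkp.ts`. It also assumes the client should fail closed, so anything other than a well-formed 200 is never treated as a successful verification.

```typescript
// Shape of the 200 response body documented in section 4.2.
interface VerifyResponseBody {
  verified: boolean;
  verifier_audit_id: string;
  latency_ms: number;
  circuit_version: string;
}

// Hypothetical client-side result type; the reason codes are illustrative.
type VerifierOutcome =
  | { kind: 'verdict'; verified: boolean; auditId: string }
  | { kind: 'rejected'; reason: 'malformed_request' }
  | {
      kind: 'unavailable';
      reason: 'key_not_loaded' | 'internal_error' | 'malformed_response' | 'unexpected_status';
    };

// Runtime check so an unexpected body is never trusted as a verdict.
function isVerifyResponseBody(body: unknown): body is VerifyResponseBody {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b['verified'] === 'boolean' &&
    typeof b['verifier_audit_id'] === 'string' &&
    typeof b['latency_ms'] === 'number' &&
    typeof b['circuit_version'] === 'string'
  );
}

// Map the documented statuses (200 / 400 / 503 / 500) onto one outcome.
function interpretVerifierResponse(status: number, body: unknown): VerifierOutcome {
  if (status === 200) {
    if (!isVerifyResponseBody(body)) return { kind: 'unavailable', reason: 'malformed_response' };
    return { kind: 'verdict', verified: body.verified, auditId: body.verifier_audit_id };
  }
  if (status === 400) return { kind: 'rejected', reason: 'malformed_request' };
  if (status === 503) return { kind: 'unavailable', reason: 'key_not_loaded' };
  if (status === 500) return { kind: 'unavailable', reason: 'internal_error' };
  return { kind: 'unavailable', reason: 'unexpected_status' };
}
```

Keeping the mapping in a pure function means it can be unit-tested without a running verifier or a mocked `fetch`.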
+ +### 4.3 Audit log schema + +SQLite, WAL mode for crash safety + concurrent readers: + +```sql +CREATE TABLE verifier_events ( + id TEXT PRIMARY KEY, -- UUID v4 + tenant_id TEXT NOT NULL, + environment TEXT NOT NULL, -- 'live' | 'test' + circuit_version TEXT NOT NULL, + correlation_id TEXT NOT NULL, -- traces back to API's audit_events row + verified INTEGER NOT NULL, -- 0 | 1 + proof_hash TEXT NOT NULL, -- SHA-256 of canonical(proof) — full proof never stored + pub_signals_hash TEXT NOT NULL, -- SHA-256 of canonical(public_signals) + latency_us INTEGER NOT NULL, + created_at TEXT NOT NULL, -- ISO 8601 UTC + prev_hash TEXT NOT NULL, -- chain pointer + entry_hash TEXT NOT NULL -- SHA-256(canonical(this row excluding entry_hash) || prev_hash) +); + +CREATE INDEX idx_verifier_tenant_env_created + ON verifier_events (tenant_id, environment, created_at DESC); + +-- Append-only triggers +CREATE TRIGGER verifier_events_no_update + BEFORE UPDATE ON verifier_events + BEGIN SELECT RAISE(ABORT, 'verifier_events is append-only'); END; + +CREATE TRIGGER verifier_events_no_delete + BEFORE DELETE ON verifier_events + BEGIN SELECT RAISE(ABORT, 'verifier_events is append-only'); END; +``` + +Genesis row inserted at first boot with `prev_hash = '0'.repeat(64)`. + +### 4.4 Hash chain construction + +Per B02 §5: + +```text +entry_hash = sha256(canonical_serialize(entry_without_entry_hash) || prev_hash) +``` + +Canonical serialization: JSON with sorted keys, no whitespace, UTF-8. Implementation: `serde_json` with the `preserve_order` feature disabled (default → sorts) + bytes pumped to `sha2::Sha256`. + +The `verify_chain.rs` test reconstructs the chain from a clean DB checkout and asserts each `entry_hash` matches a re-computation. If any row's `entry_hash` doesn't match `sha256(canonical_serialize(row) || prev_hash)`, the chain is broken — alert. + +### 4.5 Verification key cache strategy + +- Loaded at startup from `circuits/identity_proof.vkey.json`.
+- File SHA-256 compared against `CHECKSUMS.txt` (which is committed); mismatch → refuse to start. +- Cached as a parsed `ark_groth16::VerifyingKey` in an `Arc<>` for cheap clone-per-request. +- **No reload at runtime.** Updating the vkey requires a service restart. ADR-0001 captures this. + +### 4.6 Reproducible build + +```dockerfile +# Dockerfile (verifier) +FROM rust:1.85-slim-bookworm@sha256: AS builder +WORKDIR /src +COPY Cargo.toml Cargo.lock ./ +COPY verifier-core/Cargo.toml verifier-core/ +COPY verifier-service/Cargo.toml verifier-service/ +RUN cargo fetch --locked +COPY . . +RUN cargo build --release --locked --frozen + +FROM gcr.io/distroless/cc-debian12@sha256: +COPY --from=builder /src/target/release/verifier-service /verifier +COPY circuits/ /circuits/ +EXPOSE 3001 +USER 1000:1000 +ENTRYPOINT ["/verifier"] +``` + +Build command in CI: `docker buildx build --provenance=true --sbom=true --output type=oci,dest=verifier.oci .` + +Reproducibility check (the `.github/workflows/reproducible-build.yml`): build twice in fresh runners; assert `sha256sum verifier.oci` matches across both runs. If it doesn't, fail the workflow + open an issue. + +### 4.7 API repo changes + +Inside `pulkitpareek18/ZeroAuth`: + +1. **`src/services/zkp.ts` shrinks** to ~40 lines. New surface: + + ```typescript + export async function verifyBiometricProof(req: ZKPVerificationRequest): Promise<ZKPVerificationResponse> { + const res = await fetch(`${config.zkp.verifierUrl}/verify`, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + proof: req.proof, + public_signals: req.publicSignals, + tenant_id: req.tenantId, + environment: req.environment, + circuit_version: 'v1', + correlation_id: req.correlationId ?? uuidv4(), + }), + signal: AbortSignal.timeout(2000), + }); + // ... map to ZKPVerificationResponse + } + ``` + +2. **`config.zkp.verifierUrl`** added; default `http://localhost:3001`. Production sets via env var. +3.
**`docker-compose.yml`** adds the verifier service; production stack adds it to `prod` profile. +4. **`isZKPReady()`** becomes a 1-second timeout fetch to `${verifierUrl}/health`. The API's `/api/health` aggregates. +5. **`tests/zkp.test.ts`** stays green — but now requires either (a) the verifier service running, or (b) a mock `fetch` for unit tests. I'll add a `tests/__mocks__/verifier.ts` that mocks `global.fetch` for the verifier URL. + +### 4.8 Migration order (Thursday + Friday) + +**Thursday Day 4 — scaffold + verifier service** + +1. Morning: `gh repo create pulkitpareek18/ZeroAuth-Verifier --public`. Clone locally. Add `CLAUDE.md` (copying conventions from API repo). +2. `cargo init --bin verifier-service && cargo new --lib verifier-core` workspace setup. +3. Implement `verifier-core` with arkworks Groth16. Write unit + property tests **first**. +4. Implement `verifier-service` HTTP shell with axum. Single `/verify` route, no audit log yet. +5. Wire `tests/verify_integration.rs` — starts the server in-process, posts a known-good proof, expects 200. +6. End-of-day target: `cargo test --release` green; `curl -X POST http://localhost:3001/verify` works against a known-good proof. + +**Friday Day 5 — audit log + reproducible build + integration** + +1. Morning: SQLite migrations + writer + hash chain. Append-only triggers + negative tests. +2. `Dockerfile` + `docker buildx --provenance`. Run twice; verify identical digest. +3. Wire API repo's `src/services/zkp.ts` to point at `${verifierUrl}/verify`. Update `tests/zkp.test.ts` with the fetch mock. +4. Run end-to-end: API receives `POST /v1/auth/zkp/verify`, forwards to verifier, returns the result. +5. Run `cryptographer-reviewer` subagent on the verifier repo's diff. +6. Run `security-reviewer` subagent on the API repo's `src/services/zkp.ts` change. +7. Open PR in API repo: `Replace inline zkp with HTTP client to zeroauth-verifier`. +8. 
Update governance repo `docs/threat-model/verifier.md` from stub → real component threat model. +9. Update governance repo `release-coordination/matrix.md` with a new compatibility set `pre-release-2`. + +### 4.9 Test plan + +| Test | Lives in | What it proves | +|---|---|---| +| `verifier-core` unit | `zeroauth-verifier: verifier-core/src/lib.rs` | arkworks Groth16 accepts the known-good fixture | +| Property tests | `zeroauth-verifier: tests/property/` | Random well-formed proofs are rejected; only fixture passes | +| Negative tests | `zeroauth-verifier: tests/verify_integration.rs` | Wrong public signals → 200 with `verified: false` | +| Append-only | `zeroauth-verifier: tests/audit_append_only.rs` | `UPDATE verifier_events …` → SQL trigger aborts; same for DELETE | +| Hash chain | `zeroauth-verifier: tests/hash_chain.rs` | After N writes, `verify_chain.rs` reconstructs every `entry_hash` from `canonical(row) || prev_hash`. Mutating any column breaks the chain. | +| Reproducible build | `.github/workflows/reproducible-build.yml` | Two clean builds produce identical OCI digest | +| API repo regression | `pulkitpareek18/ZeroAuth: tests/zkp.test.ts` | After the split, every existing test stays green | +| End-to-end | `pulkitpareek18/ZeroAuth: dashboard/e2e/happy-path.spec.ts` | Signup → first key → verification call → audit log entry — all still works | + +### 4.10 Threat model deltas + +After the split, update `pulkitpareek18/ZeroAuth-Governance: docs/threat-model/`: + +- **`canonical.md`** — A-02 (replayed proof verification) — mitigation summary updates: "issued-nonce binding lives in the verifier service, not the API" +- **`api.md`** — A-02 section pointer changes from "primary mitigation lives in API" to "delegated to verifier" +- **`verifier.md`** — promoted from stub to first-class: + - A-V01 — Verifier audit log tamper via direct SQLite write + - A-V02 — Verification key swap on disk between deploys + - A-V03 — Side-channel attack via timing on `pi_a`
length variations + - A-V04 — Resource exhaustion via crafted proof inputs (mitigated: every input bounded; arkworks deserializer hardened) + - A-V05 — Cross-tenant verification via spoofed `tenant_id` in `/verify` request (mitigated: API is the only client; verifier trusts API but logs `tenant_id` for forensic correlation) + +### 4.11 Risks + open questions + +1. **Rust toolchain on Pulkit's machine** — verified? If not, day 4 morning starts with `rustup install stable` and learning curve cost. +2. **arkworks BN254 vs our existing circuit's BN128.** They're the same curve (BN254 is the modern name for what `snarkjs` calls `bn128`). The vkey format is compatible — `snarkjs` exports include the BN254 G1/G2 points in a JSON shape arkworks can parse with a small adapter. **TODO:** verify before Thursday — if the shapes diverge, the work doubles. +3. **Issued-nonce binding (A-02)** is an open finding. The verifier-side split is a natural place to add the `issued_nonces` SQLite table. Plan A.5 (the bonus): include the issued-nonce binding in the v0 verifier release. Adds ~2 hours. +4. **Performance regression.** The current inline call is a function invocation; the split is a localhost HTTP round-trip. Expected overhead ~1-2ms per call. Acceptable, but should be measured (B19 load test target). +5. **What does production deployment look like?** Today, single `node dist/server.js` on the VPS. Plan A adds a second process (`verifier`) on the same VPS, separate user, separate filesystem, separate systemd unit (or Docker compose service). The Caddyfile doesn't change (verifier never exposed). **Deployment ADR needed.** +6. **Backup of the SQLite audit log.** The Postgres `audit_events` table is the primary audit record; the SQLite is a tamper-evident replica. Backup cadence: nightly snapshot + offsite. The Postgres backup ADR (operational suite open item) covers this — track in the same place. + +## 5. 
Non-goals + +Explicitly NOT in this design: + +- The B19 load test (separate Week 6 work) +- A multi-region verifier (deferred — single region until the first non-Indian tenant) +- A Plonk verifier (Groth16 is committed; switching proof systems is a separate ADR) +- An on-chain verifier rotation procedure (handled by `governance: docs/shared/security-policy.md` §3.7) +- A WebAssembly verifier for client-side replay (interesting but out of scope; would require separate threat model) + +## 6. Out-of-scope, but worth flagging for Week 3+ + +- The IoT firmware (B03, Week 3) will need to call the verifier directly (loopback inside the same edge device). The HTTP shape designed here lets that drop in unchanged. Good outcome. +- The mobile SDK (B04, Week 5) does NOT call the verifier — proof generation happens on-device, verification happens server-side. So the SDK only ever calls the API. The HTTP shape designed here doesn't affect the SDK. +- The `B19_k6_verifier_load_test` build prompt will target `POST /verify` directly. We get B19 readiness for free. + +--- + +## 7. Decision matrix — for Pulkit + Amit at the W05 review + +| Decision | Options | Recommendation | +|---|---|---| +| Plan A (Rust) vs Plan B (TS workspace) vs hold | A / B / C | **A** | +| Repo structure | One workspace (verifier-core + verifier-service) vs single crate | **Workspace** (per B02 §2) | +| Audit log location | SQLite local to verifier vs Postgres central | **SQLite local** (per B02 §4) — defense in depth | +| Hash chain inclusion | v0 or v1 of verifier | **v0** — non-negotiable per B02 §5 | +| Issued-nonce binding | v0 or v1 | **v0** — closes the A-02 high residual finding | +| Reproducible build | v0 or v1 | **v0** — per B02 quality bar | +| Deployment | Same VPS / Docker compose vs separate VPS | **Same VPS, separate container** (cost) | +| Verifier-API auth | Static shared secret vs mTLS vs none | **Static shared secret** for v0 (loopback only); mTLS in v1 once we have a real PKI | + +## 8.
What I need from Pulkit before Thursday morning + +1. Plan A vs B vs C — pick one. +2. (If A:) Rust toolchain ready on dev machine? `rustc --version` ≥ 1.85. +3. (If A:) Confirmation that the existing `circuits/identity_proof.vkey.json` is BN254-compatible — I'll verify the JSON shape Thursday morning, but if you already know, save me the half-hour. +4. (If A:) Permission to create `pulkitpareek18/ZeroAuth-Verifier` as a public repo. +5. Acknowledgement that this work spans Thu + Fri and may bleed into Monday Week 2. The other Day 4/5 items (closing PR #22's Mediums) get re-prioritized. + +If no answer by EOD Wednesday: default = **C (defer to Week 2 Day 1, do PR #22 Mediums Thursday/Friday)**. + +--- + +LAST_UPDATED: 2026-05-13 +OWNER: Pulkit Pareek From d187c77daffd97e6edbacfb262a6579fc3a880ac Mon Sep 17 00:00:00 2001 From: Pulkit Pareek Date: Wed, 13 May 2026 13:17:36 +0530 Subject: [PATCH 5/5] =?UTF-8?q?Address=20PR=20#22=20security=20findings=20?= =?UTF-8?q?(issue=20#26)=20=E2=80=94=206=20of=207=20closed,=20F-2=20deferr?= =?UTF-8?q?ed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes F-1, F-3, F-4, F-5, F-6, F-7. Leaves F-2 open and tracked because the real fix needs email infrastructure that doesn't exist yet. F-1 — Reconcile threat_model.md A-09 with localStorage reality Doc lied that the console JWT "lives in client memory"; in fact it's persisted to localStorage["zeroauth.console_token"]. Rewrote A-09 to document the actual choice + the trade-off + the open ADR (cookie migration) so the doc tells the truth about the code. Pointer to the governance repo's authoritative component-level dashboard.md. F-3 — Plumb actor_type='console' through audit log Service functions createDevice/updateDevice/createTenantUser/ updateTenantUser now take an `actor: AuditActor` parameter ({ type, id, email }) instead of a positional actorId. 
Console routes pass { type: 'console', id: tenantId, email: req.console.email }; v1 routes pass { type: 'api_key', id: apiKey.id }. The audit row's actor_type now reflects who actually performed the action, and the operator's email lands in metadata.actor_email when set.

F-4 — Per-tenant write rate-limiter
New consoleWriteLimiter (60 writes / 15 min, keyed on req.console.tenantId) on POST /keys, DELETE /keys/:id, POST /devices, PATCH /devices/:id, POST /users, PATCH /users/:id. A stolen JWT now burns through 60 writes, not 300, before throttling — and the limit is per tenant, not per IP, so it disincentivises the actual attack class.

F-5 — Add jti + aud to console JWT
issueConsoleToken now sets `jwtid: randomUUID()` and `audience: 'zeroauth-console'`. verifyConsoleToken verifies the audience explicitly. Console JWTs are therefore rejected on /v1 (and vice versa) once /v1 grows its own JWT layer. The jti is the seam for the Redis-backed revocation list (still open — separate ADR).

F-6 — Validate ?limit= query
New parseLimit() helper rejects non-integer, ≤0, or >1000 with a thrown RangeError, caught per-route to return 400 invalid_limit. Replaces five identical `parseInt(String(req.query.limit), 10)` sites.

F-7 — Machine-code in error: field
Two console handlers (/signup and /login) used the human string "Email and password are required." in the error field. Now they use invalid_request + a message field, matching the codebase convention.

F-2 — Email enumeration on /api/console/signup — DEFERRED
The byte-identical fix (always 202 + verification email) requires email infrastructure we don't have yet. The interim option ("uniform 400 invalid_request") also leaks (existing→400 vs fresh→201). Left the 409 in place with an explanatory comment, kept the finding open on issue #26 as a subtask gated on email-service adoption ADR.

Tests 64 → 68 passing.
Added: F-5 audience-mismatch test (JWT minted with aud='zeroauth-v1' is rejected with 401 session_expired); F-6 invalid_limit tests for non-integer ('abc'), lower bound (0), and upper bound (1001) — all 400 invalid_limit. Updated F-3 assertions in console-proxy.test.ts and central-api.test.ts to verify the new 4-positional createDevice/createTenantUser signature including the actor object.

Typecheck: clean. Lint: 0 errors, 10 pre-existing warnings unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/threat_model.md | 7 +-
 src/routes/console.ts | 155 +++++++++++++++++++++++++++++-------
 src/routes/v1/devices.ts | 28 +++----
 src/routes/v1/users.ts | 31 +++-----
 src/services/platform.ts | 56 +++++++++----
 tests/central-api.test.ts | 2 +-
 tests/console-proxy.test.ts | 62 ++++++++++++++-
 7 files changed, 256 insertions(+), 85 deletions(-)

diff --git a/docs/threat_model.md b/docs/threat_model.md
index f3fc3c1..668477c 100644
--- a/docs/threat_model.md
+++ b/docs/threat_model.md
@@ -102,10 +102,11 @@
 |---|---|
 | **Class** | Information disclosure / EoP (STRIDE: I + E) |
 | **Surface** | Anything rendered inside the dashboard SPA at `/dashboard/*` |
-| **Description** | The console JWT lives in client memory and is replayed on every API call. If an XSS payload executes in the SPA, the attacker reads the token from memory and uses it from anywhere. |
-| **Mitigation** | (a) Strict CSP from Helmet — no `unsafe-eval`, no inline scripts beyond the existing landing-page allowance. (b) React's default escape protects against most reflected XSS. (c) **Never** introduce `dangerouslySetInnerHTML` without an ADR. (d) The console JWT is short-lived (24h) and revocable by tenant suspension. |
-| **Test status** | CSP header presence is asserted in `tests/health.test.ts` (indirectly via helmet output). **Missing:** an integration test that asserts no inline `