Harden CLI, leakage, render & director safety (adversarial-review fixes)#9
Conversation
…tor safety) Validated each roast finding against the code and fixed the real, tractable ones (architectural items — full SSRF DNS-pin sandbox, render streaming rewrite, golden-frame visual harness — are deferred; see PR). CLI / IO / leakage: - Remove unimplemented `--config` from help (strict parseArgs threw on it). - Add real per-command `--help`/`-h` to generate/record/render. - Warn on deprecated `--allow-private-network` instead of silently ignoring it. - llm.ts: only surface raw provider error bodies under SUPERCUT_VERBOSE (they can echo prompt/account metadata). - event-log: warn loudly (deduped) when dropping unknown event types. - doctor: actually launch Chromium + probe WebCodecs H.264 in a secure context. - sourceRoutes: exclude test/spec/fixture/story dirs from ingestion. Render: - Token-gate the host HTML (`/`, `/host.html`) — the browser already navigates with `?t=`, so no behavior change. - Write raw `encoded.h264` to a tmp file and unlink it in finally; the take dir is no longer mutated / left with stale partial output. - Lower default H.264 bitrate 40→16 Mbps to cut Chromium memory pressure. Director safety: - Broaden the destructive-action lexicon conservatively (publish, transfer, regenerate, suspend, terminate, downgrade) — clearly irreversible verbs only. - Validate hidden-element reveal order: reject a hidden selector unless a prior action in the scene plausibly reveals it. - Vision QC samples up to 3 frames per scene (incl. the final hold), not one. Tests: +5 (event-log warn + dedupe, source-route excludes, destructive lexicon positives/negatives, hidden-element reveal order). Build clean; 83 unit tests pass.
There was a problem hiding this comment.
What this PR does: Harden CLI, leakage, render & director safety based on adversarial-review findings. It fixes CLI help and error messages, adds a comprehensive doctor WebCodecs probe, excludes test/fixture dirs from route ingestion, token-gates the render host page, writes render temp output outside the take dir and cleans it up, lowers bitrate to reduce memory pressure, expands the destructive-action lexicon, validates hidden-element reveal order, improves vision QC sampling, and adds corresponding tests.
Risk areas: The render index's inner finally block closes the browser without error handling, which could prevent server closure if close() fails.
Verdict:
| if (text.startsWith("[render]")) { | ||
| if (text.includes("FATAL")) fatal = text; | ||
| else if (process.env.SUPERCUT_VERBOSE) console.log(text); | ||
| } |
There was a problem hiding this comment.
🟡 P2 (Medium): The await browser.close() call in the inner finally block is not wrapped in a try/catch. If browser.close() throws, the subsequent server.close() will not be reached, potentially leaving the HTTP server listening on a random port until the Node process exits. Guard the call with a .catch(() => {}) or a try/catch to ensure server cleanup always runs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7d16fb43c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const lastEventT = log.events.reduce((m, e) => Math.max(m, e.t), s.t); | ||
| const sceneEndT = end === Infinity ? lastEventT : end; | ||
| const holdT = Math.max(keyT, sceneEndT - 200); // just inside the final hold |
There was a problem hiding this comment.
Sample the final scene using the last captured frame
When this is the last scene, there is no following scene marker, and lastEventT is only the maximum event timestamp, not the take's last captured frame time. The recorder does not emit an event at the end of an action or at hold_ms, so a closing scene with a long action/hold still samples around firstInteraction + 800ms and can miss the late blank/error/overlay this change is meant to catch.
Useful? React with 👍 / 👎.
| const { chromium } = await import("playwright").catch(() => { | ||
| throw new Error("playwright not installed — run `npm install`"); | ||
| }); |
There was a problem hiding this comment.
Keep doctor checks from throwing on missing Playwright
When playwright is not installed or cannot be imported, this rethrows before the check's try/catch; doctor() does not catch individual check failures, so the command exits through the top-level exception path instead of returning a normal failed check and summary. This hits exactly the dependency-diagnosis path that supercut doctor is supposed to handle.
Useful? React with 👍 / 👎.
Co-Messi
left a comment
There was a problem hiding this comment.
Deep review pass: build and tests pass locally, but I found a few actionable correctness gaps in the hardening changes.
| // the last frame we can attribute to this scene; for the final scene `end` | ||
| // is Infinity, so fall back to the take's last captured frame time. | ||
| const lastEventT = log.events.reduce((m, e) => Math.max(m, e.t), s.t); | ||
| const sceneEndT = end === Infinity ? lastEventT : end; |
There was a problem hiding this comment.
This still misses late failures in the final scene. For the last scene, lastEventT comes from events.json, but capture advances clock during hold_ms without emitting an event, while frames continue to be captured. So sceneEndT becomes the last action time rather than the last captured frame, and the promised final-hold sample can land before the hold where a late blank/error/banner appears. Consider reading the last frames-index.json t_source for the final scene (or passing it out of frameJpegB64) and add a regression with a final scene whose hold extends past the last action.
| "coverage", ".vercel", ".cache", "__tests__", "test", "tests", | ||
| // A5: test/spec/fixture/story dirs hold fake pages and sample data — never | ||
| // real product routes — so keep them out of the crawl seeds and LLM prompt. | ||
| "e2e", "__mocks__", "stories", ".storybook", "cypress", "playwright", "fixtures", |
There was a problem hiding this comment.
The comment and PR claim spec/fixture/story dirs are excluded, but spec is not actually in SKIP_DIRS. A repo with app/spec/.../page.tsx or pages/spec/... will still seed fake routes into the crawl/LLM prompt. Please add spec (and ideally a regression next to the new fixture-dir test).
| // live app is genuinely costly AND it is rarely the intended payoff beat. | ||
| export const DESTRUCTIVE_RE = | ||
| /\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke)\b/i; | ||
| /\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke|publish|transfer|regenerate|suspend|terminate|downgrade)\b/i; |
There was a problem hiding this comment.
This broad transfer term looks too aggressive for the stated criterion. It will hide benign/common demo actions such as “Transfer to list”, “Transfer ticket”, or “Transfer call”, even though the comment says only irreversible/high-blast-radius controls should be excluded. Consider narrowing to phrases like transfer funds|transfer ownership|transfer account|transfer domain (and restore a negative test for a benign transfer label).
| @@ -28,6 +28,14 @@ async function main(): Promise<number> { | |||
| case "doctor": | |||
There was a problem hiding this comment.
doctor --help still runs the dependency checks instead of printing usage, despite the top-level help/README saying any command accepts --help. This can take seconds and can exit non-zero on a machine missing ffmpeg/Chromium. Please intercept rest.includes('--help') || rest.includes('-h') for doctor too, or narrow the docs to only generate/record/render.
- render: guard browser.close() in finally so server.close() always runs (a throwing close otherwise leaked the loopback render port). - qc: sample the final scene's hold against the last captured FRAME, not the last event — frames continue through hold_ms with no event, so a late blank/error during a closing hold was still missed. - doctor: import playwright INSIDE the check's try so a missing/broken install surfaces as a failed check instead of throwing past doctor(). - sourceRoutes: actually add `spec`/`specs` to SKIP_DIRS (was claimed, not done). - inventory: narrow `transfer` to funds/money/ownership/account/domain so benign "Transfer to list/ticket/call" stay filmable. - cli: `doctor --help` prints usage instead of running the (slow, possibly non-zero) dependency checks. Tests: add benign-transfer negatives + Transfer-ownership positive, and a `spec` dir to the source-route exclusion regression. Build clean; 83 unit tests pass.
Addresses the adversarial review in
.roast/REPORT-latest.md. Each finding was validated against the code first; this PR fixes the real, tractable ones with tests. Build is clean and the full unit suite passes (83 tests, +5 new). A render smoke-test confirmed the render-path changes end-to-end.Fixed
CLI / IO / leakage
--configfrom help (strictparseArgsthrew on it → broken first-run UX).--help/-htogenerate/record/render(previously threw).--allow-private-networkinstead of silently ignoring it.llm.ts: raw provider error bodies (can echo prompt/account metadata) are now shown only underSUPERCUT_VERBOSE.event-log: warns loudly (deduped) when dropping unknown event types instead of silently.doctor: now actually launches Chromium and probes WebCodecs H.264 in a secure context.sourceRoutes: excludes test/spec/fixture/story dirs from prompt ingestion.Render
/,/host.html) — browser already navigates with?t=, so no behavior change (smoke-tested).encoded.h264now written to a tmp file and unlinked infinally— the take dir is no longer mutated or left with stale partial output on failure (verified: take dir untouched after render).Director safety
publish,transfer,regenerate,suspend,terminate,downgrade) — clearly-irreversible verbs only; benign verbs (send/save/submit/search) deliberately still allowed.Deferred (architectural — intentionally not rushed into an unreviewed PR)
Verification
npm run buildclean;npx vitest run(unit) → 83 passed.generate --helpprints usage + exits 0;--configgone; deprecation warning fires.render --take <take>succeeds, temp h264 cleaned up, take dir not mutated, lower bitrate confirmed.Do not merge yet — for review.