Skip to content

Harden CLI, leakage, render & director safety (adversarial-review fixes)#9

Merged
Co-Messi merged 2 commits into
mainfrom
fix-roast-hardening
Jun 22, 2026
Merged

Harden CLI, leakage, render & director safety (adversarial-review fixes)#9
Co-Messi merged 2 commits into
mainfrom
fix-roast-hardening

Conversation

@Co-Messi

Copy link
Copy Markdown
Owner

Addresses the adversarial review in .roast/REPORT-latest.md. Each finding was validated against the code first; this PR fixes the real, tractable ones with tests. Build is clean and the full unit suite passes (83 tests, +5 new). A render smoke-test confirmed the render-path changes end-to-end.

Fixed

CLI / IO / leakage

  • Removed unimplemented --config from help (strict parseArgs threw on it → broken first-run UX).
  • Added real per-command --help/-h to generate/record/render (previously threw).
  • Warn on deprecated --allow-private-network instead of silently ignoring it.
  • llm.ts: raw provider error bodies (can echo prompt/account metadata) are now shown only under SUPERCUT_VERBOSE.
  • event-log: warns loudly (deduped) when dropping unknown event types instead of silently.
  • doctor: now actually launches Chromium and probes WebCodecs H.264 in a secure context.
  • sourceRoutes: excludes test/spec/fixture/story dirs from prompt ingestion.

Render

  • Token-gate the host HTML (/, /host.html) — browser already navigates with ?t=, so no behavior change (smoke-tested).
  • Raw encoded.h264 now written to a tmp file and unlinked in finally — the take dir is no longer mutated or left with stale partial output on failure (verified: take dir untouched after render).
  • Lowered default H.264 bitrate 40→16 Mbps to cut Chromium memory pressure (render output 25.5MB→13.7MB on the same take).

Director safety

  • Broadened the destructive-action lexicon conservatively (publish, transfer, regenerate, suspend, terminate, downgrade) — clearly-irreversible verbs only; benign verbs (send/save/submit/search) deliberately still allowed.
  • Validate hidden-element reveal order: a hidden selector is rejected unless a prior action in the scene plausibly reveals it.
  • Vision QC now samples up to 3 frames/scene (incl. the final hold), not one — catches late error/empty states.

Deferred (architectural — intentionally not rushed into an unreviewed PR)

  • Full SSRF DNS-rebind defense (resolve-and-pin / proxy sandbox). The guard remains opt-in with the documented TOCTOU gap; a correct fix spans fetch + browser and needs its own tested change.
  • Render output streaming (vs buffer-then-POST). Mitigated here via the bitrate cut + named OOM backstop, but the streaming rewrite is a separate effort.
  • Golden-frame / SSIM visual regression harness.

Verification

  • npm run build clean; npx vitest run (unit) → 83 passed.
  • CLI: generate --help prints usage + exits 0; --config gone; deprecation warning fires.
  • Render smoke: render --take <take> succeeds, temp h264 cleaned up, take dir not mutated, lower bitrate confirmed.

Do not merge yet — for review.

…tor safety)

Validated each roast finding against the code and fixed the real, tractable ones
(architectural items — full SSRF DNS-pin sandbox, render streaming rewrite,
golden-frame visual harness — are deferred; see PR).

CLI / IO / leakage:
- Remove unimplemented `--config` from help (strict parseArgs threw on it).
- Add real per-command `--help`/`-h` to generate/record/render.
- Warn on deprecated `--allow-private-network` instead of silently ignoring it.
- llm.ts: only surface raw provider error bodies under SUPERCUT_VERBOSE
  (they can echo prompt/account metadata).
- event-log: warn loudly (deduped) when dropping unknown event types.
- doctor: actually launch Chromium + probe WebCodecs H.264 in a secure context.
- sourceRoutes: exclude test/spec/fixture/story dirs from ingestion.

Render:
- Token-gate the host HTML (`/`, `/host.html`) — the browser already navigates
  with `?t=`, so no behavior change.
- Write raw `encoded.h264` to a tmp file and unlink it in finally; the take dir
  is no longer mutated / left with stale partial output.
- Lower default H.264 bitrate 40→16 Mbps to cut Chromium memory pressure.

Director safety:
- Broaden the destructive-action lexicon conservatively (publish, transfer,
  regenerate, suspend, terminate, downgrade) — clearly irreversible verbs only.
- Validate hidden-element reveal order: reject a hidden selector unless a prior
  action in the scene plausibly reveals it.
- Vision QC samples up to 3 frames per scene (incl. the final hold), not one.

Tests: +5 (event-log warn + dedupe, source-route excludes, destructive lexicon
positives/negatives, hidden-element reveal order). Build clean; 83 unit tests pass.
@Co-Messi

Copy link
Copy Markdown
Owner Author

@codex

@Co-Messi

Copy link
Copy Markdown
Owner Author

@Verdict

@verdict-0528 verdict-0528 Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this PR does: Harden CLI, leakage, render & director safety based on adversarial-review findings. It fixes CLI help and error messages, adds a comprehensive doctor WebCodecs probe, excludes test/fixture dirs from route ingestion, token-gates the render host page, writes render temp output outside the take dir and cleans it up, lowers bitrate to reduce memory pressure, expands the destructive-action lexicon, validates hidden-element reveal order, improves vision QC sampling, and adds corresponding tests.
Risk areas: The render index's inner finally block closes the browser without error handling, which could prevent server closure if close() fails.
Verdict: ⚠️ Minor concerns

Comment thread src/render/index.ts
if (text.startsWith("[render]")) {
if (text.includes("FATAL")) fatal = text;
else if (process.env.SUPERCUT_VERBOSE) console.log(text);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 P2 (Medium): The await browser.close() call in the inner finally block is not wrapped in a try/catch. If browser.close() throws, the subsequent server.close() will not be reached, potentially leaving the HTTP server listening on a random port until the Node process exits. Guard the call with a .catch(() => {}) or a try/catch to ensure server cleanup always runs.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7d16fb43c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/director/qc.ts
Comment on lines +133 to +135
const lastEventT = log.events.reduce((m, e) => Math.max(m, e.t), s.t);
const sceneEndT = end === Infinity ? lastEventT : end;
const holdT = Math.max(keyT, sceneEndT - 200); // just inside the final hold

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sample the final scene using the last captured frame

When this is the last scene, there is no following scene marker, and lastEventT is only the maximum event timestamp, not the take's last captured frame time. The recorder does not emit an event at the end of an action or at hold_ms, so a closing scene with a long action/hold still samples around firstInteraction + 800ms and can miss the late blank/error/overlay this change is meant to catch.

Useful? React with 👍 / 👎.

Comment thread src/cli/doctor.ts Outdated
Comment on lines +76 to +78
const { chromium } = await import("playwright").catch(() => {
throw new Error("playwright not installed — run `npm install`");
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Keep doctor checks from throwing on missing Playwright

When playwright is not installed or cannot be imported, this rethrows before the check's try/catch; doctor() does not catch individual check failures, so the command exits through the top-level exception path instead of returning a normal failed check and summary. This hits exactly the dependency-diagnosis path that supercut doctor is supposed to handle.

Useful? React with 👍 / 👎.

@Co-Messi Co-Messi left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep review pass: build and tests pass locally, but I found a few actionable correctness gaps in the hardening changes.

Comment thread src/director/qc.ts Outdated
// the last frame we can attribute to this scene; for the final scene `end`
// is Infinity, so fall back to the take's last captured frame time.
const lastEventT = log.events.reduce((m, e) => Math.max(m, e.t), s.t);
const sceneEndT = end === Infinity ? lastEventT : end;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still misses late failures in the final scene. For the last scene, lastEventT comes from events.json, but capture advances clock during hold_ms without emitting an event, while frames continue to be captured. So sceneEndT becomes the last action time rather than the last captured frame, and the promised final-hold sample can land before the hold where a late blank/error/banner appears. Consider reading the last frames-index.json t_source for the final scene (or passing it out of frameJpegB64) and add a regression with a final scene whose hold extends past the last action.

Comment thread src/director/sourceRoutes.ts Outdated
"coverage", ".vercel", ".cache", "__tests__", "test", "tests",
// A5: test/spec/fixture/story dirs hold fake pages and sample data — never
// real product routes — so keep them out of the crawl seeds and LLM prompt.
"e2e", "__mocks__", "stories", ".storybook", "cypress", "playwright", "fixtures",

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment and PR claim spec/fixture/story dirs are excluded, but spec is not actually in SKIP_DIRS. A repo with app/spec/.../page.tsx or pages/spec/... will still seed fake routes into the crawl/LLM prompt. Please add spec (and ideally a regression next to the new fixture-dir test).

Comment thread src/director/inventory.ts Outdated
// live app is genuinely costly AND it is rarely the intended payoff beat.
export const DESTRUCTIVE_RE =
/\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke)\b/i;
/\b(delete|deactivate|wipe|erase|destroy|cancel\s+(subscription|account|plan)|pay|purchase|buy\s+now|checkout|place\s+order|withdraw|confirm\s+(payment|order)|revoke|publish|transfer|regenerate|suspend|terminate|downgrade)\b/i;

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broad transfer term looks too aggressive for the stated criterion. It will hide benign/common demo actions such as “Transfer to list”, “Transfer ticket”, or “Transfer call”, even though the comment says only irreversible/high-blast-radius controls should be excluded. Consider narrowing to phrases like transfer funds|transfer ownership|transfer account|transfer domain (and restore a negative test for a benign transfer label).

Comment thread src/cli/index.ts
@@ -28,6 +28,14 @@ async function main(): Promise<number> {
case "doctor":

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doctor --help still runs the dependency checks instead of printing usage, despite the top-level help/README saying any command accepts --help. This can take seconds and can exit non-zero on a machine missing ffmpeg/Chromium. Please intercept rest.includes('--help') || rest.includes('-h') for doctor too, or narrow the docs to only generate/record/render.

- render: guard browser.close() in finally so server.close() always runs
  (a throwing close otherwise leaked the loopback render port).
- qc: sample the final scene's hold against the last captured FRAME, not the
  last event — frames continue through hold_ms with no event, so a late
  blank/error during a closing hold was still missed.
- doctor: import playwright INSIDE the check's try so a missing/broken install
  surfaces as a failed check instead of throwing past doctor().
- sourceRoutes: actually add `spec`/`specs` to SKIP_DIRS (was claimed, not done).
- inventory: narrow `transfer` to funds/money/ownership/account/domain so benign
  "Transfer to list/ticket/call" stay filmable.
- cli: `doctor --help` prints usage instead of running the (slow, possibly
  non-zero) dependency checks.

Tests: add benign-transfer negatives + Transfer-ownership positive, and a `spec`
dir to the source-route exclusion regression. Build clean; 83 unit tests pass.
@Co-Messi Co-Messi merged commit 19f3067 into main Jun 22, 2026
2 checks passed
@Co-Messi Co-Messi deleted the fix-roast-hardening branch June 22, 2026 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant