feat(ci): reward-hacking integrity gate (delivery-loop hardening #1)#141
Conversation
|
Caution Review failedFailed to post review comments. GitHub was unavailable or timed out while CodeRabbit was posting the review. Please request a new review later if the pull request still needs one. This happened while posting 6 inline comments. Use ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (12)
⏰ Context from checks skipped due to timeout. (2)
🧰 Additional context used📓 Path-based instructions (1)vite.config.ts📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📓 Common learnings📚 Learning: 2026-06-12T13:01:00.346ZApplied to files:
🪛 LanguageTooldocs/adr/0005-delivery-loop-hardening.md[uncategorized] ~52-~52: The official name of this software platform is spelled with a capital “H”. (GITHUB) [uncategorized] ~77-~77: The official name of this software platform is spelled with a capital “H”. (GITHUB) CLAUDE.md[uncategorized] ~101-~101: The official name of this software platform is spelled with a capital “H”. (GITHUB) 🪛 zizmor (1.26.1).github/workflows/ci.yml[warning] 182-185: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false (artipacked) 📝 WalkthroughWalkthroughAdds a new Changesgate-integrity Reward-Hacking CI Gate
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9fbafa5b05
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // (a) coverage thresholds must not drop; the exclude list must not grow. | ||
| const baseCov = parseCoverageThresholds(input.coverage?.base); | ||
| const headCov = parseCoverageThresholds(input.coverage?.head); | ||
| if (baseCov && headCov) { |
There was a problem hiding this comment.
Reject missing coverage thresholds
If a PR removes the thresholds block, or removes one of its keys, parseCoverageThresholds() returns null for the head file and this if (baseCov && headCov) branch is skipped, so no violation is emitted. That disables Vitest coverage threshold enforcement without needing the gate-change label, which is the exact weakening this gate is meant to catch; a parseable base with a missing/unparseable head should be treated as a violation.
Useful? React with 👍 / 👎.
| /** Number of entries in the coverage `exclude: [...]` array. */ | ||
| export function parseCoverageExcludeCount(src) { | ||
| if (!src) return null; | ||
| const m = src.match(/exclude:\s*\[([\s\S]*?)\]/); |
There was a problem hiding this comment.
Count coverage excludes instead of test excludes
The current vite.config.ts declares test.exclude before coverage.exclude, so this first-match regex reads the top-level test exclusion list and returns 3 instead of the coverage exclusion count. A PR can add entries to coverage.exclude and this gate will still compare the unchanged test.exclude count, allowing files to be dropped from coverage without a gate-change label.
Useful? React with 👍 / 👎.
| const TEST_FILE_RE = /\.(test|spec)\.(ts|tsx|mjs|js)$/; | ||
| const MACHINERY_PATHS = new Set([ | ||
| "scripts/check-gate-integrity.mjs", | ||
| "scripts/check-bundle.mjs", |
There was a problem hiding this comment.
Allow legitimate bundle-budget tightening
Because scripts/check-bundle.mjs is listed as machinery, the CLI adds gate-machinery-modified for every budget edit before the budget comparison runs. That means lowering a budget, which the ADR/changelog describe as the always-allowed “good direction,” still fails unless a maintainer adds gate-change; the unit test for lowering a budget misses this because it calls evaluateGate() with machineryTouched: [].
Useful? React with 👍 / 👎.
| /** Count of `expect(` assertion calls. */ | ||
| export function countAssertions(src) { | ||
| if (!src) return 0; | ||
| return (src.match(/\bexpect\s*\(/g) || []).length; |
There was a problem hiding this comment.
Ignore commented-out assertions
countAssertions() counts raw expect( text, so changing an existing assertion to a comment such as // expect(value).toBe(...) removes the actual assertion while keeping the count unchanged. In that scenario the later comparison does not emit test-assertions-removed, letting a stripped test bypass the new gate; strip comments or parse tests before counting assertions.
Useful? React with 👍 / 👎.
9fbafa5 to
98beb8a
Compare
Turn the prose "don't weaken the tests / don't lower thresholds" rule into a deterministic check the in-loop agent cannot fool. scripts/check-gate-integrity.mjs diffs the branch against its merge base and fails if the verification itself was weakened — lowered coverage thresholds, raised bundle budgets, raised e2e retries, deleted/disabled tests or stripped assertions, or any edit to the gate/CI machinery (the script, .github/workflows/**, fixer.md's forbidden list). A maintainer allows a deliberate change with a visible `gate-change` PR label. Added as a PR-scoped required CI job; mirrored locally via `npm run check:gate`.
98beb8a to
ec7b5da
Compare
|
Thanks — all four findings from the Codex review were valid and are now fixed in
(This is the delivery loop working as intended — a different-model review caught real holes the author's own tests missed.) |
What
Adds
gate-integrity— a deterministic CI gate (and localnpm run check:gate) that the in-loop coding agent cannot fool. It diffs the branch against its merge base and fails the PR if the verification itself was weakened:excludelist grown (vite.config.ts)scripts/check-bundle.mjs)retriesraised (playwright.config.ts).skip/.only/.todo/xit/xdescribe) / stripped ofexpect()assertions.github/workflows/**, orfixer.md's forbidden list (the self-guard, so the gate can't be quietly removed)The good direction — raising a threshold, tightening a budget, adding tests — is always allowed. A deliberate weakening is allowed only when a maintainer attaches the visible
gate-changelabel; the job then passes but prints what it allowed, for the audit trail.Why
This is item #1 of a delivery-loop hardening effort (see
docs/adr/0005-delivery-loop-hardening.md). Until now, "don't weaken the tests / don't lower thresholds" was prose only inCLAUDE.md,docs/review-rubric.md, and thefixeragent — the weakest possible defense, since the agent that edits a threshold also passes the check that reads it. This turns that rule into an exit code.Why this PR carries the
gate-changelabelBy design, introducing the gate machinery trips
gate-machinery-modified(it adds the script + the CI job). That is the intended, auditable approval point — not a bug. The label is this PR's recorded authorization.Test / verification
scripts/check-gate-integrity.test.mjs— 20 unit tests (TDD) covering every detection branch + the "good direction allowed" + override cases.origin/main: fails with the 2 expected machinery violations; passes withGATE_HAS_OVERRIDE=true.src/server/stylesruntime code.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation