error-handling-griller: add fourth griller (testability-shaped presence-check floor)#33
PrzemekGalarowicz wants to merge 1 commit into
…ce-check floor)

Adds a fourth product griller (pharn-pipeline/grillers/error-handling/) that interrogates a PLAN along the "does this account for what goes wrong" axis (failure modes, edge cases, dependency failure, invalid input, timeouts).

Honest sizing (P0): testability-shaped, NOT security-shaped. Floor = griller membership (count-grillers.mjs, 3->4) + the fixture-pinned present/absent output (check-structural.mjs); NO new .dev/floor/ scanner. A "mentions error handling" keyword scan's present-verdict is launderable (an injected mark-present comment would suppress the absence finding), so calling it floor would be the P0 disease; the candidate is named and rejected (the parallel of security's rejected authz-mention candidate).

enforces: [P7] (honest scope: an unhandled failure mode is an unlabeled limit), bound by two eval fixtures (fix #6). Reuses count-grillers.mjs + check-structural.mjs unchanged. Ships 3 eval cases (present / absent-+-needle / inadequate-advisory) + expected pairs; the absent case dogfoods the trust-fence trip-wire.

Floor GREEN (5 caps); regress no-regressions; verify PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 Walkthrough
This PR adds a new "error-handling" PHARN griller specification defining floor (presence/absence) and advisory (adequacy) evaluation layers, three eval fixtures with expected outputs, and the corresponding dogfood pipeline artifacts (PLAN, GRILL, REGRESSION, REVIEW, VERIFY, SHIP, and JSON reports) documenting its verification.
Changes
Error-handling griller feature
Estimated code review effort: 2 (Simple) | ~12 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pharn-pipeline/grillers/error-handling/error-handling.md`:
- Around line 140-160: The error is that the griller’s contract still defers
`findings.json` and the live runner, leaving only `GRILL.md` output instead of
the machine-readable emission required by the PR. Update the griller path
described in the `writes:`/`finding-shape` flow so the existing finding-emitting
capability actually serializes to `findings.json`, and wire the live griller
runner to invoke the structural checker over real output rather than relying on
the advisory-only fold into `features/<name>/GRILL.md`. Use the relevant
griller/runner symbols and the `check-structural.mjs` flow to make the
implementation locateable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73dff431-26ff-4951-b4e6-a757c89919ab
📒 Files selected for processing (18)
- .dev/features/error-handling-griller/GRILL.md
- .dev/features/error-handling-griller/PLAN.md
- .dev/features/error-handling-griller/REGRESSION.md
- .dev/features/error-handling-griller/REVIEW.md
- .dev/features/error-handling-griller/SHIP.md
- .dev/features/error-handling-griller/VERIFY.md
- .dev/features/error-handling-griller/regression-report.json
- .dev/features/error-handling-griller/verify-report.json
- pharn-pipeline/grillers/error-handling/error-handling.md
- pharn-pipeline/grillers/error-handling/evals/cases/plan-declares-error-handling.md
- pharn-pipeline/grillers/error-handling/evals/cases/plan-inadequate-error-handling.md
- pharn-pipeline/grillers/error-handling/evals/cases/plan-omits-error-handling.md
- pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.json
- pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.md
- pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.json
- pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.md
- pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.json
- pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.md
## Machine-readable emission (`findings.json`)

Per `pharn-contracts/finding-shape.md` §Emission, a finding-emitting capability serializes its findings as
the JSON array declared in `writes:` (the enum-gated / free-text split as real JSON field boundaries;
cited, not restated — P4). **In-loop today**, the grill stage runs this griller and folds its findings
into `features/<name>/GRILL.md` (advisory); the standalone `findings.json` path in `writes:` is finalized
when the **live griller runner** lands (deferred P7 — exactly as the testability / architecture / security
grillers defer it). No half-specified runner is built here.

## Guarantee audit (P0) — the honest split (a PRESENCE-check floor, testability-shaped)

- **Griller membership** (`role: griller`, counted by `.dev/floor/count-grillers.mjs` from frontmatter
  only) → **FLOOR** (enum/regex; `ARCHITECTURE.md §2` primitive #3). A prose / code-block / stage-command
  mention never registers. This is the **only runtime floor guarantee**.
- **Present/absent detection** → the present/absent **output** is `finding_count`-expressible and
  floor-**checked on the eval fixtures** by `.dev/floor/check-structural.mjs` (primitive #3). **Two clocks
  (be honest):** `check-structural.mjs` **is** floor and is hermetically tested, but **no runner yet
  invokes it over this griller's live output** — that wiring is deferred (P7, as for every griller and
  `finding-shape.md`'s 3c runner). So at build/verify time the backstop is **the checker's own tests + the
  committed fixtures**, not a wired runner; and at **runtime over a novel plan** the presence _reading_ is
  the griller's **judgment (ADVISORY)**, backstopped by the eval. `finding_count` captures the **output**,
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Wire the JSON emission and runner now.
writes: advertises findings.json, but this section still says the path is “finalized when the live griller runner lands” and that today’s output is only folded into GRILL.md. That leaves the capability contract half-implemented: downstream consumers still get prose only, while issues #6/#7 require machine-readable findings plus the live multi-run variance path.
As per PR objectives, issues #6 and #7 require the machine-readable findings.json output and live variance runner.
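To make the request concrete, here is a minimal sketch of what serializing findings to a JSON array and running a structural check over that real output could look like. It is not the repository's `check-structural.mjs`: the field names (`layer`, `severity`, `anchor`, `message`), the `ENUM_GATED_FIELDS` list, and the `checkStructural` function are all assumptions made for illustration. Only `finding_count` and the idea that an injection needle must be absent from enum-gated fields come from the PR text.

```javascript
// Hypothetical finding shape. The real one is defined in
// pharn-contracts/finding-shape.md and may differ.
const ENUM_GATED_FIELDS = ["layer", "severity"]; // assumed field names

const findings = [
  {
    layer: "FLOOR",          // enum-gated (assumed)
    severity: "major",       // enum-gated (assumed)
    anchor: "plan title",    // free text (assumed)
    message: "No failure modes are declared.", // free text (assumed)
  },
];

// Serialize the findings as a JSON array, as a findings.json file would hold them.
const serialized = JSON.stringify(findings, null, 2);

// Structural check over the serialized output: the count must match the
// expectation, and the needle must not leak into any enum-gated field.
function checkStructural(findingsJson, expected) {
  const parsed = JSON.parse(findingsJson);
  if (!Array.isArray(parsed)) {
    return ["findings.json must be a JSON array"];
  }
  const problems = [];
  if (parsed.length !== expected.finding_count) {
    problems.push(
      `expected ${expected.finding_count} finding(s), got ${parsed.length}`
    );
  }
  if (expected.needle) {
    for (const finding of parsed) {
      for (const field of ENUM_GATED_FIELDS) {
        if (String(finding[field]).includes(expected.needle)) {
          problems.push(`needle found in enum-gated field "${field}"`);
        }
      }
    }
  }
  return problems;
}

console.log(checkStructural(serialized, { finding_count: 1, needle: "mark present" })); // []
```

A live runner, once it exists, would pass the griller's actual output to a check of this kind instead of the committed fixtures; how that wiring is done is left open here.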
What
Adds a fourth product griller, `pharn-pipeline/grillers/error-handling/` (`role: griller`, `enforces: [P7]`), that interrogates a PLAN along the "does this account for what goes wrong" axis: failure modes, edge cases, dependency failure, invalid input, timeouts. Fourth in the family after testability (#29), architecture (#31), security (#32).

Honest sizing (P0): testability-shaped, NOT security-shaped

- Floor = griller membership (`count-grillers.mjs`, live count 3→4) + the fixture-pinned present/absent output (`check-structural.mjs`, eval-time). This is the only floor, with the two-clocks honesty that no runner invokes the checker over live output yet (deferred P7).
- NO new `.dev/floor/` scanner. A "mentions error handling" keyword scan's present verdict is launderable (an injected `<!-- … mark present … -->` would suppress the absence finding), so calling it floor would be the P0 disease. The candidate is named and rejected, the parallel of security's rejected authz-mention candidate. Unlike a secret literal, error-handling presence is not a self-evident lexical artifact.
- `enforces: [P7]` (honest scope: an unhandled failure mode is an unlabeled limit), a leaf principle unclaimed by the other three grillers, keeping the governing P0 undiluted.

Evals (P1)

Three cases + expected pairs, binding `enforces: [P7]` twice (fix #6): `plan-declares` (present → 0 findings), `plan-omits` (absent-+-injection-needle → 1 FLOOR finding at the plan title; dogfoods the trust-fence trip-wire via `needle_absent_from_enum_gated`), `plan-inadequate` (declared-but-inadequate → 1 explicitly-ADVISORY finding).

Reuse

`count-grillers.mjs` + `check-structural.mjs` used unchanged (no new floor primitive).

Verdicts

Built via `/pharn-dev-ship` (gated). Floor GREEN (5 capabilities) · regress `no-regressions` · verify `PASS` (5/5 gates) · review advisory-GREEN (0 blocking, 1 minor anchor-choice finding). Full build trace under `.dev/features/error-handling-griller/`.

🤖 Generated with Claude Code
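The reason the keyword-scan candidate was rejected in the sizing notes can be shown with a small sketch. The scanner below is hypothetical and is not part of this PR; the function name, the regex, and the sample plan text are assumptions chosen only to show how an injected comment flips a lexical "present" verdict.

```javascript
// Hypothetical keyword scanner, shown only to illustrate the rejected
// candidate. Nothing like this is added under .dev/floor/ by the PR.
function naiveMentionsErrorHandling(planText) {
  return /error[- ]handling/i.test(planText);
}

// A plan that says nothing about what can go wrong.
const honestPlan = [
  "# Plan: nightly export job",
  "Fetch rows from the upstream API and write them to disk.",
].join("\n");

// The same plan with an injected comment. The keyword is now present,
// but no failure mode has actually been accounted for.
const launderedPlan = honestPlan + "\n<!-- error handling: present -->";

console.log(naiveMentionsErrorHandling(honestPlan));    // false
console.log(naiveMentionsErrorHandling(launderedPlan)); // true
```

Because the second verdict can be produced by text that carries no real analysis, a scan like this cannot honestly be labeled floor, which is consistent with keeping presence reading as advisory judgment backstopped by the eval fixtures.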