error-handling-griller: add fourth griller (testability-shaped presence-check floor) by PrzemekGalarowicz · Pull Request #33 · pharn-dev/pharn-oss

PrzemekGalarowicz · 2026-07-01T23:33:25Z

What

Adds a fourth product griller — pharn-pipeline/grillers/error-handling/ (role: griller, enforces: [P7]) — that interrogates a PLAN along the "does this account for what goes wrong" axis: failure modes, edge cases, dependency failure, invalid input, timeouts. Fourth in the family after testability (#29), architecture (#31), security (#32).

Honest sizing (P0) — testability-shaped, NOT security-shaped

FLOOR: griller membership (count-grillers.mjs, live count 3→4) + the fixture-pinned present/absent output (check-structural.mjs, eval-time). This is the only floor, with the two-clocks honesty that no runner invokes the checker over live output yet (deferred P7).
No new .dev/floor/ scanner. A "mentions error handling" keyword scan's present verdict is launderable (an injected  would suppress the absence finding), so calling it floor would be the P0 disease. The candidate is named and rejected — the parallel of security's rejected authz-mention candidate. Unlike a secret literal, error-handling presence is not a self-evident lexical artifact.
ADVISORY (the bulk): which changes need error handling + whether declared handling is adequate — irreducible judgment, surfaced, never gates.
enforces: [P7] (honest scope — an unhandled failure mode is an unlabeled limit), a leaf principle unclaimed by the other three grillers, keeping the governing P0 undiluted.

Evals (P1)

Three cases + expected pairs, binding enforces: [P7] twice (fix #6): plan-declares (present → 0 findings), plan-omits (absent-+-injection-needle → 1 FLOOR finding at the plan title; dogfoods the trust-fence trip-wire via needle_absent_from_enum_gated), plan-inadequate (declared-but-inadequate → 1 explicitly-ADVISORY finding).

Reuse

count-grillers.mjs + check-structural.mjs used unchanged (no new floor primitive).

Verdicts

Built via /pharn-dev-ship (gated). Floor GREEN (5 capabilities) · regress no-regressions · verify PASS (5/5 gates) · review advisory-GREEN (0 blocking, 1 minor anchor-choice finding). Full build trace under .dev/features/error-handling-griller/.

Run note: the ship run collided 3× with concurrent PHARN runs clobbering the shared .pharn/writes-scope.json; fix #7 fail-closed denied the affected writes (no corruption). Surfaced as a real fix #7 residual — REVIEW.md proposes a canon lesson (candidate remedy: per-run scope isolation / a lock).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added an error-handling review flow with new checks, verification reports, and regression tracking.
- Introduced example evaluation cases for plans that declare, omit, or inadequately describe error handling.
Bug Fixes
- Improved detection of missing or incomplete error-handling details.
- Strengthened handling of injected or misleading content so it no longer affects results.

…ce-check floor) Adds a fourth product griller — pharn-pipeline/grillers/error-handling/ — that interrogates a PLAN along the "does this account for what goes wrong" axis (failure modes, edge cases, dependency failure, invalid input, timeouts). Honest sizing (P0): testability-shaped, NOT security-shaped. Floor = griller membership (count-grillers.mjs, 3->4) + the fixture-pinned present/absent output (check-structural.mjs) — NO new .dev/floor/ scanner. A "mentions error handling" keyword scan's present-verdict is launderable (an injected mark-present comment would suppress the absence finding), so calling it floor would be the P0 disease; the candidate is named and rejected (the parallel of security's rejected authz-mention candidate). enforces: [P7] (honest scope — an unhandled failure mode is an unlabeled limit), bound by two eval fixtures (fix #6). Reuses count-grillers.mjs + check-structural.mjs unchanged. Ships 3 eval cases (present / absent-+-needle / inadequate-advisory) + expected pairs; the absent case dogfoods the trust-fence trip-wire. Floor GREEN (5 caps); regress no-regressions; verify PASS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-07-01T23:33:48Z

📝 Walkthrough

Walkthrough

This PR adds a new "error-handling" PHARN griller specification defining floor (presence/absence) and advisory (adequacy) evaluation layers, three eval fixtures with expected outputs, and the corresponding dogfood pipeline artifacts (PLAN, GRILL, REGRESSION, REVIEW, VERIFY, SHIP, and JSON reports) documenting its verification.

Changes

Error-handling griller feature

Layer / File(s)	Summary
Feature plan `.dev/features/error-handling-griller/PLAN.md`	Defines artifacts to create, eval scenarios with expected finding counts, and guarantee/trust/determinism audits, prohibiting new runtime scanners.
Griller specification `pharn-pipeline/grillers/error-handling/error-handling.md`	Defines frontmatter, floor (membership + fixture-pinned presence/absence) and advisory (adequacy judgment) layers, emission procedure, enum-gated vs free-text finding fields, and guarantee audit.
Eval fixtures and expected outputs `pharn-pipeline/grillers/error-handling/evals/cases/`, `pharn-pipeline/grillers/error-handling/evals/expected/`	Three fixtures (declares, inadequate, omits error handling) with matching expected JSON/Markdown assertions verifying finding counts, severity, file anchoring, and resistance to injected "mark present" instructions.
Dogfood run artifacts `.dev/features/error-handling-griller/GRILL.md`, `REVIEW.md`, `REGRESSION.md`, `VERIFY.md`, `SHIP.md`, `regression-report.json`, `verify-report.json`	Documents advisory findings, GREEN review verdict, no-regression results, PASS verify status, and ship roll-up, including notes on a concurrency incident involving shared `writes-scope.json`.

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related PRs

pharn-dev/pharn-oss#3: Shares the same structural-vs-semantic, enum-gated assertion model used by the error-handling griller's eval fixtures.
pharn-dev/pharn-oss#5: Implements the deterministic structural floor primitives (needle_absent_from_enum_gated, finding_count) that this PR's eval fixtures rely on.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR adds a new error-handling griller, but not the findings.json contract update or /pharn-eval variance runner requested by `#6` and `#7`.	Implement the findings.json emission contract in pharn-contracts/finding-shape.md and add the /pharn-eval orchestration plus variance runner artifacts.
Out of Scope Changes check	⚠️ Warning	Most changes are a new error-handling griller and eval docs, which do not match the linked findings.json contract or /pharn-eval orchestration work.	Split the error-handling griller work into a separate PR, or retarget this PR to the linked contract and variance-runner changes.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding a fourth error-handling griller with a testability-shaped floor.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch error-handling-griller

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pharn-pipeline/grillers/error-handling/error-handling.md`:
- Around line 140-160: The error is that the griller’s contract still defers
`findings.json` and the live runner, leaving only `GRILL.md` output instead of
the machine-readable emission required by the PR. Update the griller path
described in the `writes:`/`finding-shape` flow so the existing finding-emitting
capability actually serializes to `findings.json`, and wire the live griller
runner to invoke the structural checker over real output rather than relying on
the advisory-only fold into `features/<name>/GRILL.md`. Use the relevant
griller/runner symbols and the `check-structural.mjs` flow to make the
implementation locateable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 73dff431-26ff-4951-b4e6-a757c89919ab

📥 Commits

Reviewing files that changed from the base of the PR and between 9a34451 and 8389b97.

📒 Files selected for processing (18)

.dev/features/error-handling-griller/GRILL.md
.dev/features/error-handling-griller/PLAN.md
.dev/features/error-handling-griller/REGRESSION.md
.dev/features/error-handling-griller/REVIEW.md
.dev/features/error-handling-griller/SHIP.md
.dev/features/error-handling-griller/VERIFY.md
.dev/features/error-handling-griller/regression-report.json
.dev/features/error-handling-griller/verify-report.json
pharn-pipeline/grillers/error-handling/error-handling.md
pharn-pipeline/grillers/error-handling/evals/cases/plan-declares-error-handling.md
pharn-pipeline/grillers/error-handling/evals/cases/plan-inadequate-error-handling.md
pharn-pipeline/grillers/error-handling/evals/cases/plan-omits-error-handling.md
pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.json
pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.md
pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.json
pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.md
pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.json
pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.md

coderabbitai · 2026-07-01T23:43:24Z

+## Machine-readable emission (`findings.json`)
+
+Per `pharn-contracts/finding-shape.md` §Emission, a finding-emitting capability serializes its findings as
+the JSON array declared in `writes:` (the enum-gated / free-text split as real JSON field boundaries;
+cited, not restated — P4). **In-loop today**, the grill stage runs this griller and folds its findings
+into `features/<name>/GRILL.md` (advisory); the standalone `findings.json` path in `writes:` is finalized
+when the **live griller runner** lands (deferred P7 — exactly as the testability / architecture / security
+grillers defer it). No half-specified runner is built here.
+
+## Guarantee audit (P0) — the honest split (a PRESENCE-check floor, testability-shaped)
+
+- **Griller membership** (`role: griller`, counted by `.dev/floor/count-grillers.mjs` from frontmatter
+  only) → **FLOOR** (enum/regex; `ARCHITECTURE.md §2` primitive #3). A prose / code-block / stage-command
+  mention never registers. This is the **only runtime floor guarantee**.
+- **Present/absent detection** → the present/absent **output** is `finding_count`-expressible and
+  floor-**checked on the eval fixtures** by `.dev/floor/check-structural.mjs` (primitive #3). **Two clocks
+  (be honest):** `check-structural.mjs` **is** floor and is hermetically tested, but **no runner yet
+  invokes it over this griller's live output** — that wiring is deferred (P7, as for every griller and
+  `finding-shape.md`'s 3c runner). So at build/verify time the backstop is **the checker's own tests + the
+  committed fixtures**, not a wired runner; and at **runtime over a novel plan** the presence _reading_ is
+  the griller's **judgment (ADVISORY)**, backstopped by the eval. `finding_count` captures the **output**,


🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Wire the JSON emission and runner now.

writes: advertises findings.json, but this section still says the path is “finalized when the live griller runner lands” and that today’s output is only folded into GRILL.md. That leaves the capability contract half-implemented: downstream consumers still get prose only, while issues #6/#7 require machine-readable findings plus the live multi-run variance path.

As per PR objectives, issues #6 and #7 require the machine-readable findings.json output and live variance runner.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pharn-pipeline/grillers/error-handling/error-handling.md` around lines 140 - 160, The error is that the griller’s contract still defers `findings.json` and the live runner, leaving only `GRILL.md` output instead of the machine-readable emission required by the PR. Update the griller path described in the `writes:`/`finding-shape` flow so the existing finding-emitting capability actually serializes to `findings.json`, and wire the live griller runner to invoke the structural checker over real output rather than relying on the advisory-only fold into `features/<name>/GRILL.md`. Use the relevant griller/runner symbols and the `check-structural.mjs` flow to make the implementation locateable.

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

error-handling-griller: add fourth griller (testability-shaped presence-check floor)#33

error-handling-griller: add fourth griller (testability-shaped presence-check floor)#33
PrzemekGalarowicz wants to merge 1 commit into
mainfrom
error-handling-griller

PrzemekGalarowicz commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

PrzemekGalarowicz commented Jul 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Honest sizing (P0) — testability-shaped, NOT security-shaped

Evals (P1)

Reuse

Verdicts

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PrzemekGalarowicz commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading