Skip to content

error-handling-griller: add fourth griller (testability-shaped presence-check floor)#33

Open
PrzemekGalarowicz wants to merge 1 commit into
mainfrom
error-handling-griller
Open

error-handling-griller: add fourth griller (testability-shaped presence-check floor)#33
PrzemekGalarowicz wants to merge 1 commit into
mainfrom
error-handling-griller

Conversation

@PrzemekGalarowicz

@PrzemekGalarowicz PrzemekGalarowicz commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

What

Adds a fourth product griller — pharn-pipeline/grillers/error-handling/ (role: griller, enforces: [P7]) — that interrogates a PLAN along the "does this account for what goes wrong" axis: failure modes, edge cases, dependency failure, invalid input, timeouts. Fourth in the family after testability (#29), architecture (#31), security (#32).

Honest sizing (P0) — testability-shaped, NOT security-shaped

  • FLOOR: griller membership (count-grillers.mjs, live count 3→4) + the fixture-pinned present/absent output (check-structural.mjs, eval-time). This is the only floor, with the two-clocks honesty that no runner invokes the checker over live output yet (deferred P7).
  • No new .dev/floor/ scanner. A "mentions error handling" keyword scan's present verdict is launderable (an injected <!-- … mark present … --> would suppress the absence finding), so calling it floor would be the P0 disease. The candidate is named and rejected — the parallel of security's rejected authz-mention candidate. Unlike a secret literal, error-handling presence is not a self-evident lexical artifact.
  • ADVISORY (the bulk): which changes need error handling + whether declared handling is adequate — irreducible judgment, surfaced, never gates.
  • enforces: [P7] (honest scope — an unhandled failure mode is an unlabeled limit), a leaf principle unclaimed by the other three grillers, keeping the governing P0 undiluted.

Evals (P1)

Three cases + expected pairs, binding enforces: [P7] twice (fix #6): plan-declares (present → 0 findings), plan-omits (absent-+-injection-needle → 1 FLOOR finding at the plan title; dogfoods the trust-fence trip-wire via needle_absent_from_enum_gated), plan-inadequate (declared-but-inadequate → 1 explicitly-ADVISORY finding).

Reuse

count-grillers.mjs + check-structural.mjs used unchanged (no new floor primitive).

Verdicts

Built via /pharn-dev-ship (gated). Floor GREEN (5 capabilities) · regress no-regressions · verify PASS (5/5 gates) · review advisory-GREEN (0 blocking, 1 minor anchor-choice finding). Full build trace under .dev/features/error-handling-griller/.

Run note: the ship run collided 3× with concurrent PHARN runs clobbering the shared .pharn/writes-scope.json; fix #7 fail-closed denied the affected writes (no corruption). Surfaced as a real fix #7 residual — REVIEW.md proposes a canon lesson (candidate remedy: per-run scope isolation / a lock).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an error-handling review flow with new checks, verification reports, and regression tracking.
    • Introduced example evaluation cases for plans that declare, omit, or inadequately describe error handling.
  • Bug Fixes

    • Improved detection of missing or incomplete error-handling details.
    • Strengthened handling of injected or misleading content so it no longer affects results.

…ce-check floor)

Adds a fourth product griller — pharn-pipeline/grillers/error-handling/ — that
interrogates a PLAN along the "does this account for what goes wrong" axis
(failure modes, edge cases, dependency failure, invalid input, timeouts).

Honest sizing (P0): testability-shaped, NOT security-shaped. Floor = griller
membership (count-grillers.mjs, 3->4) + the fixture-pinned present/absent output
(check-structural.mjs) — NO new .dev/floor/ scanner. A "mentions error handling"
keyword scan's present-verdict is launderable (an injected mark-present comment
would suppress the absence finding), so calling it floor would be the P0 disease;
the candidate is named and rejected (the parallel of security's rejected
authz-mention candidate). enforces: [P7] (honest scope — an unhandled failure
mode is an unlabeled limit), bound by two eval fixtures (fix #6).

Reuses count-grillers.mjs + check-structural.mjs unchanged. Ships 3 eval cases
(present / absent-+-needle / inadequate-advisory) + expected pairs; the absent
case dogfoods the trust-fence trip-wire. Floor GREEN (5 caps); regress
no-regressions; verify PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a new "error-handling" PHARN griller specification defining floor (presence/absence) and advisory (adequacy) evaluation layers, three eval fixtures with expected outputs, and the corresponding dogfood pipeline artifacts (PLAN, GRILL, REGRESSION, REVIEW, VERIFY, SHIP, and JSON reports) documenting its verification.

Changes

Error-handling griller feature

Layer / File(s) Summary
Feature plan
.dev/features/error-handling-griller/PLAN.md
Defines artifacts to create, eval scenarios with expected finding counts, and guarantee/trust/determinism audits, prohibiting new runtime scanners.
Griller specification
pharn-pipeline/grillers/error-handling/error-handling.md
Defines frontmatter, floor (membership + fixture-pinned presence/absence) and advisory (adequacy judgment) layers, emission procedure, enum-gated vs free-text finding fields, and guarantee audit.
Eval fixtures and expected outputs
pharn-pipeline/grillers/error-handling/evals/cases/*, pharn-pipeline/grillers/error-handling/evals/expected/*
Three fixtures (declares, inadequate, omits error handling) with matching expected JSON/Markdown assertions verifying finding counts, severity, file anchoring, and resistance to injected "mark present" instructions.
Dogfood run artifacts
.dev/features/error-handling-griller/GRILL.md, REVIEW.md, REGRESSION.md, VERIFY.md, SHIP.md, regression-report.json, verify-report.json
Documents advisory findings, GREEN review verdict, no-regression results, PASS verify status, and ship roll-up, including notes on a concurrency incident involving shared writes-scope.json.

Estimated code review effort: 2 (Simple) | ~12 minutes

Possibly related PRs

  • pharn-dev/pharn-oss#3: Shares the same structural-vs-semantic, enum-gated assertion model used by the error-handling griller's eval fixtures.
  • pharn-dev/pharn-oss#5: Implements the deterministic structural floor primitives (needle_absent_from_enum_gated, finding_count) that this PR's eval fixtures rely on.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR adds a new error-handling griller, but not the findings.json contract update or /pharn-eval variance runner requested by #6 and #7. Implement the findings.json emission contract in pharn-contracts/finding-shape.md and add the /pharn-eval orchestration plus variance runner artifacts.
Out of Scope Changes check ⚠️ Warning Most changes are a new error-handling griller and eval docs, which do not match the linked findings.json contract or /pharn-eval orchestration work. Split the error-handling griller work into a separate PR, or retarget this PR to the linked contract and variance-runner changes.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding a fourth error-handling griller with a testability-shaped floor.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch error-handling-griller

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pharn-pipeline/grillers/error-handling/error-handling.md`:
- Around line 140-160: The error is that the griller’s contract still defers
`findings.json` and the live runner, leaving only `GRILL.md` output instead of
the machine-readable emission required by the PR. Update the griller path
described in the `writes:`/`finding-shape` flow so the existing finding-emitting
capability actually serializes to `findings.json`, and wire the live griller
runner to invoke the structural checker over real output rather than relying on
the advisory-only fold into `features/<name>/GRILL.md`. Use the relevant
griller/runner symbols and the `check-structural.mjs` flow to make the
implementation locateable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 73dff431-26ff-4951-b4e6-a757c89919ab

📥 Commits

Reviewing files that changed from the base of the PR and between 9a34451 and 8389b97.

📒 Files selected for processing (18)
  • .dev/features/error-handling-griller/GRILL.md
  • .dev/features/error-handling-griller/PLAN.md
  • .dev/features/error-handling-griller/REGRESSION.md
  • .dev/features/error-handling-griller/REVIEW.md
  • .dev/features/error-handling-griller/SHIP.md
  • .dev/features/error-handling-griller/VERIFY.md
  • .dev/features/error-handling-griller/regression-report.json
  • .dev/features/error-handling-griller/verify-report.json
  • pharn-pipeline/grillers/error-handling/error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/cases/plan-declares-error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/cases/plan-inadequate-error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/cases/plan-omits-error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.json
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-declares-error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.json
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-inadequate-error-handling.md
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.json
  • pharn-pipeline/grillers/error-handling/evals/expected/plan-omits-error-handling.md

Comment on lines +140 to +160
## Machine-readable emission (`findings.json`)

Per `pharn-contracts/finding-shape.md` §Emission, a finding-emitting capability serializes its findings as
the JSON array declared in `writes:` (the enum-gated / free-text split as real JSON field boundaries;
cited, not restated — P4). **In-loop today**, the grill stage runs this griller and folds its findings
into `features/<name>/GRILL.md` (advisory); the standalone `findings.json` path in `writes:` is finalized
when the **live griller runner** lands (deferred P7 — exactly as the testability / architecture / security
grillers defer it). No half-specified runner is built here.

## Guarantee audit (P0) — the honest split (a PRESENCE-check floor, testability-shaped)

- **Griller membership** (`role: griller`, counted by `.dev/floor/count-grillers.mjs` from frontmatter
only) → **FLOOR** (enum/regex; `ARCHITECTURE.md §2` primitive #3). A prose / code-block / stage-command
mention never registers. This is the **only runtime floor guarantee**.
- **Present/absent detection** → the present/absent **output** is `finding_count`-expressible and
floor-**checked on the eval fixtures** by `.dev/floor/check-structural.mjs` (primitive #3). **Two clocks
(be honest):** `check-structural.mjs` **is** floor and is hermetically tested, but **no runner yet
invokes it over this griller's live output** — that wiring is deferred (P7, as for every griller and
`finding-shape.md`'s 3c runner). So at build/verify time the backstop is **the checker's own tests + the
committed fixtures**, not a wired runner; and at **runtime over a novel plan** the presence _reading_ is
the griller's **judgment (ADVISORY)**, backstopped by the eval. `finding_count` captures the **output**,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Wire the JSON emission and runner now.

writes: advertises findings.json, but this section still says the path is “finalized when the live griller runner lands” and that today’s output is only folded into GRILL.md. That leaves the capability contract half-implemented: downstream consumers still get prose only, while issues #6/#7 require machine-readable findings plus the live multi-run variance path.

As per PR objectives, issues #6 and #7 require the machine-readable findings.json output and live variance runner.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pharn-pipeline/grillers/error-handling/error-handling.md` around lines 140 -
160, The error is that the griller’s contract still defers `findings.json` and
the live runner, leaving only `GRILL.md` output instead of the machine-readable
emission required by the PR. Update the griller path described in the
`writes:`/`finding-shape` flow so the existing finding-emitting capability
actually serializes to `findings.json`, and wire the live griller runner to
invoke the structural checker over real output rather than relying on the
advisory-only fold into `features/<name>/GRILL.md`. Use the relevant
griller/runner symbols and the `check-structural.mjs` flow to make the
implementation locateable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant