feat(grounding): solution-acceptance gate (solution_evaluate / solution_gate) #100
Merged
…on_gate)

Relocate the verifier-gated "done" floor to its correct layer: the verdict / acceptance semantics live in agent-grounding (alongside claim-gate), not in agent-preflight (reverted in agent-preflight #35 as a layer error). preflight stays a pure evidence producer.

- src/solution-verdict.ts: a HEAD-pinned verdict marker derived from a real `preflight run --json` (producer != solver; the check set comes from the repo's committed .preflight.json, not call args), written outside the agent-writable evidence-ledger; evaluateGate passes only on a ready verdict at the current HEAD.
- server.ts: solution_evaluate (produce) + solution_gate (consume) MCP verbs.
- Anti-forge: derived-not-claimed, producer != solver, HEAD-pinned (rework invalidates a green verdict), no-stale-green; fails closed when preflight is unavailable.
- 16 tests (core + producer via a stub preflight) + README marker contract.

Refs: design 6e045170; reverted preflight slice #34/#35; harness wiring cc43c7a4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer follow-ups (inline): add tests for the two untested fail-closed paths (preflight exits non-zero with unparseable output, and with no output), and document that the verdict pins to the committed HEAD so a re-run is needed after edits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
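The fail-closed paths named in these commits could be sketched roughly as follows. This is an illustration under assumptions, not the code in src/solution-verdict.ts: the `PreflightRun` and `PreflightReport` shapes and the `deriveReady` helper are hypothetical, and the real `preflight run --json` output format is not shown in this PR. The only idea taken from the source is that every failure mode (missing binary, non-zero exit, empty output, unparseable output) yields "not ready".

```typescript
// Hypothetical report shape: only a boolean `ready` field is assumed here.
interface PreflightReport {
  ready: boolean;
}

// Hypothetical result of spawning preflight.
// `exitCode` is null when the binary could not be run at all.
interface PreflightRun {
  exitCode: number | null;
  stdout: string;
}

// Derive readiness from the run itself; anything unexpected means "not ready".
function deriveReady(run: PreflightRun): boolean {
  // Missing binary or non-zero exit: fail closed.
  if (run.exitCode !== 0) return false;
  // No output: fail closed.
  if (run.stdout.trim() === "") return false;
  try {
    const parsed: unknown = JSON.parse(run.stdout);
    // Only a literal `ready: true` on an object counts as ready.
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as Partial<PreflightReport>).ready === true
    );
  } catch {
    // Unparseable output: fail closed.
    return false;
  }
}
```

In this sketch a non-zero exit is treated as not ready regardless of what was printed, which is one way to make the unparseable-output and empty-output cases collapse into the same answer.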
Summary
Relocates the verifier-gated "done" floor to its correct layer. The verdict / acceptance semantics now live in agent-grounding (alongside claim-gate); agent-preflight stays a pure evidence producer (the earlier preflight-hosted slice was reverted in agent-preflight#35 as a layer error). Implements the original design (6e045170).
What
- packages/grounding-mcp/src/solution-verdict.ts: a HEAD-pinned verdict marker derived from a real `preflight run --json` (the verb runs preflight; the check set is the repo's committed `.preflight.json`, not call args), written outside the agent-writable evidence-ledger. `evaluateGate` passes only on a ready verdict at the current HEAD.
- src/server.ts: `solution_evaluate` (produce), `solution_gate` (consume).
Anti-hacking properties
- … `ready` from the real run).
Verification
- `tsc --noEmit` clean; full grounding-mcp suite green (47 tests, 18 new in `tests/solution-verdict.test.ts`: core + producer via a stub preflight covering ready / not-ready-on-exit-1 / unparseable-output / empty-output / missing-bin / bad-id, all fail-closed).
- … `confidence` poisoning is inert since the gate keys only on `ready` + `head`). Its two non-blocking follow-ups (the two untested fail-closed branches + a dirty-tree doc note) were fixed in this branch.
Scope / follow-ups
v1 floor only. Tracked follow-ups: harness enforcement consuming the marker (harness cc43c7a4); composing CI / review / unresolved-hypothesis signals into the verdict (the real `_from_session` upgrade); a Goodhart test-count-delta guard; an LLM-judge layer; relative ranking (all in 6e045170).
Refs: design 6e045170; relocates the reverted agent-preflight slice (#34/#35).
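
For readers skimming the description, the gate rule stated under What (pass only on a ready verdict at the current HEAD) might look something like the sketch below. It is a minimal illustration, not the PR's `evaluateGate`: the `VerdictMarker` and `GateResult` shapes, the function name `evaluateGateSketch`, and the reason strings are all assumptions. The optional `confidence` field is included only to show that the decision does not read it.

```typescript
// Hypothetical marker shape; the real marker contract lives in the README.
interface VerdictMarker {
  head: string;        // commit the verdict was produced at
  ready: boolean;      // derived from the preflight run, not claimed by the caller
  confidence?: number; // present only to show the gate ignores it
}

// Hypothetical result shape for the sketch.
interface GateResult {
  pass: boolean;
  reason: string;
}

// Pass only when a marker exists, is pinned to the current HEAD, and is ready.
function evaluateGateSketch(
  marker: VerdictMarker | null,
  currentHead: string,
): GateResult {
  if (marker === null) {
    return { pass: false, reason: "no verdict marker" };
  }
  if (marker.head !== currentHead) {
    // Rework moved HEAD, so an earlier green verdict no longer applies.
    return { pass: false, reason: "stale verdict" };
  }
  if (marker.ready !== true) {
    return { pass: false, reason: "verdict not ready" };
  }
  return { pass: true, reason: "ready verdict at current HEAD" };
}
```

Because the check reads only `head` and `ready`, a marker carrying an inflated `confidence` value gets the same answer as one without it.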