feat(grounding): solution-acceptance gate (solution_evaluate / solution_gate)#100

Merged
LanNguyenSi merged 2 commits into master from feat/solution-acceptance-grounding-v1
May 30, 2026

@LanNguyenSi
Owner

Summary

Relocates the verifier-gated "done" floor to its correct layer. The verdict / acceptance semantics now live in agent-grounding (alongside claim-gate); agent-preflight stays a pure evidence producer (the earlier preflight-hosted slice was reverted in agent-preflight#35 as a layer error). Implements the original design (6e045170).

What

  • packages/grounding-mcp/src/solution-verdict.ts: a HEAD-pinned verdict marker derived from a real preflight run --json (the verb runs preflight; the check set is the repo's committed .preflight.json, not call args), written outside the agent-writable evidence-ledger. evaluateGate passes only on a ready verdict at the current HEAD.
  • Two MCP verbs in src/server.ts: solution_evaluate (produce), solution_gate (consume).
  • Fails closed when preflight is unavailable or its output is unusable (writes no verdict).
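The produce side can be sketched as a pure function that turns the outcome of a `preflight run --json` invocation into a marker, or into nothing at all. This is a minimal illustration, not the code in solution-verdict.ts: the names `PreflightRun`, `VerdictMarker` and `deriveVerdict`, the assumption that the JSON carries a boolean `ready` field, and the choice to treat a non-zero exit as never green are all hypothetical.

```typescript
// Hypothetical outcome of spawning `preflight run --json`.
interface PreflightRun {
  spawnError?: string;     // set when the process could not start, e.g. "ENOENT"
  exitCode: number | null; // null when the process never ran
  stdout: string;          // raw output, expected to be JSON
}

// Hypothetical marker shape; the real fields may differ.
interface VerdictMarker {
  ready: boolean; // derived from the run, never passed in by the caller
  head: string;   // commit SHA the run was evaluated at
}

// Returns a marker to write, or null to write nothing (fail closed).
function deriveVerdict(run: PreflightRun, head: string): VerdictMarker | null {
  // Missing binary or any other spawn failure: no verdict.
  if (run.spawnError !== undefined) return null;
  // Empty output: no verdict.
  if (run.stdout.trim() === "") return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(run.stdout);
  } catch {
    // Unparseable output: no verdict.
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const ready = (parsed as Record<string, unknown>)["ready"];
  if (typeof ready !== "boolean") return null;

  // Assumption: exit 1 with parseable output is a valid "not ready" result,
  // and a non-zero exit can never produce a green verdict.
  return { ready: ready && run.exitCode === 0, head };
}
```

Note that nothing in the function's inputs lets the caller assert readiness directly; `ready` only ever comes from the run's own output and exit code.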

Anti-hacking properties

  1. Derived, not claimed (ready from the real run).
  2. Producer != solver (the verb runs preflight; checks from committed config, not call args).
  3. HEAD-pinned (any rework invalidates a green verdict).
  4. No stale green (a not-ready run overwrites any earlier green verdict with red).
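Properties 3 and 4 can be illustrated with a small in-memory model of the consume side. This is a sketch under assumptions: the `Map` stands in for the on-disk marker, the signatures of `recordVerdict` and `evaluateGate` are hypothetical, and only the "stale verdict" reason is taken from the dogfood run described under Verification; the other reason strings are invented for illustration.

```typescript
// Hypothetical marker shape; the real fields may differ.
interface VerdictMarker {
  ready: boolean;
  head: string; // commit SHA the verdict was computed at
}

type GateResult = { allow: true } | { allow: false; reason: string };

// In-memory stand-in for the marker stored outside the evidence-ledger.
const markers = new Map<string, VerdictMarker>();

// Every evaluation overwrites the previous marker, so a red run
// always replaces an earlier green one (no stale green).
function recordVerdict(id: string, marker: VerdictMarker): void {
  markers.set(id, marker);
}

// The gate keys only on `ready` and `head`.
function evaluateGate(id: string, currentHead: string): GateResult {
  const marker = markers.get(id);
  if (marker === undefined) return { allow: false, reason: "no verdict" };
  // HEAD-pinned: any new commit invalidates the recorded verdict.
  if (marker.head !== currentHead) return { allow: false, reason: "stale verdict" };
  if (!marker.ready) return { allow: false, reason: "not ready" };
  return { allow: true };
}
```

A usage walk-through: after `recordVerdict("task", { ready: true, head: "a1" })`, the gate allows at head `a1` and denies at head `b2`; recording a not-ready marker at `b2` then keeps the gate closed at that HEAD as well.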

Verification

  • tsc --noEmit clean; full grounding-mcp suite green (47 tests, 18 new in tests/solution-verdict.test.ts: core + producer via a stub preflight covering ready / not-ready-on-exit-1 / unparseable-output / empty-output / missing-bin / bad-id, all fail-closed).
  • Real end-to-end dogfood against the actual preflight CLI: evaluate records a ready verdict pinned to HEAD; gate at that HEAD passes; a new commit makes the gate deny with "stale verdict". The spawn-error path also fails closed.
  • Rigorous review subagent: APPROVE (empirically verified ENOENT-vs-exit-1 handling, 16 path-traversal inputs, no false-green path; confidence poisoning is inert since the gate keys only on ready+head). Its two non-blocking follow-ups (the two untested fail-closed branches + a dirty-tree doc note) were fixed in this branch.
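The bad-id and path-traversal cases suggest that the verdict id is validated before it is used to locate the marker. One common way to do that is an allow-list pattern, sketched below. The helper names, the permitted character set, the length limit, and the `.json` suffix are all assumptions for illustration; the actual rule in this branch may differ.

```typescript
// Hypothetical allow-list: starts with an alphanumeric, then up to 63
// alphanumerics, underscores, or hyphens. No dots, slashes, or NUL bytes.
const SAFE_ID = /^[A-Za-z0-9][A-Za-z0-9_-]{0,63}$/;

function isSafeVerdictId(id: string): boolean {
  return SAFE_ID.test(id);
}

// Returns a file name for the marker, or null when the id is rejected.
// Rejecting (rather than sanitising) keeps the behaviour fail-closed.
function markerFileName(id: string): string | null {
  return isSafeVerdictId(id) ? `${id}.json` : null;
}
```

Rejecting anything outside the allow-list is simpler to reason about than trying to strip dangerous sequences, since an unexpected input can never reach the file system at all.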

Scope / follow-ups

v1 floor only. Tracked follow-ups: harness enforcement consuming the marker (harness cc43c7a4); composing CI / review / unresolved-hypothesis signals into the verdict (the real _from_session upgrade); a Goodhart test-count-delta guard; an LLM-judge layer; relative ranking (all in 6e045170).

Refs: design 6e045170; relocates the reverted agent-preflight slice (#34/#35).

nguyen-si-pp and others added 2 commits May 30, 2026 19:12
…on_gate)

Relocate the verifier-gated "done" floor to its correct layer: the verdict /
acceptance semantics live in agent-grounding (alongside claim-gate), not in
agent-preflight (reverted in agent-preflight #35 as a layer error). preflight
stays a pure evidence producer.

- src/solution-verdict.ts: a HEAD-pinned verdict marker derived from a real
  `preflight run --json` (producer != solver; the check set comes from the
  repo's committed .preflight.json, not call args), written outside the
  agent-writable evidence-ledger; evaluateGate passes only on a ready verdict
  at the current HEAD.
- server.ts: solution_evaluate (produce) + solution_gate (consume) MCP verbs.
- Anti-forge: derived-not-claimed, producer != solver, HEAD-pinned (rework
  invalidates a green verdict), no-stale-green; fails closed when preflight
  is unavailable.
- 16 tests (core + producer via a stub preflight) + README marker contract.

Refs: design 6e045170; reverted preflight slice #34/#35; harness wiring cc43c7a4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reviewer follow-ups (inline): add tests for the two untested fail-closed
paths (preflight exits non-zero with unparseable output, and with no
output), and document that the verdict pins to the committed HEAD so a
re-run is needed after edits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LanNguyenSi LanNguyenSi merged commit dc7360a into master May 30, 2026
1 of 2 checks passed