Add fix_task to verifier.json: mechanical-vs-authority repair routing #146
Merged
…n guards
The verify cycle (M1-M3) already computes one release verdict in build_release_decision() and projects it onto the report summaries and the agent-facing merge_verdict, but the discipline was enforced only by convention and docstrings. This makes it structural.
- Define ReleaseDecisionStatus once in schemas/common.py and reuse it for AgentSummary.verdict, ReviewerSummary.verdict, VerifierVerdict, and ReleaseConsequence.decision (previously four hand-respelled Literals of the same vocabulary). Generated JSON schemas are byte-identical (generate_schemas.py --check clean) - no wire change, no schema bump.
- Type _DECISION_TO_VERDICT as dict[ReleaseDecisionStatus, MergeVerdict] and add a totality test, so a new release status without a mapping fails CI instead of silently falling back to human_review_required.
- Add a VerifierArtifact model_validator: when a head release_decision is present, merge_verdict and the decision copy MUST be exact projections of it - an inconsistent artifact is impossible to construct.
- Centralize the no-decision verdict rule in merge_verdict_for(); delete the divergent inline _merge_verdict in the orchestrator (summaries defaulted to "passed", verify defaulted to "mergeable"/"unknown" - now one rule).
- Add tests/test_verdict_contract.py pinning canonical-enum reuse across all verdict surfaces, projection totality + the exact table, the fail-safe (unknown status never auto-passes), and the validator.
Full suite: 2302 passed, 4 skipped. Behavior unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
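A totality check of the kind this commit describes could look roughly like the sketch below. The status and verdict members, and the names DECISION_TO_VERDICT, unmapped_statuses, and project_verdict, are hypothetical stand-ins; the real vocabulary lives in schemas/common.py and is not listed here.

```python
from typing import Dict, Literal, Optional, Set, get_args

# Hypothetical vocabularies: the actual members are not shown in this PR text.
ReleaseDecisionStatus = Literal["passed", "blocked", "needs_review"]
MergeVerdict = Literal["mergeable", "blocked", "human_review_required"]

# One table, typed against the canonical Literal, so a type checker can
# flag keys outside the vocabulary.
DECISION_TO_VERDICT: Dict[ReleaseDecisionStatus, MergeVerdict] = {
    "passed": "mergeable",
    "blocked": "blocked",
    "needs_review": "human_review_required",
}


def unmapped_statuses() -> Set[str]:
    """Return statuses declared in the Literal but missing from the table."""
    return set(get_args(ReleaseDecisionStatus)) - set(DECISION_TO_VERDICT)


def project_verdict(status: Optional[str]) -> str:
    """Fail-safe lookup: an unrecognized or missing status never auto-passes."""
    return DECISION_TO_VERDICT.get(status, "human_review_required")


def test_projection_is_total() -> None:
    # Adding a status to the Literal without a table entry fails here,
    # instead of silently taking the fallback at runtime.
    assert unmapped_statuses() == set()
```

The test catches the omission in CI, while the fallback in project_verdict only protects production from a status the table has never seen.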
…repair routing
fix_task is the single repair instruction a verify run hands to whoever acts
next. Routing is a pure projection of the head scan - never a model judgment:
a coding agent may fix mechanical gaps (every gating finding is autofix_safe),
but any authority gap (approval/idempotency evidence it cannot prove, a
weakened policy, a touched trust root, or degraded evidence) routes to a human
so the agent cannot invent its way to green.
- Add VerifierFixTask {actor, safe_to_attempt, instructions, forbidden_shortcuts,
verification_command} with a validator: an actor='human' task can never be
marked safe_to_attempt (the anti-reward-hacking guarantee).
- Add cli/verify/fix_task.py build_fix_task(): projects release_decision plus
per-finding autofix_safe / requires_human_review into the task; policy_weakened,
trust_root_touched, and insufficient_evidence force the human route.
- Wire fix_task into the verifier artifact and route first_next_action.actor
through the same fix_task so the two agent-facing signals never disagree.
- Render fix_task as the authoritative "Required before merge" block in the PR
comment (falls back to the prior agent-summary path when absent).
- Regenerate docs/verifier-schema.v0.1.json (additive optional field).
- Add tests/test_fix_task_contract.py.
Full suite: 2316 passed, 4 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
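The actor/safe_to_attempt invariant from this commit can be illustrated with a stdlib dataclass. This is only a sketch: the commit describes a schema model with a validator, whereas FixTaskSketch below is a hypothetical stand-in that enforces the same rule in __post_init__.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FixTaskSketch:
    """Hypothetical stand-in for VerifierFixTask (field names from the commit)."""

    actor: str  # "coding_agent" or "human"
    safe_to_attempt: bool
    instructions: List[str] = field(default_factory=list)
    forbidden_shortcuts: List[str] = field(default_factory=list)
    verification_command: str = ""

    def __post_init__(self) -> None:
        if self.actor not in ("coding_agent", "human"):
            raise ValueError(f"unknown actor: {self.actor!r}")
        # The anti-reward-hacking guarantee: a task that needs human
        # authority can never be marked safe for an agent to attempt.
        if self.actor == "human" and self.safe_to_attempt:
            raise ValueError("a human task cannot be safe_to_attempt")
```

Constructing FixTaskSketch(actor="human", safe_to_attempt=True) raises ValueError, so the inconsistent combination cannot exist as an object.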
… single-source next action
Addresses review feedback on #146:
- Fail closed in the coding-agent route: require every gating finding to be explicitly mechanical (autofix_safe is True AND requires_human_review is False). A finding with None/False routing fields (stale/plugin/legacy) is now treated as an authority gap and routed to a human, never silently marked safe_to_attempt.
- Shell-quote the refs in verification_command with shlex.quote (a valid git ref can contain ';', so the unquoted command was injectable); render it through the backtick-stripping _code helper so the PR comment stays Markdown-safe.
- first_next_action borrows the agent-summary action only when its implied actor agrees with fix_task; otherwise actor, command, and why are all derived from fix_task so the two agent-facing signals cannot contradict.
- Emit a human fix_task for a non-mergeable verdict with no head report (unknown), so the routing table holds and every non-mergeable verdict carries a fix_task.
Full suite: 2323 passed, 4 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
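The shell-quoting fix can be illustrated as follows. The command name example-verify and its --base/--head flags are placeholders, not the project's real CLI; only shlex.quote and shlex.split are real standard-library calls.

```python
import shlex


def build_verification_command(base_ref: str, head_ref: str) -> str:
    """Build a copy-pasteable command with each ref quoted as one shell word.

    "example-verify" and its flags are hypothetical placeholders.
    """
    return "example-verify --base {} --head {}".format(
        shlex.quote(base_ref), shlex.quote(head_ref)
    )


# A git ref may legally contain ';'. Unquoted, the shell would treat the
# text after it as a second command.
hostile_ref = "topic;id"
command = build_verification_command("main", hostile_ref)

# Splitting the command the way a POSIX shell would yields the ref back
# as a single argument rather than a command separator.
assert shlex.split(command)[-1] == hostile_ref
```

Quoting at construction time keeps the stored verification_command safe regardless of where it is later rendered or pasted.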
pengfei-threemoonslab added a commit that referenced this pull request on May 30, 2026
* Surface the self-approval prohibition at the top of verifier.json
  When a PR weakens the release policy or touches a trust root, a coding agent must not silently self-approve a change to its own gate. That prohibition was only present inside a fix_task instruction (PR #146); promote it to the two fields an agent reads first.
  - Add _self_approval_note(): the explicit "a coding agent cannot self-approve that change - a human must review it" message for policy_weakened (taking precedence) and trust_root_touched.
  - verifier.json headline leads with the note when present.
  - human_review.why leads with the note, and a self-approval note forces human_review.required=True regardless of the verdict path.
  Full suite: 2346 passed, 4 skipped. No schema change.
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Self-approval: keep all top-level convenience fields consistent (review fix)
  Addresses review of #148: a self-approval note forced human_review.required=True, but can_merge_without_human and first_next_action still keyed only off merge_verdict, so the defensive (mergeable + note) path could emit "human review required" and "safe to merge" at once.
  - _can_merge_without_human returns False whenever a self-approval note exists.
  - _first_next_action routes to a human review (never the "safe to merge" action) when a self-approval note is present, including the fix_task-None defensive case.
  - Both thread capability_review from _build_verifier.
  Clean mergeable behavior (no note) is unchanged; covered by a regression test.
  Full suite: 2349 passed, 4 skipped. No schema change.
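The precedence and consistency rules in these two commits can be sketched as below. The function names mirror the commit text without the leading underscore, but the signatures and the message wording beyond the quoted fragment are assumptions.

```python
from typing import Optional

# Fragment quoted in the commit message; the sentence prefixes below are
# hypothetical wording.
_PROHIBITION = "a coding agent cannot self-approve that change - a human must review it"


def self_approval_note(policy_weakened: bool, trust_root_touched: bool) -> Optional[str]:
    """Return the self-approval note, or None when neither condition holds."""
    if policy_weakened:  # takes precedence over trust_root_touched
        return "This PR weakens the release policy: " + _PROHIBITION
    if trust_root_touched:
        return "This PR touches a trust root: " + _PROHIBITION
    return None


def can_merge_without_human(merge_verdict: str, note: Optional[str]) -> bool:
    """A note vetoes agent-only merging even on the defensive mergeable path."""
    return merge_verdict == "mergeable" and note is None


def human_review_required(merge_verdict: str, note: Optional[str]) -> bool:
    """Defined as the exact negation, so the two fields cannot contradict."""
    return not can_merge_without_human(merge_verdict, note)
```

Deriving one field from the other, instead of computing each from merge_verdict separately, is what rules out emitting "human review required" and "safe to merge" together.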
What
Adds fix_task to verifier.json: the single, deterministic repair instruction a verify run hands to whoever must act next. Stacked on #145 (the verdict-contract lock); review that first.
Why
A merge verdict of blocked/human_review_required is just friction unless it also says what to do next and who can safely do it. fix_task is that contract, and its routing encodes the product's core safety boundary: a coding agent may fix mechanical gaps, but an authority gap (approval/idempotency evidence it cannot prove, a weakened policy, a touched trust root) must route to a human so the agent cannot invent its way to green (reward hacking).
Routing (a pure projection of the head scan, never a model judgment)
| Condition | actor | safe_to_attempt |
| --- | --- | --- |
| mergeable | | |
| autofix_safe | coding_agent | true |
| requires_human_review | human | false |
| policy_weakened / trust_root_touched | human | false |
| insufficient_evidence / unknown | human | false |
Routing is by the per-finding autofix_safe signal, not the verdict label: a blocked-but-mechanical PR can still route to the agent, and a review-required PR with an authority gap routes to a human.
Changes
- VerifierFixTask schema (actor, safe_to_attempt, instructions[], forbidden_shortcuts[], verification_command) with a validator enforcing actor="human" ⇒ safe_to_attempt=False (a human-authority task can never be marked agent-safe).
- cli/verify/fix_task.py build_fix_task(): deterministic projection; forbidden_shortcuts are the anti-reward-hacking guardrails (no suppression, no severity-lowering, no inventing evidence, no weakening the policy that evaluates the change).
- first_next_action.actor now routes through the same fix_task so the two agent-facing signals can't disagree.
- The PR comment renders fix_task as the authoritative "Required before merge" block (falls back to the prior path when absent).
- Regenerated docs/verifier-schema.v0.1.json (additive optional field).
- Added tests/test_fix_task_contract.py (13 tests).
Verification
- python scripts/generate_schemas.py --check: clean
- ruff check: clean
🤖 Generated with Claude Code
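For readers who want the routing rule in executable form, here is a minimal sketch of the projection described under Routing, including the fail-closed treatment of missing routing fields from the review follow-up. Finding, FORCED_HUMAN, and route are hypothetical names; the real logic lives in build_fix_task() and may differ in detail. The sketch covers only non-mergeable runs, and treating an empty findings list as a human route is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical gating finding; None models stale/plugin/legacy data."""

    autofix_safe: Optional[bool]
    requires_human_review: Optional[bool]


# Conditions that force the human route regardless of per-finding signals.
FORCED_HUMAN = frozenset(
    {"policy_weakened", "trust_root_touched", "insufficient_evidence", "unknown"}
)


def route(gating_findings: Iterable[Finding], conditions: Iterable[str]) -> Tuple[str, bool]:
    """Return (actor, safe_to_attempt) for a non-mergeable verify run."""
    findings = list(gating_findings)
    if FORCED_HUMAN & set(conditions):
        return ("human", False)
    # Fail closed: every finding must be explicitly mechanical. A None or
    # False routing field counts as an authority gap, and so does having
    # no findings at all (an assumption of this sketch).
    mechanical = bool(findings) and all(
        f.autofix_safe is True and f.requires_human_review is False
        for f in findings
    )
    return ("coding_agent", True) if mechanical else ("human", False)
```

Because the function reads only scan-derived fields and never the verdict label, a blocked-but-mechanical PR and a review-required PR with an authority gap are routed by the same rule.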