Vision
Transform Wave's contract system from mechanical format checks into real quality gates where a separate agent reviews work products, provides structured feedback, and drives rework loops. This closes the gap between "output matches schema" and "output is actually correct."
Today a pipeline step can pass all contracts (JSON schema valid, tests green) while producing a no-op PR, wrong implementation, or missing the issue entirely. The contract system validates shape, not substance.
Target state
handover:
contract:
# Mechanical checks run first (fast, cheap)
- type: test_suite
command: "{{ project.contract_test_command }}"
on_failure: retry
# Agent review runs second (slower, costs tokens, but catches quality issues)
- type: agent_review
reviewer: navigator
model: claude-haiku
criteria_path: .wave/contracts/impl-review-criteria.md
context:
- artifact: assessment
- artifact: impl-plan
- source: git_diff
on_failure: rework
rework_step: fix-implement
The reviewer agent sees the git diff, upstream context (issue, plan), and review criteria. It outputs a structured verdict. If rework is needed, the feedback is injected as an artifact into the rework step — the implementer knows exactly what to fix.
Principles
- Separation of concerns — the agent that did the work never reviews its own work
- Cheap first, expensive second — mechanical checks (schema, tests) run before agent review to avoid wasting tokens on obviously broken output
- Feedback flows forward — review feedback is a first-class artifact, not a log message
- Invisible to simple pipelines — existing
json_schema / test_suite contracts work unchanged. agent_review is opt-in per step
- Cost-bounded — agent reviews have token budgets and model pinning (haiku by default)
Child Issues
1. Expand ContractResult with structured feedback
Scope: internal/contract/
Extend ContractResult to carry rich review output alongside the existing pass/fail:
type ContractResult struct {
Pass bool
Error string
Feedback *ReviewFeedback // nil for mechanical contracts
}
type ReviewFeedback struct {
Verdict string `json:"verdict"` // "pass", "rework", "fail"
Issues []Issue `json:"issues"` // specific problems
Suggestions []string `json:"suggestions"` // improvement ideas
Confidence float64 `json:"confidence"` // 0.0-1.0
}
type Issue struct {
Severity string `json:"severity"` // "critical", "major", "minor"
File string `json:"file,omitempty"`
Detail string `json:"detail"`
}
Backward-compatible — all existing validators return Feedback: nil. The executor checks Feedback only when non-nil.
Acceptance criteria:
2. Implement agent_review contract validator
Scope: internal/contract/agent_review.go (new file)
New contract type that spawns a lightweight agent to review step output.
Input to the reviewer:
- Git diff of the step's worktree changes
- Step output artifacts
- Upstream artifacts specified in
context config
- Review criteria from
criteria_path
- A system prompt enforcing structured JSON output
Output: ReviewFeedback JSON parsed from the agent's response.
Configuration:
type: agent_review
reviewer: navigator # persona name — MUST differ from step persona
model: claude-haiku # model override (cheap by default)
criteria_path: .wave/contracts/review.md # review prompt
context: # what the reviewer sees
- artifact: assessment # from named prior step artifact
- artifact: impl-plan
- source: git_diff # automatic: worktree diff
max_tokens: 8192 # budget cap for the review
timeout: 120 # seconds
Key constraint: The reviewer persona must be different from the step's persona. Validator should enforce this at parse time.
Acceptance criteria:
3. Wire adapter runner into contract validation
Scope: internal/pipeline/executor.go, internal/contract/
Currently validateContract() is purely mechanical — no adapter access. The agent_review type needs to spawn a subprocess.
Change: Pass the AdapterRunner into the contract validation path as an optional dependency. Existing types ignore it.
type ValidatorContext struct {
Runner adapter.AdapterRunner // for agent_review
WorkspaceDir string // for git_diff
ArtifactDir string // for context injection
Manifest *manifest.Manifest // for persona resolution
}
Acceptance criteria:
4. Feed review feedback into rework steps
Scope: internal/pipeline/executor.go — rework step creation path
When on_failure: rework triggers after an agent_review failure, the rework step should receive the full ReviewFeedback as an injected artifact, not just an error string.
Change: In the rework step creation path, if ContractResult.Feedback is non-nil:
- Write
ReviewFeedback to .wave/artifacts/review-feedback.json
- Inject it into the rework step's artifact context
- The rework step's prompt can reference specific issues
Rework step sees:
The implementation was reviewed by navigator and needs rework.
Review verdict: rework
Issues:
- [critical] Missing error handling in handlers_compare.go:45
- [major] Tests don't cover the empty-state path
Suggestions:
- Add a table-driven test for the comparison edge cases
Full review: .wave/artifacts/review-feedback.json
Acceptance criteria:
5. Add git_diff as automatic context source
Scope: internal/workspace/, internal/contract/
The reviewer needs to see what the step changed. Add a workspace method to produce the git diff, and wire it as an automatic context source for agent_review.
// In workspace package
func (w *Workspace) Diff() (string, error) {
// git diff HEAD in the worktree
}
When context includes source: git_diff, the contract validator calls workspace.Diff() and includes the output in the reviewer's context.
Acceptance criteria:
6. Contract composition — run multiple contracts in sequence
Scope: internal/contract/, internal/pipeline/executor.go
Today a step has one contract. For agent review to work properly, steps need multiple contracts that run in order: mechanical first, agent review second.
handover:
contracts: # plural — ordered list
- type: test_suite
command: "{{ project.contract_test_command }}"
on_failure: retry
- type: agent_review
reviewer: navigator
on_failure: rework
The executor runs contracts sequentially. If an early contract fails, later ones are skipped. Each contract can have its own on_failure policy.
Acceptance criteria:
7. Upgrade Wave's own pipelines with agent review
Scope: .wave/pipelines/, .wave/contracts/
After the infrastructure is in place, upgrade Wave's own pipelines to use agent_review:
Priority pipelines:
impl-issue — review the implement step's output (diff + tests)
impl-speckit — review at implement + create-pr steps
ops-pr-review — review the review output itself (meta-review)
Review criteria files to create:
.wave/contracts/impl-review-criteria.md — does the diff match the plan? tests adequate? no leaked files?
.wave/contracts/pr-review-criteria.md — is the PR description accurate? changes scoped correctly?
Acceptance criteria:
8. Observability — review verdicts in dashboard and retros
Scope: internal/webui/, internal/retro/
Make agent reviews visible in the dashboard and retrospectives:
- Run detail page: Show review verdict per step (pass/rework/fail with expandable issues)
- Retros: Track
review_rework as a friction point type (distinct from retry and contract_failure)
- Analytics: Review pass rate per pipeline, average review tokens, rework-after-review rate
Acceptance criteria:
Implementation Order
1. ContractResult expansion (no dependencies, safe)
2. git_diff context source (no dependencies, safe)
3. Contract composition (depends on 1)
4. Wire adapter into validation (depends on 1)
5. agent_review validator (depends on 2, 3, 4)
6. Rework feedback injection (depends on 5)
7. Upgrade Wave's pipelines (depends on 5, 6)
8. Dashboard + retro observability (depends on 5)
Issues 1-4 can be parallelized. Issue 5 is the core. Issues 7-8 validate the system end-to-end.
Cost Model
Agent reviews add token cost per step. With haiku at ~$0.25/MTok input:
| Scenario |
Context size |
Review cost |
Per-pipeline overhead |
| Small diff (< 5KB) |
~3K tokens |
~$0.001 |
~$0.003 (3 steps) |
| Medium diff (5-20KB) |
~10K tokens |
~$0.003 |
~$0.009 |
| Large diff (20-50KB) |
~25K tokens |
~$0.006 |
~$0.018 |
At 100 pipeline runs/day with medium diffs: ~$0.90/day additional cost. Negligible compared to the implementation steps themselves.
Non-Goals
- Replacing mechanical contracts —
test_suite and json_schema remain. Agent review supplements, doesn't replace
- Blocking on review for every step — agent review is opt-in per step, not global
- Self-review — the step persona reviewing its own output. This is architecturally prevented
- Human-in-the-loop review — that's the existing
gate mechanism. Agent review is fully automated
Vision
Transform Wave's contract system from mechanical format checks into real quality gates where a separate agent reviews work products, provides structured feedback, and drives rework loops. This closes the gap between "output matches schema" and "output is actually correct."
Today a pipeline step can pass all contracts (JSON schema valid, tests green) while producing a no-op PR, wrong implementation, or missing the issue entirely. The contract system validates shape, not substance.
Target state
The reviewer agent sees the git diff, upstream context (issue, plan), and review criteria. It outputs a structured verdict. If rework is needed, the feedback is injected as an artifact into the rework step — the implementer knows exactly what to fix.
Principles
json_schema/test_suitecontracts work unchanged.agent_reviewis opt-in per stepChild Issues
1. Expand ContractResult with structured feedback
Scope:
internal/contract/Extend
ContractResultto carry rich review output alongside the existing pass/fail:Backward-compatible — all existing validators return
Feedback: nil. The executor checksFeedbackonly when non-nil.Acceptance criteria:
ContractResulthas optionalFeedbackfieldReviewFeedbackhas JSON tags for serialization2. Implement
agent_reviewcontract validatorScope:
internal/contract/agent_review.go(new file)New contract type that spawns a lightweight agent to review step output.
Input to the reviewer:
contextconfigcriteria_pathOutput:
ReviewFeedbackJSON parsed from the agent's response.Configuration:
Key constraint: The reviewer persona must be different from the step's persona. Validator should enforce this at parse time.
Acceptance criteria:
agent_reviewregistered as a contract typeReviewFeedbackparsed from agent outputmax_tokens)ContractResult{Pass: true}if agent is unavailable (fail-open configurable)3. Wire adapter runner into contract validation
Scope:
internal/pipeline/executor.go,internal/contract/Currently
validateContract()is purely mechanical — no adapter access. Theagent_reviewtype needs to spawn a subprocess.Change: Pass the
AdapterRunnerinto the contract validation path as an optional dependency. Existing types ignore it.Acceptance criteria:
ValidatorContextwith optional adapterjson_schema,test_suite, etc.) unchangedagent_reviewvalidator uses the adapter to spawn review agent4. Feed review feedback into rework steps
Scope:
internal/pipeline/executor.go— rework step creation pathWhen
on_failure: reworktriggers after anagent_reviewfailure, the rework step should receive the fullReviewFeedbackas an injected artifact, not just an error string.Change: In the rework step creation path, if
ContractResult.Feedbackis non-nil:ReviewFeedbackto.wave/artifacts/review-feedback.jsonRework step sees:
Acceptance criteria:
.wave/artifacts/review-feedback.jsonon rework5. Add
git_diffas automatic context sourceScope:
internal/workspace/,internal/contract/The reviewer needs to see what the step changed. Add a workspace method to produce the git diff, and wire it as an automatic context source for
agent_review.When
contextincludessource: git_diff, the contract validator callsworkspace.Diff()and includes the output in the reviewer's context.Acceptance criteria:
Workspace.Diff()returns the uncommitted diff in the worktreesource: git_diffinagent_reviewcontext config6. Contract composition — run multiple contracts in sequence
Scope:
internal/contract/,internal/pipeline/executor.goToday a step has one contract. For agent review to work properly, steps need multiple contracts that run in order: mechanical first, agent review second.
The executor runs contracts sequentially. If an early contract fails, later ones are skipped. Each contract can have its own
on_failurepolicy.Acceptance criteria:
handover.contracts(plural) accepted alongside existinghandover.contract(singular)on_failurepolicycontractstill works (backward-compatible)7. Upgrade Wave's own pipelines with agent review
Scope:
.wave/pipelines/,.wave/contracts/After the infrastructure is in place, upgrade Wave's own pipelines to use
agent_review:Priority pipelines:
impl-issue— review the implement step's output (diff + tests)impl-speckit— review at implement + create-pr stepsops-pr-review— review the review output itself (meta-review)Review criteria files to create:
.wave/contracts/impl-review-criteria.md— does the diff match the plan? tests adequate? no leaked files?.wave/contracts/pr-review-criteria.md— is the PR description accurate? changes scoped correctly?Acceptance criteria:
impl-issueimplement step usesagent_reviewwith navigatorimpl-speckitimplement step usesagent_review8. Observability — review verdicts in dashboard and retros
Scope:
internal/webui/,internal/retro/Make agent reviews visible in the dashboard and retrospectives:
review_reworkas a friction point type (distinct fromretryandcontract_failure)Acceptance criteria:
review_reworktypeImplementation Order
Issues 1-4 can be parallelized. Issue 5 is the core. Issues 7-8 validate the system end-to-end.
Cost Model
Agent reviews add token cost per step. With haiku at ~$0.25/MTok input:
At 100 pipeline runs/day with medium diffs: ~$0.90/day additional cost. Negligible compared to the implementation steps themselves.
Non-Goals
test_suiteandjson_schemaremain. Agent review supplements, doesn't replacegatemechanism. Agent review is fully automated