Parent: #89
Depends on: #91
Summary
Add a semantic reviewer agent pack that rates evidence and release risks using fixed rubrics, producing output/intermediate/semantic_review_report.json.
The agents judge; they do not edit. Python validates and later converts judgments into evidence/release blockers.
Motivation
Several v0.8 risks are semantic and cannot be robustly solved by deterministic code alone:
- whether a source authority is sufficient for a claim type;
- whether a source actually supports the claim or only part of it;
- whether fund-flow / price / inventory metrics mix incompatible scopes;
- whether a legal/policy/company-event claim lacks official source coverage;
- whether institutional branding or confidential labels lack authorization context.
MABW should use agents for rubric-based review, but keep authority in deterministic schemas and policy packs.
Proposed agent files
Add Claude Code agent prompts first, then mirror to other runtimes if needed:
.claude/agents/source-authority-judge.md
.claude/agents/source-support-judge.md
.claude/agents/metric-scope-judge.md
.claude/agents/official-source-coverage-judge.md
.claude/agents/branding-authorization-judge.md
.claude/agents/release-committee-judge.md
For later runtime parity, the prompts may be mirrored to:
.opencode/agents/
.codex/agents/
.agents/skills/
Inputs
Judges may read:
output/intermediate/audited_brief.md
output/intermediate/claim_ledger.json
output/source_appendix.md
output/delivery/brief.md
config.yaml
Judges must not mutate:
output/intermediate/audited_brief.md
output/intermediate/claim_ledger.json
output/delivery/*
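One way the Python side could enforce the read-only rule is to hash the protected files before and after a judge run. The following is a minimal sketch, not an existing MABW module: the helper names `snapshot` and `changed_paths` are hypothetical, and the glob patterns are copied from the list above.

```python
import hashlib
from pathlib import Path

# Protected paths copied from the "must not mutate" list.
# The helper names below are hypothetical, not existing MABW functions.
PROTECTED_GLOBS = [
    "output/intermediate/audited_brief.md",
    "output/intermediate/claim_ledger.json",
    "output/delivery/*",
]


def snapshot(root: Path) -> dict[str, str]:
    """Map each protected file (relative path) to the SHA-256 of its contents."""
    digests = {}
    for pattern in PROTECTED_GLOBS:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return protected files that were added, removed, or modified."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

Taking a snapshot before the judges run and comparing it with a second snapshot afterwards would surface any accidental edit as a non-empty list.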
Output
All judges contribute to:
output/intermediate/semantic_review_report.json
Output must satisfy the contract from #91.
Required judge roles
1. source-authority-judge
Rates whether each source authority fits the claim category.
Example finding:
{
  "judge_id": "source_authority_judge",
  "finding_type": "source_authority_insufficient",
  "claim_category": "legal_trade_remedy",
  "source_authority": "reputable_financial_media",
  "required_authority_for_mode": "official_legal_regulatory",
  "rating": 1,
  "rating_label": "insufficient_for_formal_release",
  "verification_path": "Attach official legal/regulatory text."
}
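A finding shaped like the example could be sanity-checked in Python before it is accepted. The sketch below is illustrative only: the real contract is defined in #91 and is not reproduced here, so the set of required fields is an assumption drawn from the example, and `check_finding` is a hypothetical name.

```python
# Field names are taken from the example finding. Treating them as required
# is an assumption; the authoritative contract is defined in #91.
ASSUMED_REQUIRED_FIELDS = (
    "judge_id",
    "finding_type",
    "rating",
    "rating_label",
    "verification_path",
)


def check_finding(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means the finding passed."""
    problems = [
        f"missing field: {name}"
        for name in ASSUMED_REQUIRED_FIELDS
        if name not in finding
    ]
    if "rating" in finding:
        rating = finding["rating"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(rating, int) and not isinstance(rating, bool)
        if not is_int or not 0 <= rating <= 4:
            problems.append("rating must be an integer from 0 to 4")
    return problems
```

Returning a list of problems rather than raising on the first one lets a validator report every defect in a finding at once.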
2. source-support-judge
Rates whether the cited source/evidence directly supports the claim.
Must identify:
- supported parts;
- unsupported or overbroad parts;
- recommended rewrite if the claim should be narrowed.
3. metric-scope-judge
Rates whether market metrics have comparable scope.
Must check:
- provider;
- universe;
- time window;
- unit;
- classification system;
- calculation method where available.
Typical blocker: multiple A-share fund-flow numbers are cited but come from different providers/universes/time windows.
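The comparison this judge performs can be pictured as a field-by-field check across the scope dimensions listed above. The judge itself is an agent prompt, so the following is only a sketch of the logic: the `MetricScope` class and `scope_mismatches` function are hypothetical, and skipping an unknown calculation method is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class MetricScope:
    # One attribute per dimension in the checklist; the class is hypothetical.
    provider: str
    universe: str
    time_window: str
    unit: str
    classification_system: str
    calculation_method: Optional[str] = None  # only "where available"


def scope_mismatches(a: MetricScope, b: MetricScope) -> list[str]:
    """Names of the scope dimensions on which two metrics differ."""
    diffs = []
    for f in fields(MetricScope):
        left, right = getattr(a, f.name), getattr(b, f.name)
        # Assumption: an unknown calculation method is not counted as a mismatch.
        if left is None or right is None:
            continue
        if left != right:
            diffs.append(f.name)
    return diffs
```

Two fund-flow figures whose `scope_mismatches` result is non-empty would be candidates for the typical blocker described above.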
4. official-source-coverage-judge
Rates whether legal, policy, company-event, exchange, and official statistics claims have appropriate official coverage.
Must distinguish:
- official text present;
- media source only;
- market expectation;
- latest official check missing.
5. branding-authorization-judge
Rates whether institution names, confidential/internal labels, or formal-distribution wording appear without authorization context.
Typical blocker:
Confidential — Internal Use Only
<Institution> research weekly
for <Institution> research use
6. release-committee-judge
Aggregates upstream semantic findings only. It must not re-litigate every claim or override Python policy packs.
Prompt guardrails
Every judge prompt must include:
You are a reviewer, not an editor.
Do not modify the brief, Claim Ledger, delivery files, or source files.
Return schema-valid JSON only.
Use the rubric.
If uncertain, choose the lower release eligibility and require human review.
Your finding is evidence for downstream policy; it is not final publication authority.
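A small check can confirm that each agent prompt file actually carries these guardrails. This is a sketch under the assumption that the sentences must appear verbatim; `missing_guardrails` is a hypothetical helper, not an existing MABW function.

```python
# Guardrail sentences copied from the list above; the function name is hypothetical.
REQUIRED_GUARDRAILS = (
    "You are a reviewer, not an editor.",
    "Do not modify the brief, Claim Ledger, delivery files, or source files.",
    "Return schema-valid JSON only.",
    "Use the rubric.",
    "If uncertain, choose the lower release eligibility and require human review.",
    "Your finding is evidence for downstream policy; it is not final publication authority.",
)


def missing_guardrails(prompt_text: str) -> list[str]:
    """Return the guardrail sentences that do not appear verbatim in a prompt."""
    # Collapse whitespace so line wrapping in the prompt file is not a false alarm.
    normalized = " ".join(prompt_text.split())
    return [line for line in REQUIRED_GUARDRAILS if line not in normalized]
```

Running this over every file under `.claude/agents/` would give a quick test that no judge prompt has dropped a guardrail.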
Rating scale
Use 0-4:
4: strong / required authority present / directly supported
3: usable, minor caveat
2: draft-usable but not research/formal-release ready
1: insufficient for the selected mode
0: unsupported, misleading, or unauthorized
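On the Python side, the scale could be represented as an enumeration so that out-of-range ratings are rejected early. The member names below are hypothetical shorthand for the descriptions above, and `most_conservative` is an illustrative helper that mirrors the "if uncertain, choose the lower" guardrail; neither is specified by this issue.

```python
from enum import IntEnum


class Rating(IntEnum):
    # Member names are hypothetical shorthand for the scale descriptions.
    UNSUPPORTED = 0
    INSUFFICIENT = 1
    DRAFT_USABLE = 2
    USABLE_MINOR_CAVEAT = 3
    STRONG = 4


def most_conservative(ratings: list[int]) -> Rating:
    """Pick the lowest rating; raises ValueError for empty or out-of-range input."""
    if not ratings:
        raise ValueError("at least one rating is required")
    # Convert every value first so any rating outside 0-4 is rejected.
    return min(Rating(r) for r in ratings)
```

Because `IntEnum` members compare as integers, existing numeric ratings in JSON findings can be used with this type without conversion tables.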
Acceptance criteria
- verification_path is present for every warning/blocker.
Non-goals
- Do not implement automatic fact proof.
- Do not implement source recrawl.
- Do not let reviewer agents write evidence_report or release_readiness_report.
- Do not change final report content in this issue.