
[v0.8] Add semantic reviewer agent pack for evidence and release risk rating #92

Description

@Stahl-G

Parent: #89

Depends on: #91

Summary

Add a semantic reviewer agent pack that rates evidence and release risks using fixed rubrics, producing output/intermediate/semantic_review_report.json.

The agents judge; they do not edit. Python validates and later converts judgments into evidence/release blockers.

Motivation

Several v0.8 risks are semantic and cannot be robustly solved by deterministic code alone:

  • whether a source authority is sufficient for a claim type;
  • whether a source actually supports the claim or only part of it;
  • whether fund-flow / price / inventory metrics mix incompatible scopes;
  • whether a legal/policy/company-event claim lacks official source coverage;
  • whether institutional branding or confidential labels lack authorization context.

MABW should use agents for rubric-based review, but keep authority in deterministic schemas and policy packs.

Proposed agent files

Add Claude Code agent prompts first, then mirror to other runtimes if needed:

.claude/agents/source-authority-judge.md
.claude/agents/source-support-judge.md
.claude/agents/metric-scope-judge.md
.claude/agents/official-source-coverage-judge.md
.claude/agents/branding-authorization-judge.md
.claude/agents/release-committee-judge.md
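
A small existence check for the six prompt files could back the first acceptance criterion. This is only a sketch: `missing_agent_prompts` and its parameters are hypothetical names, and the role list is copied from the file list above.

```python
from pathlib import Path

# The six judge roles, taken from the proposed agent file names.
JUDGE_ROLES = [
    "source-authority-judge",
    "source-support-judge",
    "metric-scope-judge",
    "official-source-coverage-judge",
    "branding-authorization-judge",
    "release-committee-judge",
]


def missing_agent_prompts(repo_root: Path, agents_dir: str = ".claude/agents") -> list[str]:
    """Return the roles whose prompt file does not exist under agents_dir."""
    return [
        role
        for role in JUDGE_ROLES
        if not (repo_root / agents_dir / f"{role}.md").is_file()
    ]
```

Passing a different `agents_dir` (for example `.opencode/agents`) would let the same check cover mirrored runtimes, assuming they keep the same file names.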

For later runtime parity, the prompts may be mirrored to:

.opencode/agents/
.codex/agents/
.agents/skills/

Inputs

Judges may read:

output/intermediate/audited_brief.md
output/intermediate/claim_ledger.json
output/source_appendix.md
output/delivery/brief.md
config.yaml

Judges must not mutate:

output/intermediate/audited_brief.md
output/intermediate/claim_ledger.json
output/delivery/*
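
One way Python could enforce the read-only rule is to hash the protected artifacts before and after a judge run and compare the two snapshots. The following is a minimal sketch under that assumption; `snapshot` and `changed_paths` are hypothetical helper names, not part of MABW.

```python
import hashlib
from pathlib import Path

# Protected artifacts, copied from the "must not mutate" list above.
PROTECTED_PATTERNS = [
    "output/intermediate/audited_brief.md",
    "output/intermediate/claim_ledger.json",
    "output/delivery/*",
]


def snapshot(root: Path) -> dict[str, str]:
    """Map each protected file (path relative to root) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for pattern in PROTECTED_PATTERNS:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return protected paths that were added, removed, or modified."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))
```

Taking `snapshot(root)` before invoking a judge and again afterwards, then failing the run if `changed_paths` is non-empty, keeps the check independent of what the agent reports about itself.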

Output

All judges contribute to:

output/intermediate/semantic_review_report.json

Output must satisfy the contract from #91.

Required judge roles

1. source-authority-judge

Rates whether each source authority fits the claim category.

Example finding:

{
  "judge_id": "source_authority_judge",
  "finding_type": "source_authority_insufficient",
  "claim_category": "legal_trade_remedy",
  "source_authority": "reputable_financial_media",
  "required_authority_for_mode": "official_legal_regulatory",
  "rating": 1,
  "rating_label": "insufficient_for_formal_release",
  "verification_path": "Attach official legal/regulatory text."
}
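
Since Python validates judge output, a shape check for a finding like the one above might look as follows. This is a sketch only: the authoritative contract is defined in #91 and is not reproduced here, so the required-field list is an assumption inferred from the example, and `validate_finding` is a hypothetical name.

```python
# Fields assumed to be required, inferred from the example finding above.
REQUIRED_FIELDS = {
    "judge_id": str,
    "finding_type": str,
    "rating": int,
    "rating_label": str,
    "verification_path": str,
}


def validate_finding(finding: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means the shape is acceptable."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in finding:
            errors.append(f"missing field: {name}")
        elif isinstance(finding[name], bool) or not isinstance(finding[name], expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for field: {name}")
    rating = finding.get("rating")
    if isinstance(rating, int) and not isinstance(rating, bool) and not 0 <= rating <= 4:
        errors.append("rating must be between 0 and 4")
    return errors
```

Returning a list of errors rather than raising on the first problem lets a report collect every defect in a judge's output in one pass.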

2. source-support-judge

Rates whether the cited source/evidence directly supports the claim.

Must identify:

  • supported parts;
  • unsupported or overbroad parts;
  • recommended rewrite if the claim should be narrowed.

3. metric-scope-judge

Rates whether market metrics have comparable scope.

Must check:

  • provider;
  • universe;
  • time window;
  • unit;
  • classification system;
  • calculation method where available.

Typical blocker: multiple A-share fund-flow numbers are cited but come from different providers/universes/time windows.
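
The scope dimensions above can be sketched as a record plus a comparison that reports which dimensions differ across the cited metrics. The `MetricScope` class and `scope_mismatches` function are hypothetical, and the judge itself remains a rubric-driven agent; this only illustrates the comparison it is asked to make.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class MetricScope:
    provider: str
    universe: str
    time_window: str
    unit: str
    classification_system: str
    calculation_method: Optional[str] = None  # checked only "where available"


def scope_mismatches(scopes: list[MetricScope]) -> list[str]:
    """Return the names of scope dimensions whose values differ across the metrics."""
    mismatched = []
    for field in fields(MetricScope):
        values = {getattr(scope, field.name) for scope in scopes}
        if field.name == "calculation_method":
            values.discard(None)  # a missing method is not treated as a mismatch
        if len(values) > 1:
            mismatched.append(field.name)
    return mismatched
```

A non-empty result corresponds to the typical blocker described above: numbers that look comparable but come from different providers, universes, or time windows.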

4. official-source-coverage-judge

Rates whether legal, policy, company-event, exchange, and official statistics claims have appropriate official coverage.

Must distinguish:

  • official text present;
  • media source only;
  • market expectation;
  • latest official check missing.

5. branding-authorization-judge

Rates whether an institution name, confidential/internal labels, or formal-distribution wording appear without authorization context.

Typical blocker:

Confidential — Internal Use Only
<Institution> research weekly
for <Institution> research use

6. release-committee-judge

Aggregates upstream semantic findings only. It must not re-litigate every claim or override Python policy packs.
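
As an illustration of "aggregate only", the committee's summary could be computed from upstream findings without re-rating any claim or applying release policy. This is a sketch: `aggregate_findings` and the summary field names are assumptions, and the `release_committee_judge` identifier simply follows the underscore style of the example finding above.

```python
def aggregate_findings(findings: list[dict]) -> dict:
    """Summarize upstream findings without changing or re-rating them."""
    lowest_by_judge: dict[str, int] = {}
    for finding in findings:
        judge = finding["judge_id"]
        rating = finding["rating"]
        # Keep the lowest (most severe) rating seen for each judge.
        lowest_by_judge[judge] = min(rating, lowest_by_judge.get(judge, rating))
    return {
        "judge_id": "release_committee_judge",  # assumed identifier
        "finding_count": len(findings),
        "lowest_rating_by_judge": lowest_by_judge,
        "overall_lowest_rating": min(lowest_by_judge.values(), default=None),
    }
```

The summary deliberately contains no pass/fail decision; turning ratings into blockers stays with the Python policy packs.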

Prompt guardrails

Every judge prompt must include:

You are a reviewer, not an editor.
Do not modify the brief, Claim Ledger, delivery files, or source files.
Return schema-valid JSON only.
Use the rubric.
If uncertain, choose the lower release eligibility and require human review.
Your finding is evidence for downstream policy; it is not final publication authority.
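
A test could confirm that each prompt file carries these guardrails verbatim. The sketch below assumes the guardrail sentences are included exactly as written above; `missing_guardrails` is a hypothetical helper.

```python
# Guardrail sentences copied from the list above.
GUARDRAILS = [
    "You are a reviewer, not an editor.",
    "Do not modify the brief, Claim Ledger, delivery files, or source files.",
    "Return schema-valid JSON only.",
    "Use the rubric.",
    "If uncertain, choose the lower release eligibility and require human review.",
    "Your finding is evidence for downstream policy; it is not final publication authority.",
]


def missing_guardrails(prompt_text: str) -> list[str]:
    """Return guardrail sentences absent from the prompt, ignoring line wrapping."""
    normalized = " ".join(prompt_text.split())  # collapse newlines and repeated spaces
    return [line for line in GUARDRAILS if line not in normalized]
```

Running this over the text of every file in `.claude/agents/` and asserting an empty result would cover several of the acceptance criteria with one fixture-free test.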

Rating scale

Use an integer rating from 0 to 4:

4: strong / required authority present / directly supported
3: usable, minor caveat
2: draft-usable but not research/formal-release ready
1: insufficient for the selected mode
0: unsupported, misleading, or unauthorized
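
The scale suggests how Python might later convert a rating into a blocker or warning. The mapping below is an assumption read off the scale descriptions, not the actual policy pack: the mode names (`draft`, `research`, `formal_release`), the severity strings, and `severity_for` are all hypothetical.

```python
from enum import IntEnum


class Rating(IntEnum):
    UNSUPPORTED = 0   # unsupported, misleading, or unauthorized
    INSUFFICIENT = 1  # insufficient for the selected mode
    DRAFT_USABLE = 2  # draft-usable but not research/formal-release ready
    USABLE = 3        # usable, minor caveat
    STRONG = 4        # strong / required authority present / directly supported


def severity_for(rating: int, mode: str) -> str:
    """Map a rating and release mode to 'blocker', 'warning', or 'ok' (assumed policy)."""
    level = Rating(rating)  # raises ValueError for values outside 0-4
    if level <= Rating.INSUFFICIENT:
        return "blocker"
    if level == Rating.DRAFT_USABLE:
        return "warning" if mode == "draft" else "blocker"
    if level == Rating.USABLE:
        return "warning"
    return "ok"
```

Keeping this mapping in deterministic code rather than in the prompts matches the split described in the summary: agents supply ratings, and Python decides what they mean for release.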

Acceptance criteria

  • Agent prompts exist for all six judge roles.
  • Prompts explicitly forbid editing content artifacts.
  • Prompts require schema-valid JSON.
  • Prompts include rating scale and judge-specific rubric.
  • Prompts require verification_path for every warning/blocker.
  • Prompts state that final authority remains deterministic policy + human approval.
  • At least one synthetic semantic review fixture validates through the contract from #91 ([v0.8] Add Claim Evidence Schema v3 and semantic review contract).

Non-goals

  • Do not implement automatic fact proof.
  • Do not implement source recrawl.
  • Do not let reviewer agents write evidence_report or release_readiness_report.
  • Do not change final report content in this issue.

Metadata

Labels: enhancement (New feature or request)
