[v0.8 Epic] Agent-rated Evidence Readiness and Release Eligibility #89

Description

@Stahl-G

Summary

Implement v0.8 as Agent-rated Evidence Readiness and Release Eligibility.

This is not a claim that MABW can automatically prove truth or authorize publication. The v0.8 boundary is:

Semantic reviewer agents rate evidence and release risks using structured rubrics. Deterministic Python validators, policy packs, and control artifacts convert those ratings into warnings, blockers, allowed-use labels, and human-review requirements.

Why this matters

v0.7.x can already produce workflow traceability: material claims are linked to Claim Ledger entries and sources, and finalize produces reader-facing delivery artifacts. But recent real-use review exposed a gap:

  • a run can pass workflow and audit checks while still not being ready for formal research or external release;
  • weak-but-present sources can pass traceability checks;
  • mixed metric scopes can look cited but remain non-comparable;
  • legal/policy/company-event claims can rely on media without official text;
  • institutional branding/confidential labels can create release risk even when facts are sourced.

v0.8 should make these risks explicit, machine-readable, and blocking where the selected mode requires it.

Target control surfaces

Add four run-scoped artifacts:

output/intermediate/semantic_review_report.json
output/intermediate/evidence_report.json
output/intermediate/release_readiness_report.json
output/intermediate/human_approval_ledger.json

semantic_review_report.json

Agent-written. Records rubric-based semantic judgments: source authority, source-to-claim support, metric scope, official-source coverage, and branding/authorization risk.
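
A minimal sketch of how one agent-written finding could be validated before it enters the report. The field names (`claim_id`, `dimension`, `rating`, `rationale`), the snake_case dimension identifiers, and the four-level rating scale are assumptions for illustration; the issue only names the rubric dimensions, not a schema.

```python
from dataclasses import dataclass

# Rubric dimensions named in this issue; the identifiers are assumed spellings.
RUBRIC_DIMENSIONS = frozenset({
    "source_authority",
    "source_to_claim_support",
    "metric_scope",
    "official_source_coverage",
    "branding_authorization_risk",
})

# Hypothetical rating scale; the issue does not define one.
RATINGS = ("strong", "adequate", "weak", "missing")


@dataclass(frozen=True)
class SemanticFinding:
    """One rubric judgment about one claim (hypothetical shape)."""
    claim_id: str
    dimension: str
    rating: str
    rationale: str

    def __post_init__(self) -> None:
        # Reject anything outside the agreed rubric vocabulary.
        if self.dimension not in RUBRIC_DIMENSIONS:
            raise ValueError(f"unknown rubric dimension: {self.dimension!r}")
        if self.rating not in RATINGS:
            raise ValueError(f"unknown rating: {self.rating!r}")
        if not self.rationale.strip():
            raise ValueError("rationale must not be empty")


def parse_findings(raw: list[dict]) -> list[SemanticFinding]:
    """Validate raw agent output; malformed entries raise ValueError or TypeError."""
    return [SemanticFinding(**entry) for entry in raw]
```

Because the dataclass is frozen and carries only ratings and rationale, a reviewer output of this shape has no channel for editing content artifacts, which is the property the acceptance criteria ask for.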

evidence_report.json

Python-written. Converts semantic findings and Claim Ledger metadata into mode-aware evidence status, warnings, blockers, and verification paths.
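
The conversion step could look roughly like the following. This is a sketch under stated assumptions: `POLICY_PACK`, its rules, the mode name `internal_draft`, and the status values `pass` and `blocked` are hypothetical. Only `formal_release_candidate` and the `conditional` status appear in this issue.

```python
# Hypothetical policy pack: (dimension, rating) -> severity, per release mode.
POLICY_PACK = {
    "internal_draft": {
        ("metric_scope", "weak"): "warning",
        ("official_source_coverage", "missing"): "warning",
    },
    "formal_release_candidate": {
        ("metric_scope", "weak"): "blocker",
        ("official_source_coverage", "missing"): "blocker",
    },
}


def build_evidence_report(findings: list[dict], release_mode: str) -> dict:
    """Deterministically map semantic findings to warnings and blockers."""
    rules = POLICY_PACK[release_mode]  # KeyError on an unregistered mode
    warnings, blockers = [], []
    for finding in findings:
        severity = rules.get((finding["dimension"], finding["rating"]))
        entry = {"claim_id": finding["claim_id"], "dimension": finding["dimension"]}
        if severity == "blocker":
            blockers.append(entry)
        elif severity == "warning":
            warnings.append(entry)
    # Any blocker wins; otherwise warnings downgrade the status to conditional.
    if blockers:
        status = "blocked"
    elif warnings:
        status = "conditional"
    else:
        status = "pass"
    return {
        "release_mode": release_mode,
        "evidence_status": status,
        "warnings": warnings,
        "blockers": blockers,
    }
```

The same finding yields `conditional` in the draft mode and `blocked` in the formal mode, so the agent's rating stays mode-independent and only the policy pack decides what blocks.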

release_readiness_report.json

Python-written. Aggregates workflow, audit, quality gates, evidence report, finalize report, approval record, branding/use-boundary, and package hygiene into release status.
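
As an illustration of the aggregation, the sketch below folds a few gate results into a release status with explicit reasons. The input keys (`unauthorized_branding`, `human_approved`), the `ready` status value, and the specific rules are assumptions; the real report would also cover quality gates, the finalize report, and package hygiene.

```python
def build_release_readiness(inputs: dict, release_mode: str) -> dict:
    """Aggregate gate results into a release status (hypothetical rules)."""
    reasons = []

    if inputs.get("workflow_status") != "pass":
        reasons.append("workflow or audit did not pass")

    evidence = inputs.get("evidence_status")
    if evidence == "blocked":
        reasons.append("evidence report contains blockers")
    elif evidence == "conditional" and release_mode == "formal_release_candidate":
        reasons.append("conditional evidence is not accepted in this mode")
    elif evidence not in ("pass", "conditional"):
        reasons.append("evidence report missing or unrecognized")

    # Branding risk blocks release even when every fact is sourced.
    if inputs.get("unauthorized_branding", False):
        reasons.append("unauthorized institutional branding")

    if release_mode == "formal_release_candidate" and not inputs.get("human_approved", False):
        reasons.append("required human approval missing")

    return {
        "release_mode": release_mode,
        "workflow_status": inputs.get("workflow_status"),
        "release_status": "blocked" if reasons else "ready",
        "blocking_reasons": reasons,
    }
```

Keeping `workflow_status` and `release_status` as separate fields is what lets a passing workflow coexist with a blocked release in the same report.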

human_approval_ledger.json

Python-written from explicit human/reviewer commands. Records required reviewer status for formal_release_candidate mode.
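
One possible shape for the ledger logic, shown in memory for brevity. The entry fields, the `approved`/`rejected` decision values, and the role names in the usage note are hypothetical; the issue does not specify which reviewers are required.

```python
from datetime import datetime, timezone


def record_decision(ledger: list[dict], run_id: str, reviewer: str,
                    role: str, decision: str) -> list[dict]:
    """Return a new ledger with one appended entry; existing entries are never edited."""
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unknown decision: {decision!r}")
    entry = {
        "run_id": run_id,
        "reviewer": reviewer,
        "role": role,
        "decision": decision,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return [*ledger, entry]


def missing_approvals(ledger: list[dict], run_id: str,
                      required_roles: set[str]) -> set[str]:
    """Roles whose most recent decision for this run is not an approval."""
    latest = {}
    for entry in ledger:  # later entries override earlier ones per role
        if entry["run_id"] == run_id:
            latest[entry["role"]] = entry["decision"]
    return {role for role in required_roles if latest.get(role) != "approved"}
```

For example, after a single `record_decision([], "run-1", "alice", "analyst", "approved")`, calling `missing_approvals` with required roles `{"analyst", "compliance"}` would still report `compliance` as outstanding, which a release gate could treat as a blocker in formal_release_candidate mode.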

Sub-issues / execution sequence

  1. [v0.8] Add mode registry and release_mode plumbing (#90)
  2. [v0.8] Add Claim Evidence Schema v3 and semantic review contract (#91)
  3. [v0.8] Add semantic reviewer agent pack for evidence and release risk rating (#92)
  4. [v0.8] Add evidence_report control surface and policy-pack blocker mapping (#93)
  5. [v0.8] Add release_readiness_report, branding gate, approval gate, and package hygiene checks (#94)
  6. [v0.8] Add human approval record for release-candidate gating (#95)
  7. [v0.8] Add synthetic regression cases for evidence/release blockers (#96)
  8. [v0.8] Update docs, roadmap, and release notes for agent-rated evidence/release boundary (#97)

Expected end-state example

A traceable but not release-ready run should be able to produce:

workflow_status: pass
semantic_review_status: completed
evidence_status: conditional
release_status: blocked
release_mode: research_review
allowed_use:
  - internal discussion draft
  - analyst review input
not_allowed_use:
  - formal external publication
  - institution-branded distribution
  - investment recommendation
  - legal or compliance conclusion
blocking_reasons:
  - unauthorized institutional branding
  - mixed metric scope
  - official legal/policy source missing
  - company latest-official check missing
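
The end state above can also be expressed as a synthetic fixture with a small consistency check. The dictionary mirrors the example verbatim; the two invariants in `check_report_consistency` are assumptions about what a validator might enforce, not rules stated in this issue.

```python
END_STATE = {
    "workflow_status": "pass",
    "semantic_review_status": "completed",
    "evidence_status": "conditional",
    "release_status": "blocked",
    "release_mode": "research_review",
    "allowed_use": ["internal discussion draft", "analyst review input"],
    "not_allowed_use": [
        "formal external publication",
        "institution-branded distribution",
        "investment recommendation",
        "legal or compliance conclusion",
    ],
    "blocking_reasons": [
        "unauthorized institutional branding",
        "mixed metric scope",
        "official legal/policy source missing",
        "company latest-official check missing",
    ],
}


def check_report_consistency(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the report is self-consistent."""
    problems = []
    # Assumed invariant: a blocked release must explain itself.
    if report["release_status"] == "blocked" and not report["blocking_reasons"]:
        problems.append("blocked release must list at least one blocking reason")
    # Assumed invariant: a use label cannot be both allowed and disallowed.
    overlap = set(report["allowed_use"]) & set(report["not_allowed_use"])
    if overlap:
        problems.append(f"use labels both allowed and disallowed: {sorted(overlap)}")
    return problems
```

A regression test could assert that `check_report_consistency(END_STATE)` returns no problems while `workflow_status` is `pass` and `release_status` is `blocked`, which is the coexistence the acceptance criteria call for.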

Acceptance criteria

  • release_mode is a first-class run concept.
  • Semantic reviewer outputs are schema-validated and cannot mutate content artifacts.
  • semantic_review_report.json exists and validates.
  • evidence_report.json exists and validates.
  • release_readiness_report.json exists and validates.
  • human_approval_ledger.json exists and validates where required.
  • Policy packs map semantic ratings to mode-specific blockers.
  • Internal draft mode can allow conditional evidence without formal release.
  • Formal release candidate mode requires stronger evidence and human approval.
  • Synthetic fixtures prove workflow_status: pass can coexist with release_status: blocked.
  • README and v0.8 release notes clearly state non-goals: no truth certification, no legal/investment advice, no automatic publication authorization.

Non-goals

  • No automatic semantic truth proof.
  • No automatic source recrawl.
  • No live market-data fetching as part of release readiness.
  • No runtime-specific artifact schema forks.
  • No claim that output quality improved without evaluation evidence.
  • No replacement of analyst, legal, compliance, or authorized publisher approval.


Labels: enhancement
