
[v0.8] Add synthetic regression cases for evidence/release blockers #96

Description

@Stahl-G

Parent: #89

Depends on: #90, #91, #93, #94

Summary

Add public-safe synthetic regression cases proving that v0.8 can distinguish workflow completion from evidence readiness and release eligibility.

The core expected behavior:

workflow_status: pass
semantic_review_status: completed
evidence_status: conditional
release_status: blocked
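
To illustrate why the four statuses are kept separate, here is a minimal sketch. The `RunOutcome` class and the `is_release_blocked` helper are hypothetical names for illustration only; the field names and values come from the expected behavior above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunOutcome:
    # Hypothetical container; field names mirror the keys listed above.
    workflow_status: str
    semantic_review_status: str
    evidence_status: str
    release_status: str


def is_release_blocked(outcome: RunOutcome) -> bool:
    # Release eligibility is read from release_status alone.
    # A passing workflow or a completed review does not imply it.
    return outcome.release_status == "blocked"


core_case = RunOutcome(
    workflow_status="pass",
    semantic_review_status="completed",
    evidence_status="conditional",
    release_status="blocked",
)
```

With this shape, `core_case.workflow_status == "pass"` and `is_release_blocked(core_case)` can both hold at once, which is the distinction the regression cases are meant to prove.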

Motivation

The real-world motivating case involved a report that was traceable and finalized but still not ready for institution-branded research distribution. It had four problems: mixed metric scope, weak evidence tiers, missing official policy/legal/company sources, and branding authorization risk.

Do not add that real report to the repo. Add synthetic fixtures that reproduce the control failure patterns without using real institution branding or private material.

Proposed fixture directories

tests/fixtures/release_cases/unauthorized_institution_branding/
tests/fixtures/release_cases/mixed_fund_flow_scope/
tests/fixtures/release_cases/media_only_legal_policy/
tests/fixtures/release_cases/company_event_missing_latest_official_check/
tests/fixtures/release_cases/third_party_price_snapshot_formal_block/
tests/fixtures/release_cases/formal_release_missing_human_approval/

Required synthetic cases

1. Unauthorized institution branding

A delivery brief contains terms like:

Confidential — Internal Use Only
Example Securities Metals Weekly
for Example Securities research use

No authorization config is present.

Expected:

release_status: blocked
blocking_reason: branding_authorization
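
One way the branding check could work, as a rough sketch. The pattern list, the `branding_block` function, and the shape of the authorization config are all assumptions; only the brief terms and the expected output keys come from this issue.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the synthetic brief terms above.
BRANDING_PATTERNS = [
    re.compile(r"\bExample Securities\b", re.IGNORECASE),
    re.compile(r"\bInternal Use Only\b", re.IGNORECASE),
]


def branding_block(brief_text: str, authorization: Optional[dict]) -> Optional[dict]:
    """Return a blocker when branding terms appear without authorization, else None."""
    has_branding = any(p.search(brief_text) for p in BRANDING_PATTERNS)
    if has_branding and not authorization:
        return {
            "release_status": "blocked",
            "blocking_reason": "branding_authorization",
        }
    return None


brief = "Example Securities Metals Weekly\nfor Example Securities research use"
result = branding_block(brief, authorization=None)
```

Returning `None` when nothing blocks avoids committing to a vocabulary for the non-blocked state, which this issue does not define.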

2. Mixed fund-flow scope

Synthetic claims include:

  • daily sector net inflow from Source A;
  • weekly concept net inflow from Source B;
  • ETF subscription from Source C.

They are presented together as if comparable.

Expected:

evidence_status: conditional_or_blocked
finding_type: metric_scope_conflict
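
A possible detection rule, sketched under assumptions: each claim carries a period, a scope, and a metric, and claims presented together must share all three. The `FlowClaim` class, its fields, and the `"unspecified"` period for the ETF claim are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlowClaim:
    source: str
    period: str  # e.g. "daily", "weekly"
    scope: str   # e.g. "sector", "concept", "etf"
    metric: str  # e.g. "net_inflow", "subscription"


def scope_conflict(claims: List[FlowClaim]) -> Optional[dict]:
    """Flag claims shown side by side whose (period, scope, metric) differ."""
    signatures = {(c.period, c.scope, c.metric) for c in claims}
    if len(signatures) > 1:
        return {
            "finding_type": "metric_scope_conflict",
            "signatures": sorted(signatures),
        }
    return None


claims = [
    FlowClaim("Source A", "daily", "sector", "net_inflow"),
    FlowClaim("Source B", "weekly", "concept", "net_inflow"),
    # The issue does not state a period for the ETF claim; "unspecified" is a placeholder.
    FlowClaim("Source C", "unspecified", "etf", "subscription"),
]
finding = scope_conflict(claims)
```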

3. Media-only legal/policy claim

A legal or policy claim uses a reputable media source but no official source.

Expected in research_review and above:

evidence_status: blocked
finding_type: official_source_missing

Expected in internal_draft:

evidence_status: conditional
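
The phrase "research_review and above" implies an ordering of modes. The sketch below assumes the order `internal_draft < research_review < formal_release_candidate`; that order, the `evaluate_legal_claim` name, and attaching `finding_type` in `internal_draft` are assumptions, not confirmed behavior.

```python
from typing import Optional

# Assumed strictness order, least strict first.
MODE_ORDER = ["internal_draft", "research_review", "formal_release_candidate"]


def evaluate_legal_claim(mode: str, has_official_source: bool) -> Optional[dict]:
    """Return a finding for a legal/policy claim that lacks an official source."""
    if has_official_source:
        return None
    # Raises ValueError for an unknown mode, which is preferable to guessing.
    strict = MODE_ORDER.index(mode) >= MODE_ORDER.index("research_review")
    return {
        "evidence_status": "blocked" if strict else "conditional",
        # Assumption: the same finding type is reported in every mode.
        "finding_type": "official_source_missing",
    }
```

For example, `evaluate_legal_claim("internal_draft", False)` yields a `conditional` status, while the same claim in `research_review` yields `blocked`.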

4. Company event missing latest official check

A company event claim relies on a media summary and lacks an official filing/announcement check.

Expected:

finding_type: latest_official_check_missing
blocked_in: research_review

5. Third-party price snapshot formal block

A market price claim uses a third-party price page. This is acceptable in an internal draft if caveated, but not in a formal release candidate, which requires an exchange or terminal source.

Expected:

internal_draft: conditional
formal_release_candidate: blocked
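
Mode-specific outcomes like these could also be expressed as a lookup table rather than an ordering. This is only a sketch: the table name, the `price_claim_status` function, and the `source_kind` labels are hypothetical, and the outcome for `research_review` is deliberately left out because this issue does not specify it.

```python
from typing import Optional

# Only the two modes stated above are listed; research_review is unspecified here.
THIRD_PARTY_PRICE_POLICY = {
    "internal_draft": "conditional",
    "formal_release_candidate": "blocked",
}

# Assumed labels for sources that satisfy the formal requirement.
PRIMARY_PRICE_SOURCES = {"exchange", "terminal"}


def price_claim_status(mode: str, source_kind: str) -> Optional[str]:
    """Return the evidence outcome for a price claim, or None if no finding applies."""
    if source_kind in PRIMARY_PRICE_SOURCES:
        return None
    # KeyError for a mode missing from the table surfaces the gap instead of hiding it.
    return THIRD_PARTY_PRICE_POLICY[mode]
```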

6. Formal release missing human approval

Evidence and package checks pass, but required human approval is missing.

Expected:

release_status: blocked
blocking_reason: human_approval_missing
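
The approval gate is independent of the evidence and package checks, so it can be sketched as a separate function. The name `approval_block` and its parameters are hypothetical.

```python
from typing import Optional


def approval_block(approval_required: bool, approved_by: Optional[str]) -> Optional[dict]:
    """Return a blocker when a required human approval is absent, else None."""
    if approval_required and not approved_by:
        return {
            "release_status": "blocked",
            "blocking_reason": "human_approval_missing",
        }
    return None


# Evidence and package checks are assumed to have passed before this gate runs.
gate = approval_block(approval_required=True, approved_by=None)
```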

Tests

Add tests such as:

tests/test_v080_release_cases.py
tests/test_evidence_policy_modes.py
tests/test_release_readiness_cases.py

Suggested test names:

def test_unauthorized_branding_blocks_release(): ...
def test_mixed_metric_scope_blocks_formal_but_allows_internal_draft(): ...
def test_media_only_legal_policy_blocks_research_review(): ...
def test_company_event_missing_latest_official_check_blocks_research_review(): ...
def test_third_party_price_snapshot_is_conditional_not_pass(): ...
def test_formal_release_requires_human_approval(): ...
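
A skeleton for one of these tests might look like the following. It assumes each fixture directory contains an `expected.json` file, which this issue does not specify, and it builds the fixture in a temporary directory so the sketch is self-contained. Pairing `workflow_status: pass` with the branding case is also an assumption.

```python
import json
import tempfile
from pathlib import Path


def load_expected(case_dir: Path) -> dict:
    # Hypothetical convention: each case directory holds an expected.json file.
    return json.loads((case_dir / "expected.json").read_text(encoding="utf-8"))


def write_synthetic_case(root: Path, name: str, expected: dict) -> Path:
    case_dir = root / name
    case_dir.mkdir(parents=True)
    (case_dir / "expected.json").write_text(json.dumps(expected), encoding="utf-8")
    return case_dir


def test_unauthorized_branding_blocks_release():
    with tempfile.TemporaryDirectory() as tmp:
        case_dir = write_synthetic_case(
            Path(tmp),
            "unauthorized_institution_branding",
            {
                "workflow_status": "pass",
                "release_status": "blocked",
                "blocking_reason": "branding_authorization",
            },
        )
        expected = load_expected(case_dir)
        # A real test would run the pipeline on case_dir and compare its
        # output to `expected`; here only the fixture contract is checked.
        assert expected["workflow_status"] == "pass"
        assert expected["release_status"] == "blocked"
        assert expected["blocking_reason"] == "branding_authorization"
```

In the repository, the fixture would live under `tests/fixtures/release_cases/` instead of a temporary directory.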

Acceptance criteria

  • Fixtures are synthetic and public-safe.
  • No real institution-branded report is committed.
  • Each fixture documents the intended failure pattern.
  • Tests prove mode-specific behavior.
  • At least one fixture produces workflow_status: pass and release_status: blocked.
  • Tests run without live web, LLM calls, or source recrawl.
  • All v0.8 artifacts validate in fixture runs.
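
The "no live web" criterion can be enforced rather than assumed. One possible guard, sketched with the standard library only: replace `socket.socket` for the duration of a test so any attempt to open a connection fails loudly. The `no_network` name is hypothetical.

```python
import socket
from contextlib import contextmanager


@contextmanager
def no_network():
    """Make any attempt to create a socket raise while the context is active."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise RuntimeError("network access is not allowed in fixture tests")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real socket class, even if the test body fails.
        socket.socket = original
```

A fixture test would wrap its body in `with no_network():`. This only catches code that goes through `socket.socket`; LLM or crawler clients that are stubbed elsewhere need their own checks.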

Non-goals

  • Do not include private or employer data.
  • Do not include the real motivating report.
  • Do not require live internet.
  • Do not evaluate prose quality.
  • Do not claim output-quality improvement from these cases alone.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests