Add progressive degradation fixture levels #127

Merged
ProfRandom92 merged 3 commits into main from codex/implement-pr-127-progressive-degradation-levels on May 19, 2026

@ProfRandom92
Owner

Motivation

  • Introduce two intermediate deterministic degraded fixtures for the existing coding workflow fixture family so the admissibility degradation curve is progressive (positive → mild → moderate → severe). This enables more granular, deterministic validation of reachability and causality regressions without changing the severe negative fixture.

Description

  • Added two new fixtures, fixtures/coding_workflow_pr_review_mild_v1 and fixtures/coding_workflow_pr_review_moderate_v1, each containing original/, reconstructed/, expected/, and README.md. Both reuse the original artifact sources from coding_workflow_pr_review_v1/original/ and mutate only the reconstructed side.
  • Mild fixture: removes recovery edges from test_failure → (rollback,escalate_to_human) to break recovery reachability only; preserves ordering, causality, and no-orphan invariant; expects RECOVERY_PATH_INVALID and expected_admissible: false.
  • Moderate fixture: removes recovery edges and also removes the causal edge security_scan_failed → deploy_blocked to break causality; preserves ordering and (where possible) no-orphan invariant; expects RECOVERY_PATH_INVALID and CAUSAL_DEPENDENCY_LOSS and expected_admissible: false.
  • Tests and artifacts updated: tests/test_degradation_curve_generator.py expanded to reference the two new fixtures and added monotonicity and per-fixture failure assertions; regenerated artifacts/layered_admissibility_results.json (4-point curve) and updated docs/benchmarks/layered_admissibility.md to include the full curve and interpretation.
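
A minimal sketch of how the mild and moderate mutations described above could be modelled. The edge list, the helper names, and the way failure codes are derived are assumptions for illustration; the real fixture graph and validator are not shown in this PR, and the no-orphan invariant is not modelled here.

```python
from collections import deque

# Hypothetical edge set; only these three edges are named in the PR description.
ORIGINAL_EDGES = {
    ("test_failure", "rollback"),
    ("test_failure", "escalate_to_human"),
    ("security_scan_failed", "deploy_blocked"),
}
RECOVERY_EDGES = {
    ("test_failure", "rollback"),
    ("test_failure", "escalate_to_human"),
}
CAUSAL_EDGE = ("security_scan_failed", "deploy_blocked")


def reachable(edges, start):
    """Return every node reachable from start using breadth-first search."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def failure_codes(edges):
    """Hypothetical check that maps missing edges to the expected failure codes."""
    codes = []
    recovery_targets = {"rollback", "escalate_to_human"}
    if not (reachable(edges, "test_failure") & recovery_targets):
        codes.append("RECOVERY_PATH_INVALID")
    if CAUSAL_EDGE not in edges:
        codes.append("CAUSAL_DEPENDENCY_LOSS")
    return codes


# Mild: drop only the recovery edges. Moderate: also drop the causal edge.
mild_edges = ORIGINAL_EDGES - RECOVERY_EDGES
moderate_edges = mild_edges - {CAUSAL_EDGE}
```

Under these assumptions, `failure_codes(mild_edges)` reports only the recovery failure, while `failure_codes(moderate_edges)` reports both, which mirrors the expectations listed for the two fixtures.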

Testing

  • Ran the repository validation and unit test suites; all automated checks passed as listed below.
  • pytest tests/test_degradation_curve_generator.py -q — 12 passed.
  • pytest tests/test_admissibility_scorer.py -q — 10 passed.
  • pytest tests/test_dependency_graph_comparator.py -q — 10 passed.
  • pytest tests/test_contract_validator.py -q — 10 passed.
  • pytest tests/test_fixture_contract_bundle.py -q — 1 passed.
  • pytest tests/test_negative_fixture_contract_bundle.py -q — 1 passed.
  • pytest -q — 177 passed total.
  • npm run check — layout/typecheck/validate/build/test pipeline completed successfully.

Summary:

  • Added two intermediate fixture bundles (coding_workflow_pr_review_mild_v1, coding_workflow_pr_review_moderate_v1), updated tests, regenerated artifact and docs.

Changed files:

  • fixtures/coding_workflow_pr_review_mild_v1/** (new)
  • fixtures/coding_workflow_pr_review_moderate_v1/** (new)
  • tests/test_degradation_curve_generator.py (updated)
  • artifacts/layered_admissibility_results.json (regenerated)
  • docs/benchmarks/layered_admissibility.md (updated)

Risks:

  • Progressive levels are synthetic and scoped to this fixture family and may not cover all real-world degradation modes.
  • Scoring remains unweighted v1 and may need tuning for downstream analyses.
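
One possible reading of "unweighted" is a plain average over the four layers, with the curve required to be non-increasing across levels. The sketch below assumes that reading. The function names are hypothetical, the positive level is assumed to score 1.0 in every layer, and the relational values for mild and moderate are the ones proposed in the review on this PR rather than values read from the regenerated artifact. The severe level is omitted because its scores are not stated here.

```python
from statistics import mean


def overall_score(layer_scores):
    """Unweighted score: every layer contributes equally (an assumption)."""
    return mean(list(layer_scores.values()))


def is_non_increasing(scores):
    """True when each score is no higher than the one before it."""
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Assumed layer scores per degradation level (see lead-in for provenance).
positive = {"structural": 1.0, "relational": 1.0, "operational": 1.0, "governance": 1.0}
mild = {**positive, "relational": 2 / 3}
moderate = {**positive, "relational": 1 / 3}

curve = [overall_score(level) for level in (positive, mild, moderate)]
```

A weighted variant would replace `mean` with a weighted sum, which is the kind of tuning the risk note anticipates.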

Next:

  • Add an SVG curve visualization, publish a fixture manifest, and expand progressive levels to additional fixture families.

Codex Task

@vercel

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
comptextv7 | Ready | Preview, Comment | May 19, 2026 11:16am

@netlify

netlify Bot commented May 19, 2026

Deploy Preview for comptext-v7 canceled.

Name | Link
🔨 Latest commit | bc1d609
🔍 Latest deploy log | https://app.netlify.com/projects/comptext-v7/deploys/6a0c469018716e0008caa934

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces two new deterministic degraded fixtures, mild_v1 and moderate_v1, to the coding workflow replay-validation suite, designed to isolate and combine specific contract failures related to recovery reachability and causality. The changes include comprehensive fixture data, updated benchmark documentation, and expanded test coverage for score monotonicity. Review feedback identifies that the expected_layer_scores in the new fixture metadata are incorrect and should be updated to reflect the relational layer failures. Additionally, the reviewer noted an inconsistency in the reconstructed traces, which include an extra setup_workspace event not found in the original traces.

Comment on lines +5 to +10
  "expected_layer_scores": {
    "structural": 1.0,
    "relational": 1.0,
    "operational": 1.0,
    "governance": 1.0
  },

medium

The expected_layer_scores in this fixture metadata are incorrect. Since this fixture is designed to fail the recovery_path_available contract (which belongs to the relational layer), the expected relational score should be 0.6666666666666666 (2 out of 3 relational contracts passing), not 1.0. This inconsistency makes the fixture ground truth misleading.

Suggested change
-  "expected_layer_scores": {
-    "structural": 1.0,
-    "relational": 1.0,
-    "operational": 1.0,
-    "governance": 1.0
-  },
+  "expected_layer_scores": {
+    "structural": 1.0,
+    "relational": 0.6666666666666666,
+    "operational": 1.0,
+    "governance": 1.0
+  },
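
The arithmetic behind the suggested value can be sketched as the fraction of passing contracts in a layer. The function name is hypothetical, and `third_relational_contract` is a placeholder, since the review names only recovery_path_available and security_causal_block among the three relational contracts.

```python
def layer_score(results):
    """Fraction of contracts in one layer that passed."""
    return sum(results.values()) / len(results)


# Mild fixture: only the recovery contract fails.
mild_relational = {
    "recovery_path_available": False,
    "security_causal_block": True,
    "third_relational_contract": True,  # placeholder name, not from the PR
}

print(layer_score(mild_relational))  # prints 0.6666666666666666
```

The same function applies to any layer, so a fixture's expected scores can be derived from its expected contract failures instead of being entered by hand.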

@@ -0,0 +1,14 @@
{
  "events": [
    {"action": "setup_workspace", "step": 0},

medium

The reconstructed trace contains a setup_workspace event at step 0 that is not present in the original/trace.json. While this doesn't break the current ordering contracts (which check for subsequences), it introduces an inconsistency between the ground truth and the reconstructed data that should be avoided in deterministic fixtures.
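
To illustrate why a subsequence-based ordering check tolerates the extra event, here is a small sketch. The action names other than setup_workspace are hypothetical, and the actual ordering contracts in the repository may be implemented differently.

```python
def is_subsequence(required, observed):
    """True when every required action appears in observed, in the same order."""
    remaining = iter(observed)
    # "step in remaining" advances the iterator, so order is enforced.
    return all(step in remaining for step in required)


# Hypothetical traces: the reconstructed side has one extra leading event.
original_actions = ["run_tests", "security_scan", "deploy"]
reconstructed_actions = ["setup_workspace"] + original_actions

ordering_ok = is_subsequence(original_actions, reconstructed_actions)
traces_identical = original_actions == reconstructed_actions
```

Here `ordering_ok` is true while `traces_identical` is false, which is the gap the reviewer points out: the contract passes even though the two traces differ.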

Comment on lines +5 to +10
  "expected_layer_scores": {
    "structural": 1.0,
    "relational": 1.0,
    "operational": 1.0,
    "governance": 1.0
  },

medium

The expected_layer_scores in this fixture metadata are incorrect. This fixture fails both recovery_path_available and security_causal_block (both in the relational layer). Therefore, the expected relational score should be 0.3333333333333333 (1 out of 3 relational contracts passing), not 1.0.

Suggested change
-  "expected_layer_scores": {
-    "structural": 1.0,
-    "relational": 1.0,
-    "operational": 1.0,
-    "governance": 1.0
-  },
+  "expected_layer_scores": {
+    "structural": 1.0,
+    "relational": 0.3333333333333333,
+    "operational": 1.0,
+    "governance": 1.0
+  },

@@ -0,0 +1,14 @@
{
  "events": [
    {"action": "setup_workspace", "step": 0},

medium

Similar to the mild fixture, the reconstructed trace here includes an extra setup_workspace event at step 0 which is missing from the original trace. It is recommended to keep the reconstructed trace consistent with the original trace events to ensure high-fidelity validation.

@ProfRandom92 ProfRandom92 merged commit b090826 into main May 19, 2026
8 of 10 checks passed