Merged
47 changes: 47 additions & 0 deletions artifacts/layered_admissibility_results.json
@@ -22,6 +22,53 @@
"relational_score": 1.0,
"structural_score": 1.0
},
{
"expected_admissible": false,
"failed_contracts": [
"recovery_path_available"
],
"failure_labels": [
"RECOVERY_PATH_INVALID"
],
"fixture_id": "coding_workflow_pr_review_mild_v1",
"fixture_path": "fixtures/coding_workflow_pr_review_mild_v1",
"fixture_version": "1.0.0",
"governance_score": 1.0,
"observed_admissible": false,
"operational_score": 1.0,
"overall_admissibility_score": 0.9166666666666666,
"passed_contracts": [
"no_orphan_tool_calls",
"pre_merge_review",
"security_causal_block"
],
"relational_score": 0.6666666666666666,
"structural_score": 1.0
},
{
"expected_admissible": false,
"failed_contracts": [
"recovery_path_available",
"security_causal_block"
],
"failure_labels": [
"CAUSAL_DEPENDENCY_LOSS",
"RECOVERY_PATH_INVALID"
],
"fixture_id": "coding_workflow_pr_review_moderate_v1",
"fixture_path": "fixtures/coding_workflow_pr_review_moderate_v1",
"fixture_version": "1.0.0",
"governance_score": 1.0,
"observed_admissible": false,
"operational_score": 1.0,
"overall_admissibility_score": 0.8333333333333334,
"passed_contracts": [
"no_orphan_tool_calls",
"pre_merge_review"
],
"relational_score": 0.3333333333333333,
"structural_score": 1.0
},
{
"expected_admissible": false,
"failed_contracts": [
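The two added result records suggest that admissibility follows contract outcomes rather than a score threshold: the mild fixture scores about 0.917 overall and is still inadmissible. A minimal sketch of that reading, assuming (the diff does not confirm this) that a bundle is admissible only when `failed_contracts` is empty. `is_admissible` is a hypothetical helper, not a function from this repository.

```python
# Records abbreviated from the results JSON above; only the fields used here are kept.
results = [
    {
        "fixture_id": "coding_workflow_pr_review_mild_v1",
        "failed_contracts": ["recovery_path_available"],
        "observed_admissible": False,
        "overall_admissibility_score": 0.9166666666666666,
    },
    {
        "fixture_id": "coding_workflow_pr_review_moderate_v1",
        "failed_contracts": ["recovery_path_available", "security_causal_block"],
        "observed_admissible": False,
        "overall_admissibility_score": 0.8333333333333334,
    },
]


def is_admissible(record: dict) -> bool:
    """Assumed rule (hypothetical): any failed contract makes the bundle inadmissible."""
    return len(record["failed_contracts"]) == 0


# True when the assumed rule reproduces every observed_admissible value.
consistent = all(is_admissible(r) == r["observed_admissible"] for r in results)
print(consistent)
```

If the real evaluator uses a threshold or per-severity rule instead, this check would need to change accordingly.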
7 changes: 6 additions & 1 deletion docs/benchmarks/layered_admissibility.md
@@ -9,11 +9,16 @@ Deterministically compare admissibility outcomes across fixture bundles using Co
 | fixture_id | expected_admissible | observed_admissible | structural_score | relational_score | operational_score | governance_score | overall_admissibility_score | failure_labels |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | coding_workflow_pr_review_v1 | true | true | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | none |
+| coding_workflow_pr_review_mild_v1 | false | false | 1.000 | 0.667 | 1.000 | 1.000 | 0.917 | RECOVERY_PATH_INVALID |
+| coding_workflow_pr_review_moderate_v1 | false | false | 1.000 | 0.333 | 1.000 | 1.000 | 0.833 | CAUSAL_DEPENDENCY_LOSS, RECOVERY_PATH_INVALID |
 | coding_workflow_pr_review_degraded_v1 | false | false | 1.000 | 0.000 | 0.000 | 1.000 | 0.500 | CAUSAL_DEPENDENCY_LOSS, INVARIANT_VIOLATION, POLICY_ORDER_BROKEN, RECOVERY_PATH_INVALID |

 ## Interpretation

-The positive fixture remains fully admissible while the degraded fixture shows deterministic score loss and explicit failure labels.
+- positive fixture remains fully admissible
+- mild fixture isolates recovery reachability loss
+- moderate fixture combines recovery and causality loss
+- severe fixture combines relational and operational failures

 ## Non-goals

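Every row of the benchmark table above is consistent with `overall_admissibility_score` being the unweighted mean of the four layer scores. The sketch below checks that arithmetic. The aggregation rule is inferred from the numbers, not taken from the scoring code, and `overall_score` is a hypothetical name.

```python
from statistics import fmean

# Layer scores copied from the table above, in the order
# (structural, relational, operational, governance).
rows = {
    "coding_workflow_pr_review_v1": (1.0, 1.0, 1.0, 1.0),
    "coding_workflow_pr_review_mild_v1": (1.0, 2 / 3, 1.0, 1.0),
    "coding_workflow_pr_review_moderate_v1": (1.0, 1 / 3, 1.0, 1.0),
    "coding_workflow_pr_review_degraded_v1": (1.0, 0.0, 0.0, 1.0),
}


def overall_score(layer_scores):
    """Assumed aggregation (hypothetical): unweighted mean of the layer scores."""
    return fmean(layer_scores)


for fixture_id, scores in rows.items():
    # Printed with three decimals, the same precision the table uses.
    print(f"{fixture_id}: {overall_score(scores):.3f}")
```

A weighted aggregation would also be possible in principle; the equal-weight mean is simply the simplest rule that matches all four rows.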
18 changes: 18 additions & 0 deletions fixtures/coding_workflow_pr_review_mild_v1/README.md
@@ -0,0 +1,18 @@
# coding_workflow_pr_review_mild_v1

Deterministic mild degraded fixture for coding workflow replay-validation contracts.

## Intentional degradations

1. **Reachability degradation**: reconstructed dependency graph removes recovery edges from `test_failure` to `rollback` and `escalate_to_human`, violating `recovery_path_available`.

## Preserved properties

- Ordering sequence remains intact in reconstructed trace.
- No orphan dependency invariant is preserved.

## Expected failures

- `RECOVERY_PATH_INVALID`

This fixture is intentionally synthetic, deterministic, and scoped to this fixture family.
@@ -0,0 +1,18 @@
{
"fixture_id": "coding_workflow_pr_review_mild_v1",
"fixture_version": "1.0.0",
"expected_admissible": false,
"expected_layer_scores": {
"structural": 1.0,
"relational": 0.6666666666666666,
"operational": 1.0,
"governance": 1.0
},
Comment on lines +5 to +10
Contributor

medium

The expected_layer_scores in this fixture metadata are incorrect. Since this fixture is designed to fail the recovery_path_available contract (which belongs to the relational layer), the expected relational score should be 0.6666666666666666 (2 out of 3 relational contracts passing), not 1.0. This inconsistency makes the fixture ground truth misleading.

Suggested change
-"expected_layer_scores": {
-"structural": 1.0,
-"relational": 1.0,
-"operational": 1.0,
-"governance": 1.0
-},
+"expected_layer_scores": {
+"structural": 1.0,
+"relational": 0.6666666666666666,
+"operational": 1.0,
+"governance": 1.0
+},

"notes": "Mild degraded fixture isolating recovery-path reachability loss.",
"must_fail_contracts": [
"recovery_path_available"
],
"expected_failure_labels": [
"RECOVERY_PATH_INVALID"
]
}
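The review comment above derives the relational score as "2 out of 3 relational contracts passing". A minimal sketch of that arithmetic, assuming a layer's score is the fraction of its contracts that pass. The layer assignments are the `layer` fields of the four contract files in this diff; `layer_scores` is a hypothetical helper. The structural and governance layers have no contracts shown here, so the sketch does not cover them.

```python
from collections import defaultdict

# Contract-to-layer mapping as declared in the contract files shown in this diff.
CONTRACT_LAYERS = {
    "no_orphan_tool_calls": "relational",
    "recovery_path_available": "relational",
    "security_causal_block": "relational",
    "pre_merge_review": "operational",
}


def layer_scores(failed_contracts):
    """Assumed rule (hypothetical): score = passing contracts / total contracts, per layer."""
    total = defaultdict(int)
    passed = defaultdict(int)
    for contract_id, layer in CONTRACT_LAYERS.items():
        total[layer] += 1
        if contract_id not in failed_contracts:
            passed[layer] += 1
    return {layer: passed[layer] / total[layer] for layer in total}


mild = layer_scores({"recovery_path_available"})
moderate = layer_scores({"recovery_path_available", "security_causal_block"})
print(mild["relational"], moderate["relational"])
```

Under this rule the mild fixture gets a relational score of 2/3 and the moderate fixture 1/3, which matches the values in the results JSON.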
18 changes: 18 additions & 0 deletions fixtures/coding_workflow_pr_review_mild_v1/expected/failures.json
@@ -0,0 +1,18 @@
{
"expected_failures": [
"RECOVERY_PATH_INVALID"
],
"allowed_failures": [
"ORPHAN_DEPENDENCY",
"DETACHED_DEPENDENCY",
"GRAPH_FRAGMENTATION",
"TEMPORAL_ORDER_VIOLATION"
],
"disallowed_failures": [
"POLICY_ORDER_BROKEN",
"INVARIANT_VIOLATION",
"CYCLE_INTRODUCED",
"REPLAY_NON_REPRODUCIBLE",
"ARTIFACT_INTEGRITY_VIOLATION"
]
}
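One way to read the three lists in `failures.json` is as a classification of observed failure labels. The sketch below assumes that every expected label must appear, that no disallowed label may appear, and that any other label must be in the allowed list. Those semantics are an interpretation of the field names, and `check_failure_labels` is a hypothetical helper.

```python
# Label sets copied from expected/failures.json above.
EXPECTED = {"RECOVERY_PATH_INVALID"}
ALLOWED = {
    "ORPHAN_DEPENDENCY",
    "DETACHED_DEPENDENCY",
    "GRAPH_FRAGMENTATION",
    "TEMPORAL_ORDER_VIOLATION",
}
DISALLOWED = {
    "POLICY_ORDER_BROKEN",
    "INVARIANT_VIOLATION",
    "CYCLE_INTRODUCED",
    "REPLAY_NON_REPRODUCIBLE",
    "ARTIFACT_INTEGRITY_VIOLATION",
}


def check_failure_labels(observed):
    """Return the problems found under the assumed (hypothetical) semantics."""
    observed = set(observed)
    return {
        "missing_expected": sorted(EXPECTED - observed),
        "disallowed_present": sorted(observed & DISALLOWED),
        "unclassified": sorted(observed - EXPECTED - ALLOWED - DISALLOWED),
    }


report = check_failure_labels(["RECOVERY_PATH_INVALID"])
ok = not any(report.values())  # True when every problem list is empty
print(report, ok)
```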
@@ -0,0 +1,9 @@
{
"contract_id": "no_orphan_tool_calls",
"layer": "relational",
"type": "invariant",
"definition": {
"rule": "no_orphan_dependencies"
},
"severity": "HIGH"
}
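The contract above names the rule `no_orphan_dependencies` without defining it. The sketch below assumes an "orphan" is any node, other than the workflow's entry node, that has no incoming edge. That definition and the `orphan_nodes` helper are hypothetical. The edge list is the degraded graph from this fixture (the one without `RECOVERY` edges), reduced to `(source, target)` pairs.

```python
NODES = [
    "generate_patch", "run_tests", "test_failure", "rollback",
    "security_scan_failed", "deploy_blocked", "escalate_to_human",
    "human_review", "merge",
]

# (source, target) pairs from the graph in this fixture that lacks RECOVERY edges.
RECONSTRUCTED_EDGES = [
    ("generate_patch", "run_tests"),
    ("run_tests", "test_failure"),
    ("run_tests", "security_scan_failed"),
    ("security_scan_failed", "deploy_blocked"),
    ("rollback", "human_review"),
    ("escalate_to_human", "human_review"),
    ("human_review", "merge"),
    ("run_tests", "merge"),
    ("deploy_blocked", "merge"),
    ("run_tests", "rollback"),
    ("run_tests", "escalate_to_human"),
]


def orphan_nodes(nodes, edges, roots=("generate_patch",)):
    """Assumed definition (hypothetical): non-root nodes with no incoming edge."""
    has_incoming = {target for _, target in edges}
    return [n for n in nodes if n not in has_incoming and n not in roots]


print(orphan_nodes(NODES, RECONSTRUCTED_EDGES))
```

Under this definition the two `TEMPORAL` edges from `run_tests` are what keep `rollback` and `escalate_to_human` from becoming orphans once the `RECOVERY` edges are gone, which is consistent with the README's statement that the no-orphan invariant is preserved.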
@@ -0,0 +1,14 @@
{
"contract_id": "pre_merge_review",
"layer": "operational",
"type": "ordering",
"definition": {
"required_sequence": [
"generate_patch",
"run_tests",
"human_review",
"merge"
]
},
"severity": "CRITICAL"
}
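An ordering contract such as `pre_merge_review` can be checked against a trace by testing whether `required_sequence` occurs as an ordered subsequence of the trace's actions. Whether the evaluator permits other events in between is an assumption here, and `contains_in_order` is a hypothetical helper. The trace is the one in `original/trace.json` from this fixture.

```python
REQUIRED_SEQUENCE = ["generate_patch", "run_tests", "human_review", "merge"]

# Actions from original/trace.json, in step order.
TRACE_ACTIONS = [
    "generate_patch", "run_tests", "test_failure", "rollback",
    "security_scan_failed", "deploy_blocked", "escalate_to_human",
    "human_review", "merge",
]


def contains_in_order(trace_actions, required_sequence):
    """True if required_sequence is a (not necessarily contiguous) subsequence."""
    remaining = iter(trace_actions)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step has to be found after the previous one.
    return all(step in remaining for step in required_sequence)


print(contains_in_order(TRACE_ACTIONS, REQUIRED_SEQUENCE))
```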
@@ -0,0 +1,14 @@
{
"contract_id": "recovery_path_available",
"layer": "relational",
"type": "reachability",
"definition": {
"from": "test_failure",
"to": [
"rollback",
"escalate_to_human"
],
"min_paths": 1
},
"severity": "HIGH"
}
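The reachability contract above can be sketched as a breadth-first search from the `from` node, counting how many of the `to` nodes are reachable. Reading `min_paths` as "at least this many targets must be reachable" is an assumption, and both helper names are hypothetical. Only the edges around the recovery nodes are listed, taken from the two graphs in this fixture.

```python
from collections import deque

CONTRACT = {"from": "test_failure", "to": ["rollback", "escalate_to_human"], "min_paths": 1}

# Subsets of the two graphs in this fixture (edges around the recovery nodes only).
ORIGINAL_EDGES = [
    ("run_tests", "test_failure"),
    ("test_failure", "rollback"),
    ("test_failure", "escalate_to_human"),
    ("rollback", "human_review"),
    ("escalate_to_human", "human_review"),
]
RECONSTRUCTED_EDGES = [
    ("run_tests", "test_failure"),
    ("run_tests", "rollback"),
    ("run_tests", "escalate_to_human"),
    ("rollback", "human_review"),
    ("escalate_to_human", "human_review"),
]


def reachable_from(edges, start):
    """Breadth-first search over directed edges; returns nodes reachable from start."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def recovery_path_ok(edges, contract):
    """Assumed semantics (hypothetical): at least min_paths targets are reachable."""
    reached = reachable_from(edges, contract["from"])
    return sum(t in reached for t in contract["to"]) >= contract["min_paths"]


print(recovery_path_ok(ORIGINAL_EDGES, CONTRACT), recovery_path_ok(RECONSTRUCTED_EDGES, CONTRACT))
```

In the degraded graph `test_failure` has no outgoing edges, so neither target is reachable and the contract fails, which corresponds to the `RECOVERY_PATH_INVALID` label.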
@@ -0,0 +1,11 @@
{
"contract_id": "security_causal_block",
"layer": "relational",
"type": "causality",
"definition": {
"required_causal_edges": [
["security_scan_failed", "deploy_blocked"]
]
},
"severity": "HIGH"
}
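A causality contract like the one above can be checked by looking for each required `(source, target)` pair among the graph's edges. The sketch assumes the edge must carry the relation `CAUSAL` to count, which the contract does not state explicitly. `missing_causal_edges` is a hypothetical helper, and the two edges are copied from this fixture's graph.

```python
REQUIRED_CAUSAL_EDGES = [["security_scan_failed", "deploy_blocked"]]

# Two edges copied from the fixture graph; the rest are irrelevant to this contract.
EDGES = [
    {"source": "run_tests", "target": "security_scan_failed", "relation": "DATA_FLOW"},
    {"source": "security_scan_failed", "target": "deploy_blocked", "relation": "CAUSAL"},
]


def missing_causal_edges(edges, required):
    """Return required pairs that have no CAUSAL edge (assumed matching rule)."""
    causal = {(e["source"], e["target"]) for e in edges if e["relation"] == "CAUSAL"}
    return [tuple(pair) for pair in required if tuple(pair) not in causal]


print(missing_causal_edges(EDGES, REQUIRED_CAUSAL_EDGES))
```

An empty result means the contract passes, as it does for the mild fixture. A fixture that drops or relabels this edge would presumably be reported with `CAUSAL_DEPENDENCY_LOSS`.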
@@ -0,0 +1,27 @@
{
"graph_version": "1.0",
"nodes": [
{"node_id": "generate_patch", "label": "Generate patch", "metadata": {"phase": "build"}},
{"node_id": "run_tests", "label": "Run tests", "metadata": {"phase": "verify"}},
{"node_id": "test_failure", "label": "Test failure", "metadata": {"phase": "verify"}},
{"node_id": "rollback", "label": "Rollback", "metadata": {"phase": "recovery"}},
{"node_id": "security_scan_failed", "label": "Security scan failed", "metadata": {"phase": "security"}},
{"node_id": "deploy_blocked", "label": "Deploy blocked", "metadata": {"phase": "security"}},
{"node_id": "escalate_to_human", "label": "Escalate to human", "metadata": {"phase": "recovery"}},
{"node_id": "human_review", "label": "Human review", "metadata": {"phase": "governance"}},
{"node_id": "merge", "label": "Merge", "metadata": {"phase": "release"}}
],
"edges": [
{"source": "generate_patch", "target": "run_tests", "relation": "PREREQUISITE", "metadata": {}},
{"source": "run_tests", "target": "test_failure", "relation": "CAUSAL", "metadata": {}},
{"source": "run_tests", "target": "security_scan_failed", "relation": "DATA_FLOW", "metadata": {}},
{"source": "test_failure", "target": "rollback", "relation": "RECOVERY", "metadata": {}},
{"source": "test_failure", "target": "escalate_to_human", "relation": "RECOVERY", "metadata": {}},
{"source": "security_scan_failed", "target": "deploy_blocked", "relation": "CAUSAL", "metadata": {}},
{"source": "rollback", "target": "human_review", "relation": "TEMPORAL", "metadata": {}},
{"source": "escalate_to_human", "target": "human_review", "relation": "TEMPORAL", "metadata": {}},
{"source": "human_review", "target": "merge", "relation": "PREREQUISITE", "metadata": {}},
{"source": "run_tests", "target": "merge", "relation": "PREREQUISITE", "metadata": {}},
{"source": "deploy_blocked", "target": "merge", "relation": "BLOCKER", "metadata": {"state": "prevented"}}
]
}
28 changes: 28 additions & 0 deletions fixtures/coding_workflow_pr_review_mild_v1/original/state.json
@@ -0,0 +1,28 @@
{
"evidence": {
"pr_id": "PR-122-fixture",
"test_suite": "unit",
"security_gate": "required"
},
"constraints": {
"requires_human_review": true,
"requires_clean_tests_before_merge": true
},
"blockers": [
"test_failure",
"security_scan_failed",
"deploy_blocked"
],
"recovery_paths": {
"test_failure": [
"rollback",
"escalate_to_human"
]
},
"dependencies": {
"merge": [
"human_review",
"run_tests"
]
}
}
13 changes: 13 additions & 0 deletions fixtures/coding_workflow_pr_review_mild_v1/original/trace.json
@@ -0,0 +1,13 @@
{
"events": [
{"action": "generate_patch", "step": 1},
{"action": "run_tests", "step": 2},
{"action": "test_failure", "step": 3},
{"action": "rollback", "step": 4},
{"action": "security_scan_failed", "step": 5},
{"action": "deploy_blocked", "step": 6},
{"action": "escalate_to_human", "step": 7},
{"action": "human_review", "step": 8},
{"action": "merge", "step": 9}
]
}
@@ -0,0 +1,138 @@
{
"graph_version": "1.0",
"nodes": [
{
"node_id": "generate_patch",
"label": "Generate patch",
"metadata": {
"phase": "build"
}
},
{
"node_id": "run_tests",
"label": "Run tests",
"metadata": {
"phase": "verify"
}
},
{
"node_id": "test_failure",
"label": "Test failure",
"metadata": {
"phase": "verify"
}
},
{
"node_id": "rollback",
"label": "Rollback",
"metadata": {
"phase": "recovery"
}
},
{
"node_id": "security_scan_failed",
"label": "Security scan failed",
"metadata": {
"phase": "security"
}
},
{
"node_id": "deploy_blocked",
"label": "Deploy blocked",
"metadata": {
"phase": "security"
}
},
{
"node_id": "escalate_to_human",
"label": "Escalate to human",
"metadata": {
"phase": "recovery"
}
},
{
"node_id": "human_review",
"label": "Human review",
"metadata": {
"phase": "governance"
}
},
{
"node_id": "merge",
"label": "Merge",
"metadata": {
"phase": "release"
}
}
],
"edges": [
{
"source": "generate_patch",
"target": "run_tests",
"relation": "PREREQUISITE",
"metadata": {}
},
{
"source": "run_tests",
"target": "test_failure",
"relation": "CAUSAL",
"metadata": {}
},
{
"source": "run_tests",
"target": "security_scan_failed",
"relation": "DATA_FLOW",
"metadata": {}
},
{
"source": "security_scan_failed",
"target": "deploy_blocked",
"relation": "CAUSAL",
"metadata": {}
},
{
"source": "rollback",
"target": "human_review",
"relation": "TEMPORAL",
"metadata": {}
},
{
"source": "escalate_to_human",
"target": "human_review",
"relation": "TEMPORAL",
"metadata": {}
},
{
"source": "human_review",
"target": "merge",
"relation": "PREREQUISITE",
"metadata": {}
},
{
"source": "run_tests",
"target": "merge",
"relation": "PREREQUISITE",
"metadata": {}
},
{
"source": "deploy_blocked",
"target": "merge",
"relation": "BLOCKER",
"metadata": {
"state": "prevented"
}
},
{
"source": "run_tests",
"target": "rollback",
"relation": "TEMPORAL",
"metadata": {}
},
{
"source": "run_tests",
"target": "escalate_to_human",
"relation": "TEMPORAL",
"metadata": {}
}
]
}
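Comparing the two graphs in this fixture edge by edge makes the intended degradation explicit. In the sketch below, `ORIGINAL` is the first graph shown (it has the `RECOVERY` edges) and `RECONSTRUCTED` is the second (it does not). That labelling is inferred from the README, since the diff view does not show the file names. Edges are `(source, target, relation)` triples copied from the two files.

```python
ORIGINAL = {
    ("generate_patch", "run_tests", "PREREQUISITE"),
    ("run_tests", "test_failure", "CAUSAL"),
    ("run_tests", "security_scan_failed", "DATA_FLOW"),
    ("test_failure", "rollback", "RECOVERY"),
    ("test_failure", "escalate_to_human", "RECOVERY"),
    ("security_scan_failed", "deploy_blocked", "CAUSAL"),
    ("rollback", "human_review", "TEMPORAL"),
    ("escalate_to_human", "human_review", "TEMPORAL"),
    ("human_review", "merge", "PREREQUISITE"),
    ("run_tests", "merge", "PREREQUISITE"),
    ("deploy_blocked", "merge", "BLOCKER"),
}

RECONSTRUCTED = {
    ("generate_patch", "run_tests", "PREREQUISITE"),
    ("run_tests", "test_failure", "CAUSAL"),
    ("run_tests", "security_scan_failed", "DATA_FLOW"),
    ("security_scan_failed", "deploy_blocked", "CAUSAL"),
    ("rollback", "human_review", "TEMPORAL"),
    ("escalate_to_human", "human_review", "TEMPORAL"),
    ("human_review", "merge", "PREREQUISITE"),
    ("run_tests", "merge", "PREREQUISITE"),
    ("deploy_blocked", "merge", "BLOCKER"),
    ("run_tests", "rollback", "TEMPORAL"),
    ("run_tests", "escalate_to_human", "TEMPORAL"),
}

# Set differences give the edges lost and gained during reconstruction.
removed = sorted(ORIGINAL - RECONSTRUCTED)
added = sorted(RECONSTRUCTED - ORIGINAL)

print("removed:", removed)
print("added:", added)
```

The diff shows the two `RECOVERY` edges out of `test_failure` being replaced by two `TEMPORAL` edges out of `run_tests`. The recovery nodes therefore stay connected to the graph but are no longer reachable from the failure they are supposed to recover from.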