Harden deterministic operational failure taxonomy registration and coverage #141
Code Review
This pull request introduces a deterministic operational failure taxonomy for replay admissibility validation, consisting of a documentation guide, a Python registry of failure labels, and a test suite to enforce field requirements and naming conventions. Feedback focuses on resolving redundancies between similar labels (such as evidence and recovery path losses), ensuring recursive artifact discovery in tests for full coverage, and extending the validation of banned fuzzy terms to the operational definitions to maintain deterministic semantics.
        "confusion",
    )

    FAILURE_TAXONOMY: Final[dict[str, dict[str, str]]] = {
The taxonomy contains several redundant labels that overlap in operational meaning and trigger conditions. For example, EVIDENCE_SURVIVAL_LOSS (line 52) vs EVIDENCE_LOSS (line 150), and RECOVERY_PATH_LOSS (line 24) vs RECOVERY_PATH_INVALID (line 73). To maintain a truly canonical and deterministic registry, these should be unified or the legacy versions should be explicitly marked as deprecated/aliases to avoid confusion for future fixture development.
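One way to act on this comment is an explicit alias table, so legacy labels resolve to a single canonical entry instead of living in the registry as separate definitions. The following is only a sketch: `LABEL_ALIASES`, `resolve_label`, and `dangling_aliases` are hypothetical names, and the direction of each mapping (which label is legacy and which is canonical) is an assumption based on the labels the PR description lists as preferred.

```python
from typing import Final

# Hypothetical alias table: legacy label -> canonical label.
# Which side is "legacy" is an assumption; the review comment does not say.
LABEL_ALIASES: Final[dict[str, str]] = {
    "EVIDENCE_LOSS": "EVIDENCE_SURVIVAL_LOSS",
    "RECOVERY_PATH_INVALID": "RECOVERY_PATH_LOSS",
}


def resolve_label(label: str) -> str:
    """Return the canonical label, or the label itself if it has no alias."""
    return LABEL_ALIASES.get(label, label)


def dangling_aliases(registered: set[str]) -> set[str]:
    """Return legacy labels whose canonical target is not registered."""
    return {
        legacy
        for legacy, canonical in LABEL_ALIASES.items()
        if canonical not in registered
    }
```

A test could then assert that `dangling_aliases(set(FAILURE_TAXONOMY))` is empty, which keeps every alias pointing at a registered label.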
    def _collect_artifact_failure_labels() -> set[str]:
        labels: set[str] = set()
        for path in sorted((ROOT / "artifacts").glob("*.json")):
The current glob pattern only searches the top-level artifacts directory. Since fixtures are searched recursively (line 18), artifacts should likely be searched recursively as well to ensure full coverage of all generated labels.
Suggested change
    -    for path in sorted((ROOT / "artifacts").glob("*.json")):
    +    for path in sorted((ROOT / "artifacts").glob("**/*.json")):
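The difference between the two patterns can be checked in isolation. This minimal sketch builds a throwaway directory tree (the file names `top.json` and `nested/deep.json` are made up for illustration) and compares what `Path.glob` returns for each pattern; `collect_json_names` is a hypothetical helper, not part of the PR.

```python
import tempfile
from pathlib import Path


def collect_json_names(root: Path, pattern: str) -> list[str]:
    """Return sorted POSIX-style paths, relative to root, that match pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    artifacts = Path(tmp) / "artifacts"
    (artifacts / "nested").mkdir(parents=True)
    (artifacts / "top.json").write_text("{}")
    (artifacts / "nested" / "deep.json").write_text("{}")

    # "*.json" only sees files directly inside artifacts/.
    top_level = collect_json_names(artifacts, "*.json")
    # "**/*.json" also descends into subdirectories.
    recursive = collect_json_names(artifacts, "**/*.json")
```

Here `top_level` contains only `top.json`, while `recursive` also picks up `nested/deep.json`. `Path.rglob("*.json")` would be an equivalent spelling of the recursive search.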
    def test_registered_labels_do_not_use_banned_fuzzy_terms() -> None:
        for label in FAILURE_TAXONOMY:
            normalized = label.lower()
            for banned in BANNED_FUZZY_TERMS:
                assert banned not in normalized, f"label '{label}' contains banned fuzzy term '{banned}'"
This test only validates the label names. To fully enforce the project's goal of deterministic operational semantics, the operational_meaning field should also be checked for banned fuzzy terms to prevent non-deterministic language from creeping into the taxonomy definitions.
Suggested change
    -def test_registered_labels_do_not_use_banned_fuzzy_terms() -> None:
    -    for label in FAILURE_TAXONOMY:
    -        normalized = label.lower()
    -        for banned in BANNED_FUZZY_TERMS:
    -            assert banned not in normalized, f"label '{label}' contains banned fuzzy term '{banned}'"
    +def test_registered_labels_do_not_use_banned_fuzzy_terms() -> None:
    +    for label, spec in FAILURE_TAXONOMY.items():
    +        check_texts = [label.lower(), spec.get("operational_meaning", "").lower()]
    +        for banned in BANNED_FUZZY_TERMS:
    +            for text in check_texts:
    +                assert banned not in text, f"label '{label}' contains banned fuzzy term '{banned}'"
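In the suggested test, the assertion message always says the label contains the banned term, even when the match was found in `operational_meaning`. A possible refinement, sketched here under stated assumptions, collects every hit together with the place it was found. The helper `find_banned_terms` and the sample entries `EXAMPLE_OK` and `EXAMPLE_FUZZY` are hypothetical; only the term `"confusion"` and the field name come from the PR.

```python
from typing import Final

# Illustrative stand-ins; the real values live in
# src/validation/failure_taxonomy.py and are not reproduced here.
BANNED_FUZZY_TERMS: Final[tuple[str, ...]] = ("confusion",)

SAMPLE_TAXONOMY: Final[dict[str, dict[str, str]]] = {
    "EXAMPLE_OK": {"operational_meaning": "Step B ran before step A completed."},
    "EXAMPLE_FUZZY": {"operational_meaning": "Confusion about which step applies."},
}


def find_banned_terms(
    taxonomy: dict[str, dict[str, str]],
    banned: tuple[str, ...] = BANNED_FUZZY_TERMS,
    fields: tuple[str, ...] = ("operational_meaning",),
) -> list[tuple[str, str, str]]:
    """Return (label, location, term) for every banned term found.

    location is "label" for the label name itself, otherwise the field name.
    """
    hits: list[tuple[str, str, str]] = []
    for label, spec in taxonomy.items():
        texts = {"label": label, **{name: spec.get(name, "") for name in fields}}
        for location, text in texts.items():
            lowered = text.lower()
            for term in banned:
                if term in lowered:
                    hits.append((label, location, term))
    return hits
```

A test could assert that `find_banned_terms(FAILURE_TAXONOMY)` returns an empty list, so a failure reports all offending fields at once rather than stopping at the first.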
Motivation
Description
- src/validation/failure_taxonomy.py that enumerates labels with operational_meaning, observable_trigger, contract_or_invariant_type, severity_class, and non_goal, plus a BANNED_FUZZY_TERMS guard.
- docs/failure_taxonomy.md describing the canonical source, required fields per label, non-goals, and preferred hardened labels (including TOOL_ORDER_VIOLATION, RECOVERY_PATH_LOSS, BLOCKER_DETACHMENT, GOVERNANCE_DRIFT, DEPENDENCY_CHAIN_BREAK, EVIDENCE_SURVIVAL_LOSS, and HIGH_CRITICAL_EVIDENCE_LOSS).
- tests/test_failure_taxonomy.py that assert fixture and artifact-emitted labels are registered, that every registered label includes all required operational fields, and that banned fuzzy terms are not present.
- src/validation/failure_taxonomy.py, docs/failure_taxonomy.md, tests/test_failure_taxonomy.py.
- src/validation/failure_taxonomy.py first when introducing new failure labels, and extend the registry and tests in the same PR as any fixture or artifact changes.
Testing
- pytest tests/test_failure_taxonomy.py -q which passed (4 tests, all OK).
- pytest tests/test_manifest_fixture_families.py -q which passed (3 tests, all OK).
- pytest tests/test_multi_family_admissibility_artifact.py -q which passed (6 tests, all OK).
- npm run check which runs layout/typecheck/validate/build and the full pytest suite; the full test run succeeded with 222 passed and npm run check completed without errors.
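The two registry checks named in the Description (every registered label carries all required fields, and every emitted label is registered) can be sketched as small pure functions. This is an illustration only, not the code in tests/test_failure_taxonomy.py: `missing_fields` and `unregistered_labels` are hypothetical helper names, and treating a blank string as missing is an assumption. The field names are the ones listed in the Description.

```python
from typing import Final

# Field names as listed in the PR description.
REQUIRED_FIELDS: Final[tuple[str, ...]] = (
    "operational_meaning",
    "observable_trigger",
    "contract_or_invariant_type",
    "severity_class",
    "non_goal",
)


def missing_fields(spec: dict[str, str]) -> list[str]:
    """Return required fields that are absent or blank (assumption: blank counts as missing)."""
    return [name for name in REQUIRED_FIELDS if not spec.get(name, "").strip()]


def unregistered_labels(
    emitted: set[str], registry: dict[str, dict[str, str]]
) -> set[str]:
    """Return labels emitted by fixtures or artifacts that the registry lacks."""
    return emitted - set(registry)
```

Returning the offending names, rather than a boolean, lets a failing assertion show exactly which fields or labels need attention.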