Add replay semantic integrity artifact #148

Merged: ProfRandom92 merged 3 commits into main from codex/add-deterministic-replay-semantic-integrity-artifact on May 20, 2026

@ProfRandom92 (Owner)

Summary:
Add a deterministic replay semantic integrity artifact and generator that summarizes operational commitment-class survival across all manifest-registered fixture families, relying only on manifest inputs, existing admissibility artifacts, and registered failure labels.

Changed files:

  • scripts/generate_replay_semantic_integrity_artifact.py
  • artifacts/replay_semantic_integrity_results.json
  • tests/test_replay_semantic_integrity_artifact.py
  • package.json (added generate:replay-semantic-integrity script)

Testing:

  • Ran targeted pytest: pytest tests/test_replay_semantic_integrity_artifact.py -q (7 passed).
  • Verified related artifact tests: pytest tests/test_mcp_trace_replay_artifact.py -q (5 passed) and pytest tests/test_multi_family_admissibility_artifact.py -q (6 passed).
  • Verified manifest and taxonomy tests: pytest tests/test_fixture_manifest.py -q (8 passed) and pytest tests/test_failure_taxonomy.py -q (4 passed).
  • Ran full validation with npm run check, which executed the project test suite and build; all tests passed (234 passed in 50.74s).

Risks:

  • Commitment-class mapping is a small deterministic keyword/layer mapping and may classify generic or ambiguous contract IDs into the fallback constraints class, which is test-covered but may require fine-tuning later.

Next:

  • Open PR (branch feature/replay-semantic-integrity-artifact) and monitor CI artifact drift/reproducibility gates to ensure the committed artifact remains stable in remote CI.
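
As a rough illustration of what a drift/reproducibility gate might check, the sketch below re-renders a payload canonically and compares it with the committed file. The helper names, the JSON formatting options, and the sample payload are assumptions; the project's actual CI gate is not shown in this PR.

```python
import json
import tempfile
from pathlib import Path


def render_artifact(payload: dict) -> str:
    """Serialize with sorted keys and a trailing newline so output is stable."""
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


def artifact_has_drifted(committed_path: Path, payload: dict) -> bool:
    """True when the committed file differs from a fresh render of payload."""
    committed = committed_path.read_text(encoding="utf-8")
    return committed != render_artifact(payload)


# Usage with a throwaway file standing in for the committed artifact.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artifact.json"
    path.write_text(render_artifact({"b": 1, "a": 2}), encoding="utf-8")
    unchanged = artifact_has_drifted(path, {"a": 2, "b": 1})  # same content
    changed = artifact_has_drifted(path, {"a": 3, "b": 1})    # value differs
```

A string comparison is stricter than comparing parsed JSON: it also flags whitespace or key-order changes, which is usually what a reproducibility gate wants.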

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a new artifact generation pipeline for replay semantic integrity results, consisting of a generation script, a new JSON artifact, and a comprehensive test suite. The code reviewer identified opportunities to modernize and optimize the Python script by removing redundant OrderedDict usage and streamlining the data aggregation logic for better performance.

from __future__ import annotations

import json
from collections import OrderedDict

medium

The OrderedDict import is unnecessary as standard dictionaries in Python 3.7+ preserve insertion order. Using regular dictionaries simplifies the code and reduces overhead.
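
A small check of this claim, using made-up keys: a plain dict and an OrderedDict built in the same order iterate and serialize identically.

```python
import json
from collections import OrderedDict

# Hypothetical keys, deliberately not in alphabetical order.
order = ("b", "a", "c")

plain = {key: 0 for key in order}
ordered = OrderedDict((key, 0) for key in order)

same_iteration = list(plain) == list(ordered)
plain_json = json.dumps(plain)
ordered_json = json.dumps(ordered)
```

One behavioral difference remains: two OrderedDict instances compare equal only if their order matches, while plain dicts ignore order in equality. That does not matter when the mapping is only serialized.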

Comment on lines +87 to +113
    commitment_classes: OrderedDict[str, dict[str, object]] = OrderedDict()
    for commitment_class in COMMITMENT_CLASS_ORDER:
        commitment_classes[commitment_class] = {
            "passed": 0,
            "failed": 0,
            "failure_labels": set(),
        }

    for point in points:
        failed_contracts = set(point["failed_contracts"])
        for contract_id in point["passed_contracts"] + point["failed_contracts"]:
            commitment_class = _class_for_contract(contract_id)
            if contract_id in failed_contracts:
                commitment_classes[commitment_class]["failed"] += 1
                for failure_label in point["failure_labels"]:
                    commitment_classes[commitment_class]["failure_labels"].add(failure_label)
            else:
                commitment_classes[commitment_class]["passed"] += 1

    serializable_classes: OrderedDict[str, dict[str, object]] = OrderedDict()
    for commitment_class in COMMITMENT_CLASS_ORDER:
        values = commitment_classes[commitment_class]
        serializable_classes[commitment_class] = {
            "passed": values["passed"],
            "failed": values["failed"],
            "failure_labels": sorted(values["failure_labels"]),
        }

medium

The aggregation logic can be optimized by avoiding list concatenation and using set.update() for failure labels. Additionally, OrderedDict is redundant here as standard dictionaries maintain insertion order.

        commitment_classes = {
            cls: {"passed": 0, "failed": 0, "failure_labels": set()}
            for cls in COMMITMENT_CLASS_ORDER
        }

        for point in points:
            for contract_id in point["passed_contracts"]:
                commitment_classes[_class_for_contract(contract_id)]["passed"] += 1
            for contract_id in point["failed_contracts"]:
                cls = _class_for_contract(contract_id)
                commitment_classes[cls]["failed"] += 1
                commitment_classes[cls]["failure_labels"].update(point["failure_labels"])

        serializable_classes = {}
        for commitment_class in COMMITMENT_CLASS_ORDER:
            values = commitment_classes[commitment_class]
            serializable_classes[commitment_class] = {
                "passed": values["passed"],
                "failed": values["failed"],
                "failure_labels": sorted(values["failure_labels"]),
            }
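
The suggested aggregation can be exercised end to end with a self-contained sketch. The class names, the stand-in _class_for_contract, and the sample points are hypothetical; only the loop structure follows the suggestion.

```python
# Hypothetical stand-ins so the snippet runs on its own.
COMMITMENT_CLASS_ORDER = ("identity", "constraints")


def _class_for_contract(contract_id: str) -> str:
    return "identity" if "identity" in contract_id else "constraints"


def aggregate(points: list[dict]) -> dict[str, dict[str, object]]:
    """Count passed/failed contracts per class and collect failure labels."""
    classes = {
        cls: {"passed": 0, "failed": 0, "failure_labels": set()}
        for cls in COMMITMENT_CLASS_ORDER
    }
    for point in points:
        for contract_id in point["passed_contracts"]:
            classes[_class_for_contract(contract_id)]["passed"] += 1
        for contract_id in point["failed_contracts"]:
            cls = _class_for_contract(contract_id)
            classes[cls]["failed"] += 1
            # set.update adds every label of the point in one call.
            classes[cls]["failure_labels"].update(point["failure_labels"])
    # Sort labels so the serialized output does not depend on set ordering.
    return {
        cls: {
            "passed": classes[cls]["passed"],
            "failed": classes[cls]["failed"],
            "failure_labels": sorted(classes[cls]["failure_labels"]),
        }
        for cls in COMMITMENT_CLASS_ORDER
    }


points = [
    {
        "passed_contracts": ["identity_kept"],
        "failed_contracts": ["budget_limit"],
        "failure_labels": ["drift", "budget"],
    },
    {
        "passed_contracts": ["budget_limit", "identity_kept"],
        "failed_contracts": [],
        "failure_labels": [],
    },
]
result = aggregate(points)
```

Note one difference from the original loop: if a contract ID appeared in both passed_contracts and failed_contracts, the original would count it as failed twice, while this version counts one pass and one failure. Whether that case can occur depends on the admissibility artifacts, which this PR does not show.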

@ProfRandom92 ProfRandom92 merged commit b750b2f into main May 20, 2026
4 checks passed