Add replay semantic integrity artifact #148
Merged
ProfRandom92 merged 3 commits on May 20, 2026
Contributor
Code Review
This pull request introduces a new artifact generation pipeline for replay semantic integrity results, consisting of a generation script, a new JSON artifact, and a comprehensive test suite. The code reviewer identified opportunities to modernize and optimize the Python script by removing redundant OrderedDict usage and streamlining the data aggregation logic for better performance.
```python
from __future__ import annotations

import json
from collections import OrderedDict
```
Contributor
Comment on lines +87 to +113
```python
commitment_classes: OrderedDict[str, dict[str, object]] = OrderedDict()
for commitment_class in COMMITMENT_CLASS_ORDER:
    commitment_classes[commitment_class] = {
        "passed": 0,
        "failed": 0,
        "failure_labels": set(),
    }

for point in points:
    failed_contracts = set(point["failed_contracts"])
    for contract_id in point["passed_contracts"] + point["failed_contracts"]:
        commitment_class = _class_for_contract(contract_id)
        if contract_id in failed_contracts:
            commitment_classes[commitment_class]["failed"] += 1
            for failure_label in point["failure_labels"]:
                commitment_classes[commitment_class]["failure_labels"].add(failure_label)
        else:
            commitment_classes[commitment_class]["passed"] += 1

serializable_classes: OrderedDict[str, dict[str, object]] = OrderedDict()
for commitment_class in COMMITMENT_CLASS_ORDER:
    values = commitment_classes[commitment_class]
    serializable_classes[commitment_class] = {
        "passed": values["passed"],
        "failed": values["failed"],
        "failure_labels": sorted(values["failure_labels"]),
    }
```
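The aggregation loop can avoid building a new concatenated list (`point["passed_contracts"] + point["failed_contracts"]`) for every point, and it can add failure labels with `set.update()` instead of a per-label loop. `OrderedDict` is also redundant here, because the built-in `dict` preserves insertion order (guaranteed since Python 3.7). If nothing else in the module uses it, the `from collections import OrderedDict` import can then be dropped as well.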
```python
# A plain dict keeps insertion order, so COMMITMENT_CLASS_ORDER is preserved.
commitment_classes = {
    cls: {"passed": 0, "failed": 0, "failure_labels": set()}
    for cls in COMMITMENT_CLASS_ORDER
}
for point in points:
    # Walk each list directly instead of concatenating them per point.
    for contract_id in point["passed_contracts"]:
        commitment_classes[_class_for_contract(contract_id)]["passed"] += 1
    for contract_id in point["failed_contracts"]:
        cls = _class_for_contract(contract_id)
        commitment_classes[cls]["failed"] += 1
        # Add all of this point's failure labels in one call.
        commitment_classes[cls]["failure_labels"].update(point["failure_labels"])

# Emit JSON-serializable values: counts as-is, labels as a sorted list.
serializable_classes = {}
for commitment_class in COMMITMENT_CLASS_ORDER:
    values = commitment_classes[commitment_class]
    serializable_classes[commitment_class] = {
        "passed": values["passed"],
        "failed": values["failed"],
        "failure_labels": sorted(values["failure_labels"]),
    }
```
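One way to confirm that the suggested rewrite is a pure refactor is to run both loops on the same points and compare the serialized output. The sketch below does that under stated assumptions: `COMMITMENT_CLASS_ORDER`, `_class_for_contract`, the dotted contract-ID prefixes, and the sample points are hypothetical stand-ins, not the generator's real definitions. It also assumes a contract never appears in both `passed_contracts` and `failed_contracts` of the same point; if it did, the original loop would count it as failed twice, while the suggestion would count it once as passed and once as failed.

```python
import json
from collections import OrderedDict

# Hypothetical stand-ins for the generator's constant and helper.
COMMITMENT_CLASS_ORDER = ("ordering", "constraints", "outputs")


def _class_for_contract(contract_id):
    # Hypothetical rule: the commitment class is the prefix before the first ".".
    return contract_id.split(".", 1)[0]


def _serialize(classes):
    # Shared by both versions: emit classes in COMMITMENT_CLASS_ORDER with sorted labels.
    return {
        c: {
            "passed": classes[c]["passed"],
            "failed": classes[c]["failed"],
            "failure_labels": sorted(classes[c]["failure_labels"]),
        }
        for c in COMMITMENT_CLASS_ORDER
    }


def aggregate_original(points):
    # The loop under review: one concatenated list per point, labels added one by one.
    classes = OrderedDict(
        (c, {"passed": 0, "failed": 0, "failure_labels": set()}) for c in COMMITMENT_CLASS_ORDER
    )
    for point in points:
        failed_contracts = set(point["failed_contracts"])
        for contract_id in point["passed_contracts"] + point["failed_contracts"]:
            cls = _class_for_contract(contract_id)
            if contract_id in failed_contracts:
                classes[cls]["failed"] += 1
                for label in point["failure_labels"]:
                    classes[cls]["failure_labels"].add(label)
            else:
                classes[cls]["passed"] += 1
    return _serialize(classes)


def aggregate_suggested(points):
    # The suggested loop: plain dict, two direct passes, set.update() for labels.
    classes = {c: {"passed": 0, "failed": 0, "failure_labels": set()} for c in COMMITMENT_CLASS_ORDER}
    for point in points:
        for contract_id in point["passed_contracts"]:
            classes[_class_for_contract(contract_id)]["passed"] += 1
        for contract_id in point["failed_contracts"]:
            cls = _class_for_contract(contract_id)
            classes[cls]["failed"] += 1
            classes[cls]["failure_labels"].update(point["failure_labels"])
    return _serialize(classes)


# Hypothetical sample points; passed and failed lists are disjoint within each point.
POINTS = [
    {"passed_contracts": ["ordering.a", "outputs.b"], "failed_contracts": ["constraints.c"],
     "failure_labels": ["label_y", "label_x"]},
    {"passed_contracts": ["constraints.c"], "failed_contracts": ["outputs.b", "ordering.a"],
     "failure_labels": ["label_z"]},
]

original = json.dumps(aggregate_original(POINTS), indent=2)
suggested = json.dumps(aggregate_suggested(POINTS), indent=2)
assert original == suggested
```

Comparing the `json.dumps` text rather than the dicts also checks key order, which is what ends up in the committed artifact.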
Summary:
Add a deterministic replay semantic integrity artifact and generator that summarizes operational commitment-class survival across all manifest-registered fixture families, relying only on manifest inputs, existing admissibility artifacts, and registered failure labels.
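Because the artifact is meant to rely only on registered failure labels, a generator could fail fast when a point carries a label outside the registry, before any aggregation runs. The sketch below assumes a hypothetical `REGISTERED_FAILURE_LABELS` set, a hypothetical `UnregisteredFailureLabelError`, and points that expose a `failure_labels` list; none of these are the project's actual taxonomy or API.

```python
from __future__ import annotations

# Hypothetical registry; the real project keeps its own failure taxonomy.
REGISTERED_FAILURE_LABELS = frozenset(
    {"ordering_violation", "constraint_dropped", "output_mismatch"}
)


class UnregisteredFailureLabelError(ValueError):
    """Raised when a point carries a label that is not in the registry."""


def check_failure_labels(points: list[dict[str, list[str]]]) -> None:
    # Collect every unknown label so one error reports all of them, in sorted order.
    unknown = sorted(
        {label for point in points for label in point["failure_labels"]}
        - REGISTERED_FAILURE_LABELS
    )
    if unknown:
        raise UnregisteredFailureLabelError(f"unregistered failure labels: {unknown}")
```

Sorting the unknown labels keeps the error message stable across runs, in the same spirit as the deterministic artifact itself.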
Changed files:
- …`generate:replay-semantic-integrity` script)

Testing:
- `pytest tests/test_replay_semantic_integrity_artifact.py -q` (7 passed).
- `pytest tests/test_mcp_trace_replay_artifact.py -q` (5 passed) and `pytest tests/test_multi_family_admissibility_artifact.py -q` (6 passed).
- `pytest tests/test_fixture_manifest.py -q` (8 passed) and `pytest tests/test_failure_taxonomy.py -q` (4 passed).
- `npm run check`, which ran the project test suite and build; all tests passed (234 passed in 50.74s).

Risks:
- …`constraints` class, which is test-covered but may require fine-tuning later.

Next:
- …`feature/replay-semantic-integrity-artifact`) and monitor the CI artifact drift/reproducibility gates to ensure the committed artifact remains stable in remote CI.

Codex Task
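A drift/reproducibility gate for a committed artifact commonly regenerates the artifact text in memory and compares it byte-for-byte with the file in the repository. The sketch below assumes the artifact is written with `json.dumps(payload, indent=2)` plus a trailing newline; `render_artifact` and `artifact_is_current` are hypothetical names, not the repository's generator. It deliberately does not pass `sort_keys=True`, because that would override the `COMMITMENT_CLASS_ORDER` insertion order.

```python
from __future__ import annotations

import json
from pathlib import Path


def render_artifact(payload: dict) -> str:
    # Deterministic text: fixed indent, insertion-ordered keys, ASCII-only, trailing newline.
    return json.dumps(payload, indent=2, ensure_ascii=True) + "\n"


def artifact_is_current(committed_path: Path, payload: dict) -> bool:
    # Drift gate: the committed file must equal what the generator would write now.
    if not committed_path.exists():
        return False
    return committed_path.read_text(encoding="utf-8") == render_artifact(payload)
```

In CI, a failing `artifact_is_current` check would mean the generator output changed without the artifact being regenerated and recommitted.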