Add deterministic graph diff artifact#151
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a graph difference artifact generator script, its corresponding output artifact, and a new test suite to ensure the stability and determinism of the generated results. The generator extracts and compares graph edges from technical log payloads across various fixtures. Review feedback highlighted an opportunity to improve the efficiency of the edge extraction process by reducing redundant traversals and suggested filtering out self-loops to prevent potential errors during edge normalization. Additionally, a minor update to a type hint was recommended to accurately reflect that JSON files may contain lists at the root level.
| def _extract_edges_from_payloads(payloads: list[dict[str, Any]]) -> dict[str, tuple[tuple[str, str], ...]]: | ||
| extracted: dict[str, tuple[tuple[str, str], ...]] = {} | ||
| for relation_key in SUPPORTED_RELATION_KEYS: | ||
| edges: list[tuple[str, str]] = [] | ||
| for payload in payloads: | ||
| for relation_value in _collect_relation_values(payload, relation_key): | ||
| edges.extend(_extract_edges_from_relation_value(relation_value)) | ||
| if relation_key == "dependencies": | ||
| for payload in payloads: | ||
| edges.extend(_collect_dependency_graph_edges(payload)) | ||
| extracted[relation_key] = normalize_edges(edges) | ||
| return extracted |
There was a problem hiding this comment.
The current implementation of _extract_edges_from_payloads is inefficient as it performs a full recursive traversal of the payload tree for every key in SUPPORTED_RELATION_KEYS. For large payloads or many relation keys, this results in significant redundant processing. A single-pass traversal that collects all relevant keys would be more efficient.
Additionally, normalize_edges (from topology.py) raises a ValueError if it encounters a self-loop. To make the generator more robust against unexpected data in technical logs (e.g., a state transitioning to itself), consider filtering out self-loops before normalization.
| def _extract_edges_from_payloads(payloads: list[dict[str, Any]]) -> dict[str, tuple[tuple[str, str], ...]]: | |
| extracted: dict[str, tuple[tuple[str, str], ...]] = {} | |
| for relation_key in SUPPORTED_RELATION_KEYS: | |
| edges: list[tuple[str, str]] = [] | |
| for payload in payloads: | |
| for relation_value in _collect_relation_values(payload, relation_key): | |
| edges.extend(_extract_edges_from_relation_value(relation_value)) | |
| if relation_key == "dependencies": | |
| for payload in payloads: | |
| edges.extend(_collect_dependency_graph_edges(payload)) | |
| extracted[relation_key] = normalize_edges(edges) | |
| return extracted | |
| def _extract_edges_from_payloads(payloads: list[Any]) -> dict[str, tuple[tuple[str, str], ...]]: | |
| extracted: dict[str, tuple[tuple[str, str], ...]] = {} | |
| for relation_key in SUPPORTED_RELATION_KEYS: | |
| edges: list[tuple[str, str]] = [] | |
| for payload in payloads: | |
| for relation_value in _collect_relation_values(payload, relation_key): | |
| edges.extend(_extract_edges_from_relation_value(relation_value)) | |
| if relation_key == "dependencies": | |
| for payload in payloads: | |
| edges.extend(_collect_dependency_graph_edges(payload)) | |
| # Filter self-loops to prevent normalize_edges from raising ValueError | |
| extracted[relation_key] = normalize_edges(e for e in edges if e[0] != e[1]) | |
| return extracted |
| ) | ||
|
|
||
|
|
||
| def _load_json(path: Path) -> dict[str, Any]: |
There was a problem hiding this comment.
The type hint for _load_json specifies dict[str, Any], but JSON files can also contain lists at the root level. Since the downstream functions _extract_edges_from_payloads and _collect_relation_values are designed to handle both dictionaries and lists, the return type should be broadened to Any to accurately reflect the possible structures and avoid potential type-checking issues.
| def _load_json(path: Path) -> dict[str, Any]: | |
| def _load_json(path: Path) -> Any: |
Summary:
artifacts/graph_diff_results.json).Changed files:
scripts/generate_graph_diff_artifact.py(new) — generator that discovers fixture payloads, conservatively extracts edges (including explicitdependency_graph.edgeswhen present), normalizes via graph-core helpers, compares usingcompare_edges, and writes deterministic JSON.artifacts/graph_diff_results.json(new) — committed deterministic artifact with per-family and per-fixture edge-diff evidence and a stable top-level schema.tests/test_graph_diff_artifact.py(new) — tests for artifact existence, reproducibility, schema stability, manifest alignment, deterministic/sanitized output, evidence presence, and failure-label scoping.package.json(modified) — addgenerate:graph-diffscript to match existing generator conventions.Testing:
python scripts/generate_graph_diff_artifact.pywhich wroteartifacts/graph_diff_results.json.pytest tests/test_graph_diff_artifact.py -q(passed).pytest tests/test_replay_graph_core.py -q,pytest tests/test_fixture_manifest.py -q, andpytest tests/test_failure_taxonomy.py -q(all passed).npm run generate:graph-diff(invoked the same Python generator, passed).npm run check(includes layout/typecheck/validate/build/test) and observed the test suite passing locally (full test run completed successfully).Scope note:
Risks:
edgeswhen present), so any relations expressed only in prose or implicit forms will be excluded by design.Next:
Codex Task