Skip to content

Add deterministic graph diff artifact#151

Merged
ProfRandom92 merged 1 commit into
mainfrom
codex/add-deterministic-graph-diff-artifact
May 20, 2026
Merged

Add deterministic graph diff artifact#151
ProfRandom92 merged 1 commit into
mainfrom
codex/add-deterministic-graph-diff-artifact

Conversation

@ProfRandom92
Copy link
Copy Markdown
Owner

Summary:

  • Add a deterministic graph-diff artifact generator that uses the replay graph core to extract and compare explicit structured relation edges from manifest-registered fixtures and emits a stable JSON artifact (artifacts/graph_diff_results.json).

Changed files:

  • scripts/generate_graph_diff_artifact.py (new) — generator that discovers fixture payloads, conservatively extracts edges (including explicit dependency_graph.edges when present), normalizes via graph-core helpers, compares using compare_edges, and writes deterministic JSON.
  • artifacts/graph_diff_results.json (new) — committed deterministic artifact with per-family and per-fixture edge-diff evidence and a stable top-level schema.
  • tests/test_graph_diff_artifact.py (new) — tests for artifact existence, reproducibility, schema stability, manifest alignment, deterministic/sanitized output, evidence presence, and failure-label scoping.
  • package.json (modified) — add generate:graph-diff script to match existing generator conventions.

Testing:

  • Ran generator locally: python scripts/generate_graph_diff_artifact.py which wrote artifacts/graph_diff_results.json.
  • Ran targeted pytest: pytest tests/test_graph_diff_artifact.py -q (passed).
  • Ran related unit suites: pytest tests/test_replay_graph_core.py -q, pytest tests/test_fixture_manifest.py -q, and pytest tests/test_failure_taxonomy.py -q (all passed).
  • Ran npm generator script: npm run generate:graph-diff (invoked the same Python generator, passed).
  • Ran full repo checks: npm run check (includes layout/typecheck/validate/build/test) and observed the test suite passing locally (full test run completed successfully).

Scope note:

  • Deterministic graph-diff artifact only; no fixture payload changes, no README changes, no workflow changes, and no runtime/orchestration/LLM/embedding/vector/fuzzy behavior introduced.

Risks:

  • Edge extraction is intentionally conservative and only uses explicit structured relation fields (and explicit dependency graph edges when present), so any relations expressed only in prose or implicit forms will be excluded by design.

Next:

  • Open the PR (prefer Draft per repo guidance) so CI/Actions can run on the hosted platform and reviewers can inspect the generator, artifact, and tests; address any CI review comments if Actions surface environment-specific issues.

Codex Task

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a graph difference artifact generator script, its corresponding output artifact, and a new test suite to ensure the stability and determinism of the generated results. The generator extracts and compares graph edges from technical log payloads across various fixtures. Review feedback highlighted an opportunity to improve the efficiency of the edge extraction process by reducing redundant traversals and suggested filtering out self-loops to prevent potential errors during edge normalization. Additionally, a minor update to a type hint was recommended to accurately reflect that JSON files may contain lists at the root level.

Comment on lines +107 to +118
def _extract_edges_from_payloads(payloads: list[dict[str, Any]]) -> dict[str, tuple[tuple[str, str], ...]]:
extracted: dict[str, tuple[tuple[str, str], ...]] = {}
for relation_key in SUPPORTED_RELATION_KEYS:
edges: list[tuple[str, str]] = []
for payload in payloads:
for relation_value in _collect_relation_values(payload, relation_key):
edges.extend(_extract_edges_from_relation_value(relation_value))
if relation_key == "dependencies":
for payload in payloads:
edges.extend(_collect_dependency_graph_edges(payload))
extracted[relation_key] = normalize_edges(edges)
return extracted
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of _extract_edges_from_payloads is inefficient as it performs a full recursive traversal of the payload tree for every key in SUPPORTED_RELATION_KEYS. For large payloads or many relation keys, this results in significant redundant processing. A single-pass traversal that collects all relevant keys would be more efficient.

Additionally, normalize_edges (from topology.py) raises a ValueError if it encounters a self-loop. To make the generator more robust against unexpected data in technical logs (e.g., a state transitioning to itself), consider filtering out self-loops before normalization.

Suggested change
def _extract_edges_from_payloads(payloads: list[dict[str, Any]]) -> dict[str, tuple[tuple[str, str], ...]]:
extracted: dict[str, tuple[tuple[str, str], ...]] = {}
for relation_key in SUPPORTED_RELATION_KEYS:
edges: list[tuple[str, str]] = []
for payload in payloads:
for relation_value in _collect_relation_values(payload, relation_key):
edges.extend(_extract_edges_from_relation_value(relation_value))
if relation_key == "dependencies":
for payload in payloads:
edges.extend(_collect_dependency_graph_edges(payload))
extracted[relation_key] = normalize_edges(edges)
return extracted
def _extract_edges_from_payloads(payloads: list[Any]) -> dict[str, tuple[tuple[str, str], ...]]:
extracted: dict[str, tuple[tuple[str, str], ...]] = {}
for relation_key in SUPPORTED_RELATION_KEYS:
edges: list[tuple[str, str]] = []
for payload in payloads:
for relation_value in _collect_relation_values(payload, relation_key):
edges.extend(_extract_edges_from_relation_value(relation_value))
if relation_key == "dependencies":
for payload in payloads:
edges.extend(_collect_dependency_graph_edges(payload))
# Filter self-loops to prevent normalize_edges from raising ValueError
extracted[relation_key] = normalize_edges(e for e in edges if e[0] != e[1])
return extracted

)


def _load_json(path: Path) -> dict[str, Any]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The type hint for _load_json specifies dict[str, Any], but JSON files can also contain lists at the root level. Since the downstream functions _extract_edges_from_payloads and _collect_relation_values are designed to handle both dictionaries and lists, the return type should be broadened to Any to accurately reflect the possible structures and avoid potential type-checking issues.

Suggested change
def _load_json(path: Path) -> dict[str, Any]:
def _load_json(path: Path) -> Any:

@ProfRandom92 ProfRandom92 merged commit 16da165 into main May 20, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant