Add MCP trace replay artifact #147
Code Review
This pull request introduces a new artifact generation system for MCP trace replays. It includes a JSON artifact containing results for various degradation levels, a Python script to generate this artifact deterministically, and a comprehensive test suite to ensure schema stability and data integrity. Feedback focuses on improving the robustness of the generation script by using absolute paths anchored to the repository root for output and manifest files, ensuring the script functions correctly regardless of the execution context.
    ARTIFACT_ID = "mcp_trace_replay_results_v1"
    FAMILY = "mcp_trace_replay"
    CURVE_LEVELS = ("baseline", "mild", "moderate", "severe")
    OUTPUT_PATH = Path("artifacts/mcp_trace_replay_results.json")
The OUTPUT_PATH is currently defined as a relative path, which makes the script sensitive to the current working directory. Since REPO_ROOT is already calculated, it is better to anchor all paths to it for robustness, especially if the script is executed from a subdirectory. I also recommend defining MANIFEST_PATH here to ensure the generator uses the correct manifest file regardless of the execution context.
Suggested change:
    - OUTPUT_PATH = Path("artifacts/mcp_trace_replay_results.json")
    + OUTPUT_PATH = REPO_ROOT / "artifacts" / "mcp_trace_replay_results.json"
    + MANIFEST_PATH = REPO_ROOT / "fixtures" / "manifest.json"
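As a rough illustration of the anchoring suggested above, the sketch below derives a repository root from a script's own location and builds both paths from it. The helper `repo_root_from`, the example script location under `/repo/scripts/`, and the assumption that the script lives one directory below the root are hypothetical; the PR's actual `REPO_ROOT` computation is not shown in this review.

```python
from pathlib import Path


def repo_root_from(script_file: str, levels_up: int = 1) -> Path:
    """Return the directory `levels_up` levels above the script's own directory."""
    # parents[0] is the script's directory, parents[1] is one level above it.
    return Path(script_file).resolve().parents[levels_up]


# Hypothetical layout: <repo>/scripts/generate_mcp_trace_replay_artifact.py.
# In a real script this argument would be __file__.
example_script = "/repo/scripts/generate_mcp_trace_replay_artifact.py"

REPO_ROOT = repo_root_from(example_script)
OUTPUT_PATH = REPO_ROOT / "artifacts" / "mcp_trace_replay_results.json"
MANIFEST_PATH = REPO_ROOT / "fixtures" / "manifest.json"
```

Because both constants are built from the script's location rather than the shell's working directory, they stay the same whether the script is launched from the repository root or from a subdirectory.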
    def generate_mcp_trace_replay_artifact(output_path: Path = OUTPUT_PATH) -> Path:
        generator = DegradationCurveGenerator()
        fixtures = generator.fixtures_for_manifest_family(FAMILY, levels=CURVE_LEVELS)
To ensure the script is fully independent of the working directory, pass the absolute MANIFEST_PATH to the generator. This ensures that DegradationCurveGenerator looks for the manifest in the repository root rather than relative to the current shell location.
Suggested change:
    -     fixtures = generator.fixtures_for_manifest_family(FAMILY, levels=CURVE_LEVELS)
    +     fixtures = generator.fixtures_for_manifest_family(FAMILY, levels=CURVE_LEVELS, manifest_path=MANIFEST_PATH)
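The effect of passing an explicit absolute manifest path can be sketched without the real generator. The loader `load_manifest_families` and the `{"families": ...}` manifest shape below are hypothetical stand-ins, not the `DegradationCurveGenerator` API; the point is only that an absolute path keeps working after the working directory changes, while a relative lookup does not.

```python
import json
import os
import tempfile
from pathlib import Path


def load_manifest_families(manifest_path: Path) -> list:
    """Hypothetical loader: return the sorted family names in a manifest file."""
    with open(manifest_path, encoding="utf-8") as handle:
        return sorted(json.load(handle)["families"])


def families_from_any_cwd() -> list:
    """Write a manifest in one directory, then read it while cwd is another."""
    with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as elsewhere:
        manifest_path = Path(repo).resolve() / "fixtures" / "manifest.json"
        manifest_path.parent.mkdir(parents=True)
        manifest_path.write_text(
            json.dumps({"families": {"mcp_trace_replay": {}}}), encoding="utf-8"
        )

        previous_cwd = Path.cwd()
        os.chdir(elsewhere)
        try:
            # A relative lookup from this directory finds nothing...
            assert not Path("fixtures/manifest.json").exists()
            # ...but the absolute path still resolves to the manifest.
            return load_manifest_families(manifest_path)
        finally:
            os.chdir(previous_cwd)
```

Calling `families_from_any_cwd()` restores the original working directory before returning, so it is safe to run inside a test.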
Motivation
- `mcp_trace_replay` fixture family that proves whether MCP-style tool traces preserve tool order, validation-before-action, dependency chains, recovery paths, and capability boundaries under replay reconstruction.

Description
- `scripts/generate_mcp_trace_replay_artifact.py` that reuses `DegradationCurveGenerator` to evaluate the `mcp_trace_replay` family and emits stable, ordered fixture entries for levels `baseline`, `mild`, `moderate`, `severe`.
- `artifacts/mcp_trace_replay_results.json` with formatted decimal-string overall scores, sorted contracts/labels, and a non-LLM/non-external summary block.
- `tests/test_mcp_trace_replay_artifact.py` that assert exact match with the committed artifact, stable schema (no time/env fields), deterministic ordering, and manifest-aligned labels/admissibility.
- `generate:mcp-trace-replay` npm script to follow existing generator script conventions and keep the PR scope limited to artifact generation and tests.

Testing
- Ran `python scripts/generate_mcp_trace_replay_artifact.py` and inspected `artifacts/mcp_trace_replay_results.json`, which was produced deterministically and matches the committed file.
- Ran `pytest tests/test_mcp_trace_replay_artifact.py -q` (5 passed), `pytest tests/test_fixture_manifest.py -q` (8 passed), `pytest tests/test_multi_family_admissibility_artifact.py -q` (6 passed), and `pytest tests/test_failure_taxonomy.py -q` (4 passed), all succeeding.
- Ran `npm run check`, which executed layout/typecheck/validate/build/test and completed with the test suite passing (227 passed).
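The determinism properties listed in the description (decimal-string scores, fixed level ordering, sorted keys, no time or environment fields) could be achieved along these lines. This is a minimal sketch, not the PR's script: the field names `level`, `overall_score`, and `fixtures`, the four decimal places, the sample scores, and the `VOLATILE_KEYS` set are all assumptions made for illustration.

```python
import json
from decimal import Decimal, ROUND_HALF_UP

CURVE_LEVELS = ("baseline", "mild", "moderate", "severe")
# Hypothetical examples of fields that would make the artifact unstable.
VOLATILE_KEYS = {"generated_at", "timestamp", "hostname"}


def format_score(value: float, places: int = 4) -> str:
    """Render a score as a fixed-width decimal string (places is an assumption)."""
    quantum = Decimal(10) ** -places
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def render_artifact(scores_by_level: dict) -> str:
    """Serialize entries in CURVE_LEVELS order with sorted keys and no timestamps."""
    entries = [
        {"level": level, "overall_score": format_score(scores_by_level[level])}
        for level in CURVE_LEVELS
    ]
    payload = {"artifact_id": "mcp_trace_replay_results_v1", "fixtures": entries}
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


def find_volatile_keys(node) -> set:
    """Recursively collect any keys from VOLATILE_KEYS present in the payload."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in VOLATILE_KEYS:
                found.add(key)
            found |= find_volatile_keys(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_volatile_keys(item)
    return found
```

A test in the spirit of the ones described would render the artifact twice from differently ordered inputs, compare the strings byte for byte, and check that `find_volatile_keys` returns an empty set.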