From d4ecc869696f08fa35ebc40c8d55d0f67292c4f0 Mon Sep 17 00:00:00 2001
From: ProfRandom92 <159939812+ProfRandom92@users.noreply.github.com>
Date: Sat, 16 May 2026 18:47:11 +0000
Subject: [PATCH 1/2] docs: add paper replay state audit report

- Inspect existing paper replay infrastructure (tests, runner, fixtures, artifacts).
- Add `docs/paper_replay_state_audit.md` documenting findings.
- Link audit report in `README.md`.
- Identify bifurcation between `paper_replay_runner.py` and `KVTCV7Engine`.
- Provide recommendations for Paper Replay Benchmark v1 alignment.
---
 README.md                        |  1 +
 docs/paper_replay_state_audit.md | 57 ++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 docs/paper_replay_state_audit.md

diff --git a/README.md b/README.md
index 9e6ebcc..7b8010a 100644
--- a/README.md
+++ b/README.md
@@ -102,6 +102,7 @@ Comptextv7 turns noisy context into compact operational state, then validates wh
 ## Benchmark family
 
 ### Paper Replay Benchmark
+- **State Audit:** [`docs/paper_replay_state_audit.md`](docs/paper_replay_state_audit.md).
 
 - **Validates:** whether dense technical paper summaries preserve entities, metrics, limitations, and section structure after deterministic replay compression.
 - **Artifact:** [`artifacts/paper_replay_results.json`](artifacts/paper_replay_results.json).
diff --git a/docs/paper_replay_state_audit.md b/docs/paper_replay_state_audit.md
new file mode 100644
index 0000000..fc58ae0
--- /dev/null
+++ b/docs/paper_replay_state_audit.md
@@ -0,0 +1,57 @@
+# Paper Replay State Audit
+
+## Existing Paper Replay Files
+
+The following files constitute the current paper replay infrastructure:
+
+- **Tests & Runners:**
+  - `tests/test_paper_replay_bench.py`: Implements a benchmark using `KVTCV7Engine`.
+  - `tests/utils/paper_replay_runner.py`: Runner for the committed benchmark artifact. **Does not use `KVTCV7Engine`.**
+  - `tests/test_paper_replay_metrics.py`: Validates the schema and determinism of the runner output.
+- **Fixtures:**
+  - `tests/fixtures/papers/prefixguard_excerpt.txt`
+  - `tests/fixtures/papers/fate_excerpt.txt`
+  - `tests/fixtures/papers/self_consolidating_excerpt.txt`
+- **Artifacts:**
+  - `artifacts/paper_replay_results.json`: The source of truth for current benchmark metrics.
+- **Documentation:**
+  - `docs/paper_replay_benchmark.md`: Overview of the methodology.
+  - `docs/benchmarks/paper_replay.md`: Detailed methodology.
+
+## Current Validation Logic
+
+The current benchmark runner (`tests/utils/paper_replay_runner.py`) works in four stages:
+- **Extraction:** Deterministic parsing of `TITLE:` and `SECTION:` headers into a structured `OperationalState`.
+- **Compaction:** Reduction of text fields to bounded keyword lists and entity sets.
+- **Survival:** Calculation of keyword overlap (`normalized_keyword_overlap`) and entity retention rates.
+- **Consistency:** A derived score (`replay_consistency`) based on field survival thresholds.
+
+## Engine vs. Substring Checks
+
+- **KVTCV7Engine Usage:** The engine is exercised in `tests/test_paper_replay_bench.py` but is **not** used by the main runner that produces `artifacts/paper_replay_results.json`.
+- **Substring/Keyword Focus:** The main runner relies on keyword extraction and set overlap rather than the V7 engine's sliding window or compression signals.
+
+## Fixture Nature
+
+Current fixtures are **curated excerpts**. They are not raw PDFs or full-text scrapes. They are pre-formatted with specific headers (`SECTION: problem`, etc.) to facilitate deterministic extraction.
+
+## Existing Validation Commands
+
+- `python -m tests.utils.paper_replay_runner`: Regenerates the JSON artifact.
+- `pytest tests/test_paper_replay_bench.py`: Tests V7 engine integration with paper text.
+- `npm run layout`: Verifies the existence of the artifact.
+- `npm run check`: Runs all repository checks including layout and tests.
+
+## Gaps & Risks
+
+1. **Bifurcation:** The "official" benchmark results (`artifacts/paper_replay_results.json`) do not measure `KVTCV7Engine`. They measure a separate keyword-compaction heuristic.
+2. **Logic Duplication:** Extraction logic differs slightly between the test and the runner (e.g., `test_paper_replay_bench.py` uses a simpler line-based parser than the runner's utility).
+3. **Weak Validation:** Although deterministic, keyword overlap is only a weak proxy for operational-state preservation and does not reflect the V7 engine's intended use cases.
+4. **Duplicate Fixture References:** Both the test and the runner hard-code paper specs and fixture paths.
+
+## Recommendation for Paper Replay Benchmark v1
+
+1. **Converge on KVTCV7Engine:** Update the runner to use the V7 engine for the compaction step.
+2. **Unified Extraction:** Extract the extraction logic into a shared utility in `src/validation/paper.py` (or similar) to avoid duplication.
+3. **Artifact Alignment:** Ensure `artifacts/paper_replay_results.json` reflects the performance of the actual engine.
+4. **Test Consolidation:** Merge the metric validation and the bench tests into a consistent suite that guards the V7-backed runner.

From d21c5120203718c482059a042ea30d6bfe9beb5b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexander=20K=C3=B6lnberger?=
 <159939812+ProfRandom92@users.noreply.github.com>
Date: Sat, 16 May 2026 12:18:48 -0700
Subject: [PATCH 2/2] Update docs/paper_replay_state_audit.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 docs/paper_replay_state_audit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/paper_replay_state_audit.md b/docs/paper_replay_state_audit.md
index fc58ae0..64e9015 100644
--- a/docs/paper_replay_state_audit.md
+++ b/docs/paper_replay_state_audit.md
@@ -52,6 +52,6 @@ Current fixtures are **curated excerpts**. They are not raw PDFs or full-text sc
 ## Recommendation for Paper Replay Benchmark v1
 
 1. **Converge on KVTCV7Engine:** Update the runner to use the V7 engine for the compaction step.
-2. **Unified Extraction:** Extract the extraction logic into a shared utility in `src/validation/paper.py` (or similar) to avoid duplication.
+2. **Unified Extraction:** Move the extraction logic into a shared utility in `tests/utils/paper_utils.py` (or similar) to avoid duplication.
 3. **Artifact Alignment:** Ensure `artifacts/paper_replay_results.json` reflects the performance of the actual engine.
 4. **Test Consolidation:** Merge the metric validation and the bench tests into a consistent suite that guards the V7-backed runner.