ProfRandom92/Comptextv7

Comptextv7

Deterministic operational replay validation for long-horizon AI agents.

Comptextv7 is a deterministic operational replay-validation and state-survivability prototype. It tests whether compact, replay-safe operational state preserves fixture-defined evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals across compression, reconstruction, iterative replay degradation, and CI-audited summaries. It does so without LLM judges, embeddings, vector databases, graph stores, or external APIs.

See docs/research_positioning.md for conservative project positioning and scope boundaries.

External Monaco showcase repo · Benchmark explanation · Iterative replay degradation · Replay report


Thesis

Comptextv7 measures whether compressed agent/workflow state can still be replayed as usable operational state, and it shows which parts of that state break when compression becomes too aggressive. It extracts fixture-defined evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals; compacts them into replay-safe state; reconstructs them; and validates deterministic survival with committed artifacts rather than LLM judges, embeddings, vector databases, graph stores, or external APIs.

What Comptextv7 does

  • Extracts operational state from checked-in paper and agent/workflow fixtures.
  • Compacts that state into deterministic replay payloads.
  • Replays the compacted state into reconstructed operational state.
  • Validates deterministic survival of required evidence, constraints, blockers, dependencies, recovery paths, and tool-order signals.
  • Labels replay failure modes when operational state is lost or detached.
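
The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`extract_state`, `compact_state`, `replay_state`, `replay_consistency`), the truncation-based compaction, and the fixture contents are all hypothetical. Only the field families come from the list above.

```python
# Hypothetical field families, named after the signals listed above.
REQUIRED_FIELDS = ("evidence", "constraints", "blockers",
                   "dependencies", "recovery_paths", "tool_order")


def extract_state(fixture: dict) -> dict:
    """Keep only the operational fields a fixture defines."""
    return {name: list(fixture.get(name, [])) for name in REQUIRED_FIELDS}


def compact_state(state: dict, max_items: int) -> dict:
    """Toy compaction (assumption): keep at most max_items entries per field."""
    return {name: items[:max_items] for name, items in state.items()}


def replay_state(compact: dict) -> dict:
    """Toy reconstruction: the compact payload is already field-shaped."""
    return {name: list(items) for name, items in compact.items()}


def replay_consistency(expected: dict, replayed: dict) -> float:
    """Fraction of expected items that are still present after replay."""
    total = sum(len(items) for items in expected.values())
    if total == 0:
        return 1.0
    kept = sum(1 for name, items in expected.items()
               for item in items if item in replayed.get(name, []))
    return kept / total


# Illustrative fixture, not taken from the repository.
fixture = {
    "evidence": ["log line 12", "metric table 3"],
    "constraints": ["no external APIs"],
    "blockers": ["missing credentials"],
    "dependencies": ["step-2 requires step-1"],
    "recovery_paths": ["retry with cached token"],
    "tool_order": ["fetch", "parse", "validate"],
}

expected = extract_state(fixture)
loose = replay_state(compact_state(expected, max_items=3))
tight = replay_state(compact_state(expected, max_items=1))

print(replay_consistency(expected, loose))  # every item survives
print(replay_consistency(expected, tight))  # one item per field survives
```

The point of the sketch is that the score is computed from field contents alone, so the same fixture and the same compaction setting always give the same number.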

Why this matters

Summaries can sound fluent while losing blockers, constraints, evidence, dependencies, or recovery paths. Comptextv7 treats that as a measurable replay problem: if compressed state cannot be replayed into usable operational state, the validator records what failed instead of relying on subjective prose quality.

Current signal

| Signal | Current fixture-bound result |
| --- | --- |
| Agent trace replay consistency | 1.000000 |
| Paper replay consistency | 0.791667 |
| CONSERVATIVE replay consistency | 0.895833 |
| BALANCED replay consistency | 0.250000 |
| AGGRESSIVE replay consistency | 0.125000 |

BALANCED currently emits these replay failure labels in the comparative degradation fixture summary: EVIDENCE_LOSS and CONSTRAINT_DRIFT.
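
As a rough illustration of how labels like these could be derived deterministically, the sketch below compares expected and replayed state with plain set operations. The label names come from the summary above; the rules that trigger them, the `label_failures` helper, and the example data are assumptions. The repository's taxonomy in docs/operational_replay_failure_taxonomy.md may define them differently.

```python
def label_failures(expected: dict, replayed: dict) -> list[str]:
    """Return sorted failure labels for one replayed state (assumed rules)."""
    labels = []

    # Assumed rule: any expected evidence item missing after replay.
    missing_evidence = set(expected.get("evidence", [])) - set(replayed.get("evidence", []))
    if missing_evidence:
        labels.append("EVIDENCE_LOSS")

    # Assumed rule: replayed constraints differ from the expected set,
    # whether by loss, addition, or rewording.
    if set(expected.get("constraints", [])) != set(replayed.get("constraints", [])):
        labels.append("CONSTRAINT_DRIFT")

    # Sorting keeps the output stable across runs.
    return sorted(labels)


# Illustrative data, not taken from the repository fixtures.
expected_state = {"evidence": ["e1", "e2"], "constraints": ["deploy only after review"]}
replayed_state = {"evidence": ["e1"], "constraints": ["deploy after review if possible"]}

print(label_failures(expected_state, replayed_state))
print(label_failures(expected_state, expected_state))
```

Because the rules are set comparisons, a reworded constraint is flagged even when the prose still sounds plausible.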

Interpretation: the profile comparison shows monotonic degradation under increasing compression pressure. That is useful because the benchmark responds to operational-state loss; these results are internal, fixture-bound observations, not external benchmark, production-readiness, or solved-memory claims.
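
The monotonicity claim can be checked directly against the values in the table above. The snippet below is a small sketch; the helper name `is_monotonic_degradation` is hypothetical, and the numbers are copied from the table rather than read from an artifact.

```python
# Replay consistency per profile, ordered by increasing compression pressure.
# Values are copied from the "Current signal" table.
PROFILE_CONSISTENCY = [
    ("CONSERVATIVE", 0.895833),
    ("BALANCED", 0.250000),
    ("AGGRESSIVE", 0.125000),
]


def is_monotonic_degradation(rows) -> bool:
    """True when consistency never increases from one profile to the next."""
    values = [value for _, value in rows]
    return all(a >= b for a, b in zip(values, values[1:]))


# Consistency lost between adjacent profiles.
drops = [
    (left[0], right[0], round(left[1] - right[1], 6))
    for left, right in zip(PROFILE_CONSISTENCY, PROFILE_CONSISTENCY[1:])
]

print(is_monotonic_degradation(PROFILE_CONSISTENCY))
for source, target, drop in drops:
    print(f"{source} -> {target}: -{drop:.6f}")
```

The largest single drop sits between CONSERVATIVE and BALANCED, which matches BALANCED being the first profile listed with failure labels.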

Positioning boundaries

Comptextv7 is a deterministic operational replay-validation and state-survivability prototype. It is complementary to learned context-compression research, RAG evaluation, vector-memory systems, serving-layer cache optimization, and durable workflow infrastructure, but it is not a workflow orchestrator, learned compressor, vector memory system, RAG replacement, KV-cache compressor, autonomous agent framework, production telemetry system, clinical-grade system, or universal AI-memory solution.

For the concise research positioning brief, scope boundaries, and benchmark interpretation, see Research Positioning, Iterative Replay Degradation, the Benchmark Explanation, the committed iterative replay degradation summary, and scripts/validate_replay_artifact_drift.py. The visual Monaco walkthrough lives in the external ProfRandom92/comptext-v7-monaco-showcase repository.

Proof at a glance

| Evidence | Current result |
| --- | --- |
| Paper replay fixtures | 3 dense technical papers |
| Agent trace fixtures | 3 multi-step workflows |
| Paper avg compression | 1.347063 |
| Agent avg compression | 1.773954 |
| Paper replay consistency | 0.791667 |
| Agent replay consistency | 1.000000 |
| Agent operational drift | 0.000000 |
| Evaluation mode | deterministic, no LLM judging |
| Artifact format | committed JSON + CI upload |

Sources: artifacts/paper_replay_results.json and artifacts/agent_trace_replay_results.json.

How to read these values

  • Paper replay is lossy under dense technical prose. The current paper fixtures include entities, limitations, sections, and metrics that are harder to preserve after compaction.
  • Agent trace replay is currently near-lossless because traces are structured. The checked-in traces expose explicit tasks, blockers, dependencies, tool order, and recovery actions.
  • 1.000000 replay consistency does not mean solved memory. It means exact preservation under the current structured trace fixtures and current deterministic validator.
  • Operational drift is field loss, not subjective quality. A non-zero drift rate would mean replay lost required operational fields.
  • Iterative replay degradation is a bounded prototype. Repeated compact/replay cycles emit deterministic JSON and Markdown artifacts for reviewing drift curves, collapse points, and failure labels. A small fixture-bound comparison mode contrasts CONSERVATIVE, BALANCED, and AGGRESSIVE compression profiles with deterministic per-profile aggregates, and an additive sensitivity-analysis surface varies bounded replay/compression parameters without external services.
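
To make "drift is field loss" concrete, here is a minimal sketch of a drift rate computed as the share of required fields that are missing or empty after replay. The helper `operational_drift_rate`, the field names, and the example trace are assumptions for illustration; the repository's validator may count drift differently.

```python
def operational_drift_rate(required_fields: list[str], replayed: dict) -> float:
    """Share of required fields that are missing or empty after replay."""
    if not required_fields:
        return 0.0
    lost = [name for name in required_fields if not replayed.get(name)]
    return len(lost) / len(required_fields)


# Hypothetical required fields, loosely following the trace contents
# described above (tasks, blockers, dependencies, tool order, recovery).
REQUIRED = ["active_task", "blockers", "dependencies", "tool_order", "recovery_actions"]

intact = {
    "active_task": "deploy service",
    "blockers": ["missing approval"],
    "dependencies": ["build passes"],
    "tool_order": ["build", "test", "deploy"],
    "recovery_actions": ["roll back to previous tag"],
}

# Same trace with the recovery actions dropped during compaction.
degraded = {name: value for name, value in intact.items() if name != "recovery_actions"}

print(operational_drift_rate(REQUIRED, intact))    # no required field lost
print(operational_drift_rate(REQUIRED, degraded))  # one of five fields lost
```

Nothing in this calculation inspects prose quality: a replay that reads well but drops `recovery_actions` still scores non-zero drift.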

What makes this different

  • Not chat-history storage.
  • Not vector memory.
  • Not model-judged summarization.
  • Not autonomous agent orchestration.
  • Deterministic operational-state replay validation.

Architecture

flowchart LR
    A[Raw Context / Agent Trace]
    --> B[Operational State Extraction]
    B --> C[Compact Replay State]
    C --> D[Replay Reconstruction]
    D --> E[Deterministic Validation]
    E --> F[CI Artifact]

Comptextv7 turns noisy context into compact operational state, then validates whether replay reconstructs the fields needed to continue work.

Benchmark family

Paper Replay Benchmark

Agent Trace Replay Benchmark

  • Validates: whether multi-step agent workflows preserve active tasks, constraints, dependencies, tool sequences, unresolved blockers, deployment requirements, and recovery actions.
  • Artifact: artifacts/agent_trace_replay_results.json.
  • Method: docs/benchmarks/agent_trace_replay.md.
  • Current avg compression: 1.773954.
  • Current replay consistency: 1.000000.
  • Operational drift: 0.000000.
  • Interpretation: current setup is near-lossless because the fixtures are structured; this is a useful baseline, not a universal memory claim.

Multi-Family Operational Admissibility Benchmark

Iterative Replay Degradation Prototype

  • Validates: how checked-in paper and agent-trace fixtures degrade across bounded repeated compact/replay cycles.
  • Method: docs/iterative_replay_degradation.md.
  • Profile comparison: additive prototype mode compares CONSERVATIVE, BALANCED, and AGGRESSIVE compression profiles using fixture-bound aggregates only: collapse rate, replay consistency, operational drift, evidence survival, and deterministic failure labels.
  • Sensitivity analysis: additive JSON/Markdown surface varies bounded max_context_units, max_families, max_bursts, replay_window_seconds, replay_cycles, and compression_budget_scale values for fixture-bound replay degradation review.
  • Current internal baseline: see the fixture-bound comparative replay degradation results.
  • Interpretation: profile comparison rows are deterministic replay-validation observations for the current fixtures, not general memory, production, or clinical-grade claims.
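
The idea of repeated compact/replay cycles with a collapse point can be sketched as follows. Everything here is a toy model under stated assumptions: the keep ratios attached to the profile names, the truncation-based compaction, and the 0.5 collapse threshold are hypothetical and are not the repository's profile definitions.

```python
def run_cycles(items: list[str], keep_ratio: float, cycles: int) -> list[float]:
    """Return the consistency curve across repeated compact/replay cycles."""
    original_count = len(items)
    current = list(items)
    curve = []
    for _ in range(cycles):
        # Deterministic truncation stands in for compaction plus replay.
        current = current[: int(len(current) * keep_ratio)]
        curve.append(len(current) / original_count)
    return curve


def collapse_point(curve: list[float], threshold: float = 0.5):
    """First 1-based cycle whose consistency falls below the threshold."""
    for cycle, value in enumerate(curve, start=1):
        if value < threshold:
            return cycle
    return None  # no collapse within the bounded number of cycles


# Hypothetical keep ratios; the real profiles are defined in the repository.
ASSUMED_KEEP_RATIO = {"CONSERVATIVE": 1.0, "BALANCED": 0.75, "AGGRESSIVE": 0.5}

state_items = [f"item-{index}" for index in range(16)]
curves = {
    profile: run_cycles(state_items, ratio, cycles=4)
    for profile, ratio in ASSUMED_KEEP_RATIO.items()
}

for profile, curve in curves.items():
    print(profile, curve, "collapse:", collapse_point(curve))
```

Bounding the number of cycles keeps the run cheap and means "no collapse" is always qualified by the cycle limit.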

Complementary adversarial replay stress suite

This suite is a separate long-horizon stress surface under reports/replay_continuity/. It remains useful context, but the focused README narrative is the deterministic operational replay benchmark family above.

| System | Iteration 25 | Iteration 50 | Iteration 100 | Iteration 250 |
| --- | --- | --- | --- | --- |
| Naive | 0.039 | 0.039 | 0.043 | 0.039 |
| Baseline | 0.294 | 0.294 | 0.294 | 0.294 |
| Adaptive | 0.679 | 0.476 | 0.302 | 0.302 |
| Comptextv7 | 1.000 | 0.995 | 0.824 | 0.572 |

The committed 250-iteration report records Comptextv7 mean final continuity at 0.571783, rounded to 0.572 here. Detail fidelity still degrades: hidden truth survival is 0.570173, and evaluator agreement divergence is 0.421743.

| System | Approx collapse point |
| --- | --- |
| Naive | ~1 iteration |
| Baseline | ~10 iterations |
| Adaptive | ~45 iterations |
| Comptextv7 | censored at ~250 iterations in this suite |
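
One way to relate the two tables is to look for the first reported checkpoint at which continuity falls below a threshold. The sketch below uses the continuity values from the iteration table; the 0.5 threshold and the helper `first_checkpoint_below` are assumptions, since the suite's actual collapse criterion is not stated here. With only four checkpoints, the result is an upper bound on the collapse point rather than the finer estimates in the table above.

```python
CHECKPOINTS = (25, 50, 100, 250)

# Continuity per system at each checkpoint, copied from the iteration table.
CONTINUITY = {
    "Naive": (0.039, 0.039, 0.043, 0.039),
    "Baseline": (0.294, 0.294, 0.294, 0.294),
    "Adaptive": (0.679, 0.476, 0.302, 0.302),
    "Comptextv7": (1.000, 0.995, 0.824, 0.572),
}

# Assumption: the report may use a different collapse criterion.
ASSUMED_THRESHOLD = 0.5


def first_checkpoint_below(values, threshold=ASSUMED_THRESHOLD):
    """First checkpoint whose continuity is below the threshold, else None."""
    for iteration, value in zip(CHECKPOINTS, values):
        if value < threshold:
            return iteration
    return None  # censored: no collapse observed within the run


collapse_bounds = {
    system: first_checkpoint_below(values) for system, values in CONTINUITY.items()
}

for system, bound in collapse_bounds.items():
    status = f"collapsed by iteration {bound}" if bound else "censored at 250 iterations"
    print(f"{system}: {status}")
```

Under this assumed threshold, Adaptive crosses between iterations 25 and 50, and Comptextv7 stays above it through iteration 250, which is what "censored" means in the table.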

Visual artifacts

Integrity model

  • no LLM judging;
  • no embeddings;
  • no vector DBs;
  • no external APIs;
  • artifact-backed JSON + CI checks;
  • deterministic hashing foundation (docs/deterministic_hashing.md);
  • audit-friendly and CI reproducible.
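
A common way to get a deterministic hash of a JSON-like payload is to serialize it canonically (sorted keys, fixed separators) and hash the bytes. The sketch below shows that general technique with the standard library. It is not taken from docs/deterministic_hashing.md, and the helper name `canonical_hash` and the example payload are hypothetical.

```python
import hashlib
import json


def canonical_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the payload."""
    encoded = json.dumps(
        payload,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=True,       # stable byte output for non-ASCII text
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same content, different key order: the hashes should match.
first = {"replay_consistency": 1.0, "failure_labels": []}
second = {"failure_labels": [], "replay_consistency": 1.0}
changed = {"replay_consistency": 0.25, "failure_labels": ["EVIDENCE_LOSS"]}

print(canonical_hash(first) == canonical_hash(second))
print(canonical_hash(first) == canonical_hash(changed))
```

Hashing a canonical form lets CI detect artifact drift with a string comparison instead of a structural diff.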

Foundational Components

The system relies on the following deterministic foundations:

Limitations

  • Benchmark metrics are fixture-bound baselines; they do not establish correctness on real-world workloads.
  • Fixtures are curated and checked in.
  • Structured agent traces currently replay near-losslessly.
  • This is not solved AI memory.
  • This is not production telemetry.
  • This is not an autonomous agent framework.
  • Evaluator divergence remains material in the long-horizon stress suite.
  • Iterative degradation remains a bounded fixture prototype; its artifact and summary are review aids, not universal memory claims.

Next technical milestone

Next: continue tightening deterministic replay review surfaces. Keep repeated compact/replay artifacts cheap, deterministic, additive-compatible, and easy to inspect in CI and pull requests.

Validated deterministic replay review flow

Use this short flow when reviewing replay-system changes:

  1. Regenerate or inspect deterministic replay artifacts only from checked-in fixtures.
  2. Compare stable metric fields (replay_consistency, evidence survival rates, operational_drift_rate) and taxonomy fields (failure_labels, failure_mode_counts) rather than prose interpretations.
  3. For iterative degradation and sensitivity review, run python scripts/generate_iterative_replay_degradation_artifacts.py and inspect both the JSON artifact and Markdown summary.
  4. Treat additive artifact fields as forward-compatible when existing deterministic fields remain stable.
  5. Keep claims fixture-bound: no LLM judging, embeddings, external APIs, production-readiness claims, or solved-memory claims.
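
Steps 2 and 4 can be sketched as a comparison that looks only at stable fields and ignores additive ones. The field names come from step 2; the flat artifact layout, the helper `stable_field_drift`, and the example values are assumptions, since the committed artifacts may nest these fields differently.

```python
# Field names taken from step 2 of the review flow.
STABLE_FIELDS = (
    "replay_consistency",
    "operational_drift_rate",
    "failure_labels",
    "failure_mode_counts",
)


def stable_field_drift(baseline: dict, candidate: dict) -> dict:
    """Map each changed stable field to its (baseline, candidate) values."""
    drift = {}
    for name in STABLE_FIELDS:
        if baseline.get(name) != candidate.get(name):
            drift[name] = (baseline.get(name), candidate.get(name))
    return drift


# Illustrative artifacts with an assumed flat layout.
baseline = {
    "replay_consistency": 1.0,
    "operational_drift_rate": 0.0,
    "failure_labels": [],
    "failure_mode_counts": {},
}

# Additive change: a new field appears, stable fields are untouched.
additive = dict(baseline, sensitivity_rows=[{"replay_cycles": 3}])

# Regression: a stable metric and the taxonomy fields changed.
regressed = dict(
    baseline,
    replay_consistency=0.75,
    failure_labels=["EVIDENCE_LOSS"],
    failure_mode_counts={"EVIDENCE_LOSS": 1},
)

print(stable_field_drift(baseline, additive))
print(stable_field_drift(baseline, regressed))
```

Exact equality is used on purpose: the artifacts are meant to be deterministic, so any difference in a stable field is worth a reviewer's attention.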

Review surfaces

The main Comptextv7 repository is the source of truth for deterministic replay-validation evidence: artifacts, benchmarks, failure labels, degradation summaries, and conservative research positioning. The visual Monaco walkthrough now lives separately in the external showcase repository.

Main repo technical evidence

| Surface | Link |
| --- | --- |
| CI Artifact Narrative | docs/ci_artifact_narrative.md |
| Benchmark explanation | docs/BENCHMARK_EXPLANATION.md |
| Replay failure taxonomy | docs/operational_replay_failure_taxonomy.md |
| Iterative replay degradation artifact and CI summary | docs/iterative_replay_degradation.md |
| Comparative replay degradation artifact and CI summary | docs/iterative_replay_degradation.md#comparative-replay-degradation-results |
| Replay sensitivity-analysis artifact and CI summary | docs/iterative_replay_degradation.md#replay-sensitivity-analysis-surface |
| Replay report | reports/replay_continuity/validation_report.md |
| API surface | docs/API_SURFACE.md |

External Monaco showcase UI

| Surface | Link |
| --- | --- |
| Monaco showcase repository | ProfRandom92/comptext-v7-monaco-showcase |
| Legacy demo walkthrough note | docs/DEMO_WALKTHROUGH.md |
| Legacy showcase readiness note | docs/SHOWCASE_READINESS.md |

Repository map

Comptextv7/
├── artifacts/                  # committed deterministic replay benchmark JSON
├── benchmarks/                 # deterministic compression, replay, and audit runners
├── contracts/                  # machine-readable validation and handoff contracts
├── dashboard/                  # backend plus React operations console
├── docs/                       # benchmark, artifact, research, and legacy showcase notes
├── reports/replay_continuity/  # adversarial continuity metrics and SVG charts
├── scripts/                    # validation, reporting, and artifact tooling
├── showcase/app/               # legacy in-repo Vite app; Monaco UI lives in external repo
├── src/                        # KVTC engine, audit, and semantic validation modules
├── tests/                      # Python regression and replay validation tests
└── README.md

Safety boundaries

Do not commit:

  • proprietary customer data;
  • secrets, API keys, tokens, cookies, or credentials;
  • raw production logs;
  • unsanitized replay fixtures;
  • private deployment credentials or environment dumps.

Comptextv7 is a deterministic, synthetic-only research prototype for operational replay persistence and reviewable diagnostic infrastructure.

Cloud-first validation

Comptextv7 favors artifact-backed review in CI over trusting results produced on a local machine.

| Workflow | Role |
| --- | --- |
| ci.yml | Runs deterministic replay, tests, telemetry, and validation gates. |
| agent-checks.yml | Runs repository/report/contract checks plus dashboard validation. |
| validation_runner.yml | Publishes compact cloud validation result artifacts. |

Reproducibility

Install the Python test dependency set:

python -m pip install -e '.[test]'

Regenerate deterministic replay artifacts:

python tests/utils/paper_replay_runner.py
python tests/utils/agent_trace_replay_runner.py
python benchmarks/run_replay_continuity.py --iterations 250 --output-dir reports/replay_continuity
python scripts/generate_iterative_replay_degradation_artifacts.py

Use the validation commands in docs/validation.md. The root package.json is a wrapper for reviewer convenience. App dependencies remain in dashboard/app and the legacy in-repo showcase/app; the current Monaco showcase UI is maintained in ProfRandom92/comptext-v7-monaco-showcase.

Root wrapper checks:

npm run layout
npm run typecheck
npm run validate
npm run build
npm test
npm run check

Dashboard app checks:

cd dashboard/app
npm run typecheck
npm run build

Showcase app checks:

cd showcase/app
npm run typecheck
npm run validate
npm run build

Python checks from the repository root:

pytest -q
pytest tests/test_core_foundation_ts.py -q
pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py tests/test_replay_continuity.py -q

Additional repository validation helpers remain available when their surfaces are touched:

python scripts/validate.py replay
python scripts/validate.py token
python scripts/validate.py forensic
python scripts/validate_contracts.py
python scripts/validate_api_exports.py
