From 3eaa0a0d3f4f05bda103bbdf9e7a683c1e28611e Mon Sep 17 00:00:00 2001
From: Hinano Hart
Date: Wed, 10 Jun 2026 17:03:00 +0900
Subject: [PATCH] docs: clearer README + architecture diagram

---
 README.md | 93 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 71 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index db6724d..a8d80cf 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,33 @@
 [![MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](#installation)
 
+---
+
+## What is this
+
+**ExitKit** is a Python library that measures whether an LLM agent's memory remained the same after an update. You pass two [PAM-format](https://github.com/portable-ai-memory) `MemoryStore` snapshots — one captured before a fine-tune, provider migration, or personalisation pass, one captured after — and get back a single `continuity` score from `0.0` (completely different) to `1.0` (identical), plus a structured breakdown of which memories were added, removed, or silently rewritten.
+
+The metric combines two components: a **structural identity diff** (which memory objects changed by ID and content hash) and a **semantic drift** score (cosine distance between centroid embeddings). Both components are configurable, and you can plug in any embedding model you like.
+
+---
+
+## Architecture
+
+```mermaid
+flowchart TD
+    A[Before MemoryStore] --> C[continuity_score]
+    B[After MemoryStore] --> C
+    C --> D[_index: build ID to content_hash map]
+    D --> E[_identity_diff: added removed mutated]
+    C --> F[_semantic_drift: centroid cosine distance]
+    F --> G[hashing_embedder default or custom embedder]
+    E --> H[Weighted sum: drift = w_id times identity_diff + w_sem times semantic_drift]
+    F --> H
+    H --> I[DriftReport: continuity identity_diff semantic_drift added removed mutated]
+```
+
+---
+
 ## Why
 
 LLM agents accumulate memory: preferences, facts, project context. As you fine-tune, switch providers, or run a personalisation pass, the memory mutates. Did the agent stay the same agent, or did you replace it?
@@ -20,6 +47,8 @@ ExitKit ports that idea to PAM-format memory snapshots and returns:
 - the underlying `identity_diff` and `semantic_drift` components,
 - the weights used (default 0.5 / 0.5, fully configurable).
 
+---
+
 ## Installation
 
 ```bash
@@ -28,6 +57,8 @@ pip install exitkit
 
 Requires Python 3.11+.
 
+---
+
 ## Quick start
 
 ```python
@@ -56,7 +87,9 @@ print(report.added, report.removed, report.mutated)
 
 See [`examples/continuer_demo.py`](examples/continuer_demo.py) for a runnable end-to-end demo.
 
-## What the metric does
+---
+
+## How it works
 
 `continuity_score(before, after, *, identity_weight=0.5, semantic_weight=0.5, embedder=None)` returns a `DriftReport`:
 
@@ -73,9 +106,15 @@ See [`examples/continuer_demo.py`](examples/continuer_demo.py) for a runnable en
 
 Weights must each lie in `[0, 1]` and sum to 1.
 
+### Default embedder
+
+The default semantic component uses `hashing_embedder`: a pure-numpy, dependency-light, deterministic bag-of-words projection using the blake2b hashing trick (1024-dimensional, L2-normalised). No external model download required.
+
+Tokenisation is `\w+` (unicode). Content containing only punctuation, whitespace, or emoji collapses to zero tokens and contributes no semantic signal under the default embedder.
+
 ### Custom embedder
 
-The default semantic component uses a dependency-light hashing bag-of-words (pure numpy, deterministic). For richer signals, pass any `Callable[[Iterable[str]], np.ndarray]`:
+For richer signals, pass any `Callable[[Iterable[str]], np.ndarray]`:
 
 ```python
 from sentence_transformers import SentenceTransformer
@@ -87,35 +126,20 @@ embed = lambda texts: np.asarray(model.encode(list(texts)))
 report = continuity_score(before, after, embedder=embed)
 ```
 
+---
+
 ## Design notes
 
-- **Use it as a drift binary classifier.** Threshold the `continuity` score (e.g. `>= 0.8`) to flag whether a fine-tune, migration, or personalisation pass kept the agent's memory identity intact — the toy benchmark in `tests/test_auc.py` shows the default weights are discriminative (ROC-AUC ≥ 0.7 against unrelated agents).
+- **Use it as a drift binary classifier.** Threshold the `continuity` score (e.g. `>= 0.8`) to flag whether a fine-tune, migration, or personalisation pass kept the agent's memory identity intact — the toy benchmark in `tests/test_auc.py` shows the default weights are discriminative (ROC-AUC >= 0.7 against unrelated agents).
 - **`semantic_drift` range depends on the embedder.** With the default `hashing_embedder` (non-negative bag-of-words), cosines lie in `[0, 1]`, so `semantic_drift` is bounded by `[0, 0.5]` for non-empty stores. Pass a sentence-transformers (or other arbitrary-direction) embedder if you need the full `[0, 1]` range. Empty vs. empty (drift = 0.0) and empty vs. non-empty (drift = 1.0) remain reachable under any embedder.
 - **Weights are applied to drift, not to a normalised scale.** `identity_diff` always spans `[0, 1]`, but with the default embedder `semantic_drift` only spans `[0, 0.5]`. The default `0.5 / 0.5` weighting therefore puts roughly twice the effective weight on the identity component. Pass an embedder that spans the full cosine range, or set `identity_weight` / `semantic_weight` explicitly, if that asymmetry matters for your use case.
 - **MemoryObject IDs must be unique per store.** `continuity_score` raises `ValueError` if a `MemoryStore` contains duplicate IDs — silently collapsing duplicates produced subtle false-positive `mutated` results.
 - **Default tokenisation is alphanumeric (`\w+`, unicode).** Memories whose content is only punctuation, whitespace, or emoji collapse to zero tokens under `hashing_embedder` and therefore contribute no semantic signal. Pass a richer custom embedder if those signals matter.
 - **One component, on purpose.** v0.1 is the *continuer-select* metric only — no UI, no provenance store, no policy engine. Cedar-based export policies and Sigstore-signed manifests are tracked for v0.2.
-- **Tracking Truth ≠ aggregation.** ExitKit does not try to aggregate users or vote on values; it measures whether a single agent's memory state continues, in the closest-continuer sense.
+- **Tracking Truth != aggregation.** ExitKit does not try to aggregate users or vote on values; it measures whether a single agent's memory state continues, in the closest-continuer sense.
 - **Convergent goals, not endorsement.** The "agent stays the agent you trained" objective overlaps with the *positive alignment* programme described in Laukkonen et al., *Positive Alignment* (arXiv:2605.10310, 2026); cited as a convergent vision, not a methodological commitment.
 
-## Development
-
-```bash
-git clone https://github.com/hinanohart/exitkit
-cd exitkit
-python3 -m venv .venv && source .venv/bin/activate
-pip install -e ".[dev]"
-pytest
-ruff check .
-mypy
-```
-
-## References
-
-- Santhosh Kumar Ravindran. *Portable Agent Memory: A Protocol for Provenance-Verified Memory Transfer Across Heterogeneous LLM Agents.* arXiv:2605.11032 (2026).
-- Robert Nozick. *Anarchy, State, and Utopia.* Basic Books, 1974 — Part III, "A Framework for Utopia".
-- Robert Nozick. *Philosophical Explanations.* Harvard University Press, 1981 — §1 Tracking Truth and §1 closest-continuer.
-- Ruben Laukkonen, Seb Krier, Chloé Bakalar et al. *Positive Alignment: Artificial Intelligence for Human Flourishing.* arXiv:2605.10310 (2026).
+---
 
 ## Audit-trail integration (memcanon)
 
@@ -142,6 +166,31 @@ Each record is tagged `source:exitkit` + `schema:memcanon-emit/1`. Memcanon's
 Article 12(2) paragraph-mapped audit-log artefact (SHAPE only, NOT a conformity
 assessment).
 
+---
+
+## Development
+
+```bash
+git clone https://github.com/hinanohart/exitkit
+cd exitkit
+python3 -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev]"
+pytest
+ruff check .
+mypy
+```
+
+---
+
+## References
+
+- Santhosh Kumar Ravindran. *Portable Agent Memory: A Protocol for Provenance-Verified Memory Transfer Across Heterogeneous LLM Agents.* arXiv:2605.11032 (2026).
+- Robert Nozick. *Anarchy, State, and Utopia.* Basic Books, 1974 — Part III, "A Framework for Utopia".
+- Robert Nozick. *Philosophical Explanations.* Harvard University Press, 1981 — §1 Tracking Truth and §1 closest-continuer.
+- Ruben Laukkonen, Seb Krier, Chloé Bakalar et al. *Positive Alignment: Artificial Intelligence for Human Flourishing.* arXiv:2605.10310 (2026).
+
+---
+
 ## License
 
 MIT License — see [LICENSE](LICENSE).