Merged
93 changes: 71 additions & 22 deletions README.md
[![MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](#installation)

---

## What is this

**ExitKit** is a Python library that measures whether an LLM agent's memory remained the same after an update. You pass two [PAM-format](https://github.com/portable-ai-memory) `MemoryStore` snapshots — one captured before a fine-tune, provider migration, or personalisation pass, one captured after — and get back a single `continuity` score from `0.0` (completely different) to `1.0` (identical), plus a structured breakdown of which memories were added, removed, or silently rewritten.

The metric combines two components: a **structural identity diff** (which memory objects changed by ID and content hash) and a **semantic drift** score (cosine distance between centroid embeddings). Both components are configurable, and you can plug in any embedding model you like.

---

## Architecture

```mermaid
flowchart TD
A[Before MemoryStore] --> C[continuity_score]
B[After MemoryStore] --> C
C --> D[_index: build ID to content_hash map]
D --> E[_identity_diff: added removed mutated]
C --> F[_semantic_drift: centroid cosine distance]
F --> G[hashing_embedder default or custom embedder]
E --> H[Weighted sum: drift = w_id times identity_diff + w_sem times semantic_drift]
F --> H
H --> I[DriftReport: continuity identity_diff semantic_drift added removed mutated]
```
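
The identity branch of the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not ExitKit's implementation: the helper names `index` and `identity_diff`, the `(id, content)` tuple representation, the choice of blake2b for the content hash, and the normalisation by the union of IDs are all assumptions made for the example.

```python
import hashlib
from typing import Iterable


def index(memories: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each memory ID to a hash of its content (hypothetical helper)."""
    out: dict[str, str] = {}
    for mem_id, content in memories:
        if mem_id in out:
            # Duplicate IDs are rejected rather than silently collapsed.
            raise ValueError(f"duplicate memory ID: {mem_id!r}")
        out[mem_id] = hashlib.blake2b(content.encode("utf-8")).hexdigest()
    return out


def identity_diff(before: dict[str, str], after: dict[str, str]):
    """Return (score, added, removed, mutated) for two ID -> hash maps."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    mutated = sorted(
        k for k in before.keys() & after.keys() if before[k] != after[k]
    )
    union = before.keys() | after.keys()
    # Assumed normalisation: fraction of all known IDs that changed.
    changed = len(added) + len(removed) + len(mutated)
    score = changed / len(union) if union else 0.0
    return score, added, removed, mutated


before = index([("m1", "likes tea"), ("m2", "lives in Oslo"), ("m3", "uses vim")])
after = index([("m1", "likes tea"), ("m2", "lives in Bergen"), ("m4", "uses emacs")])
score, added, removed, mutated = identity_diff(before, after)
print(score, added, removed, mutated)
```

Here `m2` keeps its ID but changes content, so it counts as mutated rather than as one removal plus one addition.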

---

## Why

LLM agents accumulate memory: preferences, facts, project context. As you fine-tune, switch providers, or run a personalisation pass, the memory mutates. Did the agent stay the same agent, or did you replace it?

ExitKit ports that idea to PAM-format memory snapshots and returns:

- the underlying `identity_diff` and `semantic_drift` components,
- the weights used (default 0.5 / 0.5, fully configurable).

---

## Installation

```bash
pip install exitkit
```

Requires Python 3.11+.

---

## Quick start

```python
# ... (lines hidden by the diff view)
print(report.added, report.removed, report.mutated)
```

See [`examples/continuer_demo.py`](examples/continuer_demo.py) for a runnable end-to-end demo.

---

## How it works

`continuity_score(before, after, *, identity_weight=0.5, semantic_weight=0.5, embedder=None)` returns a `DriftReport`:


Weights must each lie in `[0, 1]` and sum to 1.
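
As a rough sketch of how the two components could be combined under that constraint: the function name `combine` is hypothetical, and `continuity = 1 - drift` is an assumption inferred from the score's stated range (`1.0` for identical stores), not something this README spells out.

```python
import math


def combine(
    identity_diff: float,
    semantic_drift: float,
    *,
    identity_weight: float = 0.5,
    semantic_weight: float = 0.5,
) -> float:
    """Fold the two drift components into one continuity score (sketch)."""
    for name, w in (
        ("identity_weight", identity_weight),
        ("semantic_weight", semantic_weight),
    ):
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {w}")
    if not math.isclose(identity_weight + semantic_weight, 1.0):
        raise ValueError("weights must sum to 1")
    # Weighted sum of the drift components, as in the architecture diagram.
    drift = identity_weight * identity_diff + semantic_weight * semantic_drift
    # Assumed mapping from drift to continuity.
    return 1.0 - drift


continuity = combine(0.75, 0.25)
print(continuity)
```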

### Default embedder

The default semantic component uses `hashing_embedder`: a pure-numpy, dependency-light, deterministic bag-of-words projection using the blake2b hashing trick (1024-dimensional, L2-normalised). No external model download required.

Tokenisation is `\w+` (unicode). Content containing only punctuation, whitespace, or emoji collapses to zero tokens and contributes no semantic signal under the default embedder.
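
The hashing trick described above can be illustrated roughly as follows. This is not the library's `hashing_embedder`: it uses plain lists instead of numpy to stay self-contained, and the function name `hashing_embed`, the lowercasing step, and the 8-byte digest size are assumptions made for the sketch.

```python
import hashlib
import math
import re

DIM = 1024  # dimensionality stated for the default embedder


def hashing_embed(text: str, dim: int = DIM) -> list[float]:
    """Bag-of-words vector via the hashing trick, L2-normalised (sketch)."""
    vec = [0.0] * dim
    # In Python 3, \w matches Unicode word characters by default.
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        # Each token deterministically lands in one of `dim` buckets.
        vec[int.from_bytes(digest, "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Text with zero tokens stays an all-zero vector.
    return [v / norm for v in vec] if norm else vec


vec = hashing_embed("likes green tea")
empty = hashing_embed("?!  🙂")
print(len(vec), any(empty))
```

Because every bucket count is non-negative, any two such vectors have a cosine similarity of at least zero.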

### Custom embedder

For richer signals, pass any `Callable[[Iterable[str]], np.ndarray]`:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# ... (lines hidden by the diff view; `model` is defined here)
embed = lambda texts: np.asarray(model.encode(list(texts)))
report = continuity_score(before, after, embedder=embed)
```

---

## Design notes

- **Use it as a drift binary classifier.** Threshold the `continuity` score (e.g. `>= 0.8`) to flag whether a fine-tune, migration, or personalisation pass kept the agent's memory identity intact — the toy benchmark in `tests/test_auc.py` shows the default weights are discriminative (ROC-AUC >= 0.7 against unrelated agents).
- **`semantic_drift` range depends on the embedder.** With the default `hashing_embedder` (non-negative bag-of-words), cosines lie in `[0, 1]`, so `semantic_drift` is bounded by `[0, 0.5]` for non-empty stores. Pass a sentence-transformers (or other arbitrary-direction) embedder if you need the full `[0, 1]` range. Empty vs. empty (drift = 0.0) and empty vs. non-empty (drift = 1.0) remain reachable under any embedder.
- **Weights are applied to drift, not to a normalised scale.** `identity_diff` always spans `[0, 1]`, but with the default embedder `semantic_drift` only spans `[0, 0.5]`. The default `0.5 / 0.5` weighting therefore puts roughly twice the effective weight on the identity component. Pass an embedder that spans the full cosine range, or set `identity_weight` / `semantic_weight` explicitly, if that asymmetry matters for your use case.
- **MemoryObject IDs must be unique per store.** `continuity_score` raises `ValueError` if a `MemoryStore` contains duplicate IDs — silently collapsing duplicates produced subtle false-positive `mutated` results.
- **Default tokenisation is alphanumeric (`\w+`, unicode).** Memories whose content is only punctuation, whitespace, or emoji collapse to zero tokens under `hashing_embedder` and therefore contribute no semantic signal. Pass a richer custom embedder if those signals matter.
- **One component, on purpose.** v0.1 is the *continuer-select* metric only — no UI, no provenance store, no policy engine. Cedar-based export policies and Sigstore-signed manifests are tracked for v0.2.
- **Tracking Truth != aggregation.** ExitKit does not try to aggregate users or vote on values; it measures whether a single agent's memory state continues, in the closest-continuer sense.
- **Convergent goals, not endorsement.** The "agent stays the agent you trained" objective overlaps with the *positive alignment* programme described in Laukkonen et al., *Positive Alignment* (arXiv:2605.10310, 2026); cited as a convergent vision, not a methodological commitment.
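
The range and weighting notes above can be checked with a small sketch. It assumes `semantic_drift = (1 - cosine) / 2` between the two centroid embeddings, which is inferred from the stated bounds rather than confirmed here. The helper names are hypothetical, and plain lists stand in for numpy arrays.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Treating a zero-norm centroid as cosine 0 is a choice made for this sketch.
    return dot / (na * nb) if na and nb else 0.0


def semantic_drift(before: list[list[float]], after: list[list[float]]) -> float:
    if not before and not after:
        return 0.0  # empty vs. empty
    if not before or not after:
        return 1.0  # empty vs. non-empty
    # Assumed mapping from cosine in [-1, 1] to drift in [0, 1].
    return (1.0 - cosine(centroid(before), centroid(after))) / 2.0


# Non-negative embeddings: the worst case is orthogonal centroids.
orthogonal = semantic_drift([[1.0, 0.0]], [[0.0, 1.0]])
# Arbitrary-direction embeddings can point in opposite directions.
opposite = semantic_drift([[1.0, 0.0]], [[-1.0, 0.0]])

# Largest possible contribution of each component under 0.5 / 0.5 weights
# when the embedder is non-negative.
max_identity = 0.5 * 1.0
max_semantic = 0.5 * orthogonal
print(orthogonal, opposite, max_identity, max_semantic)
```

Under these assumptions, the identity component can contribute up to 0.5 to the total drift and the semantic component only up to 0.25, which matches the two-to-one asymmetry described in the weighting note.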

---

## Audit-trail integration (memcanon)

Each record is tagged `source:exitkit` + `schema:memcanon-emit/1`. Memcanon's […]
Article 12(2) paragraph-mapped audit-log artefact (SHAPE only, NOT a
conformity assessment).

---

## Development

```bash
git clone https://github.com/hinanohart/exitkit
cd exitkit
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
ruff check .
mypy
```

---

## References

- Santhosh Kumar Ravindran. *Portable Agent Memory: A Protocol for Provenance-Verified Memory Transfer Across Heterogeneous LLM Agents.* arXiv:2605.11032 (2026).
- Robert Nozick. *Anarchy, State, and Utopia.* Basic Books, 1974 — Part III, "A Framework for Utopia".
- Robert Nozick. *Philosophical Explanations.* Harvard University Press, 1981 — §1 Tracking Truth and §1 closest-continuer.
- Ruben Laukkonen, Seb Krier, Chloé Bakalar et al. *Positive Alignment: Artificial Intelligence for Human Flourishing.* arXiv:2605.10310 (2026).

---

## License

MIT License — see [LICENSE](LICENSE).