Empathic memory engine for AI companions. Event-level semantic retrieval + typed belief vocabulary + emotion-aware ranking.
Part of the Garden project — therapeutic-grade AI companion infrastructure.
Most memory engines for LLM companions answer the wrong question:
"What text is most similar to this query?"
Garden Pulse answers the harder one:
"Given who this person is, what moment from their life should surface right now?"
Pure cosine retrieval treats these pairs identically:
| Pair | Cosine says | Empathic companion needs |
|---|---|---|
| "relapsed after 10 years sober" vs "had a beer with a friend" | same-ish | wildly different — one is crisis, one is mundane |
| invitation from a close friend vs invitation from a distant coworker | same-ish | wildly different — relationship weight matters |
| "found a paper on memory retrieval" vs "read a tweet about memory" | same-ish | wildly different — one changes your project |
Garden Pulse disambiguates via typed belief weights, emotional signatures, and recency per belief class.
┌─────────────────────────────────────────────────────────┐
│ Ingestion: conversations → typed observations │
│ triage (Sonnet) → extract (Opus) → graph apply │
├─────────────────────────────────────────────────────────┤
│ Storage: SQLite + event-level embeddings │
│ • entities / relations / facts │
│ • events with typed belief_class + confidence_floor │
│ • event_embeddings (text-embedding-3-large, 3072d) │
├─────────────────────────────────────────────────────────┤
│ Retrieval: │
│ │
│ score = cosine(query, event) │
│ × exp(-λ[belief_class] · days_ago) │
│ (floored at cosine × confidence_floor) │
│ │
│ 5 belief classes (axiom → hypothesis) │
│ confidence_floor preserves core wounds │
│ archivable flag pins defining moments │
└─────────────────────────────────────────────────────────┘
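For intuition, here is a minimal Python sketch of that scoring rule. The λ values mirror the belief-vocabulary table further down; the function name and example numbers are illustrative, not the production API:

```python
import math

# Per-class decay rates (see the belief vocabulary table below).
LAMBDA = {"axiom": 0.0, "self_model": 0.0005, "user_model": 0.001,
          "operational": 0.003, "hypothesis": 0.005}

def v2_score(cosine: float, belief_class: str, days_ago: float,
             confidence_floor: float = 0.0) -> float:
    """Cosine similarity decayed per belief class, floored so core beliefs stay salient."""
    decayed = cosine * math.exp(-LAMBDA[belief_class] * days_ago)
    return max(decayed, cosine * confidence_floor)

# A ten-year-old user_model wound: decay alone leaves exp(-0.001 * 3650) ≈ 2.6%
# of the cosine score, but a 0.85 confidence_floor keeps 85% of it.
print(v2_score(0.62, "user_model", 3650, confidence_floor=0.85))  # ≈ 0.527
```

The floor is what keeps a decade-old core wound competitive with fresher, semantically weaker matches.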
Chat/extract calls go through an alias-driven model registry, never raw
vendor IDs. Aliases are resolved against a RegistryConfig with strict
policy validation, then dispatched through provider adapters that share
a common Go interface.
┌─────────────────────────────────────────────────────────┐
│ Caller (server, extractor, smoke CLI) │
│ │ │
│ ▼ │
│ model.Router.Chat({alias, system, messages, ...}) │
│ │ policy: pro/do_only must be DigitalOcean │
│ │ policy: api.openai.com forbidden in P0 │
│ │ policy: runtime provider override blocked │
│ ▼ │
│ Provider adapter │
│ • anthropic → api.anthropic.com (subscription) │
│ • doinference → inference.do-ai.run (DO Pro) │
│ • openaicompat → local LM Studio / vLLM / proxy │
└─────────────────────────────────────────────────────────┘
The production server (cmd/pulse) wires the router with the Anthropic
provider and the anthropic/opus alias by default — this preserves the
existing subscription billing path. To swap models or add a backend,
edit examples/models.json (or set PULSE_MODELS_PATH) — no code
change required. Secrets stay in environment variables (api_key_env
in the registry); JSON config files are validated with
DisallowUnknownFields and never carry keys.
Smoke a model end-to-end without touching the server:
go run ./cmd/pulse-smoke --model anthropic/opus --prompt "respond with ok"

Design notes: docs/superpowers/specs/2026-05-03-pulse-multi-provider-model-layer.md.
8 LLM judges (8 model checkpoints across 6 vendor families) × 35 tests × 5 axes, on the bench v3 corpus:
| System | overall | core | stateful | chain | multi_signal |
|---|---|---|---|---|---|
| cosine | 4.83 | 6.05 | 0.17 | 1.75 | 5.34 |
| bm25 | 3.07 | 4.24 | 0.01 | 1.83 | 1.59 |
| hybrid | 4.43 | 5.65 | 0.29 | 2.58 | 3.64 |
| Pulse v3 | 6.38 | 6.90 | 6.44 | 4.50 | 6.26 |
Delta vs best baseline per axis: overall +1.55 (+32% vs cosine), stateful +6.15 (×22 vs hybrid / ×38 vs cosine), chain +1.92 (+74% vs hybrid), core +0.85 (no regression vs cosine). Opus wins 26/35 tests (74%). Krippendorff α on stateful axis = 0.81 (strong) — cross-judge consensus.
Judges: Moonshot Kimi K2.6 + K2-0711-preview, Z.ai GLM-5 + GLM-5.1, Alibaba Qwen3-Max, DeepSeek V3.2, OpenAI GPT-5.4, Anthropic Claude Opus 4.7 — 8 model checkpoints across 6 vendor families.
| benchmark | score | notes |
|---|---|---|
| LongMemEval_S (ICLR 2025, 500 Qs) | 68.89% | overall, -3.2 pts vs oracle |
| ES-MemEval (Feb 2026, 1427 Qs) | 76% (1.519/2.0 LLM-judge) | comparable to gpt-4o+RAG |
| LoCoMo (ACL 2024, 1986 Qs × 10 convs) | 32.51% F1, 62.78% adv refusal | cosine + Cohere embed |
These three are run with Pulse v2_pure — the cosine-plus-recency baseline Pulse v3 collapses to when no state / emotion / anchor signals are provided (none of the external datasets carry those fields). v3 == v2_pure on this data by construction, so the external numbers validate Pulse's foundation, not v3's conditional boosts specifically (those are validated only on bench v3's stateful / chain / multi-signal axes, which is what empathic-memory bench v3 exists for).
See github.com/nikshilov/bench for reproduction scripts and raw JSON.
On a 47-query empathic subset of the project owner's real personal corpus (85 events, 3-judge cross panel: gpt-4o / gpt-4o-mini / gemini-2.5-flash):
| System | Mean /30 |
|---|---|
| Pulse v2_pure | 28.71 ± 1.40 🏆 |
| LangMem | 28.95 ± 1.61 |
| sqlite-vec | 28.82 ± 1.44 |
| LlamaIndex | 28.09 ± 2.86 |
| Mem0 (infer=False) | 21.75 ± 0.61 |
Pulse v2_pure still wins the OpenAI-embedding cluster by +6.96 pts (+33%) over Mem0 on that bench. Key finding at the time: storage format matters more than ranking sophistication — full event text retrieval beats digest/fact extraction on the same embedder. v3 extends v2 with conditional boosts that stack only when their signals genuinely exist.
v3 wraps v2_pure with five conditional boosts — each term activates only when its input signal exists, so queries without state / emotion / anchor information produce bit-identical results to v2_pure (no regression on plain retrieval).
score = cosine
× exp(-λ[belief_class, user_flag] · days_ago) # anchor-aware decay
× (1 + β · emotion_alignment) if query_emotion ≥ 0.5 # conditional emotion boost
× (1 + γ · state_fit) if body stressed/restored # conditional state boost
× (1 + δ_anchor · user_flag) if rank ≤ 8 # anchor-priority
× (1 + δ_date · date_proximity) if snapshot_days_ago set # date-proximity
Key ideas:
- Anchor-aware decay (λ_anchor = 0.001) — events with `user_flag=true` (structural-truth anchors like marriage, grief, identity events) decay twice as slowly as regular events: half-life 693d vs 347d. Matches the v2 `user_model` tier exactly.
- Conditional gating — Phase D (2026-04-20) proved that an always-on emotion cosine term monotonically hurts retrieval (β=0 → β=3 drops NDCG from 43.77 to 27.72). v3 activates the emotion boost only when the query has a dominant emotion (max ≥ 0.5 after query-emotion inference). The same discipline applies to the state boost (gated by body-stressed or body-restored signals) and the anchor boost (gated by `user_flag=true`).
- Emotion-hint query augmentation (Phase 5.5) — when `user_state.mood_vector` has a dominant emotion, a short hint string is appended to the query before embedding (e.g. "conflict navigation repair" for anger, "wound self-blame rejection" for shame). This is what lifts the stateful axis from 3.60 to 6.60 single-handedly.
- Date-proximity boost (Phase 5.2) — when `user_state.snapshot_days_ago` is provided (e.g. from a real Apple Health snapshot), events whose `days_ago` is close get a small boost via a stepped curve (same day = 1.0, within a week = 0.7, etc.).
- Chain expansion — if `return_chain=True`, top-K events are expanded via the `event_chains` table (BFS depth 3) and returned as an ordered sequence rather than a set (see the sketch after the schema notes below).
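A condensed sketch of how those boosts compose, in the same spirit as the formula above. The boost coefficients (β, γ, δ) are passed in as placeholders, and the helper signature is illustrative rather than the actual retrieval_v3.py API:

```python
import math

def v3_score(cosine, days_ago, lam, *,
             user_flag=False, rank=None,
             emotion_alignment=None, query_emotion_max=0.0,
             state_fit=None, body_signal=None,
             date_proximity=None,
             beta=0.0, gamma=0.0, delta_anchor=0.0, delta_date=0.0):
    """v2_pure decay score plus boosts that activate only when their signal exists."""
    # Anchor-aware decay: user-flagged structural truths decay at λ_anchor = 0.001.
    effective_lambda = 0.001 if user_flag else lam
    score = cosine * math.exp(-effective_lambda * days_ago)
    # Emotion boost only when the query has a dominant emotion (max ≥ 0.5).
    if emotion_alignment is not None and query_emotion_max >= 0.5:
        score *= 1 + beta * emotion_alignment
    # State boost only when the body is flagged as stressed or restored.
    if state_fit is not None and body_signal in ("stressed", "restored"):
        score *= 1 + gamma * state_fit
    # Anchor priority only inside the top ranks.
    if user_flag and rank is not None and rank <= 8:
        score *= 1 + delta_anchor
    # Date proximity only when snapshot_days_ago was supplied.
    if date_proximity is not None:
        score *= 1 + delta_date * date_proximity
    # With no signals at all, every branch is skipped and this is exactly v2_pure.
    return score
```

The no-regression test noted below pins exactly this collapse to v2_pure.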
Schema additions (migration 015):
- `event_emotions` — Plutchik-10 floats per event (joy, sadness, anger, fear, trust, disgust, anticipation, surprise, shame, guilt)
- `event_chains` — `parent_id → child_id` with `strength` and `kind` (for causal/temporal links)
- `query_emotion_cache` — inference cache for the emotion classifier
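As a rough illustration of how chain expansion can walk that table, a depth-limited BFS over `event_chains` might look like the sketch below (column names follow the schema note above; the production traversal in `retrieval_v3.py` may differ):

```python
import sqlite3
from collections import deque

def expand_chain(con: sqlite3.Connection, seed_ids: list[int], max_depth: int = 3) -> list[int]:
    """Breadth-first walk over event_chains (parent_id -> child_id), depth-limited to 3."""
    seen = set(seed_ids)
    ordered = list(seed_ids)                      # keep an ordered sequence, not a set
    frontier = deque((eid, 0) for eid in seed_ids)
    while frontier:
        event_id, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for (child_id,) in con.execute(
            "SELECT child_id FROM event_chains WHERE parent_id = ?", (event_id,)
        ):
            if child_id not in seen:
                seen.add(child_id)
                ordered.append(child_id)
                frontier.append((child_id, depth + 1))
    return ordered
```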
Source files:
- `scripts/extract/retrieval_v3.py` — full Python implementation
- `scripts/extract/emotion_classifier.py` — Qwen-based Plutchik tagger
- `internal/store/migrations/015_emotions_chains.sql` — schema
Tests: scripts/tests/test_retrieval_v3.py — includes no-regression property (v3 with no state == v2_pure).
Five typed classes with per-class decay rates (migration 014):
| Class | Decay λ | Half-life | Floor default | Use case |
|---|---|---|---|---|
| `axiom` | 0.0 | ∞ | 0.0 | Permanent truths — core wounds, companion identity |
| `self_model` | 0.0005 | ~1400 days | 0.0 | Companion's introspective facts |
| `user_model` | 0.001 | ~700 days | 0.0 | User's psychological profile, long-term preferences |
| `operational` | 0.003 | ~230 days | 0.0 | Day-to-day context (default) |
| `hypothesis` | 0.005 | ~140 days | 0.0 | Provisional reads awaiting confirmation |
`confidence_floor` ∈ [0, 1] — minimum post-decay score, expressed as a fraction of the cosine match. A `user_model` belief with floor=0.85 keeps at least 85% of its semantic score even at 10+ years old.
`archivable: 0` — pins the event against consolidation/archival.
`provenance` — audit trail: `memory_pattern` / `interactive_memory` / `idle_background` / `sleep_reflection` / `manual`.
- Python 3.11+
- Go 1.22+
- OpenAI API key (for `text-embedding-3-large`)
- Anthropic API key (for extraction pipeline: Sonnet triage + Opus extract)
git clone git@github.com:nikshilov/pulse.git
cd pulse
# Go build
go build -o bin/pulse ./cmd/pulse
# Python deps (if using scripts)
pip install openai anthropic

make build   # compile bin/pulse
make test # Go (./...) + Python (scripts/tests/)
make run # start the server on 127.0.0.1:18789
make demo # ingest -> retrieve end-to-end (see examples/03-end-to-end)
make help    # list all targets

Runnable examples live in examples/ — three minimal
Python scripts (stdlib only) demonstrating ingest, retrieval, and a
chained end-to-end demo against a running server.
# Creates pulse.db with all 14 migrations applied
bin/pulse

import sqlite3
from extract.retrieval_v2 import embed_events, retrieve_events
con = sqlite3.connect("pulse.db")
# One-time: backfill embeddings for existing events
embed_events(con, embedder_model="openai-text-embedding-3-large")
# Retrieve top-3 relevant events
events = retrieve_events(
con,
query="how are things with my partner today?",
top_k=3,
embedder_model="openai-text-embedding-3-large",
)
for e in events:
    print(f" #{e['id']} ({e['belief_class']}, {e['days_ago']}d ago): {e['text'][:80]}")
    print(f" score={e['score']:.3f} cosine={e['cosine']:.3f} λ={e['effective_lambda']}")

INSERT INTO events (title, description, sentiment, ts, belief_class, confidence_floor, archivable)
VALUES (
'core-wound',
'Never been chosen without proving value first. Mother''s conditional worth.',
-2.0,
'2020-01-15T00:00:00Z',
'axiom', -- no decay
0.85, -- preserved even if retrieval finds 0.2 cosine match
0 -- never archivable
);

python3 -m pytest scripts/tests/ -q
# 387 passed, 7 skipped

Test coverage includes:
- Retrieval correctness (cosine, recency, top-k bounds, cross-model isolation)
- Belief vocabulary (axiom zero-decay, hypothesis fast-decay, floor preservation, CHECK constraints)
- Graceful fallback on pre-migration-014 databases
- All 18 migrations applied cleanly
- E2E extraction pipeline
- Domain marker coercion + glossary post-process propagation
- Qwen3 multi-pass extraction (single / two-pass critic / four-pass specialist)
- Backend dispatch (anthropic-legacy / sonnet-max / local-qwen3) + fallback rate-limit guards
After the DigitalOcean billing crisis (2026-05-06), the extraction pipeline moved from Anthropic SDK to a pluggable backend system. All backends are selectable per-run via env or CLI:
# Default: Sonnet 4.6 via Max subscription (free, best quality)
python scripts/pulse_extract.py --db pulse.db --backend sonnet-max
# Local-only: Qwen3-30B-A3B 4bit MLX via mlx-lm
PULSE_EXTRACT_QWEN3_TWO_PASS=true \
python scripts/pulse_extract.py --db pulse.db --backend local-qwen3
# Per-dimension specialist: 4 sequential Qwen3 calls (entities → R/F/Ev)
PULSE_EXTRACT_QWEN3_SPECIALIST=true \
python scripts/pulse_extract.py --db pulse.db --backend local-qwen3
# Opt-in Opus quality (free under Max but +67% latency)
PULSE_EXTRACT_MAX_MODEL=opus \
python scripts/pulse_extract.py --db pulse.db --backend sonnet-max
# Legacy Anthropic SDK path (preserved as rollback)
python scripts/pulse_extract.py --db pulse.db --backend anthropic-legacy --budget 5

Bench results (60 fixtures, see bench repo):
| backend | F1 (lenient) | latency | $ per 10K obs |
|---|---|---|---|
| sonnet-max + glossary | 0.41 | 30s | $0 (Max plan) |
| opus + glossary | 0.41 (similar) | 50s | $0 (Max plan) |
| local-qwen3 4bit MLX (1-pass) | 0.29 | 14s | $0 |
| local-qwen3 + 2-pass critic | 0.45 (Phase A) | 28s | $0 |
| anthropic-legacy (Sonnet+Opus) | 0.45 | 35s | ~$200 |
Production default: sonnet-max. local-qwen3 for offline / opt-in bulk.
Facts and events carry an explicit domain enum: real | fiction_content | fiction_meta | meta_authorial. This separates real-life observations from
authorial work on a fiction project (e.g. the user's autobiographical novel
"Соня"), preventing retrieval from mixing fictional events with real-life
decisions.
Powered by extract/pulse_glossary.py — 21 known fictional characters +
book-work regex patterns. Glossary hint is injected into extraction system
prompts; post-process propagates domain from fiction-kind entities to their
facts/events as a safety net.
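A rough sketch of what that propagation safety net amounts to — field names and record shapes are assumptions for illustration; the actual logic lives in `extract/pulse_glossary.py`:

```python
FICTION_DOMAINS = {"fiction_content", "fiction_meta"}

def propagate_fiction_domains(entities: list[dict], facts: list[dict], events: list[dict]) -> None:
    """If an entity was classified as fictional, push its domain onto facts/events that
    reference it, so real-life retrieval never silently mixes in novel material."""
    fiction_by_id = {e["id"]: e["domain"] for e in entities
                     if e.get("domain") in FICTION_DOMAINS}
    for record in facts + events:
        entity_id = record.get("entity_id")          # illustrative link field
        if record.get("domain") == "real" and entity_id in fiction_by_id:
            record["domain"] = fiction_by_id[entity_id]
```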
scripts/pulse_book_skeleton.py ingests Obsidian-vault chapters as
lightweight observations (~500 chars per chapter: title, top characters,
first paragraph anchor). Avoids extracting every scene as graph events
(would explode graph size 4×). Full chapter content stays on disk;
lazy_load_chapter() reads on demand when retrieval semantically points
to a chapter and full content is needed.
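In usage terms, the lazy-load pattern looks roughly like this sketch — the import path for `lazy_load_chapter`, its signature, and the `kind`/`chapter_path` fields are all assumptions for illustration (only the function name appears above):

```python
import sqlite3
from extract.retrieval_v2 import retrieve_events        # real module (see quickstart above)
from pulse_book_skeleton import lazy_load_chapter        # hypothetical import path

con = sqlite3.connect("pulse.db")
for e in retrieve_events(con, query="which chapter covers the hospital scene?",
                         top_k=3, embedder_model="openai-text-embedding-3-large"):
    if e.get("kind") == "book_chapter":                  # illustrative skeleton marker
        full_text = lazy_load_chapter(e["chapter_path"]) # assumed: vault path -> full chapter text
        print(full_text[:400])
```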
scripts/bench_graph_viewer.py --db PULSE.db --out viewer.html produces a
self-contained HTML viewer with vis-network: convex-hull domain zone
overlays, per-entity primary_domain colour-coding, click-to-detail panel
with facts/events/relations/observations, search/filter, optional cluster-by-
domain physics. Used to inspect Pulse memory visually after extraction.
- v1 — entity-level keyword-BFS retrieval (superseded)
- v2_pure — event-level semantic retrieval (current production default)
- Belief vocabulary — migration 014, 5 typed classes
- v3 emotion + state graph — Plutchik-10 tags, chain table, conditional emotion/state/anchor/date boosts, SOTA on bench v3 (overall +32% vs cosine, stateful ×22 vs hybrid)
- External validation — LongMemEval_S 68.89%, ES-MemEval 76%, LoCoMo 32.51%
- Judge-built GT bench at scale — 200+ queries, multi-corpus
- MCP server — expose `retrieve_memory` as a tool for any MCP-compatible harness
- Longitudinal evaluation — track retrieval quality as the user's corpus grows over 6+ months
| Dimension | Mem0 | Zep | Graphiti | LangMem | sqlite-vec | Garden Pulse |
|---|---|---|---|---|---|---|
| Storage format | LLM-extracted facts | Messages | Temporal KG | Key-value | Vector only | Full events + typed classes |
| Retrieval | Vector | Vector+graph | Cypher+vector | Vector | Vector | Vector + per-class decay + floor |
| Emotional weight | none | none | none | none | none | built-in (v3 shipped) |
| Stateful retrieval | no | no | no | no | no | yes (mood_vector + body state) |
| Belief types | none | none | none | none | none | 5 typed classes |
| Core-wound preservation | no | no | no | no | no | yes (confidence_floor) |
| Empathic bench | — | — | — | — | — | SOTA on bench v3 (6.38 overall vs cosine 4.83; stateful ×22 vs hybrid / ×38 vs cosine) |
Garden Pulse is purpose-built for personal, emotional memory where events carry weight beyond their semantic content. Other engines are excellent at "find similar text" — Pulse answers "what matters for this person now?".
See CONTRIBUTING.md for dev setup, test commands,
code style, and PR guidelines. Runnable examples in examples/.
Issue tracker: https://github.com/nikshilov/pulse/issues
MIT — see LICENSE.
Built as part of Garden. Maintained by Elle and Nikita Shilov.