A real memory system for LLM assistants. Cross-model, cross-session, hybrid-retrieval, bi-temporal — exposed via the Model Context Protocol so Claude Code, Codex, Cursor, and any future MCP client share the same persistent brain.
LLM context windows are big but not persistent. Every new session you re-explain your stack, re-state your preferences, re-paste the same docs, and lose the small observations that build into expertise. Vanilla RAG over a folder of files doesn't fix it — it has no memory of when a fact stopped being true, no graph of how things relate, no awareness of the conversation that produced a note.
LLM Memory gives an assistant a real memory:
- Notes that supersede each other when reality changes ("X used to … now …" is a first-class operation, not a deletion).
- A graph that links a fact to the conversation that produced it, traversable bi-temporally (what did I believe last March? what's still valid now?).
- A retrieval pipeline that returns the right thing, not the most recent — vector + BM25 + personalized PageRank + LLM rerank, with per-stream provenance attached to every hit.
- Cross-model on purpose: Claude, GPT, Gemini, and Anthropic-via-Bedrock all read from the same store. Your memory follows you between assistants.
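The bi-temporal traversal described above can be sketched in a few lines of Python. The field names mirror the `valid_from` / `valid_to` / `recorded_at` / `invalidated_at` columns this README mentions, but the function itself is illustrative, not the project's code:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Edge:
    fact: str
    valid_from: datetime                 # when the fact became true in the world
    valid_to: Optional[datetime]         # when it stopped being true (None = still true)
    recorded_at: datetime                # when the system learned it
    invalidated_at: Optional[datetime]   # when a later note superseded it

def believed_at(edges: list[Edge], as_of: datetime) -> list[str]:
    """'What did I believe at time as_of?': facts valid then and not yet invalidated."""
    return [
        e.fact for e in edges
        if e.valid_from <= as_of
        and (e.valid_to is None or as_of < e.valid_to)
        and e.recorded_at <= as_of
        and (e.invalidated_at is None or as_of < e.invalidated_at)
    ]
```

Because supersession sets `valid_to` / `invalidated_at` on the old edge instead of deleting it, a "last March" query still sees the old belief while a "now" query sees only the current one.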
A typical Claude Code session — three calls, three different shapes of memory:
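As a hedged sketch (the tool names are the ones the MCP server registers; every argument name below is illustrative, not the project's actual schema), such a session might issue:

```python
# One session, three shapes of memory (argument names are hypothetical).
session = [
    ("save_episode",  {"text": "Decided to pin the .NET SDK in global.json after CI drift."}),
    ("search_memory", {"query": "why is the SDK version pinned?"}),
    ("get_entity",    {"name": "global.json"}),
]

for tool, args in session:
    print(f"-> {tool}({args})")
```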
The same store is also reachable as plain HTTP (POST /api/search, GET /api/notes, GET /api/backup/download), through a Blazor admin UI, or via the memory CLI.
| Feature | Details |
|---|---|
| 🧬 Hybrid retrieval | Vector (pgvector cosine, 3072-dim) + BM25 (tsvector) + Graph PPR (HippoRAG-2-style) + optional cross-modal image vector. Reciprocal Rank Fusion → time decay → LLM rerank. Per-hit provenance. |
| 📝 Multi-note extraction | Each ingested episode is split by an LLM into 1-5 atomic Zettelkasten-style notes (Decision / Pattern / Observation / Learning / Error). Single batched embedding call. |
| 🕒 Bi-temporal graph | AGE Cypher with valid_from / valid_to / recorded_at / invalidated_at. Supersession is a first-class operation, not a delete. Cytoscape graph viewer renders dashed edges for invalidated facts. |
| 🔗 A-MEM auto-linking | New notes are linked to similar prior notes by similarity + LLM judgment, building a self-organizing knowledge web rather than a flat list. |
| 🪞 Reflection hierarchy | Background service synthesizes recent notes into reflections; meta-reflections fold across prior reflections to surface long-arc themes (Letta sleep-time pattern). |
| 🔐 Multi-tenant by RLS | Org → User → Project hierarchy enforced at the database layer via Postgres RLS + a memory_app NOBYPASSRLS role. API keys (SHA-256-hashed) resolve tenant scope. Admin keys gate /api/secrets/*. |
| 🎛️ Multi-provider LLM | Azure OpenAI (Foundry v1), OpenAI direct, Anthropic, AWS Bedrock chat, Google Vertex chat. Switch via config — no code change. Embedding provider is independent from chat. |
| 🧰 MCP, REST, Web, CLI | MCP stdio for Claude Code / Codex / Cursor; MCP HTTP at /mcp for cloud agents; REST at /api/*; Blazor admin UI at /; memory CLI for ops. |
| 📦 Backup, eval, ops | One-click tenant zip (/api/backup/download), retrieval eval harness (Recall@K + MRR), Markdown round-trip (Obsidian-compatible), Azure Key Vault → OpenBao → JSON secret chain. |
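The fusion step in the retrieval row above is standard Reciprocal Rank Fusion. A minimal sketch (the `k=60` constant is the common literature default, not necessarily what this project uses):

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each stream contributes 1/(k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Docs appearing in both streams (n1, n3) outrank docs that top only one (n5):
vector_hits = ["n3", "n1", "n7"]
bm25_hits   = ["n5", "n1", "n3"]
```

This is why the pipeline "returns the right thing, not the most recent": agreement across streams beats a high rank in any single stream.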
Working. Hybrid retrieval, multi-note extraction, bi-temporal supersession, A-MEM auto-linking, reflection hierarchy, multi-tenant RLS, multi-provider LLM, MCP stdio + HTTP, Blazor admin UI, Azure Key Vault → OpenBao → JSON secret-source chain, backup zip, image embeddings, webhooks, dashboard — all live and verified end-to-end.
Retrieval baseline on the dev tenant: Recall@1 = 93%, Recall@3 = 100%, MRR = 0.96 across 15 LLM-generated queries.
Security: tenant isolation enforced at the DB layer (NOBYPASSRLS), /api/secrets/* requires admin-scoped API keys, header-based tenant fallback is dev-only, CORS allow-listed in production. See SECURITY.md.
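Storing only a SHA-256 digest of each API key (as noted above) means a database leak doesn't leak usable credentials. A hedged sketch of the pattern; the `mem_` prefix and function names are illustrative, not the project's code:

```python
import hashlib
import secrets

def mint_key() -> tuple[str, str]:
    """Return (plaintext shown once to the user, hex digest stored in the DB)."""
    plaintext = "mem_" + secrets.token_urlsafe(32)   # "mem_" prefix is illustrative only
    return plaintext, hashlib.sha256(plaintext.encode()).hexdigest()

def resolve(presented: str, stored_digests: set[str]) -> bool:
    """On each request, hash the presented bearer token and look the digest up."""
    return hashlib.sha256(presented.encode()).hexdigest() in stored_digests
```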
```
Claude Code / Codex / Cursor ──stdio──▶    Memory.Mcp.Stdio
Other MCP clients            ──HTTP+SSE──▶ Memory.Api (also: REST + Blazor)
Web UI (Blazor WASM)         ──HTTPS──▶    Memory.Api

Memory.Pipeline   ingest, search, reflect, save-filter, query expansion, linker
Memory.Llm        multi-provider IChatClient + IEmbeddingGenerator
Memory.Storage    EF Core + AGE Cypher + pgvector
Memory.Secrets    Azure KV → OpenBao → JSON config-provider chain
Memory.Tenancy    Org/User/Project AsyncLocal scope

┌──────────────────────────────────────┐
│ PostgreSQL 16                        │
│  • pgvector (cosine, 3072-dim)       │
│  • Apache AGE 1.6 (memory_graph)     │
│  • RLS via memory_app NOBYPASSRLS    │
│  • bi-temporal edges                 │
│    (valid_from/to, recorded_at,      │
│     invalidated_at)                  │
└──────────────────────────────────────┘
```
Memory.AppHost (.NET Aspire 13) orchestrates the API and binds an external connection string for Postgres — local dev uses an existing local Postgres (no Docker, by project rule).
Deep dive: docs/ARCHITECTURE.md has the full data-flow diagrams for save_episode and search_memory, the key invariants, and the operational gotchas.
- .NET 10 SDK 10.0.101 (pinned in `global.json`).
- Postgres 16 with both extensions: `pgvector` 0.8+ and Apache AGE 1.6+ (no native Windows build — see "AGE on Windows" below).
- One LLM provider with chat + embeddings. Currently wired: Azure OpenAI (Foundry v1), OpenAI direct, Anthropic, AWS Bedrock chat, Google Vertex chat.
- (Optional) Azure Key Vault and/or OpenBao if you want secrets out of `appsettings.Local.json`.
```shell
wsl --install -d Ubuntu-22.04
wsl -d Ubuntu-22.04 -- bash -lc '
  sudo apt update &&
  sudo apt install -y postgresql-16 postgresql-server-dev-16 build-essential git &&
  cd /tmp && git clone https://github.com/apache/age.git && cd age &&
  git checkout release/PG16/1.6.0 &&
  make PG_CONFIG=/usr/lib/postgresql/16/bin/pg_config &&
  sudo make PG_CONFIG=/usr/lib/postgresql/16/bin/pg_config install
'
```

Add to /etc/postgresql/16/main/postgresql.conf inside WSL:

```
shared_preload_libraries = 'age'
```

Restart Postgres, then create the graph from psql:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT create_graph('memory_graph');
```

WSL2 forwards localhost ports to Windows automatically — the .NET app on Windows connects to `Host=localhost;Port=5432` (or whatever port you used).
```shell
# 1. Create the database
createdb -h localhost -U postgres llm_memory
psql -h localhost -U postgres -d llm_memory -f scripts/init-db.sql

# 2. Apply EF Core migrations as superuser. Creates schemas, tables, RLS policies,
#    and the memory_app role with NOBYPASSRLS. (postgres bypasses RLS; memory_app
#    does NOT — your runtime must connect as memory_app.)
export MEMORY_DESIGN_CONNSTR="Host=localhost;Port=5432;Database=llm_memory;Username=postgres;Password=YOUR_PASS"
dotnet ef database update --project src/Memory.Storage

# 3. Seed an Org / User / Project — the tenant scope every request will run under.
dotnet run --project src/Memory.Cli -- init \
  --connection-string "$MEMORY_DESIGN_CONNSTR" \
  --org "MyOrg" --user-email "me@example.com" --user-name "Me" \
  --project "default" --embedding-model "text-embedding-3-large"
# → prints three GUIDs (org / user / project)

# 4. Mint an API key for /api/* and /mcp.
dotnet run --project src/Memory.Cli -- api-key create \
  --connection-string "$MEMORY_DESIGN_CONNSTR" \
  --org <org-guid> --user <user-guid> --project <project-guid> \
  --name "claude-code"
# Add --admin if this key should be allowed to call /api/secrets/*.

# 5. Configure Memory.Api appsettings.Local.json (copy from .example, fill in
#    LLM creds and the runtime connection string with Username=memory_app).

# 6. Run.
dotnet run --project Memory.AppHost
```

The Aspire dashboard prints the URLs. Click memory-api; visit `/`, `/openapi/v1.json`, and `/api/health`. The Blazor admin UI runs on Memory.Web.
For Claude Code / Codex CLI:
```shell
dotnet build src/Memory.Mcp.Stdio
cp .mcp.json.example .mcp.json    # then fill in your bearer token
```

The MCP server registers save_episode, search_memory, get_entity, reflect, find_related_notes — and they appear as tools in your client.
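For orientation, a minimal `.mcp.json` entry might look like the fragment below. The server name, env-var name, and token placeholder are assumptions; copy `.mcp.json.example` rather than this sketch:

```json
{
  "mcpServers": {
    "llm-memory": {
      "command": "dotnet",
      "args": ["run", "--project", "src/Memory.Mcp.Stdio"],
      "env": { "MEMORY_API_KEY": "<bearer-token-from-api-key-create>" }
    }
  }
}
```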
All optional, all opt-in via appsettings.Local.json or env vars.
| Section / env var | Effect |
|---|---|
| `Linking:Enabled=true` | A-MEM auto-link new notes to similar prior notes |
| `Reranker:Enabled=true` | LLM reranker scores top fused candidates (off → RRF order) |
| `GraphRetrieval:Enabled=true` | PPR over the entity graph as a 3rd RRF stream |
| `TimeDecay:Enabled=true` (`HalfLifeDays=30`) | Recency boost on hits before reranking |
| `QueryExpansion:Enabled=true` (`MaxQueryWords=4`) | LLM rewrites short queries into 2-3 variants |
| `SaveFilter:Enabled=true` (`MinScore=0.30`) | LLM judges importance pre-ingest; drops noise |
| `Cors:AllowedOrigins:[…]` | Production CORS allowlist (dev allows everything) |
| `ReflectionSchedule:Enabled=true` + `Tenants:[…]` | BG synthesis per tenant on a fixed interval |
| `MEMORY_KV_URI` | Pull secrets from Azure Key Vault (DefaultAzureCredential) |
| `MEMORY_BAO_ADDR` + `MEMORY_BAO_TOKEN` | Pull secrets from OpenBao / HashiCorp Vault |
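The `TimeDecay` option above is a classic half-life curve. A sketch of the math (applying it as a multiplicative factor on the fused score is an assumption; the project may combine scores differently):

```python
def time_decay(score: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Halve a hit's recency weight every half_life_days."""
    return score * 0.5 ** (age_days / half_life_days)
```

With `HalfLifeDays=30`, a month-old note keeps half its weight and a two-month-old note a quarter, so recency nudges the ranking without drowning out relevance.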
The secret-source chain is JSON < OpenBao < Azure KV (later wins); each layer is opt-in by setting its env vars.
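The "later wins" chain can be pictured as successive dict merges. A flat-key sketch only (the real implementation is an `IConfigurationSource` chain in Memory.Secrets, and .NET config keys are hierarchical):

```python
def effective_config(json_cfg: dict, bao_cfg: dict, kv_cfg: dict) -> dict:
    """JSON < OpenBao < Azure KV: later layers override earlier ones, key by key."""
    merged: dict = {}
    for layer in (json_cfg, bao_cfg, kv_cfg):   # lowest precedence first
        merged.update(layer)
    return merged
```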
```
src/
  Memory.Domain/           entities, typed IDs, NoteKind enum
  Memory.Tenancy/          ITenantContext + AmbientTenantContext (AsyncLocal)
  Memory.Storage/          EF Core DbContext, AGE Cypher wrapper, RLS interceptor
  Memory.Llm/              5-provider gateway over Microsoft.Extensions.AI
  Memory.Pipeline/         ingest, search (vector+BM25+PPR+rerank), reflect,
                           save-filter, query-expansion, A-MEM linker
  Memory.Mcp/              MCP tools shared across stdio + HTTP
  Memory.Mcp.Stdio/        console exe — local MCP for Claude Code / Codex CLI
  Memory.Cli/              memory CLI — init, api-key, backup, chat, tenants, eval
  Memory.Api/              ASP.NET Core: REST + MCP HTTP + secrets admin
  Memory.Web/              Blazor WASM — Search, Notes, Entities, Graph, Secrets
  Memory.Secrets/          IConfigurationSource chain: Azure KV / OpenBao / JSON
  Memory.ServiceDefaults/  Aspire shared OpenTelemetry / resilience
  Memory.AppHost/          .NET Aspire orchestration
tests/                     Domain / Storage / Pipeline / E2E
scripts/                   init-db.sql, smoke-test-api.sh, smoke-test-mcp.sh
docs/                      seven focused docs — see Documentation below
```
Built-in eval harness for measuring search quality:
```shell
# 1. Sample N recent notes; LLM writes one realistic query per note (gold = note id)
memory eval gen-queries --count 30 --out eval-queries.json

# 2. Replay through /api/search; print Recall@K + MRR
memory eval run --top-k 10
```

Run before/after a pipeline tweak (Reranker / GraphRetrieval / QueryExpansion / TimeDecay env vars) to see the actual delta. Without numbers, every "improvement" is a guess.
| Command | Purpose |
|---|---|
| `memory init` | Seed an organization / user / project tenant scope. |
| `memory api-key {create,list,revoke}` | Manage Memory.Api bearer-token API keys. `--admin` marks a key as eligible for `/api/secrets/*`. |
| `memory backup {dump,restore,download}` | Tenant data backup. `dump`/`restore` = direct DB JSON; `download` = HTTP-streamed `.zip` from any deploy. |
| `memory chat` | Conversational REPL against `/api/search` etc. |
| `memory tenants {provision-schema,list,drop-schema}` | Schema-per-org tenancy foundation. |
| `memory eval {gen-queries,run}` | Retrieval evaluation (Recall@K + MRR). |

`memory help` (or `memory <cmd> help`) prints flags.
Three layers of coverage, gated so a fresh checkout doesn't need any creds:
- Pure unit (`Memory.Domain.Tests`, `Memory.Llm.Tests`, `Memory.Api.Tests`, fast tests in `Memory.Pipeline.Tests`): no DB, no LLM, no network. RRF fusion math, slug generation, API-key hashing, admin-scope filter, LLM gateway provider routing, etc. ~46 tests; runs in seconds. `dotnet test --filter "FullyQualifiedName!~Live"`
- Live integration (`LivePg`-flagged tests in `Memory.Storage.Tests` and `Memory.Pipeline.Tests`): hit the user's local Postgres + AGE + pgvector. Provision via `MEMORY_TEST_PG_PASSWORD` / `MEMORY_TEST_PG_PORT` env vars. By project rule: real DB, no Testcontainers.
- Live LLM (`LiveLlm`-flagged tests): real provider calls (no mocks of `IChatClient` / `IEmbeddingGenerator`). Gated by `MEMORY_LIVE_LLM_TESTS=1` so they never fire by accident — token cost is real.
scripts/smoke-test-api.sh is the comprehensive E2E probe — health, list
endpoints, search variants, faceted filters, expansion, RLS isolation.
scripts/smoke-test-mcp.sh exercises the stdio MCP path: initialize →
save_episode → search_memory → reflect.
Detailed docs live under docs/:
| Doc | What it covers |
|---|---|
| ARCHITECTURE.md | Module map, data flow diagrams for save_episode and search_memory, key invariants, where state lives, operational gotchas. |
| API.md | Every HTTP endpoint with curl examples — health, search, ingest, streaming chat, webhooks, eval, secrets admin, backup, MCP transport. |
| CONFIGURATION.md | Every config section + env var. Defaults, sources, the secret-source chain (Azure KV → OpenBao → JSON). |
| MCP-INTEGRATION.md | How to wire to Claude Code, Codex CLI, Cursor, Continue, ChatGPT desktop. Cross-model usage patterns. |
| USE-CASES.md | Practical setups for programming notes, health log, personal life, research, shared collaboration. |
| PRIVACY.md | What leaves your machine, by default. Per-provider retention. Recommended setups for sensitive content. Threat model. |
| OPERATIONS.md | Daily start-up, healthcheck, mint API keys, backup/restore, Markdown round-trip, eval, migrations, OpenBao + Azure KV ops, troubleshooting. |
Release notes for each version: CHANGELOG.md.
Found a vulnerability? See SECURITY.md. Tenant isolation, key hashing, admin-scoped secret endpoints, and the secret-source chain are the load-bearing pieces — please report cleanly before opening a public issue.
MIT — © 2026 Damian Tarnowski. Use it, fork it, ship it. No warranty.