docs(plans): mem0-v3-locomo activation plan + NEEDS_APPROVAL on license by OpenCircuitDev · Pull Request #70 · OpenCircuitDev/opencircuitmodel

OpenCircuitDev · 2026-06-11T22:59:30Z

Summary

Track 2, item 1 from docs/AGENT_OPERATIONS.md — plan-only deliverable for activating the INACTIVE mem0-v3-locomo sandbox. No code. The plan is at docs/superpowers/plans/2026-06-11-mem0-v3-locomo-activation-plan.md.

Plan scope (5 sections per the Task 3 directive)

1. LoCoMo dataset acquisition: repo, file, size, license, access requirements.
2. Harness design: files + flow, reusing the patterns from mem0-library-retrieval-recall (#56, "feat(bench): flip mem0-library-retrieval-recall to ACTIVE + discovered upstream bug", just merged) and amnesia-ab.
3. Hardware/runtime on the operator's dev box (Ollama llama3 8B Q4).
4. Mapping onto the locked expected.json contract: ≥88 confirm, <80 refute.
5. Wall-time + blockers.

Plus 7 bite-sized tasks (Task 0 = verify upstream, Task 7 = run the verdict).
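
The contract thresholds listed above (≥88 confirm, <80 refute) can be sketched as a small classifier. This is a minimal illustration, not the harness code: the function name is hypothetical, and the label for the 80-88 band is an assumption, since this summary only names the confirm and refute outcomes.

```python
CONFIRM_AT_OR_ABOVE = 88.0  # from the plan: scores >= 88 confirm
REFUTE_BELOW = 80.0         # from the plan: scores < 80 refute


def classify_verdict(locomo_recall_score: float) -> str:
    """Map a 0-100 recall score onto a verdict label."""
    if locomo_recall_score >= CONFIRM_AT_OR_ABOVE:
        return "confirm"
    if locomo_recall_score < REFUTE_BELOW:
        return "refute"
    # Assumption: the summary does not name the middle band.
    return "inconclusive"
```

Note that the boundaries are asymmetric: exactly 88 confirms, while exactly 80 does not refute.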

NEEDS_APPROVAL — operator decision required before Tasks 2+ can run

LoCoMo ships under Creative Commons Attribution-NonCommercial 4.0 International (verified by API fetch of LICENSE.txt on snap-research/locomo@main). The OCM repo is Apache 2.0. This license mismatch is the only spec-touch decision and is flagged as a 4-option multi-choice in §1 of the plan.

The plan-side default is (a) fetch-on-run, no bundle — keeps the repo's Apache 2.0 file surface clean and the dataset never ships inside OCM. SHA-pinned URL for reproducibility. ~3 MB per fresh-cache run.

Operator can pick (b) bundle with attribution, (c) synthetic proxy, or (d) defer. Switching costs are documented per-option.
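
Option (a) could look roughly like the sketch below: download the file from a commit-pinned raw URL into a local cache and check its size against the figure the plan reports. Everything here is an assumption for illustration: the function names, the cache layout, and the `<commit-sha>` placeholder (this summary does not state the pinned SHA). The `fetch` parameter exists only so the logic can be exercised without network access.

```python
import urllib.request
from pathlib import Path
from typing import Callable

# Placeholder: the plan pins a commit SHA, but this summary does not state it.
PINNED_COMMIT = "<commit-sha>"
DATASET_URL = (
    "https://raw.githubusercontent.com/snap-research/locomo/"
    f"{PINNED_COMMIT}/data/locomo10.json"
)
EXPECTED_BYTES = 2_805_274  # file size reported in the plan


def _http_get(url: str) -> bytes:
    """Single HTTPS GET; no auth is required for this dataset."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_locomo(
    cache_dir: Path,
    expected_bytes: int = EXPECTED_BYTES,
    url: str = DATASET_URL,
    fetch: Callable[[str], bytes] = _http_get,
) -> Path:
    """Return the cached dataset path, downloading only on a cold cache."""
    target = cache_dir / "locomo10.json"
    if target.exists() and target.stat().st_size == expected_bytes:
        return target  # warm cache: nothing is downloaded
    payload = fetch(url)
    if len(payload) != expected_bytes:
        raise ValueError(
            f"unexpected size {len(payload)} bytes, wanted {expected_bytes}"
        )
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target
```

Keeping `cache_dir` outside the tracked tree (or git-ignored) is what keeps the CC BY-NC file out of the Apache 2.0 repository.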

Key facts the plan locked down

| Aspect | Value | Notes |
| --- | --- | --- |
| Dataset | `data/locomo10.json` on `snap-research/locomo` | 10 conversations, each averaging ~300 turns and ~9K tokens |
| Size | 2.81 MB (2,805,274 bytes) | Single HTTPS GET; NOT >1GB |
| Auth | None | No account, no API key, no payment |
| License | CC BY-NC 4.0 | The one operator decision |
| Embedder | `sentence-transformers/all-MiniLM-L6-v2` | Same as #56's hermetic config |
| Vector store | `faiss-cpu` | Same as #56 |
| Mem0 mode | `add(infer=False)` + `search(filters={"user_id": ...}, top_k=10)` | Same as #56 |
| LLM (`--with-llm` only) | Ollama llama3 8B Q4 via `host.docker.internal:11434` | Same as `amnesia-ab` |
| Verdict run wall-time | ~10-15 min for the 3-repeat run | Under the 1800s `timeout_seconds` |
| `--with-llm` diagnostic | OFF by default; ~30-50 min when on | Operator-asked-for E2E path, kept out of the contract metric |
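
The commit message describes the ingestion side of this setup as one Mem0 `user_id` per conversation and one memory per turn, tagged with `metadata={"dia_id","session"}`. A rough sketch of that mapping, as a pure function that prepares records without calling Mem0, might look like this. The JSON field names (`sample_id`, `conversation`, `session_N`, `speaker`, `dia_id`, `text`) are assumptions about the layout of `locomo10.json`, and the function name and `user_id` format are hypothetical.

```python
from typing import Any, Iterator


def iter_turn_memories(sample: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one memory record per dialogue turn of a single conversation."""
    conversation = sample["conversation"]
    user_id = f"locomo-{sample['sample_id']}"  # one user_id per conversation
    # Assumed layout: "session_1", "session_2", ... hold lists of turns, while
    # sibling keys such as "session_1_date_time" hold strings and are skipped.
    session_keys = sorted(
        (
            key
            for key, value in conversation.items()
            if key.startswith("session_") and isinstance(value, list)
        ),
        key=lambda key: int(key.split("_")[1]),
    )
    for session in session_keys:
        for turn in conversation[session]:
            yield {
                "user_id": user_id,
                "text": f"{turn['speaker']}: {turn['text']}",
                "metadata": {"dia_id": turn["dia_id"], "session": session},
            }
```

Each record carries what an `add(infer=False)` call would need; storing turns verbatim is what keeps the verdict run independent of an LLM.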

Mapping onto the contract (no expected.json changes)

| Field | Where the value comes from |
| --- | --- |
| `primary_value` (`locomo_recall_score`) | `mean per-conversation recall × 100` (0-100 scale matching the published 91.6) |
| `secondary_value` (`tokens_retrieved_p50`) | True per-QA p50 across all questions (Task 5 fixes the naive per-conv approximation) |
| `status` flip INACTIVE → ACTIVE | Task 6, in the same commit that lands `bench.py` + `docker-compose.yml` |
| `blocked_on: ["...MCP local mode not yet packaged..."]` | Reframed in README: that's a daemon concern, not a verdict blocker; this sandbox tests the library-driven retrieval pattern (Mem0's `Memory.from_config()`, same as #56) |
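
To make the two metric definitions concrete, here is a minimal sketch of how they could be computed. It assumes per-conversation recall is a fraction in [0, 1] and that retrieved-token counts are recorded per question; the function names are hypothetical. The third function shows the naive per-conversation approximation for contrast.

```python
from statistics import mean, median


def primary_value(per_conv_recall: list[float]) -> float:
    """Mean per-conversation recall, scaled to 0-100."""
    return mean(per_conv_recall) * 100


def tokens_retrieved_p50(tokens_per_qa_by_conv: list[list[int]]) -> float:
    """True per-QA p50: pool every question first, then take the median."""
    pooled = [tokens for conv in tokens_per_qa_by_conv for tokens in conv]
    return median(pooled)


def naive_per_conv_p50(tokens_per_qa_by_conv: list[list[int]]) -> float:
    """Naive approximation: median of the per-conversation medians."""
    return median(median(conv) for conv in tokens_per_qa_by_conv)


# Toy data: one conversation with five questions, one with a single question.
toy = [[100, 100, 100, 100, 100], [900]]
print(tokens_retrieved_p50(toy))  # 100.0
print(naive_per_conv_p50(toy))    # 500.0
```

The toy data shows why the two can diverge: the naive version gives a one-question conversation the same weight as a five-question one.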

Test plan

This PR is documentation. No CI tests apply. The plan's own tasks (when run later) follow TDD as written.

Plan covers all 5 sections of the directive (self-review checklist at end of doc)
LoCoMo dataset facts verified by live API fetch against snap-research/locomo
License determined by fetching + base64-decoding LICENSE.txt from the repo
Patterns from #56 ("feat(bench): flip mem0-library-retrieval-recall to ACTIVE + discovered upstream bug") and amnesia-ab cited with file paths
expected.json contract field-by-field mapping
Wall-time estimate broken down per phase
NEEDS_APPROVAL formatted as multi-choice per directive
Operator picks (a/b/c/d) on Q1 ← unblocks Tasks 2-7
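
The license check in the list above ("fetching + base64-decoding LICENSE.txt") can be reproduced along these lines. The PR only says "API fetch"; the sketch assumes this means the GitHub REST contents endpoint, which returns the file body base64-encoded in a JSON payload with `content` and `encoding` fields. The function name is hypothetical, and the example feeds in a locally built payload rather than performing the HTTP request.

```python
import base64
import json


def decode_contents_payload(payload_json: str) -> str:
    """Decode the file text from a contents-API style JSON payload."""
    payload = json.loads(payload_json)
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # The encoded body may be wrapped with newlines; b64decode ignores them
    # in its default (non-validating) mode.
    return base64.b64decode(payload["content"]).decode("utf-8")


# Synthetic payload standing in for the real API response.
example = json.dumps(
    {
        "encoding": "base64",
        "content": base64.encodebytes(b"Attribution-NonCommercial 4.0\n").decode(),
    }
)
print("NonCommercial" in decode_contents_payload(example))  # True
```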

What this PR does NOT do

No code changes (consistent with "plan only, no execution" in the directive).
No expected.json changes — the contract is already locked.
No dataset download — that happens in Task 2 only after Q1 is resolved.

🤖 Generated with Claude Code

…ense

Plan-only deliverable (Task 3 per Pilot directive) for activating the INACTIVE mem0-v3-locomo sandbox. Covers the 5 sections the directive asked for:

1. LoCoMo dataset facts: snap-research/locomo, locomo10.json = 2.81 MB, no auth, single ~3 MB HTTPS GET. **License = CC BY-NC 4.0**: NEEDS_APPROVAL surfaced as Q1 with a 4-option multi-choice block (the only spec-touch decision; plan-side default = (a) fetch-on-run, no bundle, which keeps the repo's Apache 2.0 surface clean).
2. Harness design: file structure + conversation→memory mapping (1 Mem0 user_id per conversation, 1 memory per turn with metadata={"dia_id","session"} via infer=False), reusing #56's hermetic Mem0 config (huggingface MiniLM + faiss-cpu) and amnesia-ab's Ollama-via-host.docker.internal pattern. Optional --with-llm diagnostic for the operator-asked-for E2E path; default off so the verdict run stays under 5 min/repeat.
3. Hardware: Windows + Ollama 127.0.0.1:11434 + llama3 8B Q4 (when --with-llm on) + MiniLM-L6 for Mem0 retrieval. No new infrastructure; identical to amnesia-ab + #56 combined. README correction flagged: the spec wording "Qwen3 8B" pre-dates the Ollama wiring landed in v0.1.1; current model is llama3 8B Q4. Task 6 fixes the README.
4. Maps onto the locked expected.json contract field-by-field: primary_value = mean per-conv recall × 100 (0-100 scale matching the published 91.6); secondary_value = true per-QA p50 tokens across all QAs; verdict logic walked through and variance bound argued.
5. Wall-time: ~2.5-4 min per repeat (retrieval-only), ~10-15 min for 3-repeat verdict; --with-llm adds ~30-50 min. Comfortably under 1800s timeout. Blockers list reconciled: the "OpenMemory MCP packaging" item in expected.json.blocked_on is a daemon-side concern, not a verdict blocker; the README clarifies this.

Plus 7 bite-sized tasks (Task 0 verify upstream → Task 7 run the verdict). Task 1 is the operator-coordination NEEDS_APPROVAL gate; Tasks 2-7 are blocked on Q1 resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

OpenCircuitDev merged commit 9726862 into main Jun 11, 2026

OpenCircuitDev merged 1 commit into main from plan/mem0-v3-locomo-activation
