docs(plans): mem0-v3-locomo activation plan + NEEDS_APPROVAL on license #70

Merged
OpenCircuitDev merged 1 commit into main from plan/mem0-v3-locomo-activation
Jun 11, 2026
@OpenCircuitDev

Owner

Summary

Track 2, item 1 from docs/AGENT_OPERATIONS.md: a plan-only deliverable for activating the INACTIVE mem0-v3-locomo sandbox. No code. The plan is at docs/superpowers/plans/2026-06-11-mem0-v3-locomo-activation-plan.md.

Plan scope (5 sections per the Task 3 directive)

  1. LoCoMo dataset acquisition: repo, file, size, license, access requirements.
  2. Harness design: files + flow, reusing the patterns from mem0-library-retrieval-recall (#56, "feat(bench): flip mem0-library-retrieval-recall to ACTIVE + discovered upstream bug", just merged) and amnesia-ab.
  3. Hardware/runtime on the operator's dev box (Ollama llama3 8B Q4).
  4. Mapping onto the locked expected.json contract: ≥88 confirm, <80 refute.
  5. Wall-time + blockers.

Plus 7 bite-sized tasks (Task 0 = verify upstream, Task 7 = run the verdict).
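
The conversation-to-memory mapping in the harness design (one Mem0 user_id per conversation, one memory per turn carrying dia_id and session metadata) might look roughly like the sketch below. It only builds the records; the actual Mem0 calls are left out. The function name and the turn field names (`speaker`, `text`, `dia_id`) are assumptions for illustration, not taken from the plan or verified against locomo10.json.

```python
from typing import Any


def conversation_to_memories(
    sample_id: str,
    sessions: dict[str, list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Flatten one conversation into per-turn memory records.

    Assumed input shape (hypothetical): {"session_1": [turn, ...], ...},
    where each turn has "speaker", "text", and "dia_id" keys.
    """
    records: list[dict[str, Any]] = []
    for session_name, turns in sessions.items():
        for turn in turns:
            records.append(
                {
                    # One user_id per conversation keeps retrieval scoped to it.
                    "user_id": sample_id,
                    # Stored verbatim; the plan uses infer=False, so no LLM rewrite.
                    "text": f'{turn["speaker"]}: {turn["text"]}',
                    "metadata": {
                        "dia_id": turn["dia_id"],
                        "session": session_name,
                    },
                }
            )
    return records
```

Each record would then be handed to Mem0 with infer=False, and the dia_id in the metadata is what lets a retrieved memory be matched back to a specific turn.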

NEEDS_APPROVAL — operator decision required before Tasks 2+ can run

LoCoMo ships under Creative Commons Attribution-NonCommercial 4.0 International (verified by API fetch of LICENSE.txt on snap-research/locomo@main). The OCM repo is Apache 2.0. This intersection is the only spec-touch decision and is flagged as a 4-option multi-choice in §1 of the plan.

The plan-side default is (a) fetch-on-run, no bundle — keeps the repo's Apache 2.0 file surface clean and the dataset never ships inside OCM. SHA-pinned URL for reproducibility. ~3 MB per fresh-cache run.

Operator can pick (b) bundle with attribution, (c) synthetic proxy, or (d) defer. Switching costs are documented per-option.
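
As a rough illustration of option (a), a fetch-on-run helper could pin the download to a commit and verify a checksum before caching outside the repo. This is a minimal sketch, not the plan's code: `PINNED_COMMIT` and `EXPECTED_SHA256` are placeholders (neither value appears in this PR), and the function names are hypothetical.

```python
import hashlib
import urllib.request
from pathlib import Path

# Placeholders: the real commit SHA and digest would come from the plan.
PINNED_COMMIT = "<commit-sha>"
EXPECTED_SHA256 = "<sha256-of-locomo10.json>"


def pinned_url(commit: str) -> str:
    """Raw-file URL for data/locomo10.json at a fixed commit."""
    return (
        "https://raw.githubusercontent.com/snap-research/locomo/"
        f"{commit}/data/locomo10.json"
    )


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_dataset(cache_path: Path, commit: str, expected_sha256: str) -> bytes:
    """Return the dataset bytes, downloading only on a cold or stale cache."""
    if cache_path.exists():
        data = cache_path.read_bytes()
        if sha256_hex(data) == expected_sha256:
            return data  # warm cache, no network access
    with urllib.request.urlopen(pinned_url(commit), timeout=60) as resp:
        data = resp.read()
    if sha256_hex(data) != expected_sha256:
        raise ValueError("locomo10.json checksum mismatch; refusing to cache")
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

Keeping `cache_path` outside the tracked tree (or git-ignored) is what keeps the CC BY-NC file out of the Apache 2.0 file surface.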

Key facts the plan locked down

| Aspect | Value | Notes |
| --- | --- | --- |
| Dataset | data/locomo10.json on snap-research/locomo | 10 conversations × ~300 turns × 9K tokens avg |
| Size | 2.81 MB (2,805,274 bytes) | Single HTTPS GET; NOT >1GB |
| Auth | None | No account, no API key, no payment |
| License | CC BY-NC 4.0 | The one operator decision |
| Embedder | sentence-transformers/all-MiniLM-L6-v2 | Same as #56's hermetic config |
| Vector store | faiss-cpu | Same as #56 |
| Mem0 mode | add(infer=False) + search(filters={"user_id": ...}, top_k=10) | Same as #56 |
| LLM (--with-llm only) | Ollama llama3 8B Q4 via host.docker.internal:11434 | Same as amnesia-ab |
| Verdict run wall-time | ~10-15 min for the 3-repeat run | Under the 1800s timeout_seconds |
| --with-llm diagnostic | OFF by default; ~30-50 min when on | Operator-asked-for E2E path, kept out of the contract metric |

Mapping onto the contract (no expected.json changes)

| Field | Where the value comes from |
| --- | --- |
| primary_value (locomo_recall_score) | Mean per-conversation recall × 100 (0-100 scale matching the published 91.6) |
| secondary_value (tokens_retrieved_p50) | True per-QA p50 across all questions (Task 5 fixes the naive per-conv approximation) |
| status flip INACTIVE → ACTIVE | Task 6, in the same commit that lands bench.py + docker-compose.yml |
| blocked_on: ["...MCP local mode not yet packaged..."] | Reframed in README: that's a daemon concern, not a verdict blocker; this sandbox tests the library-driven retrieval pattern (Mem0's Memory.from_config(), same as #56) |
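
The mapping above can be sketched in a few lines. This is an illustrative reading of the contract, not the harness code: the function names are hypothetical, p50 is taken to be the median, and the "inconclusive" label for scores between the two thresholds is an assumption (the PR only states ≥88 confirm and <80 refute).

```python
from statistics import mean, median

CONFIRM_AT = 88.0    # score >= 88 confirms
REFUTE_BELOW = 80.0  # score < 80 refutes


def primary_value(per_conv_recall: list[float]) -> float:
    """Mean per-conversation recall (each in 0..1), scaled to 0-100."""
    return mean(per_conv_recall) * 100.0


def secondary_value(tokens_per_qa_by_conv: list[list[int]]) -> float:
    """True per-QA p50: pool every question's token count, then take the median."""
    pooled = [t for conv in tokens_per_qa_by_conv for t in conv]
    return median(pooled)


def naive_secondary_value(tokens_per_qa_by_conv: list[list[int]]) -> float:
    """Median of per-conversation medians; shown only for contrast."""
    return median(median(conv) for conv in tokens_per_qa_by_conv)


def verdict(score: float) -> str:
    if score >= CONFIRM_AT:
        return "confirm"
    if score < REFUTE_BELOW:
        return "refute"
    return "inconclusive"  # assumed label for the 80-88 band
```

The two p50 variants diverge when conversations have different numbers of questions, because the naive version weights every conversation equally regardless of how many QAs it contributes.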

Test plan

This PR is documentation. No CI tests apply. The plan's own tasks (when run later) follow TDD as written.

  • Plan covers all 5 sections of the directive (self-review checklist at end of doc)
  • LoCoMo dataset facts verified by live API fetch against snap-research/locomo
  • License determined by fetching + base64-decoding LICENSE.txt from the repo
  • Patterns from #56 ("feat(bench): flip mem0-library-retrieval-recall to ACTIVE + discovered upstream bug") + amnesia-ab cited with file paths
  • expected.json contract field-by-field mapping
  • Wall-time estimate broken down per phase
  • NEEDS_APPROVAL formatted as multi-choice per directive
  • Operator picks (a/b/c/d) on Q1 ← unblocks Tasks 2-7

What this PR does NOT do

  • No code changes (consistent with "plan only, no execution" in the directive).
  • No expected.json changes — the contract is already locked.
  • No dataset download — that happens in Task 2 only after Q1 is resolved.

🤖 Generated with Claude Code

…ense

Plan-only deliverable (Task 3 per Pilot directive) for activating the
INACTIVE mem0-v3-locomo sandbox. Covers the 5 sections the directive
asked for:

1. LoCoMo dataset facts: snap-research/locomo, locomo10.json = 2.81 MB,
   no auth, single ~3 MB HTTPS GET. **License = CC BY-NC 4.0** —
   NEEDS_APPROVAL surfaced as Q1 with a 4-option multi-choice block (the
   only spec-touch decision; plan-side default = (a) fetch-on-run, no
   bundle, which keeps the repo's Apache 2.0 surface clean).

2. Harness design: file structure + conversation→memory mapping (1
   Mem0 user_id per conversation, 1 memory per turn with
   metadata={"dia_id","session"} via infer=False), reusing #56's
   hermetic Mem0 config (huggingface MiniLM + faiss-cpu) and
   amnesia-ab's Ollama-via-host.docker.internal pattern. Optional
   --with-llm diagnostic for the operator-asked-for E2E path; default
   off so the verdict run stays under 5 min/repeat.

3. Hardware: Windows + Ollama 127.0.0.1:11434 + llama3 8B Q4 (when
   --with-llm on) + MiniLM-L6 for Mem0 retrieval. No new infrastructure;
   identical to amnesia-ab + #56 combined. README correction flagged:
   the spec wording "Qwen3 8B" pre-dates the Ollama wiring landed in
   v0.1.1; current model is llama3 8B Q4 — Task 6 fixes the README.

4. Maps onto the locked expected.json contract field-by-field:
   primary_value = mean per-conv recall × 100 (0-100 scale matching the
   published 91.6); secondary_value = true per-QA p50 tokens across all
   QAs; verdict logic walked through and variance bound argued.

5. Wall-time: ~2.5-4 min per repeat (retrieval-only), ~10-15 min for
   3-repeat verdict; --with-llm adds ~30-50 min. Comfortably under
   1800s timeout. Blockers list reconciled — the "OpenMemory MCP
   packaging" item in expected.json.blocked_on is a daemon-side concern,
   not a verdict blocker; the README clarifies this.

Plus 7 bite-sized tasks (Task 0 verify upstream → Task 7 run the
verdict). Task 1 is the operator-coordination NEEDS_APPROVAL gate;
Tasks 2-7 are blocked on Q1 resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenCircuitDev merged commit 9726862 into main on Jun 11, 2026