docs(plans): mem0-v3-locomo activation plan + NEEDS_APPROVAL on license #70
Merged
Plan-only deliverable (Task 3 per Pilot directive) for activating the
INACTIVE mem0-v3-locomo sandbox. Covers the 5 sections the directive
asked for:
1. LoCoMo dataset facts: snap-research/locomo, locomo10.json = 2.81 MB,
no auth, single ~3 MB HTTPS GET. **License = CC BY-NC 4.0** —
NEEDS_APPROVAL surfaced as Q1 with a 4-option multi-choice block (the
only spec-touch decision; plan-side default = (a) fetch-on-run, no
bundle, which keeps the repo's Apache 2.0 surface clean).
2. Harness design: file structure + conversation→memory mapping (1
Mem0 user_id per conversation, 1 memory per turn with
metadata={"dia_id","session"} via infer=False), reusing #56's
hermetic Mem0 config (huggingface MiniLM + faiss-cpu) and
amnesia-ab's Ollama-via-host.docker.internal pattern. Optional
--with-llm diagnostic for the operator-asked-for E2E path; default
off so the verdict run stays under 5 min/repeat.
3. Hardware: Windows + Ollama 127.0.0.1:11434 + llama3 8B Q4 (when
--with-llm on) + MiniLM-L6 for Mem0 retrieval. No new infrastructure;
identical to amnesia-ab + #56 combined. README correction flagged:
the spec wording "Qwen3 8B" pre-dates the Ollama wiring landed in
v0.1.1; current model is llama3 8B Q4 — Task 6 fixes the README.
4. Maps onto the locked expected.json contract field-by-field:
primary_value = mean per-conv recall × 100 (0-100 scale matching the
published 91.6); secondary_value = true per-QA p50 tokens across all
QAs; verdict logic walked through and variance bound argued.
5. Wall-time: ~2.5-4 min per repeat (retrieval-only), ~10-15 min for
3-repeat verdict; --with-llm adds ~30-50 min. Comfortably under
1800s timeout. Blockers list reconciled — the "OpenMemory MCP
packaging" item in expected.json.blocked_on is a daemon-side concern,
not a verdict blocker; the README clarifies this.
Plus 7 bite-sized tasks (Task 0 verify upstream → Task 7 run the
verdict). Task 1 is the operator-coordination NEEDS_APPROVAL gate;
Tasks 2-7 are blocked on Q1 resolution.
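
The conversation→memory mapping from item 2 can be sketched as below. This is a minimal illustration, not the harness itself: the function name `turns_to_memory_records` is hypothetical, and the input shape (a `sample_id` plus a `conversation` dict whose `session_N` keys hold lists of turns with `speaker`, `dia_id`, and `text`) is an assumption that should be verified against locomo10.json in Task 0.

```python
from typing import Any, Dict, Iterator


def turns_to_memory_records(sample: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one memory record per dialogue turn of a single conversation.

    Assumed (unverified) input shape:
      {"sample_id": "...",
       "conversation": {"session_1": [{"speaker", "dia_id", "text"}, ...], ...}}
    """
    user_id = str(sample["sample_id"])  # one Mem0 user_id per conversation
    for key, turns in sample["conversation"].items():
        # Skip non-turn entries such as speaker names or session timestamps.
        if not (key.startswith("session_") and isinstance(turns, list)):
            continue
        for turn in turns:
            yield {
                "text": f'{turn["speaker"]}: {turn["text"]}',
                "user_id": user_id,
                # Metadata keys named in the plan: dia_id and session.
                "metadata": {"dia_id": turn["dia_id"], "session": key},
            }
```

Each record would then be handed to Mem0 roughly as `memory.add(rec["text"], user_id=rec["user_id"], metadata=rec["metadata"], infer=False)`, so that one turn becomes exactly one stored memory without LLM-side fact extraction.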
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Track 2, item 1 from docs/AGENT_OPERATIONS.md: plan-only deliverable for activating the INACTIVE mem0-v3-locomo sandbox. No code. The plan is at docs/superpowers/plans/2026-06-11-mem0-v3-locomo-activation-plan.md.
Plan scope (5 sections per the Task 3 directive)
mem0-library-retrieval-recall (feat(bench): flip mem0-library-retrieval-recall to ACTIVE + discovered upstream bug #56, just merged) and amnesia-ab.
expected.json contract: ≥88 confirm, <80 refute.
Plus 7 bite-sized tasks (Task 0 = verify upstream, Task 7 = run the verdict).
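
A rough sketch of how the contract numbers could be computed and judged follows. The thresholds (≥88 confirm, <80 refute), the "mean per-conversation recall × 100" primary value, and the per-QA p50 token count come from this PR; everything else is an assumption made for illustration: the function names, the definition of recall as the share of gold evidence `dia_id`s found among the retrieved ones, and the "inconclusive" label for scores between 80 and 88.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def recall(retrieved_ids: Sequence[str], evidence_ids: Sequence[str]) -> float:
    """Fraction of gold evidence dia_ids present in the retrieved dia_ids.

    This definition of recall is an assumption, not taken from the plan.
    """
    gold = set(evidence_ids)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_ids)) / len(gold)


def primary_value(per_conv_recalls: Dict[str, List[float]]) -> float:
    """Mean of per-conversation mean recall, scaled to 0-100."""
    return 100.0 * mean(mean(vals) for vals in per_conv_recalls.values())


def secondary_value(tokens_per_qa: Sequence[int]) -> float:
    """p50 (median) of retrieved-token counts across all QAs."""
    return float(median(tokens_per_qa))


def verdict(score: float, confirm_at: float = 88.0,
            refute_below: float = 80.0) -> str:
    """Apply the contract thresholds to a 0-100 score."""
    if score >= confirm_at:
        return "confirm"
    if score < refute_below:
        return "refute"
    return "inconclusive"  # hypothetical label for the 80-88 band
```

Averaging within each conversation first gives every conversation equal weight regardless of how many QAs it carries, which is what "mean per-conversation recall" implies.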
NEEDS_APPROVAL — operator decision required before Tasks 2+ can run
LoCoMo ships under Creative Commons Attribution-NonCommercial 4.0 International (verified by API fetch of LICENSE.txt on snap-research/locomo@main). The OCM repo is Apache 2.0. This intersection is the only spec-touch decision and is flagged as a 4-option multi-choice in §1 of the plan.
The plan-side default is (a) fetch-on-run, no bundle: it keeps the repo's Apache 2.0 file surface clean, and the dataset never ships inside OCM. The URL is SHA-pinned for reproducibility. ~3 MB per fresh-cache run.
Operator can pick (b) bundle with attribution, (c) synthetic proxy, or (d) defer. Switching costs are documented per-option.
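
For option (a), a fetch-on-run helper might look like the sketch below. It is illustrative only: `fetch_on_run`, `PINNED_URL`, and the cache location are hypothetical names, `<commit-sha>` is a placeholder because the pinned commit is not stated here, and the content-hash check is an extra safeguard assumed on top of the commit-pinned URL rather than something the plan is known to require.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable

# Placeholder: the real pinned commit SHA is not given in this PR text.
PINNED_URL = (
    "https://raw.githubusercontent.com/snap-research/locomo/"
    "<commit-sha>/data/locomo10.json"
)


def _http_get(url: str) -> bytes:
    """Single unauthenticated HTTPS GET."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_on_run(url: str, expected_sha256: str, cache_path: Path,
                 get: Callable[[str], bytes] = _http_get) -> bytes:
    """Return dataset bytes, downloading only if the cache is missing or corrupt."""
    if cache_path.exists():
        cached = cache_path.read_bytes()
        if hashlib.sha256(cached).hexdigest() == expected_sha256:
            return cached  # warm cache: no network access
    data = get(url)
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch: got {digest}")
    # The cache path is assumed to live outside the tracked repo tree,
    # so the CC BY-NC file never becomes part of the Apache 2.0 surface.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

Passing the downloader in as `get` keeps the helper testable offline and makes the single-GET behaviour easy to assert.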
Key facts the plan locked down
data/locomo10.json on snap-research/locomo
sentence-transformers/all-MiniLM-L6-v2
faiss-cpu
add(infer=False) + search(filters={"user_id": ...}, top_k=10)
(--with-llm only)
host.docker.internal:11434
amnesia-ab
timeout_seconds
--with-llm diagnostic
Mapping onto the contract (no expected.json changes)
primary_value (locomo_recall_score): mean per-conversation recall × 100 (0-100 scale matching the published 91.6)
secondary_value (tokens_retrieved_p50)
status
flip INACTIVE → ACTIVE
bench.py + docker-compose.yml
blocked_on: ["...MCP local mode not yet packaged..."]
Memory.from_config(), same as #56)
Test plan
This PR is documentation. No CI tests apply. The plan's own tasks (when run later) follow TDD as written.
snap-research/locomo
LICENSE.txt from the repo
amnesia-ab cited with file paths
expected.json contract field-by-field mapping
What this PR does NOT do
expected.json changes: the contract is already locked.
🤖 Generated with Claude Code