feat(scripts): audit-bd-memories — near-duplicate + retired-surface scanner (soc-lgq4 #memory-audit)#377
Open
boshu2 wants to merge 2 commits into
fbf987a to
086329e
…canner (soc-lgq4 #memory-audit)

154+ bd memories accumulating. This script computes pairwise Jaccard on content tokens (default threshold 0.65) and scans for retired-surface keywords. Emits markdown report at .agents/audits/bd-memories-<date>.md; does not auto-delete (operator runs bd forget selectively).

Flags: --threshold --out --stdout --retired --no-retired --no-dups --json.

Pure shell + awk; bd stubbed in tests for determinism. Smoke against real memories surfaced 3 near-dup pairs + 23 retired-surface mentions.

Closes-scenario: soc-lgq4#memory-audit
Bounded-context: BC1-Corpus
Evidence: scripts/audit-bd-memories.sh
Evidence: tests/scripts/audit-bd-memories.bats
086329e to
290a0ea
Owner
Author
Parking pending soc-g2qd (/evolve --mode=loop epic). Both this PR's surface and the epic touch overlapping infrastructure; resolving the upstream contract first avoids rebase churn. Will rebase + resume after epic ships.
Why
154+ bd memories as of 2026-05-20. Without curation, recall quality degrades — same lesson stored multiple ways, and old lessons reference surfaces that have been retired (Ollama, shepherd-cron, gemma pipelines, etc.). No existing tool surfaces these candidates.
What
`scripts/audit-bd-memories.sh`: a pure-shell audit that:
- Parses `bd memories` output (text, not JSON; `bd memories` doesn't support `--json` today).
- Computes pairwise Jaccard similarity on content tokens (default threshold 0.65).
- Scans for retired-surface keywords (default: `ollama`, `shepherd-cron`, `openclaw`, `gemma`, `morai-codex`, `d:\\dream`, `dreamworker`; override with `--retired <csv>`).
- Emits a markdown report at `.agents/audits/bd-memories-<date>.md` with Summary, Near-duplicates, and Retired-surface-candidates sections.

Does NOT auto-delete. Operator reviews and selectively runs `bd forget <key>`.

Flags:
- `--threshold <0..1>`
- `--out <path>`
- `--stdout`
- `--retired <csv>`
- `--no-retired`
- `--no-dups`
- `--json`

Test
11 bats tests, all passing. They cover: empty corpus, count parsing on a 4-memory fixture, default file-write path, `--stdout`, near-dup detection, threshold sensitivity (0.99 rejects 0.75 Jaccard), `--no-dups`/`--no-retired` suppression, `--retired` override, unknown-flag error, and missing-`bd`-binary error. The tests stub `bd` via PATH for deterministic input.

Live signal
Smoke-run against this repo's real `bd memories` (154 memories) detected 3 near-duplicate pairs and 23 retired-surface mentions: an actionable backlog for the next memory-curation pass.

Performance
154 memories compared pairwise is 154 × 153 / 2 = 11,781 (~12k) Jaccard computations. Wall-clock is ~3 min on this box. Acceptable for an audit; not in any hot path.
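For reviewers who want to see what each pairwise comparison costs, here is a minimal sketch of token-set Jaccard in shell + awk. It is not the code in `scripts/audit-bd-memories.sh`: the `jaccard` function name and the tokenizer (lowercase, split on runs of non-alphanumeric characters) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT copied from scripts/audit-bd-memories.sh.
# Prints the Jaccard similarity |A ∩ B| / |A ∪ B| of the token sets of
# two strings, formatted to four decimal places.
jaccard() {
  # Pass the strings through the environment so awk does not interpret
  # backslash escapes in them (as it would with -v assignments).
  A="$1" B="$2" awk 'BEGIN {
    # Assumed tokenizer: lowercase, split on non-alphanumeric runs.
    na = split(tolower(ENVIRON["A"]), ta, /[^a-z0-9]+/)
    nb = split(tolower(ENVIRON["B"]), tb, /[^a-z0-9]+/)
    # Build sets; skip empty tokens produced by leading/trailing separators.
    for (i = 1; i <= na; i++) if (ta[i] != "") seta[ta[i]] = 1
    for (i = 1; i <= nb; i++) if (tb[i] != "") setb[tb[i]] = 1
    inter = 0; uni = 0
    for (t in seta) { uni++; if (t in setb) inter++ }
    for (t in setb) if (!(t in seta)) uni++
    # Two empty token sets are treated as similarity 0 (a choice, not a rule).
    printf "%.4f\n", (uni ? inter / uni : 0)
  }'
}

# Token sets {a,b,c} and {b,c,d}: 2 shared tokens out of 4 distinct.
jaccard "a b c" "b c d"   # prints 0.5000
```

A pair would be reported when this value meets or exceeds the threshold (0.65 by default per the description above). Spawning one awk process per pair, as this sketch does, would be slow at ~12k pairs; a single awk pass that holds all token sets in memory is the more likely shape for a real run.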
Closes-scenario: soc-lgq4#memory-audit
Bounded-context: BC1-Corpus
Evidence: scripts/audit-bd-memories.sh
Evidence: tests/scripts/audit-bd-memories.bats