Persistent SQLite DB index with cache-aware search by tony · Pull Request #47 · tony/agentgrep

tony · 2026-06-06T17:52:17Z

Summary

Add a persistent SQLite DB index (ADR 0005): agentgrep db sync|status|explain materializes discovered agent history into a local WAL-mode database with an FTS5 text index, a source ledger with freshness state, and stable source/record ids. The DB is derived state — Codex, Claude, Cursor, Gemini, Grok, Pi, and OpenCode stores remain the source of truth.
Add cache-aware search execution: grep and search gain --cache auto|require|off (and --no-cache). The default auto serves a query from the DB index only when it can answer it and falls back to the live scanner otherwise; require insists on the DB path, and unsupported cache-required queries fail as clean CLI errors instead of tracebacks.
Add live stderr sync progress with cooperative early exit — Enter stops at the next source boundary and keeps partial results; structured stdout stays machine-readable in JSON and NDJSON modes.
Add cache-fast sync defaults: source_state fingerprint checks skip unchanged sources unless --force is passed, and a source-id index keeps per-source replacement fast.
Add human-readable semantic summaries as the default text output for db payloads, with explicit --json/--ndjson machine modes.
Add the read-only db_status MCP tool.

Changes by area

DB index

src/agentgrep/db.py: DbStore (SQLite/WAL/FTS5 schema: source ledger plus source_state, normalized records), DbRuntime sync/status/search, sync progress protocol, freshness fingerprints, and a default path under the XDG cache directory overridable via AGENTGREP_DB or --db.

Search integration

src/agentgrep/__init__.py, src/agentgrep/_engine/: cache-mode plumbing through SearchRuntime so search-shaped commands consult the DB index under --cache auto|require and preserve the live-scan path under --no-cache.

CLI

src/agentgrep/cli/parser.py, src/agentgrep/cli/render.py: the db command group with progress flags, semantic human summaries in text mode, JSON/NDJSON machine modes, and clean CLI errors for unsupported cached queries.

MCP

src/agentgrep/mcp/tools/db_tools.py, src/agentgrep/mcp/models.py: the read-only db_status tool with its pydantic response model and capability registration.

Docs

ADR 0005; per-command CLI pages for db, plus argparse-backed pages for the existing search and ui commands; a DB-index development page; DB cache controls in the configuration guide.

Design decisions

The DB is a materialized read model, never the source of truth: deleting the database loses cached records only; agent history stores stay untouched and agentgrep stays read-only over them.
--cache auto is conservative: the index only answers queries it can satisfy; anything else falls through to the live scanner, and --cache require turns unsupported queries into explicit CLI errors.
Sync skips by fingerprint, not by guess: source_state records the synced mtime and content fingerprint; --force is the explicit full-rebuild path.
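
The three cache modes in the decisions above can be sketched as a small dispatcher. This is a minimal illustration, not the PR's plumbing through SearchRuntime: `search_with_cache`, `cached_search`, and `live_scan` are hypothetical names, and only the mode names and `DbQueryUnsupported` are taken from this PR.

```python
from __future__ import annotations

from collections.abc import Callable


class DbQueryUnsupported(Exception):
    """The index cannot answer this query (class name borrowed from the PR)."""


def search_with_cache(
    query: str,
    mode: str,
    cached_search: Callable[[str], list[str]] | None,
    live_scan: Callable[[str], list[str]],
) -> list[str]:
    """Dispatch one query by cache mode.

    ``cached_search`` is None when no DB index exists (hypothetical shape).
    """
    if mode == "off":
        return live_scan(query)
    if cached_search is None:
        if mode == "require":
            raise SystemExit("error: --cache require needs a synced DB index")
        return live_scan(query)
    try:
        return cached_search(query)
    except DbQueryUnsupported:
        if mode == "require":
            # Clean CLI error instead of a traceback.
            raise SystemExit("error: query is not supported by the DB index")
        return live_scan(query)  # auto: fall through to the live scanner
```

Under `auto` the caller never sees the unsupported-query exception; under `require` it becomes an explicit exit.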

Test plan

tests/test_db_index.py — DbStore schema and sync behavior: freshness skips, --force resync, FTS queries, deterministic ids, early exit
tests/test_cache_cli.py — CLI contracts for db and cache modes: JSON and NDJSON stdout, human summaries, progress output, error paths
tests/test_agentgrep_mcp.py — read-only db_status payloads and capability registration
Full gate per commit: ruff check / ruff format, ty check, uv run pytest (incl. doctests), just build-docs

Stacked PR

The insights/suggestions surface that previously shared this branch now lives in its own PR stacked on this one.

Closes #49.

tony · 2026-06-06T19:53:09Z

Code review

Found 3 issues:

_source_fingerprint stats only the main source file, so WAL-mode SQLite stores (Cursor state.vscdb, Grok, OpenCode) whose commits land in the -wal sidecar look unchanged — source_is_current returns true and db sync skips them, serving stale records. This is the same failure mode trunk commit f41a2df fixed for the in-memory scan cache in _engine/scanning.py (_source_scan_cache_key includes the sidecar's size/mtime for source_kind == "sqlite"); the DB freshness path doesn't carry that fix forward.

agentgrep/src/agentgrep/db.py

Lines 231 to 243 in 1042dae

    
    def _source_fingerprint(source: agentgrep.SourceHandle) -> str:
        """Return a cheap source fingerprint for cache freshness checks."""
        try:
            stat = source.path.stat()
        except OSError:
            size = 0
            mtime_ns = source.mtime_ns
        else:
            size = stat.st_size
            mtime_ns = stat.st_mtime_ns
        return text_hash(f"{source.path}\0{size}\0{mtime_ns}")
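
One way to carry the sidecar fix forward is to fold the `-wal` file's size and mtime into the fingerprint for SQLite sources. The sketch below is an assumption about the shape of such a fix, not the committed code: `fingerprint_with_sidecars` is a hypothetical name and it hashes with `hashlib` rather than the project's `text_hash`.

```python
import hashlib
from pathlib import Path


def fingerprint_with_sidecars(path: Path, source_kind: str) -> str:
    """Hash size/mtime of the main file and, for SQLite stores, its -wal sidecar."""
    candidates = [path]
    if source_kind == "sqlite":
        # In WAL mode, commits land in "<name>-wal" until a checkpoint runs.
        candidates.append(path.with_name(path.name + "-wal"))
    parts = []
    for candidate in candidates:
        try:
            stat = candidate.stat()
        except OSError:
            # A missing file still contributes, so its appearance changes the hash.
            parts.append(f"{candidate}|missing")
        else:
            parts.append(f"{candidate}|{stat.st_size}|{stat.st_mtime_ns}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()
```

With this shape, a write that touches only the sidecar changes the fingerprint, so a freshness check would no longer report the source as current.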

_db_status_sync opens a DbRuntime (one sqlite3.connect per call) and never closes it. In the long-lived MCP server every db_status call leaks a connection and its file descriptors (verified: 3 fds per call), and unclosed WAL-mode connections can suppress checkpointing. DbStore.close() exists but is unreachable here because DbRuntime has neither a close method nor context-manager support.

agentgrep/src/agentgrep/mcp/tools/db_tools.py

Lines 38 to 40 in 1042dae

    
        )
    status = DbRuntime.open(path).status()
    return DbStatusModel(
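
A minimal sketch of the per-call lifecycle this finding asks for, using plain `sqlite3` in place of DbRuntime. `read_status` and its payload are hypothetical. Note that `with sqlite3.connect(...)` on its own only manages the transaction and does not close the connection, which is why `contextlib.closing` is used here.

```python
import sqlite3
from contextlib import closing


def read_status(db_path: str) -> dict[str, int]:
    """Open a connection for one status call and always close it afterwards."""
    with closing(sqlite3.connect(db_path)) as conn:
        (table_count,) = conn.execute(
            "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
    # The connection is closed here, even if the query raised.
    return {"tables": table_count}
```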

The docstring of test_collect_search_records_does_not_import_profiler_when_inactive was changed to "The non-profiled db path keeps the profiler module unloaded", but the test exercises collect_search_records (the record-collection path) and has nothing to do with the db command — the original "collection path" wording referred to record collection, not a command name, so the new text is factually wrong.

agentgrep/tests/test_engine_profiling.py

Lines 693 to 695 in 1042dae

    
    ) -> None:
        """The non-profiled db path keeps the profiler module unloaded."""
        source = agentgrep.SourceHandle(

🤖 Generated with Claude Code


tony · 2026-06-06T22:21:29Z

Follow-up: all review findings addressed, including the ones below the posting threshold, one commit per issue.

Finding	Commit
WAL sidecars invisible to sync freshness (posted, #1)	`agentgrep(fix[db]): Fingerprint SQLite WAL sidecars in sync freshness`
db_status connection leak (posted, #2)	`agentgrep(fix[db]): Close per-call DB runtimes`
Wrong profiling docstring (posted, #3)	`agentgrep(fix[test]): Describe the profiling guard test accurately`
FTS5 external-content deletes with placeholder values	`agentgrep(fix[db]): Delete FTS rows with their stored values`
Paste-cache cross-file adapters served stale syncs	`agentgrep(fix[db]): Exempt cross-file adapters from sync freshness skips`
Whole-token FTS dropped substring matches under `--cache require`	`agentgrep(fix[db]): Match cached searches by exact substring semantics` (trigram tokenizer + exact post-filter; rebuild policy in ADR 0008)
Cache path bypassed per-session dedup	`agentgrep(fix[db]): Apply per-session dedup to cached search results`
Read-only `db_status` wrote schema metadata	`agentgrep(fix[db]): Open status surfaces read-only`
`db explain` identical to `db status`	`agentgrep(feat[db]): Report cache diagnostics from db explain`
Missing doctests on pure helpers	`agentgrep(test[db]): Add doctests to pure db and render helpers`

Found along the way: the test suite could read the developer's real cache via default_db_path() — now isolated per-test (agentgrep(test[db]): Isolate the user cache from the test suite), and ADR 0008 records the cache schema rebuild policy. The stacked insights PR picks up the same lifecycle/read-only treatment for its tool surfaces.

tony · 2026-06-06T23:38:20Z

Round-2 follow-up: the second review pass (both PRs) surfaced 12 findings; none crossed the posting threshold, but the legitimate ones are fixed here, one commit per issue.

Finding	Commit	PR
Result cap applied before per-session dedup undercounted unique results	`fix[db]: Count result caps in unique cached records`	#47
Cache candidates covered title+text only (live matches model/role/path) and casefold-expanding text escaped trigram matching	`fix[db]: Index the casefolded live match surface` — one design fix for both: the cache now indexes the same casefolded haystack the live matcher uses	#47
Schema rebuild dropped only base tables, leaving stale artifact rows	`fix[db]: Drop artifact tables in schema rebuilds` (+ sqlite_master drift test)	#48
CLI read actions (insights list/explain, suggestions show/render/list) opened read-write	`fix[insights]: Open CLI read actions read-only`	#48
Suggestion listings unbounded vs the insights pattern	`feat[suggestions]: Bound persisted suggestion listings` (`--limit`, totals, truncation, MCP fields)	#48
Function-local cheap stdlib import	`fix[insights]: Import json at module level`	#48
Missing doctests on pure helpers	`test[db]: Add doctests to remaining pure helpers`	#48

Declined with reasons: SCHEMA_VERSION bump for the new tables (nothing has shipped; the schema is born final), the explain ok/error overlap (the offending row state has no writer), the _pytest alias (load-bearing under the TYPE_CHECKING-only import), Parameters docstring sections (repo convention uses short prose for internal helpers), and the private exemption-constant import (single source of truth, behavior-tested).

tony · 2026-06-07T02:10:49Z

Code review

Found 1 issue:

profile_engine.py records cache_mode in profiler artifacts, but the profiled path never consults the cache. payload["cache_mode"] is read from AGENTGREP_CACHE, yet profile_search_query builds its query pipeline without a SearchRuntime/DbRuntime, so _db_search_result early-returns and every component measures a live scan. Running with AGENTGREP_CACHE=require produces live-scan timings labeled cache_mode: "require" (and off changes nothing). That misleads benchmark analysis and conflicts with the acceptance criterion of #49 ("Add explicit cache bypass controls for profiling and benchmarks") that AGENTGREP_CACHE "selects the same modes for CLI, MCP server, benchmark, and profiler subprocesses". Either honor the mode in the profiled path (build a DbRuntime when the env var asks for it), or record cache_mode: "off" or omit the field for engine profiles so the artifact reflects what was measured.

agentgrep/scripts/profile_engine.py

Lines 297 to 299 in c4b8fdc

    
    payload["artifact_kind"] = PROFILE_RUN_ARTIFACT_KIND
    payload["cache_mode"] = os.environ.get("AGENTGREP_CACHE", "auto")
    payload["agent_count"] = len(agents)

Profiled path that never reaches the cache:

agentgrep/src/agentgrep/_engine/profiling.py

Lines 203 to 210 in c4b8fdc

    
    def profile_search_query(
        home: pathlib.Path,
        query: agentgrep.SearchQuery,
        *,
        backends: agentgrep.BackendSelection | None = None,
        control: agentgrep.SearchControl | None = None,
        runtime: SearchRuntime | None = None,

🤖 Generated with Claude Code


tony · 2026-06-07T11:07:39Z

Round-3 follow-up: the cache-correctness review (cross-thread SQLite, scope mismatch, partial coverage, stale rows) plus the posted profiler issue are addressed, one commit per issue.

Open cache connections in the consulting thread — SearchRuntime gains a per-consult opener; the MCP server no longer holds a connection the asyncio.to_thread worker can't use. End-to-end regression test reproduces the prior ProgrammingError through fastmcp.Client.
Probe and close search-path cache runtimes — the CLI's open-time DatabaseError guard now actually fires (read-only connects are lazy), and both search paths close what they open.
Mirror live scope semantics in cached search — conversations scope filters by store role (user turns included again); prompts scope gates chat stores per agent like the planner.
Gate auto-mode cache hits on sync coverage — partial syncs (--agent/--scope/--limit-sources/interrupted) no longer pass off subsets as answers; db explain surfaces the coverage map; new partial-coverage span reason.
Prune vanished sources on complete full syncs — deleted/rotated history files stop answering cached searches; narrowed syncs never prune.
Honor AGENTGREP_CACHE in engine profiles — resolves the review comment above: profile_search_query consults the cache like run_search_query, so require-mode artifacts now measure the cache path (decision span present, no collect phases) instead of mislabeled live scans.
Smaller items: empty agent selections return empty from the cached path (IN () hazard removed), the event-stream docstring documents the cache-served shape, AGENTS.md artifact field enumerations list cache_mode, and the benchmark guide's fused paragraphs are separated.

The insight-tool error-handling fix (foreign cache files return empty payloads like db_status) landed on the stacked #48 branch.

why: The doctest guidance scopes runnable examples to pure helpers with no external state. The db normalization, hashing, FTS quoting, and count-formatting helpers qualify and had none.
what:
- Add Examples sections to normalize_record_text, token_set, text_hash, and _quote_fts_term.
- Add Examples sections to the four db count formatters.

why: The cache applied the result limit before per-session dedup, so cross-source duplicates inside the first N rows returned fewer than N unique records. The live driver dedups during collection, where a cap always means N unique records.
what:
- Dedup inside DbStore.search_records after sorting and before the limit slice, owning the event-stream uniqueness invariant in one place.
- Drop the now-redundant dedup pass in _db_search_result.
- Cover limit-with-duplicates and no-dedup-with-limit with named cases, and pin the duplicate-storage test to an undeduped query.
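
The ordering problem in this commit can be shown with a small sketch: deduplicate first, then count the cap in unique records. `dedup_then_limit` and the `(dedup_key, text)` row shape are hypothetical and do not reflect the signature of DbStore.search_records.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def dedup_then_limit(
    rows: Iterable[tuple[Hashable, str]], limit: int | None
) -> list[str]:
    """Keep the first row per dedup key, stopping once ``limit`` unique rows exist."""
    seen: set[Hashable] = set()
    unique: list[str] = []
    for dedup_key, text in rows:
        if limit is not None and len(unique) >= limit:
            break
        if dedup_key in seen:
            continue  # duplicates never consume a slot in the cap
        seen.add(dedup_key)
        unique.append(text)
    return unique
```

Slicing the first N rows and deduplicating afterwards would return fewer than N records whenever a duplicate falls inside the slice.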

why: The cache generated candidates from raw title and text while the live engine matches a five-field haystack with Python case folding, so records matching a term only through model, role, or path (or through casefold expansions like Strasse for Straße) never reached the post-filter, and --cache require silently under-returned.
what:
- Store a casefolded haystack column built from the same build_record_match_surface helper the live matcher uses, and point the trigram FTS index and the scan prefilter at it.
- Casefold query terms before MATCH so both sides fold in Python; the trigram index never re-folds, letting non-ASCII terms use FTS and removing the ASCII guard from _fts_indexable.
- Keep the live-predicate post-filter authoritative so text-surface queries narrow correctly from haystack candidates.
- Cover model, role, path, casefold-expansion, and text-surface parity with named cases.
Pre-release dev caches need one resync.
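
A small sketch of why both sides need Python case folding. `build_haystack` and `matches` are hypothetical stand-ins for the project's build_record_match_surface helper and live matcher; the field order is an assumption.

```python
from __future__ import annotations


def build_haystack(*fields: str | None) -> str:
    """Join the non-empty match fields into one casefolded search surface."""
    return "\n".join(field.casefold() for field in fields if field)


def matches(term: str, haystack: str) -> bool:
    """Substring match with the query folded the same way as the haystack."""
    return term.casefold() in haystack
```

`str.casefold()` expands "ß" to "ss", while `str.lower()` leaves it alone, so a haystack built with `lower()` would miss "strasse" for a record containing "Straße".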

why: Benchmark harnesses, CI jobs, and MCP server configuration blocks cannot pass per-command flags, so cache behavior needs an environment lever. A tri-state mirror of --cache covers both cold runs (off) and honest warm runs (require, which fails loudly instead of silently measuring live scans). A cross-project audit (issue #49) found no ecosystem env-var convention to borrow, so the name follows the existing AGENTGREP_* prefix.
what:
- Resolve cache mode as flag > AGENTGREP_CACHE > auto via pure, doctested helpers; invalid values fail at parse time with a clean message.
- Honor the variable in the MCP server runtime, attaching the cache read-only when it exists; the server never writes the cache file.
- Document the variable in the configuration guide and cover resolution precedence, the invalid-value error, and the server runtime with named cases.
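
The precedence rule (flag > AGENTGREP_CACHE > auto) might look like the following pure helper. This is a sketch under assumptions: `resolve_cache_mode` is a hypothetical name, and raising `ValueError` stands in for whatever clean parse-time error the CLI actually emits.

```python
from __future__ import annotations

from collections.abc import Mapping

CACHE_MODES = ("auto", "require", "off")


def resolve_cache_mode(flag: str | None, environ: Mapping[str, str]) -> str:
    """Resolve the cache mode: explicit flag, then AGENTGREP_CACHE, then auto."""
    if flag is not None:
        candidate, origin = flag, "--cache"
    elif "AGENTGREP_CACHE" in environ:
        candidate, origin = environ["AGENTGREP_CACHE"], "AGENTGREP_CACHE"
    else:
        return "auto"
    if candidate not in CACHE_MODES:
        choices = ", ".join(CACHE_MODES)
        raise ValueError(f"{origin}: invalid cache mode {candidate!r} (choose from {choices})")
    return candidate
```

Passing the environment as a mapping keeps the helper pure, so it can be doctested without touching `os.environ`.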

why: Cache-served searches opened the DB through the migration path, writing schema metadata on every cached query, and --cache require with a missing cache silently created an empty database and returned zero results. Reads must not mutate the cache, and require should fail loudly so warm benchmark runs cannot measure an accidental live scan.
what:
- Open the CLI search runtime via DbRuntime.open_readonly.
- Error cleanly on --cache require when the cache is missing or not an agentgrep database, without creating files; auto keeps falling back to the live scanner.
- Assert byte-identical cache files across cached searches and the clean require/auto missing-cache behaviors.

why: Cache behavior needs transparency to be trustworthy: profilers and benchmarks must be able to see whether a query was served from the DB cache, fell back to the live scanner, and why. One aggregate sample per consulted query keeps the telemetry n+1-safe - never a span per record.
what:
- Emit a search.cache.decision sample from the cache choke point with the cache mode, handled flag, served record count, and fallback reason (no-db, unsupported, or empty) - privacy-safe scalars only.
- Document the span in the profiling vocabulary (AGENTS.md and the benchmark dev page).
- Cover served, empty-fallback, unsupported-fallback, and off paths with named cases.

why: Cache-aware search makes timings ambiguous unless each run states its cache posture. Cold entries bypass the cache outright; warm entries serve from a pre-synced bench-scoped cache under require mode, so a cold cache fails the bench instead of silently timing live scans.
what:
- Add paired *-cache-cold-* and *-cache-warm-* bench entries; warm entries declare a setup_command that syncs the bench-scoped cache once before timing.
- Support the optional setup_command key in the harness, failing the row cleanly when setup exits non-zero.
- Record cache_mode on every measurement row (parsed from the env prefix, which stays visible in the sanitized command string) and in profiler artifacts.
- Document cold and warm invocations in the benchmark guide and cover extraction, pairing, and metadata with named cases.

why: Cache levers grew across the flag surface, the environment, and the profiling spans; the architecture record should state the policy they implement so future cache layers inherit it deliberately.
what:
- Add a "Cache control and transparency" section to ADR 0005: every cache layer must expose a bypass lever and a hit-or-miss signal, with the AGENTGREP_DB / --cache + AGENTGREP_CACHE / decision-span levers and their precedence.

why: SQLite connections are bound to their creating thread, and the MCP search tool runs iter_search_events through asyncio.to_thread. The server built its runtime with an open read-only connection on the server thread, so the first cache-enabled MCP search raised sqlite3.ProgrammingError instead of serving records - auto mode could not even fall back because the error is not a DbQueryUnsupported.
what:
- Add SearchRuntime.db_opener, a per-consult factory the consulting thread calls; _db_search_result opens through it when no db handle is set and closes what it opened in the same thread.
- Replace the MCP server's held connection with _open_cache_runtime: missing files return None (existing no-db semantics), and a probe query surfaces foreign or corrupt files at open time, where read-only connects are otherwise lazy.
- Rewrite the cache-env runtime test for opener semantics, exercise the opener from a spawned thread across synced, foreign, and missing cache files, and add an end-to-end require-mode search through fastmcp.Client that fails with the cross-thread error before this fix.
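
The thread-affinity problem and the per-consult opener can be reproduced with the standard library alone. In this sketch `consult_in_worker` and the `meta` table are hypothetical; a plain `threading.Thread` stands in for the asyncio.to_thread worker.

```python
import sqlite3
import threading
from collections.abc import Callable


def consult_in_worker(opener: Callable[[], sqlite3.Connection]) -> list[tuple]:
    """Run one query on a worker thread, opening and closing the connection there."""
    results: list[tuple] = []
    errors: list[Exception] = []

    def work() -> None:
        try:
            conn = opener()  # called on the consulting thread
            try:
                results.extend(conn.execute("SELECT value FROM meta").fetchall())
            finally:
                conn.close()  # closed by the same thread that opened it
        except sqlite3.Error as exc:
            errors.append(exc)

    thread = threading.Thread(target=work)
    thread.start()
    thread.join()
    if errors:
        raise errors[0]
    return results
```

Passing a factory such as `lambda: sqlite3.connect(path)` works, whereas passing a factory that returns a connection created on the main thread raises `sqlite3.ProgrammingError` under the module's default `check_same_thread=True`.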

why: Read-only SQLite connects are lazy, so the open-time DatabaseError guard in _db_runtime_for_cli never fired - a foreign file under --cache require surfaced as a traceback inside the search instead of the documented clean exit. The search helpers also never closed the runtime they opened, leaking the connection until process exit.
what:
- Probe the schema-version row after the read-only open so foreign or corrupt cache files exit 2 under require and degrade to a live scan under auto, closing the half-open connection either way.
- Close the cache runtime in both CLI search paths: try/finally in the eager helper and a generator finally in the event-stream helper so early-breaking consumers still release the connection.
- Cover foreign-file require/auto behavior and all three close paths with named cases; give the unsupported-query stub the close hook the search path now exercises.
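
A sketch of the probe-after-open pattern, assuming a generic probe against `sqlite_master` rather than the project's schema-version row. `open_readonly_probed` is a hypothetical name.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def open_readonly_probed(path: Path) -> sqlite3.Connection | None:
    """Open read-only and run a probe so foreign files fail at open time."""
    if not path.exists():
        return None  # no cache yet; the caller picks between error and live scan
    conn = sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        # The connect call does not read the file; the first statement does.
        conn.execute("SELECT name FROM sqlite_master LIMIT 1").fetchone()
    except sqlite3.DatabaseError:
        conn.close()  # release the half-open connection before re-raising
        raise
    return conn
```

A caller in require mode could turn the `DatabaseError` into a clean exit, while auto mode could catch it and fall back to the live scanner.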

why: The cached scope predicates filtered by record kind while the live pipeline filters by store role plus planner source selection. Chat adapters emit user turns as kind='prompt', so cached conversations-scope searches dropped every user turn, and cached prompts-scope searches returned chat-store user turns for agents whose live plan reads only their dedicated prompt-history store - diverging from the uncached scanner in both directions.
what:
- Filter conversations scope by store role through the same store_role_for_record catalog lookup the live record filter uses; drop the kind predicate that excluded user turns.
- Keep kind='prompt' as a correct superset SQL prefilter for prompts scope and gate conversation-store records per agent against the synced source ledger, mirroring prompt_history_agents_for_sources and source_matches_scope.
- Cover both divergence directions and the unchanged scopes with named parity cases over codex (chat + prompt store) and pi (chat only) records; extend the db test helpers with kind, agent, store, scope, and agents parameters.

why: Auto mode trusted any non-empty cache hit, but db sync ships agent, scope, and source-cap levers and can exit early, so the index can legitimately cover less than a query. A codex-only sync followed by an all-agents search silently served codex-only results as the answer.
what:
- Record SyncCoverage (agents, scope, complete) in the meta table when a sync loop finishes uninterrupted; capped runs pass complete=False so they never claim coverage, and merges preserve what earlier full syncs established for other agents.
- Fall back to the live scanner in auto mode with a partial-coverage span reason whenever coverage does not span the query's agents and scope; require keeps serving because the caller demanded the cache.
- Surface the coverage map through db explain (JSON field and a text line distinguishing "not recorded" from an empty map) and document the auto-mode link on the explain page.
- Cover the gate, the merge semantics, the early-exit no-write path, the CLI's args-to-coverage mapping, and the explain rendering with named cases.
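
The coverage gate can be pictured as a subset check. This is a simplified sketch: the field names follow the commit message (agents, scope, complete), but the flat shape, the `covers` helper, and the `"all"` scope value are assumptions, and the real record may be a per-agent map.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SyncCoverage:
    """What the last uninterrupted sync observed (simplified, hypothetical shape)."""

    agents: frozenset[str]
    scope: str
    complete: bool


def covers(
    coverage: SyncCoverage | None, query_agents: frozenset[str], query_scope: str
) -> bool:
    """Return True only when the index spans the query's agents and scope."""
    if coverage is None or not coverage.complete:
        return False  # nothing recorded, or a capped/interrupted run
    scope_ok = coverage.scope in ("all", query_scope)  # "all" is an assumed name
    return scope_ok and query_agents <= coverage.agents
```

In auto mode a `False` result would route the query to the live scanner with the partial-coverage reason; require mode would skip the gate.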

why: A resync only replaces records for sources present in the current discovery, so a deleted or rotated history file left its ledger row, records, and FTS entries behind forever - cached searches kept returning records the live scanner no longer sees, with no remediation short of deleting the cache file.
what:
- After an uninterrupted sync loop with prune_missing set, delete ledger rows absent from the batch set through the external-content FTS delete path; freshness-skipped sources are part of the batch set and stay. SyncResult gains a sources_pruned counter and the removed records fold into records_removed.
- Enable pruning from the CLI only for uncapped, full-scope, all-agents syncs - a narrowed run does not observe the full catalog and must not delete what it skipped. Early-exited loops never prune.
- Render the pruned count in the sync summary and document the pruning rule on the sync page.
- Cover prune/keep behavior, the early-exit guard, and the CLI args-to-prune mapping with named cases.
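
The pruning rule reduces to a guarded set difference. `plan_prune` and its parameters are hypothetical names for illustration; the real implementation also deletes the records and FTS rows behind each pruned ledger entry.

```python
def plan_prune(
    ledger_ids: set[str],
    batch_ids: set[str],
    *,
    prune_missing: bool,
    interrupted: bool,
) -> set[str]:
    """Return ledger source ids to delete after a sync loop."""
    if not prune_missing or interrupted:
        # Narrowed or early-exited runs did not observe the full catalog.
        return set()
    # Freshness-skipped sources are still in the batch set, so they survive.
    return ledger_ids - batch_ids
```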

why: Profiler artifacts recorded the requested cache mode while the profiled path never consulted the DB cache, so a require-mode run produced live-scan timings labeled as warm-cache evidence - exactly the misleading metadata the cache-transparency policy forbids.
what:
- Consult the DB cache at the top of profile_search_query, mirroring run_search_query: a served query returns the cached records with the search.cache.decision span in the profile and skips the discover/plan/collect phases it no longer executes.
- Build the profiler's search runtime from the resolved cache mode in scripts/profile_engine.py - off profiles the live engine, auto and require attach the read-only DB when one exists - and record the resolved mode in the payload.
- Cover require-serves-from-cache and off-profiles-live with named cases asserting which spans the artifact carries.

… selections
why: An empty query.agents tuple generated WHERE r.agent IN (), a nonstandard form some SQLite builds reject, and diverged from live semantics where an empty selection discovers zero sources. Only the direct library surface can construct this query - the CLI and MCP normalize to a non-empty selection - but a public API should not depend on undocumented SQLite tolerance.
what:
- Short-circuit DbStore.search_records to an empty result for an empty agent selection, before any SQL is built.
- Cover the guard with a synced-store test asserting live parity.
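
A minimal sketch of the guard: return early for an empty selection so that no `IN ()` clause is ever generated. `search_by_agents` and the `records` table are hypothetical and much simpler than DbStore.search_records.

```python
import sqlite3


def search_by_agents(conn: sqlite3.Connection, agents: tuple[str, ...]) -> list[str]:
    """Return record texts for the selected agents; empty selection means no rows."""
    if not agents:
        return []  # short-circuit before any SQL is built
    placeholders = ", ".join("?" for _ in agents)
    sql = f"SELECT text FROM records WHERE agent IN ({placeholders}) ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, agents)]
```

Only the placeholder count is interpolated into the statement; the agent names themselves stay bound parameters.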

why: The module docstring enumerates the event-stream invariants, but the DB-cache early exit emits SearchStarted(source_count=0) with no per-source events - an undocumented shape that makes source_count=0 ambiguous between "nothing discovered" and "served from cache" for consumers and maintainers of the protocol.
what:
- Describe the cache-served envelope in the iter_search_events module docstring and name the search.cache.decision span as the disambiguator.

why: The profiling and benchmarking sections enumerate the metadata fields artifacts carry, and both artifact kinds gained a cache_mode field the enumerations did not mention - agents reading the project rules would omit it when interpreting artifacts.
what:
- Add cache_mode to the profiler-artifact and benchmark-artifact field sentences with one line each on where the value comes from.

why: Two cache-section insertions ran into adjacent sentences without a blank line, fusing distinct paragraphs in the rendered page - the cold-path benchmark lead-in merged into the span inventory, and the privacy paragraph merged into the decision-span description.
what:
- Add the missing paragraph breaks before the cold-path benchmark lead-in and after the search.cache.decision description.

…oint
why: Query-strategy decisions for the cache need evidence about how SQLite behaves under our statements - per-statement timing, statement counts per operation, and rows touched - none of which the profiler could see. agentgrep owns every SQL call site in DbStore, so an explicit execute helper observes everything without cursor proxies or the sqlite trace callback, whose expanded-SQL output would leak bound search terms into telemetry.
what:
- Add _query/_execute/_executescript helpers that time each statement, accumulate per-statement-name counters, and DEBUG-log the statement text with placeholders only - bound parameters are never captured.
- Name every statement shape (records.search_fts, records.insert, fts.delete, meta.get, sources.upsert, ...) and route all DbStore statements through the helpers.
- Flush one aggregated db.sql.statement profile sample per statement name at the end of search_records, status, explain, and sync_records - a high agentgrep_sql_count on one sample is the n+1 signal; sync loops never emit per-record samples.
- Cover the FTS and scan statement names, the sync n+1 aggregation, the silent no-profiler path, and a sentinel-term proof that no log message, extra, or sample attribute carries bound parameters.
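
The explicit execute helper described here could be sketched as a thin wrapper that counts and times statements by name while never storing bound parameters. `InstrumentedStore` and the sample dictionary layout are hypothetical; only the `agentgrep_sql_count` attribute name and the statement-name style come from the commit message.

```python
import sqlite3
import time
from collections import defaultdict


class InstrumentedStore:
    """Route statements through one helper that aggregates per statement name."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._counts: dict[str, int] = defaultdict(int)
        self._seconds: dict[str, float] = defaultdict(float)

    def execute(self, name: str, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        started = time.perf_counter()
        try:
            # Times prepare plus the first step; later row fetches are not included.
            return self._conn.execute(sql, params)
        finally:
            # Only the name and elapsed time are recorded, never ``params``.
            self._counts[name] += 1
            self._seconds[name] += time.perf_counter() - started

    def flush_samples(self) -> list[dict[str, object]]:
        """Return one aggregate sample per statement name and reset the counters."""
        samples = [
            {"name": name, "agentgrep_sql_count": count, "seconds": self._seconds[name]}
            for name, count in sorted(self._counts.items())
        ]
        self._counts.clear()
        self._seconds.clear()
        return samples
```

A sync loop that inserts thousands of records still produces a single `records.insert` sample, and a large count on that sample is what flags an n+1 pattern.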

why: Statement timings alone cannot show whether SQLite served a query from an index or fell back to a scan. The cache-transparency policy calls for a lever plus a signal for every cache layer; this adds the planner signal without any always-on cost.
what:
- Run EXPLAIN QUERY PLAN once per statement shape when AGENTGREP_SQL_EXPLAIN is set, joining the plan detail rows - table, index, and strategy names only, never bound parameters - onto the statement's aggregated sample as agentgrep_sql_plan and into the DEBUG log.
- Echo the lever as sql_explain in profiler artifacts next to cache_mode so plan-bearing payloads are self-describing.
- Cover FTS and scan plan capture, the off-by-default path, the once-per-shape EXPLAIN guarantee, and the artifact echo with named cases.
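
Capturing the plan once per statement shape might look like this. `PlanCapture` is a hypothetical helper; how the project decides that AGENTGREP_SQL_EXPLAIN is "set" is not stated, so the sketch takes an explicit `enabled` flag.

```python
import sqlite3


class PlanCapture:
    """Record EXPLAIN QUERY PLAN detail once per named statement shape."""

    def __init__(self, conn: sqlite3.Connection, enabled: bool) -> None:
        self._conn = conn
        self._enabled = enabled
        self.plans: dict[str, str] = {}

    def observe(self, name: str, sql: str, params: tuple = ()) -> None:
        if not self._enabled or name in self.plans:
            return  # off by default, and at most one EXPLAIN per shape
        rows = self._conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
        # The last column is the human-readable detail: table, index, strategy.
        self.plans[name] = "; ".join(str(row[-1]) for row in rows)
```

The detail text shows placeholders such as `rowid=?` rather than the bound values, which keeps search terms out of the captured plan.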

…ever
why: The span vocabulary and cache-transparency policy enumerate the observability contract; the new db.sql.statement samples and the AGENTGREP_SQL_EXPLAIN lever belong in that inventory, with the privacy boundary stated where users will look for it.
what:
- Add db.sql.statement to the AGENTS.md span vocabulary with the n+1-signal reading of agentgrep_sql_count and the placeholders-only privacy rule.
- Extend ADR 0005's cache-transparency section with the SQLite-layer spans and lever.
- Document AGENTGREP_SQL_EXPLAIN in the configuration guide and the benchmark guide with one plan-capturing profile invocation.

why: The strategy study against the real 439k-record cache showed the probe-shaped query is about twice as fast against a narrow table as against the wide records table at every term frequency - page density is the lever - and the split also stores the haystack once instead of twice, shrinking the file. This commit is the behavior-identical storage swap; the probe lands separately so each is bisectable.
what:
- Replace the records table with records_search (ids, kind, agent, store, adapter, path, timestamp, session identity, hashes) and record_details (text, title, role, model, metadata) cascading from it, plus a content-full trigram FTS table that owns the haystack.
- Make FTS removal a plain DELETE by rowid set: content-full FTS does not need the external-content 'delete' command bookkeeping, removing that corruption class outright.
- Insert each record across the three surfaces; serve search, record iteration, and lookups through the search/details join; count records_search for status; run the short-term scan against the FTS table's own haystack column.
- Add a round-trip fidelity test (every field survives the split) and update the source-delete index probe and telemetry statement names.

…robe
why: The cached search fetched every matching row at full width before the limit slice - 164 MB pumped for a hot term - because dedup-before-limit kept the LIMIT out of SQL. The 65-run strategy study measured the fix: a lean ordered probe with keyset continuation plus per-survivor hydration runs 6-8x faster end-to-end and beats the live scanner at every term frequency. The measured rejects are equally explicit: instr pushdown (trigram MATCH proven exact on the haystack surface, so it only drags the wide column into the probe), GROUP BY MAX(rowid) (returned wrong top-50s), window-function dedup (materializes full partitions), and a stored sort column (FTS yields unordered rowids, so the index never engages).
what:
- Probe limited searches with lean columns ordered by the live sort tuple plus rowid DESC - the unique total order - in windows of max(4*limit, 200), continuing via a row-value keyset cursor; scope-filter probe rows through a scalar form of the cached scope check; hydrate admitted rows per page and seal the window only when limit records survive scope, dedup, and the matches_record oracle.
- Serve unlimited searches from the same lean fetch with the same deterministic order, hydrating survivors in batches.
- Keep every contract: regex/any_term/compiled raise, empty agents return nothing, case-sensitive stays served, dedupe=False skips only dedup, and the coverage gate and decision span are untouched. New statement names (records.probe_fts, records.probe_scan, records.hydrate) make the phases individually profilable.
- Prove parity with a property grid against the unlimited reference: NULL-timestamp tie runs and dedup groups straddling forced page boundaries, dedupe=False ordering, case-sensitive and text-surface oracle rejections refilling the window, single-page sealing at the default window, and corpus exhaustion.
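
The keyset probe can be illustrated on a toy table. This sketch makes several simplifying assumptions: a hypothetical `probe` table with non-NULL timestamps, a sort tuple of just `(ts, id)`, dedup by a `session` column, and no scope filter, oracle, or hydration step. Only the window formula `max(4*limit, 200)` and the row-value cursor idea come from the commit message.

```python
from __future__ import annotations

import sqlite3


def probe_limited(
    conn: sqlite3.Connection, limit: int, window: int | None = None
) -> list[int]:
    """Return up to ``limit`` row ids, newest first, deduplicated by session."""
    window = window or max(4 * limit, 200)
    kept: list[int] = []
    seen: set[str] = set()
    cursor: tuple[int, int] | None = None
    while len(kept) < limit:
        if cursor is None:
            rows = conn.execute(
                "SELECT ts, id, session FROM probe "
                "ORDER BY ts DESC, id DESC LIMIT ?",
                (window,),
            ).fetchall()
        else:
            # Row-value comparison resumes strictly after the last row seen.
            rows = conn.execute(
                "SELECT ts, id, session FROM probe WHERE (ts, id) < (?, ?) "
                "ORDER BY ts DESC, id DESC LIMIT ?",
                (*cursor, window),
            ).fetchall()
        if not rows:
            break
        for _ts, row_id, session in rows:
            if session in seen:
                continue  # a rejected row forces the window to refill
            seen.add(session)
            kept.append(row_id)
            if len(kept) == limit:
                break
        cursor = (rows[-1][0], rows[-1][1])
        if len(rows) < window:
            break  # corpus exhausted
    return kept
```

Because `(ts, id)` is a unique total order, pages never overlap or skip rows, so the result matches an unlimited fetch followed by dedup and a slice.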

why: The storage split and keyset probe are measured decisions; the ADR should carry the evidence and the named rejects so they stay rejected, and the dev page should describe the storage shape readers now find in db.py.
what:
- Amend ADR 0005's decision with the read-model layout, the probe strategy, the study findings, and the four measured rejects with one-line reasons.
- Describe the split tables and the probe walkthrough on the db-index dev page, including the profilable statement names.
- Note the probe/hydrate statement names in the AGENTS.md telemetry vocabulary, with the page-count reading of the probe sample.

why: Measured against the real 3.8 GB cache, memory-mapping the database file cuts hot-term probes 20-45% and the short-term scan about 35%. The OS page cache it rides is shared across the per-consult connections the MCP server opens - a per-connection cache_size, which benchmarked similarly, would re-warm on every consult and was rejected for that reason.
what:
- Set PRAGMA mmap_size on both open paths (the read-only opener and _configure), sized by SQLITE_MMAP_BYTES; SQLite clamps the request to its compile-time SQLITE_MAX_MMAP_SIZE, so over-requesting is harmless.
- Assert the pragma on fresh writable and read-only connections.
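As a rough illustration of setting the pragma on both open paths, the sketch below opens a database either writable or read-only and requests a memory-map budget. The value given to SQLITE_MMAP_BYTES here is an assumption (the source names the constant but not its size), and open_with_mmap is a hypothetical helper, not the opener in db.py.

```python
import pathlib
import sqlite3

# Assumed budget for illustration only; the actual value is not stated in the PR.
SQLITE_MMAP_BYTES = 256 * 1024 * 1024


def open_with_mmap(path, read_only=False):
    """Open the database and request a memory-map budget.

    Returns (connection, granted_bytes). SQLite reports the limit it actually
    applied, which may be lower than the request or zero if mmap is disabled.
    """
    if read_only:
        uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
    else:
        conn = sqlite3.connect(path)
    row = conn.execute(f"PRAGMA mmap_size = {int(SQLITE_MMAP_BYTES)}").fetchone()
    granted = row[0] if row else 0  # some builds return no row when mmap is compiled out
    return conn, granted
```

Reading back the granted value, rather than assuming the request was honored, is what makes over-requesting safe to assert on in tests.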

why: Detail rows average about 3 KiB, so the default 4 KiB pages chain into overflow pages on hydration. Measured against the real cache, 8 KiB pages halve hydration time (3.5ms to 1.6ms for a 200-row window) and, combined with the mmap budget, take the hot two-page probe from 213ms to 136ms.
what:
- Set PRAGMA page_size=8192 in _configure before schema creation; the pragma only affects databases created by this connection, so existing caches keep their page size until their next rebuild.
- Assert the page size on freshly created caches.
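The ordering constraint (page size first, schema second) can be shown with a short sketch. The helper names create_cache and page_size_of and the one-table schema are hypothetical; the point is only that the pragma takes effect on a database file that has not been written yet and is ignored on an existing one.

```python
import sqlite3


def create_cache(path, page_size=8192):
    """Open (or create) a cache file, requesting a page size before any schema exists."""
    conn = sqlite3.connect(path)
    # Must run before the first write; on an existing file the request is ignored.
    conn.execute(f"PRAGMA page_size = {int(page_size)}")
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record_details ("
        "record_id INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.commit()
    return conn


def page_size_of(conn):
    """Report the page size the database file actually uses."""
    return conn.execute("PRAGMA page_size").fetchone()[0]
```

Reopening an existing cache with a different requested size leaves the reported page size unchanged, which matches the note that existing caches keep their page size until rebuilt.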

why: The aggregated probe sample shows page counts but not why pages continue. A db.probe.page sample per page - window, rows fetched, scope-admitted, cumulative kept - exposes the funnel; the first profile with it showed the hot-term continuation was scope rejection (130 of 200 page-one rows), not oracle rejection as assumed.
what:
- Emit one aggregate db.probe.page sample per probe page with the window size, fetched rows, scope-admitted rows, and cumulative kept results; no per-record samples.
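To make the funnel reading concrete, here is a small sketch of how one per-page sample could be interpreted. The four fields mirror the ones listed above, but the ProbePageSample class, the page_funnel helper, and the derived keys are hypothetical and not the telemetry types used in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePageSample:
    """One aggregate sample per probe page (field names follow the commit text)."""

    window: int           # requested page size
    fetched: int          # rows the probe query returned
    scope_admitted: int   # rows that passed the scope filter
    cumulative_kept: int  # results kept so far, across all pages


def page_funnel(sample, kept_before=0):
    """Break one page down into where its rows went."""
    kept_here = sample.cumulative_kept - kept_before
    return {
        "scope_rejected": sample.fetched - sample.scope_admitted,
        # Admitted rows that were later dropped (dedup or oracle stand-in).
        "rejected_after_scope": sample.scope_admitted - kept_here,
        "kept_on_page": kept_here,
        # A full page means more rows may exist, so the probe can continue.
        "page_was_full": sample.fetched == sample.window,
    }
```

With a page that fetched 200 rows and admitted 70, the scope_rejected figure comes out at 130, which is the kind of reading that pointed at scope rejection rather than oracle rejection.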

why: Per-page probe telemetry showed prompts-scope hot-term probes fetching 200 rows but admitting only 70 - the Python scope filter rejected 130 chat-store user turns from agents that hold a dedicated prompt-history store, forcing a second 800-row continuation page and a second hydration round. Conversations scope had no SQL prefilter at all. Encoding the same store-role gates as SQL predicates lets the probe seal in a single page: page one now admits every fetched row and the warm hot-term consult drops from ~150-170ms to ~105ms.
what:
- Replace _prompt_history_agents with _scope_catalog, classifying the synced source ledger once into prompt-history agents and conversation-role (store, adapter_id) pairs.
- Prompts scope adds NOT ((store, adapter_id) IN (VALUES ...) AND agent IN (...)) so chat-store user turns from prompt-history agents never enter the probe page.
- Conversations scope adds (store, adapter_id) IN (VALUES ...), or a constant-false predicate when no conversation-role source is synced.
- Keep the Python scope filter as the authoritative re-check on admitted rows; parity grids are unchanged.
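The two predicates above can be assembled with bound parameters, as in the sketch below. The function scope_predicate, its arguments, and the choice of a constant-true predicate when the prompts gate has nothing to exclude are assumptions for illustration; only the predicate shapes come from the commit text. The sketch also assumes store, adapter_id, and agent are NOT NULL, since a NULL would make the NOT (...) form evaluate to NULL and drop the row.

```python
import sqlite3


def scope_predicate(scope, conversation_pairs, prompt_history_agents):
    """Return (sql_fragment, params) for a WHERE clause on the probe table."""
    pairs = sorted(conversation_pairs)
    values = ", ".join("(?, ?)" for _ in pairs)
    pair_params = [value for pair in pairs for value in pair]

    if scope == "conversations":
        if not pairs:
            return "0", []  # constant-false: no conversation-role source synced
        return f"(store, adapter_id) IN (VALUES {values})", pair_params

    if scope == "prompts":
        agents = sorted(prompt_history_agents)
        if not pairs or not agents:
            return "1", []  # assumption: nothing to exclude
        marks = ", ".join("?" for _ in agents)
        sql = (
            f"NOT ((store, adapter_id) IN (VALUES {values}) "
            f"AND agent IN ({marks}))"
        )
        return sql, pair_params + agents

    raise ValueError(f"unknown scope: {scope!r}")


def probe_ids(conn, scope, conversation_pairs, prompt_history_agents):
    """Run a scope-filtered probe and return matching ids in ascending order."""
    where, params = scope_predicate(scope, conversation_pairs, prompt_history_agents)
    sql = f"SELECT id FROM records_search WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]
```

Pushing the gate into SQL only narrows what reaches the page; a Python re-check on the admitted rows can still act as the authority, as the last bullet notes.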

tony temporarily deployed to docs June 6, 2026 17:52 — with GitHub Actions Inactive

tony force-pushed the streamline-04 branch from 22883f0 to 9f1177b Compare June 6, 2026 18:25

tony temporarily deployed to docs June 6, 2026 18:25 — with GitHub Actions Inactive

tony force-pushed the streamline-04 branch from 9f1177b to 1042dae Compare June 6, 2026 19:27

tony temporarily deployed to docs June 6, 2026 19:27 — with GitHub Actions Inactive

tony changed the title ~~Persistent DB index with deterministic insights and suggestions~~ Persistent SQLite DB index with cache-aware search Jun 6, 2026

tony mentioned this pull request Jun 6, 2026

Deterministic insights and review-only suggestions #48

Open

5 tasks

tony mentioned this pull request Jun 6, 2026

Add explicit cache bypass controls for profiling and benchmarks #49

Open

tony temporarily deployed to docs June 6, 2026 20:58 — with GitHub Actions Inactive

tony temporarily deployed to docs June 6, 2026 22:00 — with GitHub Actions Inactive

tony temporarily deployed to docs June 6, 2026 22:03 — with GitHub Actions Inactive

tony temporarily deployed to docs June 6, 2026 22:21 — with GitHub Actions Inactive

tony temporarily deployed to docs June 6, 2026 23:37 — with GitHub Actions Inactive

tony force-pushed the streamline-04 branch from d1d9c97 to 456ea4a Compare June 7, 2026 00:15

tony temporarily deployed to docs June 7, 2026 00:15 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 01:08 — with GitHub Actions Inactive

tony force-pushed the streamline-04 branch from 8f15333 to c4b8fdc Compare June 7, 2026 01:38

tony temporarily deployed to docs June 7, 2026 01:38 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 10:34 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 10:47 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 10:53 — with GitHub Actions Inactive

tony force-pushed the streamline-04 branch from 560486b to feed723 Compare June 7, 2026 11:25

tony temporarily deployed to docs June 7, 2026 11:25 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 11:51 — with GitHub Actions Inactive

tony temporarily deployed to docs June 7, 2026 12:47 — with GitHub Actions Inactive

tony added 28 commits June 27, 2026 13:44

tony force-pushed the streamline-04 branch from 5616600 to 10257b5 Compare June 27, 2026 18:47

tony had a problem deploying to docs June 27, 2026 18:47 — with GitHub Actions Failure

Persistent SQLite DB index with cache-aware search #47
tony wants to merge 45 commits into master from streamline-04
