Skip to content

Persistent SQLite DB index with cache-aware search#47

Open
tony wants to merge 45 commits into
masterfrom
streamline-04
Open

Persistent SQLite DB index with cache-aware search#47
tony wants to merge 45 commits into
masterfrom
streamline-04

Conversation

@tony

@tony tony commented Jun 6, 2026

Copy link
Copy Markdown
Owner

Summary

  • Add a persistent SQLite DB index (ADR 0005): agentgrep db sync|status|explain materializes discovered agent history into a local WAL-mode database with an FTS5 text index, a source ledger with freshness state, and stable source/record ids. The DB is derived state — Codex, Claude, Cursor, Gemini, Grok, Pi, and OpenCode stores remain the source of truth.
  • Add cache-aware search execution: grep and search gain --cache auto|require|off (and --no-cache). The default auto serves a query from the DB index only when it can answer it and falls back to the live scanner otherwise; require insists on the DB path, and unsupported cache-required queries fail as clean CLI errors instead of tracebacks.
  • Add live stderr sync progress with cooperative early exit — Enter stops at the next source boundary and keeps partial results; structured stdout stays machine-readable in JSON and NDJSON modes.
  • Add cache-fast sync defaults: source_state fingerprint checks skip unchanged sources unless --force is passed, and a source-id index keeps per-source replacement fast.
  • Add human-readable semantic summaries as the default text output for db payloads, with explicit --json/--ndjson machine modes.
  • Add the read-only db_status MCP tool.

Changes by area

DB index

src/agentgrep/db.py: DbStore (SQLite/WAL/FTS5 schema: source ledger plus source_state, normalized records), DbRuntime sync/status/search, sync progress protocol, freshness fingerprints, and a default path under the XDG cache directory overridable via AGENTGREP_DB or --db.

Search integration

src/agentgrep/__init__.py, src/agentgrep/_engine/: cache-mode plumbing through SearchRuntime so search-shaped commands consult the DB index under --cache auto|require and preserve the live-scan path under --no-cache.

CLI

src/agentgrep/cli/parser.py, src/agentgrep/cli/render.py: the db command group with progress flags, semantic human summaries in text mode, JSON/NDJSON machine modes, and clean CLI errors for unsupported cached queries.

MCP

src/agentgrep/mcp/tools/db_tools.py, src/agentgrep/mcp/models.py: the read-only db_status tool with its pydantic response model and capability registration.

Docs

ADR 0005; per-command CLI pages for db, plus argparse-backed pages for the existing search and ui commands; a DB-index development page; DB cache controls in the configuration guide.

Design decisions

  • The DB is a materialized read model, never the source of truth: deleting the database loses cached records only; agent history stores stay untouched and agentgrep stays read-only over them.
  • --cache auto is conservative: the index only answers queries it can satisfy; anything else falls through to the live scanner, and --cache require turns unsupported queries into explicit CLI errors.
  • Sync skips by fingerprint, not by guess: source_state records the synced mtime and content fingerprint; --force is the explicit full-rebuild path.

Test plan

  • tests/test_db_index.pyDbStore schema and sync behavior: freshness skips, --force resync, FTS queries, deterministic ids, early exit
  • tests/test_cache_cli.py — CLI contracts for db and cache modes: JSON and NDJSON stdout, human summaries, progress output, error paths
  • tests/test_agentgrep_mcp.py — read-only db_status payloads and capability registration
  • Full gate per commit: ruff check / ruff format, ty check, uv run pytest (incl. doctests), just build-docs

Stacked PR

The insights/suggestions surface that previously shared this branch now lives in its own PR stacked on this one.

Closes #49.

@tony tony force-pushed the streamline-04 branch from 22883f0 to 9f1177b Compare June 6, 2026 18:25
@tony tony force-pushed the streamline-04 branch from 9f1177b to 1042dae Compare June 6, 2026 19:27
@tony tony changed the title Persistent DB index with deterministic insights and suggestions Persistent SQLite DB index with cache-aware search Jun 6, 2026
@tony

tony commented Jun 6, 2026

Copy link
Copy Markdown
Owner Author

Code review

Found 3 issues:

  1. _source_fingerprint stats only the main source file, so WAL-mode SQLite stores (Cursor state.vscdb, Grok, OpenCode) whose commits land in the -wal sidecar look unchanged — source_is_current returns true and db sync skips them, serving stale records. This is the same failure mode trunk commit f41a2df fixed for the in-memory scan cache in _engine/scanning.py (_source_scan_cache_key includes the sidecar's size/mtime for source_kind == "sqlite"); the DB freshness path doesn't carry that fix forward.

def _source_fingerprint(source: agentgrep.SourceHandle) -> str:
"""Return a cheap source fingerprint for cache freshness checks."""
try:
stat = source.path.stat()
except OSError:
size = 0
mtime_ns = source.mtime_ns
else:
size = stat.st_size
mtime_ns = stat.st_mtime_ns
return text_hash(f"{source.path}\0{size}\0{mtime_ns}")

  1. _db_status_sync opens a DbRuntime (one sqlite3.connect per call) and never closes it. In the long-lived MCP server every db_status call leaks a connection/file descriptors (verified: 3 fds per call), and unclosed WAL-mode connections can suppress checkpointing. DbStore.close() exists but is unreachable here — DbRuntime has no close/context-manager.

)
status = DbRuntime.open(path).status()
return DbStatusModel(

  1. The docstring of test_collect_search_records_does_not_import_profiler_when_inactive was changed to "The non-profiled db path keeps the profiler module unloaded", but the test exercises collect_search_records (the record-collection path) and has nothing to do with the db command — the original "collection path" wording referred to record collection, not a command name, so the new text is factually wrong.

) -> None:
"""The non-profiled db path keeps the profiler module unloaded."""
source = agentgrep.SourceHandle(

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 6, 2026

Copy link
Copy Markdown
Owner Author

Follow-up: all review findings addressed, including the ones below the posting threshold, one commit per issue.

Finding Commit
WAL sidecars invisible to sync freshness (posted, #1) agentgrep(fix[db]): Fingerprint SQLite WAL sidecars in sync freshness
db_status connection leak (posted, #2) agentgrep(fix[db]): Close per-call DB runtimes
Wrong profiling docstring (posted, #3) agentgrep(fix[test]): Describe the profiling guard test accurately
FTS5 external-content deletes with placeholder values agentgrep(fix[db]): Delete FTS rows with their stored values
Paste-cache cross-file adapters served stale syncs agentgrep(fix[db]): Exempt cross-file adapters from sync freshness skips
Whole-token FTS dropped substring matches under --cache require agentgrep(fix[db]): Match cached searches by exact substring semantics (trigram tokenizer + exact post-filter; rebuild policy in ADR 0008)
Cache path bypassed per-session dedup agentgrep(fix[db]): Apply per-session dedup to cached search results
Read-only db_status wrote schema metadata agentgrep(fix[db]): Open status surfaces read-only
db explain identical to db status agentgrep(feat[db]): Report cache diagnostics from db explain
Missing doctests on pure helpers agentgrep(test[db]): Add doctests to pure db and render helpers

Found along the way: the test suite could read the developer's real cache via default_db_path() — now isolated per-test (agentgrep(test[db]): Isolate the user cache from the test suite), and ADR 0008 records the cache schema rebuild policy. The stacked insights PR picks up the same lifecycle/read-only treatment for its tool surfaces.

@tony

tony commented Jun 6, 2026

Copy link
Copy Markdown
Owner Author

Round-2 follow-up: the second review pass (both PRs) surfaced 12 findings; none crossed the posting threshold, but the legitimate ones are fixed here, one commit per issue.

Finding Commit PR
Result cap applied before per-session dedup undercounted unique results fix[db]: Count result caps in unique cached records #47
Cache candidates covered title+text only (live matches model/role/path) and casefold-expanding text escaped trigram matching fix[db]: Index the casefolded live match surface — one design fix for both: the cache now indexes the same casefolded haystack the live matcher uses #47
Schema rebuild dropped only base tables, leaving stale artifact rows fix[db]: Drop artifact tables in schema rebuilds (+ sqlite_master drift test) #48
CLI read actions (insights list/explain, suggestions show/render/list) opened read-write fix[insights]: Open CLI read actions read-only #48
Suggestion listings unbounded vs the insights pattern feat[suggestions]: Bound persisted suggestion listings (--limit, totals, truncation, MCP fields) #48
Function-local cheap stdlib import fix[insights]: Import json at module level #48
Missing doctests on pure helpers test[db]: Add doctests to remaining pure helpers #48

Declined with reasons: SCHEMA_VERSION bump for the new tables (nothing has shipped; the schema is born final), the explain ok/error overlap (the offending row state has no writer), the _pytest alias (load-bearing under the TYPE_CHECKING-only import), Parameters docstring sections (repo convention uses short prose for internal helpers), and the private exemption-constant import (single source of truth, behavior-tested).

@tony

tony commented Jun 7, 2026

Copy link
Copy Markdown
Owner Author

Code review

Found 1 issue:

  1. profile_engine.py records cache_mode in profiler artifacts but the profiled path never consults the cache. payload["cache_mode"] is read from AGENTGREP_CACHE, yet profile_search_query builds its query pipeline without a SearchRuntime/DbRuntime, so _db_search_result early-returns and every component measures a live scan. Running AGENTGREP_CACHE=require produces live-scan timings labeled cache_mode: "require" (and off changes nothing), which misleads benchmark analysis and conflicts with Add explicit cache bypass controls for profiling and benchmarks #49's acceptance criterion that AGENTGREP_CACHE "selects the same modes for CLI, MCP server, benchmark, and profiler subprocesses". Either honor the mode in the profiled path (build a DbRuntime when the env var asks for it) or record cache_mode: "off"/omit the field for engine profiles so the artifact reflects what was measured.

payload["artifact_kind"] = PROFILE_RUN_ARTIFACT_KIND
payload["cache_mode"] = os.environ.get("AGENTGREP_CACHE", "auto")
payload["agent_count"] = len(agents)

Profiled path that never reaches the cache:

def profile_search_query(
home: pathlib.Path,
query: agentgrep.SearchQuery,
*,
backends: agentgrep.BackendSelection | None = None,
control: agentgrep.SearchControl | None = None,
runtime: SearchRuntime | None = None,

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 7, 2026

Copy link
Copy Markdown
Owner Author

Round-3 follow-up: the cache-correctness review (cross-thread SQLite, scope mismatch, partial coverage, stale rows) plus the posted profiler issue are addressed, one commit per issue.

  • Open cache connections in the consulting thread — SearchRuntime gains a per-consult opener; the MCP server no longer holds a connection the asyncio.to_thread worker can't use. End-to-end regression test reproduces the prior ProgrammingError through fastmcp.Client.
  • Probe and close search-path cache runtimes — the CLI's open-time DatabaseError guard now actually fires (read-only connects are lazy), and both search paths close what they open.
  • Mirror live scope semantics in cached search — conversations scope filters by store role (user turns included again); prompts scope gates chat stores per agent like the planner.
  • Gate auto-mode cache hits on sync coverage — partial syncs (--agent/--scope/--limit-sources/interrupted) no longer pass off subsets as answers; db explain surfaces the coverage map; new partial-coverage span reason.
  • Prune vanished sources on complete full syncs — deleted/rotated history files stop answering cached searches; narrowed syncs never prune.
  • Honor AGENTGREP_CACHE in engine profiles — resolves the review comment above: profile_search_query consults the cache like run_search_query, so require-mode artifacts now measure the cache path (decision span present, no collect phases) instead of mislabeled live scans.
  • Smaller items: empty agent selections return empty from the cached path (IN () hazard removed), the event-stream docstring documents the cache-served shape, AGENTS.md artifact field enumerations list cache_mode, and the benchmark guide's fused paragraphs are separated.

The insight-tool error-handling fix (foreign cache files return empty payloads like db_status) landed on the stacked #48 branch.

tony added 28 commits June 27, 2026 13:44
why: The doctest guidance scopes runnable examples to pure helpers
with no external state. The db normalization, hashing, FTS quoting,
and count-formatting helpers qualify and had none.

what:
- Add Examples sections to normalize_record_text, token_set,
  text_hash, and _quote_fts_term.
- Add Examples sections to the four db count formatters.
why: The cache applied the result limit before per-session dedup, so
cross-source duplicates inside the first N rows returned fewer than N
unique records. The live driver dedups during collection, where a cap
always means N unique records.

what:
- Dedup inside DbStore.search_records after sorting and before the
  limit slice, owning the event-stream uniqueness invariant in one
  place.
- Drop the now-redundant dedup pass in _db_search_result.
- Cover limit-with-duplicates and no-dedup-with-limit with named cases,
  and pin the duplicate-storage test to an undeduped query.
why: The cache generated candidates from raw title and text while the
live engine matches a five-field haystack with Python case folding, so
records matching a term only through model, role, or path — or through
casefold expansions like Strasse for Straße — never reached the
post-filter, and --cache require silently under-returned.

what:
- Store a casefolded haystack column built from the same
  build_record_match_surface helper the live matcher uses, and point
  the trigram FTS index and the scan prefilter at it.
- Casefold query terms before MATCH so both sides fold in Python; the
  trigram index never re-folds, letting non-ASCII terms use FTS and
  removing the ASCII guard from _fts_indexable.
- Keep the live-predicate post-filter authoritative so text-surface
  queries narrow correctly from haystack candidates.
- Cover model, role, path, casefold-expansion, and text-surface parity
  with named cases. Pre-release dev caches need one resync.
why: Benchmark harnesses, CI jobs, and MCP server configuration blocks
cannot pass per-command flags, so cache behavior needs an environment
lever. A tri-state mirror of --cache covers both cold runs (off) and
honest warm runs (require, which fails loudly instead of silently
measuring live scans). A cross-project audit (issue #49) found no
ecosystem env-var convention to borrow, so the name follows the
existing AGENTGREP_* prefix.

what:
- Resolve cache mode as flag > AGENTGREP_CACHE > auto via pure,
  doctested helpers; invalid values fail at parse time with a clean
  message.
- Honor the variable in the MCP server runtime, attaching the cache
  read-only when it exists; the server never writes the cache file.
- Document the variable in the configuration guide and cover
  resolution precedence, the invalid-value error, and the server
  runtime with named cases.
why: Cache-served searches opened the DB through the migration path,
writing schema metadata on every cached query, and --cache require
with a missing cache silently created an empty database and returned
zero results. Reads must not mutate the cache, and require should fail
loudly so warm benchmark runs cannot measure an accidental live scan.

what:
- Open the CLI search runtime via DbRuntime.open_readonly.
- Error cleanly on --cache require when the cache is missing or not an
  agentgrep database, without creating files; auto keeps falling back
  to the live scanner.
- Assert byte-identical cache files across cached searches and the
  clean require/auto missing-cache behaviors.
why: Cache behavior needs transparency to be trustworthy: profilers
and benchmarks must be able to see whether a query was served from the
DB cache, fell back to the live scanner, and why. One aggregate sample
per consulted query keeps the telemetry n+1-safe — never a span per
record.

what:
- Emit a search.cache.decision sample from the cache choke point with
  the cache mode, handled flag, served record count, and fallback
  reason (no-db, unsupported, or empty) — privacy-safe scalars only.
- Document the span in the profiling vocabulary (AGENTS.md and the
  benchmark dev page).
- Cover served, empty-fallback, unsupported-fallback, and off paths
  with named cases.
why: Cache-aware search makes timings ambiguous unless each run states
its cache posture. Cold entries bypass the cache outright; warm entries
serve from a pre-synced bench-scoped cache under require mode, so a
cold cache fails the bench instead of silently timing live scans.

what:
- Add paired *-cache-cold-* and *-cache-warm-* bench entries; warm
  entries declare a setup_command that syncs the bench-scoped cache
  once before timing.
- Support the optional setup_command key in the harness, failing the
  row cleanly when setup exits non-zero.
- Record cache_mode on every measurement row (parsed from the env
  prefix, which stays visible in the sanitized command string) and in
  profiler artifacts.
- Document cold and warm invocations in the benchmark guide and cover
  extraction, pairing, and metadata with named cases.
why: Cache levers grew across the flag surface, the environment, and
the profiling spans; the architecture record should state the policy
they implement so future cache layers inherit it deliberately.

what:
- Add a "Cache control and transparency" section to ADR 0005: every
  cache layer must expose a bypass lever and a hit-or-miss signal,
  with the AGENTGREP_DB / --cache + AGENTGREP_CACHE / decision-span
  levers and their precedence.
why: SQLite connections are bound to their creating thread, and the
MCP search tool runs iter_search_events through asyncio.to_thread. The
server built its runtime with an open read-only connection on the
server thread, so the first cache-enabled MCP search raised
sqlite3.ProgrammingError instead of serving records — auto mode could
not even fall back because the error is not a DbQueryUnsupported.

what:
- Add SearchRuntime.db_opener, a per-consult factory the consulting
  thread calls; _db_search_result opens through it when no db handle
  is set and closes what it opened in the same thread.
- Replace the MCP server's held connection with _open_cache_runtime:
  missing files return None (existing no-db semantics), and a probe
  query surfaces foreign or corrupt files at open time, where
  read-only connects are otherwise lazy.
- Rewrite the cache-env runtime test for opener semantics, exercise
  the opener from a spawned thread across synced, foreign, and
  missing cache files, and add an end-to-end require-mode search
  through fastmcp.Client that fails with the cross-thread error
  before this fix.
why: Read-only SQLite connects are lazy, so the open-time
DatabaseError guard in _db_runtime_for_cli never fired — a foreign
file under --cache require surfaced as a traceback inside the search
instead of the documented clean exit. The search helpers also never
closed the runtime they opened, leaking the connection until process
exit.

what:
- Probe the schema-version row after the read-only open so foreign or
  corrupt cache files exit 2 under require and degrade to a live scan
  under auto, closing the half-open connection either way.
- Close the cache runtime in both CLI search paths: try/finally in
  the eager helper and a generator finally in the event-stream helper
  so early-breaking consumers still release the connection.
- Cover foreign-file require/auto behavior and all three close paths
  with named cases; give the unsupported-query stub the close hook
  the search path now exercises.
why: The cached scope predicates filtered by record kind while the
live pipeline filters by store role plus planner source selection.
Chat adapters emit user turns as kind='prompt', so cached
conversations-scope searches dropped every user turn, and cached
prompts-scope searches returned chat-store user turns for agents
whose live plan reads only their dedicated prompt-history store —
diverging from the uncached scanner in both directions.

what:
- Filter conversations scope by store role through the same
  store_role_for_record catalog lookup the live record filter uses;
  drop the kind predicate that excluded user turns.
- Keep kind='prompt' as a correct superset SQL prefilter for prompts
  scope and gate conversation-store records per agent against the
  synced source ledger, mirroring prompt_history_agents_for_sources
  and source_matches_scope.
- Cover both divergence directions and the unchanged scopes with
  named parity cases over codex (chat + prompt store) and pi
  (chat only) records; extend the db test helpers with kind, agent,
  store, scope, and agents parameters.
why: Auto mode trusted any non-empty cache hit, but db sync ships
agent, scope, and source-cap levers and can exit early, so the index
can legitimately cover less than a query. A codex-only sync followed
by an all-agents search silently served codex-only results as the
answer.

what:
- Record SyncCoverage (agents, scope, complete) in the meta table
  when a sync loop finishes uninterrupted; capped runs pass
  complete=False so they never claim coverage, and merges preserve
  what earlier full syncs established for other agents.
- Fall back to the live scanner in auto mode with a partial-coverage
  span reason whenever coverage does not span the query's agents and
  scope; require keeps serving because the caller demanded the cache.
- Surface the coverage map through db explain (JSON field and a text
  line distinguishing "not recorded" from an empty map) and document
  the auto-mode link on the explain page.
- Cover the gate, the merge semantics, the early-exit no-write path,
  the CLI's args-to-coverage mapping, and the explain rendering with
  named cases.
why: A resync only replaces records for sources present in the
current discovery, so a deleted or rotated history file left its
ledger row, records, and FTS entries behind forever — cached searches
kept returning records the live scanner no longer sees, with no
remediation short of deleting the cache file.

what:
- After an uninterrupted sync loop with prune_missing set, delete
  ledger rows absent from the batch set through the external-content
  FTS delete path; freshness-skipped sources are part of the batch
  set and stay. SyncResult gains a sources_pruned counter and the
  removed records fold into records_removed.
- Enable pruning from the CLI only for uncapped, full-scope,
  all-agents syncs — a narrowed run does not observe the full catalog
  and must not delete what it skipped. Early-exited loops never
  prune.
- Render the pruned count in the sync summary and document the
  pruning rule on the sync page.
- Cover prune/keep behavior, the early-exit guard, and the CLI
  args-to-prune mapping with named cases.
why: Profiler artifacts recorded the requested cache mode while the
profiled path never consulted the DB cache, so a require-mode run
produced live-scan timings labeled as warm-cache evidence — exactly
the misleading metadata the cache-transparency policy forbids.

what:
- Consult the DB cache at the top of profile_search_query, mirroring
  run_search_query: a served query returns the cached records with
  the search.cache.decision span in the profile and skips the
  discover/plan/collect phases it no longer executes.
- Build the profiler's search runtime from the resolved cache mode in
  scripts/profile_engine.py — off profiles the live engine, auto and
  require attach the read-only DB when one exists — and record the
  resolved mode in the payload.
- Cover require-serves-from-cache and off-profiles-live with named
  cases asserting which spans the artifact carries.
… selections

why: An empty query.agents tuple generated WHERE r.agent IN (), a
nonstandard form some SQLite builds reject, and diverged from live
semantics where an empty selection discovers zero sources. Only the
direct library surface can construct this query — the CLI and MCP
normalize to a non-empty selection — but a public API should not
depend on undocumented SQLite tolerance.

what:
- Short-circuit DbStore.search_records to an empty result for an
  empty agent selection, before any SQL is built.
- Cover the guard with a synced-store test asserting live parity.
why: The module docstring enumerates the event-stream invariants, but
the DB-cache early exit emits SearchStarted(source_count=0) with no
per-source events — an undocumented shape that makes source_count=0
ambiguous between "nothing discovered" and "served from cache" for
consumers and maintainers of the protocol.

what:
- Describe the cache-served envelope in the iter_search_events module
  docstring and name the search.cache.decision span as the
  disambiguator.
why: The profiling and benchmarking sections enumerate the metadata
fields artifacts carry, and both artifact kinds gained a cache_mode
field the enumerations did not mention — agents reading the project
rules would omit it when interpreting artifacts.

what:
- Add cache_mode to the profiler-artifact and benchmark-artifact
  field sentences with one line each on where the value comes from.
why: Two cache-section insertions ran into adjacent sentences without
a blank line, fusing distinct paragraphs in the rendered page — the
cold-path benchmark lead-in merged into the span inventory, and the
privacy paragraph merged into the decision-span description.

what:
- Add the missing paragraph breaks before the cold-path benchmark
  lead-in and after the search.cache.decision description.
…oint

why: Query-strategy decisions for the cache need evidence about how
SQLite behaves under our statements - per-statement timing, statement
counts per operation, and rows touched - none of which the profiler
could see. agentgrep owns every SQL call site in DbStore, so an
explicit execute helper observes everything without cursor proxies or
the sqlite trace callback, whose expanded-SQL output would leak bound
search terms into telemetry.

what:
- Add _query/_execute/_executescript helpers that time each statement,
  accumulate per-statement-name counters, and DEBUG-log the statement
  text with placeholders only - bound parameters are never captured.
- Name every statement shape (records.search_fts, records.insert,
  fts.delete, meta.get, sources.upsert, ...) and route all DbStore
  statements through the helpers.
- Flush one aggregated db.sql.statement profile sample per statement
  name at the end of search_records, status, explain, and sync_records
  - a high agentgrep_sql_count on one sample is the n+1 signal; sync
  loops never emit per-record samples.
- Cover the FTS and scan statement names, the sync n+1 aggregation,
  the silent no-profiler path, and a sentinel-term proof that no log
  message, extra, or sample attribute carries bound parameters.
why: Statement timings alone cannot show whether SQLite served a
query from an index or fell back to a scan. The cache-transparency
policy calls for a lever plus a signal for every cache layer; this
adds the planner signal without any always-on cost.

what:
- Run EXPLAIN QUERY PLAN once per statement shape when
  AGENTGREP_SQL_EXPLAIN is set, joining the plan detail rows -
  table, index, and strategy names only, never bound parameters -
  onto the statement's aggregated sample as agentgrep_sql_plan and
  into the DEBUG log.
- Echo the lever as sql_explain in profiler artifacts next to
  cache_mode so plan-bearing payloads are self-describing.
- Cover FTS and scan plan capture, the off-by-default path, the
  once-per-shape EXPLAIN guarantee, and the artifact echo with named
  cases.
…ever

why: The span vocabulary and cache-transparency policy enumerate the
observability contract; the new db.sql.statement samples and the
AGENTGREP_SQL_EXPLAIN lever belong in that inventory, with the privacy
boundary stated where users will look for it.

what:
- Add db.sql.statement to the AGENTS.md span vocabulary with the
  n+1-signal reading of agentgrep_sql_count and the placeholders-only
  privacy rule.
- Extend ADR 0005's cache-transparency section with the SQLite-layer
  spans and lever.
- Document AGENTGREP_SQL_EXPLAIN in the configuration guide and the
  benchmark guide with one plan-capturing profile invocation.
why: The strategy study against the real 439k-record cache showed the
probe-shaped query is about twice as fast against a narrow table as
against the wide records table at every term frequency - page density
is the lever - and the split also stores the haystack once instead of
twice, shrinking the file. This commit is the behavior-identical
storage swap; the probe lands separately so each is bisectable.

what:
- Replace the records table with records_search (ids, kind, agent,
  store, adapter, path, timestamp, session identity, hashes) and
  record_details (text, title, role, model, metadata) cascading from
  it, plus a content-full trigram FTS table that owns the haystack.
- Make FTS removal a plain DELETE by rowid set: content-full FTS does
  not need the external-content 'delete' command bookkeeping, removing
  that corruption class outright.
- Insert each record across the three surfaces; serve search, record
  iteration, and lookups through the search/details join; count
  records_search for status; run the short-term scan against the FTS
  table's own haystack column.
- Add a round-trip fidelity test (every field survives the split) and
  update the source-delete index probe and telemetry statement names.
…robe

why: The cached search fetched every matching row at full width before
the limit slice - 164 MB pumped for a hot term - because dedup-before-
limit kept the LIMIT out of SQL. The 65-run strategy study measured the
fix: a lean ordered probe with keyset continuation plus per-survivor
hydration runs 6-8x faster end-to-end and beats the live scanner at
every term frequency. The measured rejects are equally explicit: instr
pushdown (trigram MATCH proven exact on the haystack surface, so it
only drags the wide column into the probe), GROUP BY MAX(rowid)
(returned wrong top-50s), window-function dedup (materializes full
partitions), and a stored sort column (FTS yields unordered rowids, so
the index never engages).

what:
- Probe limited searches with lean columns ordered by the live sort
  tuple plus rowid DESC - the unique total order - in windows of
  max(4*limit, 200), continuing via a row-value keyset cursor; scope-
  filter probe rows through a scalar form of the cached scope check;
  hydrate admitted rows per page and seal the window only when limit
  records survive scope, dedup, and the matches_record oracle.
- Serve unlimited searches from the same lean fetch with the same
  deterministic order, hydrating survivors in batches.
- Keep every contract: regex/any_term/compiled raise, empty agents
  return nothing, case-sensitive stays served, dedupe=False skips only
  dedup, and the coverage gate and decision span are untouched. New
  statement names (records.probe_fts, records.probe_scan,
  records.hydrate) make the phases individually profilable.
- Prove parity with a property grid against the unlimited reference:
  NULL-timestamp tie runs and dedup groups straddling forced page
  boundaries, dedupe=False ordering, case-sensitive and text-surface
  oracle rejections refilling the window, single-page sealing at the
  default window, and corpus exhaustion.
why: The storage split and keyset probe are measured decisions; the
ADR should carry the evidence and the named rejects so they stay
rejected, and the dev page should describe the storage shape readers
now find in db.py.

what:
- Amend ADR 0005's decision with the read-model layout, the probe
  strategy, the study findings, and the four measured rejects with
  one-line reasons.
- Describe the split tables and the probe walkthrough on the db-index
  dev page, including the profilable statement names.
- Note the probe/hydrate statement names in the AGENTS.md telemetry
  vocabulary, with the page-count reading of the probe sample.
why: Measured against the real 3.8 GB cache, memory-mapping the
database file cuts hot-term probes 20-45% and the short-term scan
about 35%. The OS page cache it rides is shared across the per-consult
connections the MCP server opens - a per-connection cache_size, which
benchmarked similarly, would re-warm on every consult and was rejected
for that reason.

what:
- Set PRAGMA mmap_size on both open paths (the read-only opener and
  _configure), sized by SQLITE_MMAP_BYTES; SQLite clamps the request
  to its compile-time SQLITE_MAX_MMAP_SIZE, so over-requesting is
  harmless.
- Assert the pragma on fresh writable and read-only connections.
why: Detail rows average about 3 KiB, so the default 4 KiB pages chain
into overflow pages on hydration. Measured against the real cache,
8 KiB pages halve hydration time (3.5ms to 1.6ms for a 200-row window)
and, combined with the mmap budget, take the hot two-page probe from
213ms to 136ms.

what:
- Set PRAGMA page_size=8192 in _configure before schema creation; the
  pragma only affects databases created by this connection, so
  existing caches keep their page size until their next rebuild.
- Assert the page size on freshly created caches.
why: The aggregated probe sample shows page counts but not why pages
continue. A db.probe.page sample per page - window, rows fetched,
scope-admitted, cumulative kept - exposes the funnel; the first
profile with it showed the hot-term continuation was scope rejection
(130 of 200 page-one rows), not oracle rejection as assumed.

what:
- Emit one aggregate db.probe.page sample per probe page with the
  window size, fetched rows, scope-admitted rows, and cumulative kept
  results; no per-record samples.
why: Per-page probe telemetry showed prompts-scope hot-term probes
fetching 200 rows but admitting only 70 — the Python scope filter
rejected 130 chat-store user turns from agents that hold a dedicated
prompt-history store, forcing a second 800-row continuation page and
a second hydration round. Conversations scope had no SQL prefilter at
all. Encoding the same store-role gates as SQL predicates lets the
probe seal in a single page: page one now admits every fetched row
and the warm hot-term consult drops from ~150-170ms to ~105ms.

what:
- Replace _prompt_history_agents with _scope_catalog, classifying the
  synced source ledger once into prompt-history agents and
  conversation-role (store, adapter_id) pairs.
- Prompts scope adds NOT ((store, adapter_id) IN (VALUES ...) AND
  agent IN (...)) so chat-store user turns from prompt-history agents
  never enter the probe page.
- Conversations scope adds (store, adapter_id) IN (VALUES ...), or a
  constant-false predicate when no conversation-role source is
  synced.
- Keep the Python scope filter as the authoritative re-check on
  admitted rows; parity grids are unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add explicit cache bypass controls for profiling and benchmarks

1 participant