Persistent SQLite DB index with cache-aware search#47
Conversation
Code reviewFound 3 issues:
Lines 231 to 243 in 1042dae
agentgrep/src/agentgrep/mcp/tools/db_tools.py Lines 38 to 40 in 1042dae
agentgrep/tests/test_engine_profiling.py Lines 693 to 695 in 1042dae 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
Follow-up: all review findings addressed, including the ones below the posting threshold, one commit per issue.
Found along the way: the test suite could read the developer's real cache via |
|
Round-2 follow-up: the second review pass (both PRs) surfaced 12 findings; none crossed the posting threshold, but the legitimate ones are fixed here, one commit per issue.
Declined with reasons: SCHEMA_VERSION bump for the new tables (nothing has shipped; the schema is born final), the explain ok/error overlap (the offending row state has no writer), the |
Code reviewFound 1 issue:
agentgrep/scripts/profile_engine.py Lines 297 to 299 in c4b8fdc Profiled path that never reaches the cache: agentgrep/src/agentgrep/_engine/profiling.py Lines 203 to 210 in c4b8fdc 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
Round-3 follow-up: the cache-correctness review (cross-thread SQLite, scope mismatch, partial coverage, stale rows) plus the posted profiler issue are addressed, one commit per issue.
The insight-tool error-handling fix (foreign cache files return empty payloads like db_status) landed on the stacked #48 branch. |
why: The doctest guidance scopes runnable examples to pure helpers with no external state. The db normalization, hashing, FTS quoting, and count-formatting helpers qualify and had none. what: - Add Examples sections to normalize_record_text, token_set, text_hash, and _quote_fts_term. - Add Examples sections to the four db count formatters.
why: The cache applied the result limit before per-session dedup, so cross-source duplicates inside the first N rows returned fewer than N unique records. The live driver dedups during collection, where a cap always means N unique records. what: - Dedup inside DbStore.search_records after sorting and before the limit slice, owning the event-stream uniqueness invariant in one place. - Drop the now-redundant dedup pass in _db_search_result. - Cover limit-with-duplicates and no-dedup-with-limit with named cases, and pin the duplicate-storage test to an undeduped query.
why: The cache generated candidates from raw title and text while the live engine matches a five-field haystack with Python case folding, so records matching a term only through model, role, or path — or through casefold expansions like Strasse for Straße — never reached the post-filter, and --cache require silently under-returned. what: - Store a casefolded haystack column built from the same build_record_match_surface helper the live matcher uses, and point the trigram FTS index and the scan prefilter at it. - Casefold query terms before MATCH so both sides fold in Python; the trigram index never re-folds, letting non-ASCII terms use FTS and removing the ASCII guard from _fts_indexable. - Keep the live-predicate post-filter authoritative so text-surface queries narrow correctly from haystack candidates. - Cover model, role, path, casefold-expansion, and text-surface parity with named cases. Pre-release dev caches need one resync.
why: Benchmark harnesses, CI jobs, and MCP server configuration blocks cannot pass per-command flags, so cache behavior needs an environment lever. A tri-state mirror of --cache covers both cold runs (off) and honest warm runs (require, which fails loudly instead of silently measuring live scans). A cross-project audit (issue #49) found no ecosystem env-var convention to borrow, so the name follows the existing AGENTGREP_* prefix. what: - Resolve cache mode as flag > AGENTGREP_CACHE > auto via pure, doctested helpers; invalid values fail at parse time with a clean message. - Honor the variable in the MCP server runtime, attaching the cache read-only when it exists; the server never writes the cache file. - Document the variable in the configuration guide and cover resolution precedence, the invalid-value error, and the server runtime with named cases.
why: Cache-served searches opened the DB through the migration path, writing schema metadata on every cached query, and --cache require with a missing cache silently created an empty database and returned zero results. Reads must not mutate the cache, and require should fail loudly so warm benchmark runs cannot measure an accidental live scan. what: - Open the CLI search runtime via DbRuntime.open_readonly. - Error cleanly on --cache require when the cache is missing or not an agentgrep database, without creating files; auto keeps falling back to the live scanner. - Assert byte-identical cache files across cached searches and the clean require/auto missing-cache behaviors.
why: Cache behavior needs transparency to be trustworthy: profilers and benchmarks must be able to see whether a query was served from the DB cache, fell back to the live scanner, and why. One aggregate sample per consulted query keeps the telemetry n+1-safe — never a span per record. what: - Emit a search.cache.decision sample from the cache choke point with the cache mode, handled flag, served record count, and fallback reason (no-db, unsupported, or empty) — privacy-safe scalars only. - Document the span in the profiling vocabulary (AGENTS.md and the benchmark dev page). - Cover served, empty-fallback, unsupported-fallback, and off paths with named cases.
why: Cache-aware search makes timings ambiguous unless each run states its cache posture. Cold entries bypass the cache outright; warm entries serve from a pre-synced bench-scoped cache under require mode, so a cold cache fails the bench instead of silently timing live scans. what: - Add paired *-cache-cold-* and *-cache-warm-* bench entries; warm entries declare a setup_command that syncs the bench-scoped cache once before timing. - Support the optional setup_command key in the harness, failing the row cleanly when setup exits non-zero. - Record cache_mode on every measurement row (parsed from the env prefix, which stays visible in the sanitized command string) and in profiler artifacts. - Document cold and warm invocations in the benchmark guide and cover extraction, pairing, and metadata with named cases.
why: Cache levers grew across the flag surface, the environment, and the profiling spans; the architecture record should state the policy they implement so future cache layers inherit it deliberately. what: - Add a "Cache control and transparency" section to ADR 0005: every cache layer must expose a bypass lever and a hit-or-miss signal, with the AGENTGREP_DB / --cache + AGENTGREP_CACHE / decision-span levers and their precedence.
why: SQLite connections are bound to their creating thread, and the MCP search tool runs iter_search_events through asyncio.to_thread. The server built its runtime with an open read-only connection on the server thread, so the first cache-enabled MCP search raised sqlite3.ProgrammingError instead of serving records — auto mode could not even fall back because the error is not a DbQueryUnsupported. what: - Add SearchRuntime.db_opener, a per-consult factory the consulting thread calls; _db_search_result opens through it when no db handle is set and closes what it opened in the same thread. - Replace the MCP server's held connection with _open_cache_runtime: missing files return None (existing no-db semantics), and a probe query surfaces foreign or corrupt files at open time, where read-only connects are otherwise lazy. - Rewrite the cache-env runtime test for opener semantics, exercise the opener from a spawned thread across synced, foreign, and missing cache files, and add an end-to-end require-mode search through fastmcp.Client that fails with the cross-thread error before this fix.
why: Read-only SQLite connects are lazy, so the open-time DatabaseError guard in _db_runtime_for_cli never fired — a foreign file under --cache require surfaced as a traceback inside the search instead of the documented clean exit. The search helpers also never closed the runtime they opened, leaking the connection until process exit. what: - Probe the schema-version row after the read-only open so foreign or corrupt cache files exit 2 under require and degrade to a live scan under auto, closing the half-open connection either way. - Close the cache runtime in both CLI search paths: try/finally in the eager helper and a generator finally in the event-stream helper so early-breaking consumers still release the connection. - Cover foreign-file require/auto behavior and all three close paths with named cases; give the unsupported-query stub the close hook the search path now exercises.
why: The cached scope predicates filtered by record kind while the live pipeline filters by store role plus planner source selection. Chat adapters emit user turns as kind='prompt', so cached conversations-scope searches dropped every user turn, and cached prompts-scope searches returned chat-store user turns for agents whose live plan reads only their dedicated prompt-history store — diverging from the uncached scanner in both directions. what: - Filter conversations scope by store role through the same store_role_for_record catalog lookup the live record filter uses; drop the kind predicate that excluded user turns. - Keep kind='prompt' as a correct superset SQL prefilter for prompts scope and gate conversation-store records per agent against the synced source ledger, mirroring prompt_history_agents_for_sources and source_matches_scope. - Cover both divergence directions and the unchanged scopes with named parity cases over codex (chat + prompt store) and pi (chat only) records; extend the db test helpers with kind, agent, store, scope, and agents parameters.
why: Auto mode trusted any non-empty cache hit, but db sync ships agent, scope, and source-cap levers and can exit early, so the index can legitimately cover less than a query. A codex-only sync followed by an all-agents search silently served codex-only results as the answer. what: - Record SyncCoverage (agents, scope, complete) in the meta table when a sync loop finishes uninterrupted; capped runs pass complete=False so they never claim coverage, and merges preserve what earlier full syncs established for other agents. - Fall back to the live scanner in auto mode with a partial-coverage span reason whenever coverage does not span the query's agents and scope; require keeps serving because the caller demanded the cache. - Surface the coverage map through db explain (JSON field and a text line distinguishing "not recorded" from an empty map) and document the auto-mode link on the explain page. - Cover the gate, the merge semantics, the early-exit no-write path, the CLI's args-to-coverage mapping, and the explain rendering with named cases.
why: A resync only replaces records for sources present in the current discovery, so a deleted or rotated history file left its ledger row, records, and FTS entries behind forever — cached searches kept returning records the live scanner no longer sees, with no remediation short of deleting the cache file. what: - After an uninterrupted sync loop with prune_missing set, delete ledger rows absent from the batch set through the external-content FTS delete path; freshness-skipped sources are part of the batch set and stay. SyncResult gains a sources_pruned counter and the removed records fold into records_removed. - Enable pruning from the CLI only for uncapped, full-scope, all-agents syncs — a narrowed run does not observe the full catalog and must not delete what it skipped. Early-exited loops never prune. - Render the pruned count in the sync summary and document the pruning rule on the sync page. - Cover prune/keep behavior, the early-exit guard, and the CLI args-to-prune mapping with named cases.
why: Profiler artifacts recorded the requested cache mode while the profiled path never consulted the DB cache, so a require-mode run produced live-scan timings labeled as warm-cache evidence — exactly the misleading metadata the cache-transparency policy forbids. what: - Consult the DB cache at the top of profile_search_query, mirroring run_search_query: a served query returns the cached records with the search.cache.decision span in the profile and skips the discover/plan/collect phases it no longer executes. - Build the profiler's search runtime from the resolved cache mode in scripts/profile_engine.py — off profiles the live engine, auto and require attach the read-only DB when one exists — and record the resolved mode in the payload. - Cover require-serves-from-cache and off-profiles-live with named cases asserting which spans the artifact carries.
… selections why: An empty query.agents tuple generated WHERE r.agent IN (), a nonstandard form some SQLite builds reject, and diverged from live semantics where an empty selection discovers zero sources. Only the direct library surface can construct this query — the CLI and MCP normalize to a non-empty selection — but a public API should not depend on undocumented SQLite tolerance. what: - Short-circuit DbStore.search_records to an empty result for an empty agent selection, before any SQL is built. - Cover the guard with a synced-store test asserting live parity.
why: The module docstring enumerates the event-stream invariants, but the DB-cache early exit emits SearchStarted(source_count=0) with no per-source events — an undocumented shape that makes source_count=0 ambiguous between "nothing discovered" and "served from cache" for consumers and maintainers of the protocol. what: - Describe the cache-served envelope in the iter_search_events module docstring and name the search.cache.decision span as the disambiguator.
why: The profiling and benchmarking sections enumerate the metadata fields artifacts carry, and both artifact kinds gained a cache_mode field the enumerations did not mention — agents reading the project rules would omit it when interpreting artifacts. what: - Add cache_mode to the profiler-artifact and benchmark-artifact field sentences with one line each on where the value comes from.
why: Two cache-section insertions ran into adjacent sentences without a blank line, fusing distinct paragraphs in the rendered page — the cold-path benchmark lead-in merged into the span inventory, and the privacy paragraph merged into the decision-span description. what: - Add the missing paragraph breaks before the cold-path benchmark lead-in and after the search.cache.decision description.
…oint why: Query-strategy decisions for the cache need evidence about how SQLite behaves under our statements - per-statement timing, statement counts per operation, and rows touched - none of which the profiler could see. agentgrep owns every SQL call site in DbStore, so an explicit execute helper observes everything without cursor proxies or the sqlite trace callback, whose expanded-SQL output would leak bound search terms into telemetry. what: - Add _query/_execute/_executescript helpers that time each statement, accumulate per-statement-name counters, and DEBUG-log the statement text with placeholders only - bound parameters are never captured. - Name every statement shape (records.search_fts, records.insert, fts.delete, meta.get, sources.upsert, ...) and route all DbStore statements through the helpers. - Flush one aggregated db.sql.statement profile sample per statement name at the end of search_records, status, explain, and sync_records - a high agentgrep_sql_count on one sample is the n+1 signal; sync loops never emit per-record samples. - Cover the FTS and scan statement names, the sync n+1 aggregation, the silent no-profiler path, and a sentinel-term proof that no log message, extra, or sample attribute carries bound parameters.
why: Statement timings alone cannot show whether SQLite served a query from an index or fell back to a scan. The cache-transparency policy calls for a lever plus a signal for every cache layer; this adds the planner signal without any always-on cost. what: - Run EXPLAIN QUERY PLAN once per statement shape when AGENTGREP_SQL_EXPLAIN is set, joining the plan detail rows - table, index, and strategy names only, never bound parameters - onto the statement's aggregated sample as agentgrep_sql_plan and into the DEBUG log. - Echo the lever as sql_explain in profiler artifacts next to cache_mode so plan-bearing payloads are self-describing. - Cover FTS and scan plan capture, the off-by-default path, the once-per-shape EXPLAIN guarantee, and the artifact echo with named cases.
…ever why: The span vocabulary and cache-transparency policy enumerate the observability contract; the new db.sql.statement samples and the AGENTGREP_SQL_EXPLAIN lever belong in that inventory, with the privacy boundary stated where users will look for it. what: - Add db.sql.statement to the AGENTS.md span vocabulary with the n+1-signal reading of agentgrep_sql_count and the placeholders-only privacy rule. - Extend ADR 0005's cache-transparency section with the SQLite-layer spans and lever. - Document AGENTGREP_SQL_EXPLAIN in the configuration guide and the benchmark guide with one plan-capturing profile invocation.
why: The strategy study against the real 439k-record cache showed the probe-shaped query is about twice as fast against a narrow table as against the wide records table at every term frequency - page density is the lever - and the split also stores the haystack once instead of twice, shrinking the file. This commit is the behavior-identical storage swap; the probe lands separately so each is bisectable. what: - Replace the records table with records_search (ids, kind, agent, store, adapter, path, timestamp, session identity, hashes) and record_details (text, title, role, model, metadata) cascading from it, plus a content-full trigram FTS table that owns the haystack. - Make FTS removal a plain DELETE by rowid set: content-full FTS does not need the external-content 'delete' command bookkeeping, removing that corruption class outright. - Insert each record across the three surfaces; serve search, record iteration, and lookups through the search/details join; count records_search for status; run the short-term scan against the FTS table's own haystack column. - Add a round-trip fidelity test (every field survives the split) and update the source-delete index probe and telemetry statement names.
…robe why: The cached search fetched every matching row at full width before the limit slice - 164 MB pumped for a hot term - because dedup-before- limit kept the LIMIT out of SQL. The 65-run strategy study measured the fix: a lean ordered probe with keyset continuation plus per-survivor hydration runs 6-8x faster end-to-end and beats the live scanner at every term frequency. The measured rejects are equally explicit: instr pushdown (trigram MATCH proven exact on the haystack surface, so it only drags the wide column into the probe), GROUP BY MAX(rowid) (returned wrong top-50s), window-function dedup (materializes full partitions), and a stored sort column (FTS yields unordered rowids, so the index never engages). what: - Probe limited searches with lean columns ordered by the live sort tuple plus rowid DESC - the unique total order - in windows of max(4*limit, 200), continuing via a row-value keyset cursor; scope- filter probe rows through a scalar form of the cached scope check; hydrate admitted rows per page and seal the window only when limit records survive scope, dedup, and the matches_record oracle. - Serve unlimited searches from the same lean fetch with the same deterministic order, hydrating survivors in batches. - Keep every contract: regex/any_term/compiled raise, empty agents return nothing, case-sensitive stays served, dedupe=False skips only dedup, and the coverage gate and decision span are untouched. New statement names (records.probe_fts, records.probe_scan, records.hydrate) make the phases individually profilable. - Prove parity with a property grid against the unlimited reference: NULL-timestamp tie runs and dedup groups straddling forced page boundaries, dedupe=False ordering, case-sensitive and text-surface oracle rejections refilling the window, single-page sealing at the default window, and corpus exhaustion.
why: The storage split and keyset probe are measured decisions; the ADR should carry the evidence and the named rejects so they stay rejected, and the dev page should describe the storage shape readers now find in db.py. what: - Amend ADR 0005's decision with the read-model layout, the probe strategy, the study findings, and the four measured rejects with one-line reasons. - Describe the split tables and the probe walkthrough on the db-index dev page, including the profilable statement names. - Note the probe/hydrate statement names in the AGENTS.md telemetry vocabulary, with the page-count reading of the probe sample.
why: Measured against the real 3.8 GB cache, memory-mapping the database file cuts hot-term probes 20-45% and the short-term scan about 35%. The OS page cache it rides is shared across the per-consult connections the MCP server opens - a per-connection cache_size, which benchmarked similarly, would re-warm on every consult and was rejected for that reason. what: - Set PRAGMA mmap_size on both open paths (the read-only opener and _configure), sized by SQLITE_MMAP_BYTES; SQLite clamps the request to its compile-time SQLITE_MAX_MMAP_SIZE, so over-requesting is harmless. - Assert the pragma on fresh writable and read-only connections.
why: Detail rows average about 3 KiB, so the default 4 KiB pages chain into overflow pages on hydration. Measured against the real cache, 8 KiB pages halve hydration time (3.5ms to 1.6ms for a 200-row window) and, combined with the mmap budget, take the hot two-page probe from 213ms to 136ms. what: - Set PRAGMA page_size=8192 in _configure before schema creation; the pragma only affects databases created by this connection, so existing caches keep their page size until their next rebuild. - Assert the page size on freshly created caches.
why: The aggregated probe sample shows page counts but not why pages continue. A db.probe.page sample per page - window, rows fetched, scope-admitted, cumulative kept - exposes the funnel; the first profile with it showed the hot-term continuation was scope rejection (130 of 200 page-one rows), not oracle rejection as assumed. what: - Emit one aggregate db.probe.page sample per probe page with the window size, fetched rows, scope-admitted rows, and cumulative kept results; no per-record samples.
why: Per-page probe telemetry showed prompts-scope hot-term probes fetching 200 rows but admitting only 70 — the Python scope filter rejected 130 chat-store user turns from agents that hold a dedicated prompt-history store, forcing a second 800-row continuation page and a second hydration round. Conversations scope had no SQL prefilter at all. Encoding the same store-role gates as SQL predicates lets the probe seal in a single page: page one now admits every fetched row and the warm hot-term consult drops from ~150-170ms to ~105ms. what: - Replace _prompt_history_agents with _scope_catalog, classifying the synced source ledger once into prompt-history agents and conversation-role (store, adapter_id) pairs. - Prompts scope adds NOT ((store, adapter_id) IN (VALUES ...) AND agent IN (...)) so chat-store user turns from prompt-history agents never enter the probe page. - Conversations scope adds (store, adapter_id) IN (VALUES ...), or a constant-false predicate when no conversation-role source is synced. - Keep the Python scope filter as the authoritative re-check on admitted rows; parity grids are unchanged.
Summary
agentgrep db sync|status|explainmaterializes discovered agent history into a local WAL-mode database with an FTS5 text index, a source ledger with freshness state, and stable source/record ids. The DB is derived state — Codex, Claude, Cursor, Gemini, Grok, Pi, and OpenCode stores remain the source of truth.grepandsearchgain--cache auto|require|off(and--no-cache). The defaultautoserves a query from the DB index only when it can answer it and falls back to the live scanner otherwise;requireinsists on the DB path, and unsupported cache-required queries fail as clean CLI errors instead of tracebacks.source_statefingerprint checks skip unchanged sources unless--forceis passed, and a source-id index keeps per-source replacement fast.--json/--ndjsonmachine modes.db_statusMCP tool.Changes by area
DB index
src/agentgrep/db.py:DbStore(SQLite/WAL/FTS5 schema: source ledger plussource_state, normalized records),DbRuntimesync/status/search, sync progress protocol, freshness fingerprints, and a default path under the XDG cache directory overridable viaAGENTGREP_DBor--db.Search integration
src/agentgrep/__init__.py,src/agentgrep/_engine/: cache-mode plumbing throughSearchRuntimeso search-shaped commands consult the DB index under--cache auto|requireand preserve the live-scan path under--no-cache.CLI
src/agentgrep/cli/parser.py,src/agentgrep/cli/render.py: thedbcommand group with progress flags, semantic human summaries in text mode, JSON/NDJSON machine modes, and clean CLI errors for unsupported cached queries.MCP
src/agentgrep/mcp/tools/db_tools.py,src/agentgrep/mcp/models.py: the read-onlydb_statustool with its pydantic response model and capability registration.Docs
ADR 0005; per-command CLI pages for
db, plus argparse-backed pages for the existingsearchanduicommands; a DB-index development page; DB cache controls in the configuration guide.Design decisions
--cache autois conservative: the index only answers queries it can satisfy; anything else falls through to the live scanner, and--cache requireturns unsupported queries into explicit CLI errors.source_staterecords the synced mtime and content fingerprint;--forceis the explicit full-rebuild path.Test plan
tests/test_db_index.py—DbStoreschema and sync behavior: freshness skips,--forceresync, FTS queries, deterministic ids, early exittests/test_cache_cli.py— CLI contracts fordband cache modes: JSON and NDJSON stdout, human summaries, progress output, error pathstests/test_agentgrep_mcp.py— read-onlydb_statuspayloads and capability registrationruff check/ruff format,ty check,uv run pytest(incl. doctests),just build-docsStacked PR
The insights/suggestions surface that previously shared this branch now lives in its own PR stacked on this one.
Closes #49.