Skip to content

Route CLI archive access through daemon HTTP#426

Open
wesm wants to merge 80 commits into
mainfrom
uncovered-flame
Open

Route CLI archive access through daemon HTTP#426
wesm wants to merge 80 commits into
mainfrom
uncovered-flame

Conversation

@wesm

@wesm wesm commented Jul 1, 2026

Copy link
Copy Markdown
Member

Why

msgvault users can have a long-running daemon, scheduled syncs, CLIs, TUI, MCP tools, and agent workflows touching the same archive. The old local CLI path let many commands open SQLite directly while msgvault serve was also alive, which made local usage behave differently from remote usage and left too much archive-ownership policy spread across command implementations.

This PR makes daemon HTTP the boundary for CLI archive access. Local and remote msgvault now use the same transport model, the daemon owns the expensive archive/store lifecycle, and long-running operations keep the terminal ergonomics users expect by streaming progress and command output back through the client.

What changed

  • Routes archive-access CLI commands through the selected daemon, using either the configured remote server or an auto-started local daemon.
  • Hardens local daemon lifecycle behavior around start/stop/restart/status, idle shutdown, binary/API compatibility, write ownership, scheduler cancellation, and graceful shutdown of in-flight mutations.
  • Moves the HTTP API to Huma and checks in OpenAPI artifacts plus a generated Go client so the daemon contract is explicit and CI can reject stale generated assets.
  • Replaces the old split remote-client implementation with a daemon client adapter layer used by CLI, TUI, and MCP paths.
  • Updates user-facing docs for daemon lifecycle, remote deployment, configuration, troubleshooting, and the 0.17.0 deprecations for CLI/TUI/MCP flags that now need daemon-level configuration.

Compatibility notes

There are no new database schema or DDL migrations in this PR relative to current main. This is still not a no-risk no-op against an old 0.15.x archive, because current main already contains post-0.15 schema evolution; testing against a copied archive or backup is the right rollout path.

The branch has been exercised with the tagged Go suite, docs build checks, lint/vet, OpenAPI freshness checks, and roborev open-job checks before opening this PR.

Move archive-access CLI behavior behind the selected daemon so local and remote msgvault use the same HTTP path. This prevents foreground CLI processes from opening large SQLite archives directly, gives the daemon clear ownership of long-running mutations, and preserves existing stdout/stderr ergonomics by streaming daemon work back to the terminal.

The migration also hardens daemon lifecycle management, exposes a Huma/OpenAPI contract with generated client coverage, routes TUI and MCP archive workflows through daemon-backed adapters, and documents the daemon-level configuration that replaces deprecated per-command analytics flags in 0.17.0.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (a51a8bc)

Summary verdict: Changes need fixes before merge due to medium-severity behavior and concurrency regressions.

Medium

  • cmd/msgvault/cmd/tui.go:130
    The local TUI now wraps the HTTP store with daemonclient.NewEngineAdapter, but that adapter only implements query.Engine, not query.TextEngine. Since tui.go only enables text mode when the backend satisfies query.TextEngine, SMS/MMS/Teams text search and listing become unavailable in the TUI.
    Fix: Add daemon-backed query.TextEngine support and API/client methods for text operations, or keep a direct local text engine for TUI text mode until those endpoints exist.

  • internal/daemonclient/engine_adapter.go:725
    Engine.ListAccounts now calls the daemon store’s /api/v1/accounts path, which returns configured scheduled accounts. The previous query engine contract returned all source accounts from the archive. Consumers such as the TUI account selector and MCP account lookup can no longer see imported sources or accounts present in the database but not in the schedule config.
    Fix: Back query.Engine.ListAccounts with the source-backed CLI accounts endpoint, or add an equivalent daemon endpoint that returns all archive sources and map those to query.AccountInfo.

  • internal/api/cli_handlers.go:1316
    handleCLISearch is a GET route, so the operation gate skips it, but it can still mutate the database by running BackfillFTS when the FTS index needs backfill. That allows index writes to run concurrently with sync/delete/dedupe operations or with another search doing the same backfill.
    Fix: Move the backfill to a gated operation, explicitly gate this route when backfill is needed, or ensure FTS backfill is completed under the daemon’s serialized mutation path before serving search.


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 13m16s), codex_security (codex/security, done, 3m59s) | Total: 17m26s

wesm and others added 8 commits July 1, 2026 11:00
The daemon-routing PR needs to preserve the old CLI/TUI behavior while making HTTP the single archive access path. Review and CI found places where that contract still leaked: TUI text mode lost its TextEngine, source account listing used scheduler config instead of archive sources, lazy FTS backfill could mutate from an ungated GET path, and log following could occupy the mutation gate indefinitely.

This keeps the stricter daemon-first architecture by adding generated OpenAPI coverage for daemon-backed text queries instead of falling back to direct local access. The remaining changes repair CI-only drift: generated assets, Docker bind-mount ownership, the Nix vendor hash, PostgreSQL attachment insertion, and the repository testify-helper rule.

Validation: Docker image build and smoke test with a UID-1000-owned /data bind mount; nix build.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The Docker validation job gives the bind-mounted data directory to the non-root image user so init-db exercises the same ownership path users rely on. The host-side assertion also needs to account for that ownership transfer; otherwise a permission-sensitive runner can report that the database is missing even when the container created it successfully.\n\nUse a private temp directory, verify the resulting database with sudo, and clean up with sudo so the smoke test checks the intended behavior without depending on runner umask or sticky-bit details.\n\nValidation: Docker smoke check against a UID-1000-owned /data bind mount.\n\nGenerated with Codex\nCo-authored-by: Codex <codex@openai.com>
Text aggregate queries use TimeYear as the enum zero value, so the daemon adapter could not distinguish an omitted granularity from an explicit yearly aggregate. That made daemon-backed callers send time_granularity=year and bypass the HTTP handler's intended month default.

Carry an explicit granularity marker for text aggregate options, omit the query parameter only when it is truly unset, and normalize direct text engines to the same API-compatible default. This keeps existing CLI/TUI calls on the month default while still allowing callers to request explicit yearly buckets.

Validation: make test; make lint-ci; go run ./cmd/testify-helper-check -tags="fts5 sqlite_vec" ./...

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The custom testify-helper checker now enforces local assert/require helpers once tests accumulate repeated package-level calls. The daemon HTTP PR added enough assertion-heavy tests to trip that lint rule, so this updates the affected tests to use local helpers while keeping testify imports conventional.

Validation: go fmt ./...; go vet -tags "fts5 sqlite_vec" ./...; go test -tags "fts5 sqlite_vec" ./cmd/msgvault/cmd ./internal/api ./internal/config ./internal/daemonclient ./internal/mcp ./internal/scheduler ./pkg/client; make lint-ci

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The post-rebase testify helper cleanup left several daemon generated-client handler assertions using package-call argument shape after a local assert helper was in scope. That made the test compare *testing.T against the expected request value instead of validating the request itself.

Keep testify/assert imported unaliased and use the local helper call shape in those handler closures so the checker stays clean without changing the import convention.

Validation: go run ./cmd/testify-helper-check -tags="fts5 sqlite_vec" ./...; go test -tags "fts5 sqlite_vec" ./internal/daemonclient; go test -tags "fts5 sqlite_vec" ./cmd/msgvault/cmd ./internal/api ./internal/config ./internal/daemonclient ./internal/mcp ./internal/scheduler ./pkg/client; go vet -tags "fts5 sqlite_vec" ./...; make lint-ci; make test

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The idle reset test used a 40 ms timeout with only a 25 ms pre-reset sleep. On macOS CI, normal scheduler delay can let the tracker reach its initial idle timeout before the test request runs, so the test fails without proving a daemon lifecycle regression.

Use a wider timing window and assert that the tracked request was accepted before checking that idle does not fire. This keeps the behavior under test while removing the CI-only race.

Validation: go test -tags "fts5 sqlite_vec" ./internal/api -run TestIdleTrackerExternalRequestResetsIdle -count=200; make test; make lint-ci

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The OpenAPI generated-client freshness test regenerates files on the runner and compares them to the checked-in artifacts. On Windows, oapi-codegen can emit CRLF for generated Go files, making semantically identical artifacts compare stale against the LF files committed from Unix.

Normalize CRLF to LF before comparing generated Go source so the test still catches real OpenAPI/client drift without failing solely on platform line endings.

Validation: go test -tags "fts5 sqlite_vec" ./internal/api -run "TestOpenAPI(ClientArtifact|ClientSpecArtifact|Artifact)UpToDate" -count=1; make test; make lint-ci

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The generated OpenAPI freshness tests should catch real schema/client drift, not platform-specific checkout newline differences. Windows CI can present checked-in YAML with CRLF while the rendered OpenAPI document uses LF, which made the client spec copy look stale even though the content matched.

Normalize generated artifact comparisons consistently for YAML and generated Go output so the tests remain strict about content while staying portable across runners.

Validation: go test -tags "fts5 sqlite_vec" ./internal/api -run 'TestOpenAPI(Artifact|ClientSpecArtifact|ClientArtifact)UpToDate' -count=1

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (ed89805)

Medium-risk issues found; no Critical or High findings were reported.

Medium

  • cmd/msgvault/cmd/store_resolver.go:136 - Local daemon reuse checks only a compatible runtime record plus unauthenticated ping, while routed HTTP calls authenticate with the current process config. If server.api_key changes for the same data dir, the CLI can keep reusing a daemon started with the old key and then fail routed operations with 401 until manual restart. Include auth/config identity in the runtime record, or probe an authenticated endpoint with the current key before reuse and restart/report a clear key mismatch.

  • internal/api/operation_gate.go:176 - cliRunRequestSkipsOperationGate reads the entire /api/v1/cli/run request body before the Huma API-key middleware runs. Since operationGateMiddleware wraps the router before /api/v1 auth, an unauthenticated client can POST a very large body and force buffering in memory before credentials are checked, creating an availability risk. Move API authentication before operation-gate body inspection, or cap the pre-parse with http.MaxBytesReader/io.LimitReader and reject oversized bodies.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 16m35s), codex_security (codex/security, done, 7m22s) | Total: 24m6s

The testify helper rule should not depend on local convention to keep assert and require imports unaliased. Configure importas to reject aliases for github.com/stretchr/testify/assert and github.com/stretchr/testify/require, then apply golangci-lint --fix so the repository starts from a compliant baseline.

Where unaliasing exposed helper-shadowed package-call shapes, keep the valid local helper declarations and use the helper call form instead. Nested subtests that needed a fresh helper now avoid relying on an outer helper variable as the package qualifier.

Validation: go vet ./...; go vet -tags "fts5 sqlite_vec" ./...; make lint-ci; make test

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Comment thread internal/api/cli_handlers.go Dismissed
Comment thread internal/api/handlers.go Dismissed
Local daemon discovery must not silently reuse a process that was started with different server authentication. When [server].api_key changes, the CLI needs to fail before routed operations with a clear restart remedy instead of discovering the old daemon via unauthenticated ping and then surfacing unrelated 401s.

Record a runtime auth fingerprint, verify it on reuse, and run an authenticated readiness probe for keyed local daemons. Also bound the pre-auth /api/v1/cli/run body inspection used to keep read-only log tails out of the mutation gate, so unauthenticated oversized requests cannot force unbounded buffering.

Validation: go test -tags "fts5 sqlite_vec" ./cmd/msgvault/cmd -run 'Test(OpenHTTPStore|WriteDaemonRuntime|OpenTUIEngineLocalFlagUsesLocalDaemonHTTP)' -count=1; go test -tags "fts5 sqlite_vec" ./internal/api -run 'TestOperationGateMiddleware(SkipsLogCLIRunAndRestoresBody|RejectsOversizedCLIRunInspectionBody|StillGatesMutatingCLIRun|GatesMutatingMethods)' -count=1

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Comment thread cmd/msgvault/cmd/daemon_runtime.go Fixed
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (d24501c)

The PR has one medium-severity issue: daemon-routed CLI execution appears to break interactive OAuth reauthorization.

Medium

  • cmd/msgvault/cmd/daemon_cli_subprocess.go:96: Daemon-routed CLI subprocesses are created without stdin, while sync, sync-full, and verify still determine OAuth reauth availability from os.Stdin in the child process. Because these commands now run through the daemon subprocess path, an interactive user running msgvault sync ... or msgvault verify ... can no longer re-authorize an expired or revoked token; the child sees non-interactive stdin and returns the “run from an interactive terminal” error.

    Suggested fix: Keep OAuth reauthorization in the foreground client, or add an explicit interactive reauth flow/protocol for daemon CLI streaming. At minimum, add coverage for the daemon-routed expired-token path so sync and verify can still recover tokens when invoked from an interactive terminal.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 12m32s), codex_security (codex/security, done, 7m30s) | Total: 20m10s

wesm and others added 2 commits July 1, 2026 12:48
A runtime record from an older daemon does not tell the CLI whether that daemon was started with an API key. Treat that missing auth metadata as unknown: keyed current configs still require a restart, while no-key current configs verify the daemon with a live request so removing an API key does not leave the CLI talking to an older keyed process until a generic 401 appears.

Use an Argon2id-derived runtime fingerprint rather than a plain SHA-256 digest so the stored metadata remains deterministic for local compatibility checks without tripping security scanners that treat API keys like password material.

Validation: go test -tags "fts5 sqlite_vec" ./cmd/msgvault/cmd -run 'Test(OpenHTTPStore|WriteDaemonRuntime|OpenTUIEngineLocalFlagUsesLocalDaemonHTTP|ShowMessage|StatsCommand_(Scoped|Unscoped)|UpdateAccount)' -count=1

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The daemon-routing PR is large enough for CodeQL to report existing trusted-local surfaces as new PR alerts. These locations are intentional: setup prints a generated key in an interactive flow, manifest and secure-file helpers operate on user-selected local paths, QuerySQL is a trusted user interface, attachment paths are hash-validated, and release archives are sanitized before extraction.

Add source-level CodeQL suppressions at the reported sinks with local rationale so the PR check can distinguish these documented design choices from actionable security regressions.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Comment thread cmd/msgvault/cmd/setup.go Dismissed
Comment thread internal/query/duckdb.go Dismissed
Comment thread internal/update/update.go Fixed
CodeQL source suppressions are location-sensitive, so documenting the archive iteration points does not help alerts reported at the filesystem operations that consume sanitized paths.

Move the rationale to the sanitized extraction sinks so future code-scanning output points at the same place as the documented invariant without changing release archive extraction behavior.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (59c0569)

Verdict: No Medium, High, or Critical findings to report.

Security review found no issues. The only reported finding was Low severity and is omitted per the requested threshold.


Panel: ci_default_security | Synthesis: codex, 5s | Members: codex_default (codex/default, done, 12m49s), codex_security (codex/security, done, 5m43s) | Total: 18m37s

msgvault and the rest of the kit-based tools should share one updater implementation instead of carrying duplicate release discovery, archive extraction, checksum, and install logic. Using kit/selfupdate keeps those security-sensitive paths aligned while preserving msgvault release asset naming and SHA256SUMS-only compatibility so existing 0.16.x installations still have a path forward.

Updating the binary also needs to respect the daemon-first architecture: a local daemon may continue running the old executable after the CLI replaces it. Stop any live local daemon before installation and restart it afterward, including best-effort restart paths when stop or install fails after a daemon was already stopped.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (2582eae)

Summary verdict: changes need fixes for two medium-risk regressions; no security issues were identified.

Medium

  • Location: cmd/msgvault/cmd/serve.go:361
    Problem: shutdownServeRuntime gives http.Server.Shutdown only serveAPIShutdownTimeout, even though it then waits up to the longer operation-drain timeout for active archive work. Because Shutdown waits for active handlers, a legitimate daemon-routed operation that runs longer than the API shutdown timeout can make shutdown return API server shutdown: context deadline exceeded even after the operation gate drains successfully.
    Fix: Use the operation-drain context/budget for server shutdown, or stop accepting new requests and wait for the operation gate before treating a server shutdown timeout as fatal.

  • Location: cmd/msgvault/cmd/export_attachments.go:127
    Problem: export-attachments now fetches each attachment with GetCLIAttachment, which materializes the full response body in memory before writing it. This regresses the previous streaming export path and can spike memory or fail on large attachments.
    Fix: Use a streaming attachment opener, such as the existing daemon client OpenCLIAttachment, and copy directly to the destination file with io.Copy.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 17m11s), codex_security (codex/security, done, 4m40s) | Total: 22m1s

After self-update replaces the on-disk executable, the still-running updater process should not rely on os.Executable when respawning the daemon. Depending on platform and install mechanics, that can point at the renamed or deleted process image rather than the freshly installed binary.

Capture the intended target executable path before installation and pass it through the background daemon start path only for the post-update restart. Normal daemon auto-starts continue to resolve the executable the same way they did before.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (42da222)

Findings require changes before merge.

Medium

  • internal/daemonclient/engine_adapter.go:222, internal/daemonclient/engine_adapter.go:507, internal/daemonclient/engine_adapter.go:934
    The daemon-backed query engine drops MessageFilter.SourceIDs and AggregateOptions.SourceIDs when building HTTP requests, only forwarding source_id. SourceIDs is the collection/multi-source filter and overrides SourceID in the local engines; silently ignoring it can broaden collection-scoped or explicitly empty-scope requests into all-account or single-account queries, producing incorrect search results, aggregates, stats, or Gmail ID lists.
    Fix: Add multi-source support to the API/client and forward SourceIDs while preserving empty-slice match-nothing semantics, or explicitly reject non-nil SourceIDs in this adapter instead of silently dropping the filter. Add adapter coverage for both non-empty and empty SourceIDs.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 27m3s), codex_security (codex/security, done, 4m55s) | Total: 32m5s

wesm and others added 3 commits July 1, 2026 14:57
Auto-started CLI commands can now wait behind daemon startup work before the HTTP listener exists. On upgraded large archives that work may include archive schema setup, vector backend migrations, embed_gen backfill, or analytics initialization, so a silent 30-second readiness ceiling makes commands like tui look hung and then fail while the daemon is still doing useful work.

Keep command stdout untouched, but print local daemon startup details to stderr, mirror the latest serve log line while waiting, and give local autostart a longer readiness budget. Add daemon startup stage logs and vector migration phase logs so the foreground caller can show what the daemon is actually doing; redact PostgreSQL database URLs in those logs.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The daemon must not spend minutes opening and migrating a large archive only to discover that the configured API port is unavailable. Reserve the HTTP listener before claiming daemon ownership or touching the archive, then hand that listener to the API server after startup completes.

This also makes daemon-start failures visible to foreground CLIs by including the last serve.log line when the background process exits before readiness. Port conflicts now fail fast with the actual bind error instead of surfacing as a generic server process exit.

Validation: ./msgvault serve start with agentsview occupying 127.0.0.1:8080 now fails immediately with the bind conflict before archive startup.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
A CLI waiting behind the background-launch lock should not sit through the full long readiness budget after the original starter exits or releases the lock. That path exists specifically for concurrent local auto-starts, so it must observe both possible outcomes: a usable daemon appears, or the launch lock becomes available again.

Poll the launch lock while waiting for a usable runtime and let the waiting CLI take over startup when the lock is released. This keeps the long readiness budget for genuine slow daemon initialization without turning failed startup attempts into 30-minute hangs for later commands.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown

roborev: Combined Review (ee91222)

Medium finding: msgvault build-cache can no longer recover a missing/stale DuckDB cache in the documented configuration.

Medium

  • cmd/msgvault/cmd/build_cache.go:105 - build-cache now always goes through OpenHTTPStore, which autostarts/reuses the daemon. When [analytics].engine = "duckdb" and auto_build_cache = false, daemon startup refuses to open a stale or missing cache, so the documented recovery command msgvault build-cache cannot reach /cli/build-cache to create or repair it.
    Fix: Let local build-cache bypass daemon autostart when no usable daemon is already running, or expose a daemon build-cache path that does not require opening the DuckDB analytics engine first.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 12m7s), codex_security (codex/security, done, 7m25s) | Total: 19m38s

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
wesm and others added 3 commits July 1, 2026 23:41
Jobs addressed:
- 26864: guard the direct launch-lock acquisition with daemonStartInProgress so
  a CLI does not spawn a duplicate daemon while a prior serve-start child is
  still initializing.
- 26866: make vectorInitHandle.WaitContext prefer h.done so a finished init is
  never misreported as timed out, keeping CloseFeatures reachable.
- 26873: fall back to the raw log line when the humanizer sees unexpected attrs
  (e.g. panic/stack) instead of hiding them.
- 26874: skip interactive console quieting when --log-sql/[log].sql_trace is set
  so INFO-level SQL trace is not suppressed.
- 26865: use a build-tag-appropriate mainPath in the generic precheck tests so
  they pass under pgvector-only builds.
- 26906: keep signal-terminated subprocesses wrapped (only normal exits map to
  the silent sentinel); limit the non---all logs fallback to msgvault-* files;
  replace the sh-dependent test with a cross-platform self-exec helper.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Jobs addressed:
- 26896: populate active/source-deleted breakdown in GetTextStats (SQLite +
  DuckDB), DuckDB computeSearchStats, and the daemon engine adapter's
  TotalStats conversion.
- 26913: hydrate SQLite message-summary FromName from the message's own "from"
  recipient display_name, matching sender-name aggregation and DuckDB, and
  falling back to the direct sender participant only when there is no "from" row.
- 26898: add context-aware batch hydration (batchPopulateContext) so the context
  search path cancels recipient/label lookups with the request.
- 26902: add context variants of ListMessages/GetStats/GetMessage/
  GetMessagesSummariesByIDs on *Store so request paths thread the context and
  request_id into every request-owned query.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Jobs addressed:
- 26891: reject invalid known-operator values (q.Err()) with 400 invalid_query
  on /search/fast and /search/deep instead of dropping them.
- 26900: map fast/deep search context deadline/cancellation to a structured 503
  via writeIfContextError instead of a generic 500.
- 26908: parse the sub-aggregate filter from a request view without the
  aggregate-owned sort/direction/limit/offset params so aggregate sorts
  (count/name/attachment_size) are honored.
- 26893: re-validate a latched stale vector status on /health and /api/v1/stats
  so a rebuild/activation clears it without a daemon restart.
- 26902: call the context-aware store variants with r.Context() for stats,
  list, message, and summary hydration paths.
- 26911: invalidate the memoized FTS-completeness flag around rebuild-fts and
  re-set it only after a successful rebuild.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (3e226e8)

Medium-risk issues found; no Critical or High findings were reported.

Medium

  • internal/api/handlers.go:1464
    schedule and enabled are defaulted/ignored by handleAddAccount, but the request struct makes them required in the OpenAPI schema and generated client. Generated clients can reject otherwise valid server requests that only provide email.
    Fix: Mark defaulted fields optional in the schema, such as json:"schedule,omitempty" and json:"enabled,omitempty" or pointer/optional Huma annotations, then regenerate OpenAPI/client artifacts.

  • internal/api/server.go:368
    Long-running daemon CLI routes are intended to be unbounded, but the global http.Server.WriteTimeout is still capped at DaemonLongRequestTimeout + 5s. Streaming operations like full sync, build-cache, or rebuild-fts can be disconnected after about 30 minutes even though their request context has no timeout.
    Fix: Disable or extend write deadlines for the long streaming routes, for example by using per-request deadlines or clearing the response write deadline for those handlers.

  • cmd/msgvault/cmd/export_attachment.go:172
    The daemon-routed export-attachment path removed the previous output-path validation while the command still documents piping email attachment filenames directly into -o. A malicious sender can provide a filename like ../../.ssh/authorized_keys; if the user follows the documented export loop, exportAttachmentBinaryStream passes that path to writeAttachmentStreamToFile, which writes/renames to that exact path. This allows external email content to escape the intended export directory and create or replace writable files in the user’s account.
    Fix: Restore output-path validation before writing. --output should reject absolute/rooted/drive/UNC paths and any .. traversal, or sanitize metadata-derived names and require an explicit unsafe flag for arbitrary paths. Re-add regression tests for traversal and absolute output paths.


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 17m55s), codex_security (codex/security, done, 6m34s) | Total: 24m40s

- Daemon storeAPIAdapter now implements the context-aware read methods
  (GetStatsContext/ListMessagesContext/GetMessageContext/
  GetMessagesSummariesByIDsContext), so api.Server uses the request
  context instead of context.Background() in production. Export the
  interface as api.CtxMessageStore for a compile-time assertion.
- query GetTextStats (SQLite + DuckDB) now applies the live-message
  predicate so dedup-hidden rows are excluded from message_count and the
  active/source-deleted breakdown.
- api hybrid and similar search map hydration context errors to a
  structured 503 instead of returning 200 with empty results.
- api handleStats maps context timeout/cancellation to 503 instead of
  a generic 500.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (da0d8a2)

Medium: cmd/msgvault/cmd/build_cache.go:172 may skip required full rebuilds for manual/daemon build-cache runs.

runBuildCacheLocal calls buildCache using only the user-provided fullRebuild flag, so cacheNeedsBuild signals for source-deleted, dedup-hidden, or updated rows are bypassed. If no newer message ID exists, buildCache can skip export even though a full rebuild is required, leaving stale Parquet rows.

Suggested fix: run cacheNeedsBuild before calling buildCache and promote fullRebuild when staleness.FullRebuild is true.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 20m12s), codex_security (codex/security, done, 7m18s) | Total: 27m36s

wesm and others added 3 commits July 2, 2026 10:32
Both SQLiteEngine.TextSearch and DuckDBEngine.TextSearch passed the raw
user query straight to `messages_fts MATCH ?`, so any FTS5 metacharacter
(`"`, `(`, `'; DROP TABLE--`) reached the FTS5 parser and returned a 500
syntax error on /api/v1/text/search.

Add a shared sanitizeTextSearchMatch helper that tokenizes the query and
builds the MATCH argument via the existing store SQLiteDialect.BuildFTSArg
sanitizer, which quotes each term, applies prefix match, and drops
tokenless terms. Empty or punctuation-only input now returns zero rows
cleanly instead of erroring, matching the main search path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
/messages and /search reset an over-max page_size to the default 20
instead of clamping to the max 100; split the branches so absent/too-small
falls back to 20 and >100 clamps to 100. Non-numeric input still returns
400.

The /aggregates/sub sort param is validated against the aggregate enum
(count, size, attachment_size, name) but the OpenAPI spec documented the
message-filter enum (date, size, subject) because the merge put
message-filter params first. Give aggregateOptionParams the explicit sort
enum doc and order aggregate params first for sub-aggregates so the spec
matches the handler; regenerate the OpenAPI artifacts.

Clarify the /accounts summary to document that it lists scheduler-
configured accounts, distinct from /cli/accounts which lists all archived
sources.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Address seven LOW-severity CLI consistency findings:

- show-message/export-eml/export-attachments: reject empty and
  malformed-numeric message IDs (e.g. 42.5, 1e3) up front with a clear
  "invalid message ID" error, trimming surrounding whitespace. Source/Gmail
  ID lookups are preserved.
- --log-level: validate the resolved level (flag or [log].level) via
  logging.ValidateLevel; invalid values now error instead of silently
  defaulting to info.
- stats: format all counts with thousands grouping, not just Messages.
- query --format: fold case so JSON/Table/CSV are accepted.
- export-attachments -o DIR: create a missing output directory, matching
  the file-creating sibling exporters.
- show-deletion: reject empty/whitespace batch IDs before lookup;
  GetManifest guards empty IDs (fixes the "manifest  not found" double space).
- list-deletions --json: stop truncating the manifest description to 30 chars
  at storage time so JSON carries the full value; table views still truncate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (18b591e)

Security review found one Medium issue that should be fixed before merge.

Medium

  • internal/api/cli_handlers.go:962 / internal/deletion/manifest.go:345 — Path traversal in uploaded deletion manifest IDs.

    POST /api/v1/cli/deletion-manifests accepts client-supplied manifest.ID and only checks that it is non-empty before saving. SaveManifest then uses filepath.Join(dir, manifest.ID+".json") and creates parent directories, so an authenticated API caller could submit an ID containing ../ path traversal and write JSON outside the deletions directory, potentially corrupting daemon-owned state such as OAuth token files.

    Suggested fix: validate manifest IDs before saving, ideally both in the API handler and deletion manager. Reject absolute paths, .., path separators, and IDs outside the generated filename alphabet. After joining/cleaning, verify the final path remains directly under the intended status directory before writing.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 14m56s), codex_security (codex/security, done, 6m29s) | Total: 21m33s

POST /api/v1/cli/deletion-manifests accepted a client-supplied
manifest.ID and only checked it was non-empty. SaveManifest joins the
ID into a filesystem path and creates parent directories, so an
authenticated caller could submit an ID like "../../tokens/foo" and
write a JSON file outside the deletions directory, corrupting
daemon-owned state.

Add ValidateManifestID, restricting IDs to the alphabet generated IDs
already use ([A-Za-z0-9_-]), which excludes '.', path separators and any
absolute or "../" component. Enforce it at the deletion manager (Save,
Get, Move, Cancel) for defense in depth and at the API handler, which
now returns 400 invalid_manifest_id instead of a 500.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (b3f858d)

High severity issue found; low-severity-only findings were omitted.

High

  • internal/api/server.go:316 - The new pprof handlers are mounted on the main server and only gated by r.RemoteAddr loopback checks, not the configured API key. In deployments behind a same-host reverse proxy, TLS terminator, or SSH tunnel, unauthenticated remote requests may arrive as 127.0.0.1, bypass /api/v1 API-key protection, and access /debug/pprof/*.

    This could expose cmdline, goroutine/heap profiles, CPU profiles, or traces, and allow remote triggering of profiling work.

    Suggested fix: avoid registering pprof on the main API mux by default. Make it explicit/opt-in on a separate local-only admin listener, or require both loopback and normal API authorization when an API key is configured.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 15m28s), codex_security (codex/security, done, 7m2s) | Total: 22m38s

The /debug/pprof/ handlers were gated on isLoopbackRequest alone. Behind
a same-host reverse proxy, TLS terminator, or SSH tunnel, unauthenticated
remote requests arrive as 127.0.0.1 and would bypass the /api/v1 API-key
protection to read cmdline, goroutine/heap/CPU profiles and traces, and
trigger profiling work.

Gate the handlers on both loopback and apiRequestAuthorized, reusing the
same predicate as the API-key middleware and the loopback rate-limit
exemption so the trust logic cannot drift. When no API key is configured
(keyless local mode) apiRequestAuthorized returns true, so on-box
introspection for the TUI/CLI autostart case is unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@wesm wesm marked this pull request as ready for review July 2, 2026 19:15
wesm and others added 4 commits July 2, 2026 14:35
Thread each source's OAuth app name through the /cli/accounts response and
the daemonclient CLIAccount struct so the client can build the right OAuth
manager for a given Gmail account. Empty string means the default app.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
msgvault sync/sync-full proxy to the local daemon, which runs the actual
sync in a non-TTY CLI subprocess that cannot open a browser. When a Gmail
token is expired the subprocess failed with a misleading "run from an
interactive terminal" error even though the user was on a terminal.

Add a client-side preflight that, only when the daemon is local, stdin is
a terminal, and OAuth is configured, probes each target Gmail account's
token and runs the browser reauth flow for any that are expired/revoked.
Tokens are shared on the local filesystem, so the refreshed token is on
disk before the subprocess runs. The preflight never enrolls a new account
and never touches valid tokens or service accounts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The non-interactive reauth error now names the real remedy
(msgvault add-account <email> --force), whose browser flow works from the
daemon's CLI subprocess, instead of falsely telling the user to run from an
interactive terminal.

ResolveConsoleLevel also quiets routine INFO to WARN for the daemon CLI
subprocess, whose --no-log-file stderr streams verbatim to the console.
Human progress (fmt.Print) and WARN/ERROR lines are unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The non-interactive reauth error named only 'add-account <email> --force',
but --force requires browser OAuth and is rejected with --headless, so a
headless-server operator could not follow the advice. Name both remedies:
--force for a desktop with a browser, --headless (device code) for a
server without one. The headless code path prints device-flow
instructions before the token-reuse check, so it re-authorizes an
existing expired token.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (f507fa1)

Summary verdict: Two medium-severity issues need attention; no critical or high findings were reported.

Medium

  • cmd/msgvault/cmd/cache_staleness.go:56

    • Problem: Cache freshness does not compare the persisted cache schema version to cacheSchemaVersion, so an existing complete v5 cache can be considered fresh after the version is bumped to v6. The daemon may keep using stale parquet contents.
    • Fix: After loading _last_sync.json, treat state.SchemaVersion != cacheSchemaVersion as NeedsBuild: true with FullRebuild: true, and add test coverage.
  • cmd/msgvault/cmd/sync_http.go:164

    • Problem: Reauth preflight filters requested sync targets only by email, while sync and sync-full also accept display names. Running a daemon-backed sync by display name can skip local token renewal and then fail inside the non-interactive daemon path.
    • Fix: Include display name in preflightAccount and match the request using the same email/display-name semantics as local sync resolution.

Reviewers: 2 done | Synthesis: codex, 9s | Total: 17m39s

wesm and others added 6 commits July 2, 2026 14:59
getTokenSourceWithReauth is shared by Gmail sync and Calendar sync, but
its non-interactive recovery text hardcoded the Gmail add-account flow,
misdirecting Calendar callers into the wrong scopes. Thread a
caller-specific recovery hint through all call sites: Gmail callers point
at add-account, the Calendar caller at add-calendar.

Addresses roborev jobs 26998 and 26999 finding C.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reauth preflight now matches the sync argument against each Gmail
account's display name as well as its email (sync/sync-full resolve via
GetSourcesByIdentifierOrDisplayName) and re-authorizes every match, not
just the first. Preflight reauth uses a new browser
AuthorizePreservingGrantedScopes so Google's forced consent keeps
previously granted scopes (Calendar, permanent-delete).

Addresses roborev job 26997 findings A and B.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
globalConfigFlagArgs dropped the persistent logging flags, so an explicit
--log-level/--verbose/--log-sql never reached daemon-proxied commands and
the subprocess quieting forced WARN. Forward these flags in
globalConfigFlagArgs (also used by the build-cache subprocess, where
forwarding is desirable) so an explicit level overrides the quieting while
file logging stays off via --no-log-file.

Addresses roborev job 26998 finding D.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cacheNeedsBuild gated the analytics rebuild on message counts,
deletions, and parquet presence but never compared the persisted
schema_version to cacheSchemaVersion. An otherwise up-to-date cache
was therefore reported fresh after a schema bump, so buildCache (which
does force a full rebuild on mismatch) was never called and the daemon
kept serving stale-layout parquet. Treat a schema mismatch as a full
rebuild in the staleness gate itself.

Addresses roborev combined review on f507fa1 (Medium). The same
review's display-name preflight finding was already fixed in 8deede6.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two gaps in logging-flag forwarding to daemon-spawned subprocesses:

- daemonCLIArgsFromCobra stripped all root persistent flags before
  building CLIRunRequest.Args, so an already-running daemon never saw a
  client's `--log-level`/`--verbose`/`--log-sql`/`--log-sql-slow-ms`.
  The subprocess reconstructs args from the daemon's own globals, so the
  per-invocation level was lost. Allowlist the logging flags so they pass
  through to the child argv, where they override the daemon's defaults.

- globalConfigFlagArgs forwarded --log-level/--verbose/--log-sql but not
  --log-sql-slow-ms, dropping an explicit slow-query threshold in
  build-cache and daemon CLI subprocesses. Forward it when non-zero.

Addresses roborev review 27004 (Medium + Low).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An expired or revoked Calendar refresh token still reports HasToken,
the calendar scope, and a matching client, so add-calendar treated it
as reusable: it skipped every reauth branch and failed later in
buildCalendarClient with a non-interactive invalid_grant — the same
error the recovery hint just told the user to repeat. `--headless` also
skipped the copy-token instructions for the same reason.

Probe the token up front (attempt a refresh, detect invalid_grant) and
route a dead token to reauthorization: a browser reauth that preserves
granted scopes when interactive, or the headless copy-token
instructions under --headless.

Addresses roborev review 27002 (Medium).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (3d1c134)

Medium issue found; no High or Critical findings.

Medium

  • internal/api/operation_gate.go:151: Operation gate failures return plain http.Error text responses before requests reach API handlers, so gated API failures such as request-body inspection errors or 503 busy/shutdown responses bypass the standard JSON ErrorResponse envelope. Generated clients expecting documented JSON errors may surface decode errors instead of actionable daemon errors.
    • Fix: Return the standard ErrorResponse JSON shape from these branches, for example via writeError, and update operation gate tests to assert the JSON envelope.

Reviewers: 2 done | Synthesis: codex, 6s | Total: 11m18s

…gate

The add-calendar token probe relied on TokenSource, which returns a
cached unexpired access token without contacting Google, so a revoked
refresh token went undetected until a later sync failed with a
non-interactive invalid_grant. Add oauth.Manager.ForceRefresh, which
redeems the stored refresh token directly (saving the refreshed token
on success), and switch the probe to it so reauthorization or headless
copy-token instructions happen up front.

Operation gate middleware failures (oversized or unreadable cli/run
inspection body, busy/shutting-down gate) returned plain http.Error
text, bypassing the JSON ErrorResponse envelope that API clients
expect. Return writeError responses instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roborev-ci

roborev-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

roborev: Combined Review (f80b268)

Medium: service-account add-account misses default identity confirmation and post-source migrations.

  • Severity: Medium
  • Location: cmd/msgvault/cmd/addaccount.go:187
  • Problem: The service-account add-account path returns after creating/updating the source without confirming the default account identity or running runPostSourceCreateMigrations. Since ingest startup migration is now intentionally a no-op, this path can leave service-account Gmail sources without their account identifier and deferred legacy identities until a later command runs migrations.
  • Fix: Before the success message/return, mirror the OAuth paths: call confirmDefaultIdentity unless --no-default-identity is set, then call runPostSourceCreateMigrations.

Reviewers: 2 done | Synthesis: codex, 7s | Total: 25m37s

The force-refresh tests added after the unaliased testify import cleanup crossed the custom testify-helper threshold. CI's macOS test job runs make lint-ci after tests, so those direct package calls failed even though the package tests themselves passed.

Use local assert and require helpers in the affected tests while keeping testify imports unaliased, matching the importas rule and the repository's helper checker.

Validation: go run ./cmd/testify-helper-check -tags="fts5 sqlite_vec" ./internal/oauth; go test -tags "fts5 sqlite_vec" ./internal/oauth; make lint-ci; make test

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jul 3, 2026

Copy link
Copy Markdown

roborev: Combined Review (bc12f69)

Summary verdict: Changes need follow-up for two medium-severity regressions; no critical or high findings were reported.

Medium

  • Location: internal/api/server.go:370
    Problem: Long-running CLI routes are treated as unbounded in requestTimeoutForPath, but http.Server.WriteTimeout still caps the whole response at DaemonLongRequestTimeout + 5s. Streaming operations like full sync, verify, or cache rebuild can still be disconnected after about 30 minutes.
    Fix: Disable the server-level write timeout for streaming routes, or clear/set per-response write deadlines for those handlers and rely on the existing per-route context timeouts elsewhere.

  • Location: cmd/msgvault/cmd/update.go:142
    Problem: msgvault update now loads and validates the full app config before installing, even though the root command explicitly skips config loading for update. A malformed or invalid config can block installing an update before any daemon lifecycle handling is needed.
    Fix: Make daemon config loading best-effort for update lifecycle handling. On config load failure, proceed with the update without daemon stop/restart, or fall back to minimal runtime-record discovery that does not validate the full config.


Reviewers: 2 done | Synthesis: codex, 9s | Total: 14m36s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants