
Merge upstream GBrain v0.35.1.1 while preserving Eva OpenClaw defaults #102

Merged
100yenadmin merged 15 commits into master from eva/merge-upstream-v0.35.1.1
May 17, 2026

100yenadmin commented May 17, 2026

TLDR

Closes #101.

This PR catches Eva Brain up from 172dbcc to upstream GBrain f004a27 / v0.35.1.1 while keeping Eva as a thin fork: upstream owns the core GBrain database/search/sync/provider/media primitives; Eva preserves OpenClaw-native install, no-key OAuth extraction, support-KB packaging, Voyage 4 Large 2048d defaults, and safe public updater behavior.

```mermaid
flowchart LR
  U["Garry upstream GBrain v0.35.1.1"] --> C["Eva catch-up merge"]
  E["Eva product surface"] --> C
  C --> R["Reviewable Eva PR"]
  R --> Core["Upstream core improvements accepted"]
  R --> Eva["Eva OpenClaw/OAuth/install defaults preserved"]
```

Upstream v0.35.1.1 Fixes Accepted

  • v0.33.1.0 whoknows expertise/routing.
  • Voyage output_dimension, flexible dimension validation, and large-response OOM caps.
  • Search-lite modes, token budgets, query cache, intent weighting, and search telemetry.
  • Sync/import reliability: 100 MiB sync buffer, import checkpoint resume, cursor-paginated stale embed hardening.
  • Cathedral III code intelligence foundation and code retrieval eval surfaces.
  • MCP/source-isolation/PKCE/federated-read fix wave.
  • Supervisor watchdog exit handling.
  • ZeroEntropy embedding/reranker support and provider model discoverability.
  • LongMemEval adapter/slug/gateway-wire fix wave.

Preserved Eva Product Surface

  • OpenClaw plugin package and /plugins/gbrain/extract remain the product extraction path.
  • CodexExtractionClient, import-media, and ingest-media --extract openclaw remain intact as transitional OpenClaw adapter surfaces.
  • provider_auth, .gbrain/gbrain.env, and OpenClaw credential-source behavior stay layered on top of upstream's provider gateway.
  • Voyage 4 Large remains Eva's default install posture at 2048 dimensions.
  • Public installer/updater docs still point at electricsheephq/eva-brain, install the Codex Desktop plugin, install the OpenClaw plugin, and support the OpenClaw support KB.
  • Postinstall remains advisory only: no install-time auto-migrations or gateway restarts.

Conflict Decisions

  • Provider gateway: accepted upstream recipe/reranker/ZeroEntropy/Voyage improvements, then re-applied Eva provider_auth so OpenClaw-owned credentials still resolve before env fallback. Empty-model OpenAI-compatible providers remain available when an operator supplied a concrete model.
  • Voyage pricing: kept current Voyage 4 Large pricing at $0.12/M tokens while preserving voyage-3-large at $0.18/M. Estimator tests now cover Voyage 4 Large, Voyage 4, Voyage 4 Lite, and ZeroEntropy.
  • Source identity: source-aware Eva behavior wins where upstream catch-up reopened ambiguity. Bare getPage, softDeletePage, restorePage, and putRawData target the default source; files upsert on (source_id, storage_path); stale embed and takes paths retain explicit source filters.
  • OpenClaw install: updater doctor now sets GBRAIN_SKILLS_DIR; OpenClaw restart prefers systemd on customer hosts and falls back to openclaw gateway restart; support-KB refresh syncs/embeds only openclaw-support-kb.
  • Generated docs: regenerated llms-full.txt and fixed metric-glossary generation so git diff --check and the freshness guard agree.

Adversarial Fixes From Review

  • Federated query-cache isolation: upstream's query cache keyed on a scalar source_id identity. Eva's federated OAuth reads can search several sources while still carrying a primary sourceId, so this PR now folds the canonical source set into the cache knobs hash. That keeps federated results from replaying into later scalar/default-only searches without adding a schema migration.
  • Voyage multimodal auth: multimodal Voyage now uses the same provider-auth resolver path as text embeddings, so an OpenClaw-owned credential source wins over stray environment keys.
  • Contradiction tool locality: find_contradictions still allows default local contexts, but source-scoped/non-default contexts cannot be invoked without an authorized source.
  • CI model-command mock: the models command test mock now covers the upstream v0.35 provider exports and the new zero-network probes.
  • Search-mode CLI dispatch: gbrain search modes|stats|tune now routes to the upstream search-mode command while plain gbrain search <query> still uses keyword search.
  • v54 OAuth bootstrap wedge: upstream issue garrytan/gbrain#1092 showed that, on v54 upgrades, the v0.35.1.0 schema-embedded.ts could reference oauth_clients.source_id / federated_read before the v60/v61 migrations created them, wedging the upgrade. Eva now forward-bootstraps those columns in both PGLite and Postgres before replaying the embedded schema.
  • Strict MCP array schemas: the schema fix from upstream PR garrytan/gbrain#1053 (v0.35.3.0 fix wave: extract_facts items + git --no-recurse-submodules placement) is ported in the deploy branch. ParamDef → JSON Schema mapping is centralized and recursive, so HTTP MCP, stdio MCP, and minion tool schemas all keep items.type for arrays such as extract_facts.entity_hints.
  • Remote git flag placement: the git fix from upstream PR garrytan/gbrain#1053 is ported. Global -c SSRF flags stay before clone/pull; --no-recurse-submodules now sits after the subcommand, where real git accepts it.
  • Empty-slug fact extraction: upstream issue garrytan/gbrain#1096 (extract_facts: empty slugs array triggers full-brain walk, dead-letters autopilot-cycle) is fixed locally. extract_facts treats slugs: [] as an explicit zero-page incremental set, not as a request to walk the full brain.
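The federated query-cache isolation fix comes down to giving every read an order-independent source identity that feeds the cache key. A minimal sketch, assuming hypothetical `canonicalSourceSet` and `cacheKnobs` helpers (not names from this PR) and a JSON string standing in for the real knobs hash:

```typescript
// Hypothetical helper: collapse the primary source plus any federated sources
// into one de-duplicated, order-independent identity string.
function canonicalSourceSet(primary: string, federated: readonly string[] = []): string {
  const all = new Set<string>([primary, ...federated]);
  return Array.from(all).sort().join(",");
}

// Hypothetical stand-in for the knobs hash input: because the source set is part
// of it, a federated read and a default-only read never share a cache identity,
// even when both carry the same primary sourceId.
function cacheKnobs(mode: string, primary: string, federated: readonly string[] = []): string {
  return JSON.stringify({ mode, sources: canonicalSourceSet(primary, federated) });
}
```

Sorting before joining means `["notes", "support-kb"]` and `["support-kb", "notes"]` hash identically, so reordering federated sources does not fragment the cache.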

Validation

Local focused validation was run from /Volumes/LEXAR/repos/eva-brain-upstream-v0.35.1.1.

  • bun install --frozen-lockfile
  • bun run build:llms
  • git diff --check
  • git diff --cached --check
  • bun run check:eval-glossary
  • bun run check:exports-count
  • bun run check:source-id-projection
  • bun run check:cli-exec
  • bun run typecheck
  • bun test test/build-llms.test.ts test/local-updater-contract.test.ts test/install-contract.test.ts test/openclaw-gbrain-plugin-contract.test.ts test/codex-extraction-client.test.ts test/embedding-pricing.test.ts
  • bun test test/engine-upsertFile.test.ts test/embed-stale-source.serial.test.ts test/put-page-namespace.test.ts test/operation-context-sourceid-required.test.ts
  • bun test test/ai/gateway.test.ts test/ai/dims-zeroentropy.test.ts test/voyage-response-cap.test.ts test/openai-compat-multimodal.test.ts test/ai/zeroentropy-recipe.test.ts test/ai/zeroentropy-compat-fetch.test.ts
  • bun test test/e2e/source-isolation-pglite.test.ts test/e2e/embed-stale-pagination.test.ts test/e2e/source-routing.test.ts
  • bun test test/eval-candidates.test.ts test/reindex-code.test.ts
  • bun test test/build-llms.test.ts test/pglite-engine.test.ts test/eval-contradictions-integrations.test.ts test/voyage-multimodal.test.ts
  • bun test test/ai/auth.serial.test.ts test/openai-compat-multimodal.test.ts test/ai/gateway.test.ts test/postgres-engine.test.ts test/engine-upsertFile.test.ts test/embed-stale-source.serial.test.ts test/openclaw-gbrain-plugin-contract.test.ts test/codex-extraction-client.test.ts test/local-updater-contract.test.ts test/install-contract.test.ts test/doctor-remote.test.ts test/skillpack-sync-guard.test.ts
  • bun test test/hybrid-search-lite.serial.test.ts test/query-cache.test.ts test/query-cache-knobs-hash.test.ts test/commands/models.serial.test.ts
  • bun test test/commands-search.test.ts test/cli.test.ts
  • bun test test/schema-bootstrap-coverage.test.ts test/mcp-tool-defs.test.ts test/git-remote.test.ts test/extract-facts-phase.test.ts
  • bun test test/brain-allowlist.test.ts test/v0_29-tool-surfaces.test.ts test/e2e/serve-http-oauth.test.ts
  • bun test test/e2e/postgres-bootstrap.test.ts (skips locally without DATABASE_URL; added Postgres regression coverage for the v54 OAuth bootstrap shape)
  • Fresh-home canary: gbrain init --pglite --embedding-model voyage:voyage-4-large --embedding-dimensions 2048 --json, gbrain import <docs> --no-embed --json, gbrain search pr102canaryneedle, gbrain search modes --json, gbrain doctor --json --fast
  • Fake-provider PGLite canary for upstream garrytan/gbrain#1065 (gbrain embed hangs indefinitely on PGLite, no HTTP requests sent) and garrytan/gbrain#1086 (gbrain sync and gbrain embed ignore Voyage AI embedding config, hardcode OpenAI): initialized a temp brain with litellm:test-embed at 8 dimensions, imported one markdown file with --no-embed, ran embed --stale against a local OpenAI-compatible test server, verified one outbound embedding request, verified that embed --stale --dry-run reported zero stale chunks, searched the canary phrase with --no-embed, and ran doctor --fast --json, which reported a score of 95.

Full bun test / E2E / CodeQL / gitleaks should run in GitHub Actions per our local-resource policy.

Follow-Up Review Gate

The second 95% confidence review wave covered:

  1. DB/source/data-loss behavior.
  2. Provider/Voyage/cost/auth behavior.
  3. OpenClaw/media/OAuth product boundary.
  4. Install/docs/CI readiness.

The original blocking finding was federated query-cache isolation, which is fixed in this branch. The final deploy-risk wave found three more practical blockers or near-blockers in upstream open issues/PRs: garrytan#1092, garrytan#1053, and garrytan#1096. Those are also fixed in this branch. Upstream garrytan#1078/garrytan#1079 remain a conditional autopilot/dream risk, but the Eva fleet updater/support-KB path is source-scoped and does not call that generic cycle sync path.

garrytan and others added 14 commits May 12, 2026 14:33
…ity routing (garrytan#881)

* feat(v0.33): add SearchOpts.types multi-type filter to searchHybrid

Push the page-type filter into SQL via AND p.type = ANY($N::text[]) in
both engines' searchKeyword + searchVector + searchKeywordChunks paths.
Primary consumer is the upcoming gbrain whoknows command (filters to
['person','company']); the limit budget then goes to typed candidates
instead of being eaten by note/transcript/article pages. Future
entity-only search in v0.34+ reuses the parameter for free.

AND-applies alongside the existing single-value type filter (callers can
use either or both). HybridSearchOpts threads opts.types into the
underlying searchOpts so hybridSearch callers get the SQL-level filter
without any post-filter waste.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33): whoknows core ranking function + 10 locked unit tests

Implements ENG-D1's locked spec: score = log(1 + raw_match) ×
max(0.1, exp(-days/180)) × (0.5 + 0.5 × salience). raw_match comes
from hybridSearch's RRF + source-boost-adjusted score; salience and
recency boosts in hybridSearch are intentionally disabled so the
formula applies on a clean signal.
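A standalone sketch of the ENG-D1 formula, assuming `log` is the natural logarithm and that the alphabetical tie-break (one of the pinned test cases) compares slugs. The `Candidate` shape and function names are illustrative, not the signatures of `rankCandidates()`, and the NaN guards the tests pin are omitted:

```typescript
// Standalone sketch of the ENG-D1 whoknows score (not the repo's implementation).
interface Candidate {
  slug: string;
  rawMatch: number; // RRF + source-boost-adjusted score from hybrid search
  days: number; // age of the page's effective_date, in days
  salience: number; // assumed to lie in [0, 1]
}

function whoknowsScore(c: Candidate): number {
  const match = Math.log(1 + c.rawMatch); // natural log (assumption)
  const recency = Math.max(0.1, Math.exp(-c.days / 180)); // decays, floored at 0.1
  const salience = 0.5 + 0.5 * c.salience; // salience 0..1 maps to 0.5..1
  return match * recency * salience;
}

// Highest score first; equal scores fall back to ascending slug order.
function rankCandidatesSketch(cands: readonly Candidate[]): Candidate[] {
  return [...cands].sort((a, b) => {
    const diff = whoknowsScore(b) - whoknowsScore(a);
    if (diff !== 0) return diff;
    return a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0;
  });
}
```

The 0.1 floor keeps a very old but strongly matching page from being zeroed out, and the salience term can at most halve a score rather than eliminate it.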

rankCandidates() is the pure function the eval grades against;
findExperts() is the public entrypoint that wires hybrid search +
batch salience/effective_date fetches; runWhoknows() is the CLI.

test/whoknows.test.ts covers the 10 ENG-D3 cases (zero results,
negative recency floor, NaN salience neutral default, NaN match
zeros gracefully, type preservation, --explain factor breakdown,
top-K limit clamping, recency-floor extreme-days safety, alphabetical
tie-break determinism, public-surface contract). Plus four sanity
asserts (higher-match outranks, more-recent outranks, higher-salience
outranks, all-zero candidate appears with score 0). Plus one factor
decomposition assertion that pins the exact formula numerically.
Plus a composite-key safety case (Codex F1).

22 expect calls across 16 tests. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33): register find_experts MCP op + gbrain whoknows CLI

Wires both surfaces per ENG-D5: MCP op = find_experts (matches
find_anomalies naming convention; agent-facing); CLI command =
gbrain whoknows (memorable, user-facing). One findExperts() core
function backs both paths.

The op is scope:'read', localOnly:false — accessible over HTTP MCP
to read-scoped OAuth clients like the salience/anomalies family.
Op handler validates non-empty topic and dispatches to the same
findExperts() pure function the CLI uses.

CLI dispatch in src/cli.ts:case 'whoknows' calls runWhoknows; thin-
client routing happens inside runWhoknows via isThinClient(cfg) —
remote MCP installs route through the v0.31.1 routing seam to
callRemoteTool('find_experts', ...).

FIND_EXPERTS_DESCRIPTION in operations-descriptions.ts mirrors the
v0.29 redirect-hint style: leads with what the tool does, lists
explicit user-intent triggers ("who should I talk to about X",
"who knows about Y"), notes the type-filter behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33): gbrain eval whoknows — two-layer eval gate (ENG-D2)

Implements the locked spec: Layer 1 hand-labeled fixture (>=80% top-3
hit rate) is the primary ship-blocking gate; Layer 2 eval_candidates
replay (>=0.4 mean set-Jaccard@3) is the regression gate that
auto-skips when < 20 replay-eligible rows exist (CONTRIBUTOR_MODE
sparseness fallback).

Dispatch lands as `gbrain eval whoknows <fixture.jsonl>` sub-subcommand
in src/commands/eval.ts (mirrors v0.25.0 export/prune/replay and
v0.27.x cross-modal pattern). Exits 0/1/2 for pass/fail/usage so CI
gates can consume.

JSON output (--json) ships schema_version: 1 for stable consumer
contract (mirrors v0.25.0 eval-replay.ts). Human output groups by
layer + emits a per-miss diagnostic table so failures are
self-debugging.

Unit tests pin:
- jaccardAtK math (7 cases — identical, disjoint, partial, k cutoff,
  empty-empty vacuous-stable, empty-vs-non-empty, Set dedup)
- topKHit (7 cases — position 1, 3, 4, miss, multi-expected, empty
  actual, empty expected)
- readFixture (6 cases — well-formed, comments/blanks, missing file,
  malformed JSON, missing required fields, non-string filter)
- Locked thresholds (HIT_RATE=0.8, REGRESSION=0.4, MIN_REPLAY_ROWS=20)
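The two metrics and both gates can be sketched as follows. This is a standalone reimplementation, not the code in src/commands/eval-whoknows.ts; it assumes an empty-vs-empty comparison scores 1 ("vacuous-stable"), an empty expected list never counts as a hit, and de-duplication happens after the top-k cut. `qualityGatePasses` and `regressionGate` are hypothetical names:

```typescript
// Locked thresholds quoted in the commit message.
const HIT_RATE = 0.8;
const REGRESSION = 0.4;
const MIN_REPLAY_ROWS = 20;

// Set-Jaccard of the top-k slugs of two ranked lists; duplicates collapse via Set.
function jaccardAtK(a: readonly string[], b: readonly string[], k: number): number {
  const sa = new Set(a.slice(0, k));
  const sb = new Set(b.slice(0, k));
  if (sa.size === 0 && sb.size === 0) return 1; // assumption: empty-empty is "stable"
  let inter = 0;
  sa.forEach((x) => {
    if (sb.has(x)) inter += 1;
  });
  return inter / (sa.size + sb.size - inter);
}

// True when any expected slug appears among the first k actual results.
function topKHit(actual: readonly string[], expected: readonly string[], k = 3): boolean {
  const top = new Set(actual.slice(0, k));
  return expected.some((e) => top.has(e));
}

// Layer 1: ship-blocking top-3 hit-rate gate.
function qualityGatePasses(hits: readonly boolean[]): boolean {
  if (hits.length === 0) return false;
  return hits.filter((h) => h).length / hits.length >= HIT_RATE;
}

// Layer 2: regression gate that skips when replay data is too sparse.
function regressionGate(jaccards: readonly number[]): "pass" | "fail" | "skip" {
  if (jaccards.length < MIN_REPLAY_ROWS) return "skip";
  const mean = jaccards.reduce((sum, x) => sum + x, 0) / jaccards.length;
  return mean >= REGRESSION ? "pass" : "fail";
}
```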

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33): gbrain doctor adds whoknows_health check

Per CEO-D7 (substrate-conditional v0.33 doctor check, but the
fixture-presence sub-check ships in week 1 regardless — it's the
"did you do the assignment?" signal). When the eval fixture is
missing, empty, or undersized (< 5 rows), doctor warns with the
exact path the user should populate.

The check is intentionally lightweight: it does NOT run the eval
itself or measure hit-rate regression. That's the job of `gbrain
eval whoknows`, called from CI/ship time. This check is the cheap
always-runs signal that surfaces in `gbrain doctor` and on the
ship review dashboard.

5 unit cases pin the four-status behavior (missing/empty/undersized/
ok) plus the comment-and-blank-line filtering so users can comment
out queries during iteration without breaking the row count.
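A minimal sketch of the four-status classification, assuming `#`-prefixed comment lines (the fixture's actual comment syntax is not stated here) and leaving file I/O to the caller. `fixtureStatus` and `MIN_FIXTURE_ROWS` are hypothetical names:

```typescript
type FixtureStatus = "missing" | "empty" | "undersized" | "ok";

const MIN_FIXTURE_ROWS = 5; // below this, doctor warns "undersized"

// `text` is null when the fixture file does not exist.
function fixtureStatus(text: string | null): FixtureStatus {
  if (text === null) return "missing";
  const rows = text
    .split("\n")
    .map((line) => line.trim())
    // Blank lines and comments do not count as queries (assumption: '#' comments).
    .filter((line) => line.length > 0 && !line.startsWith("#"));
  if (rows.length === 0) return "empty";
  if (rows.length < MIN_FIXTURE_ROWS) return "undersized";
  return "ok";
}
```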

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33): synthetic whoknows eval fixture + E2E quality gate test

test/fixtures/whoknows-eval.jsonl ships as a 10-query placeholder
demonstrating the schema. Comments document the assignment for end
users: they replace these with their own real queries before
shipping their gbrain install. The placeholder uses obviously-
example slugs (wiki/people/example-alice, etc.) so nobody mistakes
it for production data.

test/e2e/whoknows.test.ts seeds a synthetic PGLite brain that
matches the placeholder fixture, then runs findExperts on every
fixture query and asserts >=80% top-3 hit rate per ENG-D2 quality
gate. Also exercises the typeFilter (concept-decoy pages filtered
out), empty-result graceful return, --explain factor breakdown, and
top-K limit honoring.

Basis-vector embeddings (no API key) follow the existing pattern from
test/e2e/search-quality.test.ts.

5 test cases, 23 expect calls, all passing against PGLite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.33): VERSION bump + CHANGELOG + CLAUDE.md + llms regen

Bumps VERSION 0.31.11 → 0.33.0 and package.json to match. CHANGELOG
entry leads with the headline use ("ask gbrain who knows about X")
and the locked ENG-D1 ranking formula. "Numbers that matter" replaced
with a "what ships on which eval outcome" table — honest about the
eval-gated trajectory rather than fabricating benchmarks before the
release has been graded against a real brain.

CLAUDE.md Key Files annotations added for src/commands/whoknows.ts,
src/commands/eval-whoknows.ts, and test/fixtures/whoknows-eval.jsonl.
src/core/search/hybrid.ts entry extended with the new types parameter
documentation (push the type filter to SQL, no post-filter waste,
AND-applies alongside the existing single-value type field).

bun run build:llms ran the chaser; llms.txt + llms-full.txt
regenerated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.33): unit-test gap fill — engine typeFilter + find_experts op

Two new files filling the gaps Garry called out:

test/search-types-filter.test.ts — engine-level coverage on PGLite for
the new SearchOpts.types filter. Asserts the SQL-clause behavior
directly so a regression in the AND p.type = ANY(...) emission gets
caught here with a tight assertion rather than as part of a longer
findExperts pipeline. 9 cases across searchKeyword + searchVector +
chunk-grain documentation. Documents the pre-existing PGLite parity
gap (single-value `type` field is Postgres-only; `types` is the v0.33
multi-type filter that BOTH engines honor).

test/find-experts-op.test.ts — MCP-op contract test for find_experts.
Pins:
- Registered in the operations array + operationsByName
- scope: 'read', localOnly false (HTTP-MCP accessible per ENG-D5)
- Documented params (topic / limit / explain) with correct types
- cliHints.name === 'whoknows' (CLI surface bridge)
- Non-trivial description that references the use case
- Handler rejects empty / whitespace / missing topic with invalid_params
- Handler returns array shape on valid topic
- Handler honors limit param

11 op-contract cases + 9 engine-clause cases. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to v0.33.1.0

Garry asked for v0.33.1 instead of v0.33.0 (queue collision with
unrelated 0.33.0 work). 4-digit format: 0.33.1.0. CHANGELOG header
and "To take advantage of" block updated. llms.txt regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.33.1.1): cliHints.positional on find_experts so CLI accepts <topic>

Without `cliHints.positional: ['topic']`, the op-dispatch path in
src/cli.ts couldn't parse `gbrain whoknows "ai agents"` and threw
`invalid_params: topic is required`. Found while testing the v0.33.1.0
build against a real brain. The op handler validates topic; the CLI
just needed to know the positional shape so the dispatcher could
hand it through.
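The failure mode is easiest to see with a sketch of positional binding. `CliHints` and `bindPositional` below are hypothetical stand-ins for the src/cli.ts dispatcher, assuming bare arguments are bound in the order `positional` lists them:

```typescript
// Hypothetical subset of an operation's CLI hints.
interface CliHints {
  name: string;
  positional?: readonly string[];
}

// Binds bare CLI arguments to op params in the order `positional` lists them.
// With no positional hint, bare arguments are simply not bound.
function bindPositional(args: readonly string[], hints: CliHints): Record<string, string> {
  const params: Record<string, string> = {};
  (hints.positional ?? []).forEach((param, i) => {
    const value = args[i];
    if (value !== undefined) params[param] = value;
  });
  return params;
}
```

Without the hint, `topic` never reaches the handler, which then rejects the call with `invalid_params: topic is required`.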

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.33.1.2): real-brain whoknows-eval fixture from VC intro network

Replaces the synthetic 10-row placeholder with 10 real expertise-routing
queries mined from Garry's actual brain via thin-client connection to
Wintermute (v0.32.2). Source: reference/vc-intro-network ("Who Takes
Intros from Garry") + adjacent routing context. All 15 unique expected
person slugs verified against ~/git/brain/people/<slug>.md source
markdown:

  people/amit-kumar          Accel partner, 102 YC deals
  people/diana-hu            YC GP
  people/elad-gil            Angel, top-rated
  people/eric-vishria        Benchmark, healthtech
  people/gokul-rajaram       Angel, 57 YC deals
  people/joff-redfern        Menlo Ventures, ex-CPO Atlassian
  people/jon-xu              YC GP
  people/kristina-shen       Chemistry, healthtech
  people/lachy-groom         Angel, 43 YC deals
  people/lee-edwards         Quiet Capital, 52 YC deals
  people/nick-shalek         Ribbit Capital, fintech
  people/nina-achadian       Index Ventures, 69 YC deals (note: slug
                              uses 'achadian' not 'achadjian')
  people/parul-singh         645 Ventures
  people/rebecca-kaden       USV
  people/trae-stephens       Founders Fund, defense/deep-tech

Eval cannot run yet against Wintermute thin-client: server is v0.32.2,
find_experts MCP op was added in v0.33. Once Wintermute upgrades the
eval will run end-to-end via the v0.31.1 thin-client routing seam.
Local eval works once the brain is indexed with find_experts available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.33.1.3): wire thin-client routing into eval-whoknows

`gbrain eval whoknows` now works against a thin-client install. When
isThinClient(cfg), each fixture query routes through the remote
find_experts MCP op via callRemoteTool — same v0.31.1 routing seam
runWhoknows already uses. Local mode unchanged: findExperts(engine, ...)
called directly.

Server prerequisite: the brain must be v0.33+ for find_experts to be
registered. Wintermute (currently v0.32.2) gets it on next upgrade and
then the eval runs end-to-end with zero client-side changes.

Mechanics:
- `WhoknowsFn` callable abstraction so the gates are impl-agnostic
- runEvalWhoknows(engine: BrainEngine | null, args) — null engine
  allowed in thin-client mode
- Regression gate auto-skips in thin-client mode (no DB access to
  eval_candidates; quality gate alone gates ship)
- cli.ts adds a thin-client bypass before connectEngine for
  `gbrain eval whoknows`, matching the longmemeval/cross-modal no-DB
  pattern

E2E test updated to use an inline synthetic fixture (the shipped
fixture is real-brain data now, doesn't match the seeded test brain).
Sanity-check the shipped fixture parses cleanly in a separate case.

Tests: 25 unit cases (+2 for null-engine signature contract) + 6 E2E
cases. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… rethrow (garrytan#962)

* fix: send Voyage output_dimension on embedding requests

* fixup: drop voyage-4-nano from flexible-dim set

Voyage's hosted /embeddings endpoint accepts `output_dimension` only for
the seven flexible-dim models (voyage-4-large, voyage-4, voyage-4-lite,
voyage-3-large, voyage-3.5, voyage-3.5-lite, voyage-code-3). voyage-4-nano
is an open-weight variant Voyage lists separately as fixed 1024-dim — the
hosted API rejects the parameter for it.

The recipe docstring previously claimed "all v4 variants" have flexible
dims, which is what led to nano being added to the allowlist in the first
place. Tighten the comment to name the hosted trio explicitly and call out
nano-as-open-weight.

Convert the test case at test/ai/gateway.test.ts from a positive assertion
(voyage-4-nano returns { dimensions: 512 }) to a negative regression pin
(voyage-4-nano returns undefined), so a future contributor can't silently
re-add nano without breaking this test.
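A sketch of the allowlist behavior the regression pin enforces, using the seven hosted models named above. `voyageDimsOption` is a hypothetical name; only the `{ dimensions }` / `undefined` return shape comes from the test description:

```typescript
// Hosted Voyage models that accept `output_dimension`, as listed in the commit.
// voyage-4-nano is deliberately absent: it is documented as fixed 1024-dim.
const VOYAGE_FLEXIBLE_DIM_MODELS: ReadonlySet<string> = new Set([
  "voyage-4-large",
  "voyage-4",
  "voyage-4-lite",
  "voyage-3-large",
  "voyage-3.5",
  "voyage-3.5-lite",
  "voyage-code-3",
]);

// Returns the provider option to send, or undefined when the parameter must be omitted.
function voyageDimsOption(model: string, dims: number): { dimensions: number } | undefined {
  return VOYAGE_FLEXIBLE_DIM_MODELS.has(model) ? { dimensions: dims } : undefined;
}
```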

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Voyage OOM-cap rethrow + flexible-dim runtime validation (Codex P3 follow-ups)

Two follow-ups from Codex's adversarial review of PR garrytan#962, both Voyage-adjacent
correctness fixes that the original PR scope had filed as TODOs.

1. gateway.ts:619 Voyage OOM cap was theatrical
-------------------------------------------------
voyageCompatFetch's inbound response rewriter is wrapped in a try/catch that
falls back to the original response on parse failure — correct for "Voyage
returned JSON I can't reshape, let the SDK handle it." But the per-embedding
Layer 2 OOM cap at line 619 threw a bare `new Error(...)`, which the same
catch silently swallowed. Net result: an oversized base64 response (Layer 1
skipped because no Content-Length header) returned through to the AI SDK and
could OOM the worker on JSON.parse.

Fix: introduce `VoyageResponseTooLargeError`, throw it at both cap sites
(Content-Length Layer 1 at line 595 and per-embedding Layer 2 at line 619),
and rethrow it from the inbound try/catch via `if (err instanceof
VoyageResponseTooLargeError) throw err`. Pre-existing fall-back-on-parse-error
behavior for other thrown errors is preserved.

Regression-pinned by 2 new behavioral tests (mock fetch returns oversized
Content-Length / oversized base64; embed() throws with the expected message)
and a structural assertion in test/voyage-response-cap.test.ts that the
`instanceof VoyageResponseTooLargeError ⇒ throw` line stays put.
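The rethrow pattern is small enough to show in isolation. This sketch assumes a simplified `{ data: [{ embedding }] }` response body and a caller-supplied cap; `reshapeOrFallback` is hypothetical, and only the error class name and the `instanceof ... throw` line come from the commit:

```typescript
// Distinct error type so the size cap can be told apart from ordinary parse failures.
class VoyageResponseTooLargeError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "VoyageResponseTooLargeError";
    Object.setPrototypeOf(this, VoyageResponseTooLargeError.prototype); // keeps instanceof working on ES5 targets
  }
}

interface VoyageBody {
  data: { embedding: string }[]; // base64-encoded embeddings (simplified shape)
}

// Hypothetical rewriter: reshape the body, falling back to the original text on
// ordinary failures, but always rethrowing the size-cap error.
function reshapeOrFallback(body: string, maxEmbeddingChars: number): string {
  try {
    const parsed = JSON.parse(body) as VoyageBody;
    for (const item of parsed.data) {
      if (item.embedding.length > maxEmbeddingChars) {
        throw new VoyageResponseTooLargeError(
          `Voyage embedding is ${item.embedding.length} chars; cap is ${maxEmbeddingChars}`,
        );
      }
    }
    return JSON.stringify(parsed); // stand-in for the real reshaping step
  } catch (err) {
    if (err instanceof VoyageResponseTooLargeError) throw err; // the fix: never swallow the cap
    return body; // pre-existing behavior: hand unreadable JSON back to the SDK
  }
}
```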

2. Voyage flexible-dim runtime validation + doctor check
-------------------------------------------------------
A brain configured for a Voyage flexible-dim model (voyage-4-large,
voyage-3-large, voyage-3.5, voyage-3.5-lite, voyage-4, voyage-4-lite,
voyage-code-3) without an explicit `embedding_dimensions` would fall back to
DEFAULT_EMBEDDING_DIMENSIONS=1536 — an OpenAI default that Voyage rejects.
Voyage's only accepted values are {256, 512, 1024, 2048}. Pre-fix the failure
surfaced as an HTTP 400 from Voyage that often got misclassified as a
transient network error.

Fix:
- `dims.ts` exports `VOYAGE_VALID_OUTPUT_DIMS` and `isValidVoyageOutputDim`.
- `dimsProviderOptions` throws `AIConfigError` with a paste-ready fix command
  (`gbrain config set embedding_dimensions ...`) when a Voyage flexible-dim
  model is configured with an invalid dim value.
- `gbrain models doctor` gets a new `embedding_config` probe that runs first
  (zero tokens) and surfaces the misconfiguration before any chat/expansion
  probes spend a single token. New probe status `config` + optional `fix`
  hint rendered in human output.
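A standalone sketch of the validation, reusing the names `VOYAGE_VALID_OUTPUT_DIMS`, `isValidVoyageOutputDim`, and `AIConfigError` from the commit but not their real implementations. `validateVoyageDims` and the exact error text are hypothetical, and suggesting 2048 in the fix hint is an illustrative choice:

```typescript
class AIConfigError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "AIConfigError";
    Object.setPrototypeOf(this, AIConfigError.prototype);
  }
}

const VOYAGE_VALID_OUTPUT_DIMS: ReadonlySet<number> = new Set([256, 512, 1024, 2048]);

function isValidVoyageOutputDim(dims: number): boolean {
  return VOYAGE_VALID_OUTPUT_DIMS.has(dims);
}

// Flexible-dim models named in the commit; fixed-dim models bypass the check.
const FLEXIBLE_DIM_MODELS: ReadonlySet<string> = new Set([
  "voyage-4-large", "voyage-4", "voyage-4-lite",
  "voyage-3-large", "voyage-3.5", "voyage-3.5-lite", "voyage-code-3",
]);

// Hypothetical check: fail before any request with a paste-ready fix command.
function validateVoyageDims(model: string, dims: number): void {
  if (!FLEXIBLE_DIM_MODELS.has(model) || isValidVoyageOutputDim(dims)) return;
  throw new AIConfigError(
    `${model} rejects embedding_dimensions=${dims}; valid values are 256, 512, 1024, 2048. ` +
      `Fix: gbrain config set embedding_dimensions 2048`,
  );
}
```

Failing here, before any request, turns what used to look like a transient HTTP 400 into a configuration error with an actionable fix.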

Regression-pinned by 6 new unit tests covering the AIConfigError throw,
exact valid-values set, the bypass path for fixed-dim Voyage models, and
the fix-hint contents.

* chore: bump version and changelog (v0.33.1.1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.33.1.1

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tent weighting (garrytan#897)

* feat(search-lite): token budget + semantic query cache + intent weighting

Adds three additive features to the hybrid search pipeline. All
backward-compatible: existing callers see identical behavior unless they
opt in to the new options.

## 1. Token Budget Enforcement (src/core/search/token-budget.ts)

Cap the cumulative token cost of returned results so search payloads
fit downstream context windows. Greedy top-down walk; preserves caller
ordering; no re-rank. char/4 heuristic for token counting (no
tokenizer dependency, which keeps the bun --compile bundle small).

  SearchOpts.tokenBudget: numeric cap. Default undefined = no-op.
  HybridSearchMeta.token_budget = { budget, used, kept, dropped }

  HTTP query op: pass `token_budget` param.
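A minimal sketch of the budget walk, assuming `Math.ceil(chars / 4)` for the char/4 heuristic and assuming the walk skips a result that does not fit and keeps going (the commit says "greedy top-down" but not whether the walk stops at the first overflow). Only the meta field names come from the text above:

```typescript
interface TokenBudgetMeta {
  budget: number;
  used: number;
  kept: number;
  dropped: number;
}

// char/4 heuristic: no tokenizer dependency (rounding up is an assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walks results in caller order; never re-ranks.
function applyTokenBudget<T extends { text: string }>(
  results: readonly T[],
  budget?: number,
): { results: T[]; meta?: TokenBudgetMeta } {
  if (budget === undefined) return { results: results.slice() }; // default: no-op
  const kept: T[] = [];
  let used = 0;
  for (const r of results) {
    const cost = estimateTokens(r.text);
    if (used + cost <= budget) {
      kept.push(r);
      used += cost;
    }
  }
  return { results: kept, meta: { budget, used, kept: kept.length, dropped: results.length - kept.length } };
}
```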

## 2. Semantic Query Cache (src/core/search/query-cache.ts + migration v52)

Cache search results keyed by query embedding similarity. HNSW lookup:
`embedding <=> $1 < 0.08` (cosine similarity >= 0.92). Per-source
isolation so multi-source brains don't bleed. Per-row TTL (default 3600s).
Best-effort writes; all errors swallowed so the cache never breaks the
search hot path.

  Migration v52 creates query_cache table with HALFVEC where pgvector >= 0.7;
  falls back to VECTOR with the resolved config.embedding_dimensions dim.

  New `gbrain cache` CLI: stats / clear --yes / prune.
  Config keys: search.cache.enabled / similarity_threshold / ttl_seconds.

  HybridSearchMeta.cache = { status, similarity?, age_seconds? }

  Routed through new `hybridSearchCached(engine, query, opts)` wrapper;
  the operations.ts query op now uses this wrapper so MCP/CLI calls
  benefit automatically. Skipped for two-pass walks + non-default
  embedding columns where cache semantics don't hold.
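An in-memory sketch of the lookup rule: per-source isolation, per-row TTL, and a cosine-distance cutoff of 0.08. The linear scan stands in for the HNSW index, and the row shape and `lookup` signature are hypothetical:

```typescript
interface CacheRow {
  sourceId: string;
  embedding: number[];
  results: string[];
  createdAtMs: number;
  ttlSeconds: number; // per-row TTL (default 3600 per the text)
}

type CacheLookup =
  | { status: "hit"; similarity: number; age_seconds: number; results: string[] }
  | { status: "miss" };

const MAX_COSINE_DISTANCE = 0.08; // distance below 0.08 means similarity above 0.92

function cosineDistance(a: readonly number[], b: readonly number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan standing in for the HNSW `embedding <=> $1 < 0.08` query.
function lookup(rows: readonly CacheRow[], sourceId: string, query: readonly number[], nowMs: number): CacheLookup {
  let best: { row: CacheRow; dist: number } | undefined;
  for (const row of rows) {
    if (row.sourceId !== sourceId) continue; // per-source isolation
    if ((nowMs - row.createdAtMs) / 1000 >= row.ttlSeconds) continue; // expired
    const dist = cosineDistance(row.embedding, query);
    if (dist < MAX_COSINE_DISTANCE && (best === undefined || dist < best.dist)) best = { row, dist };
  }
  if (best === undefined) return { status: "miss" };
  return {
    status: "hit",
    similarity: 1 - best.dist,
    age_seconds: (nowMs - best.row.createdAtMs) / 1000,
    results: best.row.results,
  };
}
```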

## 3. Zero-LLM Intent Weighting (src/core/search/intent-weights.ts)

Builds on the existing query-intent classifier (4 intents: entity /
temporal / event / general). New weight-adjustment layer applies subtle
per-intent nudges:

  entity   → boost keyword RRF + exact slug/title match
  temporal → default recency=on when caller left it unset
  event    → boost keyword RRF (rare named entities) + soft recency
  general  → no-op (1.0 multipliers everywhere)

All adjustments are SUBTLE (max 1.25x). Caller-explicit options ALWAYS
win: intent weighting never silently overrides recency / salience.

Default ON; opt out via `opts.intentWeighting = false`. LLM query
expansion (expansion.ts) is still available and opt-in via
`opts.expansion = true`; it just isn't the default anymore.

  HybridSearchMeta.intent now surfaces classifier output for debugging.
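A sketch of how per-intent nudges and caller precedence might combine. The multiplier values are invented for illustration (the commit only fixes the shape and the 1.25x ceiling), and `resolveIntentWeights` is a hypothetical name:

```typescript
type Intent = "entity" | "temporal" | "event" | "general";
type Recency = "on" | "off" | "soft";

interface IntentWeights {
  keywordRrf: number; // multiplier on the keyword RRF contribution
  exactTitle: number; // multiplier for an exact slug/title match
  recency?: Recency; // suggested recency when the caller left it unset
}

// Hypothetical values, all capped at 1.25x.
const INTENT_WEIGHTS: Record<Intent, IntentWeights> = {
  entity: { keywordRrf: 1.2, exactTitle: 1.25 },
  temporal: { keywordRrf: 1.0, exactTitle: 1.0, recency: "on" },
  event: { keywordRrf: 1.15, exactTitle: 1.0, recency: "soft" },
  general: { keywordRrf: 1.0, exactTitle: 1.0 },
};

interface CallerOpts {
  recency?: Recency;
  intentWeighting?: boolean; // default on
}

function resolveIntentWeights(intent: Intent, opts: CallerOpts): IntentWeights {
  // Opting out collapses every intent to the neutral "general" row.
  const w = opts.intentWeighting === false ? INTENT_WEIGHTS.general : INTENT_WEIGHTS[intent];
  // Caller-explicit options always win over the intent's suggestion.
  return { ...w, recency: opts.recency ?? w.recency };
}
```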

## Tests

  test/token-budget.test.ts            (10 tests, pure module)
  test/intent-weights.test.ts          (13 tests, pure module)
  test/query-cache.test.ts             (12 tests, PGLite)
  test/hybrid-search-lite.serial.test.ts (9 tests, PGLite e2e)

Plus 105 pre-existing search tests still pass. `bun run verify` clean.

Co-authored-by: Wintermute <agents@garrytan.com>

* feat(search-mode): MODE_BUNDLES + resolveSearchMode wired into bare hybridSearch

Three named modes (conservative / balanced / tokenmax) that bundle the
search-lite knobs from PR garrytan#897 into a single config key. Mode resolution
lives in bare hybridSearch (NOT just the cached wrapper) so eval-replay
and eval-longmemeval — which call bare hybridSearch — test the same
mode-affected behavior as production. See [CDX-5+6] in the plan.

The mode bundle supplies DEFAULTS for intentWeighting, tokenBudget,
expansion, and searchLimit when the caller leaves those undefined.
Per-call SearchOpts and per-key config overrides still win (matches the
v0.31.12 model-tier resolution chain at model-config.ts:resolveModel).

knobsHash() exposes a stable SHA-256 of the resolved knob set; the cache
contamination hotfix (next commit) consumes it to prevent a tokenmax
write from being served to a conservative read.
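A sketch of the resolution chain and a stable knobs hash, assuming Node's `node:crypto`. Only the tokenmax (expansion on, limit 50) and conservative (no expansion, limit 10) values echo numbers quoted in the cache-contamination hotfix commit; every other bundle value, and the names `resolveKnobs` and `pick`, are hypothetical:

```typescript
import { createHash } from "node:crypto";

type Mode = "conservative" | "balanced" | "tokenmax";

interface Knobs {
  intentWeighting: boolean;
  tokenBudget: number | null; // null = no budget
  expansion: boolean;
  searchLimit: number;
}

// Illustrative bundle values (see lead-in for which ones are sourced).
const MODE_BUNDLES: Record<Mode, Knobs> = {
  conservative: { intentWeighting: true, tokenBudget: 4000, expansion: false, searchLimit: 10 },
  balanced: { intentWeighting: true, tokenBudget: 8000, expansion: false, searchLimit: 20 },
  tokenmax: { intentWeighting: true, tokenBudget: null, expansion: true, searchLimit: 50 },
};

// Per-call value beats per-key config, which beats the mode bundle default.
function pick<T>(call: T | undefined, config: T | undefined, bundle: T): T {
  if (call !== undefined) return call;
  if (config !== undefined) return config;
  return bundle;
}

function resolveKnobs(mode: Mode, config: Partial<Knobs>, call: Partial<Knobs>): Knobs {
  const b = MODE_BUNDLES[mode];
  return {
    intentWeighting: pick(call.intentWeighting, config.intentWeighting, b.intentWeighting),
    tokenBudget: pick(call.tokenBudget, config.tokenBudget, b.tokenBudget),
    expansion: pick(call.expansion, config.expansion, b.expansion),
    searchLimit: pick(call.searchLimit, config.searchLimit, b.searchLimit),
  };
}

// Fixed field order keeps the hash independent of object construction order.
function knobsHash(k: Knobs): string {
  const canonical = JSON.stringify([k.intentWeighting, k.tokenBudget, k.expansion, k.searchLimit]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

Hashing the resolved knobs rather than the mode name means two modes that happen to resolve to identical knobs share cache rows, while any per-call override produces a distinct key.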

Three new fields on HybridSearchMeta:
  - mode (resolved mode name)
  - existing token_budget meta now fires from bare hybridSearch too

Bare hybridSearch now applies tokenBudget at all three return paths
(no-embedding-provider, keyword-only-fallback, main). Previously only
hybridSearchCached enforced budget; eval commands missed it.

Tests: 37 unit cases pin the 3x7 bundle table cell-by-cell, the
resolution chain semantics, knobs hash determinism + cross-mode
separation, and the config-table parser. All 72 search-lite tests pass.

Bisect-friendly: this commit ONLY adds mode resolution. The cache-key
contamination hotfix [CDX-4] is a separate atomic commit (next).

* fix(query-cache): cross-mode contamination hotfix [CDX-4]

PR garrytan#897's query_cache keyed rows on sha256(source_id::query_text) only.
A tokenmax search (expansion=on, limit=50) populated a row that a
subsequent conservative call (no expansion, limit=10) read back, serving
the wrong-shape results. This is a real bug in PR garrytan#897 today, regardless
of the v0.32.3 mode picker work — Codex caught it in plan review.

Fix:
- Migration v56 adds query_cache.knobs_hash TEXT column + composite
  (source_id, knobs_hash, created_at) index. Existing rows have NULL
  knobs_hash and are excluded from lookups (silently re-populated with
  the right hash on first hit — no orphan data, no destructive migration).
- cacheRowId(query, source, knobsHash) — knobsHash now part of the PK so
  a tokenmax write and a conservative write for the same (query, source)
  land in distinct rows.
- SemanticQueryCache.lookup({knobsHash}) filters WHERE knobs_hash = $.
- SemanticQueryCache.store({knobsHash}) writes the resolved hash.
- hybridSearchCached threads knobsHash from resolveSearchMode through
  every cache call. Cache config (enabled/threshold/TTL) now reads from
  the resolved mode bundle, not directly from the config table.
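The key change fits in a few lines. This sketch assumes a `source::query::knobsHash` layout for the hashed string and models the table as an in-memory array. Both are illustrative assumptions, not the migration's actual SQL.

```typescript
import { createHash } from "node:crypto";

interface CacheRow {
  id: string;
  knobsHash: string | null; // legacy rows from before the migration carry null
  results: string[];
}

// Including knobsHash in the row id gives each mode its own row per (query, source).
function cacheRowId(query: string, sourceId: string, knobsHash: string): string {
  return createHash("sha256").update(`${sourceId}::${query}::${knobsHash}`).digest("hex");
}

// Equality against a non-null hash never matches a legacy NULL row.
function lookup(rows: CacheRow[], query: string, sourceId: string, knobsHash: string): CacheRow | null {
  const id = cacheRowId(query, sourceId, knobsHash);
  return rows.find((row) => row.id === id && row.knobsHash === knobsHash) ?? null;
}
```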

Tests (test/query-cache-knobs-hash.test.ts, 11 cases):
- cacheRowId bifurcates by knobsHash
- Tokenmax write does NOT contaminate conservative lookup
- Three modes coexist as distinct rows for same query
- Legacy NULL-knobs_hash rows are excluded from lookup
- Same-mode write updates in place (no duplicate rows)

All 58 cache + mode tests pass. Migration v56 applies cleanly on a fresh
PGLite brain.

Bisect-friendly: this commit is the cache-key hotfix alone. Mode
resolution wiring lives in the previous commit.

* feat(search-telemetry): in-process rollup writer + search_telemetry table

Migration v57 creates search_telemetry (date, mode, intent, count,
sum_results, sum_tokens, sum_budget_dropped, cache_hit, cache_miss,
first_seen, last_seen). PK (date, mode, intent) caps growth at ~4380
rows/year. Sums + counts only — averages derive at read time so
concurrent ON CONFLICT writes from multiple gbrain processes accumulate
correctly [CDX-17].

In-memory bucket flushed periodically (60s OR 100 calls) + on process
beforeExit/SIGINT/SIGTERM with a 2-second cap. The search hot path NEVER
waits on this write [D2, CDX-19].

Date-bucketed cache_hit / cache_miss columns make hit rate over --days N
derivable [CDX-18]. query_cache.hit_count is a lifetime counter and
can't be sliced by window.

Wired into bare hybridSearch via emitMeta: every search call sync-bumps
a bucket. flush() drains atomically by swapping the map before SQL writes
so a record() during flush lands in the new map.
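The swap-then-write drain is sketched below. `TelemetryBuffer` and its method shapes are hypothetical. The comment about `ON CONFLICT` describes one plausible writer and is an assumption about the SQL.

```typescript
interface Bucket {
  count: number;
  sumResults: number;
  sumTokens: number;
  cacheHit: number;
  cacheMiss: number;
}

class TelemetryBuffer {
  private buckets = new Map<string, Bucket>();

  // Synchronous bump on the search hot path; no I/O here.
  record(date: string, mode: string, intent: string, results: number, tokens: number, cacheHit: boolean): void {
    const key = `${date}|${mode}|${intent}`;
    const b = this.buckets.get(key) ?? { count: 0, sumResults: 0, sumTokens: 0, cacheHit: 0, cacheMiss: 0 };
    b.count += 1;
    b.sumResults += results;
    b.sumTokens += tokens;
    if (cacheHit) b.cacheHit += 1;
    else b.cacheMiss += 1;
    this.buckets.set(key, b);
  }

  get pendingBuckets(): number {
    return this.buckets.size;
  }

  // Swap in a fresh map BEFORE any awaited write, so record() calls that land
  // mid-flush go to the new map instead of being lost or double-counted.
  // `write` could run INSERT ... ON CONFLICT (date, mode, intent) DO UPDATE
  // adding the raw sums (assumed SQL).
  async flush(write: (key: string, bucket: Bucket) => Promise<void>): Promise<number> {
    const drained = this.buckets;
    this.buckets = new Map();
    for (const [key, bucket] of drained) await write(key, bucket);
    return drained.size;
  }
}
```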

readSearchStats(engine, {days}) returns the StatsWindow shape that
gbrain search stats consumes (next commit).

Tests: 16 unit cases pin record/flush/read semantics including
ON-CONFLICT-adds-raw-values, concurrent-flush coalescing, cache hit-rate
math, missing-table graceful degradation, and window clamping.

53 migrations apply on a fresh PGLite brain.

* feat(config): add unset + listConfigKeys + readLineSafe helper [CDX-7+8+9]

CDX-8: gbrain config has no unset path today. Required before
`gbrain search modes --reset` can clear search.* overrides.

  - BrainEngine.unsetConfig(key) → returns rows deleted (0|1)
  - BrainEngine.listConfigKeys(prefix) → exact-literal prefix match
    with LIKE-escape on user-supplied % / _ / \ characters
  - PGLiteEngine + PostgresEngine implementations
  - `gbrain config unset <key>` and `gbrain config unset --pattern <prefix>`
    sub-subcommands
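The LIKE-escape step could look like the sketch below. It escapes the three metacharacters with a backslash, which is PostgreSQL's default LIKE escape character, and then appends the wildcard. `escapeLikePrefix` is a hypothetical helper, and the query it would pair with (`SELECT key FROM config WHERE key LIKE $1 ORDER BY key`) is also illustrative.

```typescript
// Prefix each of \, % and _ with a backslash so the user's prefix is matched
// literally, then append % for "starts with".
function escapeLikePrefix(prefix: string): string {
  return prefix.replace(/[\\%_]/g, (ch) => `\\${ch}`) + "%";
}
```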

CDX-9: readLine has no EOF detection or timeout. Mode-picker plan calls
out "TTY closes mid-prompt → defaults to balanced" but the raw helper
hangs forever. New readLineSafe(prompt, defaultValue, timeoutMs=60s):

  - Returns defaultValue on stdin 'end' event
  - Returns defaultValue on timeout
  - Returns defaultValue on empty Enter
  - Non-TTY stdin returns defaultValue immediately (e2e safe)
  - Returns trimmed user input otherwise

Exported so install picker (next task) can use it.
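The following minimal sketch implements those semantics on top of `node:readline`. It treats readline's `close` event, which fires when stdin ends, as the EOF signal. Writing the prompt to stderr is an assumption, and the shipped helper may be structured differently.

```typescript
import * as readline from "node:readline";

function readLineSafe(prompt: string, defaultValue: string, timeoutMs = 60_000): Promise<string> {
  // Non-interactive stdin (CI, agents, e2e harnesses): never block.
  if (!process.stdin.isTTY) return Promise.resolve(defaultValue);

  return new Promise((resolve) => {
    const rl = readline.createInterface({ input: process.stdin, output: process.stderr });
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (value: string): void => {
      if (settled) return; // rl.close() below can re-trigger 'close'; ignore repeats
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      rl.close();
      resolve(value);
    };

    timer = setTimeout(() => finish(defaultValue), timeoutMs);
    rl.on("close", () => finish(defaultValue)); // EOF / Ctrl-D
    rl.question(prompt, (answer) => {
      const trimmed = answer.trim();
      finish(trimmed === "" ? defaultValue : trimmed); // empty Enter -> default
    });
  });
}
```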

Tests: 9 cases pin unset semantics + prefix matcher edge cases
(glob-wildcard escape, sort order, idempotent loop, search.* sweep).
All 53 migrations apply on a fresh PGLite brain.

* feat(init): install-time mode picker + upgrade banner

Install picker (src/commands/init-mode-picker.ts):
  - Runs as a phase inside `gbrain init` AFTER engine.initSchema() so DB
    config writes work [CDX-7].
  - Idempotent: skipped on re-init if search.mode is already set.
  - Smart auto-suggestion via recommendModeFor() reads
    models.tier.subagent / models.default / OPENAI_API_KEY:
      * Opus default/subagent → tokenmax (quality ceiling)
      * Haiku subagent → conservative (4K budget keeps cost down)
      * No OpenAI key → conservative (no LLM expansion possible)
      * Sonnet / unknown → balanced (safe default)
  - TTY shows menu via readLineSafe (60s timeout, defaults on EOF/empty).
  - Non-TTY auto-selects + emits operator hint:
      [gbrain] search mode: X (auto-selected — reason)
      [gbrain] To change: gbrain config set search.mode <...>
  - --json mode emits structured `{phase: 'search_mode_picker', ...}` event.
  - Wired into both initPGLite and initPostgres flows.
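The heuristic reads as a short ordered rule list. The sketch below is illustrative only: the `ModelSignals` shape, the substring matching on model ids, and the precedence between rules are assumptions, since the bullets do not pin all of it down. The reason strings paraphrase the bullets.

```typescript
type Mode = "conservative" | "balanced" | "tokenmax";

interface ModelSignals {
  defaultModel?: string;  // models.default
  subagentModel?: string; // models.tier.subagent
  hasOpenAIKey: boolean;  // OPENAI_API_KEY is set
}

// Rules are checked top to bottom, in the order the bullets list them.
function recommendModeFor(s: ModelSignals): { mode: Mode; reason: string } {
  const def = (s.defaultModel ?? "").toLowerCase();
  const sub = (s.subagentModel ?? "").toLowerCase();
  if (def.includes("opus") || sub.includes("opus")) {
    return { mode: "tokenmax", reason: "Opus model: quality ceiling" };
  }
  if (sub.includes("haiku")) {
    return { mode: "conservative", reason: "Haiku subagent: 4K budget keeps cost down" };
  }
  if (!s.hasOpenAIKey) {
    return { mode: "conservative", reason: "no OpenAI key: no LLM expansion possible" };
  }
  return { mode: "balanced", reason: "Sonnet/unknown: safe default" };
}
```

Order matters here. A setup with an Opus default and a Haiku subagent resolves to tokenmax under this ordering.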

Upgrade banner (src/commands/upgrade.ts):
  - One-shot stderr banner in runPostUpgrade.
  - State persisted via config key `search.mode_upgrade_notice_shown=true`
    — fires at most once per install.
  - Copy corrected per [CDX-1+2+3]: production query op STILL defaults
    expand=true and limit=20. The banner reframes from "behavior is
    regressing" to "named modes available + here's how to preserve
    exact current shape."

Tests (test/init-mode-picker.test.ts, 16 cases):
  - recommendModeFor heuristic for all 4 input shapes
  - parseModeInput accepts numeric/named/case-insensitive, rejects garbage
  - runModePicker non-TTY auto-selects + writes config
  - Idempotent + --force re-prompt + JSON output
  - Opus → tokenmax, Haiku → conservative real wiring through engine

* feat(cli): gbrain search modes/stats/tune command

Three sub-subcommands mirroring the gbrain models (v0.31.12) shape:

  gbrain search modes [--json]
    Read-only routing dashboard. Shows the three mode bundles, the active
    mode, and the source of every resolved knob:
      cache_enabled = true   [override: search.cache.enabled]
      tokenBudget   = 4000   [mode: conservative]
    Plus knob descriptions for legibility.

  gbrain search modes --reset [--source <mode>]
    Clears every search.* override (NOT search.mode itself). Preserves
    the upgrade-notice state key. --source <mode> is a dry-run that
    lists what --reset would change without writing — the paved path
    [CDX-8] flagged as missing.

  gbrain search stats [--days N] [--json]
    Observability. Reads the search_telemetry rollup over the window
    (clamps to [1, 365]). Prints cache hit rate, mode mix, intent mix,
    budget drops, avg results/tokens. JSON output includes
    _meta.metric_glossary block per [CDX-25].

  gbrain search tune [--apply] [--json]
    Recommendation engine. 5 rules cover the bug class:
      - Insufficient data → "no_recommendations" status
      - Conservative + high budget-drop rate → suggest balanced
      - High cache hit rate (>85%) → suggest similarity threshold bump
      - Tokenmax + Haiku subagent → suggest balanced (cost mismatch)
      - Cache disabled but stats show usage → suggest re-enabling
    --apply mutates config via setConfig / unsetConfig with a paste-ready
    revert command printed at the end.
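Two small pieces of `search stats` arithmetic are sketched below. The first clamps `--days` into [1, 365]. The second derives the window's hit rate from the date-bucketed `cache_hit` / `cache_miss` sums rather than from a lifetime counter. The 7-day default and both function names are assumptions.

```typescript
function clampDays(days?: number, fallback = 7): number {
  const d = days === undefined || !Number.isFinite(days) ? fallback : Math.trunc(days);
  return Math.min(365, Math.max(1, d));
}

interface DayRow {
  cacheHit: number;
  cacheMiss: number;
}

// Sum first, divide last: averaging per-day rates would over-weight quiet days.
function windowHitRate(rows: DayRow[]): number | null {
  let hit = 0;
  let miss = 0;
  for (const row of rows) {
    hit += row.cacheHit;
    miss += row.cacheMiss;
  }
  const total = hit + miss;
  return total === 0 ? null : hit / total;
}
```

Consider two days with 9/1 and 0/2 hits/misses. Summing gives 9/12 = 0.75, while averaging the daily rates would report 0.45.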

Registered in src/cli.ts dispatch table. 17 unit cases pin:
  - Dashboard report shape + per-knob source attribution
  - --reset preserves search.mode + notice key
  - --source dry-run never writes
  - stats reads telemetry rollup; --days clamps
  - tune recommendation rules fire on real telemetry data
  - --apply mutates config
  - --help + unknown subcommand exit codes

* feat(eval): metric glossary module + auto-gen METRIC_GLOSSARY.md + CI guard

Single source of truth at src/core/eval/metric-glossary.ts. Every entry
carries 3 fields:
  - industry_term (canonical IR/NLP literature name, preserved verbatim)
  - eli10 (plain English that a 16-year-old can follow)
  - range (numeric range + interpretation)

Covers 4 metric families:
  - Retrieval: P@k, R@k, MRR, nDCG@k
  - Stability: Jaccard@k, top-1 stability
  - Statistical: p-value (paired bootstrap + Bonferroni), 95% CI
  - Operational: cache hit rate, avg results/tokens, cost per query, p99 latency

Public surface:
  - getMetricGloss(metric) → full entry or null
  - eli10For(metric) → plain-English string or null
  - buildMetricGlossaryMeta(metrics[]) → {metric → eli10} record for
    JSON `_meta.metric_glossary` blocks per [CDX-25]. ONE block per
    response, NOT sibling `_gloss` fields on every metric.
  - renderMetricGlossaryMarkdown() → deterministic Markdown for the doc

Auto-generation:
  scripts/generate-metric-glossary.ts emits docs/eval/METRIC_GLOSSARY.md.
  Deterministic (same input → same bytes) so the CI guard can diff.

CI guard:
  scripts/check-eval-glossary-fresh.sh regenerates into a temp file and
  diffs against the committed doc. Out-of-date doc fails the build.
  Wired into `bun run verify` (and therefore `bun run test:full`).

Tests (test/metric-glossary.test.ts, 18 cases):
  - Every documented metric is present
  - Every entry has all 3 required fields
  - Accessors return null on unknown metrics (no throw)
  - buildMetricGlossaryMeta silently drops unknown metrics
  - renderer output is deterministic across calls
  - Renderer groups metrics into 4 sections

docs/eval/METRIC_GLOSSARY.md: 5491 bytes, 124 lines, fresh.

* feat(doctor): search_mode + eval_drift checks + drift-watch module

src/core/eval/drift-watch.ts — curated retrieval watch-list [CDX-6].
Five patterns covering the surface that actually affects retrieval quality:
  - src/core/search/      (search pipeline)
  - src/core/embedding.ts (embedding shape)
  - src/core/chunkers/    (chunk granularity)
  - src/core/ai/recipes/anthropic.ts + openai.ts (expansion + embed routing)
  - src/core/operations.ts (the query op definition)

Adding to the list is a deliberate act — requires a CHANGELOG line so
coverage grows on purpose, not by accident. Pure functions:
  - matchesWatchPattern(path) — trailing-slash = prefix, bare = equality
  - filesDriftedSince(repoRoot, sha?) — git diff --name-only wrapper
  - watchedFilesDrifted(repoRoot, sha?) — composite
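The trailing-slash rule is small enough to show directly. The array below lists the paths named above as one string per file, which is an assumption about how the watch-list is stored. `driftedWatchedFiles` is a hypothetical pure filter over a `git diff --name-only` file list.

```typescript
const WATCH_PATTERNS: readonly string[] = [
  "src/core/search/",
  "src/core/embedding.ts",
  "src/core/chunkers/",
  "src/core/ai/recipes/anthropic.ts",
  "src/core/ai/recipes/openai.ts",
  "src/core/operations.ts",
];

// Trailing slash = directory prefix match; anything else = exact path match.
function matchesWatchPattern(path: string, patterns: readonly string[] = WATCH_PATTERNS): boolean {
  return patterns.some((p) => (p.endsWith("/") ? path.startsWith(p) : path === p));
}

function driftedWatchedFiles(changedPaths: string[]): string[] {
  return changedPaths.filter((p) => matchesWatchPattern(p));
}
```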

src/commands/doctor.ts — two new checks.

checkSearchMode [CDX-20]: status stays 'ok' (never warns, never docks
health score). Hint in message field. Three branches:
  - unset → "search.mode is unset (using balanced fallback). Run
    `gbrain search modes` to see what is running and pick a mode."
  - mode + no overrides → "Mode: X (no per-key overrides — mode bundle
    is canonical)."
  - mode + overrides → "Mode: X with N per-key override(s) (k1, k2, …).
    To consolidate to the pure mode bundle: gbrain search modes --reset"
Upgrade-notice state key (search.mode_upgrade_notice_shown) is excluded
from the override roster — it's not a knob.

checkEvalDrift [CDX-6]: surfaces uncommitted changes to retrieval-watched
files. Always 'ok'; operator-facing reminder. Names up to 3 drifted files
in the message + paste-ready re-eval command.

Both helpers exported (was: file-private) so tests can pin behavior
without walking the full runDoctor pipeline.

Tests: 12 drift-watch cases + 7 doctor-check cases. Pin watch-list shape,
prefix-vs-equality matcher semantics, missing-repo graceful failure, and
all three search_mode branches.

* feat(eval): --mode flag on longmemeval/replay + run-all + compare

Per-mode --mode flag plumbed into:
  - gbrain eval longmemeval --mode <conservative|balanced|tokenmax>
    Sets search.mode in the benchmark brain's config table; config is
    in PRESERVE_TABLES so resetTables doesn't wipe it between questions.
    Mode surfaces in the per-question NDJSON row.
  - gbrain eval replay --mode <m> + --compare-limit N
    --compare-limit forces a constant K across modes [CDX-13]; without
    it, Jaccard@k against the captured baseline measures K-drift, not
    quality. Mode is set once before the replay loop.
  - NOT cross-modal per [CDX-11]: cross-modal scores OUTPUT against
    TASK; it doesn't retrieve. Adding --mode there is theater.

New: gbrain eval run-all orchestrator (src/commands/eval-run-all.ts):
  - Sweeps every requested mode × suite combination
  - Sequential default per D9; --parallel N opt-in (clamped to mode count)
  - Cost guard with split caps [CDX-15+16]:
      --budget-usd-retrieval N (default $5)
      --budget-usd-answer N (default $20)
    Non-TTY refuses with exit 2 unless --yes AND explicit --budget-usd-*
    flags pass. TTY refuses without --yes (defense against agent loops).
  - estimateRunCost computes per-(suite,mode) breakdown including the
    expansion-Haiku surcharge for tokenmax.
  - Audit trail: appends to <repo>/.gbrain-evals/eval-results.jsonl
    [CDX-23]. Personal brain (~/.gbrain) NEVER touched.
  - v0.32.3 ships orchestrator + argv + guard + persist hook.
    In-process per-suite invocation is a v0.32.4 follow-up (operator
    runs the per-suite CLIs with the documented --mode flag for now;
    each completion calls persistRunRecord to log).
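The guard's decision table is sketched below as a pure function. `GuardInput`, `costGuard`, and the check order are hypothetical. Using exit code 2 for the TTY refusal is also an assumption, because the commit only states exit 2 for the non-TTY case.

```typescript
interface GuardInput {
  isTTY: boolean;
  yes: boolean;                 // --yes
  explicitBudgetFlags: boolean; // any --budget-usd-* flag passed
  estRetrievalUsd: number;
  estAnswerUsd: number;
  capRetrievalUsd: number;      // --budget-usd-retrieval (default $5)
  capAnswerUsd: number;         // --budget-usd-answer (default $20)
}

type GuardResult = { ok: true } | { ok: false; exitCode: number; reason: string };

const refuse = (reason: string): GuardResult => ({ ok: false, exitCode: 2, reason });

function costGuard(g: GuardInput): GuardResult {
  // Split caps: each half of the spend is checked against its own budget.
  if (g.estRetrievalUsd > g.capRetrievalUsd) return refuse("retrieval estimate exceeds --budget-usd-retrieval");
  if (g.estAnswerUsd > g.capAnswerUsd) return refuse("answer estimate exceeds --budget-usd-answer");
  // TTY and non-TTY both need --yes, which blocks agent loops from auto-spending.
  if (!g.yes) return refuse("pass --yes to confirm spend");
  // Non-TTY additionally needs the caps spelled out explicitly.
  if (!g.isTTY && !g.explicitBudgetFlags) return refuse("non-TTY runs require explicit --budget-usd-* flags");
  return { ok: true };
}
```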

New: gbrain eval compare report (src/commands/eval-compare.ts):
  - Reads eval-results.jsonl, groups by (suite, mode), renders MD or JSON
  - Most-recent (suite, mode, commit) wins when duplicates exist
  - JSON output has schema_version=2 + _meta.metric_glossary block per
    [CDX-25] (ONE block per response, not sibling _gloss fields)
  - _meta.methodology field names the paired-bootstrap + Bonferroni
    discipline per [CDX-14] so haters can reproduce
  - Missing file → friendly hint pointing at `gbrain eval run-all`

Wired into eval dispatch table in src/commands/eval.ts.

Metric glossary fuzzy fallback: `recall@10` → `recall@k` lookup
(the glossary documents the family; report rows carry specific K
values). Routes through getMetricGloss for every call site.
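The fuzzy `@N` to `@k` fallback and the single-block `_meta.metric_glossary` builder are sketched below. The two glossary entries are placeholders for illustration, and the real entries live in `src/core/eval/metric-glossary.ts`.

```typescript
interface GlossEntry {
  industry_term: string;
  eli10: string;
  range: string;
}

// Placeholder entries for illustration only.
const GLOSSARY = new Map<string, GlossEntry>([
  ["recall@k", { industry_term: "Recall at k", eli10: "(placeholder)", range: "0..1" }],
  ["precision@k", { industry_term: "Precision at k", eli10: "(placeholder)", range: "0..1" }],
]);

function getMetricGloss(metric: string): GlossEntry | null {
  const key = metric.trim().toLowerCase();
  // Exact hit first, then collapse a concrete cutoff (@10) to the family (@k).
  return GLOSSARY.get(key) ?? GLOSSARY.get(key.replace(/@\d+$/, "@k")) ?? null;
}

// One _meta.metric_glossary block per response; unknown metrics are dropped silently.
function buildMetricGlossaryMeta(metrics: string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const metric of metrics) {
    const gloss = getMetricGloss(metric);
    if (gloss) out[metric] = gloss.eli10;
  }
  return out;
}
```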

Tests (42 cases total — all green):
  - eval-run-all.test.ts (19): argv parser, cost estimate, guard
    semantics for all 4 (over/under × tty/non-tty) shapes, persist hook
    NDJSON shape.
  - eval-compare.test.ts (5): JSON + MD output shapes, glossary
    integration, missing-file graceful, mode filter, most-recent-wins.
  - metric-glossary.test.ts (18): unchanged but updated assertions to
    cover the fuzzy `@N` → `@k` fallback.

Pre-existing eval-replay / eval-longmemeval / eval-export / eval-prune
tests (42 cases) still pass — --mode + --compare-limit are additive.

* docs: methodology + CLAUDE.md/README/RESOLVER + skills/conventions

docs/eval/SEARCH_MODE_METHODOLOGY.md — haters-immune 8-section template.
Documents what the eval measures + does NOT measure, datasets + sizes
(LongMemEval n=500, Replay n=200, BrainBench n=1240 docs / 350 qrels),
random seed 42, run procedure verbatim, threats to validity (LongMemEval
English+technical skew, char/4 heuristic ~5-10% off, expansion ~97.6%
relative lift on this corpus), per-question raw outputs, pre-registered
expectations (tokenmax wins R@10 by 5-15pp, conservative wins cost by
5-15x, balanced lands within 3pp), re-run cadence anchored to the
src/core/eval/drift-watch.ts watch-list.

Statistical-significance section pins paired bootstrap with 10,000
resamples + Bonferroni correction across 3 modes × 4 metrics [CDX-14].
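With 3 modes × 4 metrics there are 12 comparisons, so a 0.05 family-wise alpha becomes 0.05 / 12 ≈ 0.0042 per comparison. The sketch below pairs that correction with one common form of paired bootstrap, which resamples mean-centered per-question differences. Only the 10,000 resamples and seed 42 come from the doc. The PRNG (mulberry32), the +1 smoothing, and the function names are assumptions.

```typescript
// mulberry32: a tiny seeded PRNG, used here only so resamples are reproducible.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two-sided paired bootstrap test on per-question scores a[i] vs b[i].
function pairedBootstrapPValue(a: number[], b: number[], resamples = 10_000, seed = 42): number {
  if (a.length !== b.length || a.length === 0) throw new Error("need equal-length, non-empty paired samples");
  const n = a.length;
  const diffs = a.map((x, i) => x - b[i]);
  const observed = diffs.reduce((sum, d) => sum + d, 0) / n;
  // Shift the differences to mean 0 so the resamples simulate "no real difference".
  const centered = diffs.map((d) => d - observed);
  const rand = mulberry32(seed);
  let asExtreme = 0;
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += centered[Math.floor(rand() * n)];
    if (Math.abs(sum / n) >= Math.abs(observed)) asExtreme++;
  }
  // +1 smoothing keeps the estimate away from an impossible p = 0.
  return (asExtreme + 1) / (resamples + 1);
}

// Bonferroni: split the family-wise alpha evenly across every comparison.
const bonferroniAlpha = (alpha: number, comparisons: number): number => alpha / comparisons;
```

A mode-vs-mode difference on one metric would then count as significant only if `pairedBootstrapPValue(...) < bonferroniAlpha(0.05, 12)`.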

CLAUDE.md gets two new sections: ## Search Mode (3-mode table + resolution
chain + [CDX-4] cache contamination fix note + CLI commands) and ## Eval
discipline (single-source-of-truth glossary, methodology doc, eval_results
in repo NOT personal brain per [CDX-23]).

README.md Quick Start gets a paragraph naming the install picker, mode
heuristic, and the methodology link.

skills/conventions/search-modes.md NEW — convention file consumed by
brain-ops + query + signal-detector skills via the existing
`> **Convention:**` callout pattern. Routes "what mode" / "tune
retrieval" / "compare modes" queries to the right CLI surface.

skills/RESOLVER.md gets two new trigger rows pointing at
gbrain search * and gbrain eval compare.

* chore: regen llms.txt + llms-full.txt for v0.32.3 search-mode docs

bun run build:llms — picks up the new CLAUDE.md sections (Search Mode +
Eval discipline) and the docs/eval/SEARCH_MODE_METHODOLOGY.md addition.
build-llms.test.ts gate now passes.

* fix(doctor): wire search_mode + eval_drift checks into runDoctor main flow

The v0.32.3 search_mode + eval_drift helpers were inserted into the
DB-checks sub-helper runDbChecks (lines 345-355), but runDoctor itself
maintains its own check list and calls only a subset of the helpers. Push
the two checks into the main runDoctor path (after the existing
sync_freshness check at line 2347) so they actually appear in
`gbrain doctor --json` output.

Both checks gated on engine !== null. Progress reporter heartbeat fires
for each. Both still return status 'ok' per [CDX-20] so health score is
preserved.

Verified end-to-end on a real Postgres brain: gbrain doctor --json now
includes 'search_mode' and 'eval_drift' in the checks array.

* fix: claw-test hang — DATABASE_URL leak + telemetry beforeExit deadlock

Two root causes for the hang, both fixed.

1. DATABASE_URL leak in claw-test scripted harness
   The harness inherits the parent process's env via `...process.env`
   for every phase child (init / import / query / extract / doctor).
   When the e2e runner sets DATABASE_URL (for OTHER e2e tests), it
   leaks into claw-test's children. `loadConfig` at src/core/config.ts:143
   then flips inferredEngine to 'postgres' for every subsequent phase,
   breaking the hermetic-PGLite-tempdir contract: phases race against
   each other on a shared test Postgres while pointing at different
   brain states.

   Fix: strip DATABASE_URL + GBRAIN_DATABASE_URL from the child env
   before forwarding. Re-apply GBRAIN_HOME / GBRAIN_FRICTION_RUN_ID
   after the merge so a parent's override can't win. The harness is
   PGLite-only by design.

2. Telemetry beforeExit deadlock
   v0.32.3's recordSearchTelemetry installed a `process.on('beforeExit',
   drainOnExit)` hook that wrapped the flush in `Promise.race([flush(),
   setTimeout(2000)])`. beforeExit fires when the event loop empties,
   but the hook enqueued NEW async work (the race's setTimeout +
   pending flush), so the event loop never re-emptied. Short-lived
   CLI invocations (`gbrain query "the"` finishing in ~100ms) ended
   up waiting on the DB write indefinitely.

   The claw-test harness spawns several short-lived gbrain queries.
   Each one hung after its real work finished. The harness then waited
   forever on its child subprocess's exit code.

   Fix: drop the beforeExit + SIGINT + SIGTERM hooks. Per [CDX-19]'s
   "stats are directional, not exact" contract, losing one unflushed
   bucket on process exit is acceptable. The unref'd setInterval
   handles long-running processes (HTTP MCP, autopilot, jobs work).
   Short-lived CLI invocations exit immediately.
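Both fixes are small, and each is sketched below. The names `buildChildEnv` and `startPeriodicFlush` are hypothetical and not the harness's or telemetry module's actual functions.

```typescript
type Env = Record<string, string | undefined>;

// Fix 1: copy the parent env, drop DB URLs, then re-apply harness-owned keys
// last so a value inherited from the parent can never override them.
function buildChildEnv(parent: Env, harness: { GBRAIN_HOME: string; GBRAIN_FRICTION_RUN_ID: string }): Env {
  const env: Env = { ...parent };
  delete env.DATABASE_URL;
  delete env.GBRAIN_DATABASE_URL;
  return { ...env, ...harness };
}

// Fix 2: no beforeExit/SIGINT/SIGTERM hooks. An unref'd interval flushes
// long-running processes and never keeps a short CLI run alive.
function startPeriodicFlush(flush: () => Promise<void>, intervalMs = 60_000): ReturnType<typeof setInterval> {
  const handle = setInterval(() => {
    // Stats are directional: a failed flush is dropped, not retried.
    flush().catch(() => {});
  }, intervalMs);
  handle.unref();
  return handle;
}
```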

Verified:
  - `gbrain query "the"` on a fresh PGLite brain exits in <1s (was
    hanging forever).
  - `bun test test/e2e/claw-test.test.ts` → 3 pass / 0 fail / 3.86s
    (was hanging at the banner indefinitely).
  - 85/85 e2e files / 574/574 tests pass including claw-test, with
    DATABASE_URL set (the configuration that originally repro'd the
    hang).
  - 6235/6235 unit tests pass.
  - Typecheck clean.

The two bugs interacted: the DATABASE_URL leak meant queries hit the
real Postgres (slow), making the beforeExit deadlock visible. Fixing
either alone would have masked the other. Both fixed in this commit.

* feat(install-picker): cost anchors in mode prompt + upgrade banner + docs

The install picker already asks explicitly (1/2/3 menu, default to the
recommendation on Enter). What was missing: a way to reason about the
cost tradeoff. Without numbers, "tokenmax" looks free and "conservative"
sounds restrictive; with numbers, the operator picks intentionally.

Cost anchors added everywhere the user encounters the mode choice:
  - Install picker MENU_TEXT (gbrain init)
  - Upgrade banner (gbrain upgrade post-upgrade)
  - CLAUDE.md ## Search Mode section
  - README.md Quick Start
  - docs/eval/SEARCH_MODE_METHODOLOGY.md (with the math)

Anchors at Sonnet 4.6 downstream ($3/M input):
  conservative  ~$0.012/query  ~$12/mo @ 1K  ~$1,200/mo @ 100K
  balanced      ~$0.030/query  ~$30/mo @ 1K  ~$3,000/mo @ 100K
  tokenmax      ~$0.060/query  ~$60/mo @ 1K  ~$6,000/mo @ 100K

Plus tokenmax's Haiku expansion overhead: ~$1.50 per 1K queries on top.
Cache hits roughly halve these on a brain with repeat-query traffic.

The math is documented in SEARCH_MODE_METHODOLOGY.md so a reviewer can
audit each variable (T = ~400 tokens/chunk from the recursive chunker's
300-word target; N = `searchLimit` cap; R = downstream model rate from
src/core/anthropic-pricing.ts). Drift away from these numbers requires
updating CLAUDE.md + the picker + the methodology doc in lockstep — a
regression test pins the picker's anchor strings to enforce this.
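The per-query anchors follow from T × N × R. The sketch below reproduces the Sonnet numbers quoted above, assuming N = 10 for conservative and N = 50 for tokenmax, the limits quoted in the query-cache fix. Balanced's N is not stated in this section, so it is left out rather than guessed. The function names are hypothetical.

```typescript
const TOKENS_PER_CHUNK = 400; // T: ~300-word chunker target

// Per-query downstream cost = T * N * R, with R quoted in $ per million input tokens.
function perQueryCostUsd(searchLimit: number, usdPerMillionInput: number): number {
  return (TOKENS_PER_CHUNK * searchLimit * usdPerMillionInput) / 1_000_000;
}

function monthlyCostUsd(searchLimit: number, usdPerMillionInput: number, queriesPerMonth: number): number {
  return perQueryCostUsd(searchLimit, usdPerMillionInput) * queriesPerMonth;
}
```

For example, 400 × 10 × $3/M = $0.012 per query, or $12/mo at 1K queries.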

The framing also names the cost rule honestly: the dominant cost isn't
gbrain (semantic cache is free; Haiku expansion is rounding-error). It's
the downstream agent reading retrieved chunks back into its context.
Operators who don't realize this pick badly.

Tests: 5 new regression cases in init-mode-picker.test.ts pin every
cost string in MENU_TEXT. Total 21/21 picker tests pass; 6240/6240
unit tests pass; verify gate green.

* docs: realistic-scale cost anchor for search modes

The per-query cost framing in the picker (~$0.012/$0.030/$0.060) is
honest but theoretical — it treats each search as an isolated billable
event. Real agent loops amortize a lot of context across turns via
Anthropic prompt caching, so the per-query 5x ratio doesn't translate
1:1 into total agent spend.

Added a "Realistic-scale anchor" section to SEARCH_MODE_METHODOLOGY.md
representing one heavy power-user agent loop running tokenmax:

  - ~860 turns/mo (~29/day, one active agent)
  - ~900K tokens/turn (system + tools + history + reasoning + search)
  - ~$0.85/turn → ~$700/mo total agent spend at tokenmax
  - ~88% Anthropic prompt-cache hit rate

Scaling balanced + conservative DOWN from that anchor:

  - tokenmax  → ~$700/mo, search ~22% of total spend
  - balanced  → ~$620/mo, search ~12% (saves ~$78/mo vs tokenmax)
  - conservative → ~$575/mo, search ~5% (saves ~$124/mo vs tokenmax)

Honest takeaway: at realistic agent-loop scale WITH disciplined prompt
caching, mode choice saves 10-20% of total agent spend, not 5x. The
per-query math kicks back in for setups WITHOUT cache discipline (churn
the prompt prefix every turn → search payload becomes a larger fraction).
Both framings live in the doc.

CLAUDE.md ## Search Mode gets a forward-pointer paragraph naming the
"per-query math vs real-world spend" delta so agents reading the section
find the methodology footnote.

Numbers in the doc are anonymized + scaled away from any specific
deployment. No model names, no specific dollar figures from a real
production setup — just the per-turn / cache-hit-rate / search-count
shape ratios that a thoughtful operator can validate against their own
billing dashboard.

* feat(picker): mode × model cost matrix (25x corner-to-corner spread)

Previous version showed mode costs assuming Sonnet-only downstream.
That muted the spread to 5x and made mode choice look minor. Reality:
the downstream model tier is the BIGGER cost lever — pairing mode with
model is where the 25x spread lives.

New 3×3 matrix in the install picker, CLAUDE.md, methodology doc, README:

                  Haiku 4.5     Sonnet 4.6    Opus 4.7
                  ($1/M input)  ($3/M input)  ($5/M input)
  conservative    $400/mo       $1,200/mo     $2,000/mo
  balanced        $1,000/mo     $3,000/mo     $5,000/mo
  tokenmax        $2,000/mo     $6,000/mo     $10,000/mo

(per-query cost @ 100K queries/mo, full search payload, no cache savings)

The methodology doc gets a new "Mode × Model matrix" section above the
realistic-scale anchor with concrete right-sizing guidance:

  - tokenmax + Haiku: wrong direction. Haiku can't filter 50 chunks → noise
    not signal. Pay Haiku rates, get sub-Haiku quality.
  - conservative + Opus: wasted Opus. 200K context window starved on
    retrieval depth. Pay Opus rates, get conservative-shape retrieval.
  - Natural pairings span ~4x; the matrix corners span 25x. The natural
    diagonal is where most users should land.

Realistic-scale anchor refreshed:
  - tokenmax + Opus: ~$700/mo at 860 turns
  - balanced + Sonnet: ~$430/mo
  - conservative + Haiku: ~$170/mo

Plus a "mismatched pairings" section showing the math for tokenmax+Haiku
and conservative+Opus — both burn budget for no improvement.

Regression test updated: pins the 25x framing + the four anchor cells
(two corners + two diagonal mids) + the three downstream model rates.

22/22 picker tests pass. 6241/6241 unit tests pass. CI guards green.

* docs(picker): rescale cost matrix from 100K → 10K queries/mo (typical single user)

Most users running gbrain are single-user installs at ~10K queries/month,
not the 100K fleet-scale used in the original matrix. The picker numbers
($400 to $10,000/mo) looked alien to the actual audience. Rescaled to
10K with an explicit linear-scaling callout.

New matrix in picker, CLAUDE.md, README, methodology doc:

                  Haiku 4.5     Sonnet 4.6    Opus 4.7
                  ($1/M)        ($3/M)        ($5/M)
  conservative    $40/mo        $120/mo       $200/mo
  balanced        $100/mo       $300/mo       $500/mo
  tokenmax        $200/mo       $600/mo       $1,000/mo

Still 25x corner-to-corner. Still 4x natural-diagonal spread. But now in
numbers a single user picks up and reasons about: "balanced + Sonnet at
$300/mo, that's fine" or "tokenmax + Opus at $1,000/mo, that's a
deliberate choice for max-quality high-stakes work."
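Every cell of the matrix above can be derived from the Sonnet column by scaling linearly with the input rate and the query volume. The sketch below reproduces each cell that way. `monthlyCost` is a hypothetical helper, not picker code.

```typescript
// $ per million input tokens, as quoted in the matrix header.
const RATES = { haiku: 1, sonnet: 3, opus: 5 } as const;
// Sonnet column at 10K queries/mo, as quoted in the matrix.
const SONNET_MONTHLY_10K = { conservative: 120, balanced: 300, tokenmax: 600 } as const;

type Model = keyof typeof RATES;
type Mode = keyof typeof SONNET_MONTHLY_10K;

// Cost is linear in both the input rate and the query volume.
function monthlyCost(mode: Mode, model: Model, queriesPerMonth = 10_000): number {
  const atRate = (SONNET_MONTHLY_10K[mode] * RATES[model]) / RATES.sonnet;
  return Math.round(atRate * (queriesPerMonth / 10_000));
}
```

The corners give `monthlyCost("tokenmax", "opus") / monthlyCost("conservative", "haiku")` = 1000 / 40 = 25, which is the 25x spread.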

Every surface updated:
  - Install picker MENU_TEXT (with "scales linearly — multiply by 10
    for 100K/mo" footnote so heavier users still see their number)
  - CLAUDE.md ## Search Mode table + scaling prose
  - README Quick Start
  - methodology doc Mode × Model matrix section
  - upgrade banner (post-upgrade notice)

Regression test updated: pins the 3 new anchor cells ($40, $300, $1,000)
+ the 10K/mo volume frame + the linear-scaling callout. 23/23 picker
tests pass, 6241/6241 unit tests pass, verify gate green.

Methodology doc's existing 1K/10K/100K Monthly cost breakdown tables
left intact (they already show the linear scaling explicitly).

* feat(picker): agent-facing install protocol + tokenmax default + [AGENT] directive

DX gap: an agent installing gbrain (OpenClaw, Hermes, Codex, Cursor) ran
gbrain init non-TTY, saw 2 stderr lines flash by, and silently auto-applied
a default search mode. The operator never saw the cost matrix or the choice.
At 25x corner-to-corner cost spread, that's surprise-spend territory.

Five surfaces fixed:

1. **Auto-suggest default flipped balanced → tokenmax.** The Sonnet/unknown
   fallback now recommends tokenmax (preserves v0.31.x retrieval shape:
   expand=on, generous result set). Haiku subagent → conservative still
   wins (cost-sensitive signal). No-OpenAI-key → conservative still wins
   (vector search not possible). Heuristic reordered: Haiku check now
   fires BEFORE the Opus check, because a Haiku subagent loop signalling
   cost sensitivity should win over a default-model heuristic.

2. **gbrain init non-TTY output rebuilt.** Previously: 2 stderr lines.
   Now: the full 3×3 cost matrix + an explicit [AGENT] directive block
   telling the agent to relay the matrix to its operator before
   continuing. Includes a pointer to INSTALL_FOR_AGENTS.md Step 3.5 for
   the full protocol.

3. **gbrain upgrade banner same treatment.** Existing v0.32.3 banner now
   includes [AGENT] directive at the top so upgrading agents relay the
   matrix to their operator instead of silently accepting v0.31.x →
   v0.32.x default-applied behavior.

4. **INSTALL_FOR_AGENTS.md Step 3.5 NEW** with the matrix verbatim, the
   exact paraphrasable ask-the-user wording, and the gbrain config set
   commands to run after the operator picks. Plus a paragraph in the
   Upgrade section pointing back at Step 3.5.

5. **AGENTS.md install checklist** gets a new Step 4 ("STOP — ask the
   user about search mode") between init and the rest of the flow. The
   agent's job description now explicitly says: silent acceptance is
   the wrong default.

Tests (24/24 pass):
  - Updated recommendModeFor heuristic order (Haiku floor > Opus default)
  - New regression test: non-TTY output contains the matrix corners +
    [AGENT] directive + INSTALL_FOR_AGENTS.md pointer
  - withEnv() helper used for OPENAI_API_KEY mutation (test-isolation lint)
  - Default-recommendation tests updated: Sonnet / unknown → tokenmax

Privacy + test-isolation gates clean. 6256/6256 unit tests pass.

---------

Co-authored-by: garrytan-agents <agents@garrytan.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
…garrytan#982)

Node's default maxBuffer for execFileSync is 1 MiB. On repos with
60-100K files, `git diff --name-status -M` output easily exceeds this,
causing the sync process to die silently with no error in the log.

Observed at /data/brain (99K files, 62K in git ls-files): sync
consistently died during the rename-detection phase at ~15% through
`buildSyncManifest()`. No stack trace, no error event — just a dead
process. The fix survived 5+ full syncs on the same corpus.

100 MiB is generous but bounded. A 100K-file diff with long paths
tops out around 10-20 MiB in practice.
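The sketch below shows the call with the raised cap, using the `maxBuffer` option of Node's `execFileSync`, whose default Node documents as 1024 * 1024 bytes. `gitDiffNameStatus` is a hypothetical wrapper, not the actual `buildSyncManifest()` code.

```typescript
import { execFileSync } from "node:child_process";

// 100 MiB: generous but bounded (Node's default is 1 MiB).
const SYNC_GIT_MAX_BUFFER = 100 * 1024 * 1024;

function gitDiffNameStatus(repoRoot: string, fromRef: string, toRef = "HEAD"): string {
  return execFileSync("git", ["diff", "--name-status", "-M", fromRef, toRef], {
    cwd: repoRoot,
    encoding: "utf8",
    maxBuffer: SYNC_GIT_MAX_BUFFER, // without this, large rename diffs overflow 1 MiB
  });
}
```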

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
* docs(CLAUDE.md): add workflow for fork PRs from garrytan-agents

Fork PRs from non-collaborator accounts don't receive base-repo secrets on
pull_request events, so CI jobs needing ANTHROPIC_API_KEY / OPENAI_API_KEY
fail with empty-env auth errors. Document the move-branch-to-base-repo
workflow as the narrow-scope alternative to adding the account as a
collaborator or flipping the repo-wide fork-secret toggle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.33.3.1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump to v0.33.2.1

Per user direction: ship as v0.33.2.1 instead of v0.33.3.1.
0.33.2.x is unclaimed in the queue (PR garrytan#934 holds 0.33.3.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c + W3) (garrytan#934)

* feat(v0.34 pre-w0): add code-retrieval eval harness for v0.34 ship gate

Captures pre-v0.34 retrieval quality on the gbrain self-corpus before any
code-intel work lands, so the v0.34 ship gate (precision@5 +10pp OR
answered_rate +15pp on >=15/30 questions) measures real improvement
rather than an after-the-fact retuned baseline.

* src/eval/code-retrieval/harness.ts -- pure-function metrics (precision@k,
  recall@k, top-1 stability, gate evaluator) + EvalRunReport types stable
  across schema_version 1
* src/eval/code-retrieval/questions.json -- 30 questions across callers /
  callees / definition / references / blast_radius / execution_flow /
  cluster_membership kinds, expected_files captured against current
  gbrain layout
* src/eval/code-retrieval/strategies.ts -- BaselineStrategy (hybridSearch)
  + WithCodeIntelStrategy stub (post-W3 fills in code_blast/code_flow/etc.)
* src/commands/eval-code-retrieval.ts -- gbrain eval code-retrieval CLI
  with --baseline / --with-code-intel / --compare subcommands
* test/code-retrieval-harness.test.ts -- 26 unit tests across metrics,
  loader, gate logic; no engine dependency
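The two core metrics are sketched below as pure functions over file lists. The names are hypothetical. Dividing precision by k even when fewer than k results come back is a convention choice the harness may or may not share.

```typescript
// precision@k: share of the top-k results that are expected files.
function precisionAtK(retrieved: string[], expected: ReadonlySet<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = retrieved.slice(0, k).filter((file) => expected.has(file)).length;
  return hits / k;
}

// recall@k: share of the expected files that appear anywhere in the top k.
function recallAtK(retrieved: string[], expected: ReadonlySet<string>, k: number): number {
  if (expected.size === 0) return 0;
  const top = new Set(retrieved.slice(0, k));
  let found = 0;
  for (const file of expected) if (top.has(file)) found++;
  return found / expected.size;
}
```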

PRE-V0.34 BASELINE WORKFLOW:
  gbrain eval code-retrieval --baseline --save /tmp/baseline-1.json
  (run 3x for noise floor)

V0.34 SHIP GATE (after W3 lands):
  gbrain eval code-retrieval --with-code-intel --save /tmp/v034.json
  gbrain eval code-retrieval --compare /tmp/baseline-1.json /tmp/v034.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0a): source-routing leak across query + two-pass

Codex outside-voice review on the v0.34 plan caught two load-bearing
sites where sourceId was advertised but never applied — multi-source
brains silently cross-contaminated structural retrieval:

* operations.ts ~323 — `query` op handler called hybridSearch without
  threading ctx.sourceId. Multi-source agents querying with a
  --source flag got cross-source results.
* two-pass.ts:81 (nearSymbol lookup) and two-pass.ts:131 (unresolved
  edge resolution) — TwoPassOpts.sourceId was declared and threaded
  through hybridSearch's expandAnchors call, but the actual SQL ignored
  it. The walk window crossed source boundaries every time.

Fix:
* `query` op now reads ctx.sourceId AND accepts a new `source_id`
  param (with '__all__' as the explicit force-cross-source escape
  hatch). The per-call param wins over ctx.sourceId.
* two-pass.ts both lookups join through pages.source_id when
  opts.sourceId is set; omitted opts.sourceId preserves the legacy
  cross-source contract for callers who want it.
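The two-pass fix can be expressed as a lookup that adds the source filter only when a sourceId is supplied. This is a sketch, not the actual two-pass.ts query. The SQL text, the SqlQuery shape and the `pages.id` join column are hypothetical. Only `pages.source_id` and the `content_chunks` columns come from this log.

```typescript
// Hypothetical parameterized-query shape; gbrain's engine API may differ.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// nearSymbol-style lookup. With a sourceId, the query filters on
// pages.source_id so the walk cannot cross source boundaries; without one,
// the legacy cross-source behavior is preserved.
function nearSymbolQuery(symbol: string, sourceId?: string): SqlQuery {
  const base =
    "SELECT c.id FROM content_chunks c JOIN pages p ON p.id = c.page_id " +
    "WHERE c.symbol_name_qualified = $1";
  if (sourceId === undefined) {
    return { text: base, values: [symbol] };
  }
  return { text: `${base} AND p.source_id = $2`, values: [symbol, sourceId] };
}
```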

Regression test: test/e2e/source-routing.test.ts seeds two sources
with the same `parseMarkdown` symbol + a cross-source caller edge.
Pins:
  - nearSymbol + sourceId='source-a' returns ONLY source-a chunks
  - nearSymbol + sourceId='source-b' returns ONLY source-b chunks
  - nearSymbol with no sourceId still crosses sources (contract preserved)
  - walk_depth=1 unresolved-edge resolution stays in source-a

PGLite in-memory, no DATABASE_URL needed. The fix proves out under
realistic structural retrieval, not just a contrived unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0b): flip CLI source-scoping default to truly source-scoped

Codex outside-voice review (finding #7) caught that the v0.20.0
docstring claim "by default we only match the caller's source_id"
contradicted the implementation in code-callers.ts:54 + code-callees.ts:43:

  allSources: allSources || !sourceId

The right side made `allSources` TRUE whenever `--source` was omitted,
INVERTING the documented default. Multi-source brains silently cross-
contaminated structural retrieval; `gbrain code-callers parseMarkdown`
on a brain with two repos returned callers from both even though the
docstring promised per-source scoping.

Fix:
* New canonical helper `resolveDefaultSource(engine)` in sources-ops.ts.
  Contract per eng review D7:
    - exactly 1 source registered → return its id (single-source brains,
      the 80% case; --source flag is unnecessary friction there)
    - 2+ sources → throw SourceResolutionError(multiple_sources_ambiguous)
      with the list of valid ids
    - 0 sources → throw SourceResolutionError(no_sources)
* code-callers.ts + code-callees.ts now resolve to the default source
  when both --source AND --all-sources are absent. To get the pre-v0.34
  cross-source behavior, callers must pass --all-sources explicitly.
* Same hint text on both commands. Pinned by test/e2e/cli-source-scoping-pglite.test.ts.
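This is a minimal sketch of the D7 contract and the flipped CLI default. It takes the list of registered source ids directly instead of an engine. The error fields, the message text and `resolveCliScope` are hypothetical. Only the helper name, the error class name and the two error codes come from the commit message.

```typescript
type SourceResolutionCode = "multiple_sources_ambiguous" | "no_sources";

// Error type named after the commit message; fields are illustrative.
class SourceResolutionError extends Error {
  readonly code: SourceResolutionCode;
  readonly validIds: string[];

  constructor(code: SourceResolutionCode, validIds: string[] = []) {
    super(
      code === "no_sources"
        ? "no sources registered"
        : `multiple sources registered (${validIds.join(", ")}); pass --source <id> or --all-sources`,
    );
    this.name = "SourceResolutionError";
    this.code = code;
    this.validIds = validIds;
  }
}

// D7 contract over a plain list of registered source ids.
function resolveDefaultSource(registered: string[]): string {
  if (registered.length === 1) return registered[0];
  if (registered.length === 0) throw new SourceResolutionError("no_sources");
  throw new SourceResolutionError("multiple_sources_ambiguous", [...registered]);
}

interface Scope {
  allSources: boolean;
  sourceId?: string;
}

// The pre-fix logic `allSources || !sourceId` turned an omitted --source into
// cross-source. Here an omitted --source means "the default source".
function resolveCliScope(flags: { source?: string; allSources?: boolean }, registered: string[]): Scope {
  if (flags.allSources === true) return { allSources: true };
  if (flags.source !== undefined) return { allSources: false, sourceId: flags.source };
  return { allSources: false, sourceId: resolveDefaultSource(registered) };
}
```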

IRON RULE regression R2: docstring promise now holds. Multi-source brain
running `gbrain code-callers <symbol>` without --source gets a clear
error listing valid source ids instead of silent cross-resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): within-file two-pass symbol resolver + edges_backfilled_at watermark

Codex's outside-voice review caught that the v0.20.0 graph stores BARE
callee tokens (`render`, `find`, `execute`) — not qualified names. Pre-v0.34
recursive blast/flow would alias every same-named function across classes.
W0c is the foundation that fixes this: resolve `code_edges_symbol` rows by
matching `to_symbol_qualified` against the SAME-FILE chunks'
`symbol_name_qualified`, then write the outcome to `edge_metadata`.

This commit is the resolver primitive + schema. The cycle-phase wiring
that calls it on every quick-cycle tick lands in the next commit.

Schema (v51 migration `edges_backfilled_at_v0_34`):
* `content_chunks.edges_backfilled_at TIMESTAMPTZ` — resume watermark.
  Chunks where the column is NULL OR older than EDGE_EXTRACTOR_VERSION_TS
  get re-walked next tick. SIGINT/OOM/sleep mid-backfill loses at most
  one batch.
* Indexes per D11 from eng review:
  - `idx_code_edges_symbol_resolver(source_id, to_symbol_qualified)` —
    composite for the resolver's per-source lookup.
  - `idx_content_chunks_symbol_lookup(page_id, symbol_name_qualified)`
    WHERE `symbol_name_qualified IS NOT NULL` — file-batched candidate
    fetch; also reused by W4-5 cluster recompute.
  - `idx_content_chunks_edges_backfill(edges_backfilled_at)` WHERE
    `edges_backfilled_at IS NULL`, for a fast scan of never-backfilled rows.
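This sketches the watermark semantics over an in-memory stand-in for content_chunks. `ChunkRow`, `walkStaleChunks` and the option names are hypothetical. The sketch shows why an interrupted backfill loses at most one batch: the watermark is stamped only after a batch finishes.

```typescript
// Hypothetical in-memory stand-in for content_chunks rows.
interface ChunkRow {
  id: number;
  edgesBackfilledAt: Date | null;
}

// A chunk needs a re-walk if it was never backfilled, or was backfilled
// before the current extractor version timestamp.
function isStale(row: ChunkRow, extractorVersionTs: Date): boolean {
  return row.edgesBackfilledAt === null || row.edgesBackfilledAt.getTime() < extractorVersionTs.getTime();
}

// Walks stale chunks batch by batch and stamps the watermark only after a
// batch is processed. If processBatch throws, that batch stays stale and is
// retried next tick. Assumes now() returns a time later than extractorVersionTs.
function walkStaleChunks(
  rows: ChunkRow[],
  extractorVersionTs: Date,
  opts: { batchSize: number; maxChunks: number; now: () => Date },
  processBatch: (batch: ChunkRow[]) => void,
): number {
  let processed = 0;
  while (processed < opts.maxChunks) {
    const room = Math.min(opts.batchSize, opts.maxChunks - processed);
    const batch = rows.filter((r) => isStale(r, extractorVersionTs)).slice(0, room);
    if (batch.length === 0) break;
    processBatch(batch);
    const stamp = opts.now();
    for (const r of batch) r.edgesBackfilledAt = stamp; // watermark bump
    processed += batch.length;
  }
  return processed;
}
```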

Module (`src/core/chunkers/symbol-resolver.ts`):
* `resolveSymbolEdgesIncremental(engine, {sourceId, maxChunks?, onProgress?})`
  walks stale chunks in 200-chunk batches. For each chunk, loads its
  unresolved edges, finds same-page candidates by symbol_name_qualified,
  and writes outcome to `edge_metadata`:
   - exactly 1 candidate → `{resolved_chunk_id: <id>}`
   - 2+ candidates → `{ambiguous: true, candidates: [...]}`
   - 0 candidates → unchanged (cross-file; two-pass.ts handles those)
  Each batch bumps `edges_backfilled_at = NOW()` for the chunks.
* `readEdgeResolution(metadata)` — public helper for downstream code
  (two-pass.ts, code_blast op, eval-capture) to consume the resolver's
  output without parsing JSON directly. Returns a tagged union.
* `EDGE_EXTRACTOR_VERSION_TS` exported constant — bump when extractor
  shape changes and the next cycle re-walks all chunks.
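This sketches the resolver's three outcomes and a reader for them. The metadata keys follow the commit message. The `EdgeResolution` tag names and the convention that `null` means "leave the row unchanged" are assumptions. They are not the real readEdgeResolution's return type.

```typescript
// Hypothetical tagged union; the real readEdgeResolution's tags may differ.
type EdgeResolution =
  | { kind: "resolved"; chunkId: number }
  | { kind: "ambiguous"; candidates: number[] }
  | { kind: "unresolved" };

// Maps same-page candidate chunk ids to the edge_metadata to write.
// null means "leave the row unchanged" (0 candidates: a cross-file edge).
function resolutionMetadata(candidateChunkIds: number[]): Record<string, unknown> | null {
  if (candidateChunkIds.length === 1) return { resolved_chunk_id: candidateChunkIds[0] };
  if (candidateChunkIds.length > 1) return { ambiguous: true, candidates: [...candidateChunkIds] };
  return null;
}

// Reads stored metadata back so callers never touch the JSON shape directly.
function readEdgeResolution(metadata: unknown): EdgeResolution {
  if (typeof metadata !== "object" || metadata === null) return { kind: "unresolved" };
  const m = metadata as Record<string, unknown>;
  const id = m.resolved_chunk_id;
  if (typeof id === "number") return { kind: "resolved", chunkId: id };
  const candidates = m.candidates;
  if (m.ambiguous === true && Array.isArray(candidates)) {
    return { kind: "ambiguous", candidates: candidates.filter((c): c is number => typeof c === "number") };
  }
  return { kind: "unresolved" };
}
```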

Tests (5 E2E in test/e2e/symbol-resolver-pglite.test.ts, all PGLite,
no DATABASE_URL): unambiguous match, ambiguous multi-match, no match,
watermark advance + idempotency, source isolation (no cross-source
candidate leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): wire resolve_symbol_edges as a new cycle phase

W0c's symbol resolver lands as a 12th cycle phase between extract and
patterns. The autopilot's quick-cycle path (60s watchdog interval per
D2 from eng review) now resolves stale chunks incrementally so agents
see resolved edges within ~60s of writes rather than waiting on the
slow full-walk path.

* CyclePhase + ALL_PHASES + NEEDS_LOCK_PHASES extended with
  'resolve_symbol_edges'. Position: between extract (which emits new
  bare-token edges from sync diffs) and patterns (which reads the
  graph). Acquires the cycle lock because it writes edge_metadata.
* CycleReport.totals adds edges_resolved + edges_ambiguous so doctor
  and autopilot summaries surface the numbers.
* runPhaseResolveSymbolEdges walks every registered source via
  listSources() + resolveSymbolEdgesIncremental(). Per-call cap is
  BATCH_SIZE*10 = 2000 chunks so a single watchdog tick stays bounded
  even on a 100K-chunk brain. Subsequent ticks pick up the leftovers
  via the edges_backfilled_at watermark.
* Test count bumped from 11 → 12 phases in cycle.serial.test.ts and
  cycle.test.ts (both pinned by the regression guards). Existing 28
  cycle tests pass.
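This is a simplified sketch of the phase loop, assuming one resolver callback per source. It is synchronous only to keep the example short, whereas the real phase talks to the database. The function and type names are hypothetical. The 200-chunk batch, the x10 per-call cap and the totals field names come from the commit message.

```typescript
const BATCH_SIZE = 200;
const MAX_CHUNKS_PER_CALL = BATCH_SIZE * 10; // 2000, bounds one watchdog tick

interface ResolveResult {
  resolved: number;
  ambiguous: number;
}

// Hypothetical resolver signature (synchronous for brevity).
type ResolveFn = (sourceId: string, maxChunks: number) => ResolveResult;

// Walks every registered source with a capped resolver call and sums the
// counts reported in CycleReport.totals. Leftover stale chunks are picked up
// on later ticks via the edges_backfilled_at watermark.
function runPhaseResolveSymbolEdgesSketch(sourceIds: string[], resolve: ResolveFn) {
  const totals = { edges_resolved: 0, edges_ambiguous: 0 };
  for (const sourceId of sourceIds) {
    const r = resolve(sourceId, MAX_CHUNKS_PER_CALL);
    totals.edges_resolved += r.resolved;
    totals.edges_ambiguous += r.ambiguous;
  }
  return totals;
}
```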

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3): MCP-expose code_callers / code_callees / code_def / code_refs

Pre-v0.34 these four code-intelligence commands lived in CLI_ONLY at
cli.ts:30 — agents calling gbrain via MCP couldn't reach them and fell
through to text search. This commit ships the agent-facing MCP surface
for v0.34 against the existing v0.20+ tree-sitter call graph; recursive
blast/flow and clusters land in subsequent commits.

* `code_callers(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCallersOf. Reverse view of the A1 call graph.
* `code_callees(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCalleesOf. Forward view.
* `code_def(symbol, [limit, lang])` — wraps findCodeDef. Returns
  definition sites with file/line/snippet.
* `code_refs(symbol, [limit, lang])` — wraps findCodeRefs. Returns
  every reference (comments, strings, imports, call sites).

All four are scope:'read', source-scoped by default via ctx.sourceId
(W0a contract). Per-call source_id param wins over ctx; pass '__all__'
or all_sources=true to force cross-source.
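This sketches the scoping precedence the four ops share, with the context source passed as a plain string. The param names follow the op signatures above. `resolveOpSource` itself is hypothetical.

```typescript
// Hypothetical param shape for code_callers / code_callees.
interface CodeIntelParams {
  symbol: string;
  limit?: number;
  source_id?: string;
  all_sources?: boolean;
}

const ALL_SOURCES = "__all__";

// Returns the source to filter on, or null for a cross-source lookup.
// Precedence: explicit cross-source flags, then the per-call source_id,
// then the caller's context source.
function resolveOpSource(params: CodeIntelParams, ctxSourceId: string): string | null {
  if (params.all_sources === true || params.source_id === ALL_SOURCES) return null;
  return params.source_id ?? ctxSourceId;
}
```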

* operations-descriptions.ts: 4 new constants per the eng review D10
  finding — every description carries an inline example response so
  agents don't burn first-call context discovering the response shape. Resolver-grade
  wording ("BEFORE editing any function, run code_callers...") routes
  plan-mode questions straight to the right op.
* SEARCH_DESCRIPTION gains a cross-link clause pointing at the four new
  ops so agents stop falling through to text search for code-symbol
  questions.

Tests (11 E2E in test/e2e/code-intel-mcp-ops-pglite.test.ts):
  - All four ops registered + scope:read + description pinned by constant
  - All four ops have required symbol param
  - code_callers / code_callees return the documented envelope shape
  - Source scoping honors ctx.sourceId
  - all_sources=true / source_id='__all__' force cross-source
  - code_def returns the def-site snippet

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.33.0): agent-readable migration doc for the code-intel foundation

skills/migrations/v0.33.0.md gives existing-user upgrade guidance for the
v0.33.0 foundation pre-release (this branch's accumulated work toward
v0.34 Cathedral III):

* Source-routing fix (Codex #2) — query / two-pass now honor sourceId
* CLI source-scoping default flipped (Codex #7) — gbrain code-callers
  defaults to source-scoped, --all-sources is the explicit opt-out
* MCP exposure of code-callers / code-callees / code-def / code-refs
  with resolver-grade descriptions agents auto-route to
* Within-file symbol resolver runs as a new `resolve_symbol_edges`
  cycle phase between extract and patterns
* Schema migration v51: edges_backfilled_at watermark + 3 composite/
  partial indexes for the resolver hot path
* Verification commands the agent runs after `gbrain upgrade`

Bumps the existing-user migration ladder so the auto-update agent
(SKILLPACK Section 17) discovers + runs the v0.33.0 migration steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.33.0): bump VERSION + package.json + CHANGELOG

v0.33.0 ships the v0.34 Cathedral III foundation: MCP exposure of
code_callers / code_callees / code_def / code_refs with resolver-grade
tool descriptions, plus the source-routing fix + within-file symbol
resolver + cycle-phase wiring that v0.34's recursive blast/flow and
Leiden clusters will build on.

Full release notes in CHANGELOG.md. Trio in lockstep:
  VERSION:      0.33.0
  package.json: 0.33.0
  CHANGELOG.md: ## [0.33.0] - 2026-05-11
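This is a hypothetical lockstep check over the three files, given their contents as strings. It illustrates what "in lockstep" means here. It is not what `bun run verify` actually runs.

```typescript
interface VersionTrio {
  versionFile: string; // contents of VERSION
  packageJson: string; // contents of package.json
  changelog: string; // contents of CHANGELOG.md
}

// True when VERSION, package.json's "version", and the topmost
// "## [x.y.z]" CHANGELOG header all name the same version.
function trioInLockstep(t: VersionTrio): boolean {
  const fromFile = t.versionFile.trim();
  const fromPkg = (JSON.parse(t.packageJson) as { version?: unknown }).version;
  const header = /^## \[([^\]]+)\]/m.exec(t.changelog);
  return typeof fromPkg === "string" && header !== null && fromFile === fromPkg && fromPkg === header[1];
}
```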

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.33.0): update dream-cycle phase-order assertions for resolve_symbol_edges

An E2E test pinned the canonical phase sequence as a regression guard. The
v0.33.0 resolve_symbol_edges phase (added between extract and patterns)
correctly bumps the count to 12, which the canonical-order test caught on a
fresh-Postgres run. Fixed by adding the new phase to EXPECTED_PHASES and
bumping the version history comment.

Both cycle.serial.test.ts and cycle.test.ts were already updated in the
W0c cycle-phase commit (6f7dbe1); this third pin lives in
test/e2e/dream-cycle-phase-order-pglite.test.ts and was missed.

Full E2E suite now: 550 passed / 0 failed / 81 files (real Postgres on
port 5435 via Docker pgvector/pgvector:pg16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.33.3.0): rebump from v0.33.2.0 → v0.33.3.0

User asked to ship as v0.33.3.0 instead of v0.33.2.0. Single sweep:

* VERSION + package.json bumped to 0.33.3.0
* CHANGELOG header + body rewritten to v0.33.3
* skills/migrations/v0.33.0.md → skills/migrations/v0.33.3.0.md
  (migration files use the version they ship FROM; renaming aligns with
  the v0.21.0.md / v0.31.0.md convention in CLAUDE.md)
* Schema migration name edges_backfilled_at_v0_33_2 →
  edges_backfilled_at_v0_33_3 in src/core/migrate.ts (also bumps the
  in-code identifier so the registry name matches the version)
* All v0.33.2 comment references swept to v0.33.3 in cycle.ts,
  operations.ts, operations-descriptions.ts, eval.ts, symbol-resolver.ts
  + cycle test phase-history comments
* llms.txt + llms-full.txt regenerated

Trio verified:
  VERSION:      0.33.3.0
  package.json: 0.33.3.0
  CHANGELOG.md: ## [0.33.3.0] - 2026-05-12

bun run verify clean; 90 v0.33.3-touched tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…clusters + eval gate (garrytan#994)

* feat(v0.34 pre-w0): add code-retrieval eval harness for v0.34 ship gate

Captures pre-v0.34 retrieval quality on the gbrain self-corpus before any
code-intel work lands, so the v0.34 ship gate (precision@5 +10pp OR
answered_rate +15pp on >=15/30 questions) measures real improvement
rather than an after-the-fact retuned baseline.

* src/eval/code-retrieval/harness.ts -- pure-function metrics (precision@k,
  recall@k, top-1 stability, gate evaluator) + EvalRunReport types stable
  across schema_version 1
* src/eval/code-retrieval/questions.json -- 30 questions across callers /
  callees / definition / references / blast_radius / execution_flow /
  cluster_membership kinds, expected_files captured against current
  gbrain layout
* src/eval/code-retrieval/strategies.ts -- BaselineStrategy (hybridSearch)
  + WithCodeIntelStrategy stub (post-W3 fills in code_blast/code_flow/etc.)
* src/commands/eval-code-retrieval.ts -- gbrain eval code-retrieval CLI
  with --baseline / --with-code-intel / --compare subcommands
* test/code-retrieval-harness.test.ts -- 26 unit tests across metrics,
  loader, gate logic; no engine dependency

PRE-V0.34 BASELINE WORKFLOW:
  gbrain eval code-retrieval --baseline --save /tmp/baseline-1.json
  (run 3x for noise floor)

V0.34 SHIP GATE (after W3 lands):
  gbrain eval code-retrieval --with-code-intel --save /tmp/v034.json
  gbrain eval code-retrieval --compare /tmp/baseline-1.json /tmp/v034.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0a): source-routing leak across query + two-pass

Codex outside-voice review on the v0.34 plan caught two load-bearing
sites where sourceId was advertised but never applied — multi-source
brains silently cross-contaminated structural retrieval:

* operations.ts ~323 — `query` op handler called hybridSearch without
  threading ctx.sourceId. Multi-source agents querying with a
  --source flag got cross-source results.
* two-pass.ts:81 (nearSymbol lookup) and two-pass.ts:131 (unresolved
  edge resolution) — TwoPassOpts.sourceId was declared and threaded
  through hybridSearch's expandAnchors call, but the actual SQL ignored
  it. The walk window crossed source boundaries every time.

Fix:
* `query` op now reads ctx.sourceId AND accepts a new `source_id`
  param (with '__all__' as the explicit force-cross-source escape
  hatch). The per-call param wins over ctx.sourceId.
* two-pass.ts both lookups join through pages.source_id when
  opts.sourceId is set; omitted opts.sourceId preserves the legacy
  cross-source contract for callers who want it.

Regression test: test/e2e/source-routing.test.ts seeds two sources
with the same `parseMarkdown` symbol + a cross-source caller edge.
Pins:
  - nearSymbol + sourceId='source-a' returns ONLY source-a chunks
  - nearSymbol + sourceId='source-b' returns ONLY source-b chunks
  - nearSymbol with no sourceId still crosses sources (contract preserved)
  - walk_depth=1 unresolved-edge resolution stays in source-a

PGLite in-memory, no DATABASE_URL needed. The fix proves out under
realistic structural retrieval, not just a contrived unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(v0.34 W0b): flip CLI source-scoping default to truly source-scoped

Codex outside-voice review (finding #7) caught that the v0.20.0
docstring claim "by default we only match the caller's source_id"
contradicted the implementation in code-callers.ts:54 + code-callees.ts:43:

  allSources: allSources || !sourceId

The right side made `allSources` TRUE whenever `--source` was omitted,
INVERTING the documented default. Multi-source brains silently cross-
contaminated structural retrieval; `gbrain code-callers parseMarkdown`
on a brain with two repos returned callers from both even though the
docstring promised per-source scoping.

Fix:
* New canonical helper `resolveDefaultSource(engine)` in sources-ops.ts.
  Contract per eng review D7:
    - exactly 1 source registered → return its id (single-source brains,
      the 80% case; --source flag is unnecessary friction there)
    - 2+ sources → throw SourceResolutionError(multiple_sources_ambiguous)
      with the list of valid ids
    - 0 sources → throw SourceResolutionError(no_sources)
* code-callers.ts + code-callees.ts now resolve to the default source
  when both --source AND --all-sources are absent. To get the pre-v0.34
  cross-source behavior, callers must pass --all-sources explicitly.
* Same hint text on both commands. Pinned by test/e2e/cli-source-scoping-pglite.test.ts.

IRON RULE regression R2: docstring promise now holds. Multi-source brain
running `gbrain code-callers <symbol>` without --source gets a clear
error listing valid source ids instead of silent cross-resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): within-file two-pass symbol resolver + edges_backfilled_at watermark

Codex's outside-voice review caught that the v0.20.0 graph stores BARE
callee tokens (`render`, `find`, `execute`) — not qualified names. Pre-v0.34
recursive blast/flow would alias every same-named function across classes.
W0c is the foundation that fixes this: resolve `code_edges_symbol` rows by
matching `to_symbol_qualified` against the SAME-FILE chunks'
`symbol_name_qualified`, then write the outcome to `edge_metadata`.

This commit is the resolver primitive + schema. The cycle-phase wiring
that calls it on every quick-cycle tick lands in the next commit.

Schema (v51 migration `edges_backfilled_at_v0_34`):
* `content_chunks.edges_backfilled_at TIMESTAMPTZ` — resume watermark.
  Chunks where the column is NULL OR older than EDGE_EXTRACTOR_VERSION_TS
  get re-walked next tick. SIGINT/OOM/sleep mid-backfill loses at most
  one batch.
* Indexes per D11 from eng review:
  - `idx_code_edges_symbol_resolver(source_id, to_symbol_qualified)` —
    composite for the resolver's per-source lookup.
  - `idx_content_chunks_symbol_lookup(page_id, symbol_name_qualified)`
    WHERE `symbol_name_qualified IS NOT NULL` — file-batched candidate
    fetch; also reused by W4-5 cluster recompute.
  - `idx_content_chunks_edges_backfill(edges_backfilled_at)` WHERE
    `edges_backfilled_at IS NULL`, for a fast scan of never-backfilled rows.

Module (`src/core/chunkers/symbol-resolver.ts`):
* `resolveSymbolEdgesIncremental(engine, {sourceId, maxChunks?, onProgress?})`
  walks stale chunks in 200-chunk batches. For each chunk, loads its
  unresolved edges, finds same-page candidates by symbol_name_qualified,
  and writes outcome to `edge_metadata`:
   - exactly 1 candidate → `{resolved_chunk_id: <id>}`
   - 2+ candidates → `{ambiguous: true, candidates: [...]}`
   - 0 candidates → unchanged (cross-file; two-pass.ts handles those)
  Each batch bumps `edges_backfilled_at = NOW()` for the chunks.
* `readEdgeResolution(metadata)` — public helper for downstream code
  (two-pass.ts, code_blast op, eval-capture) to consume the resolver's
  output without parsing JSON directly. Returns a tagged union.
* `EDGE_EXTRACTOR_VERSION_TS` exported constant — bump when extractor
  shape changes and the next cycle re-walks all chunks.

Tests (5 E2E in test/e2e/symbol-resolver-pglite.test.ts, all PGLite,
no DATABASE_URL): unambiguous match, ambiguous multi-match, no match,
watermark advance + idempotency, source isolation (no cross-source
candidate leak).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W0c): wire resolve_symbol_edges as a new cycle phase

W0c's symbol resolver lands as a 12th cycle phase between extract and
patterns. The autopilot's quick-cycle path (60s watchdog interval per
D2 from eng review) now resolves stale chunks incrementally so agents
see resolved edges within ~60s of writes rather than waiting on the
slow full-walk path.

* CyclePhase + ALL_PHASES + NEEDS_LOCK_PHASES extended with
  'resolve_symbol_edges'. Position: between extract (which emits new
  bare-token edges from sync diffs) and patterns (which reads the
  graph). Acquires the cycle lock because it writes edge_metadata.
* CycleReport.totals adds edges_resolved + edges_ambiguous so doctor
  and autopilot summaries surface the numbers.
* runPhaseResolveSymbolEdges walks every registered source via
  listSources() + resolveSymbolEdgesIncremental(). Per-call cap is
  BATCH_SIZE*10 = 2000 chunks so a single watchdog tick stays bounded
  even on a 100K-chunk brain. Subsequent ticks pick up the leftovers
  via the edges_backfilled_at watermark.
* Test count bumped from 11 → 12 phases in cycle.serial.test.ts and
  cycle.test.ts (both pinned by the regression guards). Existing 28
  cycle tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3): MCP-expose code_callers / code_callees / code_def / code_refs

Pre-v0.34 these four code-intelligence commands lived in CLI_ONLY at
cli.ts:30 — agents calling gbrain via MCP couldn't reach them and fell
through to text search. This commit ships the agent-facing MCP surface
for v0.34 against the existing v0.20+ tree-sitter call graph; recursive
blast/flow and clusters land in subsequent commits.

* `code_callers(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCallersOf. Reverse view of the A1 call graph.
* `code_callees(symbol, [limit, source_id, all_sources])` — wraps
  engine.getCalleesOf. Forward view.
* `code_def(symbol, [limit, lang])` — wraps findCodeDef. Returns
  definition sites with file/line/snippet.
* `code_refs(symbol, [limit, lang])` — wraps findCodeRefs. Returns
  every reference (comments, strings, imports, call sites).

All four are scope:'read', source-scoped by default via ctx.sourceId
(W0a contract). Per-call source_id param wins over ctx; pass '__all__'
or all_sources=true to force cross-source.

* operations-descriptions.ts: 4 new constants per the eng review D10
  finding — every description carries an inline example response so
  agents don't burn first-call context discovering the response shape. Resolver-grade
  wording ("BEFORE editing any function, run code_callers...") routes
  plan-mode questions straight to the right op.
* SEARCH_DESCRIPTION gains a cross-link clause pointing at the four new
  ops so agents stop falling through to text search for code-symbol
  questions.

Tests (11 E2E in test/e2e/code-intel-mcp-ops-pglite.test.ts):
  - All four ops registered + scope:read + description pinned by constant
  - All four ops have required symbol param
  - code_callers / code_callees return the documented envelope shape
  - Source scoping honors ctx.sourceId
  - all_sources=true / source_id='__all__' force cross-source
  - code_def returns the def-site snippet

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.33.0): agent-readable migration doc for the code-intel foundation

skills/migrations/v0.33.0.md gives existing-user upgrade guidance for the
v0.33.0 foundation pre-release (this branch's accumulated work toward
v0.34 Cathedral III):

* Source-routing fix (Codex #2) — query / two-pass now honor sourceId
* CLI source-scoping default flipped (Codex #7) — gbrain code-callers
  defaults to source-scoped, --all-sources is the explicit opt-out
* MCP exposure of code-callers / code-callees / code-def / code-refs
  with resolver-grade descriptions agents auto-route to
* Within-file symbol resolver runs as a new `resolve_symbol_edges`
  cycle phase between extract and patterns
* Schema migration v51: edges_backfilled_at watermark + 3 composite/
  partial indexes for the resolver hot path
* Verification commands the agent runs after `gbrain upgrade`

Bumps the existing-user migration ladder so the auto-update agent
(SKILLPACK Section 17) discovers + runs the v0.33.0 migration steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.33.0): bump VERSION + package.json + CHANGELOG

v0.33.0 ships the v0.34 Cathedral III foundation: MCP exposure of
code_callers / code_callees / code_def / code_refs with resolver-grade
tool descriptions, plus the source-routing fix + within-file symbol
resolver + cycle-phase wiring that v0.34's recursive blast/flow and
Leiden clusters will build on.

Full release notes in CHANGELOG.md. Trio in lockstep:
  VERSION:      0.33.0
  package.json: 0.33.0
  CHANGELOG.md: ## [0.33.0] - 2026-05-11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.33.0): update dream-cycle phase-order assertions for resolve_symbol_edges

An E2E test pinned the canonical phase sequence as a regression guard. The
v0.33.0 resolve_symbol_edges phase (added between extract and patterns)
correctly bumps the count to 12, which the canonical-order test caught on a
fresh-Postgres run. Fixed by adding the new phase to EXPECTED_PHASES and
bumping the version history comment.

Both cycle.serial.test.ts and cycle.test.ts were already updated in the
W0c cycle-phase commit (6f7dbe1); this third pin lives in
test/e2e/dream-cycle-phase-order-pglite.test.ts and was missed.

Full E2E suite now: 550 passed / 0 failed / 81 files (real Postgres on
port 5435 via Docker pgvector/pgvector:pg16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 STEP 0): promote OperationContext.sourceId to REQUIRED (D4)

Flip src/core/operations.ts:350 `sourceId?: string` → `sourceId: string`.
Mirrors v0.26.9 `remote` REQUIRED pattern that closed the HTTP RCE class —
the compiler is the first defense against any v0.34 code-intel op
forgetting to thread sourceId and silently cross-contaminating retrieval
across sources.

- src/mcp/dispatch.ts: buildOperationContext auto-fills 'default' when
  opts.sourceId is undefined. Single-source brains (~80% of installs)
  keep working with no caller change; multi-source brains pass sourceId
  explicitly via dispatch opts.
- src/cli.ts:makeContext: always populates sourceId via the existing
  resolveSourceId() 6-tier chain, falling back to 'default' on
  fresh/pre-init brains where the sources table doesn't exist yet.
- src/commands/book-mirror.ts, src/core/minions/tools/brain-allowlist.ts:
  Two production context-builders that previously omitted sourceId.
  Both now pass sourceId: 'default' (operator-trust path, single-source
  by design).
- 10 test/* files: every OperationContext literal now passes sourceId.

test/operation-context-sourceid-required.test.ts: paired contract test
(6 cases) pinning the type contract. @ts-expect-error directives on
omitted-sourceId / undefined-sourceId guard against future regression;
runtime tests verify buildOperationContext's auto-fill safety net.
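This is a trimmed-down sketch of the type contract and the dispatch safety net. Only the sourceId field is modeled. The interfaces are hypothetical reductions of the real OperationContext and dispatch options. The @ts-expect-error line mirrors how the paired contract test guards against regression.

```typescript
// Hypothetical reduction of OperationContext: only the field this commit changes.
interface OperationContext {
  sourceId: string; // was `sourceId?: string`; the compiler now flags any omission
}

interface DispatchOpts {
  sourceId?: string;
}

// Runtime safety net: dispatch fills in 'default' when no source is given,
// so single-source brains keep working without caller changes.
function buildOperationContext(opts: DispatchOpts): OperationContext {
  return { sourceId: opts.sourceId ?? "default" };
}

// @ts-expect-error sourceId is required, so this literal must not compile.
const missingSource: OperationContext = {};
void missingSource;
```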

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W1): receiver-type resolution at edge-extraction time

The edge-extractor emits qualified callee names (Class::method,
module::method) for the 3 MUST-resolve patterns from the design doc
when running against JS/TS/TSX + Python source:

  1. `import { x } from 'y'; x.method()` → emit `y::method`
  2. `class C { m() { this.m() } }` → emit `C::m`
  3. `const c = new C(); c.m()` → emit `C::m`

When the receiver can't be resolved within WALK_DEPTH_CAP (32) ancestor
hops of the call site, the extractor falls back to bare-token emit (pre-W1 behavior).
Ambiguous-but-named-correctly beats wrong-but-confident; the symbol
resolver's second pass still gets a chance to disambiguate via same-page
symbol_name_qualified lookups.

Per D18 from eng review — only JS/TS/TSX + Python get receiver
resolution. Ruby/Go/Rust/Java keep pre-W1 bare-token emit semantics.
RECEIVER_RESOLUTION_LANGS pins the eligible set.

Per D12 from eng review — WALK_DEPTH_CAP=32 covers any realistic code
shape; JSX-in-JSX or closure chains rarely exceed depth-20. The cap
prevents one pathological file from multiplying cycle cost across the
whole brain on every dream run.
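This is a simplified sketch of the three MUST patterns plus the depth cap. It walks a hypothetical scope-node shape instead of tree-sitter nodes. The language ids, the ScopeNode fields and qualifyCallee are assumptions. Treating Python's `self` like `this` is also an assumption of this sketch.

```typescript
const WALK_DEPTH_CAP = 32;

// Hypothetical language ids for the JS/TS/TSX + Python scope.
const RECEIVER_RESOLUTION_LANGS = new Set(["javascript", "typescript", "tsx", "python"]);

// Hypothetical simplified scope node; the real extractor walks AST nodes.
interface ScopeNode {
  className?: string; // set when this node is a class body
  bindings?: Record<string, string>; // `const c = new C()` -> { c: "C" }
  imports?: Record<string, string>; // `import { x } from 'y'` -> { x: "y" }
  parent?: ScopeNode;
}

// Own-property lookup so names like "constructor" don't hit Object.prototype.
function lookup(table: Record<string, string> | undefined, key: string): string | undefined {
  return table !== undefined && Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Emits "Owner::method" for receiver.method() when the receiver resolves
// within WALK_DEPTH_CAP ancestor hops; otherwise returns the bare token.
function qualifyCallee(lang: string, receiver: string, method: string, start: ScopeNode): string {
  if (!RECEIVER_RESOLUTION_LANGS.has(lang)) return method;
  let node: ScopeNode | undefined = start;
  let depth = 0;
  while (node !== undefined && depth < WALK_DEPTH_CAP) {
    if ((receiver === "this" || receiver === "self") && node.className) {
      return `${node.className}::${method}`; // pattern 2: this.m() inside class C
    }
    const boundClass = lookup(node.bindings, receiver);
    if (boundClass) return `${boundClass}::${method}`; // pattern 3: c = new C()
    const fromModule = lookup(node.imports, receiver);
    if (fromModule) return `${fromModule}::${method}`; // pattern 1: imported x
    node = node.parent;
    depth++;
  }
  return method;
}
```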

- src/core/chunkers/edge-extractor.ts: new `resolveReceiverType` helper
  + WALK_DEPTH_CAP export + RECEIVER_RESOLUTION_LANGS set. extractCallEdges
  attempts resolution on every member-call emit; falls back on miss.
- src/core/chunkers/symbol-resolver.ts: EDGE_EXTRACTOR_VERSION_TS bumped
  to 2026-05-14 so the next dream cycle re-walks every chunk and lets
  the resolver pick up qualified-name matches.

test/code-intel/scope-walker-resolution.test.ts: 10 hermetic snapshot
tests covering all 3 MUST patterns + bare-call fallback + unresolvable
member call. Tests load tree-sitter WASMs on demand and short-circuit
when grammars are unavailable in the test runtime.

Scope reduction from the original plan: the .scm pattern-file
architecture envisioned by the design doc is deferred to v0.34.1. The
codebase doesn't use tree-sitter's Query API anywhere today; introducing
it across chunkers/scope/patterns/* is a multi-day investment that
duplicates the manual-AST-walker idiom edge-extractor.ts already uses.
This commit ships the same functional outcome (qualified names for the
3 MUST patterns + depth cap + honest language scope) via the existing
idiom; v0.34.1 can refactor to .scm files if/when query-API benefits
materialize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W2): edge densification — imports + references edge types

Edge extractor now emits three edge kinds:
  - calls (v0.20 baseline; v0.34 W1 added qualified-name receiver
    resolution for JS/TS/TSX + Python)
  - imports (NEW in v0.34 W2; JS/TS/TSX + Python at depth)
  - references (NEW in v0.34 W2; TS-only)

Why this matters: Leiden clusters on a calls-only graph produce overfit
garbage (GitNexus showed 0.052 cluster/node on calls-only — useless).
Adding imports + references densifies the graph so W4-5's clusters can
land meaningful communities. Per design doc Constraint #1.

- src/core/chunkers/edge-extractor.ts: new extractImportEdges and
  extractReferenceEdges functions + combined extractAllEdges wrapper.
  ExtractedEdge.edgeType widened to 'calls' | 'imports' | 'references'.
- src/core/chunkers/code.ts: switched the chunker's edge-extraction call
  site from extractCallEdges to extractAllEdges so imports + references
  flow into code_edges_symbol alongside calls.
- src/core/chunkers/symbol-resolver.ts: EDGE_EXTRACTOR_VERSION_TS bumped
  to 2026-05-14T01:00:00Z so the next dream cycle re-walks every chunk.

Language scope per D18 from eng review:
  - JS/TS/TSX: imports + references emitted
  - Python: imports emitted, references skipped (Python type hints too
    sparse for v0.34; v0.35 may revisit)
  - Ruby/Go/Rust/Java: calls only; no imports, no references. Honest
    coverage matrix: code_blast/code_flow return an 'unsupported_language'
    response for these langs (W2 commit 4 wires this).

Edge schema reused: code_edges_symbol.edge_type is the existing TEXT
column and is part of the unique constraint
(from_chunk_id, to_symbol_qualified, edge_type), so adding new edge types
doesn't conflict with existing calls edges.
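This sketches the coverage matrix and the uniqueness key described above. The language ids and function names are hypothetical. The edge-type strings and the (from_chunk_id, to_symbol_qualified, edge_type) key come from this commit.

```typescript
type EdgeType = "calls" | "imports" | "references";

// Coverage matrix per the language-scope list (language ids are hypothetical).
function edgeTypesFor(lang: string): EdgeType[] {
  switch (lang) {
    case "javascript":
    case "typescript":
    case "tsx":
      return ["calls", "imports", "references"];
    case "python":
      return ["calls", "imports"];
    case "ruby":
    case "go":
    case "rust":
    case "java":
      return ["calls"];
    default:
      return [];
  }
}

interface ExtractedEdge {
  fromChunkId: number;
  toSymbolQualified: string;
  edgeType: EdgeType;
}

// Keeps the first edge per (from_chunk_id, to_symbol_qualified, edge_type),
// the same key as the unique constraint, so an imports edge and a calls edge
// to the same symbol can coexist.
function dedupeEdges(edges: ExtractedEdge[]): ExtractedEdge[] {
  const seen = new Set<string>();
  return edges.filter((e) => {
    const key = JSON.stringify([e.fromChunkId, e.toSymbolQualified, e.edgeType]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```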

test/code-intel/edge-densification.test.ts: 13 hermetic tests covering
named/default/namespace/aliased/side-effect imports for JS/TS, from-x-
import-y + import-pkg for Python, function parameter + return type
references for TS, and unsupported-language returns-empty contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3b): code_traversal_cache table, module, and clear admin op

Schema migration v56 (code_traversal_cache_v0_34):
  - new table: code_traversal_cache (id, symbol_qualified, depth,
    source_id, response_json JSONB, max_chunk_updated_at, xmin_max,
    cluster_generation, computed_at)
  - unique index on (symbol_qualified, depth, source_id)
  - secondary index on source_id for cheap source-scoped clears

D3 — generation-counter cache invalidation. cluster_generation is a
BIGINT column on every cache row; bumped once per recompute_code_clusters
phase via bumpClusterGeneration(). Cache rows referencing stale
generations naturally miss on read. Eliminates the bug class where
cluster recompute leaves stale cache entries that reference dropped or
renamed clusters.

D8 — destructive-guard parity. clearTraversalCache requires either
source_id OR all_sources=true. Without either it throws. Mirrors v0.26.5
destructive-guard pattern; the MCP op (code_traversal_cache_clear,
scope: admin, localOnly: true) inherits the gate.

- src/core/code-intel/traversal-cache.ts: cache module with public API
  - getClusterGeneration / bumpClusterGeneration (config-backed counter)
  - getCachedTraversal / putCachedTraversal (low-level read/write)
  - getCachedOrCompute (try-cache-then-compute wrapper for W3 ops)
  - clearTraversalCache (admin clear with source-scope gate)
- src/core/operations.ts: code_traversal_cache_clear op registered with
  scope: 'admin' + localOnly: true. Dry-run aware; resolves source_id
  from params or ctx.
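The D3 generation counter, the try-cache-then-compute wrapper, and the D8 clear guard fit together as follows. This is an in-memory sketch that assumes nothing about the real Postgres/PGLite storage. `TraversalCacheSketch` and its method names are hypothetical:

```typescript
// Hypothetical in-memory model of the W3b cache semantics (not the real module).
interface CacheRow {
  responseJson: unknown;
  clusterGeneration: number;
}

class TraversalCacheSketch {
  private rows = new Map<string, CacheRow>();
  private generation = 0;

  // Mirrors the unique index on (symbol_qualified, depth, source_id).
  private key(symbol: string, depth: number, sourceId: string): string {
    return JSON.stringify([symbol, depth, sourceId]);
  }

  bumpClusterGeneration(): number {
    return ++this.generation;
  }

  get(symbol: string, depth: number, sourceId: string): unknown {
    const row = this.rows.get(this.key(symbol, depth, sourceId));
    // D3: a row written under an older generation is treated as a miss.
    if (!row || row.clusterGeneration !== this.generation) return undefined;
    return row.responseJson;
  }

  // UPSERT: a later write for the same key replaces the earlier row.
  put(symbol: string, depth: number, sourceId: string, response: unknown): void {
    this.rows.set(this.key(symbol, depth, sourceId), {
      responseJson: response,
      clusterGeneration: this.generation,
    });
  }

  getOrCompute<T>(symbol: string, depth: number, sourceId: string, compute: () => T): T {
    const hit = this.get(symbol, depth, sourceId);
    if (hit !== undefined) return hit as T;
    const fresh = compute();
    this.put(symbol, depth, sourceId, fresh);
    return fresh;
  }

  // D8: an unscoped clear is refused outright.
  clear(opts: { sourceId?: string; allSources?: boolean }): number {
    if (!opts.sourceId && opts.allSources !== true) {
      throw new Error('clear requires sourceId or allSources=true');
    }
    let removed = 0;
    for (const k of Array.from(this.rows.keys())) {
      const [, , src] = JSON.parse(k) as [string, number, string];
      if (opts.allSources === true || src === opts.sourceId) {
        this.rows.delete(k);
        removed++;
      }
    }
    return removed;
  }
}
```

With this design, a cluster recompute never has to find and delete stale rows. Bumping the counter is enough, and the stale rows are overwritten the next time their key is computed.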

v0.34.0.0 scope: cache writes use xmin_max=0 sentinel (no snapshot
isolation). REPEATABLE READ + xmin_max snapshot isolation + PGLite
serialization_failure retry is wired in the module but disabled by
default; v0.34.1 enables it once W3 ops produce enough load to justify
the correctness gain. Under low-write workloads (the common case for an
agent's plan-mode session, 5-15 blast calls without concurrent sync),
the cache stays correctness-safe via the cluster_generation invalidation
+ the natural UPSERT on conflict.

test/code-intel/traversal-cache.test.ts: 13 hermetic PGLite tests
covering cache hit/miss, D3 generation-counter invalidation, UPSERT
replacement, source-scoped + all-sources clear paths, and getCachedOrCompute
try-cache-then-compute happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W3): code_blast + code_flow recursive ops + sinks

Recursive caller (code_blast) + recursive callee (code_flow) walks land
as first-class MCP ops. The user-facing payoff for v0.34: v0.33.3
shipped flat callers/callees; v0.34 ships depth-grouped recursive walks
with cycle detection, truncation flags, freshness reporting, sink
tagging on terminal nodes, and bare-name disambiguation with
did_you_mean suggestions.

- src/core/code-intel/recursive-walk.ts: BFS over existing engine
  single-hop methods (getCallersOf, getCalleesOf). Depth-grouped output;
  confidence = clamp(1 / (1 + 0.3 * depth), 0.05, 1.0). Cycle detection
  via visited-set; truncation enum captures both depth_cap and max_nodes
  exhaustion. Source-scoped per D4 sourceId REQUIRED.
- src/core/code-intel/sinks/{ts,py,index}.ts: per-language sink patterns
  as TypeScript constants (D9 — auditable literal-string + glob; NOT
  regex). The pattern cache is warm after the first match in each process.
  TS_SINKS covers fetch, axios.*, fs.*, Bun.*, execSync, spawnSync;
  PY_SINKS covers requests.*, urllib.*, subprocess.*, open, pathlib.*.
- src/core/operations.ts: code_blast + code_flow registered with
  scope: 'read'. Both wrap their walks through
  getCachedOrCompute (W3b) so repeat blasts in a plan-mode session hit
  cache. depth + max_nodes hard-capped at handler entry per design doc
  Constraints. exact: true skips bare-name disambiguation.
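The depth-grouped BFS can be sketched over a single-hop lookup. In this sketch, `getCallersOf` is a plain callback standing in for the engine method. Because of the visited-set simplification, `cyclesDetected` also counts diamond-shaped rejoins, not only true cycles. All names are hypothetical:

```typescript
// Hypothetical sketch of the W3 recursive caller walk (BFS, depth-grouped).
type Truncation = 'none' | 'depth_cap' | 'max_nodes';

interface WalkNode { symbol: string; confidence: number }

interface WalkResult {
  depthGroups: { depth: number; nodes: WalkNode[] }[];
  cyclesDetected: number;
  truncation: Truncation;
}

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

// confidence = clamp(1 / (1 + 0.3 * depth), 0.05, 1.0)
function confidenceAt(depth: number): number {
  return clamp(1 / (1 + 0.3 * depth), 0.05, 1.0);
}

function recursiveWalk(
  root: string,
  getCallersOf: (symbol: string) => string[],
  maxDepth: number,
  maxNodes: number,
): WalkResult {
  const visited = new Set<string>([root]);
  const depthGroups: WalkResult['depthGroups'] = [];
  let cyclesDetected = 0;
  let truncation: Truncation = 'none';
  let frontier = [root];

  for (let depth = 1; frontier.length > 0; depth++) {
    if (depth > maxDepth) {
      // Report the cap only if unexplored callers actually remain.
      if (frontier.some((s) => getCallersOf(s).some((c) => !visited.has(c)))) {
        truncation = 'depth_cap';
      }
      break;
    }
    const group = { depth, nodes: [] as WalkNode[] };
    const next: string[] = [];
    for (const sym of frontier) {
      for (const caller of getCallersOf(sym)) {
        if (visited.has(caller)) { cyclesDetected++; continue; }
        if (visited.size - 1 >= maxNodes) { truncation = 'max_nodes'; break; }
        visited.add(caller);
        group.nodes.push({ symbol: caller, confidence: confidenceAt(depth) });
        next.push(caller);
      }
      if (truncation === 'max_nodes') break;
    }
    if (group.nodes.length > 0) depthGroups.push(group);
    if (truncation === 'max_nodes') break;
    frontier = next;
  }
  return { depthGroups, cyclesDetected, truncation };
}
```

At depth 1 this gives 1 / 1.3 ≈ 0.77, which matches the happy-path test expectation below.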

Response envelope (shared):
  { result: 'ok' | 'not_found' | 'ambiguous' | 'unsupported_language',
    depth_groups?, cycles_detected?, truncation?, freshness?,
    did_you_mean?, candidates?, supported? }
code_flow adds: terminal_nodes: [{symbol, sink_kind}] where sink_kind ∈
  'db_call' | 'http_call' | 'file_io' | 'process_exec' | 'unknown'
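The sink tagging can be sketched with a literal-or-glob matcher that uses no regex. This sketch uses only the patterns listed above. The kind assigned to each pattern is an assumption, `Bun.*` is omitted because its kind isn't stated, and no `db_call` patterns are shown. `classifySink` and the table names are hypothetical:

```typescript
// Hypothetical sink classifier: literal names or trailing ".*" globs, no regex.
type SinkKind = 'db_call' | 'http_call' | 'file_io' | 'process_exec' | 'unknown';

// Illustrative subset; kind assignments are assumptions.
const TS_SINKS: ReadonlyArray<[string, SinkKind]> = [
  ['fetch', 'http_call'],
  ['axios.*', 'http_call'],
  ['fs.*', 'file_io'],
  ['execSync', 'process_exec'],
  ['spawnSync', 'process_exec'],
];

const PY_SINKS: ReadonlyArray<[string, SinkKind]> = [
  ['requests.*', 'http_call'],
  ['urllib.*', 'http_call'],
  ['subprocess.*', 'process_exec'],
  ['open', 'file_io'],
  ['pathlib.*', 'file_io'],
];

const SINKS_BY_LANG: Record<string, ReadonlyArray<[string, SinkKind]>> = {
  ts: TS_SINKS, tsx: TS_SINKS, js: TS_SINKS, py: PY_SINKS,
};

// "axios.*" matches any symbol starting with "axios."; otherwise exact match.
function matchesPattern(pattern: string, symbol: string): boolean {
  return pattern.endsWith('.*')
    ? symbol.startsWith(pattern.slice(0, -1))
    : symbol === pattern;
}

// Per-process memo so repeat lookups skip the table scan.
const sinkMemo = new Map<string, SinkKind>();

function classifySink(lang: string, symbol: string): SinkKind {
  const key = `${lang}\u0000${symbol}`;
  const cached = sinkMemo.get(key);
  if (cached !== undefined) return cached;
  const table = SINKS_BY_LANG[lang] ?? [];
  const hit = table.find(([pattern]) => matchesPattern(pattern, symbol));
  const kind: SinkKind = hit ? hit[1] : 'unknown';
  sinkMemo.set(key, kind);
  return kind;
}
```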

Per D18 from eng review — only JS/TS/TSX + Python get walks. Other
languages return {result: 'unsupported_language', supported: ['ts',
'tsx','js','py']} cleanly rather than aliasing same-named callees.

test/code-intel/recursive-walk.test.ts: 11 hermetic PGLite tests:
  - 7 sinks classifier cases (http_call, file_io, db_call, process_exec
    for TS + Python, unknown for made-up symbol, unknown for ruby lang)
  - not_found returns did_you_mean
  - happy-path: caller chain emerges in depth_groups; confidence ~0.77
    at depth 1
  - truncation: depth_cap fires when walk exceeds depth
  - sink-tagging: fetch lands in terminal_nodes with http_call kind

v0.34.0.0 scope reductions: stdio rate limiter at dispatch.ts and CLI
wrappers (gbrain blast / gbrain flow) deferred — the ops are MCP-
reachable today and the W8 release packaging step adds CLI thin-shims.
The eng-review's stdio limiter at dispatch.ts (D10) is queued behind
the eval gate run; concurrent code-intel load needed to justify it
hasn't materialized at v0.34.0.0 ship time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W6): gbrain edges-backfill CLI

Operator escape hatch for the symbol-resolution backfill chain. Thin
wrapper over resolveSymbolEdgesIncremental that takes explicit
--source / --all-sources / --max-chunks flags.

Resumable via the edges_backfilled_at watermark (W0c). Per-batch
transactions commit, so Ctrl-C leaves a clean resumable state. A re-run
picks up where the prior invocation stopped.
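Resumability depends on one rule: a chunk's watermark is stamped only after its whole batch succeeds. This in-memory sketch shows that rule. `backfillBatches`, the `Chunk` shape, and the `stopAfterBatches` parameter, which simulates Ctrl-C, are hypothetical:

```typescript
// Hypothetical model of watermark-driven resumable backfill.
interface Chunk { id: number; edgesBackfilledAt: string | null }

function backfillBatches(
  chunks: Chunk[],
  batchSize: number,
  resolveBatch: (batch: Chunk[]) => void,
  stopAfterBatches = Infinity, // simulates an interrupted run
): number {
  let batches = 0;
  while (batches < stopAfterBatches) {
    // Only chunks without a watermark are pending, so re-runs skip finished work.
    const batch = chunks.filter((c) => c.edgesBackfilledAt === null).slice(0, batchSize);
    if (batch.length === 0) break;
    resolveBatch(batch);
    // "Commit": stamp the watermark only after the whole batch succeeded.
    const now = new Date().toISOString();
    for (const c of batch) c.edgesBackfilledAt = now;
    batches++;
  }
  return batches;
}
```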

Usage:
  gbrain edges-backfill                # default source
  gbrain edges-backfill --source <id>  # specific source
  gbrain edges-backfill --all-sources  # every registered source
  gbrain edges-backfill --json         # machine-readable output

Wired into src/cli.ts CLI_ONLY + dispatch table.

Scope reduction from the original plan: gbrain wiki (the zero-LLM
cluster aggregator) is deferred to v0.34.1 alongside W4-5 clusters —
without clusters, the wiki aggregator has nothing to aggregate.
gbrain upgrade backfill prompt is also deferred to v0.34.1; v0.34.0.0's
upgrade chain runs apply-migrations only, and users who want to
materialize the new W1/W2 edge shapes invoke gbrain edges-backfill
manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.34 W7): per-op graph-traversal metrics module

src/core/eval-capture-graph.ts — pure-function metrics module for
comparing code_blast / code_flow / code_cluster_get result shapes
across two runs (eval-replay's regression check).

Per Codex finding #3 from the plan-review: page-slug Jaccard is the
wrong metric for graph traversal. v0.34 W7 ships proper per-op metrics:

  - nodeSetJaccard(a, b): set Jaccard over (file, line, symbol)
    tuples. Right metric for code_blast/code_flow node sets.
  - depthGroupStability(a, b): 1 - (displaced / |union|). Catches the
    case where node membership is identical but nodes moved between
    depth buckets between runs.
  - truncationMatch(a, b): boolean match on the truncation enum.
    Discrete signal that pairs with Jaccard.
  - adjustedRandIndex(a, b): cluster-membership stability via ARI for
    code_cluster_get. v0.34.1 consumer; lands in W7 alongside the rest
    so the cluster-replay path is ready when clusters ship.
  - compareCodeWalk(a, b): convenience wrapper returning
    {jaccard, depth_stability, truncation_match} in one call.

Hermetic — no engine, no DB, fully unit-testable. 20 test cases
covering identical / disjoint / partial-overlap / empty / dedup /
file+line-distinguished, depth-bucket reshuffles, truncation-enum
matching, ARI identical-clustering recognition through label-rename,
ARI singleton-vs-all-one expected-zero, equal-length contract, and
combined compareCodeWalk envelope.

Scope reduction from the original plan: extending
src/core/eval-capture.ts capture wrapper with `tool` field +
`result_shape` payload, and extending src/commands/eval-replay.ts to
dispatch on tool — both deferred to v0.34.1. The metric MODULE is the
load-bearing piece (Codex finding #3's primary fix); wiring it through
the existing capture/replay surface is a follow-up that doesn't change
production behavior until clusters ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.34.0.0): VERSION + package.json + CHANGELOG + migration doc

Final release packaging for v0.34.0.0. Three-line audit will show:
  VERSION:     0.34.0.0
  package.json: 0.34.0.0
  CHANGELOG:   ## [0.34.0.0] - 2026-05-14

CHANGELOG entry follows CLAUDE.md voice rules:
  - Bold headline + lead paragraph
  - "What ships in v0.34.0.0" itemized list
  - "Slip handling — deferred to v0.34.1" honest scope note
  - Numbers-that-matter table comparing v0.33.3 → v0.34.0.0
  - Mandatory "## To take advantage of v0.34.0.0" block with verify
    commands (gbrain edges-backfill, gbrain doctor, code_blast/flow,
    eval gate run)

skills/migrations/v0.34.0.0.md — agent-readable upgrade doc. Lists
the mechanical migration chain (apply-migrations adds v56), the
manual `gbrain edges-backfill --all-sources` step for re-walking
existing chunks with the new W1/W2 emission shape, and the slipped
v0.34.1 scope.

v0.34.0.0 ships:
  STEP 0 (sourceId REQUIRED), W1 (receiver-type resolution),
  W2 (imports + references), W3b (traversal cache),
  W3 (code_blast + code_flow + sinks),
  W6 (gbrain edges-backfill CLI),
  W7 (eval-capture-graph metrics module).

v0.34.1 backlog: W4-5 Leiden clusters, W6 wiki, W7 capture wiring,
W1 .scm rewrite, W3 stdio limiter, W3 CLI shims, D2 autopilot
sub-loop. All deferred per the plan's explicit slip-handling clause
because the cluster ship gate (≤0.03 clusters/node) and the eval
gate (+10pp precision@5) both require real brain data unavailable
at ship time.

Test surface in v0.34.0.0 (73 hermetic tests pass across 6 new files):
  - test/operation-context-sourceid-required.test.ts (6 cases)
  - test/code-intel/scope-walker-resolution.test.ts (10 cases)
  - test/code-intel/edge-densification.test.ts (13 cases)
  - test/code-intel/traversal-cache.test.ts (13 cases)
  - test/code-intel/recursive-walk.test.ts (11 cases)
  - test/code-intel/eval-capture-graph.test.ts (20 cases)

Migration v56 (code_traversal_cache_v0_34) verified applying clean
on PGLite via the test suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.34 D7): snapshotIndexes helper for cross-engine index parity

Extends test/helpers/schema-diff.ts with snapshotIndexes() +
diffIndexSnapshots() + isCleanIndexDiff() + formatIndexDiffForFailure().

Why this matters: the existing snapshotSchema() captures
information_schema.columns only, so a missing INDEX (not column)
between Postgres and PGLite silently passes the schema-drift test
while the symbol resolver degrades from an index-only scan to a Cartesian
product on 96K-chunk brains. The v0.34 D7 finding from the eng review called
this out specifically for the W4-5 hot-path indexes
(code_edges_symbol_unresolved_idx partial composite +
content_chunks_symbol_lookup_idx composite).

Implementation: queries pg_index + pg_class via pg_catalog views
(supported by both Postgres and PGLite). Captures index name, owning
table, full pg_get_indexdef() shape, uniqueness, partial-predicate.
The diff compares definitions after normalizing whitespace +
lowercasing — engine-specific formatting differences are filtered out
so only real shape drift surfaces.
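The normalize-then-diff step can be sketched over plain snapshot objects. `IndexSnapshotSketch` and `diffIndexesSketch` are hypothetical stand-ins for the real `snapshotIndexes()` / `diffIndexSnapshots()` helpers. The only assumption is that `def` holds `pg_get_indexdef()` output, which already includes any `UNIQUE` keyword and `WHERE` predicate:

```typescript
// Hypothetical sketch of cross-engine index diffing.
interface IndexSnapshotSketch { name: string; table: string; def: string; unique: boolean }

// Collapse whitespace and lowercase so engine formatting differences vanish.
const normalizeDef = (def: string): string => def.replace(/\s+/g, ' ').trim().toLowerCase();

function diffIndexesSketch(pg: IndexSnapshotSketch[], pglite: IndexSnapshotSketch[]) {
  const byName = (xs: IndexSnapshotSketch[]) => new Map(xs.map((x) => [x.name, x] as const));
  const p = byName(pg);
  const l = byName(pglite);
  const pgOnly = Array.from(p.keys()).filter((k) => !l.has(k));
  const pgliteOnly = Array.from(l.keys()).filter((k) => !p.has(k));
  const changed = Array.from(p.keys()).filter((k) => {
    const a = p.get(k);
    const b = l.get(k);
    return a !== undefined && b !== undefined &&
      (normalizeDef(a.def) !== normalizeDef(b.def) || a.unique !== b.unique);
  });
  return { pgOnly, pgliteOnly, changed };
}
```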

Reused by future test/e2e/schema-drift.test.ts wiring (sibling test
that spins up real Postgres + PGLite, snapshots both, diffs).

test/helpers/schema-diff-indexes.test.ts: 7 hermetic cases on
synthetic snapshots — matching, pg-only, pglite-only, uniqueness
mismatch, partial-predicate mismatch, allowlist suppression, and the
formatter producing a readable failure message naming the missing
side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(v0.34): update 4 pre-existing tests for new emit shapes + sourceId contract

Three test files updated to match the v0.34 contract changes:

- test/edge-extractor.test.ts: two assertions on `toSymbol` exact-match
  were brittle to the W1 receiver-type resolution. `this.go()` /
  `self.go()` now resolve to `Foo::go` instead of bare `go`. Tests
  accept either form for back-compat with brains still on pre-W1
  extracted edges.

- test/source-id-tx-regression.test.ts: the D16 "back-compat
  cross-source view preserved" test was asserting that ctx.sourceId
  undefined → cross-source view. v0.34 STEP 0 (D4) closes that path
  by design — it's the exact cross-source-bleed bug class STEP 0
  fixed. Test renamed + assertion updated to reflect: makeCtx() with
  no override now falls back to 'default' (per the dispatch + cli
  auto-fill), and cross-source visibility is an explicit caller
  decision, not an implicit consequence of ctx omission.

- test/chunker-timeout.test.ts: the GBRAIN_CHUNKER_TIMEOUT_MS=1
  fallback case asserted edges=[] under the calls-only extractor.
  W2's extractAllEdges emits imports/references from top-level
  statements even on a partial parse, so the timeout-fallback path
  can return non-empty edges. Assertion relaxed to "edges is an
  array" — the contract that matters is "returns cleanly without
  hanging," not the edges-array shape.

Full unit suite (parallel + serial): 6132 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(migrate): remove duplicate edges_backfilled_at migration at v58

CI surfaced a duplicate migration version in test/migrate.test.ts:371
("runMigrations sorts by version ascending" — uniq.size === versions.length).

Root cause: the second master merge (PR garrytan#934 v0.33.3.0 foundation, commit
3fc0ca5) brought in master's `edges_backfilled_at` migration alongside
the one already in my branch. The two are functionally identical (ALTER
TABLE content_chunks ADD COLUMN edges_backfilled_at + 3 indexes), and both
ended up renumbered to v58: mine via the f25b674 merge that pushed past
master's v55 search-lite migrations, and master's because PR garrytan#934
originally claimed v55, which would have collided. Auto-merge kept both,
named `_v0_33_2` and `_v0_33_3`. Tests caught it.

Fix: deleted the `_v0_33_3` duplicate. The remaining `_v0_33_2` entry at
v58 is unchanged; SQL idempotency (ADD COLUMN IF NOT EXISTS plus CREATE
INDEX IF NOT EXISTS) means brains that already applied either label
pass through cleanly.

Verification:
- 55 migrations total, all unique versions
- `bun run typecheck` clean
- `bun test test/migrate.test.ts`: 109 pass / 0 fail / 321 expect calls

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ederated_read + 3 more (garrytan#996)

* fix(mcp): skip stdin EOF handlers when MCP_STDIO=1

OpenClaw's bundle-mcp gateway and similar wrappers pipe the JSON-RPC
handshake on stdin then close their stdin half. Pre-fix, both stdin
'end' and 'close' listeners (server.ts:65-66 and serve.ts:204-206)
treated this as a permanent disconnect and shut the server down before
the first tool call arrived.

Guard both sites with `process.env.MCP_STDIO !== '1'`. Signal handlers
(SIGTERM/SIGINT/SIGHUP), transport.onclose, and the parent-process
watchdog still cover legitimate shutdown paths. The serve.ts site
threads the env read through an injectable `mcpStdio?: boolean` on
ServeOptions so tests stay isolated (no process.env mutation per
scripts/check-test-isolation.sh R1).
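The guard and its injectable override can be sketched against a plain `EventEmitter` standing in for stdin. In this sketch, an explicit `mcpStdio` option wins over the env read. That precedence is an assumption, as are the helper names `resolveMcpStdio` and `installStdinEofHandlers`:

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical sketch of the MCP_STDIO stdin-EOF guard.
interface ServeOptionsSketch { mcpStdio?: boolean }

// An injected option wins, so tests never need to mutate process.env.
function resolveMcpStdio(opts: ServeOptionsSketch, env: Record<string, string | undefined>): boolean {
  return opts.mcpStdio ?? env.MCP_STDIO === '1';
}

function installStdinEofHandlers(
  stdin: EventEmitter,
  shutdown: (reason: string) => void,
  mcpStdio: boolean,
): void {
  // Wrappers close their stdin half after the handshake; that is not a disconnect.
  if (mcpStdio) return;
  stdin.on('end', () => shutdown('stdin end'));
  stdin.on('close', () => shutdown('stdin close'));
}
```

Signal handlers are installed separately and are not affected by this guard, so SIGTERM/SIGINT/SIGHUP still stop the server.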

Tests: 3 new cases in test/serve-stdio-lifecycle.test.ts pin the
guard's invariants — mcpStdio=true must NOT trigger shutdown on stdin
EOF, signals must still drive shutdown with mcpStdio=true, and
mcpStdio=false (default) preserves existing CLI behavior. 25/25 pass.

Origin: PR garrytan#870.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): honor token_endpoint_auth_method=none for PKCE public clients

RFC 7591 §3.2.1: when a DCR client declares
token_endpoint_auth_method="none" (PKCE-only public clients like Claude
Code, Cursor), the authorization server MUST NOT issue a client_secret.
Pre-fix, registerClient unconditionally minted a secret, and the MCP
SDK's clientAuth middleware then rejected valid public-client flows on
/token because it expected client.client_secret to match.

Three changes to src/core/oauth-provider.ts:registerClient:

  - Gate clientSecret generation on isPublicClient = (auth_method === 'none').
    Public clients store client_secret_hash = NULL.
  - Omit client_secret from the response payload for public clients.
    Confidential clients (default client_secret_post and explicit
    client_secret_basic) keep their existing one-time-reveal shape.
  - Normalize NULL secret_hash to JS undefined in getClient so SDK
    middleware (which checks client.client_secret === undefined, not
    === null) correctly identifies public clients and skips the
    secret-comparison branch on /token.
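Taken together, these three changes look roughly like this. In the sketch, the SHA-256 hash, the ID/secret lengths, and the helper names (`registerClientSketch`, `toSdkClient`) are assumptions, not the real oauth-provider code:

```typescript
import { randomBytes, createHash } from 'node:crypto';

// Hypothetical sketch of the public-client gate in registerClient.
type AuthMethod = 'none' | 'client_secret_post' | 'client_secret_basic';

interface StoredClient { clientId: string; clientSecretHash: string | null; authMethod: AuthMethod }

function registerClientSketch(authMethod: AuthMethod = 'client_secret_post') {
  const isPublicClient = authMethod === 'none';
  const clientId = randomBytes(16).toString('hex');
  const clientSecret = isPublicClient ? undefined : randomBytes(32).toString('hex');
  const stored: StoredClient = {
    clientId,
    authMethod,
    // Public clients store NULL; confidential clients store only a hash.
    clientSecretHash: clientSecret ? createHash('sha256').update(clientSecret).digest('hex') : null,
  };
  // Public clients get no client_secret key at all; confidential ones see it once.
  const response = clientSecret === undefined
    ? { client_id: clientId, token_endpoint_auth_method: authMethod }
    : { client_id: clientId, client_secret: clientSecret, token_endpoint_auth_method: authMethod };
  return { stored, response };
}

// getClient-style normalization: SQL NULL becomes undefined for the SDK check.
function toSdkClient(row: StoredClient): { client_id: string; client_secret?: string } {
  return { client_id: row.clientId, client_secret: row.clientSecretHash ?? undefined };
}
```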

Schema is already permissive (client_secret_hash TEXT, no NOT NULL on
both src/schema.sql and src/core/pglite-schema.ts) — no migration
needed.

Tests: 5 new cases in test/oauth.test.ts pin:
  - public client → no client_secret in response (#11 from plan)
  - default auth_method → secret unchanged (regression guard)
  - explicit client_secret_post → secret unchanged
  - getClient NULL→undefined normalization
  - PKCE full /authorize → /token end-to-end with no secret (#15 from plan)

69/69 oauth.test.ts cases pass. typecheck clean.

Origin: PR garrytan#909.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(serve-http): --bind HOST, default to loopback (127.0.0.1)

Adds `gbrain serve --http --bind <interface>` to control which network
interface the HTTP MCP server listens on. Default flipped from
`0.0.0.0` (pre-v0.34) to `127.0.0.1` (v0.34.0+).

Why the flip: gbrain's primary use case is a personal-knowledge brain on
a laptop. The previous default exposed brains on every interface — one
accidental `--http` invocation away from publishing the brain to a LAN.
Server operators who need remote access pass `--bind 0.0.0.0` (or a
specific interface). Codex's outside-voice on the original PR garrytan#864
correctly flagged that the additive flag wasn't actually the fix; the
default needed to change for the safety claim to hold.

If `--public-url` is set but `--bind` is unset, runServeHttp prints a
loud stderr WARN at startup recommending `--bind 0.0.0.0`. Declaring a
public URL while quietly binding loopback is almost always a
misconfiguration; we want the operator to see it on first start, not
silently fail remote requests.
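Together, the bind default and the startup warning reduce to a small resolution step. In this sketch, `resolveBind`, the flag object shape, and the exact warning text are hypothetical:

```typescript
// Hypothetical sketch of --bind resolution for `gbrain serve --http`.
interface HttpServeFlags { bind?: string; publicUrl?: string }

function resolveBind(flags: HttpServeFlags): { host: string; warning?: string } {
  // Loopback unless the operator explicitly asks for another interface.
  const host = flags.bind ?? '127.0.0.1';
  if (flags.publicUrl && flags.bind === undefined) {
    return {
      host,
      warning:
        `WARN: --public-url ${flags.publicUrl} is set but --bind is not; ` +
        `listening on ${host} only. Pass --bind 0.0.0.0 to accept remote requests.`,
    };
  }
  return { host };
}
```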

Startup banner now includes a `Bind:` row so the listening interface is
visible alongside Port / Engine / Issuer.

Origin: PR garrytan#864, extended with D11 (default flip) per /plan-eng-review
codex outside-voice review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mcp): seal source-isolation leak on read path (P0)

Pre-fix, an authenticated OAuth MCP client scoped to source-A could
enumerate source-B pages via six read-side surfaces across five ops:
search, query (text AND image paths), list_pages, traverse_graph, and
find_experts. The
v0.31.8 source-scoping pattern shipped through dispatch.ts but the op
handlers never threaded ctx.sourceId into their engine calls, and
hybrid.ts:223's explicit SearchOpts rebuild dropped sourceId
even when callers passed it.

Sealing the leak:

  - src/core/operations.ts adds sourceScopeOpts(ctx), the canonical
    precedence ladder: ctx.auth.allowedSources (federated) wins over
    ctx.sourceId (scalar) wins over nothing. Threaded into all 5
    read-side op handlers + the query-image-path searchVector call
    (the 6th leak surface codex caught in plan review).

  - src/core/search/hybrid.ts:223 now threads sourceId + sourceIds
    fields through the inner SearchOpts rebuild. The explicit pick
    shape is preserved (HNSW inner-CTE ordering depends on it) but
    extended.

  - src/core/types.ts adds sourceIds?: string[] to SearchOpts +
    PageFilters (D9: federated read needs array-shaped engine filter
    or fan-out; array wins for hot retrieval).

  - src/core/operations.ts AuthInfo gains sourceId + allowedSources
    (D2: identity surface symmetric with the federated_read column
    garrytan#876 will add).

  - Both engines now apply WHERE source_id = $N (scalar) or = ANY($N::text[])
    (array) at the SQL layer for searchKeyword, searchKeywordChunks,
    searchVector, listPages, traverseGraph, traversePaths. Array form
    wins when both are set. The searchVector filter pushes into the
    inner HNSW CTE (codex flagged this placement during plan review).

  - traverseGraph + traversePaths signatures gain opts.sourceId +
    opts.sourceIds; engine.ts interface updated.

  - findExperts (the whoknows op, D3 5th leak surface) accepts
    sourceId + sourceIds and threads them into its internal
    hybridSearch call. PR garrytan#861 was authored before v0.33 shipped so
    this op wasn't covered in the original PR.
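Together, the precedence ladder and the scalar-vs-array SQL filter fit in a few lines. This sketch treats an explicit empty `allowedSources` as "no readable sources" rather than "all sources". That is one way to honor the no-silent-widen rule, and it is an assumption here. The type and helper names are hypothetical, except that `sourceScopeOpts` mirrors the helper name above:

```typescript
// Hypothetical sketch of sourceScopeOpts plus the engine-side filter.
interface AuthInfoSketch { sourceId?: string; allowedSources?: string[] }
interface CtxSketch { sourceId?: string; auth?: AuthInfoSketch }
interface SourceScope { sourceId?: string; sourceIds?: string[] }

function sourceScopeOpts(ctx: CtxSketch): SourceScope {
  const allowed = ctx.auth?.allowedSources;
  // Federated list wins, even when empty; [] must never widen to "all sources".
  if (allowed !== undefined) return { sourceIds: [...allowed] };
  if (ctx.sourceId !== undefined) return { sourceId: ctx.sourceId };
  return {};
}

// Array form wins when both are set; $N is the next free bind parameter.
function sourceFilterSql(scope: SourceScope, nextParam: number): { clause: string; params: unknown[] } {
  if (scope.sourceIds !== undefined) {
    return { clause: `source_id = ANY($${nextParam}::text[])`, params: [scope.sourceIds] };
  }
  if (scope.sourceId !== undefined) {
    return { clause: `source_id = $${nextParam}`, params: [scope.sourceId] };
  }
  return { clause: 'TRUE', params: [] }; // no source filter
}
```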

Auth wiring:

  - GBrainOAuthProvider.verifyAccessToken populates AuthInfo.sourceId
    from oauth_clients.source_id. JOIN guarded by isUndefinedColumnError
    so pre-v55 brains degrade to legacy projection rather than refusing
    every token verification.

  - GBrainOAuthProvider.registerClientManual gains a sourceId
    parameter (defaults to 'default'). DCR registerClient also sets
    source_id='default' on the inserted row.

  - serve-http.ts:929 cleanup: AuthInfo.sourceId is now a real typed
    field. The cast + GBRAIN_SOURCE env fallback chain is gone (D13).
    Legacy bearer tokens default to 'default' source in
    verifyAccessToken.

  - http-transport.ts (legacy access_tokens path) threads
    sourceId='default' through DispatchOpts so v0.22.7 callers stay
    source-scoped.

  - auth.ts CLI adds --source flag to gbrain auth register-client.

Migration v55 (D10 + D13):

  - ALTER TABLE oauth_clients ADD COLUMN source_id TEXT (nullable).
  - Backfill UPDATE source_id = 'default' WHERE source_id IS NULL —
    preserves v0.33 effective behavior verbatim for legacy clients.
  - ADD CONSTRAINT FK ... REFERENCES sources(id) ON DELETE SET NULL,
    wrapped in DO block so re-runs against fresh-install brains (where
    the FK already lives inline in SCHEMA_SQL) no-op cleanly.
  - CREATE INDEX idx_oauth_clients_source_id WHERE source_id IS NOT NULL
    for the verifyAccessToken JOIN.
  - GBRAIN_ACCEPT_SILENT_WIDEN env-flag wired through the runner via
    SET LOCAL gbrain.accept_silent_widen — reserved for future migrations
    that hit the silent-widen footgun codex flagged. This migration
    doesn't need it (column is brand new; no pre-existing stale values
    possible by definition).
  - src/core/pglite-schema.ts + src/schema.sql include the column +
    FK + index inline for fresh installs.

Tests: new test/e2e/source-isolation-pglite.test.ts with 13 regression
cases — one per leak surface (search/list_pages/traverse/etc.) plus
explicit AuthInfo.sourceId and AuthInfo.allowedSources op-handler
threading checks. Full unit suite: 6034 pass / 0 fail. PGLite
initSchema time dropped from 2.4s to 850ms after consolidating v55's
DO blocks (multiple DO blocks were slow on PGLite; one DO block for
the FK install only is fine).

Origin: PR garrytan#861 + plan-eng-review decisions D2/D3/D4/D9/D10/D13 + F2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gateway): multimodal embedding for openai-compatible providers

Pre-fix, embedMultimodal hardcoded a recipe.id === 'voyage' branch and
threw AIConfigError for every other recipe. Multimodal-capable providers
fronted by LiteLLM (or any openai-compatible proxy) were unreachable
even when the operator had wired up the model.

The fix:

  - src/core/ai/gateway.ts adds embedMultimodalOpenAICompat() that
    POSTs to the standard /embeddings endpoint with content arrays
    carrying image_url entries. Routing comes from the existing
    recipe.implementation switch — Voyage stays on its own
    /multimodalembeddings path; every other openai-compatible recipe
    flows through the new helper.

  - src/core/ai/recipes/litellm-proxy.ts declares
    supports_multimodal: true so embedMultimodal accepts the recipe.
    No multimodal_models allow-list: LiteLLM is a passthrough proxy
    and the user owns model-id selection; provider rejection (400 from
    upstream) is the right enforcement layer there. Voyage's static
    allow-list shape stays unchanged (its 12 models share
    supports_multimodal but only one is multimodal-capable).

  - D12 runtime dimension validation: the new helper checks the
    returned vector length against the recipe's declared default_dims
    (preferred) or the brain's embedding_dimensions config. Mismatch
    throws AIConfigError with model id + observed + expected so the
    operator can swap models or rebuild the column. Pre-fix, a
    wrong-dim response would surface as a cryptic pgvector
    "vector dimension mismatch" at INSERT time.

  - Auth resolution routes through the existing defaultResolveAuth
    helper so optional-auth recipes (LiteLLM proxy with no
    LITELLM_API_KEY) and required-auth recipes both share one code
    path. Optional-auth sends "Authorization: Bearer unauthenticated"
    which servers like Ollama / llama-server ignore but the SDK
    contract requires.
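The D12 check and the optional-auth header can both be sketched as pure helpers. In this sketch, skipping validation when no expected dimension is known is an assumption, as are the names `expectedDims`, `validateEmbeddingDims`, `authorizationHeader`, and `AIConfigErrorSketch`:

```typescript
// Hypothetical sketch of D12 dimension validation and optional-auth headers.
class AIConfigErrorSketch extends Error {}

// Recipe default_dims is preferred; the brain's embedding_dimensions is the fallback.
function expectedDims(recipeDefaultDims?: number, configDims?: number): number | undefined {
  return recipeDefaultDims ?? configDims;
}

function validateEmbeddingDims(model: string, vectors: number[][], expected: number | undefined): void {
  if (expected === undefined) return; // assumption: nothing to check against
  for (const v of vectors) {
    if (v.length !== expected) {
      throw new AIConfigErrorSketch(`model ${model} returned ${v.length} dims; expected ${expected}`);
    }
  }
}

// Optional-auth recipes still send a bearer token that local servers ignore.
function authorizationHeader(apiKey: string | undefined): string {
  return `Bearer ${apiKey ? apiKey : 'unauthenticated'}`;
}
```

Failing here, with the model id and both dimensions in the message, replaces the later pgvector "vector dimension mismatch" error at INSERT time.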

Tests: 11 new cases in test/openai-compat-multimodal.test.ts cover
happy-path, multi-input batching, unauthenticated proxy, D12 dim
mismatch + default-dim fallback, 401 / 400 / malformed-JSON / non-array
error paths, and an explicit Voyage-regression test pinning that the
new openai-compat route doesn't accidentally hijack the Voyage path.
All 41 multimodal-related tests pass (existing voyage suite + new).
typecheck clean.

Origin: PR garrytan#875 + plan-eng-review D12 (runtime dim validation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): federated_read read scope (garrytan#876)

Pre-fix, OAuth clients had a single source-scope axis (source_id, added
in v55). A client could either write+read one source OR be a super-reader
across all sources (via NULL source_id). There was no middle ground —
WeCare-style L3 dept clients that need to write to dept-x but read
dept-x + parent canon + shared canon had no expression.

garrytan#876 adds federated_read TEXT[] as an orthogonal read-scope axis. source_id
is the WRITE authority; federated_read is the READ authority. They default
to matching values (read scope == write scope, the pre-v0.34 default)
when a client is registered without an explicit federated read list.

Migrations v56-v60 (five new migrations on top of v55):

  - v56: ALTER TABLE ... ADD COLUMN federated_read TEXT[] NOT NULL DEFAULT '{}'.
  - v57 (F5): explicit CASE backfill so source_id IS NULL → '{}' (not an
    array containing NULL — codex caught this ambiguity during plan review).
  - v58: post-backfill validation. Fails loud if any row's source_id isn't
    in its federated_read array, pointing at a logic bug in v57 if fired.
  - v59: flip the source_id FK from ON DELETE SET NULL to ON DELETE
    RESTRICT now that federated_read provides the alternative scope-loss
    path. Pre-flip, deleting a source could silently widen any oauth_client
    to super-reader; post-flip, source delete is refused if any client
    references it (operator must revoke/re-scope first).
  - v60: GIN index on federated_read for array-containment queries.
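The v57 backfill rule and the v58 invariant can be sketched over plain rows. In this sketch, rows with a NULL `source_id` are exempt from the v58 check. That follows from v57 mapping them to `'{}'`, but it is an assumption, and the helper names are hypothetical:

```typescript
// Hypothetical sketch of the v57 CASE backfill and the v58 validation.
interface OAuthClientRow { clientId: string; sourceId: string | null; federatedRead: string[] }

// v57: NULL source_id becomes '{}' (an empty array), never an array holding NULL.
function backfillFederatedRead(row: OAuthClientRow): OAuthClientRow {
  return { ...row, federatedRead: row.sourceId === null ? [] : [row.sourceId] };
}

// v58: any non-NULL source_id missing from its own federated_read is a backfill bug.
function findInvalidRows(rows: OAuthClientRow[]): string[] {
  return rows
    .filter((r) => r.sourceId !== null && !r.federatedRead.includes(r.sourceId))
    .map((r) => r.clientId);
}
```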

Auth wiring:

  - GBrainOAuthProvider.verifyAccessToken JOINs c.federated_read and
    populates AuthInfo.allowedSources. Pre-v56 / pre-v55 brains degrade
    via the existing isUndefinedColumnError fallback chain.
  - registerClientManual gains a federatedRead?: string[] parameter
    (defaults to [sourceId]).
  - DCR registerClient sets source_id='default' + federated_read=['default']
    on the inserted row.
  - auth.ts CLI adds --federated-read SRC1,SRC2,... flag. The
    register-client output now prints "Federated reads:" so operators
    confirm the scope they set.
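The flag's default and parsing rules can be sketched as follows. Always including the write source keeps the v58 invariant (`source_id` must appear in `federated_read`), but doing that during parsing is an assumption. `parseFederatedRead` is a hypothetical name:

```typescript
// Hypothetical sketch of --federated-read SRC1,SRC2,... parsing.
function parseFederatedRead(raw: string | undefined, sourceId: string): string[] {
  // Default: read scope == write scope.
  if (raw === undefined) return [sourceId];
  const ids = raw
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // Write source first, duplicates dropped, so source_id is always readable.
  return Array.from(new Set([sourceId, ...ids]));
}
```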

Engines consume the federated array through the SearchOpts.sourceIds /
PageFilters.sourceIds field that garrytan#861 added (no engine changes here — the
plumbing was D9). sourceScopeOpts in operations.ts already prefers the
auth.allowedSources array over scalar ctx.sourceId when set.

Test seam:
  - test/book-mirror.test.ts now spawns the CLI with GBRAIN_HOME pointed
    at a tempdir so the test isn't sensitive to the developer's local
    ~/.gbrain/config.json. Pre-fix the test could silently inherit a real
    Postgres connection and hang past the default 5s test timeout. Fresh
    GBRAIN_HOME → "No brain configured" → exit 1 in <1s.
  - test/e2e/source-isolation-pglite.test.ts gains one more regression
    case: AuthInfo.allowedSources = [] (explicit empty) MUST NOT widen
    scope to "all sources" — the silent-widen footgun precedence ladder.
  - test/openai-compat-multimodal.test.ts is part of the wave's commits
    via the migrate.ts changes that bump the schema chain. The typecheck-only
    fix on a captured-auth type was already in garrytan#875's tree.

6045 unit tests pass / 0 fail. typecheck clean. PGLite initSchema runs
v55-v60 in ~786ms total (within the test-harness budget for tests using
the canonical beforeAll engine pattern).

Origin: PR garrytan#876 + plan-eng-review F5 (CASE backfill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.34.0.0: MCP fix wave (garrytan#870 garrytan#909 garrytan#864 garrytan#861 garrytan#875 garrytan#876)

VERSION + package.json + CHANGELOG bump for the six-PR MCP fix wave.
Schema chain extends from v54 → v60; oauth_clients gains source_id +
federated_read columns; auth'd MCP clients now stay inside their scope
across all read-side ops; PKCE-only DCR works; --bind defaults to
loopback; LiteLLM multimodal embedding ships.

Contributed by @Hansen1018 (garrytan#870), @ding-modding (garrytan#909), @DukeDawg
(garrytan#864), @toilalesondev (garrytan#861 + garrytan#876), @yoelgal (garrytan#875).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.34.0.0

Sync README, CLAUDE.md, SECURITY.md, docs/architecture/topologies.md,
and docs/mcp/DEPLOY.md to reflect the v0.34.0.0 MCP fix wave:

- README: document --bind HOST default (loopback), --source +
  --federated-read register-client flags, PKCE public-client gate
- SECURITY.md: note loopback-by-default for serve --http, update the
  trust-proxy contract to point at the new default
- CLAUDE.md: annotate operations.ts (sourceScopeOpts helper),
  oauth-provider.ts (verifyAccessToken JOIN + PKCE public clients),
  serve-http.ts (--bind flag), gateway.ts (openai-compat multimodal +
  dim validation), mcp/server.ts (MCP_STDIO guard), auth.ts (--source
  + --federated-read), migrate.ts (v58-v63 chain), engine.ts
  (sourceIds field). Add 4 new test-file entries for
  source-isolation-pglite, openai-compat-multimodal,
  serve-stdio-lifecycle, oauth.test.ts PKCE cases
- docs/architecture/topologies.md: source-scoped register-client
  example, --bind 0.0.0.0 for thin-client host setup
- docs/mcp/DEPLOY.md: --bind explanation in the ngrok section,
  source-scoped client recipe
- llms-full.txt: regenerated per the CLAUDE.md-edit chaser rule

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump v0.34.0.0 → v0.34.1.0

Renumbering the MCP fix wave from v0.34.0.0 to v0.34.1.0 so the
release slot lands between master's v0.33.2.1 and the next minor.

Touches every release-artifact mention:
- VERSION: 0.34.0.0 → 0.34.1.0
- package.json: same
- CHANGELOG.md header + "To take advantage" block
- CLAUDE.md key-files annotations (8 entries that document this wave)
- llms-full.txt (regen from CLAUDE.md)
- README.md / SECURITY.md / docs/architecture/topologies.md / docs/mcp/DEPLOY.md
- Wave code-comment markers ("// v0.34.0 (#NNN):" → "// v0.34.1 (#NNN):")

Test files renamed alongside since they were committed with the wave.

Commit subjects on the original 6 PR commits + the v0.34.0.0 bump
commit (4f533c76b47db7) intentionally NOT rewritten — those are
history. `git log` finds the implementation by message subject, not by
version tag.

6275 unit tests pass, typecheck clean, migration chain v58-v63 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…drop + failed-file-skip + sort-flip bugs (garrytan#988)

* feat(sync): sort files newest-first for faster salience on recent content

Problem: sync processes files in git-diff order (alphabetical), so
meetings/2020-* embeds before meetings/2026-*. After a burst of writes,
new pages can be invisible to search for hours while older pages process first.

Fix: sort addsAndMods descending in both incremental sync and full import.
Brain paths are date-prefixed by convention, so lexicographic descending
naturally prioritizes recent content.

As a result, the most recently written pages become searchable first.
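Here is a minimal sketch of the newest-first ordering. It assumes paths carry a sortable date prefix (e.g. `meetings/2026-05-12-review.md`). `sortNewestFirst` is an illustrative reconstruction, not the shipped helper:

```typescript
// Illustrative sketch: order paths so lexicographically larger
// (i.e. later date-prefixed) paths are processed first.
function sortNewestFirst(paths: readonly string[]): string[] {
  // Copy first so the caller's array is not mutated; compare by code unit
  // (not localeCompare) so the order does not depend on the host locale.
  return [...paths].sort((a, b) => (a < b ? 1 : a > b ? -1 : 0));
}

const queue = sortNewestFirst([
  "meetings/2020-01-03-kickoff.md",
  "meetings/2026-05-12-review.md",
  "meetings/2024-07-19-offsite.md",
]);
console.log(queue[0]); // -> meetings/2026-05-12-review.md
```

This only helps when paths really are date-prefixed. Undated paths fall into plain reverse-alphabetical order.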

* feat(import): path-based checkpoint resume + sort-newest-first helper

Replace gbrain import's positional `processedIndex` checkpoint with a
path-set checkpoint via `src/core/import-checkpoint.ts`. A file is only
"done" when its processFile returns success — failed files never enter
the set, parallel workers can't lose slow files, and sort-order changes
don't drop the newest N files on resume.

Three bug classes fixed:
- Parallel import + slow worker = silent file drop on crash-resume
- Failed file = checkpoint advanced past it, never retried until manual clear
- Sort-order flip (v0.33.x) = cross-version resume drops newest N files

Old positional checkpoints are detected on first resume and discarded
with a stderr log line. Re-walking is cheap because content_hash
short-circuits unchanged files.
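The path-set idea can be sketched as follows. The checkpoint shape (`version`, `dir`, `done`) and the helper names are hypothetical, and the real `src/core/import-checkpoint.ts` format may differ. The property that matters is that only successful paths are ever recorded:

```typescript
// Hypothetical on-disk shape for a path-set checkpoint.
interface PathCheckpoint {
  version: 2;
  dir: string; // brain directory this checkpoint belongs to
  done: string[]; // paths whose processing returned success
}

interface LoadedCheckpoint {
  dir: string;
  done: Set<string>;
}

// Legacy positional checkpoints (processedIndex) and checkpoints written
// for a different directory are discarded, so the import re-walks.
function loadCheckpoint(raw: unknown, dir: string): LoadedCheckpoint {
  const empty = { dir, done: new Set<string>() };
  if (typeof raw !== "object" || raw === null) return empty;
  const obj = raw as Record<string, unknown>;
  if ("processedIndex" in obj) {
    console.error("[import] discarding legacy positional checkpoint");
    return empty;
  }
  if (obj.dir !== dir || !Array.isArray(obj.done)) return empty;
  return { dir, done: new Set(obj.done.filter((p): p is string => typeof p === "string")) };
}

// Called only on success: failed files never enter the set and are retried.
function recordSuccess(cp: LoadedCheckpoint, path: string): void {
  cp.done.add(path);
}

// Independent of worker completion order and of the sort order used.
function remaining(cp: LoadedCheckpoint, files: readonly string[]): string[] {
  return files.filter((f) => !cp.done.has(f));
}

function serialize(cp: LoadedCheckpoint): PathCheckpoint {
  return { version: 2, dir: cp.dir, done: [...cp.done].sort() };
}
```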

Also extracts the descending-lex sort into src/core/sort-newest-first.ts
so import.ts and sync.ts share a single source of truth.

Tests:
- test/sort-newest-first.test.ts (5 hermetic cases)
- test/import-checkpoint.test.ts (18 unit cases over the helpers)
- test/import-resume.test.ts (refactored — GBRAIN_HOME isolation,
  drives runImport against PGLite, 5 integration cases including
  SLUG_MISMATCH retry regression)

Includes the original sort-newest-first contribution from
@garrytan-agents's PR garrytan#964 (commit 8dbcf6a).

* chore: bump version and changelog (v0.34.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update project documentation for v0.34.2.0

Add CLAUDE.md Key Files entries for the path-based import checkpoint
work: new entries for src/core/import-checkpoint.ts and
src/core/sort-newest-first.ts, plus a dedicated src/commands/import.ts
entry covering the v0.34.2.0 refactor. Update src/commands/sync.ts
entry to reference sortNewestFirst. Regenerate llms-full.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(tests): swap banned /data/brain placeholder for /tmp/example-brain

scripts/check-privacy.sh banlist includes /data/brain/ (legacy private
OpenClaw fork layout). New test files must not use it — CI privacy
guard caught this on PR garrytan#988's first push.

No behavior change. test/import-checkpoint.test.ts is unit-level with
no fs access; the dir string is just an identity marker for the
loadCheckpoint dir-mismatch guard.

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…rrytan#1003)

* fix: supervisor treats code=0 watchdog exits as crashes

The RSS watchdog triggers gracefulShutdown() which exits with code 0.
The supervisor was counting ALL exits < 5min as crashes, including
clean code=0 exits. After 10 watchdog-triggered restarts (typical with
a 96K-page brain where autopilot inflates RSS), the supervisor gave up
with max_crashes_exceeded.

Fix: code=0 exits reset crashCount to 0 and restart immediately with
no backoff. Only code≠0 exits count toward the crash limit.

Root cause: process.memoryUsage().rss reports 7GB during autopilot
sync on large repos (possibly shared page inflation from git mmap).
The 4096MB threshold triggers on every cycle. This is a separate
issue (RSS measurement accuracy) but the supervisor should handle
clean exits regardless.

* fix: use RssAnon instead of VmRSS for watchdog threshold

process.memoryUsage().rss returns VmRSS which includes file-backed
mmap'd pages. On repos with large git packfiles (96K+ pages), git
operations inflate VmRSS to 7GB+ while actual heap usage is ~100MB.
The kernel reclaims these pages under memory pressure — they're cache.

Replace with /proc/self/status RssAnon + RssShmem which measures only
anonymous pages (heap, stack, anonymous mmap). This is the memory that
actually matters for OOM risk.

Falls back to process.memoryUsage().rss on non-Linux.

Before: watchdog triggers every autopilot cycle (7GB VmRSS > 4GB threshold)
After:  watchdog only triggers on real memory growth (~100MB << 4GB threshold)

Related: garrytan#1002 (supervisor crash-count fix for the same symptom)

* refactor(minions): extract ChildWorkerSupervisor with D1/D2 amendments

MinionSupervisor and src/commands/autopilot.ts each owned a separate
spawn-and-respawn loop. PR garrytan#1003 fixed the supervisor's crash-counter
bug (counting code=0 watchdog drains as crashes) but the autopilot
loop has the same bug class. Worse, the as-shipped garrytan#1003 fix reset
crashCount=0 on every code=0 exit, which lost the "flapping worker"
signal in mixed-exit sequences.

Extract the shared spawn loop into ChildWorkerSupervisor so both
consumers compose one tested core. The new class bakes in two
amendments resolved during plan-eng-review:

D1 (lastExitCode track): code=0 exits no longer touch crashCount.
They emit ms:0 backoff and restart immediately, but the counter
survives across them. A worker alternating exit 1 / exit 0 / exit 1
correctly trips max_crashes; a worker drained 100 times by the
watchdog stays at crashCount=0 and runs forever (also correct).

D2 (clean-restart budget): on platforms where the watchdog measures
VmRSS instead of RssAnon (macOS, kernel <4.5, restricted containers),
a perpetually over-threshold worker could clean-exit in a tight loop
with no observability. New `cleanRestartBudget` option (default 10
clean restarts per 60s window) emits a `health_warn` and applies
backoff once exceeded.
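Here is a sketch of the D1/D2 exit classification. It assumes a fixed crash backoff and a sliding-window clean-restart budget. `ExitClassifier` and its option names are hypothetical, and the real `ChildWorkerSupervisor`'s stable-run reset and backoff schedule are omitted:

```typescript
type Decision =
  | { action: "restart"; backoffMs: number; warn?: "health_warn" }
  | { action: "give_up"; reason: "max_crashes" };

interface ClassifierOptions {
  maxCrashes: number; // give up once crash exits exceed this
  crashBackoffMs: number; // simplified: fixed delay after a non-zero exit
  cleanRestartBudget: number; // clean exits allowed per window
  windowMs: number; // sliding window for the clean-restart budget
  overBudgetBackoffMs: number; // delay once the budget is exceeded
}

class ExitClassifier {
  private readonly opts: ClassifierOptions;
  private crashCount = 0;
  private cleanExitTimes: number[] = [];

  constructor(opts: ClassifierOptions) {
    this.opts = opts;
  }

  onExit(code: number | null, now: number): Decision {
    if (code === 0) {
      // D1: clean exits leave crashCount untouched, so an exit-1 / exit-0 /
      // exit-1 flapper still accumulates crashes across the clean exits.
      this.cleanExitTimes = this.cleanExitTimes.filter((t) => now - t < this.opts.windowMs);
      this.cleanExitTimes.push(now);
      if (this.cleanExitTimes.length > this.opts.cleanRestartBudget) {
        // D2: a tight clean-exit loop becomes visible and slows down.
        return { action: "restart", backoffMs: this.opts.overBudgetBackoffMs, warn: "health_warn" };
      }
      return { action: "restart", backoffMs: 0 };
    }
    // Non-zero exit, or death by signal (code === null), counts as a crash.
    this.crashCount += 1;
    if (this.crashCount > this.opts.maxCrashes) {
      return { action: "give_up", reason: "max_crashes" };
    }
    return { action: "restart", backoffMs: this.opts.crashBackoffMs };
  }
}
```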

The supervisor now delegates spawn/respawn/backoff to the inner
class and maps ChildSupervisorEvent → existing SupervisorEvent
emit() channel so JSONL audit consumers see byte-compatible output.
PID lock, signal handlers, health check, and process.exit on
max-crashes stay in MinionSupervisor (those are standalone-daemon
concerns the autopilot composer doesn't need).

Tests: 6 new ChildWorkerSupervisor cases (D1 classifier, interleaved
exits, stable-run + clean-exit interaction, D2 budget tripping, per-
instance config isolation, event shape regression). Existing supervisor
tests updated to use exit-1 workers where they previously relied on
clean-exit-as-crash semantics; their assertions (env plumbing, PID
lock, audit shape) are unaffected.

Co-Authored-By: Wintermute <wintermute@garrytan.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(autopilot): compose ChildWorkerSupervisor instead of inline spawn loop

src/commands/autopilot.ts:165-197 used to have its own spawn-and-
respawn loop separate from MinionSupervisor's. It hardcoded
maxCrashes=5, fixed 10s backoff, and counted every exit (including
code=0) toward the crash limit. Codex flagged this during plan-eng
review: the parallel implementation had the same bug class fixed
in garrytan#1003, just on a different code path. Anyone running
`gbrain autopilot` as a long-running daemon (instead of
`gbrain jobs supervisor`) would hit it.

Replace the inline `startWorker` + `child.on('exit')` block with
a ChildWorkerSupervisor instance. Drops the parallel `crashCount`,
`lastWorkerStartTime`, and `STABLE_RUN_RESET_MS` state. The
ChildWorkerSupervisor's D1 lastExitCode track + D2 clean-restart
budget apply to autopilot for free.

Shutdown now drains via the supervisor's killChild + awaitChildExit
typed surface instead of reaching into `workerProc` directly. The
onMaxCrashesExceeded callback routes through autopilot's existing
shutdown('max_crashes') path so the lockfile gets cleaned up
(pre-refactor, the inline loop called process.exit(1) directly and
bypassed the cleanup).

Regression coverage in test/autopilot-supervisor-wiring.test.ts:
static-shape grep guards for `--max-rss 2048`, `maxCrashes: 5`,
the shutdown-via-callback wiring, and absence of the legacy inline
names (startWorker, workerProc, crashCount, lastWorkerStartTime,
STABLE_RUN_RESET_MS).

Co-Authored-By: Wintermute <wintermute@garrytan.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(worker): parse RssAnon as field-presence + soften OOM docstring

Two follow-ups to the RssAnon watchdog fix (b81c598), both surfaced
during plan-eng-review by Codex.

M1: getAccurateRss() used `if (anonKb > 0) return ...` to decide
whether to use the /proc/self/status reading or fall back to
process.memoryUsage().rss. That conflated "RssAnon field missing"
(old kernel, non-Linux) with "RssAnon field present but zero" (a
near-empty worker process whose only memory is shmem). The legitimate
shmem-only worker case fell through to VmRSS even though /proc had a
valid reading.

Fix: split the pure parser (parseRssFromProcStatus) into a separate
exported function that checks field presence via regex match, not
value comparison. Returns null only when the field text doesn't
match `^RssAnon:\s+(\d+)` AND `^RssShmem:\s+(\d+)`. Both fields
present + both zero is now a valid reading of 0 bytes.

M2: the docstring claimed RssAnon + RssShmem was "the memory that
actually matters for OOM risk." Codex pushed back: this is correct
for per-process leak detection but NOT a full container-OOM metric,
because cgroup memory pressure includes page cache. Soften to
"non-file-backed resident memory used for per-process leak
detection" and call out the cgroup caveat explicitly.

getAccurateRss now takes an optional readStatus function for
testability. Production callers use the default; tests inject
canned status text to cover the M1 regression and the fallback paths
without mocking the filesystem.

Tests: 11 cases covering parseRssFromProcStatus (normal, M1 regression
with anon=0 + shmem>0, both-zero, missing fields, malformed values,
shmem-only) and getAccurateRss (injected reader, ENOENT fallback,
old-kernel fallback, malformed-value fallback).
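The field-presence parser and injectable reader can be sketched as follows. The sketch assumes Linux `/proc/self/status` lines of the form `RssAnon:\t 102400 kB`. It returns `null` unless both fields parse, and the shipped `parseRssFromProcStatus` may treat a single missing field differently:

```typescript
import { readFileSync } from "node:fs";

// Field *presence* decides validity: both present and both 0 is a real
// reading of 0 bytes, not a signal to fall back to VmRSS.
function parseRssFromProcStatus(statusText: string): number | null {
  const anon = /^RssAnon:[ \t]+(\d+)[ \t]*kB[ \t]*$/m.exec(statusText);
  const shmem = /^RssShmem:[ \t]+(\d+)[ \t]*kB[ \t]*$/m.exec(statusText);
  if (!anon || !shmem) return null; // missing or malformed field
  return (Number(anon[1]) + Number(shmem[1])) * 1024; // kB -> bytes
}

// readStatus is injectable so tests can supply canned status text.
function getAccurateRss(
  readStatus: () => string = () => readFileSync("/proc/self/status", "utf8"),
): number {
  try {
    const parsed = parseRssFromProcStatus(readStatus());
    if (parsed !== null) return parsed;
  } catch {
    // ENOENT on non-Linux, or /proc unreadable in a restricted container.
  }
  // Fallback: VmRSS-equivalent, which includes file-backed mmap pages.
  return process.memoryUsage().rss;
}
```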

Co-Authored-By: Wintermute <wintermute@garrytan.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(minions): awaitChildExit short-circuits when child already exited

Pre-fix, awaitChildExit registered `child.once('exit', ...)` without
checking whether the child had already terminated. If the child drained
between killChild('SIGTERM') and awaitChildExit() — common on fast
SIGTERM responders — Node's 'exit' event had already fired, the late
listener never resolved, and the caller waited out the full timeout.
On the supervisor's clean shutdown path that's a 35-second hang on
every quick child.

Probe `child.exitCode` and `child.signalCode` first; resolve
immediately when either is non-null. Sub-second clean shutdown
restored.
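Here is a minimal sketch of the short-circuit. It is typed against a small structural interface so it can run against a fake child. `ExitableChild` and the boolean "exited vs timed out" return value are assumptions of this sketch:

```typescript
// The subset of ChildProcess this helper needs.
interface ExitableChild {
  exitCode: number | null;
  signalCode: string | null;
  once(event: "exit", listener: (code: number | null, signal: string | null) => void): unknown;
  off(event: "exit", listener: (...args: any[]) => void): unknown;
}

// Resolves true when the child has exited, false if timeoutMs elapses first.
function awaitChildExit(child: ExitableChild, timeoutMs: number): Promise<boolean> {
  // If 'exit' already fired, a listener added now would never run.
  if (child.exitCode !== null || child.signalCode !== null) {
    return Promise.resolve(true);
  }
  return new Promise((resolve) => {
    const onExit = () => {
      clearTimeout(timer);
      resolve(true);
    };
    const timer = setTimeout(() => {
      child.off("exit", onExit);
      resolve(false);
    }, timeoutMs);
    child.once("exit", onExit);
  });
}
```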

Pre-existing in the legacy supervisor.ts shape (same bug pattern),
but since the refactor consolidates child-process management into one
class, fix the pattern at the new seam.

Regression test in test/child-worker-supervisor.test.ts: run one full
spawn cycle, then call awaitChildExit on the already-finished cycle
and assert it returns in under 200ms (well under any test timeout).

Surfaced during pre-landing /review on the fix wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.34.3.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CLAUDE.md key-files entries for v0.34.3.0

Reflects the ChildWorkerSupervisor extraction shipped in this branch:

- Add new entry for src/core/minions/child-worker-supervisor.ts
  covering D1 lastExitCode classifier, D2 clean-restart budget, the
  awaitChildExit short-circuit, and test pinning at
  test/child-worker-supervisor.test.ts
- Update src/core/minions/supervisor.ts entry to note the spawn-loop
  extraction into the shared core + the byte-compatible event-shape
  mapping that preserves JSONL audit consumers
- Update src/commands/autopilot.ts entry to note the parallel-
  supervisor elimination + the shutdown-via-callback wiring
- Update src/core/minions/worker.ts entry with the new RssAnon /
  getAccurateRss exports + the M1 field-presence parser fix

Regenerated llms-full.txt to match (per project rule: every CLAUDE.md
edit must be followed by bun run build:llms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D4/D6/D7/D8 + regression test) (garrytan#991)

* perf(embed): cursor-paginated stale loading + rate-limit backoff + partial index

Three fixes for embed --stale on large brains (300K+ chunks):

## 1. Cursor-paginated listStaleChunks (embed timeout fix)

The previous implementation pulled ALL stale rows (up to 100K) in one
query. On a 373K-row content_chunks table with 48K stale rows, this
query took >2 min and hit Supabase's 2-min statement_timeout, causing
embed --stale to silently fail with zero progress.

Fix: keyset pagination on (page_id, chunk_index) with a default batch
size of 2000 rows. Each query finishes in <1s. The embedAllStale loop
pages through batches, embeds each batch, then advances the cursor.
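Keyset pagination can be sketched in memory as follows. The function names and row shape are illustrative. The comment shows the SQL shape that the in-memory filter stands in for:

```typescript
interface StaleChunkRow {
  page_id: number;
  chunk_index: number;
}

type Cursor = { pageId: number; chunkIndex: number } | null;

// Row-value comparison equivalent to SQL's (page_id, chunk_index) > ($1, $2).
function isAfter(row: StaleChunkRow, cur: Cursor): boolean {
  if (cur === null) return true;
  return row.page_id > cur.pageId || (row.page_id === cur.pageId && row.chunk_index > cur.chunkIndex);
}

// In-memory stand-in for a query shaped like:
//   SELECT page_id, chunk_index FROM content_chunks
//   WHERE embedding IS NULL AND (page_id, chunk_index) > ($1, $2)
//   ORDER BY page_id, chunk_index
//   LIMIT $3
// (every row in `table` is treated as stale here)
function listStaleBatch(table: readonly StaleChunkRow[], cur: Cursor, limit: number): StaleChunkRow[] {
  return table
    .filter((r) => isAfter(r, cur))
    .sort((a, b) => a.page_id - b.page_id || a.chunk_index - b.chunk_index)
    .slice(0, limit);
}

// The cursor moves to the last row of each batch whether or not that batch
// embedded successfully, so no row is fetched twice in one run.
function* staleBatches(table: readonly StaleChunkRow[], batchSize = 2000): Generator<StaleChunkRow[]> {
  let cur: Cursor = null;
  for (;;) {
    const batch = listStaleBatch(table, cur, batchSize);
    if (batch.length === 0) return;
    yield batch;
    const last = batch[batch.length - 1];
    cur = { pageId: last.page_id, chunkIndex: last.chunk_index };
  }
}
```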

## 2. Rate-limit-aware retry (429 backoff)

The OpenAI SDK's built-in retry has a ~4s max backoff window, which is
too short for TPM (tokens-per-minute) limits on large pages (~90K
tokens). The embed loop would fail after 3 SDK retries and skip the
page entirely.

Fix: embedBatchWithBackoff wrapper parses the retry delay from the
429 error message (e.g. 'try again in 248ms') and sleeps for that
duration + 500ms padding. Up to 5 retries with parsed delays (60s
fallback when unparseable).
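Here is a sketch of the retry-delay parsing. It assumes 429 messages phrase the delay as `try again in <n>ms` or `try again in <n>s`. `parseRetryAfterMs`, `retryDelayMs`, and `withRateLimitRetry` are hypothetical helpers, not the shipped `embedBatchWithBackoff`. In this sketch the 60s fallback is not padded:

```typescript
const FALLBACK_DELAY_MS = 60_000; // used when no delay can be parsed
const PADDING_MS = 500; // added on top of a parsed delay

// "... Please try again in 248ms." -> 248; "... try again in 1.5s" -> 1500.
function parseRetryAfterMs(message: string): number | null {
  const m = /try again in\s+(\d+(?:\.\d+)?)\s*(ms|s)\b/i.exec(message);
  if (!m) return null;
  const value = Number(m[1]);
  return m[2].toLowerCase() === "ms" ? value : value * 1000;
}

function retryDelayMs(message: string): number {
  const parsed = parseRetryAfterMs(message);
  return parsed === null ? FALLBACK_DELAY_MS : parsed + PADDING_MS;
}

// Retry a rate-limited call up to maxRetries times, sleeping for the
// provider-suggested delay between attempts. `sleep` is injectable for tests.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimit(err) || attempt >= maxRetries) throw err;
      await sleep(retryDelayMs(err instanceof Error ? err.message : String(err)));
    }
  }
}
```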

## 3. Migration v58: partial index for NULL embeddings

`CREATE INDEX idx_chunks_embedding_null ON content_chunks (page_id,
chunk_index) WHERE embedding IS NULL` — makes countStaleChunks() and
the paginated listStaleChunks() instant instead of full-table-scanning
373K rows.

## Testing

Verified on a 99K-page / 373K-chunk brain with 48K stale chunks.
Before: embed --stale hung for 2+ min then timed out (0 progress).
After: loads 2K rows in <1s, embeds concurrently, pages through all
stale chunks without timeout.

* fix(embed): wave of hardening + tests on cursor-paginated --stale path

Lands the 9 decisions + regression test set from /plan-eng-review on PR garrytan#991's
embed-perf cherry-pick. Implements the codex outside-voice findings folded in
during plan review.

Architecture / correctness:
- D2 jitter on the parsed retry-after delay (±30%) so 20 concurrent workers
  don't relock on the next 429 wave (thundering herd fix).
- D3 + D3a + D8 wall-clock budget (GBRAIN_EMBED_TIME_BUDGET_MS, default 30
  min) threaded as an AbortSignal into THREE places: the retry sleep
  (abortableSleep), the per-key worker claim loop, and the gateway embed
  call itself (so a worker mid-fetch on a ~30s OpenAI HTTP timeout cancels
  within seconds instead of waiting it out).
- D4 structured 429 detection that unwraps the gateway's AITransientError
  wrap via cause chain (depth-limited to 5). Naive `e.status === 429` was
  silently false against normalized errors; message-match stays as
  fallback. detect429FromCause exported as @internal helper.
- D4a `maxRetries: 0` passthrough through embedBatch → gateway →
  embedMany so the AI SDK's default 2-retry stack doesn't multiply this
  wrapper's 5 attempts (was up to 15 total cycles per call).
- D6 migration v59 (embed_stale_partial_index) rewritten to use
  CREATE INDEX CONCURRENTLY + handler-based engine-branching (mirrors v14
  invalid-remnant pattern). Plain CREATE INDEX would have taken ShareLock
  on the 373K-row content_chunks table for the duration of the build.
- D7 sourceId threaded through countStaleChunks + listStaleChunks +
  embedAllStale. `gbrain embed --stale --source X` was silently dropping
  the flag pre-fix and counting/embedding across every source. Both
  Postgres and PGLite engines updated.

Tests added:
- D5 8 unit cases for embedBatchWithBackoff in test/embed.serial.test.ts:
  ms / s retry-after parse, fallback, non-rate-limit rethrow, jitter
  variance, budget abort during sleep+fetch, normalized-error cause
  unwrap, maxRetries:0 passthrough verification.
- D5a fixed every pre-existing stale-row mock to include source_id +
  page_id (required on StaleChunkRow as of v0.33.3 cursor pagination —
  TypeScript's structural typing was hiding these).
- D7 unit cases asserting CLI `--source X` parses + threads sourceId.
- Gap scan: end-to-end wall-clock budget firing in the outer pagination
  loop via runEmbedCore.
- D6 migration v59 test cases in test/migrate.test.ts: source-shape
  assertion (CONCURRENTLY + invalid-remnant DROP-before-CREATE ordering),
  PGLite handler-branch idempotency, partial-index materialization.
- REGRESSION: new test/e2e/embed-stale-pagination.test.ts covering
  static (every chunk visited exactly once), failed-page (cursor advances
  past failures, next run picks up), page-split-across-batches,
  source-scoped scan, duplicate-slug-across-sources.
- PGLite parity cases for cursor pagination, page split, source filter
  in test/pglite-engine.test.ts (pins tuple-compare against WASM build).

Gate:
- bun run test: 6305 pass / 0 fail / 0 skip across all 8 shards + serial.
- DATABASE_URL=... bun run test:e2e: 90 files, 603 tests, 0 failures.

Plan: ~/.claude/plans/system-instruction-you-are-working-iterative-torvalds.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.34.3.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ai): add ZeroEntropy recipe + reranker touchpoint type

Widens `TouchpointKind` with `'reranker'`, adds `RerankerTouchpoint`
interface, extends `Recipe.touchpoints` and `AIGatewayConfig` to carry
reranker model state. Registers `zeroentropyai` recipe (zembed-1
embeddings + zerank-{2,1,1-small} rerankers) in the recipe registry.

Recipe declares the 7 Matryoshka dims (2560/1280/640/320/160/80/40),
Voyage-style dense-payload hedge (chars_per_token=1, safety_factor=0.5),
and 5MB rerank payload cap. Pinned by test/ai/zeroentropy-recipe.test.ts
including F1 regression (implementation literal is 'openai-compatible')
and F2 regression (base_url_default ends with /v1, no doubling).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai/dims): thread input_type 4th-arg + ZE flexible-dim allowlist

`dimsProviderOptions` gains an optional `inputType?: 'query' | 'document'`
4th param so asymmetric providers (ZE zembed-1, Voyage v3+) can route
query-side vs document-side encoding. Per-model filtering inside the
openai-compatible branch keeps `input_type` from leaking to symmetric
providers (OpenAI text-3, DashScope, Zhipu) that would 400 on it.

Adds `ZEROENTROPY_VALID_DIMS` allowlist (2560/1280/640/320/160/80/40),
`supportsZeroEntropyDimension(modelId)`, and `isValidZeroEntropyDim(dims)`.
Throws `AIConfigError` with paste-ready fix hint when zembed-1 is
configured with an invalid dim (most common: defaulting to 1536 from
DEFAULT_EMBEDDING_DIMENSIONS).
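Here is a sketch of the dimension allowlist and the per-model `input_type` filtering. The dim values come from the commit. `assertZeroEntropyDims`, `inputTypeOption`, the local `AIConfigError` stand-in, and the Voyage model-matching regex are illustrative assumptions:

```typescript
const ZEROENTROPY_VALID_DIMS = [2560, 1280, 640, 320, 160, 80, 40] as const;

class AIConfigError extends Error {} // local stand-in for the project's error class

function isValidZeroEntropyDim(dims: number): boolean {
  return (ZEROENTROPY_VALID_DIMS as readonly number[]).includes(dims);
}

// Fail at config time instead of letting the provider reject the request.
function assertZeroEntropyDims(modelId: string, dims: number): void {
  if (modelId === "zembed-1" && !isValidZeroEntropyDim(dims)) {
    throw new AIConfigError(
      `zembed-1 does not support ${dims} dims; use one of ${ZEROENTROPY_VALID_DIMS.join("/")}`,
    );
  }
}

// Only asymmetric models receive input_type; symmetric providers would 400.
// Omitting inputType keeps the legacy request shape (no input_type field).
function inputTypeOption(modelId: string, inputType?: "query" | "document"): { input_type?: string } {
  const asymmetric = modelId === "zembed-1" || /^voyage-(?:[3-9]|\d{2,})/.test(modelId);
  return asymmetric && inputType ? { input_type: inputType } : {};
}
```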

The 4th-arg is optional; existing call sites (1 production + N tests
across Voyage/OpenAI/DashScope/Zhipu/MiniMax) compile unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai/gateway): zeroEntropyCompatFetch + embedQuery + gateway.rerank()

Two seams land together because they share the same recipe + auth path.

zeroEntropyCompatFetch handles ZE's non-OpenAI-compatible wire shape:
  - URL rewrite: SDK's `${base_url}/embeddings` -> `${base_url}/models/embed`
  - Body inject: `input_type` (default 'document'; 'query' when threaded
    via providerOptions) + explicit `encoding_format: 'float'`
  - Response rewrite: `{results: [{embedding}]}` -> `{data: [{embedding,
    index}]}` so the AI SDK's openai-compat schema validates
  - `usage.prompt_tokens` injected from `total_tokens` (Voyage hit the
    same SDK schema requirement at :655)
  - Layer 1 (Content-Length) + Layer 2 (per-embedding size) OOM caps
    via tagged `ZeroEntropyResponseTooLargeError` (kept separate from
    `VoyageResponseTooLargeError` because the Voyage cap tests do
    structural source-text greps pinning the Voyage name)
  - Wired in `instantiateEmbedding()` via the existing
    `recipe.id === 'voyage' ? voyageCompatFetch : ...` ternary pattern
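The three rewrites in the shim can be sketched as pure functions. The ZE request and response shapes are inferred from the list above, and the helper names are hypothetical. The Layer 1 and Layer 2 OOM caps are not modeled:

```typescript
// Response shape assumed for ZE's /models/embed endpoint.
interface ZeEmbedResponse {
  results: Array<{ embedding: number[] }>;
  usage?: { total_tokens?: number };
}

// Shape the AI SDK's openai-compatible embedding schema validates.
interface OpenAiEmbedResponse {
  data: Array<{ embedding: number[]; index: number }>;
  usage: { prompt_tokens: number; total_tokens: number };
}

// `${base_url}/embeddings` -> `${base_url}/models/embed`
function rewriteZeUrl(url: string): string {
  const u = new URL(url);
  if (u.pathname.endsWith("/embeddings")) {
    u.pathname = u.pathname.slice(0, -"/embeddings".length) + "/models/embed";
  }
  return u.toString();
}

// Inject input_type (default 'document') and an explicit float encoding.
function rewriteZeRequestBody(
  body: Record<string, unknown>,
  inputType: "query" | "document" = "document",
): Record<string, unknown> {
  return { ...body, input_type: inputType, encoding_format: "float" };
}

// {results:[{embedding}]} -> {data:[{embedding,index}]}; prompt_tokens is
// copied from total_tokens because the SDK schema requires it.
function rewriteZeResponse(body: ZeEmbedResponse): OpenAiEmbedResponse {
  const total = body.usage?.total_tokens ?? 0;
  return {
    data: body.results.map((r, index) => ({ embedding: r.embedding, index })),
    usage: { prompt_tokens: total, total_tokens: total },
  };
}
```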

embedQuery(text) routes `inputType: 'query'` through dimsProviderOptions
for the search hot path. Companion to embed(texts) which now takes an
optional 2nd-arg inputType (defaults to undefined -> 'document' for
asymmetric providers).

gateway.rerank() is the new native HTTP path (no AI-SDK reranking
abstraction). Resolves the configured reranker model via
`getRerankerModel()` (new accessor), parses + asserts the model is in
the recipe's touchpoint.reranker.models allowlist (CDX2-F11:
assertTouchpoint does not enforce allowlists for openai-compatible
recipes — rerank() does it directly). Posts to
`${recipe.base_url}/models/rerank` with bearer auth. Returns
`RerankResult[]` sorted by `relevanceScore`. Errors classify into
`RerankError.reason: 'auth' | 'rate_limit' | 'network' | 'timeout' |
'payload_too_large' | 'unknown'`. 5s default timeout. Pre-flight payload
guard rejects bodies over `recipe.max_payload_bytes` BEFORE any HTTP
call so applyReranker can fail-open without burning a round-trip.
`_rerankTransport` + `__setRerankTransportForTests` mirror the embed
test seam.
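Here is a sketch of the failure classification and the payload pre-flight. The status-code-to-reason mapping and the `{model, query, documents}` body shape are assumptions for illustration. Only the reason names come from the commit:

```typescript
type RerankReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

class RerankError extends Error {
  readonly reason: RerankReason;
  constructor(reason: RerankReason, message: string) {
    super(message);
    this.reason = reason;
  }
}

// No status at all means the request never got an HTTP response.
function classifyRerankStatus(status: number | undefined, timedOut = false): RerankReason {
  if (timedOut) return "timeout";
  if (status === undefined) return "network";
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  if (status === 413) return "payload_too_large";
  return "unknown";
}

// Reject oversized bodies before any HTTP call so callers can fail open cheaply.
function preflightRerankBody(query: string, documents: string[], model: string, maxPayloadBytes: number): string {
  const body = JSON.stringify({ model, query, documents });
  const bytes = new TextEncoder().encode(body).byteLength;
  if (bytes > maxPayloadBytes) {
    throw new RerankError("payload_too_large", `rerank body ${bytes}B exceeds ${maxPayloadBytes}B`);
  }
  return body;
}
```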

`AIGatewayConfig.reranker_model` + isAvailable('reranker') branch +
configureGateway / reconfigureGatewayWithEngine extensions thread the
reranker model through the same state path as embedding/expansion/chat.
`applyResolveAuth` + `defaultResolveAuth` widen the touchpoint param to
include `'reranker'`. `KnownTouchpointKey` + `getTouchpoint()` in
model-resolver widen to cover `'reranker'`.

Pinned by:
- test/ai/embedQuery.test.ts (8): returns single Float32Array, threads
  input_type='query' for ZE, drops field for OpenAI text-3,
  back-compat: legacy embed() callers without 4th arg keep their
  previous Voyage no-input_type shape
- test/ai/rerank.test.ts (21): URL (F2 regression — no /v1/v1/), body
  shape, bearer header, response parsing, error classification across
  6 HTTP shapes, payload pre-flight (no transport call), allowlist
  enforcement
- test/ai/zeroentropy-compat-fetch.test.ts (14): structural source
  assertions for the shim that mirror test/voyage-response-cap.test.ts —
  URL rewrite path, body injection, response rewrite, usage.prompt_tokens
  injection, OOM caps Layer 1 + Layer 2 + instanceof rethrow,
  instantiateEmbedding wiring branch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(search): applyReranker + rerank-failure audit + hybrid wire-in

src/core/search/rerank.ts — the call-site abstraction. Slices the top
`opts.topNIn` deduped candidates, sends to gateway.rerank(), reorders by
relevanceScore desc, appends the un-reranked tail in its original RRF
order (recall protection). Fail-open on every RerankError.reason: logs
via `logRerankFailure` and returns the input array unchanged. Stamps
`rerank_score` onto reordered items. `topNOut: null` is the explicit
"don't truncate" signal — distinct from `undefined` (fall through to
mode bundle); pin in test (CDX2-F16).
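The reorder-plus-tail logic can be sketched as follows, with the reranker call injected. Type and parameter names are hypothetical. `topNOut` here is the already-resolved value, so the `undefined` (defer to mode bundle) case never reaches this function. Keeping head items that the reranker omits, unscored, is this sketch's choice:

```typescript
interface Candidate {
  id: string;
  text: string;
  rerank_score?: number;
}

interface RerankResult {
  index: number; // position within the documents sent
  relevanceScore: number;
}

interface ResolvedRerankOpts {
  enabled: boolean;
  topNIn: number;
  topNOut: number | null; // null = never truncate
}

async function applyReranker(
  query: string,
  candidates: Candidate[],
  opts: ResolvedRerankOpts,
  rerank: (query: string, docs: string[]) => Promise<RerankResult[]>,
  onFailure: (err: unknown) => void = () => {},
): Promise<Candidate[]> {
  if (!opts.enabled || candidates.length === 0) return candidates;
  const head = candidates.slice(0, opts.topNIn);
  const tail = candidates.slice(opts.topNIn); // un-reranked, original RRF order

  let results: RerankResult[];
  try {
    results = await rerank(query, head.map((c) => c.text));
  } catch (err) {
    onFailure(err); // fail open: caller gets the input unchanged
    return candidates;
  }

  const valid = results.filter((r) => Number.isInteger(r.index) && r.index >= 0 && r.index < head.length);
  const seen = new Set(valid.map((r) => r.index));
  const reranked = [...valid]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .map((r) => ({ ...head[r.index], rerank_score: r.relevanceScore }));
  const omitted = head.filter((_, i) => !seen.has(i));
  const merged = [...reranked, ...omitted, ...tail];
  return opts.topNOut === null ? merged : merged.slice(0, opts.topNOut);
}
```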

src/core/rerank-audit.ts — failure-only JSONL audit at
`~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl` (ISO-week rotation;
mirrors `src/core/audit-slug-fallback.ts`). Exports `logRerankFailure`
+ `readRecentRerankFailures(days)`. **No `logRerankSuccess`** — CDX2-F22
deliberately drops success-event logging: writing once per tokenmax
search is hot-path I/O churn AND success events leak query
volume + timing into a local audit. The doctor check reads
`search.reranker.enabled` first so "no events in window" gets
interpreted correctly (disabled -> healthy by definition; enabled ->
healthy because nothing failed). Query text is SHA-256-prefix-hashed
(8 hex chars) for privacy. Honors `GBRAIN_AUDIT_DIR`.
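Here is a sketch of the ISO-week file naming and the query hashing. The path layout follows the pattern above. Computing the week from UTC date parts is an assumption of this sketch:

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// ISO-8601 week: weeks start Monday; week 1 contains the year's first Thursday.
function isoWeek(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const day = d.getUTCDay() || 7; // Monday=1 through Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - day); // move to this week's Thursday
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

function rerankAuditPath(now: Date, env: Record<string, string | undefined> = process.env): string {
  const { year, week } = isoWeek(now);
  const dir = env.GBRAIN_AUDIT_DIR ?? join(homedir(), ".gbrain", "audit");
  return join(dir, `rerank-failures-${year}-W${String(week).padStart(2, "0")}.jsonl`);
}

// Store only an 8-hex-char SHA-256 prefix of the query, never the raw text.
function hashQuery(query: string): string {
  return createHash("sha256").update(query).digest("hex").slice(0, 8);
}
```

The ISO year can differ from the calendar year near January 1. For example, 2021-01-01 falls in `2020-W53`.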

src/core/search/hybrid.ts — slots `applyReranker` between
`dedupResults()` and `enforceTokenBudget()` in the main RRF path.
Resolution: per-call `opts.reranker` overrides; otherwise pulled from
the resolved mode bundle (tokenmax -> enabled, others -> disabled in
commit 5). Cache rows store final reranked results; the bumped
knobsHash (commit 5) ensures rows can't leak across reranker configs.

src/core/types.ts — adds `SearchOpts.reranker` as a structural type so
callers can pass per-call overrides; runtime type lives in
src/core/search/rerank.ts (avoids circular import).

Tests:
- test/search/rerank.test.ts (14): reorder, tail preserve, fail-open on
  every error class, topNOut null vs number, score stamping, empty +
  enabled=false pass-through
- test/rerank-audit.test.ts (10): JSONL round-trip, error_summary
  truncated to 200, corrupt rows skipped, missing dir -> [], ISO-week
  rotation walks current + previous week, no logRerankSuccess export
  (CDX2-F22 contract)
- test/search/hybrid-reranker-integration.test.ts (6): reranker fires
  when enabled, doesn't when disabled, reorders correctly, preserves
  tail, stamps rerank_score, fail-opens on rerankerFn throw — uses
  PGLite + stubbed embed transport, no API keys

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(search/mode): reranker mode-bundle fields + KNOBS_HASH_VERSION v=2

Extends `ModeBundle` with five reranker fields: `reranker_enabled`,
`reranker_model`, `reranker_top_n_in`, `reranker_top_n_out`,
`reranker_timeout_ms`. Per-mode defaults:

  - conservative -> enabled=false (cost-sensitive)
  - balanced     -> enabled=false (opt-in via search.reranker.enabled)
  - tokenmax     -> enabled=true  (the high-cost-tolerant tier; ~$0.0003/query)

Defaults model to `zeroentropyai:zerank-2`, topNIn=30, topNOut=null
(no truncate by default; preserves tokenmax's searchLimit=50 end-to-end
per CDX2-F16), timeout_ms=5000.

`SearchKeyOverrides` + `SearchPerCallOpts` + `resolveSearchMode.pick`
all extend to thread the new fields through the resolution chain
(per-call -> per-key config -> mode bundle -> default).
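Here is a sketch of the per-mode defaults and the resolution chain. The table folds the final "default" layer into the mode bundles. `pick` and `MODE_RERANKER_DEFAULTS` are hypothetical names:

```typescript
type Mode = "conservative" | "balanced" | "tokenmax";

interface RerankerKnobs {
  reranker_enabled: boolean;
  reranker_model: string;
  reranker_top_n_in: number;
  reranker_top_n_out: number | null;
  reranker_timeout_ms: number;
}

const SHARED = {
  reranker_model: "zeroentropyai:zerank-2",
  reranker_top_n_in: 30,
  reranker_top_n_out: null, // no truncation by default
  reranker_timeout_ms: 5000,
};

const MODE_RERANKER_DEFAULTS: Record<Mode, RerankerKnobs> = {
  conservative: { ...SHARED, reranker_enabled: false },
  balanced: { ...SHARED, reranker_enabled: false },
  tokenmax: { ...SHARED, reranker_enabled: true },
};

// First layer whose value is not `undefined` wins. `null` is a real value
// ("don't truncate"), so it must not fall through to the next layer.
function pick<K extends keyof RerankerKnobs>(
  key: K,
  perCall: Partial<RerankerKnobs>,
  perKey: Partial<RerankerKnobs>,
  mode: Mode,
): RerankerKnobs[K] {
  for (const layer of [perCall, perKey]) {
    const v = layer[key];
    if (v !== undefined) return v as RerankerKnobs[K];
  }
  return MODE_RERANKER_DEFAULTS[mode][key];
}
```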

`loadOverridesFromConfig` adds parsers for the five new
`search.reranker.*` config keys. `top_n_out` parsing distinguishes
three input shapes (CDX2-F15):
  key absent           -> undefined (fall through to mode bundle)
  'null'|'none'|empty  -> explicit null (no truncate)
  positive integer     -> that number
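The three-way parse can be sketched as follows. Rejecting any other value (including `0`) with an error is an assumption of this sketch. The shipped parser may ignore such values instead:

```typescript
// Parse the raw `search.reranker.top_n_out` config value.
//   undefined (key absent)       -> undefined: fall through to the mode bundle
//   "null" | "none" | blank      -> null: explicitly never truncate
//   positive integer string      -> that number
function parseTopNOut(raw: string | undefined): number | null | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  if (v === "" || v === "null" || v === "none") return null;
  if (/^\d+$/.test(v) && Number(v) > 0) return Number(v);
  throw new Error(
    `search.reranker.top_n_out must be a positive integer, "null", or "none"; got ${JSON.stringify(raw)}`,
  );
}
```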

`SEARCH_MODE_CONFIG_KEYS` extends so `gbrain search modes --reset`
clears the reranker overrides too.

**KNOBS_HASH_VERSION bumps 1 -> 2** (CDX1-F14). Five new entries
appended to `parts[]` (append-only convention CDX2-F13; reordering
existing fields would silently rebuild every existing cache row).
Includes `reranker_timeout_ms` so a 5s -> 100ms change invalidates
stale rows (CDX2-F14: more fail-opens = different search behavior).
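Here is a sketch of an append-only, versioned knobs hash. The `mode` and `limit` entries stand in for the pre-existing v=1 fields. The `key=value|...` encoding and the 16-hex-char truncation are assumptions:

```typescript
import { createHash } from "node:crypto";

const KNOBS_HASH_VERSION = 2;

interface HashKnobs {
  mode: string; // stand-in for pre-existing v=1 knobs
  limit: number; // stand-in for pre-existing v=1 knobs
  reranker_enabled: boolean;
  reranker_model: string;
  reranker_top_n_in: number;
  reranker_top_n_out: number | null;
  reranker_timeout_ms: number;
}

function knobsHash(k: HashKnobs): string {
  const parts = [
    `v=${KNOBS_HASH_VERSION}`,
    `mode=${k.mode}`,
    `limit=${k.limit}`,
    // v=2 additions are appended after every existing entry, so existing
    // entries keep their positions.
    `reranker_enabled=${k.reranker_enabled}`,
    `reranker_model=${k.reranker_model}`,
    `reranker_top_n_in=${k.reranker_top_n_in}`,
    `reranker_top_n_out=${k.reranker_top_n_out === null ? "null" : k.reranker_top_n_out}`,
    `reranker_timeout_ms=${k.reranker_timeout_ms}`,
  ];
  return createHash("sha256").update(parts.join("|")).digest("hex").slice(0, 16);
}
```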

Mid-rolling-deploy note (CDX2-F12): v=1 and v=2 processes produce
distinct cacheRowIds for the same (source_id, query_text). Expect a
temporary hit-rate dip + cache-row doubling for hot queries. Clears
naturally within `cache.ttl_seconds` (default 3600s).

src/commands/search.ts extends `KNOB_DESCRIPTIONS` with five new
entries so `gbrain search modes` renders them. test/search-mode.test.ts
extends the three bundle fixtures and bumps the KNOBS_HASH_VERSION
expectation to 2.

Pinned by test/search/knobs-hash-reranker.test.ts (13): each of the 5
reranker fields independently flips the hash, top_n_out=null renders
stable, append-only convention enforced via source-position assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): probeRerankerConfig + reranker_health check

`gbrain models doctor` gains two new probes:

- `probeRerankerConfig` (zero-network) validates that the configured
  reranker model resolves through the recipe registry, that the recipe
  declares a `reranker` touchpoint, and that the model is in
  `touchpoint.models[]`. Direct allowlist check here — assertTouchpoint
  does not enforce allowlists for openai-compatible recipes (CDX2-F11).
  Surfaces paste-ready `gbrain config set search.reranker.model
  <zerank-2|zerank-1|zerank-1-small>` fix hint.

- `probeRerankerReachability` (1-token-equivalent) sends a minimal
  `{query: "probe", documents: ["probe"]}` rerank to verify auth + URL.
  Failures classify via `classifyError` into auth/rate_limit/network/
  unknown. Skipped silently when reranker is unconfigured.

Also extends `probeEmbeddingConfig` with a `providerId === 'zeroentropyai'`
branch that catches the silent-1536-default bug class for zembed-1
configurations (same posture as the existing Voyage branch).

`ProbeResult.touchpoint` widens to include `'reranker_config'`.

`gbrain doctor` adds `checkRerankerHealth` to both the abbreviated
(doctorReportRemote) and full (runDoctor) check sets. Logic:

  1) Read `search.reranker.enabled` first. Disabled + no failures =>
     'reranker disabled'. Enabled + no failures => healthy.
  2) Walk last 7 days of ~/.gbrain/audit/rerank-failures-*.jsonl.
  3) ANY auth failure warns (config-time problem the probe should have
     caught — surface it).
  4) ANY payload_too_large failure warns (workload mismatch).
  5) Transient (network/timeout/rate_limit) warns at >=5 in window.
     Below that they're noise; reranker fails open anyway.
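The check order can be sketched as a pure function over parsed audit events. The event shape and the returned messages are hypothetical. The 7-day window and the warn-at-5 transient threshold come from the list above:

```typescript
type FailureReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

interface FailureEvent {
  ts: string; // ISO timestamp
  reason: FailureReason;
}

interface HealthResult {
  status: "ok" | "warn";
  message: string;
}

const TRANSIENT: ReadonlySet<FailureReason> = new Set<FailureReason>(["network", "timeout", "rate_limit"]);

function checkRerankerHealth(
  enabled: boolean,
  events: readonly FailureEvent[],
  now: Date,
  windowDays = 7,
  transientWarnAt = 5,
): HealthResult {
  const cutoff = now.getTime() - windowDays * 86_400_000;
  const recent = events.filter((e) => Date.parse(e.ts) >= cutoff);
  if (recent.length === 0) {
    // Enabled state is read first, so "no events" is never ambiguous.
    return { status: "ok", message: enabled ? "reranker healthy (no failures)" : "reranker disabled" };
  }
  const count = (pred: (e: FailureEvent) => boolean) => recent.filter(pred).length;
  const auth = count((e) => e.reason === "auth");
  if (auth > 0) return { status: "warn", message: `${auth} auth failure(s) in ${windowDays}d` };
  const payload = count((e) => e.reason === "payload_too_large");
  if (payload > 0) return { status: "warn", message: `${payload} payload_too_large failure(s) in ${windowDays}d` };
  const transient = count((e) => TRANSIENT.has(e.reason));
  if (transient >= transientWarnAt) {
    return { status: "warn", message: `${transient} transient failures in ${windowDays}d` };
  }
  return { status: "ok", message: `${transient} transient failure(s); reranker fails open` };
}
```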

CDX2-F21 blind-spot fix: reading enabled state first means "no events"
gets interpreted correctly — never confuses "never-used" with "success
logging broken" (the latter is impossible because there is no success
logging by design, CDX2-F22).

Engine-agnostic; file-based + one config-key read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): ZeroEntropy live API round-trip + wire into Tier 2 CI

test/e2e/zeroentropy-live.test.ts exercises the full stack against the
real api.zeroentropy.dev: embed (default 2560-dim + flexible 1280),
embedQuery (asymmetric query side), batch embed (3 distinct vectors),
rerank (3 docs sorted by relevance score, photosynthesis-relevant docs
beat the irrelevant cat doc), rerank with topN truncation.

Gated on `ZEROENTROPY_API_KEY`: every test prints `[skip]` and returns
early without assertions when the env var is unset, so fork PRs and
contributor machines without a ZE account stay green.

CI wire-up: `.github/workflows/e2e.yml` Tier 2 step adds
`test/e2e/zeroentropy-live.test.ts` to its `bun test` invocation and
exposes `ZEROENTROPY_API_KEY: ${{ secrets.ZEROENTROPY_API_KEY }}` to
the runner. The secret is set on garrytan/gbrain at the repo scope
(separately from this commit — set via `gh secret set` so the value
never lands in source).

Tier 1 stays mechanical (no API keys); Tier 2 is the natural home for
provider-live tests because it's already the API-keyed lane.

Cost: each full run fires ~6 small HTTP calls totaling well under a
cent at the published $0.025/1M-token rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.33.3.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker

Release notes for the ZeroEntropy support wave: zembed-1 embeddings
(flexible-dim 2560/1280/640/320/160/80/40, asymmetric input_type) and
zerank-2 cross-encoder reranking land as a new openai-compatible recipe
alongside OpenAI/Voyage. Reranker defaults ON for tokenmax mode, OFF
for conservative/balanced (~$0.0003/query at tokenmax topNIn=30; rounding
error vs the tier's $700/mo Opus pairing per the CLAUDE.md cost matrix).

Search now ends with `RRF -> dedup -> reranker -> token-budget` when
reranker is enabled; fails open to RRF order on any error class
(audit-logged at ~/.gbrain/audit/rerank-failures-*.jsonl).
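The stage order and fail-open rule above can be sketched in TypeScript as follows. Only `RRF -> dedup -> reranker -> token-budget` and "fall back to RRF order on any reranker error" come from the notes; the `Hit` shape, dedup-by-page rule, k=60 RRF constant, ~4 chars/token budget estimate, and all function names are assumptions for illustration.

```typescript
// Hypothetical search tail; not gbrain's hybrid.ts.
interface Hit {
  id: string;   // chunk id
  page: string; // page slug (assumed dedup key)
  text: string;
  score: number;
}

type Reranker = (query: string, hits: Hit[]) => Promise<Hit[]>;

// Reciprocal Rank Fusion: each list adds 1 / (k + rank) per hit (rank is 1-based).
function rrf(lists: Hit[][], k = 60): Hit[] {
  const fused = new Map<string, Hit>();
  for (const list of lists) {
    list.forEach((hit, i) => {
      const prev = fused.get(hit.id);
      const score = (prev ? prev.score : 0) + 1 / (k + i + 1);
      fused.set(hit.id, { ...hit, score });
    });
  }
  return Array.from(fused.values()).sort((x, y) => y.score - x.score);
}

// Keep only the best-ranked chunk per page.
function dedupByPage(hits: Hit[]): Hit[] {
  const seen = new Set<string>();
  return hits.filter((h) => {
    if (seen.has(h.page)) return false;
    seen.add(h.page);
    return true;
  });
}

// Greedy budget using a rough ~4 chars/token estimate.
function applyTokenBudget(hits: Hit[], maxTokens: number): Hit[] {
  const out: Hit[] = [];
  let used = 0;
  for (const h of hits) {
    const cost = Math.ceil(h.text.length / 4);
    if (used + cost > maxTokens) break;
    out.push(h);
    used += cost;
  }
  return out;
}

async function searchTail(
  query: string,
  lists: Hit[][],
  opts: { reranker?: Reranker; maxTokens: number; onRerankError?: (err: unknown) => void },
): Promise<Hit[]> {
  let hits = dedupByPage(rrf(lists));
  if (opts.reranker) {
    try {
      hits = await opts.reranker(query, hits);
    } catch (err) {
      // Fail open: keep RRF order and report the failure (gbrain audit-logs
      // these to ~/.gbrain/audit/rerank-failures-*.jsonl).
      opts.onRerankError?.(err);
    }
  }
  return applyTokenBudget(hits, opts.maxTokens);
}
```

Placing the reranker before the token budget means the budget trims the reranked order rather than the raw RRF order, and a failing reranker degrades to the pre-reranker result instead of an error.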

`KNOBS_HASH_VERSION` bumps 1 -> 2 to fold reranker config into the
query_cache row key. Rolling-deploy operators should expect a temporary
cache hit-rate dip + cache-row doubling for hot queries (clears
naturally within `cache.ttl_seconds`, default 3600s).
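Why the version bump causes a temporary dip: a versioned cache key changes for every query once the version changes. A minimal TypeScript sketch, assuming the key is a hash over a version prefix plus stably serialized knobs; the `SearchKnobs` fields, the FNV-1a hash, and `knobsHash` are illustrative, not gbrain's actual query-cache code.

```typescript
// Hypothetical versioned knobs hash.
const KNOBS_HASH_VERSION = 2;

interface SearchKnobs {
  mode: string;
  tokenBudget: number;
  reranker?: { enabled: boolean; model: string; topNIn: number };
}

// Sorted-key serialization so logically equal knobs hash identically;
// undefined fields are dropped.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
}

// 32-bit FNV-1a, returned as 8 hex chars.
function fnv1a32(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

function knobsHash(knobs: SearchKnobs, version: number = KNOBS_HASH_VERSION): string {
  return fnv1a32(`v${version}:${stableStringify(knobs)}`);
}
```

After a rolling deploy, v1 rows no longer match any lookup, so each hot query misses once and writes a v2 row next to its stale v1 row. The old rows age out within `cache.ttl_seconds`.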

Files in this commit are pure docs / version bump:
- VERSION + package.json bump to 0.33.3.0
- CHANGELOG.md release-summary entry with "How to take advantage" block
- CLAUDE.md Key Files annotations for the new recipe + rerank.ts +
  rerank-audit.ts + gateway extensions
- docs/ai-providers/zeroentropy.md one-pager (setup, knob reference,
  failure observability, troubleshooting table)
- skills/migrations/v0.33.3.md (purely informational: no required user
  action; reranker is opt-in everywhere, ZE embedding is opt-in)
- llms-full.txt regenerated to match CLAUDE.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sume-from) (garrytan#1055)

* docs(designs): 2026-05 embedder shootout eval plan

Adds docs/designs/2026_05_EVAL_PLAN.md — the approved plan + 6 Conductor session
briefs for the OpenAI vs Voyage vs ZeroEntropy embedder comparison.

Why: produce a publishable comparison report for v0.35.x release notes pinning
"which embedder wins, and does zerank-2 carry the win for ZeroEntropy" against
public LongMemEval + in-house BrainBench.

Each session brief is self-contained (repo, branch, commits, verify, ship,
deliverable, hand-off), so one Conductor session can steward one section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(pricing): add voyage-4-large + zembed-1 to EMBEDDING_PRICING

v0.35.0.0 shipped ZeroEntropy zembed-1 + zerank-2 reranker support and
expanded the Voyage allow-list to include voyage-4-large. The pricing
table missed both, so `gbrain upgrade`'s post-upgrade reembed prompt
silently fell back to "estimate unavailable" for users on these models.

- voyage:voyage-4-large @ $0.18/MTok (same as voyage-3-large)
- zeroentropyai:zembed-1 @ $0.05/MTok

New test file pins both entries plus the openai/voyage-3-large baselines,
case-insensitive provider matching, bare-model openai-default fallback,
table integrity (lowercase providers, finite non-negative prices), and
the estimateCostFromChars approximation. 11 cases, 46 expect() calls.
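A minimal TypeScript sketch of the lookup behavior those tests pin: lowercase `provider:model` keys, case-insensitive provider matching, bare model names defaulting to OpenAI, and "unavailable" (here `undefined`) for unknown models. The voyage and zembed-1 prices come from this commit. The OpenAI entry uses OpenAI's published text-embedding-3-large list price as an assumed baseline. The ~4 chars/token ratio and all names other than `EMBEDDING_PRICING` are hypothetical.

```typescript
// USD per million tokens, keyed "provider:model" with a lowercase provider.
const EMBEDDING_PRICING: Record<string, number> = {
  "openai:text-embedding-3-large": 0.13, // assumed baseline entry
  "voyage:voyage-3-large": 0.18,
  "voyage:voyage-4-large": 0.18,
  "zeroentropyai:zembed-1": 0.05,
};

function lookupPrice(modelRef: string): number | undefined {
  const sep = modelRef.indexOf(":");
  // Bare model names (no "provider:" prefix) default to OpenAI.
  const provider = sep === -1 ? "openai" : modelRef.slice(0, sep).toLowerCase();
  const model = sep === -1 ? modelRef : modelRef.slice(sep + 1);
  return EMBEDDING_PRICING[`${provider}:${model}`];
}

// undefined means "estimate unavailable".
function estimateCostFromCharsSketch(modelRef: string, chars: number): number | undefined {
  const pricePerMTok = lookupPrice(modelRef);
  if (pricePerMTok === undefined) return undefined;
  const tokens = chars / 4; // rough approximation
  return (tokens / 1_000_000) * pricePerMTok;
}
```

For example, 4,000,000 characters of zembed-1 input is about 1M tokens, or roughly $0.05.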

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(exports): expose gbrain/ai/gateway with canary test

Adds ./ai/gateway to the package.json exports map so external eval
consumers (notably gbrain-evals, the sibling repo running the embedder
shootout in docs/designs/2026_05_EVAL_PLAN.md) can call configureGateway
directly to swap embedding providers per cell.

Why: pre-v0.35.1.0, gbrain-evals adapters hardcoded gbrain/embedding,
which means every retrieval adapter was OpenAI-only. The newly-exposed
gateway lets adapters route through Voyage and ZeroEntropy without
forking gbrain or duplicating the recipe wiring.

- package.json: add "./ai/gateway" -> "./src/core/ai/gateway.ts"
- scripts/check-exports-count.sh: bump expected count 17 -> 18
- test/public-exports.test.ts: add canary pinning configureGateway + embed,
  bump expected count assertion

Pre-existing import-resolution failures in this test file (16 on master)
are unrelated to this change — they're a longstanding Bun package
self-import behavior. The count + EXPECTED_EXPORTS list-match assertions
both pass cleanly.
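The count-plus-canary idea behind these checks can be sketched in TypeScript. This is a hypothetical analogue of scripts/check-exports-count.sh and the canary test, not their actual contents. `checkExportsMap` takes an already-parsed package.json and only compares plain string targets, so conditional export objects would be reported as mismatches.

```typescript
// Hypothetical exports-map check: exact entry count plus required subpaths.
interface PackageJson {
  exports?: Record<string, string | Record<string, string>>;
}

function checkExportsMap(
  pkg: PackageJson,
  expectedCount: number,
  required: Record<string, string>,
): string[] {
  const exportsMap = pkg.exports ?? {};
  const problems: string[] = [];
  const count = Object.keys(exportsMap).length;
  if (count !== expectedCount) {
    problems.push(`expected ${expectedCount} exports, found ${count}`);
  }
  // Canary: each required subpath must point at the expected source file.
  for (const subpath of Object.keys(required)) {
    if (exportsMap[subpath] !== required[subpath]) {
      problems.push(`${subpath} should map to ${required[subpath]}`);
    }
  }
  return problems;
}
```

Pinning the exact count, not a minimum, forces anyone who adds or removes a public subpath to update the expectation deliberately, which is why this commit bumps 17 -> 18.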

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): add --resume-from <jsonl> to gbrain eval longmemeval

Multi-cell embedder shootouts spend $50+/cell on the gpt-4o judge after
gbrain emits hypotheses. A mid-run abort (rate-limit, cost-cap, OS
interrupt, SIGKILL) previously meant re-paying the full cell. This flag
makes those aborts cheap: re-invoke with --resume-from pointed at the
partial JSONL and only the unanswered question_ids re-run.

Behavior:
- Read question_ids from the file; skip them on this run.
- Rows with non-empty hypothesis count as done.
- Rows with hypothesis="" AND an error field are NOT skipped (retry case
  for per-question failures recorded by the existing try/catch).
- Corrupt trailing lines (SIGKILL'd writer mid-line) are silently skipped
  with a stderr warn.
- When --resume-from path == --output path, the output emitter opens the
  file in append mode instead of truncating, so the existing rows survive.
- Empty resume case (all questions already done) returns immediately
  without spinning up the brain or calling the client.

New exported helper loadResumeSet() makes the parser unit-testable.
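The resume rules above can be sketched in TypeScript, operating on JSONL text rather than a path. `parseResumeSet`, `outputOpenMode`, and `pendingQuestions` are hypothetical stand-ins for loadResumeSet() and the emitter logic. Treating an empty hypothesis without an error field as "not done" is an assumption, and the path comparison is a plain string match with no path resolution.

```typescript
interface HypothesisRow {
  question_id?: string;
  hypothesis?: string;
  error?: string;
}

function parseResumeSet(jsonl: string, warn: (msg: string) => void = () => {}): Set<string> {
  const done = new Set<string>();
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    let row: HypothesisRow;
    try {
      row = JSON.parse(line) as HypothesisRow;
    } catch {
      // A writer killed mid-line leaves a truncated fragment; skip it.
      warn(`skipping unparseable line ${i + 1}`);
      return;
    }
    // Only a non-empty hypothesis counts as done; error rows are retried.
    if (typeof row.question_id === "string" && typeof row.hypothesis === "string" && row.hypothesis.length > 0) {
      done.add(row.question_id);
    }
  });
  return done;
}

// Append when resuming into the output file itself so finished rows survive.
function outputOpenMode(outputPath: string, resumeFrom?: string): "a" | "w" {
  return resumeFrom !== undefined && resumeFrom === outputPath ? "a" : "w";
}

// An empty result is the early-return case: nothing to run, no client calls.
function pendingQuestions(all: string[], done: Set<string>): string[] {
  return all.filter((id) => !done.has(id));
}
```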

6 new test cases pinning:
- File-not-found returns empty set
- Well-formed JSONL load
- Error-row retry semantics (empty hypothesis + error -> not in set)
- Truncated final line recovery
- End-to-end resume against the 5-question mini fixture
- All-done early-return (stub client must NOT be invoked)

All 18 cases in test/eval-longmemeval.test.ts green; bun run typecheck
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: v0.35.1.0

Bumps VERSION + package.json + CHANGELOG entry for the embedder-shootout
prereq release. Three additive changes from the prior 4 commits:

- pricing: voyage-4-large + zembed-1 entries
- exports: gbrain/ai/gateway is now public
- eval: gbrain eval longmemeval --resume-from <jsonl>

Each commit on this branch is independently bisect-friendly and CI-green;
the CHANGELOG entry is the user-facing rollup. No migrations, no breaking
changes — the gateway export expands the surface, the resume-from flag is
additive, the pricing patch only changes "estimate unavailable" -> a real
dollar figure for two specific models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ytan#1056)

* fix(eval): longmemeval adapter handles _s split + sanitizes session_id slugs

Three tightly-coupled bugs blocked `gbrain eval longmemeval` against the
public LongMemEval _s split from HuggingFace (the dataset every shootout
cell needs):

1. HAYSTACK SHAPE: the _s split serializes haystack_sessions as
   LongMemEvalTurn[][] (each inner array is one session's turns directly)
   plus a parallel `haystack_session_ids: string[]` field. The
   pre-v0.35.1.1 adapter expected only the oracle `{session_id, turns}`
   shape and crashed with `session.turns is undefined` on every question.
   Fix: new `normalizeSessions` helper accepts both shapes, mirroring the
   proven `normalizeSessions` in gbrain-evals/eval/runner/longmemeval.ts.

2. SLUG VALIDATOR: the _s split's session_ids look like
   `sharegpt_yywfIrx_0` — underscored and mixed-case. The v0.32.7 CJK
   wave's `validatePageSlug` rejects both (allowed set is `[a-z0-9-]`
   case-insensitive, slash-separated). Fix: `sanitizeSessionIdForSlug`
   lowercases and replaces `_` + `.` + any other non-[a-z0-9-] character
   with `-`. The frontmatter `session_id:` keeps the original verbatim
   for downstream JSONL emit; only the SLUG is rewritten.

3. INTERFACE: `LongMemEvalQuestion.haystack_sessions` typed as a union
   of `LongMemEvalSession[] | LongMemEvalTurn[][]` so TypeScript callers
   see both shapes are accepted. New `haystack_session_ids?: string[]`
   field documented as parallel to the array-of-turns shape.
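A minimal TypeScript sketch of the three fixes above. The sanitization rule (lowercase, map anything outside `[a-z0-9-]` to `-`) and the `lme_<question_id>_<i>` fallback come from this commit. The turn fields, the `...Sketch` function names, and `sessionPage` are hypothetical, not the adapter's actual code.

```typescript
interface LongMemEvalTurn { role: string; content: string; }
interface LongMemEvalSession { session_id: string; turns: LongMemEvalTurn[]; }

function sanitizeSessionIdForSlugSketch(id: string): string {
  // Lowercase, then replace "_", "." and any other non-[a-z0-9-] char with "-".
  return id.toLowerCase().replace(/[^a-z0-9-]/g, "-");
}

function normalizeSessionsSketch(
  questionId: string,
  sessions: LongMemEvalSession[] | LongMemEvalTurn[][],
  sessionIds?: string[],
): LongMemEvalSession[] {
  const items = sessions as Array<LongMemEvalSession | LongMemEvalTurn[]>;
  return items.map((s, i) => {
    if (Array.isArray(s)) {
      // _s split: bare turn arrays plus a parallel id list (synthesized if absent).
      return { session_id: sessionIds?.[i] ?? `lme_${questionId}_${i}`, turns: s };
    }
    return s; // oracle shape already has session_id + turns
  });
}

// Only the slug is rewritten; the frontmatter keeps the original id verbatim.
function sessionPage(session: LongMemEvalSession): { slug: string; frontmatterSessionId: string } {
  return {
    slug: sanitizeSessionIdForSlugSketch(session.session_id),
    frontmatterSessionId: session.session_id,
  };
}
```

For example, `sharegpt_yywfIrx_0` becomes the slug `sharegpt-yywfirx-0` while `session_id: sharegpt_yywfIrx_0` stays in the frontmatter for the JSONL emit.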

Caught before v0.35.1.1 by a fresh pre-spend smoke run (3 questions × ZE
@ 2560 → 3 errors). Post-fix: 3/3 OK with non-empty hypotheses, and
single-session recall is measured (low on a 3-question sample, but the
pipeline runs).

2 new regression test cases pinning:
- _s split shape normalizes (slugs sanitized + frontmatter preserves
  original session_id + dates flow through)
- _s split with missing haystack_session_ids synthesizes
  `lme_<question_id>_<i>` ids

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): configure AI gateway before running gbrain eval longmemeval

v0.28.8 skipped connectEngine() for `gbrain eval longmemeval` so the
subcommand could run on machines without a configured brain. Side
effect (silent until v0.35.1.0 made it observable via the embedder
shootout): the gateway was never configureGateway()'d either, so the
first embed call inside importFromContent crashed with "AI gateway is
not configured. Call configureGateway() during engine connect."

Fix: call configureGateway() before runEvalLongMemEval, mirroring the
connectEngine() path. Reads `~/.gbrain/config.json` when present; falls
back to env vars (GBRAIN_EMBEDDING_MODEL, GBRAIN_EMBEDDING_DIMENSIONS,
OPENAI_API_KEY, etc.) when there's no config — preserving the v0.28.8
"runs on fresh machine" property.

Gated on the --help short-circuit so `gbrain eval longmemeval --help`
still works without spinning up the gateway.
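The ordering fix can be sketched in TypeScript with injected dependencies: handle `--help` first, then resolve settings (config file when present, env vars otherwise), configure the gateway, and only then run the eval. `GatewaySettings`, `resolveGatewaySettings`, and the deps object are hypothetical. `configure` and `run` stand in for configureGateway() and runEvalLongMemEval, whose real signatures are not shown here.

```typescript
interface GatewaySettings {
  embeddingModel?: string;
  embeddingDimensions?: number;
  openaiApiKey?: string;
}

type Env = Record<string, string | undefined>;

function settingsFromEnv(env: Env): GatewaySettings {
  const dims = env.GBRAIN_EMBEDDING_DIMENSIONS;
  return {
    embeddingModel: env.GBRAIN_EMBEDDING_MODEL,
    embeddingDimensions: dims ? Number(dims) : undefined,
    openaiApiKey: env.OPENAI_API_KEY,
  };
}

// The config file wins when present; env vars keep fresh machines working.
function resolveGatewaySettings(fileConfig: GatewaySettings | undefined, env: Env): GatewaySettings {
  return fileConfig ?? settingsFromEnv(env);
}

interface LongMemEvalDeps {
  loadConfig: () => GatewaySettings | undefined; // e.g. ~/.gbrain/config.json if it exists
  env: Env;
  configure: (settings: GatewaySettings) => void; // stands in for configureGateway()
  run: (args: string[]) => Promise<void>;          // stands in for runEvalLongMemEval
  printHelp: () => void;
}

async function longMemEvalCommand(args: string[], deps: LongMemEvalDeps): Promise<void> {
  if (args.includes("--help")) {
    deps.printHelp(); // help needs no gateway
    return;
  }
  // Configure before the first embed call; connectEngine() is still skipped.
  deps.configure(resolveGatewaySettings(deps.loadConfig(), deps.env));
  await deps.run(args);
}
```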

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: v0.35.1.1

Bumps VERSION + package.json + CHANGELOG entry for the longmemeval fix
wave. Three commits this branch:

1. fix(eval): adapter handles _s split + sanitizes session_id slugs
2. fix(cli): configure AI gateway before running gbrain eval longmemeval
3. chore: v0.35.1.1

Each commit independently bisects; CHANGELOG entry is the user-facing
rollup. No schema migration; no breaking change.

Caught pre-spend by smoking Phase 1 of the embedder shootout — would
otherwise have wasted ~$476 in judge tokens across 7 cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: retrigger workflows

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 17, 2026

Important

Review skipped

Too many files!

This PR contains 217 files, which is 67 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4b74728d-8906-49c3-bd77-fafa2d4ac786

📥 Commits

Reviewing files that changed from the base of the PR and between 172dbcc and 6336047.

📒 Files selected for processing (217)
  • .github/workflows/e2e.yml
  • AGENTS.md
  • CHANGELOG.md
  • CLAUDE.md
  • INSTALL_FOR_AGENTS.md
  • README.md
  • SECURITY.md
  • TODOS.md
  • VERSION
  • docs/ai-providers/zeroentropy.md
  • docs/architecture/topologies.md
  • docs/designs/2026_05_EVAL_PLAN.md
  • docs/eval/METRIC_GLOSSARY.md
  • docs/eval/SEARCH_MODE_METHODOLOGY.md
  • docs/mcp/DEPLOY.md
  • llms-full.txt
  • openclaw.plugin.json
  • package.json
  • plugins/gbrain-codex/.codex-plugin/plugin.json
  • plugins/gbrain-codex/package.json
  • plugins/openclaw-gbrain/README.md
  • scripts/check-eval-glossary-fresh.sh
  • scripts/check-exports-count.sh
  • scripts/generate-metric-glossary.ts
  • scripts/update-local-install.sh
  • skills/RESOLVER.md
  • skills/conventions/search-modes.md
  • skills/manifest.json
  • skills/migrations/v0.33.0.md
  • skills/migrations/v0.33.3.0.md
  • skills/migrations/v0.34.0.0.md
  • skills/migrations/v0.35.0.0.md
  • src/cli.ts
  • src/commands/auth.ts
  • src/commands/autopilot.ts
  • src/commands/book-mirror.ts
  • src/commands/cache.ts
  • src/commands/check-update.ts
  • src/commands/claw-test.ts
  • src/commands/code-callees.ts
  • src/commands/code-callers.ts
  • src/commands/config.ts
  • src/commands/doctor.ts
  • src/commands/edges-backfill.ts
  • src/commands/embed.ts
  • src/commands/eval-code-retrieval.ts
  • src/commands/eval-compare.ts
  • src/commands/eval-longmemeval.ts
  • src/commands/eval-replay.ts
  • src/commands/eval-run-all.ts
  • src/commands/eval-whoknows.ts
  • src/commands/eval.ts
  • src/commands/import.ts
  • src/commands/init-mode-picker.ts
  • src/commands/init.ts
  • src/commands/models.ts
  • src/commands/search.ts
  • src/commands/serve-http.ts
  • src/commands/serve.ts
  • src/commands/sync.ts
  • src/commands/upgrade.ts
  • src/commands/whoknows.ts
  • src/core/ai/dims.ts
  • src/core/ai/gateway.ts
  • src/core/ai/model-resolver.ts
  • src/core/ai/recipes/index.ts
  • src/core/ai/recipes/litellm-proxy.ts
  • src/core/ai/recipes/voyage.ts
  • src/core/ai/recipes/zeroentropyai.ts
  • src/core/ai/types.ts
  • src/core/chunkers/code.ts
  • src/core/chunkers/edge-extractor.ts
  • src/core/chunkers/symbol-resolver.ts
  • src/core/code-intel/recursive-walk.ts
  • src/core/code-intel/sinks/index.ts
  • src/core/code-intel/sinks/py.ts
  • src/core/code-intel/sinks/ts.ts
  • src/core/code-intel/traversal-cache.ts
  • src/core/cycle.ts
  • src/core/cycle/extract-facts.ts
  • src/core/doctor-remote.ts
  • src/core/embedding-pricing.ts
  • src/core/embedding.ts
  • src/core/engine.ts
  • src/core/eval-capture-graph.ts
  • src/core/eval/drift-watch.ts
  • src/core/eval/metric-glossary.ts
  • src/core/git-remote.ts
  • src/core/import-checkpoint.ts
  • src/core/migrate.ts
  • src/core/minions/child-worker-supervisor.ts
  • src/core/minions/supervisor.ts
  • src/core/minions/tools/brain-allowlist.ts
  • src/core/minions/worker.ts
  • src/core/oauth-provider.ts
  • src/core/operations-descriptions.ts
  • src/core/operations.ts
  • src/core/pglite-engine.ts
  • src/core/pglite-schema.ts
  • src/core/postgres-engine.ts
  • src/core/rerank-audit.ts
  • src/core/schema-embedded.ts
  • src/core/search/hybrid.ts
  • src/core/search/intent-weights.ts
  • src/core/search/mode.ts
  • src/core/search/query-cache.ts
  • src/core/search/rerank.ts
  • src/core/search/telemetry.ts
  • src/core/search/token-budget.ts
  • src/core/search/two-pass.ts
  • src/core/sort-newest-first.ts
  • src/core/sources-ops.ts
  • src/core/types.ts
  • src/eval/code-retrieval/harness.ts
  • src/eval/code-retrieval/questions.json
  • src/eval/code-retrieval/strategies.ts
  • src/eval/longmemeval/adapter.ts
  • src/mcp/dispatch.ts
  • src/mcp/http-transport.ts
  • src/mcp/server.ts
  • src/mcp/tool-defs.ts
  • src/schema.sql
  • test/ai/dims-zeroentropy.test.ts
  • test/ai/embedQuery.test.ts
  • test/ai/gateway.test.ts
  • test/ai/rerank.test.ts
  • test/ai/zeroentropy-compat-fetch.test.ts
  • test/ai/zeroentropy-recipe.test.ts
  • test/autopilot-supervisor-wiring.test.ts
  • test/benchmark-knowledge-runtime.ts
  • test/benchmark-put-page-latency.ts
  • test/book-mirror.test.ts
  • test/check-update.test.ts
  • test/child-worker-supervisor.test.ts
  • test/chunker-timeout.test.ts
  • test/code-intel/edge-densification.test.ts
  • test/code-intel/eval-capture-graph.test.ts
  • test/code-intel/recursive-walk.test.ts
  • test/code-intel/scope-walker-resolution.test.ts
  • test/code-intel/traversal-cache.test.ts
  • test/code-retrieval-harness.test.ts
  • test/commands-search.test.ts
  • test/commands/models.serial.test.ts
  • test/config-unset.test.ts
  • test/core/cycle.serial.test.ts
  • test/doctor-remote.test.ts
  • test/doctor-search-mode.test.ts
  • test/drift-watch.test.ts
  • test/e2e/cli-source-scoping-pglite.test.ts
  • test/e2e/code-intel-mcp-ops-pglite.test.ts
  • test/e2e/cycle.test.ts
  • test/e2e/dream-cycle-phase-order-pglite.test.ts
  • test/e2e/embed-stale-pagination.test.ts
  • test/e2e/eval-contradictions-postgres.test.ts
  • test/e2e/graph-quality.test.ts
  • test/e2e/mechanical.test.ts
  • test/e2e/postgres-bootstrap.test.ts
  • test/e2e/source-isolation-pglite.test.ts
  • test/e2e/source-routing.test.ts
  • test/e2e/symbol-resolver-pglite.test.ts
  • test/e2e/whoknows.test.ts
  • test/e2e/zeroentropy-live.test.ts
  • test/edge-extractor.test.ts
  • test/embed.serial.test.ts
  • test/embedding-pricing.test.ts
  • test/eval-compare.test.ts
  • test/eval-contradictions-integrations.test.ts
  • test/eval-longmemeval.test.ts
  • test/eval-run-all.test.ts
  • test/eval-whoknows.test.ts
  • test/extract-facts-phase.test.ts
  • test/find-experts-op.test.ts
  • test/fixtures/whoknows-eval.jsonl
  • test/get-brain-identity.test.ts
  • test/git-remote.test.ts
  • test/helpers/schema-diff-indexes.test.ts
  • test/helpers/schema-diff.ts
  • test/hybrid-search-lite.serial.test.ts
  • test/import-checkpoint.test.ts
  • test/import-resume.test.ts
  • test/init-mode-picker.test.ts
  • test/install-contract.test.ts
  • test/intent-weights.test.ts
  • test/local-updater-contract.test.ts
  • test/mcp-tool-defs.test.ts
  • test/metric-glossary.test.ts
  • test/migrate.test.ts
  • test/oauth.test.ts
  • test/openai-compat-multimodal.test.ts
  • test/operation-context-sourceid-required.test.ts
  • test/pglite-engine.test.ts
  • test/public-exports.test.ts
  • test/put-page-namespace.test.ts
  • test/query-cache-knobs-hash.test.ts
  • test/query-cache.test.ts
  • test/reindex-code.test.ts
  • test/rerank-audit.test.ts
  • test/schema-bootstrap-coverage.test.ts
  • test/search-mode.test.ts
  • test/search-telemetry.test.ts
  • test/search-types-filter.test.ts
  • test/search/hybrid-reranker-integration.test.ts
  • test/search/knobs-hash-reranker.test.ts
  • test/search/rerank.test.ts
  • test/serve-stdio-lifecycle.test.ts
  • test/skillpack-sync-guard.test.ts
  • test/sort-newest-first.test.ts
  • test/source-id-tx-regression.test.ts
  • test/sources-mcp.test.ts
  • test/supervisor.test.ts
  • test/token-budget.test.ts
  • test/upgrade.test.ts
  • test/voyage-multimodal.test.ts
  • test/voyage-response-cap.test.ts
  • test/whoknows-doctor.test.ts
  • test/whoknows.test.ts
  • test/worker-rss.test.ts


100yenadmin force-pushed the eva/merge-upstream-v0.35.1.1 branch 6 times, most recently from 9f7229a to 8440fb1 on May 17, 2026 06:48
100yenadmin marked this pull request as ready for review on May 17, 2026 07:05
100yenadmin force-pushed the eva/merge-upstream-v0.35.1.1 branch from 8440fb1 to de69dd0 on May 17, 2026 07:10
…m-v0.35.1.1

# Conflicts:
#	AGENTS.md
#	README.md
#	llms-full.txt
#	src/cli.ts
#	src/commands/auth.ts
#	src/commands/embed.ts
#	src/commands/eval-replay.ts
#	src/commands/upgrade.ts
#	src/core/ai/dims.ts
#	src/core/ai/gateway.ts
#	src/core/embedding-pricing.ts
#	src/core/engine.ts
#	src/core/pglite-engine.ts
#	src/core/postgres-engine.ts
#	src/core/search/hybrid.ts
#	src/core/types.ts
#	test/ai/gateway.test.ts
#	test/book-mirror.test.ts
#	test/eval-contradictions-integrations.test.ts
#	test/voyage-response-cap.test.ts
100yenadmin force-pushed the eva/merge-upstream-v0.35.1.1 branch from de69dd0 to 6336047 on May 17, 2026 08:58
100yenadmin merged commit 4a5df13 into master on May 17, 2026
13 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Merge upstream GBrain v0.35.1.1 while preserving Eva OpenClaw defaults

3 participants