Skip to content

v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes#1182

Open
garrytan wants to merge 33 commits into
masterfrom
garrytan/fix-wave
Open

v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes#1182
garrytan wants to merge 33 commits into
masterfrom
garrytan/fix-wave

Conversation

@garrytan
Copy link
Copy Markdown
Owner

@garrytan garrytan commented May 18, 2026

Summary

v0.36.1.0 — community-reported bug fix wave. 28 atomic fixes in this PR, one per commit, each with its regression test. No new features, no breaking changes.

Phase 1 — closed as already shipped on master: 22 PRs + 14 issues across six clusters.

Phase 2 — 28 atomic commits in this PR. Per-fix bisect-friendly history.

P0 — install / upgrade was broken

P1 — data loss / silent wrong-destination

P2 — security / privilege

P3 — reliability / UX

Maintenance

  • f0ba8447 Quarantine upgrade tests that use process.env mutation pattern

Deferred to follow-up PRs (with reasoning)

Plan Completion

Full plan at ~/.claude/plans/system-instruction-you-are-working-proud-gosling.md. Plan-eng-review passed with 5 issues resolved + codex outside-voice processed (22 findings; 3 cross-model tensions resolved, 19 additive corrections folded directly into the plan).

Test plan

  • bun run verify — typecheck + privacy + isolation + admin-build + WASM-embedded + admin-embedded — all pass
  • bun test — 656 tests across 38 files, all pass in isolation; one pre-existing cross-file flake (chunk-grain-fts.test.ts PGLite beforeEach timeout) reproduces on master without these changes
  • Manual smoke: gbrain init --pglite && gbrain apply-migrations --yes walks full chain through v0.32.2 on fresh install
  • CI: brain-writer test fix pushed; should go green
  • bun run test:e2e — Postgres E2E lifecycle to be run before merge (DATABASE_URL gated)
  • Post-merge: smoke gbrain config set openai_api_key sk-test does NOT print key value
  • Post-merge: fresh gbrain serve --http returns 200 on curl http://localhost:3131/admin

Co-Authored-By: aadachi aadachi@users.noreply.github.com
Co-Authored-By: Aashiqe10 Aashiqe10@users.noreply.github.com
Co-Authored-By: amreshtech amreshtech@users.noreply.github.com
Co-Authored-By: ArshyaAI ArshyaAI@users.noreply.github.com
Co-Authored-By: bnc-ss bnc-ss@users.noreply.github.com
Co-Authored-By: clarajohan clarajohan@users.noreply.github.com
Co-Authored-By: Cossackx Cossackx@users.noreply.github.com
Co-Authored-By: curtitoo curtitoo@users.noreply.github.com
Co-Authored-By: DF-FrancoOS DF-FrancoOS@users.noreply.github.com
Co-Authored-By: DmitryBMsk DmitryBMsk@users.noreply.github.com
Co-Authored-By: hnshah hnshah@users.noreply.github.com
Co-Authored-By: jeunessima jeunessima@users.noreply.github.com
Co-Authored-By: johnybradshaw johnybradshaw@users.noreply.github.com
Co-Authored-By: kkroo kkroo@users.noreply.github.com
Co-Authored-By: lubos-buracinsky lubos-buracinsky@users.noreply.github.com
Co-Authored-By: lukejduncan lukejduncan@users.noreply.github.com
Co-Authored-By: mnuradli1 mnuradli1@users.noreply.github.com
Co-Authored-By: mvanhorn mvanhorn@users.noreply.github.com
Co-Authored-By: navin-moorthy navin-moorthy@users.noreply.github.com
Co-Authored-By: nezovskii nezovskii@users.noreply.github.com
Co-Authored-By: p3ob7o p3ob7o@users.noreply.github.com
Co-Authored-By: panda850819 panda850819@users.noreply.github.com
Co-Authored-By: sergeclaesen sergeclaesen@users.noreply.github.com
Co-Authored-By: sharziki sharziki@users.noreply.github.com
Co-Authored-By: sliday sliday@users.noreply.github.com
Co-Authored-By: vincedk-alt vincedk-alt@users.noreply.github.com
Co-Authored-By: 100yenadmin 100yenadmin@users.noreply.github.com

🤖 Generated with Claude Code

garrytan and others added 30 commits May 18, 2026 12:57
Terraform repos were invisible to `gbrain sync --strategy code` because
the three HCL-family extensions never reached the file walker. Silent
data loss — the user thinks the sync covered the repo but the IaC layer
was dropped on the floor.

detectCodeLanguage() returns null for these extensions, so the chunker
falls back to recursive (no tree-sitter grammar for HCL) — the same
path toml/yaml take.

Closes #878.

Co-Authored-By: johnybradshaw <johnybradshaw@users.noreply.github.com>
`gbrain upgrade --strategy bun` was failing on canonical
`bun install -g github:garrytan/gbrain` installs because `execSync('bun
update gbrain')` ran in the user's shell cwd. Bun's update operates on
whatever package.json it finds via cwd-walk, so a user not standing in
the global root got "No package.json, so nothing to update".

resolveBunGlobalRoot() returns the right directory:
1. `$BUN_INSTALL/install/global` when set (operator override).
2. `~/.bun/install/global` (Bun's documented default).
3. Walk up from realpath(argv[1]) looking for `node_modules/gbrain` —
   handles non-standard installs without trusting argv naming.

execFileSync replaces execSync (no shell), with cwd pinned. Error path
prints the exact `cd && bun update` recovery command instead of a vague
hint.

Closes #1029. Cherry-picked from PR #1032.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>
)

`gbrain config set openai_api_key sk-...` was echoing the full key to
stderr via `console.log('Set %s = %s', key, value)`. Shell scrollback and
tmux scroll buffers commonly retain stderr for hours; a screen-share or
shoulder-glance during set leaked the secret.

The `show` path already redacted but used a naive `.includes('key')`
substring check that would mask 'monkey' or 'parsekey' (no false-negative
but ugly).

Single source of truth: `isSensitiveConfigKey()` uses a word-boundary
regex (`(^|[._-])(key|secret|token|password|pwd|passwd|auth)([._-]|$)/i`)
so 'openai_api_key' matches but 'monkey' doesn't. `redactConfigValue()`
composes the postgresql:// URL redactor + sensitive-key check, used by
both `show` and `set`. Helpers exported for unit tests.

Closes #892. Cherry-pick of @sharziki's PR #918 (config.ts hunk only —
the extract.ts walker change in that PR is unrelated and tracked in #202).

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>
`verifyAccessToken` was throwing bare `Error` on expired or invalid
tokens. The MCP SDK's `requireBearerAuth` middleware catches
`InvalidTokenError` and returns 401 with WWW-Authenticate; bare Error
falls through to 500. Result: legitimate clients with stale tokens hit
500-not-401, so token-refresh logic (which keys off 401) never fires.

Two call sites in verifyAccessToken: token-expired path and
invalid-token path. Both now throw InvalidTokenError. Existing tests
continue to pass because they assert on the throw, not the message class.

Closes #935. Cherry-picked from PR #1012.

Co-Authored-By: Aashiqe10 <Aashiqe10@users.noreply.github.com>
MCP Streamable HTTP spec says GET /mcp opens an optional SSE backchannel
for server-initiated messages. gbrain's transport is stateless and
doesn't push server-initiated messages, so per spec we MUST return 405
with Allow: POST, DELETE — not 404. Probing clients (claude.ai, etc.)
distinguish "endpoint exists, no SSE channel" from "endpoint missing"
on this status code; 404 makes them give up.

Cherry-picked from PR #1076.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>
`gbrain doctor` warned about a missing whoknows fixture for every install
that wasn't standing in the gbrain source repo at run time — which is
everyone. The check used `process.cwd()` to locate the fixture, so any
real user (running doctor against `~/.gbrain`) saw a spurious warning.

`resolveWhoknowsFixturePath()` walks up from `import.meta.url` looking
for the source-repo signature (`src/cli.ts` + `skills/RESOLVER.md`),
respects `GBRAIN_WHOKNOWS_FIXTURE_PATH` env override (absolute or
cwd-relative), and returns null with an actionable warning when the
fixture can't be located.

Closes #969. Cherry-picked from PR #1034.

Co-Authored-By: mvanhorn <mvanhorn@users.noreply.github.com>
`gbrain frontmatter validate --fix` and `gbrain frontmatter generate
--fix` wrote `<file>.bak` siblings into the source tree. Users running
gbrain over a brain repo found .bak files scattered through people/,
companies/, etc. that broke gitignore expectations and showed up in
`git status` after every fix pass.

Backups now land under `~/.gbrain/backups/frontmatter/<run-id>/<rel>.bak`
with an iso-week-sorted run-id so a multi-fix session keeps the same
parent directory. Backup directory + per-file structure mirrored from
the original file's relative path. The .bak safety contract is intact
for both git and non-git brain repos.

Also adds `--include-catch-all` opt-in to `frontmatter generate` so the
default catch-all rule (`type: note`) is no longer applied to arbitrary
workspace documents that happen to live under a brain root.

Closes #902. Cherry-picked from PR #903.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>
The GBRAIN_HOME validator rejected every valid Windows path (`C:\\Users\\...`,
`D:\\gbrain`, etc.) because it used `trimmed.startsWith('/')` to check for
absoluteness — only POSIX absolute paths pass that. `path.isAbsolute()` is
the cross-platform check.

Same fix for the `..` traversal check: split on both `/` and `\` so
Windows path separators don't sneak `..` through.

Closes #1019. Cherry-picked from PR #1083.

Co-Authored-By: sharziki <sharziki@users.noreply.github.com>
…ipes

Gateway construction was warning on stderr for every recipe with an
embedding touchpoint missing max_batch_tokens — including providers the
brain isn't using. Users on Voyage saw noise about OpenAI / Google /
DashScope / etc. recipes that never get loaded.

Filter the warning to recipes whose provider id is referenced by
`embedding_model` or `embedding_multimodal_model` in the active config.
The structural protection against forgetting max_batch_tokens stays in
place for the recipes that actually run; the noise for unrelated recipes
goes away.

Cherry-picked from PR #1117.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
`gbrain sync` ran `git pull` unconditionally and printed scary stderr
on every cycle for brains that have no `origin` remote (local-only
workflows, single-machine setups, brains initialized via `gbrain init
--pglite` against an arbitrary directory). The pull failed harmlessly
but the noise was confusing and made operators think sync was broken.

`hasOriginRemote()` probes `git remote get-url origin` with stdio
ignored; on failure (`no such remote`), skip the pull, print a single
informational line, and proceed with the local working tree.

Cherry-picked from PR #1119.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
The query cache write was fired with `void promise.catch(...)` — true
fire-and-forget. On a fast CLI invocation (`gbrain query <q>` exits in
~50ms), the process terminates before the cache write commits. Result:
the cache effectively never warms from CLI use; every query is a miss.

`awaitPendingSearchCacheWrites()` tracks each in-flight cache write in a
module-level Set. The CLI dispatcher awaits the set after `query`
finishes formatting output but before the process exits. MCP server path
unchanged (long-lived process, fire-and-forget remains correct).

Cherry-picked from PR #1125.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
…page

A source page that mentions the same entity N times produced N
duplicate "Referenced in" lines on the target. `extractEntityRefs`
returns one EntityRef per occurrence, and the per-ref `hasBacklink`
check reads a snapshot of `target.content` that's frozen at outer
scope — so every iteration sees "no backlink yet" and appends another
gap. The cumulative effect on a long meeting note with multiple
mentions of the same person was visible in PRs landing 3-5 identical
Timeline entries.

Track seen target slugs per source page; cap gaps at one pair.

Cherry-picked from PR #967 with a current-master regression test
covering both markdown-link and Obsidian-wikilink formats in the same
source page.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>
The dream/autopilot maintenance cycle ran the backlinks phase in 'fix'
mode, which writes "Referenced in" timeline bullets into entity pages
every sync. The graph extractor + auto-link path is the canonical link
store during sync/dream/autopilot — the legacy filesystem fixer wrote
markdown that fought with both the user's manual edits and the graph
layer's own timeline.

Cycle now runs backlinks in 'check' mode (audit-only); the materializer
remains available via `gbrain check-backlinks fix` for users who really
want markdown backlinks committed to disk.

Cherry-picked from PR #1027.

Co-Authored-By: sliday <sliday@users.noreply.github.com>
zshenv is the canonical place for env vars in zsh on macOS — zshrc is
sourced only for interactive shells, so vars exported in zshrc don't
reach a non-interactive subprocess like the autopilot wrapper. Users
who exported GBRAIN_DATABASE_URL, OPENAI_API_KEY, or ANTHROPIC_API_KEY
in zshrc and assumed autopilot would inherit them hit silent missing-
secret failures on the LaunchAgent.

Source ~/.zshenv first (always reaches non-interactive shells per zsh
docs), then fall back to ~/.zshrc / ~/.bashrc for users on other
profile conventions.

Cherry-picked from PR #966.

Co-Authored-By: p3ob7o <p3ob7o@users.noreply.github.com>
`gbrain apply-migrations list`, `gbrain apply-migrations --dry-run`, and
the "All migrations up to date" path were returning from the async
function but never calling `process.exit(0)`. The CLI dispatcher in
cli.ts treated the implicit fall-through as exit 1 when the parent
process inspected status via shell scripts, breaking automation that
gates on `apply-migrations list && do-something`.

Three call sites: list, dry-run, and the no-op path. All three now
exit(0) explicitly.

Cherry-picked from PR #1062.

Co-Authored-By: nezovskii <nezovskii@users.noreply.github.com>
`gbrain sync --source-id X` triggered auto-embed for the affected slugs
but `runEmbed` ran with no `--source` flag, so it fell back to the
default source. For non-default-source syncs the page row lives at
(sourceId, slug) — the embed code saw "Page not found" for the right
slug under the wrong source, swallowed the error as best-effort, and
the sync result reported `embedded: 0` for the wrong reason.

`buildAutoEmbedArgs(slugs, sourceId)` is the new helper: when sourceId
is set, prepends `--source X`. Exported for the regression test.

Pairs with the upcoming source-id write-path audit (P1 #8). Cherry-picked
from PR #1120.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
Two related corrections:

1. `gbrain query --no-expand` parsed `--no-expand` as the literal key
   `no_expand` instead of negating the boolean `expand` param. Result:
   the flag was silently ignored and expansion always ran. Now any
   `--no-<key>` where `<key>` is a boolean param flips it false.

2. The `query` op's source-id resolution treated `ctx.sourceId` as
   authoritative, so an explicit per-call `source_id` was overridden by
   the federated read scope. Now per-call `source_id` wins;
   `source_id=__all__` is an explicit opt-out for local cross-source
   search.

Cherry-picked from PR #1124.

Co-Authored-By: hnshah <hnshah@users.noreply.github.com>
The autopilot orphans phase detects orphan PAGES (no inbound links via
page-graph) but never scans FK-child tables. After a bulk delete or a
pre-FK-migration code path, orphan rows can persist indefinitely in
content_chunks, page_versions, tags, takes, raw_data, timeline_entries,
or links — all declared ON DELETE CASCADE, so any orphan row is
unexpected.

`childTableOrphansCheck` enumerates 10 FK columns across 8 tables:
- 8 NOT NULL columns (cascade): any value not in pages.id is an orphan.
- 2 nullable SET NULL columns (links.origin_page_id, files.page_id):
  NULL is valid; only NOT-NULL-but-missing-in-pages counts.

Surfaces paste-ready cleanup SQL when orphans are found.

Cherry-picked from PR #1064.

Co-Authored-By: vincedk-alt <vincedk-alt@users.noreply.github.com>
…cycles

Two compounding bugs under KeepAlive=true:

1. Autopilot tripped its circuit breaker on cycle.status === 'partial',
   not just 'failed'. 'partial' means at least one phase warned/failed
   while others ran — a soft signal, not fatal. On every cycle that
   warned, autopilot logged a failure and the supervisor respawned the
   worker.

2. The orphans phase emitted 'warn' when `count > 20` orphan pages.
   That threshold was tuned for small dev brains; on any corpus past a
   few hundred pages it fires every cycle in steady state. Together
   with bug 1, this produced visible respawn storms.

Fix:
- Autopilot trips only on cycle.status === 'failed'.
- Orphans phase warns by ratio: orphans / total_pages > 0.5 (the real
  "your graph fell apart" signal), not by absolute count.

Cherry-picked from PR #1113.

Co-Authored-By: sergeclaesen <sergeclaesen@users.noreply.github.com>
`embedSubBatch` only validated the FIRST embedding's dimension and never
asserted the response length matched the input length. If a provider
returned fewer embeddings than requested (rate-limit truncation,
malformed response, etc.), the gateway silently indexed an offset-shifted
result — every page after the missing index got the embedding of a
different page's chunk.

Two new guards:
1. `result.embeddings.length === texts.length` — fail loud if any count
   mismatch, with a paste-ready retry hint.
2. Validate dim on EVERY embedding, not just the first.

Cherry-picked from PR #926.

Co-Authored-By: 100yenadmin <100yenadmin@users.noreply.github.com>
…ients

The admin dashboard's /admin/api/register-client endpoint hardcoded
client_credentials and ignored grantTypes, redirectUris, and
tokenEndpointAuthMethod. Result: you couldn't register a browser-based
PKCE client (claude.ai Custom Connector, Cursor, etc.) through the
dashboard — only confidential machine-to-machine clients worked.

Pass grantTypes / redirectUris through to registerClientManual. When
tokenEndpointAuthMethod === 'none', NULL out client_secret_hash so the
SDK's clientAuth middleware skips the hash-vs-plaintext compare that
would otherwise reject the no-secret PKCE flow.

Cherry-picked from PR #1077.

Co-Authored-By: lukejduncan <lukejduncan@users.noreply.github.com>
`runExtractFacts` checked `opts.slugs && opts.slugs.length > 0` to
decide between scoped and full-brain walk. Both `undefined` (caller
omits → full walk intended) AND `[]` (sync no-op → zero work intended)
fall through to the same `else` branch and triggered
`engine.getAllSlugs()`.

On a multi-thousand-page brain, the unintended full walk exceeded
the autopilot-cycle ~600s timeout and dead-lettered the job — visible
in production as `[cycle.extract_facts] start` followed by silence
until `Autopilot stopping (cycle-failure-cap)`.

Use presence (`opts.slugs !== undefined`), not truthiness, to
distinguish the two modes. Empty array is a real incremental no-op.

Closes #1096. Three regression cases in test/extract-facts-phase.test.ts:
slugs=[] no-op, slugs=undefined still walks, slugs=['a'] walks just one.

Co-Authored-By: navin-moorthy <navin-moorthy@users.noreply.github.com>
…1090)

Pre-fix, /admin returned 404 on every globally-installed binary because
serve-http.ts:780 resolved admin/dist via process.cwd(). The admin SPA
files are checked into git but `bun build --compile` does NOT embed
arbitrary directories — only assets imported via `with { type: 'file' }`
ESM imports land in the compiled binary.

Wire:

- scripts/build-admin-embedded.ts walks admin/dist/, emits
  src/admin-embedded.ts with one `with { type: 'file' }` import per
  file + a manifest map (request path → resolved path + mime).
  Auto-invoked by `bun run build:admin`.

- src/admin-embedded.ts is the auto-generated module. Bun resolves
  every file: import to a path that works at runtime inside the
  compiled binary (same pattern as src/core/chunkers/code.ts WASM
  imports).

- serve-http.ts switches to two-tier resolution: cwd-relative
  admin/dist for dev (Vite hot-rebuild), embedded manifest otherwise.
  Embedded path reads bytes lazily and caches per-asset for the
  lifetime of the process.

- scripts/check-admin-embedded.sh CI gate re-runs the generator and
  fails on drift (mirrors check-wasm-embedded.sh). PRs that rebuild
  admin/dist but forget to regenerate the embedded module fail loud.

- package.json wires build:admin-embedded + check:admin-embedded.

Closes #1090.
…#1078)

Audit of every page write path (sync, embed, extract, dream, autopilot,
wikilinks, tags, chunks) confirmed that sourceId already threads
correctly through importFromContent → engine.putPage → SQL INSERT
since v0.18.0. The original bug reports from #891, #978, #1078 were
real at the time and got swept by the multi-source refactor; today's
master is correct.

This commit locks in that correctness with six PGLite regression cases
(no Postgres fixture needed; runs in CI everywhere):

1. importFromContent({sourceId:"work"}) lands at source_id=work, not
   the silent 'default' fallback.
2. Two sources hold the same slug independently.
3. Omitting sourceId falls through to 'default' (legacy contract).
4. Chunks land under the requested source.
5. Tags land under the requested source.
6. FK integrity smoke (originally #1078).

The earlier issue reports stay closed by the existing threading; this
suite ensures any future refactor of the write path can't silently
re-introduce the wrong-source-default bug. The 90-minute write-path
audit budget from the plan resolves here.
`gbrain apply-migrations --yes` was wedging on the v0.11.0 (Minions)
schema phase for PGLite installs. Two compounding bugs:

1. `apply-migrations` pre-flight schema-version warning connects to
   PGLite to read config.version, then disconnects. The brief lock
   hold races with downstream subprocess spawns that try to re-acquire
   it; the 30s lock timeout fires before the parent fully releases.
   Pre-flight is a *warning*; on PGLite it adds no information the
   orchestrators don't already handle. Skip the probe for PGLite.

2. v0.11.0 phase A spawned `gbrain init --migrate-only` as an execSync
   subprocess to apply schema migrations. PGLite is single-writer;
   the subprocess inherits HOME and tries to lock the same DB. On
   Postgres this works (concurrent connections OK); on PGLite it
   deadlocks. Route in-process for PGLite — create + connect +
   initSchema + disconnect directly, skipping the subprocess hop.
   Postgres keeps the legacy execSync path.

Verified: fresh PGLite install now walks the full migration chain
through v0.32.2 (Facts SoR) and lands "All migrations up to date" on
re-run.

Closes #1100.
`gbrain serve --http` regenerated the admin bootstrap token on every
restart and printed it to stderr. In supervisor-managed production
deployments (LaunchAgent, systemd, k8s) every restart leaks the value
into log aggregators and rotates the access for any agent that paste-
copied it.

Two new knobs:

- **GBRAIN_ADMIN_BOOTSTRAP_TOKEN** env var: when set, used as the
  bootstrap secret instead of a fresh per-process token. Validated:
  must match `^[A-Za-z0-9_-]{32,}$` (32-char minimum), else refuse to
  start with a paste-ready generator hint. Failing closed beats
  silently accepting a weak token.

- **--suppress-bootstrap-token** CLI flag: suppresses the printed
  token line entirely. Operator takes responsibility for tracking the
  value out-of-band.

Startup banner now reflects the chosen source:
- `Admin Token: suppressed` when the flag is set.
- `Admin Token: from $GBRAIN_ADMIN_BOOTSTRAP_TOKEN` when env-sourced.
- Full token print only when both are absent (default behavior, dev
  installs).

Closes #1024.

Co-Authored-By: billy-armstrong <billy-armstrong@users.noreply.github.com>
Pre-v0.32 docs and some community templates used a config shape:

  { "provider": "voyage", "model": "voyage-4-large" }

The canonical shape (since the v0.31.12 gateway seam) is:

  { "embedding_model": "voyage:voyage-4-large" }

Users on the legacy shape hit silent fallthrough to the hardcoded
OpenAI default; sync + embed errored out with "OpenAI embedding
requires OPENAI_API_KEY" regardless of their actual provider config.

loadConfig() now translates the legacy keys at parse time:
- emits a one-line stderr nudge with the paste-ready canonical key
- preserves the rest of the config unchanged
- skipped when `embedding_model` is already set (forward-compat)

Closes #1086.

Co-Authored-By: jeunessima <jeunessima@users.noreply.github.com>
PR #1032's cherry-picked tests use the static-snapshot + try/finally
pattern for env vars instead of the project's withEnv() helper. The
test-isolation lint catches process.env mutations outside withEnv to
prevent cross-test leakage in parallel runs.

Renaming to *.serial.test.ts (the quarantine convention) is the
documented out: runs sequentially, no cross-file race. A future cleanup
PR can migrate the tests to withEnv() and drop the quarantine.
…path

The v0.36.x frontmatter backup change (bd60cdfcloses #902) moved
.bak files from sibling-of-source to ~/.gbrain/backups/frontmatter/...
The old test still asserted on the sibling path, so CI failed even
though the production behavior was correct.

Updated assertion contract: backup lands under the injected backupRoot
(test-isolated), the returned backupPath ends in .bak and exists, and
no sibling .bak is created next to the source file. The pre-fix
sibling-path is now a negative assertion.
v0.36.1.0 — community fix wave (28 atomic fixes + 22 PRs closed as
already-shipped + 14 issues triaged).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@garrytan garrytan changed the title fix-wave: community PR triage + 28 atomic fixes v0.36.1.0 fix-wave: community PR triage + 28 atomic fixes May 18, 2026
Resolves VERSION/package.json/CHANGELOG conflicts after master shipped
its own v0.36.1.0 (calibration loop). Re-bumps to v0.36.1.1 so this
wave gets a unique slot. CHANGELOG keeps both entries (0.36.1.1 on top,
0.36.1.0 below).

Regenerated src/admin-embedded.ts to track master's admin/dist filename
rotation (calibration tab added).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@garrytan garrytan changed the title v0.36.1.0 fix-wave: community PR triage + 28 atomic fixes v0.36.1.1 fix-wave: community PR triage + 28 atomic fixes May 19, 2026
garrytan and others added 2 commits May 18, 2026 19:58
After the fix-wave shipped, an audit found 11 commits with no new test
file. Some were inherently structural (build pipelines, shell content)
or had existing test coverage that worked either way; others had real
regression risk with no guard. This commit closes the gaps that matter.

New regression tests for:

- OAuth `verifyAccessToken` throws `InvalidTokenError` (not bare Error)
  on both expired and unknown token paths. Pre-fix, the SDK's
  `requireBearerAuth` middleware fell through to 500 instead of 401 →
  client token-refresh logic never fired (#935).

- `loadConfig` translates legacy `{provider, model}` config shape to
  the canonical `embedding_model: <provider>:<model>`. 3 cases: pure
  legacy → migrated; canonical wins over legacy when both present;
  canonical-only is untouched. Pre-fix, Voyage/Cohere/Mistral users
  silently fell through to OpenAI (#1086).

- `configDir` rejects relative paths; rejects `..` segments via both
  separators (regression guard for the Windows path acceptance fix
  #1019 / cherry-pick #1083).

- `resolveBootstrapToken` (new exported helper extracted from
  `runServeHttp`). 9 cases: unset env generates fresh, valid env
  accepted, hyphens/underscores accepted, < 32 chars rejected, special
  chars rejected, whitespace trimmed, empty string rejected, 32-char
  boundary accepted, 31-char one-short rejected. Security-critical
  validation surface (#1024).

- GET /mcp returns 405 with `Allow: POST, DELETE` (E2E case in
  `serve-http-oauth.test.ts`). Pre-fix, claude.ai and other probing
  MCP clients saw 404 and gave up (#1076).

- apply-migrations `process.exit(0)` on list / dry-run / up-to-date
  paths. Source-shape assertion locks the rule in; shell scripts
  gating on `$?` work (#1062).

- Autopilot wrapper sources `~/.zshenv` BEFORE `~/.zshrc`. zshenv is
  the canonical place for env vars in non-interactive zsh; without
  this ordering, LaunchAgent subprocesses never inherit secrets
  exported in zshrc (#966).

- `test/fix-wave-structural.test.ts` consolidates source-shape
  regression guards for fixes whose behavior is hard to runtime-test
  without heavy mocking: query cache drain (#1125), admin embed
  manifest + handler (#1090), admin register-client PKCE branch
  (#1077), PGLite v0.11.0 phase A in-process routing (#1100), query
  `--no-expand` negation (#1124). 9 source-grep assertions.

Refactored `runServeHttp` to extract `resolveBootstrapToken` as a pure
helper. The boot path now consumes the helper's tagged-union result
({kind:'ok'|'error'}); side effects (`process.exit`, `console.error`)
moved to the caller. Unit-testable without spinning up Express.

Test counts: oauth 71 (was 69), config 20 (was 14), apply-migrations
19 (was 18), autopilot-install 5 (was 4), serve-http-bootstrap-token
9 (new file), fix-wave-structural 9 (new file). Net: +28 cases across
6 files; +1 new exported function with full coverage.

Remaining audit gaps (deferred):
- e82dda0 admin embed E2E (post-deploy curl smoke covers this)
- d93fa81 apply-migrations PGLite chain E2E (already smoke-tested
  manually in the original commit; subprocess test would be flaky in
  CI without DATABASE_URL gating)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both gaps now have real behavior coverage. No DATABASE_URL needed (PGLite
engine), so they run in standard unit CI alongside the rest of the suite.
Serial quarantine because both spawn subprocesses + bind ports / write
tmpdirs.

test/admin-embed-spawn.serial.test.ts (4 cases, ~6s wall-clock):
  - Spawns `gbrain serve --http` from a fresh tmpdir so `process.cwd()/
    admin/dist` does not exist — this forces the embedded-manifest
    branch (the one under test). Pre-fix, this exact setup hit 404.
  - GET /admin/ → 200 + SPA shell HTML (title + #root div), content-type
    text/html.
  - GET /admin/index.html → same body via explicit path.
  - GET /admin/agents → SPA fallback returns index.html for deep links.
  - GET /admin/api/stats → NOT 200 (regression guard: SPA fallback must
    not swallow /admin/api/* routes and silently return HTML to a JSON
    client). Closes #1090.

test/apply-migrations-pglite-spawn.serial.test.ts (3 cases, ~25s):
  - Seeds a fresh PGLite config in a tmpdir, runs `gbrain init
    --migrate-only` + `gbrain apply-migrations --yes --non-interactive`.
    Pre-fix this hit "GBrain: Timed out waiting for PGLite lock" because
    apply-migrations' pre-flight probe + v0.11.0's phase A subprocess
    both wanted the single-writer lock.
  - Asserts exit 0, no "Timed out" string, no "Phase A failed" string,
    brain.pglite file written.
  - Re-run case: idempotent — "All migrations up to date" exits 0
    (also locks in the #1062 exit-code fix end-to-end).
  - --list path exits 0 (third leg of the #1062 contract).
  Closes #1100.

Pinned bootstrap token via GBRAIN_ADMIN_BOOTSTRAP_TOKEN env so the
admin test doesn't have to scrape stderr; the startup banner format
is allowed to drift, the /health probe is the readiness contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant