fix(entities): prefix-expansion resolver + stub-guard + dropped-fact audit #1010
Closed
garrytan wants to merge 32 commits into
…uard
Root cause: resolveEntitySlug fell through to slugify('Jared') → 'jared'
because pg_trgm scores too low on short strings (similarity < 0.4).
writeFactsToFence then stub-created people/jared.md instead of routing
to the existing people/jared-friedman page.
Three-layer fix:
1. resolve.ts: new isBareName() + tryPrefixExpansion() step queries
people/<token>-% then companies/<token>-%, picks highest-connection
match as tiebreaker
2. fence-write.ts: stub-creation guard refuses to create pages with
no directory prefix (bare 'jared' instead of 'people/jared-*')
3. backstop.ts: stubGuardBlocked result routes facts to legacy DB-only
path so they aren't silently dropped
13 new tests (entity-resolve.test.ts), all passing.
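A minimal sketch of the bare-name check and the probe patterns from layer 1 above. The real rules in resolve.ts are not shown here; the single-token rule (no whitespace, slash, or hyphen) is an assumption drawn from the commit descriptions, and `prefixPatterns` is a hypothetical helper.

```typescript
// Hypothetical sketch; the real isBareName() in resolve.ts may differ.
// Assumption: a "bare name" is one token with no directory prefix,
// no whitespace, and no hyphen (e.g. "Jared", not "people/jared-friedman").
export function isBareName(input: string): boolean {
  const trimmed = input.trim();
  return trimmed.length > 0 && !/[\s\/-]/.test(trimmed);
}

// Hypothetical helper: the LIKE patterns to probe, in directory order.
export function prefixPatterns(name: string, dirs: readonly string[]): string[] {
  const token = name.trim().toLowerCase();
  return dirs.map((dir) => `${dir}/${token}-%`);
}
```

With `dirs = ["people", "companies"]`, the input "Jared" would be probed as `people/jared-%` first and `companies/jared-%` second, which matches the order stated in the fix.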
Per /plan-eng-review D1 (mossy-popping-crown.md). The original Wintermute
commit bundled in-progress brain artifacts and operator scripts that don't
belong in the entity-resolution fix:
- docs/proposals/temporal-contradiction-probe.md — proposal doc (separate PR)
- reports/network-intelligence/2026-05-{12,13,14}-1700.md — generated brain artifacts
- scripts/sql.mjs — operator script (separate PR if useful)
- scripts/supersede.mjs — operator script with hardcoded /data/brain path
The entity-resolution + stub-guard + backstop fix in resolve.ts, fence-write.ts,
backstop.ts, and the test file is preserved. Codex P1 docs-privacy finding
also dies here (the banned 'Wintermute' name lived in the stripped proposal).
Plan: ~/.claude/plans/mossy-popping-crown.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…merge-phantoms
Implements the /plan-eng-review + /codex review outcomes for the
entity-resolution prefix-expansion fix (plan: mossy-popping-crown.md).
D2 — entities.prefix_expansion_dirs config key
- PREFIX_EXPANSION_DIRS hardcoded list replaced with a config-driven
list. Default ['people','companies','deals','topics'] matches the
stub-guard's recognized prefix set in fence-write.ts. Custom brains
override via `entities.prefix_expansion_dirs` to support funds/,
advisors/, etc. New exported getPrefixExpansionDirs() resolver
drops bad config entries silently (non-string / empty) and falls
back to defaults.
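The config resolution in D2 could look roughly like the sketch below. The `raw` parameter stands in for whatever the config layer returns for `entities.prefix_expansion_dirs`; falling back to defaults only when no valid entry survives is an assumption, since the commit does not spell out the exact rule.

```typescript
// Defaults as stated in this commit (a later commit adds 'concepts').
const DEFAULT_DIRS: readonly string[] = ["people", "companies", "deals", "topics"];

// Hypothetical sketch; the real getPrefixExpansionDirs() reads the value
// from config itself rather than taking it as a parameter.
export function getPrefixExpansionDirs(raw: unknown): string[] {
  if (!Array.isArray(raw)) return [...DEFAULT_DIRS];
  const cleaned = raw
    .filter((d): d is string => typeof d === "string")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
  // Assumption: fall back to defaults when nothing valid survives.
  return cleaned.length > 0 ? cleaned : [...DEFAULT_DIRS];
}
```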
D3 — narrow column-missing catch
- Bare `catch {}` in tryPrefixExpansion / tryExactSlug / tryFuzzyMatch
replaced with isUndefinedColumnError(err, 'deleted_at') check
(v0.26.9 utils.ts pattern). Genuine failures (pool exhaustion,
lock timeout, network blip) propagate instead of masquerading as
"no prefix match." tryFuzzyMatch retains the additional pg_trgm
operator-missing fallback.
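A sketch of the narrow catch from D3, assuming the driver surfaces the PostgreSQL SQLSTATE on `err.code`. PostgreSQL reports an undefined column as SQLSTATE 42703; the message check and the `queryOrNull` wrapper are hypothetical and are not the v0.26.9 utils.ts code.

```typescript
// Hypothetical sketch of a narrow "column missing" check.
// 42703 is PostgreSQL's undefined_column SQLSTATE; matching the column
// name against the message text is an assumption.
export function isUndefinedColumnError(err: unknown, column: string): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; message?: unknown };
  return e.code === "42703" && typeof e.message === "string" && e.message.includes(column);
}

// Usage shape: swallow only the expected schema gap, rethrow everything else.
export async function queryOrNull<T>(run: () => Promise<T>): Promise<T | null> {
  try {
    return await run();
  } catch (err) {
    if (isUndefinedColumnError(err, "deleted_at")) return null;
    throw err; // pool exhaustion, lock timeout, network errors propagate
  }
}
```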
D4 — dead-code cleanup
- LIMIT 5 → LIMIT 1 (only rows[0] was ever read).
- Collapsed the unreachable `if (rows.length === 1)` branch — the
multi-match path returned the same expression.
D5 — privacy scrub + full coverage matrix (codex P1)
- test/entity-resolve.test.ts seed rewritten with canonical
placeholders (alice-example, bob-example, charlie-example,
dave-example, acme-example, frank-aaa-example, ...). `bun run
check:privacy` and `check:test-names` now pass.
- 20 new tests beyond the original 13: companies/ prefix expansion,
links contribution to connection_count, identical-count tiebreak
(slug ASC), cross-source isolation, getPrefixExpansionDirs config
resolution (default + override + bad-entries fallback + funds/
end-to-end), and the IRON-RULE integration regression: stub-guard
blocks unprefixed slug → backstop path → DB row exists → no
markdown file created.
D6 — correlated subqueries replace full-table GROUP BYs (codex P2)
- tryPrefixExpansion's connection_count rewritten as three
correlated `(SELECT COUNT(*) ... WHERE col = p.id)` lookups. Hits
the existing idx_links_to, idx_links_from, idx_chunks_page indexes
instead of aggregating the entire links + content_chunks tables
per call. Cost now scales with the slug-LIKE match count (~1-3),
not brain size.
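The query shape described in D6 might be sketched as follows. The table and column names (`pages.source_id`, `links.to_page_id`, `links.from_page_id`, `content_chunks.page_id`) are assumptions; only the index names and the three-subquery structure come from the commit text. A real implementation would also need to escape `%` and `_` in the token before building the LIKE pattern.

```typescript
// Hypothetical sketch; column names are assumed, not taken from the schema.
export function buildPrefixExpansionQuery(sourceId: string, dir: string, token: string) {
  // Each COUNT(*) is correlated on p.id, so it can use a per-page index
  // lookup instead of aggregating the whole table.
  const text = `
    SELECT p.slug,
           (SELECT COUNT(*) FROM links l WHERE l.to_page_id = p.id)
         + (SELECT COUNT(*) FROM links l WHERE l.from_page_id = p.id)
         + (SELECT COUNT(*) FROM content_chunks c WHERE c.page_id = p.id)
           AS connection_count
      FROM pages p
     WHERE p.source_id = $1
       AND p.slug LIKE $2
       AND p.deleted_at IS NULL
     ORDER BY connection_count DESC, p.slug ASC
     LIMIT 1`;
  return { text, params: [sourceId, `${dir}/${token}-%`] };
}
```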
D7 — gbrain merge-phantoms operator command
- New src/commands/merge-phantoms.ts: scans for pre-v0.34.5 phantom
unprefixed entity pages (slug NOT LIKE '%/%' AND type IN
person/company/deal/topic/concept AND deleted_at IS NULL), runs
tryPrefixExpansion on each, repoints facts via UPDATE facts SET
entity_slug = canonical, soft-deletes the phantom via
engine.softDeletePage (v0.26.5 destructive-guard).
- Flags: --dry-run, --source <SOURCE_ID>, --json.
- Idempotent — second run produces zero merges.
- Wired into src/cli.ts as CLI_ONLY + CLI_ONLY_SELF_HELP. NOT
exposed as MCP op (destructive, operator-only).
- 10 tests in test/merge-phantoms.test.ts covering dry-run,
single-canonical merge, no-canonical skip, soft-deleted-target
skip, fact dedup, source-id scoping, idempotency, output shape.
- tryPrefixExpansion newly exported from resolve.ts so
merge-phantoms can run the prefix step in isolation without the
exact-slug short-circuit (phantom would otherwise exact-match
itself).
33 tests across the two test files all pass. `bun run verify`
clean (privacy + test-names + jsonb + source-id + progress +
test-isolation + wasm + admin-build + admin-scope-drift +
cli-exec + system-of-record + eval-glossary + typecheck).
Plan: ~/.claude/plans/mossy-popping-crown.md
Codex review: addressed all 3 findings (2 P1 + 1 P2) per the plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erge
Addresses the two new findings from `/codex review` post-implementation
(plan: mossy-popping-crown.md D8 + D9).
D8 (codex P1) — stub-guard fallback was producing legacy-shape rows
that tripped the v0.32.2 extract_facts reconciliation guard
- backstop.ts stub-guard branch no longer calls engine.insertFact.
Inserting facts with row_num=NULL + entity_slug=NOT NULL produced
rows shaped like pre-v0.32.2 legacy migration rows, and the v0.32.2
extract_facts cycle phase at src/core/cycle/extract-facts.ts:83
refuses to reconcile while any such rows exist. Net: one unknown
bare entity reference silently broke autopilot extract_facts forever.
- New src/core/facts/dropped-audit.ts appends a structured JSONL
entry to ~/.gbrain/facts.dropped.jsonl on each drop. Schema is
documented in the file header (ts/source_id/phantom_slug/reason/
fact/kind/notability/visibility/source/source_session). A future
`gbrain replay-dropped` tool can re-process the entries once the
canonical entity pages exist.
- Backstop emits a stderr warn on each drop so operators see the
cost in real time, not just in the audit file.
- Best-effort writes — JSONL append failures never throw back into
the facts pipeline; the fact is already lost, the audit log is a
recovery aid.
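A sketch of the best-effort audit write from D8. The field names follow the schema listed above, but their types are assumptions. The `appendLine` callback is a hypothetical injection point standing in for an append to `~/.gbrain/facts.dropped.jsonl`, used here so the sketch stays free of filesystem dependencies.

```typescript
// Field names follow the schema in the commit; types are assumed.
export interface DroppedFactEntry {
  ts: string;
  source_id: string;
  phantom_slug: string;
  reason: string;
  fact: string;
  kind?: string;
  notability?: string;
  visibility?: string;
  source?: string;
  source_session?: string;
}

// Hypothetical sketch: warns on every drop, then appends one JSONL line.
// Returns false instead of throwing when the append fails.
export function recordDroppedFact(
  entry: DroppedFactEntry,
  appendLine: (line: string) => void,
  warn: (msg: string) => void = (msg) => console.error(msg),
): boolean {
  warn(`[facts] dropped fact for unresolved entity "${entry.phantom_slug}" (${entry.reason})`);
  try {
    appendLine(JSON.stringify(entry) + "\n");
    return true;
  } catch {
    return false; // best-effort: never throw back into the facts pipeline
  }
}
```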
D9 (codex P2) — merge-phantoms must move source_markdown_slug + row_num,
not just entity_slug
- The pre-codex-review merge UPDATE only repointed entity_slug, leaving
source_markdown_slug pointing at the soft-deleted phantom. The next
extract_facts cycle phase that walked the canonical page would parse
the canonical's fence (which doesn't include the migrated facts) and
delete the migrated rows since they're keyed on the phantom's slug.
Net: merge was a 72h delayed data-loss event.
- merge-phantoms now re-fences the phantom's active facts into the
canonical page's `## Facts` fence via writeFactsToFence. The new
DB rows land with source_markdown_slug=canonical + proper row_num,
and the canonical's markdown file on disk is updated to contain
the migrated facts. Two delete passes wipe the old phantom rows
(deleteFactsForPage for post-v0.32.2 + raw DELETE for
NULL-source_markdown_slug pre-v0.32.2 rows).
- Skipped paths added: no_local_path (thin-client source can't
re-fence), fence_write_failed (defensive — canonical has a prefix
so stub-guard structurally can't fire).
- Skip-when-no-facts path: phantoms with zero active facts still
get soft-deleted (the page itself is dead weight; no facts to migrate).
Tests
- test/entity-resolve.test.ts integration regression updated: asserts
no DB row is created, no legacy-shape rows accumulate, and the
audit JSONL line is written. Uses withEnv(GBRAIN_HOME) to scope
the audit file to a tempdir.
- test/merge-phantoms.test.ts: 12 tests (up from 10), including a
new no_local_path skip test, an empty-phantom soft-delete test,
and an explicit "no legacy-shape rows after merge" regression.
Source-id scoping test now asserts source_markdown_slug also
moves correctly. The "fact entity_slug repoint does not multiply
rows" test was removed (replaced by the markdown-file-on-disk
assertion in the main re-fence test, which is stronger).
- tryPrefixExpansion exported from resolve.ts so merge-phantoms can
run the prefix step in isolation without resolveEntitySlug's
exact-slug short-circuit (phantom matches itself).
35 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md D8 + D9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t, thin-client refusal
Addresses 3 P2 findings from the third /codex review pass (no P1s; gate
elevated to PASS). All three are coverage / routing gaps surfaced after
the D8 + D9 fixes.
P2#1 — match exact prefix slugs during expansion
- resolve.ts:tryPrefixExpansion was probing only `<dir>/<token>-%`
(hyphen-suffix). Bare "Acme" against a canonical `companies/acme`
(no hyphen) fell through to slugify → stub-guard → fact dropped
+ audited. The fix matches BOTH `<dir>/<token>` exactly AND
`<dir>/<token>-%` via `slug = $2 OR slug LIKE $3` in the same
correlated-subquery SELECT — the ORDER BY tiebreak still applies
across both candidate shapes.
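An in-memory sketch of the candidate shape and tiebreak from P2#1, assuming the ordering is connection count descending and then slug ascending, as the earlier commits state. `Candidate` and `pickBest` are hypothetical names, and plain string comparison stands in for the database collation.

```typescript
interface Candidate {
  slug: string;
  connectionCount: number;
}

// Mirrors `slug = $2 OR slug LIKE $3` for a single directory.
export function matchesPrefixShape(slug: string, dir: string, token: string): boolean {
  const exact = `${dir}/${token}`;
  return slug === exact || slug.startsWith(`${exact}-`);
}

// Hypothetical sketch of the tiebreak: highest connection count wins,
// ties fall back to the lexicographically smallest slug.
export function pickBest(cands: readonly Candidate[], dir: string, token: string): string | null {
  const hits = cands
    .filter((c) => matchesPrefixShape(c.slug, dir, token))
    .sort((a, b) =>
      b.connectionCount - a.connectionCount ||
      (a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0));
  const best = hits[0];
  return best ? best.slug : null;
}
```

Note that `companies/acmecorp` does not match the token `acme` under either shape, which keeps unrelated names with a shared prefix out of the candidate set.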
P2#2 — include concepts/ in default dir list
- DEFAULT_PREFIX_EXPANSION_DIRS extended from
['people','companies','deals','topics'] to
['people','companies','deals','topics','concepts'].
- Concepts (`concepts/rag`, `concepts/agentic-workflows`, etc.) are
the canonical home for `type: concept` pages across gbrain docs
and example schemas. Bare-name references like "RAG" or "Bitcoin"
now resolve out of the box.
P2#3 — refuse merge-phantoms on thin-client installs
- merge-phantoms is destructive + DB-bound (UPDATE facts, soft-delete
pages, fence-write to local files). Adding it to CLI_ONLY alone
let thin-client configs pass the thin-client refusal guard and
hit a generic local-engine error.
- Added to THIN_CLIENT_REFUSED_COMMANDS + a pinpoint hint in
THIN_CLIENT_REFUSE_HINTS pointing at "run on the host machine."
Tests
- test/entity-resolve.test.ts adds 3 new tests: exact prefix match
via companies/glob, bare concept resolution via concepts/rag, and
a regression assertion that DEFAULT_PREFIX_EXPANSION_DIRS is
exactly ['people','companies','deals','topics','concepts'].
- Two new seed pages (`companies/glob`, `concepts/rag`) exercise the
new prefix-match shape.
38 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ry-run feasibility
Round-4 /codex review surfaced two issues with the merge-phantoms
re-fence path. Both are data-correctness / operator-trust issues that
land cleanly without scope expansion.
P1 — type-constrained prefix expansion in merge-phantoms
- Phantom `acme` of type=company was previously routed through the
global prefix expansion order, which puts `people/` first. A brain
with both `people/acme-example` (high chunk count, wrong type) AND
`companies/acme-example` (correct type) would re-fence the company
facts onto the person page, then soft-delete the phantom. Codex
flagged this as a data correctness P1 — cross-type collisions
silently corrupt the canonical pages.
- Fix: extended `tryPrefixExpansion` with an optional
`opts.dirs?: readonly string[]` that constrains the search to a
caller-provided subset. merge-phantoms passes a single-element
list derived from `row.type` via a new `TYPE_TO_DIR` constant
map (person→people, company→companies, deal→deals, topic→topics,
concept→concepts). Phantoms with unknown types skip cleanly.
P2 — dry-run feasibility-checks
- Dry-run previously reported `merged` for sources with NULL
`local_path` and facts to migrate; a real run would skip the same
phantoms with `no_local_path`. The preview of a destructive command
was therefore misleading for thin/no-filesystem source setups.
- Fix: the local_path feasibility check now runs BEFORE the
`opts.dryRun` short-circuit. The dry-run preview matches what
the real run would do, exactly.
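The ordering fix above can be sketched as a small planner in which the feasibility check precedes the dry-run short-circuit. `planPhantomMerge`, `PlanInput`, and the `apply` callback are hypothetical names for illustration.

```typescript
type Outcome =
  | { status: "merged" }
  | { status: "skipped"; reason: "no_local_path" };

interface PlanInput {
  localPath: string | null;
  factCount: number;
  dryRun: boolean;
}

// Hypothetical sketch: the feasibility check runs first, so a dry run and
// a real run report the same outcome for the same phantom.
export function planPhantomMerge(input: PlanInput, apply: () => void): Outcome {
  if (input.factCount > 0 && input.localPath === null) {
    return { status: "skipped", reason: "no_local_path" };
  }
  if (input.dryRun) return { status: "merged" }; // preview only, no side effects
  apply();
  return { status: "merged" };
}
```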
Tests
- New cross-type test: phantom `acme` (type=company) with both
`people/acme-example` (100 chunks) and `companies/acme-example`
(3 chunks) seeded — asserts the fact lands on the company page,
NOT the higher-chunk-count person page.
- New dry-run-feasibility test: source with NULL local_path +
phantom with facts — asserts dry-run reports `no_local_path` in
`skipped`, not in `merged`.
40 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rmetic tests
Round-5 /codex review surfaced two P2 issues (no P1s, gate stays PASS).
P2#1 — protect non-phantom top-level entity pages
- A user-imported `rag.md` (type: concept) or `alice.md` (type: person)
with a real body would previously be classified as a phantom because
the slug lacks `/`, then soft-deleted after re-fencing only the
facts. The page's compiled_truth / body / links would be lost.
- Fix: `listPhantomEntityPages` now returns `body_chars` (length of
compiled_truth) alongside the row shape. The merge loop skips with
a new `not_a_stub` reason when `body_chars > 200`. The threshold is
conservative (stubs from `stubEntityPage` in fence-write.ts run
well under 100 chars; 200 leaves headroom for a stub that grew a
couple of small facts in its fence).
- New `PhantomMerge.skipped` variant: `not_a_stub`.
P2#2 — hermetic GBRAIN_HOME isolation in test/merge-phantoms.test.ts
- `writeFactsToFence` acquires page locks under
`gbrainPath('page-locks')`, which resolves to the user's real
`~/.gbrain/page-locks/` when `GBRAIN_HOME` isn't overridden. Tests
that didn't set GBRAIN_HOME wrote lockfiles into the developer's
real brain and would fail in hermetic / read-only-home CI runners
with EPERM.
- Fix: new `gbrainHome` tempdir in beforeAll plus a `withTestHome`
helper that wraps test bodies in `withEnv({ GBRAIN_HOME })`. New
`itHomed(name, body)` test wrapper applied to every test that
calls `runMergePhantomsCore`. Plain `it()` retained for the
listing test and the formatter test that don't touch page locks.
- Same fix applied to test/entity-resolve.test.ts's integration
regression test and the inverse "DOES create" assertion.
Tests
- New test: `skips a top-level page that has real content (not a
stub)` — seeds a 1300-char `rag` page at top level with a canonical
`concepts/rag-example` companion. Asserts merge skips with
`not_a_stub`, page survives intact with deleted_at=NULL.
41 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…content-only stub gate
Codex round-6 caught two issues with the round-4 + round-5 fixes that
collide with each other. Both findings are P2 (gate stays PASS) but
they're exactly the cases merge-phantoms exists to fix.
P2#1 — phantom.type is unreliable for routing
- Pre-fix `stubEntityPage` in fence-write.ts defaulted ALL unprefixed
entity stubs to `type: concept` (the fallback branch for unknown
prefix). A phantom `alice` that represents a real person therefore
has stored type='concept'. The round-4 type-constraint fix routed
that through `concepts/` only and missed the canonical
`people/alice-example`. The merge command silently skipped the
exact phantom shape it was built to fix.
- Resolution: search every configured prefix-expansion directory
rather than the phantom's type slot. Skip with a new `ambiguous`
reason when MORE THAN ONE directory has a candidate (e.g. both
`people/acme-*` AND `companies/acme-*`) — the operator decides
manually instead of auto-picking. Round-4's cross-type mismerge
risk is closed structurally by the ambiguity check, not the type
filter.
- New PhantomMerge fields: `skipped: 'ambiguous'`, `ambiguous_candidates: string[]`.
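A sketch of the ambiguity rule from round-6 P2#1: one directory with a candidate resolves, more than one is left for the operator. The `Resolution` type and `pickCanonical` are hypothetical; only the `ambiguous` reason and the `ambiguous_candidates` field name come from the commit.

```typescript
type Resolution =
  | { kind: "none" }
  | { kind: "match"; slug: string }
  | { kind: "ambiguous"; ambiguous_candidates: string[] };

// `bestPerDir` holds the best candidate slug for each configured
// directory, or null when that directory has no candidate.
export function pickCanonical(bestPerDir: ReadonlyArray<string | null>): Resolution {
  const hits = bestPerDir.filter((s): s is string => s !== null);
  const first = hits[0];
  if (first === undefined) return { kind: "none" };
  if (hits.length === 1) return { kind: "match", slug: first };
  // More than one directory matched: do not auto-pick.
  return { kind: "ambiguous", ambiguous_candidates: hits };
}
```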
P2#2 — body_chars > 200 catches fact-bearing phantoms
- Round-5's threshold counted total `compiled_truth` length. A
pre-fix phantom that accumulated facts has the ## Facts fence
inlined into compiled_truth — markers + table header + a couple
of fact rows are well over 200 chars. The "is this a stub?" gate
fired on exactly the phantoms merge-phantoms should migrate.
- Resolution: new `stubBodyChars(compiled_truth)` helper extracts
only the chars that aren't frontmatter, H1 title, ## Facts fence,
or ## Timeline fence. A v0.34.5 stub returns 0 by this measure
regardless of how many facts accumulated; a real user-imported
page returns hundreds. Threshold tightened to 50 chars now that
fence content is excluded.
Tests
- Old "constrains merge to phantom entity type" test replaced with
"cross-directory ambiguity" — same seed (acme + people/+companies/
candidates) but the assertion now expects skip with `ambiguous`
and both candidates listed in `ambiguous_candidates`.
- New test "pre-fix phantom with type=concept still resolves to the
right people/ canonical" pins the round-6 P2 #1 fix.
- New test "fact-bearing phantom (long compiled_truth from fence)
still migrates" pins the round-6 P2 #2 fix — uses a 400+ char
compiled_truth that would have tripped the round-5 gate.
43 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rve Timeline
Codex round-7 caught two P2 issues with the round-6 fixes. No P1s; gate
stays PASS. Both are coverage gaps that the merge-phantoms cleanup
command can't solve on its own — they need fixes in resolveEntitySlug
and the stub-detection helper.
P2#1 — bare-name prefix expansion must run BEFORE exact / fuzzy
- Brains that still have a pre-fix phantom `alice.md` alongside the
canonical `people/alice-example.md` would previously have seen new
"Alice" references resolve via exact-match (slug='alice' matches
the phantom row) or fuzzy-match (title 'Alice' matches the phantom
by trgm). The fact would land in the phantom's fence and the
split would keep growing — the merge-phantoms command can clean
it up after the fact, but every new write recreates the problem.
- Fix: moved the `isBareName(trimmed) → tryPrefixExpansion` block
to step 1 in resolveEntitySlug, before exact-slug and fuzzy. For
bare single-token names, a prefixed canonical wins over the
unprefixed phantom. Multi-word / prefixed / hyphenated inputs
skip the layer structurally (`isBareName` returns false) so their
paths are unchanged.
P2#2 — preserve Timeline content in the stub gate
- stubBodyChars was stripping `## Timeline` fences along with
`## Facts`. A page with substantial Timeline but little prose
would slip past the not_a_stub gate, get re-fenced for facts
only, soft-deleted, and then hard-purged after 72h — losing the
Timeline entirely.
- Fix: stubBodyChars no longer strips Timeline. Facts stays
stripped (codex round-6 P2 #2 — fact-bearing phantoms have to
migrate). Timeline content now correctly trips not_a_stub.
Tests
- test/entity-resolve.test.ts: new test pinning that
resolveEntitySlug('alice') prefers `people/alice-example` over an
unprefixed phantom `alice-phantom-test` in the same source.
- test/merge-phantoms.test.ts: new test seeding a page with only a
populated ## Timeline (no real prose). Asserts merge skips with
`not_a_stub` and the page survives with its Timeline intact.
45 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l-page preservation
Codex round-8 caught two P2 issues where the rounds-6/7 fixes were too
aggressive. No P1s; gate stays PASS. Both fixes calibrate the
"is this a v0.34.5 stub?" detector so it doesn't sweep real
user-authored pages.
P2#1 — strip facts fence by MARKERS, not by heading
- The round-6 stubBodyChars regex stripped `## Facts[\s\S]*?(...)`
which means it ate any user-authored prose under a normal
`## Facts` heading, not just the canonical auto-generated
`<!-- facts --> ... <!-- /facts -->` fence. A legitimate page
with prose under a `## Facts` heading would return ≤50 chars,
classify as stub, get re-fenced for facts only, soft-deleted —
user prose gone.
- Fix: regex now strips ONLY the canonical `<!-- facts -->` ...
`<!-- /facts -->` fence pair (optionally with an immediately
paired `## Facts` heading). Prose under a non-fenced heading
survives in the body count and trips not_a_stub.
P2#2 — preserve exact bare-slug matches for real top-level pages
- The round-7 "prefix-first" change overrode exact bare slugs
unconditionally. A user with a legitimate top-level `rag` page
+ a `concepts/rag` page would see `resolveEntitySlug('rag')`
redirect to `concepts/rag`, sending new facts away from the
intentional bare page.
- Fix: step 1 of resolveEntitySlug now peeks the bare slug's
`compiled_truth` via a new `tryExactSlugBody` helper (3-value
return: 'missing' | body-string). Prefix expansion only fires
when the bare page is missing OR stub-shaped (via isStubBody).
Real pages keep their bare slug; only phantom-shaped pages get
overridden.
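The step-1 gate from round-8 P2#2 can be sketched as a pure decision function. As a simplification, a missing bare page is modelled as `null` rather than the `'missing'` sentinel the commit describes, and `isStub` is injected so the sketch stays independent of the real isStubBody.

```typescript
// Hypothetical sketch of step 1 in resolveEntitySlug for a bare name.
// bareBody: compiled_truth of the exact bare slug, or null when no page exists.
// prefixed: result of prefix expansion, or null when nothing matched.
export function resolveBare(
  name: string,
  bareBody: string | null,
  prefixed: string | null,
  isStub: (body: string) => boolean,
): string | null {
  const mayOverride = bareBody === null || isStub(bareBody);
  if (mayOverride && prefixed !== null) return prefixed;
  // A real bare page keeps its slug; null means "fall through to later steps".
  return bareBody !== null ? name : null;
}
```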
Refactor
- stubBodyChars + PHANTOM_STUB_MAX_BODY_CHARS moved from
src/commands/merge-phantoms.ts to src/core/entities/resolve.ts
so the stub-detection contract is shared by both the resolver
and the cleanup command. New exported `isStubBody` predicate
wraps the threshold. merge-phantoms.ts imports them.
Tests
- test/entity-resolve.test.ts:
- New test pinning that a real top-level `rag-real` page (with
a long body) is preserved as the exact slug even when a
`concepts/rag-real-example` canonical exists.
- Existing "prefers prefix expansion over phantom" test renamed
to explicit "STUB-shaped unprefixed phantom" to call out the
contract.
- test/merge-phantoms.test.ts:
- New test for a page with user-authored prose under a bare
`## Facts` heading (no machine markers). Asserts skip with
not_a_stub; page survives with prose intact.
47 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d fact paging
Codex round-9 caught two P1s (data-correctness, not edge cases) on the
round-6/8 fixes. Both fail the gate.
P1#1 — stubBodyChars matched fictional fence markers
- The round-6 regex stripped `<!-- facts --> ... <!-- /facts -->`,
but the actual fence markers used by writeFactsToFence (defined
in src/core/facts-fence.ts as FACTS_FENCE_BEGIN/END) are
`<!--- gbrain:facts:begin -->` / `<!--- gbrain:facts:end -->`
(triple-dash + namespaced markers). The regex never matched real
fences, so fact-bearing phantoms exceeded the 50-char threshold,
got classified as not_a_stub, and the merge-phantoms cleanup
silently skipped EXACTLY the rows it was built to migrate.
- Compounding effect: resolveEntitySlug's bare-name override
treats those phantoms as "real" pages too (their body chars > 50
after the broken strip), so new bare-name facts kept being
appended to the phantom forever.
- Fix: regex updated to match the canonical markers. The optional
preceding `## Facts` heading consumption still works. Added a
regression assertion (the new round-9 test seeds with the REAL
markers and would fail loudly if the regex drifts again).
P1#2 — listFactsByEntity clamp loses overflow facts
- The merge command read phantom facts via
`listFactsByEntity(..., { limit: 10_000 })`. The engine clamps
`limit` at MAX_SEARCH_LIMIT (100) via `clampSearchLimit`, so a
phantom with >100 facts silently returned only 100. The merge
re-fenced those 100, then `deleteFactsForPage` wiped ALL phantom
rows, then softDeletePage soft-deleted the page. Overflow facts
permanently lost.
- Fix: replace listFactsByEntity with a raw SQL select that
bypasses the clamp. Columns mirror FactRow so the mapping to
FenceInputFact below is unchanged. Returns ALL active phantom
rows in id-asc order.
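A small sketch of the failure mode in P1#2: a caller asks for 10,000 rows, the clamp silently returns 100, and the remainder is never migrated. `clampSearchLimit` here is a stand-in for the engine helper (the lower bound of 1 is an assumption), and `overflowCount` is a hypothetical helper that is not part of the fix.

```typescript
const MAX_SEARCH_LIMIT = 100; // value stated in the commit message

// Stand-in for the engine's clampSearchLimit; the lower bound is assumed.
export function clampSearchLimit(limit: number): number {
  return Math.max(1, Math.min(Math.floor(limit), MAX_SEARCH_LIMIT));
}

// What a clamped listing returns for a phantom with many facts.
export function listClamped<T>(rows: readonly T[], limit: number): T[] {
  return rows.slice(0, clampSearchLimit(limit));
}

// Hypothetical helper: how many rows a clamped read would leave behind.
export function overflowCount(total: number, requestedLimit: number): number {
  return Math.max(0, total - clampSearchLimit(requestedLimit));
}
```

For a phantom with 150 facts, the clamped read leaves 50 behind, which is the overflow the subsequent delete would have destroyed.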
Tests
- New test: fact-bearing phantom with REAL gbrain:facts markers
migrates correctly (pins the marker shape vs. resolve.ts regex).
- New test: phantom with 150 facts migrates ALL 150 (above the
100-clamp). Asserts `facts_moved === 150` and that the canonical
has 150 rows post-merge with 0 phantom rows remaining.
- Old round-6 test (used fictional `<!-- facts -->` markers) was
removed — superseded by the round-9 test with real markers.
48 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erge
When merge-phantoms migrates a phantom fact with an explicit
`valid_until` date (e.g. "Alice on sabbatical through 2026-08-31"),
the date must survive the re-fence so the canonical's row keeps the
same validity window. Pre-fix the FenceInputFact API dropped
valid_until entirely, so time-bound facts became indefinitely active
after migration.
Fix
- src/core/facts/fence-write.ts: FenceInputFact gains optional
`validUntil?: Date`. writeFactsToFence threads it through to
upsertFactRow as an ISO-date string (the fence row format).
- src/commands/merge-phantoms.ts: phantom-fact SELECT now reads
`valid_until` alongside the existing columns. The
FenceInputFact mapping passes it through (NULL → undefined).
extractFactsFromFenceText already honors `validUntil` from fence rows
when computing the DB row's valid_until column, so the chain is end-
to-end after this commit: phantom DB → SELECT → FenceInputFact →
fence row → re-parse → canonical DB.
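A sketch of the date round trip in that chain, assuming fence rows carry `valid_until` as a UTC `YYYY-MM-DD` string. `toFenceDate` and `fromFenceDate` are hypothetical names; the real conversion lives inside writeFactsToFence and extractFactsFromFenceText.

```typescript
// Assumption: fence rows store dates as YYYY-MM-DD, interpreted in UTC.
export function toFenceDate(d: Date | null | undefined): string | undefined {
  return d ? d.toISOString().slice(0, 10) : undefined; // NULL maps to undefined
}

export function fromFenceDate(s: string | undefined): Date | undefined {
  return s ? new Date(`${s}T00:00:00.000Z`) : undefined;
}
```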
Tests
- test/merge-phantoms.test.ts: new test seeds a phantom fact via
raw SQL with valid_until = '2026-08-31', runs the merge, and
asserts the canonical row preserves that exact date.
49 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex round-11 caught the final phantom-misclassification edge case:
the 50-char `PHANTOM_STUB_MAX_BODY_CHARS` threshold would still treat
terse-but-intentional pages as stubs. Example: `# RAG` plus a one-
sentence note under 50 chars would classify as stub-shape, get its
exact-slug overridden by prefix expansion in resolveEntitySlug, and
be eligible for soft-deletion in merge-phantoms.
Fix
- PHANTOM_STUB_MAX_BODY_CHARS lowered from 50 → 0. The new
contract: stub iff NOTHING remains after stripping frontmatter,
H1 title, and the canonical machine fence. Any user content
beyond that — even one character — makes the page real and
preserves the bare slug.
- stubEntityPage from fence-write.ts produces exactly
`# Title\n` after the frontmatter, which strips to "" under the
regex chain, so threshold=0 captures real stubs without false
positives.
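A sketch of the threshold-0 contract, using the fence markers quoted in the round-9 commit. The regexes are illustrative and are not copied from resolve.ts; in particular, the frontmatter and H1 patterns are assumptions about what the real strip chain does.

```typescript
const PHANTOM_STUB_MAX_BODY_CHARS = 0;

// Leading YAML frontmatter block.
const FRONTMATTER = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/;
// Canonical machine fence, optionally with its paired "## Facts" heading.
const FACTS_FENCE =
  /(?:^##[ \t]+Facts[ \t]*\r?\n\s*)?<!--- gbrain:facts:begin -->[\s\S]*?<!--- gbrain:facts:end -->/m;
// First H1 title line only ("## ..." headings do not match).
const H1 = /^#[ \t]+.*$/m;

export function stubBodyChars(compiledTruth: string): number {
  return compiledTruth
    .replace(FRONTMATTER, "")
    .replace(FACTS_FENCE, "")
    .replace(H1, "")
    .trim().length;
}

// Stub iff nothing remains after stripping.
export function isStubBody(compiledTruth: string): boolean {
  return stubBodyChars(compiledTruth) <= PHANTOM_STUB_MAX_BODY_CHARS;
}
```

Under this sketch, prose under a plain `## Facts` heading without machine markers still counts as body content, so such a page is treated as real.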
Tests
- test/entity-resolve.test.ts: new test pinning that
resolveEntitySlug('rag-terse') returns 'rag-terse' (not the
canonical) when the bare page has just one sentence of content.
50 tests across the two files all pass. `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a phantom fact has a non-null `embedding`, `executeRaw` returns it
in a driver-specific shape: postgres-js returns the text literal
`"[0.1,0.2,...]"`, while PGLite returns a Float32Array directly. The
merge command previously passed the raw row through into
FenceInputFact, and `engine.insertFacts` then called `toPgVectorLiteral`
which throws on string input. Result: the merge fails mid-flight on
Postgres for any phantom with embedded facts. By then the markdown
fence has been rewritten but the DB has not, so the two diverge.
Fix
- merge-phantoms.ts: column typed as `unknown` to match
driver variance.
- Mapping pipes `fr.embedding` through `tryParseEmbedding` from
src/core/utils.ts. Handles both shapes (Float32Array passes
through, string parses into Float32Array). Corrupt rows return
null + warn-once rather than throwing.
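A sketch of the shape normalization described in the fix. This is not the real `tryParseEmbedding` from src/core/utils.ts; the parsing rules are assumptions and the warn-once behavior is omitted.

```typescript
// Hypothetical sketch: accept either driver shape, return null on bad input.
export function tryParseEmbedding(value: unknown): Float32Array | null {
  if (value === null || value === undefined) return null;
  if (value instanceof Float32Array) return value; // PGLite shape passes through
  if (typeof value !== "string") return null;
  // postgres-js shape: the text literal "[0.1,0.2,...]".
  const inner = value.trim().replace(/^\[/, "").replace(/\]$/, "");
  if (inner.length === 0) return null;
  const parts = inner.split(",").map((p) => p.trim());
  if (parts.some((p) => p === "")) return null; // e.g. "[1,,2]"
  const nums = parts.map((p) => Number(p));
  if (nums.some((n) => !Number.isFinite(n))) return null; // corrupt row
  return Float32Array.from(nums);
}
```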
50 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex round-13 P2: `gbrain merge-phantoms` without `--source` was
hardcoded to literal 'default'. In multi-source brains where the
operator works in a non-default source (via GBRAIN_SOURCE env,
.gbrain-source dotfile, or cwd-relative source), running merge-phantoms
either reported a false "no phantoms" result OR mutated the wrong
source's data.
Fix
- CLI entry now calls `resolveSourceId(engine, explicitSource)` from
`src/core/source-resolver.ts`. That helper is the project's
canonical chain: `--source` flag → `GBRAIN_SOURCE` env →
`.gbrain-source` dotfile walk-up → CWD-local registered source
→ 'default'. Matches the pattern used by every other
source-scoped command.
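The precedence chain above, sketched as a pure function. The real `resolveSourceId` takes an engine and performs the dotfile walk-up and registered-source lookup itself; here those results are passed in through a hypothetical `SourceHints` object.

```typescript
interface SourceHints {
  flag?: string;      // --source
  env?: string;       // GBRAIN_SOURCE
  dotfile?: string;   // nearest .gbrain-source found walking up from cwd
  cwdSource?: string; // registered source that contains the cwd
}

// Hypothetical sketch: first non-empty hint wins, else 'default'.
export function resolveSourceIdSketch(h: SourceHints): string {
  const pick = [h.flag, h.env, h.dotfile, h.cwdSource].find(
    (v) => v !== undefined && v.trim() !== "",
  );
  return pick ?? "default";
}
```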
50 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ruth post-merge
Codex round-14 P1: when merge-phantoms re-fences phantom facts into the
canonical page, writeFactsToFence updates the markdown file on disk
+ inserts new DB fact rows, BUT it does NOT update
`pages.compiled_truth`. The canonical's DB body stays stale.
Failure chain
1. Operator runs `gbrain merge-phantoms` — facts migrate to canonical
fence on disk; new DB rows have source_markdown_slug=canonical.
2. Next autopilot cycle's `sync` phase sees git HEAD unchanged and
reports up-to-date.
3. `extract_facts` cycle phase reads canonical body via
`engine.getPage()` — gets the STALE compiled_truth without the
migrated fence rows.
4. extract_facts.ts calls `deleteFactsForPage(canonical)` and wipes
the freshly-migrated DB rows.
5. User loses every fact merge-phantoms appeared to save.
Fix
- After writeFactsToFence, call `importFromFile(engine, canonicalPath,
relPath, {sourceId, noEmbed: true, forceRechunk: true})`. This
re-reads the markdown, refreshes compiled_truth + content_chunks
in lockstep, and keeps the next extract_facts cycle from wiping
the rows. `noEmbed: true` keeps the merge fast — the autopilot's
embed phase will catch up later.
- If the re-import fails (malformed markdown post-write), log + skip
with `fence_write_failed` and do NOT delete the phantom rows. The
operator can rerun the merge after fixing the file. The fence-
written DB rows that would otherwise become orphaned are still
valid; the next extract_facts cycle cleans them up via the
stale-body path.
50 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fore deleting phantom
Codex round-15 P2: importFromFile can reject the canonical re-import
via a normal ImportResult (status: 'skipped' | 'error' | 'invalid'
with an `error` field) without throwing. Common cases: rewritten
markdown over MAX_FILE_SIZE, frontmatter slug no longer matches path.
The round-14 fix only caught the throw path, so a non-throwing
rejection would silently fall through to the delete + soft-delete
section. compiled_truth stays stale → next extract_facts cycle wipes
the migrated DB rows. Migration loss without operator visibility.
Fix
- Capture `importStatus = await importFromFile(...)` from both the
happy path and the try/catch error path.
- After the call, gate the phantom-row deletion + soft-delete on
`importStatus.status === 'imported'`. Any other status — or a
null — falls into the same warn + skip + `fence_write_failed`
branch the throw case uses. Phantom rows survive so the operator
can rerun after fixing the markdown.
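A sketch of the gate described in the fix: only an explicit `imported` status allows the destructive step. The `ImportResult` shape is inferred from the statuses named in this commit, and `null` models the case where importFromFile threw.

```typescript
// Shape inferred from the commit text; the real ImportResult may have more fields.
interface ImportResult {
  status: "imported" | "skipped" | "error" | "invalid";
  error?: string;
}

type Gate =
  | { proceed: true }
  | { proceed: false; skipped: "fence_write_failed"; detail: string };

// Hypothetical sketch: anything other than 'imported' keeps phantom rows alive.
export function gatePhantomDelete(importStatus: ImportResult | null): Gate {
  if (importStatus !== null && importStatus.status === "imported") {
    return { proceed: true };
  }
  const detail = importStatus === null
    ? "re-import threw"
    : importStatus.error ?? `re-import status: ${importStatus.status}`;
  return { proceed: false, skipped: "fence_write_failed", detail };
}
```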
50 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ retry idempotency note
Codex round-16 P2 #1: Timeline-aware stub detection
- listPhantomEntityPages now SELECTs `timeline` alongside
compiled_truth. The not_a_stub gate sums stubBodyChars (which
looks at compiled_truth minus the canonical fence) PLUS the
timeline column's trimmed length. Any non-empty timeline content
means the page is real and the merge skips with not_a_stub.
- Pre-fix: a phantom-shaped compiled_truth (`# Alice\n`) combined
with a populated `pages.timeline` column would slip past the
gate, get re-fenced for facts, soft-deleted, and the timeline
history would be lost when autopilot's purge phase hard-purged
after 72h.
Codex round-16 P2 #2: retry idempotency documented
- The fence_write_failed path leaves phantom rows intact so the
operator can rerun. Codex flagged "duplicate active facts on
rerun" as a concern. The chain is actually idempotent by
construction:
  - writeFactsToFence → upsertFactRow updates fence rows by
  (claim, kind, source) key (not append).
  - engine.insertFacts dedups by the partial UNIQUE index on
  (source_id, source_markdown_slug, row_num).
- The visible cost between the failed run and a successful rerun
is that pages.compiled_truth stays stale, so the next autopilot
extract_facts cycle MIGHT reconcile against the old body. The
catch-block comment now documents this and tells the operator to
rerun before the next cycle.
Tests
- New test pinning the round-16 P2 #1 fix: seed via raw SQL with a
phantom-shaped compiled_truth + populated pages.timeline column.
Asserts skip with not_a_stub and timeline survives intact.
51 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…B before re-fence
Codex round-17 P2: when the canonical page exists in the DB (e.g.
created via MCP put_page on a source with local_path but never synced
to disk), `${localPath}/${canonical}.md` may be missing. The previous
merge code called writeFactsToFence unconditionally, which would
stub-create a NEW markdown file containing only the title. The
subsequent importFromFile would then OVERWRITE the canonical's
pages.compiled_truth with the stub body. Real canonical content
destroyed in the process of "saving" the merged facts.
Fix
- Before writeFactsToFence, check whether the canonical .md exists
on disk. If not AND a DB row for the canonical exists, read the
canonical's compiled_truth + timeline + frontmatter from the DB,
serialize via the canonical `serializeMarkdown` (same helper
every other write path uses), and write it to the expected path.
- writeFactsToFence then APPENDS facts to the materialized file
rather than stub-creating. importFromFile after writeFactsToFence
refreshes compiled_truth from the canonical file that now contains
BOTH the original body AND the migrated fence — no data loss.
- If the canonical DB row doesn't exist either, fall through (the
earlier `canonical_missing` skip already handles that case).
Tests
- New test seeds a DB-only canonical via engine.putPage with a
1000-char body, asserts the .md file doesn't exist pre-merge,
runs the merge, asserts the canonical body survives intact in
BOTH pages.compiled_truth AND the on-disk markdown.
52 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… real pages + resolver reads timeline
Codex round-18 caught a P1 + P2. Both are now stable across the
write-path AND the resolve-path.
P1 — writeFactsToFence dropped facts for legitimate DB-only bare pages
- When an unprefixed slug is referenced by extract_facts AND the
underlying .md file is missing on disk, the v0.34.5 stub-guard
fired and the backstop logged the fact to facts.dropped.jsonl
even when a real DB page existed for the slug (created via MCP
put_page on a source with local_path but never synced to disk).
The fact never reached disk; the user has to recover it from the audit log.
- Fix: before firing the stub-guard, writeFactsToFence calls
engine.getPage to check for an existing DB row. If found AND
its body looks real (non-stub compiled_truth OR populated
timeline), materialize it to disk via serializeMarkdown and let
the fence append normally. Prefixed-slug stub-create path
unchanged. Pure stub-guard path unchanged.
P2 — tryExactSlugBody missed the timeline column
- The resolver's bare-name-override gate (round-7) probes the bare
slug's body via tryExactSlugBody. That helper only read
compiled_truth, so a real top-level page whose substantive
content lived in pages.timeline looked stub-shaped, and the
resolver would override the bare slug with prefix expansion.
- Fix: tryExactSlugBody reads both compiled_truth + timeline and
concatenates them (`<body>\n\n## Timeline\n\n<timeline>`) so
`isStubBody` sees the timeline content. The check is symmetric
with the merge-phantoms not_a_stub gate (round-16 P2 #1).
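A rough sketch of the timeline-aware probe, assuming a much simpler stub heuristic than the repo's real `isStubBody`. The threshold constant, `looksLikeStub`, and `joinBodyAndTimeline` are hypothetical stand-ins; only the `<body>\n\n## Timeline\n\n<timeline>` concatenation comes from the commit text.

```typescript
// Hypothetical threshold; the real PHANTOM_STUB_MAX_BODY_CHARS value is not shown in this PR text.
const STUB_MAX_BODY_CHARS = 40;

// Simplified stand-in for isStubBody: drop heading lines, then count what remains.
function looksLikeStub(body: string): boolean {
  const prose = body
    .split("\n")
    .filter((line) => !line.trim().startsWith("#"))
    .join("\n")
    .trim();
  return prose.length <= STUB_MAX_BODY_CHARS;
}

// Concatenate body and timeline so the stub check sees both columns.
function joinBodyAndTimeline(compiledTruth: string, timeline: string | null): string {
  const tl = (timeline ?? "").trim();
  return tl.length === 0 ? compiledTruth : `${compiledTruth}\n\n## Timeline\n\n${tl}`;
}

// A title-only body with no timeline is a stub; the same body with a populated
// timeline is treated as a real page.
const titleOnly = "# Alice\n";
const timeline = "2024-03-01: joined Acme as head of engineering after leaving Example Corp";
const stubWithoutTimeline = looksLikeStub(joinBodyAndTimeline(titleOnly, null));
const stubWithTimeline = looksLikeStub(joinBodyAndTimeline(titleOnly, timeline));
```

Under these assumptions `stubWithoutTimeline` is true and `stubWithTimeline` is false, which is the asymmetry the round-18 P2 fix removes from the resolver.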
Tests
- test/entity-resolve.test.ts: new test pinning the round-18 P2
(bare slug `tl-only-page` with populated timeline column +
canonical `concepts/tl-only-page-canonical` candidate; resolver
must return the bare slug).
- test/entity-resolve.test.ts: new test pinning the round-18 P1
(real DB-only bare page `real-bare-page` with no .md file;
writeFactsToFence materializes from DB and appends the fact
rather than firing stub-guard). Asserts both inserted=1 and the
resulting .md contains BOTH the original body AND the new fact.
54 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oft-delete
Codex round-19 P2: when the phantom page also exists on disk (e.g.
`alice.md` created by the v0.34.5-era writeFactsToFence path), the
merge soft-deleted the DB row but left the markdown file behind. The
next `gbrain sync` / `gbrain import` walk would re-import the file
and resurrect the phantom DB row after autopilot's purge phase
hard-deleted the soft-deleted row — undoing the entire cleanup and
re-opening the bare-name fact-split path that motivated this PR.
Fix
- After `softDeletePage` succeeds, the merge loop attempts to
unlink the phantom .md file via `unlinkSync(join(localPath,
row.slug + '.md'))`. Best-effort: the DB-row state is the
primary correctness contract, so a failed unlink logs a warn
asking the operator to clean up manually and the merge continues.
Soft-delete + file removal is the matched pair that makes the
cleanup durable across the next autopilot cycle.
Tests
- New test seeds a phantom WITH a corresponding .md file on disk
(mirroring what the v0.34.5-era writeFactsToFence produced),
runs the merge, asserts both the .md file is gone AND the DB
row is soft-deleted.
55 tests pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anonical on import fail
Codex round-20 found two P2 issues. Both are cleanup/recovery bugs in
merge-phantoms that the operator hits in real-world scenarios.
P2 #1: factless-phantom .md file lingers
- The `if (facts_moved === 0)` early continue soft-deleted the DB
row but bypassed the unlink-on-disk block at the loop tail. A
factless phantom with a corresponding .md file would have its DB
row purged after 72h, then the next sync re-imports the file and
resurrects the phantom, undoing the round-19 cleanup for the
factless case.
- Fix: the factless branch now ALSO unlinks the phantom .md after
softDeletePage. localPath is re-fetched in this branch (the
earlier feasibility check only ran when facts_moved > 0).
Best-effort: a thin-client source with localPath=null is a no-op;
an unlink error logs a warn but doesn't fail the merge.
P2 #2: rollback canonical rows when re-import fails
- The round-15/16 comments claimed rerun was idempotent. Codex
re-read engine.insertFacts and pointed out it uses plain INSERTs
under the partial UNIQUE index on (source_id,
source_markdown_slug, row_num). A failed importFromFile leaves
canonical fact rows in the DB; rerunning after fixing the
markdown hits the UNIQUE constraint and aborts until the operator
manually clears the orphans. Not actually idempotent.
- Fix: capture `preMergeBody` (the canonical .md content before
writeFactsToFence) and rollback on import failure:
  - DELETE FROM facts WHERE id = ANY(fenceResult.ids).
  - Restore the markdown file: writeFileSync(preMergeBody) if we
  read the file pre-merge; unlinkSync if we materialized it from
  the DB body (sync artifact, not user state).
  - Both wrapped in best-effort try/catch with warn-on-failure
  instructions for manual recovery.
Tests: 55 pass; `bun run verify` clean.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ynced edits
Codex round-21 P2: when an operator edited the phantom .md on disk
(e.g. added prose to alice.md) but hasn't run `gbrain sync`,
pages.compiled_truth is stale. The merge command trusted the stub-
shaped DB body, classified as phantom, proceeded with re-fence +
soft-delete + unlink — and the unsynced disk edits were lost.
Fix
- When `local_path` is available, also read the on-disk phantom file
and run stubBodyChars against THAT content. If the disk content
is non-stub, skip with `not_a_stub` even when the DB body looks
stubby. Conservative posture on read errors (treat as non-stub
so the file survives).
- Net: the merge only proceeds when BOTH the DB body AND the disk
file (when present) look stub-shaped.
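The two-sided check can be pictured as a small truth table. The sketch below is illustrative only: `DiskRead` and `mergeMayProceed` are hypothetical names, and the stub predicate is passed in rather than taken from the repo.

```typescript
// Hypothetical model of what reading the phantom .md from disk can yield.
type DiskRead =
  | { kind: "absent" }                // no local_path, or no file on disk
  | { kind: "ok"; content: string }   // file read successfully
  | { kind: "error" };                // read failed

// The merge proceeds only when the DB body is stub-shaped AND the disk file,
// if there is one, is stub-shaped too. Read errors count as non-stub so the
// file survives.
function mergeMayProceed(
  dbBody: string,
  disk: DiskRead,
  isStub: (body: string) => boolean,
): boolean {
  if (!isStub(dbBody)) return false;
  switch (disk.kind) {
    case "absent":
      return true;
    case "error":
      return false;
    case "ok":
      return isStub(disk.content);
  }
}
```

With this shape, a stale stub row in the DB can never outvote unsynced prose on disk.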
Tests: 56 pass; `bun run verify` clean. New test seeds the
stale-DB-row-with-edited-disk-file scenario and asserts the disk
content survives the merge.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… expansion
Codex round-22 P2: the round-7 "prefix-first" change was too greedy.
When a bare token is the TITLE of one entity (e.g. "Liz" on
people/elizabeth-example) AND a slug prefix of another
(people/liz-smith), the resolver routed via prefix expansion →
people/liz-smith. User actually meant Liz=Elizabeth, so facts/recall
attached to the wrong page.
Fix
- Prefix expansion only short-circuits exact/fuzzy when the bare
slug EXISTS AS A STUB (phantom). When it's 'missing' entirely,
we fall through to exact → fuzzy → catch-all prefix expansion.
Real bare pages still hit exact in step 2 unchanged.
- The catch-all step 4 still rescues bare names that have NO
matching slug AND NO strong fuzzy hit (e.g. Jared with no
`jared` page + low pg_trgm score) — they still route to
`people/jared-friedman` instead of slugifying to a phantom.
Tests: 57 pass; `bun run verify` clean. New regression test seeds the
Liz/Elizabeth scenario and asserts fuzzy wins.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hantom files
Codex round-24 P2: writeFactsToFence's stub-guard only fired on the
missing-file branch. A pre-v0.34.5 phantom .md file ALREADY on disk
would slip past the guard — writeFactsToFence read the existing
stub body and appended new facts to it. resolveEntitySlug can still
hand back an unprefixed slug when prefix expansion finds no
canonical, so future facts kept splitting onto the phantom.
Fix
- The existing-file branch now also checks: if the slug is
unprefixed AND the file body is stub-shaped (via isStubBody),
fire the same stub-guard. Facts go to facts.dropped.jsonl,
operator can recover via the audit log.
- Non-stub bare files (intentional pages) still get appended to —
consistent with the round-18 "real DB-only page" path.
Tests: 58 pass. New regression test seeds a stub-shaped phantom .md
on disk and asserts the stub-guard fires (stubGuardBlocked=true,
inserted=0).
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex round-25 P2: when pg_trgm is unavailable and the input is a
capitalized real-bare-page name (e.g. `Alice` with a real `alice`
page), the old code:
1. Bare-name override step recognized the bare slug was real
(non-stub), fell through.
2. Exact match on the capitalized input failed (case-sensitive).
3. Fuzzy returned null (no pg_trgm).
4. Catch-all prefix expansion routed to `people/alice-*`.
Net: a real top-level page got bypassed on engines without pg_trgm.
Fix
- When bareBody is non-missing AND non-stub, return the token
(the real bare slug) NOW instead of falling through. Eliminates
the fuzzy/prefix detour entirely for capitalized references to
real bare pages.
- Stub bareBody still falls through into prefix expansion as
before (the phantom-redirect case).
- 'missing' bareBody still falls through to exact/fuzzy/prefix as
before (the no-bare-page case from round-22).
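Taken together, rounds 22 and 25 leave the resolver with three branches keyed on the state of the bare slug. The following is a simplified sketch of that ordering, not the code in resolve.ts: `BareBody`, `Probes`, and `resolveBareName` are hypothetical names, the lookups are injected as callbacks, and `token` is assumed to be already slugified.

```typescript
// State of the unprefixed page for this token, as probed before resolution.
type BareBody = "missing" | "stub" | "real";

// Hypothetical lookup callbacks; each returns a slug or null.
type Probes = {
  exact: () => string | null;
  fuzzy: () => string | null;
  prefixExpansion: () => string | null;
};

function resolveBareName(token: string, bareBody: BareBody, probes: Probes): string {
  // Round-25: a real bare page wins immediately, with no fuzzy or prefix detour.
  if (bareBody === "real") return token;

  // Phantom redirect: a stub bare page is overridden by a canonical prefixed page.
  if (bareBody === "stub") {
    const canonical = probes.prefixExpansion();
    if (canonical !== null) return canonical;
  }

  // Round-22: with no bare page, try exact, then fuzzy, then catch-all prefix
  // expansion. The bare token is the last resort.
  return probes.exact() ?? probes.fuzzy() ?? probes.prefixExpansion() ?? token;
}
```

In the Liz/Elizabeth scenario the bare slug is missing, so a fuzzy hit on `people/elizabeth-example` is returned before prefix expansion can pick `people/liz-smith`.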
Tests: 59 pass; `bun run verify` clean. New regression test pins
that `resolveEntitySlug('Realbare')` returns 'realbare' when a real
bare page exists, even with a competing prefix candidate.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ting-stub-file facts
Codex round-26 P2: the round-24 stub-guard for existing phantom .md
files trusted the disk body alone. If the same page was later updated
via put_page / MCP, pages.compiled_truth could carry real content
while the .md remained a stale stub. Facts for that legitimate bare
page were silently dropped to the audit log instead of landing on
the real DB body.
Fix
- When disk says stub, ALSO read pages.compiled_truth + timeline
via engine.getPage. Only drop when BOTH are stub-shaped.
- When the DB has real content but disk is stub, materialize the
DB body to disk (via serializeMarkdown) so the fence append
lands on the real body. The next read of compiled_truth sees the
full state.
Tests: 60 pass; `bun run verify` clean. New test seeds a real DB body
+ stale-stub .md on disk, runs writeFactsToFence, asserts the fact
landed AND the on-disk file was reconciled to contain the real body
plus the new fact.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ence facts
Codex round-27 P1: when DB has no active fact rows for a phantom but
the on-disk .md has a populated `## Facts` fence (crash between
writeFactsToFence renameSync and DB insert, manual fence edits before
extract_facts reconciles), the merge command's factless branch
treated it as empty, soft-deleted the row, and unlinked the file —
losing the fence facts permanently.
Fix
- Before treating as factless, parse the on-disk fence via
parseFactsFence. If any rows are active (not strikethrough,
not forgotten, not superseded), skip with a new `fence_drift`
reason instead of unlinking. The warn message points the
operator at `gbrain dream --phase extract_facts` to reconcile
the fence into the DB before re-running merge-phantoms.
- Read or parse failures on the on-disk file also fall into the
same skip branch (conservative — don't unlink a file we couldn't
inspect).
Tests: 61 pass; `bun run verify` clean. New test seeds a phantom
with a populated fence on disk but no DB rows, asserts the merge
skips with fence_drift and leaves both the file AND the DB row
intact.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… dry-run report
Codex round-28 P2: when --dry-run is used on a phantom with zero
active DB rows but active facts on the on-disk fence, the preview
reported it under WOULD MERGE with facts_moved=0. The real run later
skipped the same page as `fence_drift` to avoid data loss. Preview
mismatched reality on exactly the case the guard exists for.
Fix
- Lifted the fence-drift parse check out of the post-dry-run
factless branch and into a pre-dry-run gate. Both dry-run and
real runs see the same skip outcome.
- The `factlessLocalPath` lookup result is captured during the
pre-check and reused in the subsequent factless soft-delete +
unlink path, avoiding a duplicate `lookupSourceLocalPath` call.
Tests: 62 pass; `bun run verify` clean. New regression test pins
that dry-run reports skipped[].skipped='fence_drift' for the
unreconciled-fence scenario.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LL file-backed phantoms
Codex round-29 P1: the fence-drift check was gated on `facts_moved === 0`,
which meant a phantom with mixed state (1 DB row + 3 active fence rows
on disk) would migrate only the 1 DB row and unlink the file — losing
the 2 disk-only fact rows permanently. The patch's data-loss
prevention only covered the zero-DB-fact case.
Fix
- Drift check now runs for every file-backed phantom regardless of
DB-row count.
- Drift definition: `activeFenceCount > facts_moved` (disk fence
has more active rows than the DB knows about).
- Same skip path as round-27 — `fence_drift` skip, log message
points at extract_facts reconciliation.
Tests: 63 pass. New regression test seeds a 1-DB-row phantom plus a
3-active-fence-row .md, asserts merge skips with fence_drift, DB
row + file survive intact.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etection
Codex round-30 P2: the round-29 drift check was one-directional —
it only fired when disk fence had MORE active rows than the DB. The
reverse case (user struck out / forgot fence rows but extract_facts
hasn't reconciled yet, so DB still has stale active rows) slipped
through: the merge would copy the stale DB rows into the canonical
page, undoing the user's strikethroughs.
Fix
- Drift detection now fires when activeFenceCount !== facts_moved
in EITHER direction.
- Equal-count-but-different-content case: when counts match and
fence has active rows, check that every active fence
(claim, kind, source) tuple has a matching DB row. Missing
tuples → drift, skip.
- Files WITHOUT machine fence markers (legacy pre-v0.32.2
phantoms) skip the drift check entirely — they're DB-only by
construction, not drift.
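The three rules above reduce to a short predicate. This sketch assumes facts are compared by their (claim, kind, source) tuple as the commit text describes; `FactKey`, `keyOf`, and `hasFenceDrift` are hypothetical names, and `dbActive` stands in for the rows counted as `facts_moved`.

```typescript
type FactKey = { claim: string; kind: string; source: string };

// Stable string key for a (claim, kind, source) tuple.
function keyOf(f: FactKey): string {
  return JSON.stringify([f.claim, f.kind, f.source]);
}

function hasFenceDrift(
  hasFenceMarkers: boolean,
  activeFence: FactKey[],
  dbActive: FactKey[],
): boolean {
  // Legacy files without machine fence markers are DB-only by construction.
  if (!hasFenceMarkers) return false;
  // A count mismatch in EITHER direction is drift.
  if (activeFence.length !== dbActive.length) return true;
  // Equal counts: every active fence tuple needs a matching DB row.
  const dbKeys = new Set(dbActive.map(keyOf));
  return activeFence.some((f) => !dbKeys.has(keyOf(f)));
}
```

For the regression case in this commit (3 DB rows, 1 active fence row after two strikethroughs) the count check alone reports drift, so the merge skips instead of resurrecting the struck rows.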
Tests: 64 pass; `bun run verify` clean. New regression test seeds 3
DB rows + a fence with 1 active + 2 strikethrough rows on disk;
asserts the merge skips with fence_drift instead of resurrecting
the struck rows.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user direction after 30 rounds of codex review on the
merge-phantoms cleanup command. The command grew its own bug farm
because it duplicated the v0.32.2 extract_facts reconciliation
contract instead of reusing it. Codex kept finding real edge cases
in the parallel implementation; each fix exposed the next axis the
previous shape didn't cover.
Stripped
- src/commands/merge-phantoms.ts (600 lines, deleted)
- test/merge-phantoms.test.ts (31 tests, deleted)
- src/cli.ts entries: CLI_ONLY, CLI_ONLY_SELF_HELP,
THIN_CLIENT_REFUSED_COMMANDS, THIN_CLIENT_REFUSE_HINTS, the
dispatch case, and the help text line
- Unused `stubBodyChars` import in fence-write.ts (was only a
silencer for the merge-phantoms-era import shape)
Kept (independently valuable)
- src/core/entities/resolve.ts — prefix expansion + stub
detection + real-page preservation. 35 tests.
- src/core/facts/fence-write.ts — stub-guard for new + existing
phantom files, with DB-side materialization for legitimate
DB-only bare pages.
- src/core/facts/backstop.ts + dropped-audit.ts — drops
unresolvable facts to ~/.gbrain/facts.dropped.jsonl instead of
creating legacy-shape DB rows that trip the v0.32.2
reconciliation guard.
- entities.prefix_expansion_dirs config key.
- Exported helpers `tryPrefixExpansion`, `stubBodyChars`,
`isStubBody`, `PHANTOM_STUB_MAX_BODY_CHARS` — kept exported for
the future Option Beta implementation.
Documented
- docs/designs/MERGE_PHANTOMS.md captures: the full problem
statement, what the command was supposed to do, the round-by-
round cascade table of 30 codex findings, the meta-insight (the
fence is the system of record; the command was duplicating
extract_facts's job), three plausible alternative designs
(Alpha/Beta/Gamma), regression gotchas extracted from the
iteration as a checklist, and a "how to pick this up" section
for the future implementer.
6485 unit tests pass; `bun run verify` clean.
The resolver fix + stub-guard + backstop audit are ALL still in PR
#1010 and address the real bug that motivated the work (new bare-
name references no longer spawn phantom pages). Cleanup of pre-
existing phantoms is now deferred to a separate PR that should
start from the design doc.
Plan: ~/.claude/plans/mossy-popping-crown.md.
Design doc: docs/designs/MERGE_PHANTOMS.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closed as duplicate of #1085 which is structurally cleaner (better observability via stub_guard_24h doctor check, supervisor crash classification). The 32-round codex iteration on this branch produced 4 P1 + ~40 P2 findings (catalogued in commit messages 'codex round-N') that may be useful as a regression checklist for #1085; feel free to cherry-pick test cases. The MERGE_PHANTOMS.md design doc that captured the 30-round retrospective + three platonic-ideal alternative designs (Alpha report-only / Beta extract_facts hook / Gamma canonical_of schema) is being shipped as a separate docs-only PR.
This was referenced May 17, 2026
garrytan added a commit that referenced this pull request on May 18, 2026
) * feat(cycle): phantom-page redirect inside extract_facts (v0.35.8.0)
Drains the existing pile of unprefixed entity pages (alice.md, acme.md)
that pre-PR-#1010 routing left behind. Folds the cleanup into the
existing extract_facts cycle phase via two new lossless engine
primitives so the v0.32.2 reconciliation contract owns drift handling
instead of a parallel implementation duplicating it.
Layers:
- engine: refreshPageBody + migrateFactsToCanonical on Postgres + PGLite
- resolver: resolvePhantomCanonical + findPrefixCandidates (codex #1/#11)
- orchestrator: src/core/cycle/phantom-redirect.ts + phantom-audit JSONL
- cycle: sourceId/brainDir threaded; 3 new totals counters
- tests: 38 unit + 6 parity + 4 E2E (48 total) pinning all 12 codex findings
* fix(test): pin clock in sync_freshness boundary tests (CI flake)
CI test (1) failed: `sync_freshness check > exact 72h boundary → warn`.
The test set `last_sync_at = Date.now() - 72h`, then checkSyncFreshness
called Date.now() again to compute ageMs. Between the two reads the
clock advanced (0.43ms in this CI run, microseconds locally) which
pushed ageMs above the strict 72h fail threshold and flipped the status
from warn to fail. Same shape latent in the 24h boundary test; fixed
both.
Fix:
- checkSyncFreshness gains an optional `opts.nowMs` test-only seam.
Production callers omit it and get live wall-clock semantics.
- Both boundary tests now capture nowMs once and thread it through both
`last_sync_at` and the check, eliminating drift between reads.
Verified deterministic: 10 consecutive runs of the 72h boundary test
pass on this machine (was occasionally failing before).
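The clock-pinning seam from the sync_freshness fix can be sketched as below. Only the optional `opts.nowMs` parameter and the strict 72h fail threshold come from the commit text; the function signature, the `Freshness` type, and the exact 24h warn cutoff are assumptions made for illustration.

```typescript
const HOUR_MS = 60 * 60 * 1000;
type Freshness = "ok" | "warn" | "fail";

// Hypothetical signature. Production callers would omit opts and get the live clock.
function checkSyncFreshness(lastSyncAtMs: number, opts: { nowMs?: number } = {}): Freshness {
  const nowMs = opts.nowMs ?? Date.now();
  const ageMs = nowMs - lastSyncAtMs;
  if (ageMs > 72 * HOUR_MS) return "fail"; // strict: exactly 72h is still warn
  if (ageMs > 24 * HOUR_MS) return "warn"; // assumed cutoff for the warn band
  return "ok";
}

// Boundary test shape: read the clock once and use it on both sides.
const pinnedNowMs = 1_700_000_000_000;
const atBoundary = checkSyncFreshness(pinnedNowMs - 72 * HOUR_MS, { nowMs: pinnedNowMs });
const justPast = checkSyncFreshness(pinnedNowMs - 72 * HOUR_MS - 1, { nowMs: pinnedNowMs });
```

Because `pinnedNowMs` feeds both the seeded timestamp and the check, `atBoundary` lands exactly on the threshold no matter how long the test takes to run.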
What this PR does
Stops the bug class where bare-name references ("Alice", "Jared") silently spawned phantom unprefixed entity pages at the brain root. Three independent layers, all addressed.
Layer 1 — resolver prefix expansion
`src/core/entities/resolve.ts`
- `tryPrefixExpansion` step matches both `<dir>/<token>` exactly AND `<dir>/<token>-%`.
- … `people/alice-example` before exact-matching a leftover phantom.
- `entities.prefix_expansion_dirs` overrides the defaults `['people', 'companies', 'deals', 'topics', 'concepts']`.
Layer 2 — stub-creation guard
`src/core/facts/fence-write.ts`
Layer 3 — dropped-fact audit
`src/core/facts/backstop.ts` + `src/core/facts/dropped-audit.ts`
Tests
Review trail
Cleanup of existing phantoms — deferred
The plan-eng-review (D7) had added a `gbrain merge-phantoms` operator command to clean up the pile of pre-existing phantom pages. That implementation was built in this PR and codex-reviewed 30 rounds. Each round found a real bug, but the cumulative shape grew its own bug farm — the command was duplicating reconciliation logic that already exists in `src/core/cycle/extract-facts.ts`.
After consulting on cost / elegance, the merge-phantoms command was stripped from this PR. Full retrospective + three alternative-design options + regression checklist for a future implementer are in `docs/designs/MERGE_PHANTOMS.md`.
Origin
Moved from #1009 (closed) so CI gets base-repo secrets.