v0.35.8.0 feat(cycle): phantom-page redirect inside extract_facts #1138

Merged
garrytan merged 3 commits into master from garrytan/copenhagen-v5
May 18, 2026

@garrytan
Owner

Summary

Drains the existing pile of unprefixed entity pages (alice.md, acme.md) that pre-PR-#1010 routing left behind. Folds the cleanup into the existing extract_facts cycle phase via two new lossless engine primitives so the v0.32.2 reconciliation contract owns drift handling instead of a parallel implementation duplicating it.

The previous attempt (gbrain merge-phantoms) was scrapped after 30 codex review rounds because it built a parallel reconcile path. This release picks up that scrapped work via Option Beta: piggyback on the existing reconcile loop, add two narrow primitives the existing path can't do losslessly, and let extract_facts own drift.

Layers:

  • Engine: refreshPageBody + migrateFactsToCanonical on Postgres + PGLite (parity tests)
  • Resolver: resolvePhantomCanonical (codex #1: bypasses exact-self-match) + findPrefixCandidates (codex #11: surfaces ambiguity)
  • Orchestrator: src/core/cycle/phantom-redirect.ts + src/core/facts/phantom-audit.ts (JSONL audit, ISO-week rotation)
  • Cycle wiring: sourceId + brainDir threaded through runPhaseExtractFacts; pre-pass runs after legacy-row guard, before main reconcile; 3 new totals counters
  • Capped at 50 phantoms per cycle (configurable via GBRAIN_PHANTOM_REDIRECT_LIMIT); writer-lock acquired once per pass with 30s bounded retry
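
The per-cycle cap in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in phantom-redirect.ts: `resolvePhantomLimit` and `takeCappedBatch` are hypothetical names, and the fallback behavior for malformed values is an assumption. Only the environment variable name, the default of 50, and the ascending slug order come from this PR.

```typescript
// Hypothetical helpers; only GBRAIN_PHANTOM_REDIRECT_LIMIT and the default
// of 50 are taken from the PR description.
const DEFAULT_PHANTOM_LIMIT = 50;

function resolvePhantomLimit(env: Record<string, string | undefined>): number {
  const raw = env["GBRAIN_PHANTOM_REDIRECT_LIMIT"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_PHANTOM_LIMIT;
  const parsed = Number(raw);
  // Assumption: anything that is not a positive integer falls back to the default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_PHANTOM_LIMIT;
}

interface CappedBatch {
  processed: string[];
  morePending: boolean;
}

function takeCappedBatch(phantomSlugs: string[], limit: number): CappedBatch {
  // Sort ascending so every cycle drains the pile in the same stable order.
  const ordered = [...phantomSlugs].sort();
  return {
    processed: ordered.slice(0, limit),
    // Signals that a later cycle still has phantoms to drain.
    morePending: ordered.length > limit,
  };
}
```

Under these assumptions a large pile drains over several cycles, with `morePending` staying true until the remainder fits inside one batch.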

Test Coverage

48 new tests pinning all 12 codex findings + 8 cascade-table regression rounds:

[+] src/core/cycle/phantom-redirect.ts (new, 476 lines)
  ├── tryRedirectPhantom()
  │   ├── [★★★ TESTED] happy path (alice → people/alice-example)
  │   ├── [★★★ TESTED] codex #2 body-shape gate (real top-level page rejected)
  │   ├── [★★★ TESTED] D5 ambiguous canonical → audit + skip
  │   ├── [★★★ TESTED] no_canonical → audit + skip
  │   ├── [★★★ TESTED] D10 dry-run → preview only
  │   ├── [★★★ TESTED] round 17 DB-only canonical materialized via serializeMarkdown
  │   ├── [★★★ TESTED] round 14 + codex #7 content_hash refresh; cycle-2 no-op
  │   ├── [★★★ TESTED] codex #3/#12 lossless metadata + dedup on canonical
  │   ├── [★★★ TESTED] codex #4 idempotency: re-run returns migrated=0
  │   ├── [★★★ TESTED] round 19/20 .md unlinked + soft-deleted
  │   ├── [★★★ TESTED] round 9 three-dash markers survive round-trip
  │   └── [★★★ TESTED] codex #12 disk dedup (claim+valid_from)
  └── runPhantomRedirectPass()
      ├── [★★★ TESTED] zero phantoms → empty counters
      ├── [★★★ TESTED] mixed outcomes counted independently
      ├── [★★★ TESTED] audit log captures every outcome
      ├── [★★★ TESTED] P1 cap via GBRAIN_PHANTOM_REDIRECT_LIMIT
      └── [★★★ TESTED] C4 lock contention surfaced

[+] BrainEngine.refreshPageBody (parity Postgres + PGLite)
  ├── [★★★ TESTED] updates compiled_truth + timeline + content_hash
  ├── [★★★ TESTED] no-op when slug is soft-deleted
  └── [★★★ TESTED] source-scoped — doesn't touch other sources

[+] BrainEngine.migrateFactsToCanonical (parity Postgres + PGLite)
  ├── [★★★ TESTED] moves active facts, preserving every column
  ├── [★★★ TESTED] idempotent re-run returns migrated=0
  └── [★★★ TESTED] skips expired rows (audit trail preserved)

[+] resolve.ts: resolvePhantomCanonical + findPrefixCandidates
  ├── [★★★ TESTED] codex #1 bypasses exact-self-match
  ├── [★★★ TESTED] codex #11 multi-dir ambiguity surfaced
  └── [★★★ TESTED] false-positive guard: people/aliceberg ≠ token=alice

[+] E2E (real Postgres)
  ├── [★★★ TESTED] bulk-pile cycle with cap=50, more_pending=true
  ├── [★★★ TESTED] steady-state no-op
  ├── [★★★ TESTED] concurrent-sync lock-busy seal
  └── [★★★ TESTED] round 12: postgres-js text-string embedding survives migration

COVERAGE: 48/48 paths tested (100%)
Tests: 6823 → 6871 (+48 new across 3 files)
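
The resolver rows in the coverage tree (exact-self-match bypass, multi-dir ambiguity, and the people/aliceberg false-positive guard) describe a matching rule that might look something like the sketch below. The real signatures of resolvePhantomCanonical and findPrefixCandidates are not shown in this PR, so the function names, the hyphen-boundary rule, and the `Resolution` type here are all assumptions for illustration.

```typescript
// Hypothetical sketch; not the signatures used in resolve.ts.
function findPrefixCandidatesSketch(token: string, slugs: string[]): string[] {
  return slugs.filter((slug) => {
    const slash = slug.lastIndexOf("/");
    // Unprefixed slugs are the phantoms themselves, so they never count as
    // canonical candidates (this is the exact-self-match bypass).
    if (slash === -1) return false;
    const leaf = slug.slice(slash + 1);
    // Assumption: a match needs a token boundary, so "alice" matches
    // "alice-example" but not "aliceberg".
    return leaf === token || leaf.startsWith(token + "-");
  });
}

type Resolution =
  | { kind: "canonical"; slug: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "no_canonical" };

function resolveSketch(token: string, slugs: string[]): Resolution {
  const candidates = findPrefixCandidatesSketch(token, slugs);
  const first = candidates[0];
  if (first === undefined) return { kind: "no_canonical" };
  // More than one prefixed match is surfaced rather than guessed at.
  if (candidates.length > 1) return { kind: "ambiguous", candidates };
  return { kind: "canonical", slug: first };
}
```

In this sketch the ambiguous and no_canonical outcomes are returned as data, which lines up with the "audit + skip" rows above: the caller can log them without migrating anything.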

Pre-Landing Review

Self-reviewed via /plan-eng-review with codex outside-voice. All 12 codex findings incorporated pre-implementation (see plan file at ~/.claude/plans/system-instruction-you-are-working-snoopy-lantern.md for the full decision trail).

Plan Completion

Every plan item from the Section 1-4 + codex round decision table is implemented or, in the case of D8, intentionally deferred:

  • D1: redirect inside extract_facts loop ✓
  • D2: strict zero-residue body-shape gate ✓
  • D3: refreshPageBody with content_hash ✓ (codex #7)
  • D4: single code path for disk + DB-only phantoms ✓
  • D5: ambiguous → skip + audit ✓
  • D6: rewriteLinks before soft-delete ✓
  • D7: 3 new totals counters ✓
  • D8: doctor check deferred to follow-up ✓ (intentional)
  • D9: separate audit file ✓
  • D10: dry-run honored ✓
  • D11: resolveEntitySlug used (fuzzy match catches Liz/Elizabeth) ✓
  • D12: phantom-first iteration order ✓ (via SQL ORDER BY slug ASC + phantom-pass-then-main-loop)
  • D13: sibling findPrefixCandidates ✓
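
For D9, the summary mentions a JSONL audit with ISO-week rotation. One way such rotation could work is to derive the file name from the ISO 8601 week of the entry's timestamp, as sketched below. The file name pattern, the `AuditEntry` fields, and the helper names are hypothetical; the PR does not show the actual format used by phantom-audit.ts.

```typescript
// ISO 8601 week stamp such as "2026-W21", computed in UTC.
function isoWeekStamp(date: Date): string {
  // Copy the calendar day so the caller's Date is left untouched.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // ISO weeks start on Monday; map Sunday (0) to 7.
  const day = d.getUTCDay() || 7;
  // Move to the Thursday of this week, which determines the ISO year.
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${d.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Hypothetical entry shape and file name pattern, for illustration only.
interface AuditEntry {
  ts: string;
  phantom: string;
  outcome: string;
}

function auditFileName(date: Date): string {
  return `phantom-redirect-${isoWeekStamp(date)}.jsonl`;
}

function formatAuditLine(entry: AuditEntry): string {
  // JSONL: one JSON object per line, newline-terminated.
  return JSON.stringify(entry) + "\n";
}
```

Because the stamp follows the ISO year rather than the calendar year, an entry written on 1 January 2027 would land in the 2026-W53 file in this sketch.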

Known limitations

  • Wiki-link text rewrite: [[alice]] references in other pages' markdown bodies still point at the phantom slug. Follow-up PR; the preventive resolver in PR #1010 (fix(entities): prefix-expansion resolver + stub-guard + dropped-fact audit) ensures new writes go to canonical.
  • Unified write-path lock: gbrain-sync serializes redirect vs performSync but doesn't cover MCP put_page / facts queue / direct writeFactsToFence. Best-effort; follow-up design pass to widen scope.

Test plan

  • Unit tests pass (6871 tests across all shards)
  • Phantom-redirect tests pass (48 tests: 38 unit + 6 parity + 4 E2E)
  • Engine parity tests pass on both Postgres + PGLite
  • Real-Postgres E2E pass (bulk-pile + lock contention + round-12 embedding)
  • CI version-gate accepts v0.35.8.0 (next free slot above master's v0.35.7.0)

🤖 Generated with Claude Code

garrytan added 3 commits May 17, 2026 19:00
Drains the existing pile of unprefixed entity pages (alice.md, acme.md)
that pre-PR-#1010 routing left behind. Folds the cleanup into the existing
extract_facts cycle phase via two new lossless engine primitives so the
v0.32.2 reconciliation contract owns drift handling instead of a parallel
implementation duplicating it.

Layers:
- engine: refreshPageBody + migrateFactsToCanonical on Postgres + PGLite
- resolver: resolvePhantomCanonical + findPrefixCandidates (codex #1/#11)
- orchestrator: src/core/cycle/phantom-redirect.ts + phantom-audit JSONL
- cycle: sourceId/brainDir threaded; 3 new totals counters
- tests: 38 unit + 6 parity + 4 E2E (48 total) pinning all 12 codex findings

…v0.35.5.1)

Resolved version trio to v0.35.8.0 (next free slot above master's v0.35.7.0).
Resolved CHANGELOG by keeping v0.35.8.0 entry above master's v0.35.7.0,
v0.35.6.0, and v0.35.5.1 entries (per CLAUDE.md merge-recovery procedure).
Resolved extract-facts.ts imports by taking BOTH: master's gateway import
(embed + isAvailable) and the phantom-redirect imports — they're complementary,
the gateway import is for the consolidate phase, the phantom-redirect imports
are for the v0.35.8.0 pre-pass.

Regenerated llms-full.txt against the merged CLAUDE.md.

CI test (1) failed: `sync_freshness check > exact 72h boundary → warn`.
The test set `last_sync_at = Date.now() - 72h`, then checkSyncFreshness
called Date.now() again to compute ageMs. Between the two reads the
clock advanced (0.43ms in this CI run, microseconds locally) which
pushed ageMs above the strict 72h fail threshold and flipped the
status from warn to fail.

Same shape latent in the 24h boundary test — fixed both.

Fix:
- checkSyncFreshness gains an optional `opts.nowMs` test-only seam.
  Production callers omit it and get live wall-clock semantics.
- Both boundary tests now capture nowMs once and thread it through
  both `last_sync_at` and the check, eliminating drift between reads.

Verified deterministic: 10 consecutive runs of the 72h boundary test
pass on this machine (was occasionally failing before).
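
The clock-injection fix described in this commit can be illustrated with a small sketch. Only the optional `opts.nowMs` seam and the strict 72h fail threshold come from the commit message; the function name, the return type, and the 24h warn threshold below are assumptions and may differ from the real checkSyncFreshness.

```typescript
type FreshnessStatus = "ok" | "warn" | "fail";

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical shape of the check. Production callers omit opts and get the
// live wall clock; tests pass nowMs so both reads share one instant.
function checkSyncFreshnessSketch(
  lastSyncAtMs: number,
  opts: { nowMs?: number } = {},
): FreshnessStatus {
  const nowMs = opts.nowMs ?? Date.now();
  const ageMs = nowMs - lastSyncAtMs;
  // Strict comparison: exactly 72h old is still "warn", not "fail".
  if (ageMs > 72 * HOUR_MS) return "fail";
  // Assumption: the 24h boundary is strict in the same way.
  if (ageMs > 24 * HOUR_MS) return "warn";
  return "ok";
}

// Boundary test pattern: capture the clock once and thread it through both
// the fixture and the check, so no time can elapse between the two reads.
const fixedNowMs = Date.now();
const boundaryStatus = checkSyncFreshnessSketch(fixedNowMs - 72 * HOUR_MS, { nowMs: fixedNowMs });
```

With the single captured timestamp, `ageMs` is exactly 72h regardless of how slowly the test runs, which is what removes the flake.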
garrytan merged commit 61b79e7 into master May 18, 2026
7 checks passed
garrytan added a commit that referenced this pull request May 18, 2026
Master shipped v0.35.8.0 (autopilot phantom-page redirect inside
extract_facts, #1138) ahead of this branch. VERSION trio kept at
0.36.1.0 since this branch's slot is already higher than master's
new tag. CHANGELOG carries both v0.36.1.0 (top) and v0.35.8.0
entries; llms-full.txt regenerated.

src/core/cycle.ts and src/commands/doctor.ts auto-merged cleanly
(both branches added separate sections). Test gate green: 195/195
on cycle.serial + migrate + doctor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>