fix: detect and repair disconnected metadata branches at read/write time#533
fix: detect and repair disconnected metadata branches at read/write time#533
Conversation
When enable previously created an empty orphan entire/checkpoints/v1 and the remote later has real checkpoint data, detect the empty local branch and update it from the remote-tracking branch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 15849a699702
When a user hit the empty-orphan bug and then worked normally, the local entire/checkpoints/v1 has real checkpoint data on a disconnected history from the remote. EnsureMetadataBranch now detects this using git merge-base and reconciles by merging both trees (checkpoint shards are unique, so no conflicts). Also handles the simpler case where local is behind remote (fast-forward). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: cee58fd2febc
…ite time Move detection and repair of disconnected local/remote metadata branches from EnsureMetadataBranch (enable) to a sync.Once-guarded reconciliation that runs when commands actually access the metadata branch (explain, rewind, resume, etc.). Uses cherry-pick strategy instead of merge to preserve linear commit history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 639c85c9977a
PR SummaryMedium Risk Overview Moves reconciliation to metadata access paths (once per process). Tightens Written by Cursor Bugbot for commit 86d7433. Configure here. |
There was a problem hiding this comment.
Pull request overview
This PR moves “disconnected metadata branch” detection/repair out of entire enable and into a read/write-time reconciliation path, aiming to automatically fix repos affected by the historical empty-orphan metadata branch bug while preserving linear history.
Changes:
- Adds
ReconcileDisconnectedMetadataBranch(+sync.Once-guardedEnsureMetadataReconciled) to detect disconnectedentire/checkpoints/v1local vsorigin/*and repair via cherry-picking. - Updates several read/write entrypoints to call reconciliation before accessing checkpoint metadata.
- Expands strategy tests to cover new metadata-branch behaviors and adds a dedicated reconciliation test suite.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/strategy/metadata_reconcile.go | Implements disconnected-branch detection and cherry-pick-based repair logic. |
| cmd/entire/cli/strategy/metadata_reconcile_test.go | Adds unit tests for reconciliation scenarios (no remote/local, shared ancestry, disconnected, multi-local commits). |
| cmd/entire/cli/strategy/common.go | Adds EnsureMetadataReconciled gate; adjusts EnsureMetadataBranch; calls reconcile before listing checkpoints. |
| cmd/entire/cli/strategy/common_test.go | Adds tests for updated EnsureMetadataBranch semantics and new helper cloneWithConfig. |
| cmd/entire/cli/strategy/manual_commit.go | Ensures reconciliation is invoked before initializing the checkpoint store. |
| cmd/entire/cli/explain.go | Ensures reconciliation is invoked before reading committed checkpoints for explain/rewind points. |
Only treat plumbing.ErrReferenceNotFound as "no branch" when checking local and remote metadata refs. Other errors (e.g. corrupt packfiles) are now surfaced instead of silently swallowed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 95aa08abfa40
git merge-base uses exit code 1 for "no common ancestor" and 128+ for actual errors (corrupt repo, invalid hash, timeout). Only treat exit code 1 as "disconnected"; surface other failures as errors to prevent spurious cherry-pick reconciliation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: e73dd6177226
…tions The previous additions-only diff missed modifications (e.g., multi-session condensation updating metadata.json) and deletions. Now computes the full diff between each commit and its parent: entries with changed hashes are applied, and entries removed in the commit are deleted from the tip tree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 562308d98f2b
Previously, if isEmptyMetadataBranch returned an error (e.g., corrupt object, missing commit), EnsureMetadataBranch silently skipped the empty orphan check, potentially leaving the repo in a broken state. Now surfaces the error to callers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: fd02a6f99c6c
Hoist reconcileResult to package level so subsequent callers of EnsureMetadataReconciled see the same error, not nil. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: fc3459657b3d
|
bugbot run |
…l-flow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 5d5f08442ce7
Add a test utility to reset the sync.Once reconciliation gate, matching the existing resetProtectedDirsForTest pattern. Prevents stale cached results from leaking across tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: c20cb099d45c
…e trees A corrupted tree object during data repair should abort reconciliation, not silently drop the commit and lose user data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 49496929adf3
A partial FlattenTree read on the parent would produce an incorrect delta (treating unread entries as deleted). Propagate all three error paths: parent commit, parent tree, and FlattenTree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: c2930cbb7765
Prevents silent data loss if the local metadata branch has more than 1000 commits — reconciliation aborts with an error instead of silently truncating the chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: e59e395232b3
The repo parameter was only used on the first call due to sync.Once, making the API misleading. Now opens its own repository internally, matching the ListCheckpoints pattern and simplifying all three callers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 6242e73a4234
Log at debug level when EnsureMetadataBranch detects a non-empty local branch that differs from remote, making it visible in logs that reconciliation was deferred to read/write time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: f6f13afacc0a
Move the debug log into an else clause so it only fires when the local metadata branch has real data that differs from remote. Previously it fell through after the empty-orphan update case, logging misleadingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 851f7418bdb5
The function was never called — no tests trigger EnsureMetadataReconciled through the sync.Once gate. Remove it and the comment referencing it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: f5b938b258c8
Manual testTested against a fresh repo with real remote checkpoints and a disconnected local metadata branch. Setup:
Result: Reconciliation detected the disconnected state, recognized the local branch had no real checkpoint data (just the empty orphan), reset to remote, and |
Manual testing (round 2 — full cherry-pick path)Test 1: Empty orphan reset (frog)Cloned repo with remote checkpoints. Test 2: Cherry-pick with real local checkpoints (frog4)Created a properly disconnected empty orphan (simulating the pre-#511 bug), then used Claude + stock Stock Fix binary — detects disconnected state, cherry-picks local onto remote: After reconciliation, switching to main shows all checkpoints: Both local and remote checkpoint data preserved. Reconciliation runs once, subsequent explains are instant. |
| refName := plumbing.NewBranchReferenceName(paths.MetadataBranchName) | ||
|
|
||
| // Check if remote-tracking branch exists (e.g., after clone/fetch) | ||
| remoteRefName := plumbing.NewRemoteReferenceName("origin", paths.MetadataBranchName) |
There was a problem hiding this comment.
this is probably a problem. origin is a convention. how should we expect to know what the correct upstream is? iterate through all remotes? pick one?
There was a problem hiding this comment.
seems like this is a common pattern. unlikely a problem for most github users, but it's an assumption we might need to not make in future
There was a problem hiding this comment.
My remote is never called origin 🙃 - it's usually "gh" if it's on GitHub
There was a problem hiding this comment.
so that origin pattern is spread around the cli: https://github.com/search?q=repo%3Aentireio%2Fcli+%5C%22origin%5C%22&type=code
is this a bigger problem?
khaong
left a comment
There was a problem hiding this comment.
Overall this is a well-designed repair mechanism for the disconnected metadata branch bug state. The cherry-pick approach is correct, the distinction between disconnected vs diverged is clear, and test coverage is thorough (7 tests covering no-ops, single/multi-commit cherry-picks, and modifications).
A few items worth discussing inline — the most impactful is the permanent error caching via sync.Once.
Review generated by Claude, checked by Alex.
| // branches. Safe to call multiple times — uses sync.Once internally. | ||
| // Opens its own repository to avoid capturing a caller's repo instance forever. | ||
| func EnsureMetadataReconciled() error { | ||
| reconcileOnce.Do(func() { |
There was a problem hiding this comment.
Permanent error caching concern: If ReconcileDisconnectedMetadataBranch fails due to a transient issue (e.g., git merge-base times out, repo temporarily locked), the error is cached permanently for the process lifetime via sync.Once. Every subsequent call returns the same stale error.
Since this runs in long-lived agent sessions, a transient failure during explain would prevent reconciliation from succeeding on a later rewind in the same session.
Worth considering whether a retry-on-error pattern would be more robust here — e.g., only cache on success, or use a sync.OnceFunc-style approach that retries on failure. At minimum, documenting this tradeoff in a comment would help.
There was a problem hiding this comment.
(this one I'm not so sure about, will follow up on Monday)
| // - 1: no common ancestor (disconnected) | ||
| // - 128+: error (corrupt repo, invalid hash, etc.) | ||
| func isDisconnected(repoPath, hashA, hashB string) (bool, error) { | ||
| ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) |
There was a problem hiding this comment.
Nit: ReconcileDisconnectedMetadataBranch creates its own context.Background() here, but callers like getBranchCheckpoints already have a ctx. Threading context through would allow cancellation to propagate and would be more idiomatic Go.
| if setErr := repo.Storer.SetReference(ref); setErr != nil { | ||
| return fmt.Errorf("failed to update metadata branch from remote: %w", setErr) | ||
| } | ||
| fmt.Fprintf(os.Stderr, "✓ Updated local branch '%s' from origin\n", paths.MetadataBranchName) |
There was a problem hiding this comment.
Inconsistent output style: This uses fmt.Fprintf(os.Stderr, "✓ Updated local branch...") while ReconcileDisconnectedMetadataBranch uses the [entire] prefix convention (e.g., [entire] Detected disconnected session metadata...). The ✓ emoji also doesn't match the rest of the CLI's stderr output. Consider using [entire] prefix here too for consistency.
| return nil | ||
| } | ||
|
|
||
| // isEmptyMetadataBranch returns true if the branch ref points to a commit with an empty tree. |
There was a problem hiding this comment.
This intentionally only checks the tip commit's tree. If someone had an empty orphan commit followed by a data commit, the tip wouldn't be empty and this wouldn't match — which is correct for the intended use case (the bug created a single empty orphan that's the tip). Worth a brief doc comment clarifying this only checks the tip, so future readers don't wonder about the multi-commit case.
| // createCherryPickCommit creates a new commit on top of parent, preserving the | ||
| // original commit's message and author. | ||
| func createCherryPickCommit(repo *git.Repository, treeHash, parent plumbing.Hash, original *object.Commit) (plumbing.Hash, error) { | ||
| authorName, authorEmail := GetGitAuthorFromRepo(repo) |
There was a problem hiding this comment.
Nit: authorName/authorEmail are misleading variable names since they're used for the Committer field, not Author (which correctly preserves the original). Consider renaming to committerName/committerEmail for clarity.
| } | ||
| }) | ||
|
|
||
| t.Run("updates empty orphan from remote", func(t *testing.T) { |
There was a problem hiding this comment.
Missing t.Parallel() — this subtest uses temp dirs and go-git plumbing so it should be safe to parallelize. Adding it keeps test execution fast.
Summary
Detects and repairs disconnected
entire/checkpoints/v1metadata branches — a state caused by the empty-orphan bug (fixed going forward by #511). Users who hit that bug and continued working have local checkpoint data on a completely disconnected history from remote (no common ancestor).Key distinction: Diverged (shared ancestor, unique commits on each side) is normal multi-dev flow handled by the push path's tree merge. Disconnected (no common ancestor) is the bug state that this PR repairs.
What changed
New: cherry-pick reconciliation (
strategy/metadata_reconcile.go)When local and remote share no common ancestor, local commits are cherry-picked onto the remote tip:
git merge-base(exit 0 = shared ancestry, exit 1 = disconnected)Checkpoint shards use unique paths (
<id[:2]>/<id[2:]>/), so cherry-picks always apply cleanly. All data is preserved with linear history.Reconciliation runs at read/write time, not
enabletimeRuns once per process (via
sync.Once) when commands actually access the metadata branch — explain, rewind, resume, clean, condensation. Three call sites:ManualCommitStrategy.getCheckpointStore()— covers strategy-mediated reads/writesgetBranchCheckpoints()inexplain.go— bypasses strategy, creates own GitStoreListCheckpoints()instrategy/common.go— opens own repo, reads branch directlyThis is the right place because
enablemay run before the user has fetched remote data.Simplified
EnsureMetadataBranch(runs duringenable)Now only handles:
Removed
syncMetadataBranchandisAncestorCLI— disconnected repair is now handled byEnsureMetadataReconciled.User-facing output
Files
strategy/metadata_reconcile.gostrategy/metadata_reconcile_test.gostrategy/common.goEnsureMetadataReconciled()gate; simplifiedEnsureMetadataBranch; removedsyncMetadataBranch/isAncestorCLIstrategy/manual_commit.gogetCheckpointStore()strategy/common_test.gocli/explain.gogetBranchCheckpoints()Test plan
mise run fmt && mise run lint && mise run test:cigo test ./cmd/entire/cli/strategy/ -run TestReconcileDisconnected -v— all 7 tests passentire explain, verify reconciliation and explain succeeds🤖 Generated with Claude Code