Add commit hook perf test with control baseline and scaling analysis by evisdren · Pull Request #549 · entireio/cli

evisdren · 2026-02-27T20:10:25Z

Summary

Rewrites commit_hook_perf_test.go to compare control commits (no Entire) against commits with hooks active across 100/200/500 sessions
Seeds 75% of ENDED sessions with shadow branch refs (no LastCheckpointID) to match production behavior, where most sessions have unconsumed checkpoint data
Uses full-history clone with 200 seeded branches, packed refs, and unique base commits per session for realistic ref scanning and object resolution overhead
Adds docs/architecture/commit-hook-perf-analysis.md documenting findings

Key findings

PostCommit condensation is the dominant cost, not ref scanning:

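| Scenario | Sessions | Control | PrepareCommitMsg | PostCommit | Total Overhead |
|----------|----------|---------|------------------|------------|----------------|
| 100      | 100      | 29ms    | 815ms            | 6.5s       | 7.3s           |
| 200      | 200      | 20ms    | 1.7s             | 14.6s      | 16.3s          |
| 500      | 500      | 29ms    | 4.4s             | 46.9s      | 51.3s          |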

The 200-session result (16.3s) matches the real-world user report of ~16s for ~95 sessions, confirming the test methodology faithfully reproduces production overhead.
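To make the scaling in the table easier to read, the total overhead can be divided by the session count. The sketch below does only that arithmetic on the reported numbers; the `scenario` type and `perSession` helper are illustrative names and are not part of the test harness.

```go
package main

import (
	"fmt"
	"time"
)

// scenario holds one row of the results table (illustrative type).
type scenario struct {
	sessions int
	overhead time.Duration // total hook overhead: PrepareCommitMsg + PostCommit
}

// perSession returns the average hook overhead attributable to one session.
func perSession(s scenario) time.Duration {
	return s.overhead / time.Duration(s.sessions)
}

func main() {
	rows := []scenario{
		{sessions: 100, overhead: 7300 * time.Millisecond},
		{sessions: 200, overhead: 16300 * time.Millisecond},
		{sessions: 500, overhead: 51300 * time.Millisecond},
	}
	for _, r := range rows {
		fmt.Printf("%d sessions: %v per session\n", r.sessions, perSession(r))
	}
}
```

On the reported figures the average rises from 73ms at 100 sessions to roughly 103ms at 500 sessions, so the per-session cost is not perfectly constant as the session count grows.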

Cost breakdown per ENDED session (with shadow branch)

Condensation: ~30-50ms — tree building + commit on entire/checkpoints/v1 (dominant)
Ref lookups: ~2-4ms — 2-3 repo.Reference() calls across both hooks (packed-refs linear scan, no caching)
Content detection: ~2-5ms — transcript/overlap check
State I/O: ~0.5-1ms — JSON parse per session file

Highest-ROI optimizations

Batch condensation — condense all sessions in one commit instead of N commits
Session pruning — skip stale ENDED sessions during PostCommit
Batch ref resolution — load all refs into a map for O(1) lookups
Lazy condensation — defer to background process instead of blocking the commit
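The batch ref resolution idea can be illustrated without go-git: read the ref list once, build a map, and answer each per-session lookup from it. The sketch below parses git's packed-refs text format with the standard library only. `parsePackedRefs` and the sample ref names are hypothetical, and a real fix would also have to consider loose refs, which this ignores.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePackedRefs reads packed-refs content once and returns refname -> hash.
// Header lines ("#") and peeled-tag lines ("^") are skipped.
func parsePackedRefs(content string) (map[string]string, error) {
	refs := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || line[0] == '#' || line[0] == '^' {
			continue
		}
		hash, name, ok := strings.Cut(line, " ")
		if !ok {
			return nil, fmt.Errorf("malformed packed-refs line: %q", line)
		}
		refs[name] = hash
	}
	return refs, sc.Err()
}

func main() {
	// Sample content; the branch names are made up for illustration.
	const sample = `# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/heads/entire/aaaa111
2222222222222222222222222222222222222222 refs/heads/main
`
	refs, err := parsePackedRefs(sample)
	if err != nil {
		panic(err)
	}
	// Each per-session check is now a map lookup instead of a ref scan.
	for _, name := range []string{"refs/heads/entire/aaaa111", "refs/heads/entire/missing"} {
		hash, ok := refs[name]
		fmt.Println(name, ok, hash)
	}
}
```

The map is built once per hook invocation, so N session checks cost one pass over the ref list plus N constant-time lookups.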

Test methodology evolution

| Version | 100 sessions | Per-session | Issue |
|---------|--------------|-------------|-------|
| Shallow + shared base | 1.74s | ~18ms | Packfile too small, repeated ref scan |
| Full history + shared base | 2.00s | ~21ms | Same ref scanned N times |
| Full history + unique bases (cheap ENDED) | 337ms | ~3ms | ENDED had LastCheckpointID → no-ops |
| Full history + realistic ENDED (current) | 7.3s | ~73ms | Matches production |

The critical fix was seeding 75% of ENDED sessions with shadow branch refs but no LastCheckpointID, forcing the full expensive path: ref lookup → commit/tree resolution → content detection → PostCommit condensation.
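The seeding rule described above can be sketched as follows. This is a simplified illustration, not the code in commit_hook_perf_test.go: the `sessionState` struct, its field names other than LastCheckpointID, and the ID formats are assumptions.

```go
package main

import "fmt"

// sessionState is a simplified, hypothetical stand-in for a session state file.
type sessionState struct {
	ID               string
	Phase            string
	HasShadowBranch  bool
	LastCheckpointID string
}

// seedEndedSessions returns n ENDED sessions. Three of every four get a shadow
// branch and no LastCheckpointID (the expensive path); the rest carry a
// LastCheckpointID and should be cheap no-ops for the hooks.
func seedEndedSessions(n int) []sessionState {
	sessions := make([]sessionState, 0, n)
	for i := 0; i < n; i++ {
		s := sessionState{ID: fmt.Sprintf("perf-session-%03d", i), Phase: "ENDED"}
		if i%4 != 3 {
			s.HasShadowBranch = true
		} else {
			s.LastCheckpointID = fmt.Sprintf("checkpoint-%03d", i)
		}
		sessions = append(sessions, s)
	}
	return sessions
}

func main() {
	expensive := 0
	for _, s := range seedEndedSessions(100) {
		if s.HasShadowBranch && s.LastCheckpointID == "" {
			expensive++
		}
	}
	fmt.Printf("%d of 100 sessions take the full path\n", expensive)
}
```

A deterministic rule such as `i%4 != 3` keeps the 75/25 split identical across runs, which makes timings comparable between scenarios.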

Test plan

go build -tags hookperf ./cmd/entire/cli/strategy/ compiles
go vet -tags hookperf ./cmd/entire/cli/strategy/ passes
go test -v -run TestCommitHookPerformance -tags hookperf -timeout 15m ./cmd/entire/cli/strategy/ passes with results matching real-world reports

🤖 Generated with Claude Code

Rewrites commit_hook_perf_test.go to compare control commits (no Entire) against commits with hooks active across 100/200/500 sessions. Uses real session templates from .git/entire-sessions/, seeds 200 branches with packed refs for realistic ref scanning. Documents findings: ~18ms/session linear scaling dominated by repo.Reference() calls in listAllSessionStates and filterSessionsWithNewContent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: fd2fcba3de23

cursor · 2026-02-27T20:10:30Z

PR Summary

Low Risk
Adds a test-only (build-tagged) performance benchmark plus documentation; no production logic changes, with the main risk being flaky/manual execution due to external git/GitHub and local .git/entire-sessions requirements.

Overview
Introduces a new build-tagged Go perf test (commit_hook_perf_test.go, //go:build hookperf) that benchmarks Entire’s prepare-commit-msg and post-commit hook overhead by comparing a baseline git commit (no Entire settings/hooks) vs. a commit with ManualCommitStrategy.PrepareCommitMsg + PostCommit across 100/200/500 seeded sessions in a locally-cloned repo.

Adds docs/architecture/commit-hook-perf-analysis.md summarizing measured results and attributing the linear per-session cost primarily to repeated go-git ref lookups (e.g., repo.Reference() in session listing/content checks), with a short list of suggested optimization directions.

Written by Cursor Bugbot for commit dfdf52a.

Copilot

Pull request overview

Adds a reproducible (tagged) performance test and an accompanying analysis document to quantify and explain the overhead of Entire’s commit hooks as session count scales.

Changes:

Add hookperf-tagged Go test that measures control commits vs PrepareCommitMsg + PostCommit across multiple session counts, using seeded branches/packed refs and real session templates.
Add architecture documentation summarizing results and attributing dominant costs (notably repeated repo.Reference() calls), plus optimization opportunities.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| docs/architecture/commit-hook-perf-analysis.md | Documents measured hook overhead, scaling behavior, and suspected hotspots/optimizations. |
| cmd/entire/cli/strategy/commit_hook_perf_test.go | Implements the `hookperf` performance test harness (repo cloning, branch seeding, session seeding, timing). |

Copilot · 2026-02-27T20:15:37Z

docs/architecture/commit-hook-perf-analysis.md

+| Scenario | Sessions | Control | Prepare | PostCommit | Total | Overhead |
+|----------|----------|---------|---------|------------|-------|----------|
+| 100      | 100      | 18ms    | 878ms   | 867ms      | 1.74s | 1.73s    |
+| 200      | 200      | 32ms    | 1.85s   | 1.74s      | 3.59s | 3.56s    |
+| 500      | 500      | 30ms    | 4.74s   | 4.78s      | 9.52s | 9.49s    |


Markdown tables use a leading "||" on each row, which renders as an extra empty column in GitHub-flavored Markdown. Use a single leading pipe ("|") for the header and each row so the table renders correctly.

Copilot · 2026-02-27T20:15:38Z

docs/architecture/commit-hook-perf-analysis.md

+| Call site | When | Per-session calls |
+|-----------|------|-------------------|
+| `listAllSessionStates()` (line 91) | Both hooks | 1× |
+| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (line 1131) | PrepareCommitMsg | 1× |
+| `postCommitProcessSession()` (line 840) | PostCommit | 1× |
+| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) | 1× |
+


This call-site table also uses a leading "||" on each row, which will render an unintended empty first column. Switch to a single leading pipe per row for proper table formatting.

Copilot · 2026-02-27T20:15:38Z

docs/architecture/commit-hook-perf-analysis.md

+| `store.Load()` (JSON parse) | 1-2ms | 1× | 1-2ms |
+| `tree.File()` traversal | 1-2ms | 1× | 1-2ms |
+| Content overlap check | 3-5ms | 0-1× | 0-5ms |
+| **Total** | | | **~14-24ms (avg ~18ms)** |


The cost breakdown table uses a leading "||" on each row, which introduces an extra empty column in Markdown rendering. Use a single leading "|" for consistent table formatting.

Suggested change

-| **Total** | | | **~14-24ms (avg ~18ms)** |
+| **Total** | — | — | **~14-24ms (avg ~18ms)** |

Copilot · 2026-02-27T20:15:38Z

docs/architecture/commit-hook-perf-analysis.md

+| `listAllSessionStates()` (line 91) | Both hooks | 1× |
+| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (line 1131) | PrepareCommitMsg | 1× |
+| `postCommitProcessSession()` (line 840) | PostCommit | 1× |
+| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) | 1× |
+
+That's **2 calls per session in PrepareCommitMsg** and **2-3 in PostCommit**. Each call costs ~4-5ms because go-git iterates through refs rather than doing a hash-map lookup. With 200 packed branches, this is measurable.
+
+Note: PostCommit pre-resolves the shadow ref at line 840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at line 91 always does a fresh lookup for every session.
+
+**Impact: ~8-10ms per session across both hooks combined.**
+
+### 2. Transcript parsing — `countTranscriptItems()` (~2-3ms/session)
+
+`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (line 1159):


The analysis references code locations as "(line N)" without naming the file, which makes the references ambiguous and brittle as the file changes. Include the file path (e.g., manual_commit_hooks.go:1131) alongside the line number(s).

Suggested change

-| `listAllSessionStates()` (line 91) | Both hooks | 1× |
-| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (line 1131) | PrepareCommitMsg | 1× |
-| `postCommitProcessSession()` (line 840) | PostCommit | 1× |
-| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) | 1× |
+| `listAllSessionStates()` (manual_commit_hooks.go:91) | Both hooks | 1× |
+| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (manual_commit_hooks.go:1131) | PrepareCommitMsg | 1× |
+| `postCommitProcessSession()` (manual_commit_hooks.go:840) | PostCommit | 1× |
+| `sessionHasNewContent()` in PostCommit (manual_commit_hooks.go:1131) | PostCommit (non-ACTIVE) | 1× |

 That's **2 calls per session in PrepareCommitMsg** and **2-3 in PostCommit**. Each call costs ~4-5ms because go-git iterates through refs rather than doing a hash-map lookup. With 200 packed branches, this is measurable.

-Note: PostCommit pre-resolves the shadow ref at line 840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at line 91 always does a fresh lookup for every session.
+Note: PostCommit pre-resolves the shadow ref at manual_commit_hooks.go:840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at manual_commit_hooks.go:91 always does a fresh lookup for every session.

 **Impact: ~8-10ms per session across both hooks combined.**

 ### 2. Transcript parsing — `countTranscriptItems()` (~2-3ms/session)

-`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (line 1159):
+`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (manual_commit_hooks.go:1159):

Copilot · 2026-02-27T20:15:39Z

cmd/entire/cli/strategy/commit_hook_perf_test.go

+	// Time the commit.
+	start := time.Now()
+	gitRun(t, dir, "commit", "-m", "control commit (no Entire)")
+	return time.Since(start)


This test relies on git commit succeeding in freshly-cloned repos, but it never configures user.name / user.email (and may inherit commit.gpgsign from global config). To make the perf test reproducible, set local repo config (user.name, user.email, commit.gpgsign=false) or set GIT_AUTHOR_* / GIT_COMMITTER_* env vars before running commits.
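One way to act on this is to pass the identity and signing settings per command with git's `-c key=value` option, which must precede the subcommand. The sketch below only builds the argument list; `reproducibleCommitArgs` and the identity values are placeholders and do not appear in the PR.

```go
package main

import "fmt"

// reproducibleCommitArgs prefixes a commit invocation with per-command config
// overrides so the result does not depend on the machine's git identity or
// signing setup. The identity values are placeholders.
func reproducibleCommitArgs(message string) []string {
	return []string{
		"-c", "user.name=Perf Test",
		"-c", "user.email=perf-test@example.com",
		"-c", "commit.gpgsign=false",
		"commit", "-m", message,
	}
}

func main() {
	fmt.Println(reproducibleCommitArgs("control commit (no Entire)"))
}
```

With the existing helper this could be called as `gitRun(t, dir, reproducibleCommitArgs("control commit (no Entire)")...)`, leaving the repository's own config untouched.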

Copilot · 2026-02-27T20:15:39Z

cmd/entire/cli/strategy/commit_hook_perf_test.go

+// gitRun executes a git command in the given directory and fails the test on error.
+func gitRun(t *testing.T, dir string, args ...string) {
+	t.Helper()
+	//nolint:gosec // test-only helper
+	cmd := exec.Command("git", args...)
+	cmd.Dir = dir
+	out, err := cmd.CombinedOutput()
+	if err != nil {
+		t.Fatalf("git %s failed: %v\n%s", strings.Join(args, " "), err, out)
+	}
+}


gitRun inherits the developer machine's global/system git config, which can significantly skew timings or break commits (e.g., core.hooksPath, commit.template, commit.gpgsign, aliases). Consider running git subprocesses with an isolated environment (e.g., set GIT_CONFIG_GLOBAL=/dev/null and GIT_CONFIG_SYSTEM=/dev/null) so results are stable and comparable across machines.
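A minimal sketch of the isolated environment suggested here, assuming a Git version that honors `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM` (2.32 or later). `isolatedGitEnv` and `newGitCmd` are hypothetical helpers, and the identity values are placeholders.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// isolatedGitEnv returns a copy of base plus variables that make git ignore
// global and system config and use a fixed identity. exec.Cmd keeps the last
// value for duplicate keys, so these override anything inherited in base.
func isolatedGitEnv(base []string) []string {
	return append(append([]string{}, base...),
		"GIT_CONFIG_GLOBAL="+os.DevNull,
		"GIT_CONFIG_SYSTEM="+os.DevNull,
		"GIT_AUTHOR_NAME=Perf Test",
		"GIT_AUTHOR_EMAIL=perf-test@example.com",
		"GIT_COMMITTER_NAME=Perf Test",
		"GIT_COMMITTER_EMAIL=perf-test@example.com",
	)
}

// newGitCmd builds (but does not run) a git command with the isolated environment.
func newGitCmd(dir string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	cmd.Env = isolatedGitEnv(os.Environ())
	return cmd
}

func main() {
	cmd := newGitCmd(".", "config", "--list")
	fmt.Println(cmd.Args)
	fmt.Println("env entries:", len(cmd.Env))
}
```

Inside `gitRun`, the only change would be assigning `cmd.Env` before `cmd.CombinedOutput()`; both the control and the hooked commits then run under the same configuration.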

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-27T20:22:06Z

cmd/entire/cli/strategy/commit_hook_perf_test.go

+			overhead := (prepDur + postDur) - controlDur
+			if overhead < 0 {
+				overhead = 0
+			}


Overhead calculation incorrectly subtracts control commit time

Low Severity

The overhead is computed as (prepDur + postDur) - controlDur, but prepDur and postDur only measure hook execution time — they don't include any git commit time (the commit at line 125 is untimed). Since hooks run in addition to the commit, the actual overhead is simply prepDur + postDur. Subtracting controlDur underestimates overhead by ~20-30ms. The "Total+Hooks" column already shows the correct value, making "Overhead" redundant and slightly misleading.

Additional Locations (1)

cmd/entire/cli/strategy/commit_hook_perf_test.go#L169-L174
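The calculation this comment proposes can be sketched as below, using the 100-session row from the PR description as sample input. `hookOverhead` is a hypothetical helper name; the sketch follows the reviewer's reasoning and is not necessarily what the test ends up doing.

```go
package main

import (
	"fmt"
	"time"
)

// hookOverhead returns the time added by the hooks. prepDur and postDur are
// measured around the hook calls only, so no commit time needs to be removed.
func hookOverhead(prepDur, postDur time.Duration) time.Duration {
	return prepDur + postDur
}

func main() {
	control := 29 * time.Millisecond // baseline git commit without Entire
	prep := 815 * time.Millisecond   // PrepareCommitMsg
	post := 6500 * time.Millisecond  // PostCommit

	overhead := hookOverhead(prep, post)
	fmt.Println("overhead:", overhead)
	fmt.Println("estimated commit with hooks:", control+overhead)
}
```

Because the sum of two non-negative durations cannot be negative, the `overhead < 0` clamp in the original hunk would no longer be needed.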

Shallow clone (--depth 1) produces a ~900KB packfile vs ~50-100MB for a real repo, understating go-git object resolution costs by ~15%. Switch to --single-branch (full history, one branch) to get a realistic packfile while keeping clone time reasonable (~5s vs timeout on full clone). Updated analysis doc with new numbers: ~21ms/session (was ~18ms).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1c1c8fb25717

… test

Previous test used 12 templates with shared BaseCommit (HEAD), causing listAllSessionStates to scan packed-refs for the same nonexistent shadow branch ref hundreds of times, inflating per-session cost from ~3ms to ~21ms. Now each session gets a unique base commit from real repo history (via git log walk), varied FilesTouched, diverse agent types, and unique prompts. Drops template dependency entirely.

Results: ~3ms/session (was ~21ms), 500 sessions adds ~1.5s overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: de85e10839ec

The perf test was 50x too low because all ENDED sessions had LastCheckpointID set (trivial no-ops). In production, ~75% of ENDED sessions have shadow branches with data but NO LastCheckpointID, exercising the full expensive path: ref lookup → commit/tree resolution → transcript/overlap check → PostCommit condensation.

Changes:
- Create alias shadow branch refs for 75% of ENDED sessions
- Add perfLargeFileSets (30-80 files) matching production FilesTouched sizes
- Include "perf_control.txt" in FilesTouched for staged-file overlap detection
- Update analysis doc with corrected numbers and condensation insights

Results now match real-world user report (~16s for ~95 sessions):
- 100 sessions: 7.3s (was 337ms)
- 200 sessions: 16.3s (was 617ms)
- 500 sessions: 51.4s (was 1.5s)

PostCommit condensation is the dominant cost (~50-80ms/session).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: da2c31e68843

Copilot AI review requested due to automatic review settings February 27, 2026 20:10

evisdren requested a review from a team as a code owner February 27, 2026 20:10

Copilot started reviewing on behalf of evisdren February 27, 2026 20:10

Copilot AI reviewed Feb 27, 2026

cursor bot reviewed Feb 27, 2026

evisdren and others added 3 commits February 27, 2026 12:29

evisdren wants to merge 4 commits into main from sessionPruning
