Skip to content

Add commit hook perf test with control baseline and scaling analysis#549

Open
evisdren wants to merge 4 commits intomainfrom
sessionPruning
Open

Add commit hook perf test with control baseline and scaling analysis#549
evisdren wants to merge 4 commits intomainfrom
sessionPruning

Conversation

@evisdren
Copy link
Contributor

@evisdren evisdren commented Feb 27, 2026

Summary

  • Rewrites commit_hook_perf_test.go to compare control commits (no Entire) against commits with hooks active across 100/200/500 sessions
  • Seeds 75% of ENDED sessions with shadow branch refs (no LastCheckpointID) to match production behavior, where most sessions have unconsumed checkpoint data
  • Uses full-history clone with 200 seeded branches, packed refs, and unique base commits per session for realistic ref scanning and object resolution overhead
  • Adds docs/architecture/commit-hook-perf-analysis.md documenting findings

Key findings

PostCommit condensation is the dominant cost, not ref scanning:

Scenario Sessions Control PrepareCommitMsg PostCommit Total Overhead
100 100 29ms 815ms 6.5s 7.3s
200 200 20ms 1.7s 14.6s 16.3s
500 500 29ms 4.4s 46.9s 51.3s

The 200-session result (16.3s) matches the real-world user report of ~16s for ~95 sessions, confirming the test methodology faithfully reproduces production overhead.

Cost breakdown per ENDED session (with shadow branch)

  • Condensation: ~30-50ms — tree building + commit on entire/checkpoints/v1 (dominant)
  • Ref lookups: ~2-4ms — 2-3 repo.Reference() calls across both hooks (packed-refs linear scan, no caching)
  • Content detection: ~2-5ms — transcript/overlap check
  • State I/O: ~0.5-1ms — JSON parse per session file

Highest-ROI optimizations

  1. Batch condensation — condense all sessions in one commit instead of N commits
  2. Session pruning — skip stale ENDED sessions during PostCommit
  3. Batch ref resolution — load all refs into a map for O(1) lookups
  4. Lazy condensation — defer to background process instead of blocking the commit

Test methodology evolution

Version 100 sess Per-session Issue
Shallow + shared base 1.74s ~18ms Packfile too small, repeated ref scan
Full history + shared base 2.00s ~21ms Same ref scanned N times
Full history + unique bases (cheap ENDED) 337ms ~3ms ENDED had LastCheckpointID → no-ops
Full history + realistic ENDED (current) 7.3s ~73ms Matches production

The critical fix was seeding 75% of ENDED sessions with shadow branch refs but no LastCheckpointID, forcing the full expensive path: ref lookup → commit/tree resolution → content detection → PostCommit condensation.

Test plan

  • go build -tags hookperf ./cmd/entire/cli/strategy/ compiles
  • go vet -tags hookperf ./cmd/entire/cli/strategy/ passes
  • go test -v -run TestCommitHookPerformance -tags hookperf -timeout 15m ./cmd/entire/cli/strategy/ passes with results matching real-world reports

🤖 Generated with Claude Code

Rewrites commit_hook_perf_test.go to compare control commits (no Entire)
against commits with hooks active across 100/200/500 sessions. Uses real
session templates from .git/entire-sessions/, seeds 200 branches with
packed refs for realistic ref scanning. Documents findings: ~18ms/session
linear scaling dominated by repo.Reference() calls in listAllSessionStates
and filterSessionsWithNewContent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: fd2fcba3de23
Copilot AI review requested due to automatic review settings February 27, 2026 20:10
@evisdren evisdren requested a review from a team as a code owner February 27, 2026 20:10
@cursor
Copy link

cursor bot commented Feb 27, 2026

PR Summary

Low Risk
Adds a test-only (build-tagged) performance benchmark plus documentation; no production logic changes, with the main risk being flaky/manual execution due to external git/GitHub and local .git/entire-sessions requirements.

Overview
Introduces a new build-tagged Go perf test (commit_hook_perf_test.go, //go:build hookperf) that benchmarks Entire’s prepare-commit-msg and post-commit hook overhead by comparing a baseline git commit (no Entire settings/hooks) vs. a commit with ManualCommitStrategy.PrepareCommitMsg + PostCommit across 100/200/500 seeded sessions in a locally-cloned repo.

Adds docs/architecture/commit-hook-perf-analysis.md summarizing measured results and attributing the linear per-session cost primarily to repeated go-git ref lookups (e.g., repo.Reference() in session listing/content checks), with a short list of suggested optimization directions.

Written by Cursor Bugbot for commit dfdf52a. Configure here.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a reproducible (tagged) performance test and an accompanying analysis document to quantify and explain the overhead of Entire’s commit hooks as session count scales.

Changes:

  • Add hookperf-tagged Go test that measures control commits vs PrepareCommitMsg + PostCommit across multiple session counts, using seeded branches/packed refs and real session templates.
  • Add architecture documentation summarizing results and attributing dominant costs (notably repeated repo.Reference() calls), plus optimization opportunities.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
docs/architecture/commit-hook-perf-analysis.md Documents measured hook overhead, scaling behavior, and suspected hotspots/optimizations.
cmd/entire/cli/strategy/commit_hook_perf_test.go Implements the hookperf performance test harness (repo cloning, branch seeding, session seeding, timing).

Comment on lines 8 to 12
| Scenario | Sessions | Control | Prepare | PostCommit | Total | Overhead |
|----------|----------|---------|---------|------------|-------|----------|
| 100 | 100 | 18ms | 878ms | 867ms | 1.74s | 1.73s |
| 200 | 200 | 32ms | 1.85s | 1.74s | 3.59s | 3.56s |
| 500 | 500 | 30ms | 4.74s | 4.78s | 9.52s | 9.49s |
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown tables use a leading "||" on each row, which renders as an extra empty column in GitHub-flavored Markdown. Use a single leading pipe ("|") for the header and each row so the table renders correctly.

Copilot uses AI. Check for mistakes.
Comment on lines 22 to 28
| Call site | When | Per-session calls |
|-----------|------|-------------------|
| `listAllSessionStates()` (line 91) | Both hooks | 1× |
| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (line 1131) | PrepareCommitMsg | 1× |
| `postCommitProcessSession()` (line 840) | PostCommit | 1× |
| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) | 1× |

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call-site table also uses a leading "||" on each row, which will render an unintended empty first column. Switch to a single leading pipe per row for proper table formatting.

Copilot uses AI. Check for mistakes.
| `store.Load()` (JSON parse) | 1-2ms | 1× | 1-2ms |
| `tree.File()` traversal | 1-2ms | 1× | 1-2ms |
| Content overlap check | 3-5ms | 0-1× | 0-5ms |
| **Total** | | | **~14-24ms (avg ~18ms)** |
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost breakdown table uses a leading "||" on each row, which introduces an extra empty column in Markdown rendering. Use a single leading "|" for consistent table formatting.

Suggested change
| **Total** | | | **~14-24ms (avg ~18ms)** |
| **Total** | | | **~14-24ms (avg ~18ms)** |

Copilot uses AI. Check for mistakes.
Comment on lines 24 to 37
| `listAllSessionStates()` (line 91) | Both hooks | 1× |
| `filterSessionsWithNewContent()` → `sessionHasNewContent()` (line 1131) | PrepareCommitMsg | 1× |
| `postCommitProcessSession()` (line 840) | PostCommit | 1× |
| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) | 1× |

That's **2 calls per session in PrepareCommitMsg** and **2-3 in PostCommit**. Each call costs ~4-5ms because go-git iterates through refs rather than doing a hash-map lookup. With 200 packed branches, this is measurable.

Note: PostCommit pre-resolves the shadow ref at line 840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at line 91 always does a fresh lookup for every session.

**Impact: ~8-10ms per session across both hooks combined.**

### 2. Transcript parsing — `countTranscriptItems()` (~2-3ms/session)

`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (line 1159):
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analysis references code locations as "(line N)" without naming the file, which makes the references ambiguous and brittle as the file changes. Include the file path (e.g., manual_commit_hooks.go:1131) alongside the line number(s).

Suggested change
| `listAllSessionStates()` (line 91) | Both hooks ||
| `filterSessionsWithNewContent()``sessionHasNewContent()` (line 1131) | PrepareCommitMsg ||
| `postCommitProcessSession()` (line 840) | PostCommit ||
| `sessionHasNewContent()` in PostCommit (line 1131) | PostCommit (non-ACTIVE) ||
That's **2 calls per session in PrepareCommitMsg** and **2-3 in PostCommit**. Each call costs ~4-5ms because go-git iterates through refs rather than doing a hash-map lookup. With 200 packed branches, this is measurable.
Note: PostCommit pre-resolves the shadow ref at line 840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at line 91 always does a fresh lookup for every session.
**Impact: ~8-10ms per session across both hooks combined.**
### 2. Transcript parsing — `countTranscriptItems()` (~2-3ms/session)
`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (line 1159):
| `listAllSessionStates()` (manual_commit_hooks.go:91) | Both hooks ||
| `filterSessionsWithNewContent()``sessionHasNewContent()` (manual_commit_hooks.go:1131) | PrepareCommitMsg ||
| `postCommitProcessSession()` (manual_commit_hooks.go:840) | PostCommit ||
| `sessionHasNewContent()` in PostCommit (manual_commit_hooks.go:1131) | PostCommit (non-ACTIVE) ||
That's **2 calls per session in PrepareCommitMsg** and **2-3 in PostCommit**. Each call costs ~4-5ms because go-git iterates through refs rather than doing a hash-map lookup. With 200 packed branches, this is measurable.
Note: PostCommit pre-resolves the shadow ref at manual_commit_hooks.go:840 and passes `cachedShadowTree` to `sessionHasNewContent()`, so the second lookup is avoided for sessions that hit that path. But `listAllSessionStates()` at manual_commit_hooks.go:91 always does a fresh lookup for every session.
**Impact: ~8-10ms per session across both hooks combined.**
### 2. Transcript parsing — `countTranscriptItems()` (~2-3ms/session)
`sessionHasNewContent()` reads the transcript from the shadow branch tree and parses every JSONL line to count items (manual_commit_hooks.go:1159):

Copilot uses AI. Check for mistakes.
Comment on lines 250 to 253
// Time the commit.
start := time.Now()
gitRun(t, dir, "commit", "-m", "control commit (no Entire)")
return time.Since(start)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test relies on git commit succeeding in freshly-cloned repos, but it never configures user.name / user.email (and may inherit commit.gpgsign from global config). To make the perf test reproducible, set local repo config (user.name, user.email, commit.gpgsign=false) or set GIT_AUTHOR_* / GIT_COMMITTER_* env vars before running commits.

Copilot uses AI. Check for mistakes.
Comment on lines +324 to +334
// gitRun executes a git command in the given directory and fails the test on error.
func gitRun(t *testing.T, dir string, args ...string) {
t.Helper()
//nolint:gosec // test-only helper
cmd := exec.Command("git", args...)
cmd.Dir = dir
out, err := cmd.CombinedOutput()
if err != nil {
t.Fatalf("git %s failed: %v\n%s", strings.Join(args, " "), err, out)
}
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gitRun inherits the developer machine's global/system git config, which can significantly skew timings or break commits (e.g., core.hooksPath, commit.template, commit.gpgsign, aliases). Consider running git subprocesses with an isolated environment (e.g., set GIT_CONFIG_GLOBAL=/dev/null and GIT_CONFIG_SYSTEM=/dev/null) so results are stable and comparable across machines.

Copilot uses AI. Check for mistakes.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

overhead := (prepDur + postDur) - controlDur
if overhead < 0 {
overhead = 0
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overhead calculation incorrectly subtracts control commit time

Low Severity

The overhead is computed as (prepDur + postDur) - controlDur, but prepDur and postDur only measure hook execution time — they don't include any git commit time (the commit at line 125 is untimed). Since hooks run in addition to the commit, the actual overhead is simply prepDur + postDur. Subtracting controlDur underestimates overhead by ~20-30ms. The "Total+Hooks" column already shows the correct value, making "Overhead" redundant and slightly misleading.

Additional Locations (1)

Fix in Cursor Fix in Web

evisdren and others added 3 commits February 27, 2026 12:29
Shallow clone (--depth 1) produces a ~900KB packfile vs ~50-100MB for a
real repo, understating go-git object resolution costs by ~15%. Switch to
--single-branch (full history, one branch) to get a realistic packfile
while keeping clone time reasonable (~5s vs timeout on full clone).

Updated analysis doc with new numbers: ~21ms/session (was ~18ms).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1c1c8fb25717
… test

Previous test used 12 templates with shared BaseCommit (HEAD), causing
listAllSessionStates to scan packed-refs for the same nonexistent shadow
branch ref hundreds of times — inflating per-session cost from ~3ms to
~21ms. Now each session gets a unique base commit from real repo history
(via git log walk), varied FilesTouched, diverse agent types, and unique
prompts. Drops template dependency entirely.

Results: ~3ms/session (was ~21ms), 500 sessions adds ~1.5s overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: de85e10839ec
The perf test was 50x too low because all ENDED sessions had
LastCheckpointID set (trivial no-ops). In production, ~75% of ENDED
sessions have shadow branches with data but NO LastCheckpointID,
exercising the full expensive path: ref lookup → commit/tree resolution
→ transcript/overlap check → PostCommit condensation.

Changes:
- Create alias shadow branch refs for 75% of ENDED sessions
- Add perfLargeFileSets (30-80 files) matching production FilesTouched sizes
- Include "perf_control.txt" in FilesTouched for staged-file overlap detection
- Update analysis doc with corrected numbers and condensation insights

Results now match real-world user report (~16s for ~95 sessions):
  100 sessions: 7.3s  (was 337ms)
  200 sessions: 16.3s (was 617ms)
  500 sessions: 51.4s (was 1.5s)

PostCommit condensation is the dominant cost (~50-80ms/session).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: da2c31e68843
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants