Add zstd compression for JSONL transcript storage#514

Open

evisdren wants to merge 3 commits into main from worktree-cached-spinning-nebula
Conversation

@evisdren
Contributor

Summary

  • Add zstd compression to JSONL transcripts before storing as git blobs, reducing object sizes 10-15x and dramatically improving git push times for the entire/checkpoints/v1 branch
  • New compression package with Compress/Decompress helpers using klauspost/compress/zstd
  • All write paths (committed + shadow branch) now compress transcripts; all read paths try compressed first with uncompressed fallback for backward compatibility
  • New entire optimize [--apply] migration command to rewrite existing uncompressed data on the metadata branch using ApplyTreeChanges tree surgery
  • Compression and storage benchmarks for throughput, ratio, and simulated push payload

Changes

New files:

  • compression/zstd.go — Compress/Decompress/IsCompressedName/CompressedName
  • compression/zstd_test.go — round-trip, empty, large, concurrent, invalid data tests
  • compression/zstd_bench_test.go — benchmarks at 1KB–25MB
  • checkpoint/compression_bench_test.go — end-to-end write, read, push payload, migration benchmarks
  • optimize.go — entire optimize [--apply] command (dry-run by default)

Modified files:

  • paths/paths.go — TranscriptCompressedFileName, CompressedSuffix constants
  • agent/chunking.go — ChunkCompressed/ReassembleCompressed for raw byte splitting
  • checkpoint/committed.go — compressed writes/reads for transcripts + subagents
  • checkpoint/temporary.go — compressed shadow branch writes + fallback reads
  • strategy/common.go, manual_commit_condensation.go, manual_commit_hooks.go — compressed-first reads
  • root.go — register optimize command
  • go.mod — promote klauspost/compress from indirect to direct dependency
  • Test files updated to handle .zst format

Test plan

  • mise run fmt — clean
  • mise run lint — 0 issues
  • mise run test:ci — all unit + integration tests pass
  • Run compression benchmarks: go test -bench=BenchmarkCompress -benchmem ./cmd/entire/cli/compression/...
  • Run storage benchmarks: go test -bench=BenchmarkSimulatedPushPayload -benchmem ./cmd/entire/cli/checkpoint/...
  • Manual: checkpoint + commit → verify full.jsonl.zst on metadata branch tree
  • Manual: read back old uncompressed checkpoints → verify backward-compat

🤖 Generated with Claude Code

Compress JSONL transcripts with zstd before storing as git blobs,
reducing object sizes 10-15x. This dramatically improves git push
times for the entire/checkpoints/v1 branch.

- Add compression package (zstd.go) with Compress/Decompress helpers
- Add ChunkCompressed/ReassembleCompressed for compressed byte splitting
- Compress transcripts in all write paths (committed + shadow branch)
- Add compressed-first reads with uncompressed fallback (backward compat)
- Add `entire optimize` command to migrate existing uncompressed data
- Add compression and storage benchmarks
- Promote klauspost/compress from indirect to direct dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: e65d9bec561a
Copilot AI review requested due to automatic review settings February 26, 2026 03:37
@evisdren evisdren requested a review from a team as a code owner February 26, 2026 03:37
@cursor

cursor bot commented Feb 26, 2026

PR Summary

Medium Risk
Changes core checkpoint persistence/read logic and on-disk formats (new .zst paths and chunking semantics); backward-compat fallbacks help, but any mismatch could make transcripts unreadable or break migration.

Overview
Transcripts and subagent JSONL artifacts are now stored zstd-compressed on entire/checkpoints/v1 (and shadow/temporary checkpoints), using full.jsonl.zst / .zst-suffixed agent-*.jsonl and chunking the compressed bytes when blobs exceed MaxChunkSize.

Read paths were updated to prefer compressed content with transparent decompression, while retaining fallbacks for legacy uncompressed/chunked formats; metadata directory ingestion also compresses .jsonl files and adjusts tree paths accordingly. A new entire optimize command performs a dry-run (default) or --apply migration that rewrites existing uncompressed transcript blobs to .zst, and tests/benchmarks were added/updated to validate and measure compression behavior.

Written by Cursor Bugbot for commit ae08fa2.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds zstd compression to JSONL transcript storage to reduce git object sizes 10-15x and improve git push performance for the entire/checkpoints/v1 metadata branch. It introduces a new compression package, updates all write paths to compress transcripts, adds backward-compatible reads with compressed-first fallback, and provides an entire optimize migration command to compress existing uncompressed data.

Changes:

  • New compression package with zstd Compress/Decompress functions and helper utilities
  • Updated checkpoint storage (committed.go, temporary.go) to compress all JSONL transcripts and subagent transcripts before writing
  • New agent chunking functions (ChunkCompressed/ReassembleCompressed) for splitting compressed binary data at byte boundaries
  • Updated all read paths in strategy package to try compressed format first with uncompressed fallback
  • New entire optimize command for migrating existing uncompressed transcripts using tree surgery
  • Updated integration tests to handle both compressed and uncompressed transcript formats

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
compression/zstd.go Core compression/decompression using klauspost/compress/zstd
compression/zstd_test.go Unit tests for compression round-trips and edge cases
compression/zstd_bench_test.go Benchmarks for compression throughput and ratios
checkpoint/compression_bench_test.go End-to-end benchmarks for write/read/migration scenarios
checkpoint/committed.go Compressed writes for transcripts with chunking support; compressed-first reads
checkpoint/temporary.go Compressed writes for shadow branch; fallback reads with decompression
checkpoint/checkpoint_test.go Updated tests to decompress subagent transcripts
agent/chunking.go New ChunkCompressed/ReassembleCompressed for binary data splitting
strategy/common.go Compressed-first reads with fallback for getTaskTranscriptFromTree
strategy/manual_commit_hooks.go Compressed-first reads in sessionHasNewContent
strategy/manual_commit_condensation.go Compressed-first reads in extractSessionData
optimize.go New migration command with dry-run default and tree surgery implementation
root.go Command registration for optimize
paths/paths.go Added TranscriptCompressedFileName and CompressedSuffix constants
integration_test/testenv.go Helper to read compressed or uncompressed transcripts
integration_test/*.go Updated tests to check for both compressed and uncompressed formats
go.mod Promoted klauspost/compress from indirect to direct dependency

Comment on lines +160 to +199
// ChunkCompressed splits compressed (binary) data into chunks at raw byte boundaries.
// Unlike ChunkJSONL which respects line boundaries, this splits at arbitrary byte offsets
// since the consumer will reassemble the raw bytes before decompression.
func ChunkCompressed(data []byte, maxSize int) [][]byte {
	if len(data) <= maxSize {
		return [][]byte{data}
	}

	var chunks [][]byte
	for len(data) > 0 {
		end := maxSize
		if end > len(data) {
			end = len(data)
		}
		chunk := make([]byte, end)
		copy(chunk, data[:end])
		chunks = append(chunks, chunk)
		data = data[end:]
	}
	return chunks
}

// ReassembleCompressed concatenates raw byte chunks back into a single buffer.
func ReassembleCompressed(chunks [][]byte) []byte {
	if len(chunks) == 0 {
		return nil
	}
	if len(chunks) == 1 {
		return chunks[0]
	}
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	result := make([]byte, 0, total)
	for _, c := range chunks {
		result = append(result, c...)
	}
	return result
}

Copilot AI Feb 26, 2026

The new functions ChunkCompressed and ReassembleCompressed in this file lack test coverage. These functions handle critical binary data chunking for compressed transcripts, and should have tests to verify:

  • Correct chunking at byte boundaries
  • Round-trip preservation (chunk then reassemble yields original data)
  • Edge cases (empty data, single chunk, data size exactly at maxSize, data size just over maxSize)
  • Proper handling when maxSize is very small

Since other chunking functions in this file (ChunkJSONL, ReassembleJSONL) have comprehensive test coverage in chunking_test.go, these new functions should follow the same pattern.

Comment on lines +89 to +145
for entryPath, entry := range entries {
	if !isUncompressedTranscript(entryPath) {
		continue
	}

	// Read the blob content
	blob, err := repo.BlobObject(entry.Hash)
	if err != nil {
		continue
	}

	reader, err := blob.Reader()
	if err != nil {
		continue
	}

	content := make([]byte, blob.Size)
	n, readErr := io.ReadFull(reader, content)
	_ = reader.Close()
	if readErr != nil && n == 0 {
		continue
	}
	content = content[:n]

	originalSize := int64(len(content))

	// Compress the content
	compressed, err := compression.Compress(content)
	if err != nil {
		continue
	}

	compressedSize := int64(len(compressed))
	totalOriginalSize += originalSize
	totalCompressedSize += compressedSize
	filesCompressed++

	if apply {
		// Create compressed blob
		blobHash, err := checkpoint.CreateBlobFromContent(repo, compressed)
		if err != nil {
			continue
		}

		// Delete old uncompressed entry
		changes = append(changes, checkpoint.TreeChange{
			Path:  entryPath,
			Entry: nil, // delete
		})

		// Add new compressed entry
		compressedPath := entryPath + paths.CompressedSuffix
		changes = append(changes, checkpoint.TreeChange{
			Path:  compressedPath,
			Entry: &object.TreeEntry{Mode: filemode.Regular, Hash: blobHash},
		})
	}

Copilot AI Feb 26, 2026

The error handling in the blob reading loop silently continues on any error, which means if blob reading fails or compression fails for legitimate reasons (e.g., I/O errors, corrupted data), the command will skip those files without informing the user. This could mask real problems.

At minimum, failed compressions should be logged (they're operational issues, not user content). Consider:

  • Logging warnings for failed blob reads (may indicate repo corruption)
  • Logging warnings for compression failures (unexpected but should be visible)
  • Possibly collecting and reporting error counts at the end of the operation

This is particularly important because the dry-run output shows savings estimates but doesn't mention any files that were skipped due to errors, which could give an inaccurate picture.

Comment on lines +1095 to +1104
// Try compressed format first
compressedPath := sessionDir + "/" + paths.TranscriptCompressedFileName
if file, fileErr := tree.File(compressedPath); fileErr == nil {
	content, contentErr := file.Contents()
	if contentErr == nil {
		decompressed, decompressErr := compression.Decompress([]byte(content))
		if decompressErr == nil {
			return decompressed, nil
		}
	}

Copilot AI Feb 26, 2026

This code only tries to read the base compressed file (full.jsonl.zst) but doesn't handle chunked compressed files (e.g., full.jsonl.zst, full.jsonl.zst.001, full.jsonl.zst.002).

When a compressed transcript exceeds MaxChunkSize (50MB), it's split into chunks. However, this read path will fail to reassemble those chunks - it will only read the first chunk file.

The code should use the same chunked reading logic that exists in readTranscriptFromTree in committed.go (specifically the readCompressedTranscript helper function), which properly handles both single and chunked compressed files.

Suggested change

// Try compressed format first
compressedPath := sessionDir + "/" + paths.TranscriptCompressedFileName
if file, fileErr := tree.File(compressedPath); fileErr == nil {
	content, contentErr := file.Contents()
	if contentErr == nil {
		decompressed, decompressErr := compression.Decompress([]byte(content))
		if decompressErr == nil {
			return decompressed, nil
		}
	}

// Try compressed format first (supports single-file and chunked transcripts)
baseCompressedPath := sessionDir + "/" + paths.TranscriptCompressedFileName
var compressedPaths []string
filesIter := tree.Files()
err = filesIter.ForEach(func(f *object.File) error {
	if strings.HasPrefix(f.Name, baseCompressedPath) {
		compressedPaths = append(compressedPaths, f.Name)
	}
	return nil
})
if err != nil {
	return nil, fmt.Errorf("failed to iterate tree files: %w", err)
}
if len(compressedPaths) > 0 {
	// Ensure chunks are read in order: base file, then .001, .002, ...
	sort.Strings(compressedPaths)
	var combined []byte
	for _, path := range compressedPaths {
		file, fileErr := tree.File(path)
		if fileErr != nil {
			return nil, fmt.Errorf("failed to read compressed transcript chunk %s: %w", path, fileErr)
		}
		content, contentErr := file.Contents()
		if contentErr != nil {
			return nil, fmt.Errorf("failed to read contents of compressed transcript chunk %s: %w", path, contentErr)
		}
		combined = append(combined, []byte(content)...)
	}
	decompressed, decompressErr := compression.Decompress(combined)
	if decompressErr == nil {
		return decompressed, nil
	}
	// If decompression fails, fall back to trying uncompressed formats below.

Comment on lines +1068 to +1076
// Try compressed format first, then uncompressed, then legacy
if file, fileErr := tree.File(metadataDir + "/" + paths.TranscriptCompressedFileName); fileErr == nil {
	hasTranscriptFile = true
	if content, contentErr := file.Contents(); contentErr == nil {
		if decompressed, decompressErr := compression.Decompress([]byte(content)); decompressErr == nil {
			transcriptLines = countTranscriptItems(state.AgentType, string(decompressed))
		}
	}
} else if file, fileErr := tree.File(metadataDir + "/" + paths.TranscriptFileName); fileErr == nil {

Copilot AI Feb 26, 2026

This code only tries to read the base compressed file but doesn't handle chunked compressed files. When a compressed transcript exceeds MaxChunkSize (50MB), it's stored as multiple chunk files (full.jsonl.zst, full.jsonl.zst.001, etc.), but this code will only read the first chunk.

The same issue exists in strategy/common.go:1095-1105. The proper solution is to iterate through tree entries to find all chunks with the compressed transcript base name, sort them, read them all, reassemble the compressed data, then decompress.
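
The shared helper the review proposes could look roughly like this, sketched over a plain path→content map rather than a go-git tree so it stays self-contained (function and variable names are assumptions):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reassembleChunks gathers every tree entry whose name starts with base
// (the base file plus numbered chunks like base+".001"), sorts them so the
// base file comes first, and concatenates their contents. The caller then
// decompresses the combined bytes. The files map stands in for a tree listing.
func reassembleChunks(files map[string][]byte, base string) ([]byte, bool) {
	var names []string
	for name := range files {
		if strings.HasPrefix(name, base) {
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		return nil, false
	}
	// Lexicographic order works: "full.jsonl.zst" < ".zst.001" < ".zst.002".
	sort.Strings(names)
	var combined []byte
	for _, n := range names {
		combined = append(combined, files[n]...)
	}
	return combined, true
}

func main() {
	files := map[string][]byte{
		"s1/full.jsonl.zst":     []byte("AAA"),
		"s1/full.jsonl.zst.001": []byte("BBB"),
		"s1/full.jsonl.zst.002": []byte("CCC"),
		"s1/other.txt":          []byte("x"),
	}
	data, ok := reassembleChunks(files, "s1/full.jsonl.zst")
	fmt.Println(ok, string(data))
}
```

Sharing one helper like this across strategy/common.go, manual_commit_hooks.go, and temporary.go would keep the chunk-handling logic in a single place.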

Comment on lines 599 to +610
// Fall back to direct file access (for backwards compatibility)
// Try compressed first
compressedPath := metadataDir + "/" + paths.TranscriptCompressedFileName
if file, fileErr := tree.File(compressedPath); fileErr == nil {
	content, contentErr := file.Contents()
	if contentErr == nil {
		decompressed, decompressErr := compression.Decompress([]byte(content))
		if decompressErr == nil {
			return decompressed, nil
		}
	}
}

Copilot AI Feb 26, 2026

The fallback path (lines 600-610) only reads the base compressed file and doesn't handle chunks. If readTranscriptFromTree fails but a chunked compressed transcript exists, this fallback will only read the first chunk.

However, since line 593 uses readTranscriptFromTree which properly handles chunked compressed files, this fallback should only be reached in edge cases. Still, for consistency and correctness, the fallback should either:

  1. Also handle chunks (iterate through possible chunk files)
  2. Be documented as intentionally handling only non-chunked files
  3. Log a warning if it's used (indicating an unexpected code path)

Comment on lines +174 to +185
now := plumbing.NewHashReference(refName, plumbing.ZeroHash) // placeholder
_ = now

commitObj := &object.Commit{
	Author:    object.Signature{Name: authorName, Email: authorEmail},
	Committer: object.Signature{Name: authorName, Email: authorEmail},
	Message:   fmt.Sprintf("Optimize: compress %d transcript files\n", filesCompressed),
	TreeHash:  newTreeHash,
	ParentHashes: []plumbing.Hash{
		ref.Hash(),
	},
}

Copilot AI Feb 26, 2026

The code creates a placeholder reference that is never used. Lines 174-175 create a now variable with plumbing.NewHashReference and immediately discard it with _ = now. This serves no purpose.

The commit signature should use time.Now() for the timestamp. Looking at other commit creation code in the codebase (e.g., temporary.go:780-786), the pattern is to create a time.Time value and use it in the object.Signature.When field. However, this code doesn't set the When field at all, which means the commit will have a zero timestamp.

Add proper timestamp handling:

now := time.Now()
commitObj := &object.Commit{
    Author:    object.Signature{Name: authorName, Email: authorEmail, When: now},
    Committer: object.Signature{Name: authorName, Email: authorEmail, When: now},
    // ...
}

Comment on lines +354 to +358
chunks := agent.ChunkCompressed(compressed, agent.MaxChunkSize)
baseName := paths.TranscriptCompressedFileName
if compressErr != nil {
	baseName = paths.TranscriptFileName
}

Copilot AI Feb 26, 2026

When compression fails and the code falls back to storing uncompressed data, it still uses agent.ChunkCompressed which is designed for binary data and splits at arbitrary byte boundaries. This is incorrect for uncompressed JSONL data.

For uncompressed JSONL, the code should use agent.ChunkTranscript (as was done before this change) which respects line boundaries and maintains JSONL validity. Using ChunkCompressed on uncompressed JSONL will split JSON objects mid-line, making the chunks invalid JSONL.

The fix should:

  • Use agent.ChunkTranscript when compressErr != nil
  • Use agent.ChunkCompressed only when compression succeeded
  • Detect agent type from the transcript content when using ChunkTranscript

Comment on lines +1331 to +1342
// readTranscriptContent reads the transcript content for a checkpoint, trying compressed first.
func (env *TestEnv) readTranscriptContent(checkpointID string) string {
	env.T.Helper()

	// Try compressed format first
	compressedPath := SessionFilePath(checkpointID, paths.TranscriptCompressedFileName)
	if compressedContent, found := env.ReadFileFromBranch(paths.MetadataBranchName, compressedPath); found {
		decompressed, err := compression.Decompress([]byte(compressedContent))
		if err == nil {
			return string(decompressed)
		}
	}

Copilot AI Feb 26, 2026

This helper function only reads the base compressed file and doesn't handle chunked compressed transcripts. If a test creates a transcript larger than MaxChunkSize (50MB), it will be stored as multiple chunks but this helper will only read the first chunk.

While current tests likely don't create transcripts large enough to trigger chunking, this could cause subtle test failures in the future if someone adds a test with large transcripts. Consider either:

  1. Adding chunk-handling logic (iterate through numbered suffixes)
  2. Documenting that this helper doesn't support chunked transcripts
  3. Making the helper use the same readTranscriptFromTree logic that the production code uses

Comment on lines +470 to +477
// Fall back to shadow branch copy — try compressed first, then uncompressed
if file, fileErr := tree.File(metadataDir + "/" + paths.TranscriptCompressedFileName); fileErr == nil {
	if content, contentErr := file.Contents(); contentErr == nil {
		fullTranscript = content
		if decompressed, decompressErr := compression.Decompress([]byte(content)); decompressErr == nil {
			fullTranscript = string(decompressed)
		}
	}
} else if file, fileErr := tree.File(metadataDir + "/" + paths.TranscriptFileNameLegacy); fileErr == nil {
	if content, contentErr := file.Contents(); contentErr == nil {
		fullTranscript = content
	}

Copilot AI Feb 26, 2026

This code only tries to read the base compressed file but doesn't handle chunked compressed files. When a compressed transcript exceeds MaxChunkSize (50MB), it's stored as multiple chunk files, but this code will only read the first chunk.

The same issue exists in strategy/common.go:1095-1105 and strategy/manual_commit_hooks.go:1068-1076. A helper function should be created to properly read compressed transcripts (with chunk handling) that can be shared across all these call sites.

Comment on lines +21 to +39
var dryRunFlag bool

cmd := &cobra.Command{
	Use:   "optimize",
	Short: "Optimize stored checkpoint data",
	Long: `Compress existing uncompressed transcript data on the entire/checkpoints/v1 branch.

New checkpoints are automatically compressed with zstd. This command migrates
older uncompressed data to the compressed format, reducing storage size and
improving push/pull performance.

Default: dry run that shows what would be compressed and estimated savings.
With --apply, actually compresses the data.`,
	RunE: func(cmd *cobra.Command, _ []string) error {
		return runOptimize(cmd.OutOrStdout(), !dryRunFlag)
	},
}

cmd.Flags().BoolVar(&dryRunFlag, "apply", false, "Actually compress data (default: dry run)")

Copilot AI Feb 26, 2026

The PR description says the command syntax is entire optimize [--apply] but line 39 shows the flag help text says "Actually compress data (default: dry run)" with the flag name as --apply. This is confusing because:

  1. The boolean variable is named dryRunFlag but the flag is named apply
  2. Most CLI tools use --dry-run as a flag to prevent changes, not --apply to enable them
  3. The default behavior (dry run) is non-standard; usually a command performs its action by default and --dry-run prevents it

While the implementation is functionally correct (the negation on line 35 handles the inversion), this design is counterintuitive for users familiar with standard CLI patterns. Consider switching to a --dry-run flag that defaults to false, or keeping --apply but renaming the variable to applyFlag for clarity.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 6 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Path: compressedPath,
Entry: &object.TreeEntry{Mode: filemode.Regular, Hash: blobHash},
})
}

Optimize command corrupts chunked transcripts with wrong filenames

High Severity

The optimize command compresses each chunk file individually, appending .zst to the original filename (e.g., full.jsonl.001 → full.jsonl.001.zst). However, the read path expects compressed chunk names in the format full.jsonl.zst.001. The misnamed files are invisible to the compressed reader and, worse, ParseChunkIndex("full.jsonl.001.zst", "full.jsonl") successfully parses index 1, so the uncompressed fallback reader picks them up and returns compressed binary as if it were JSONL text. For chunked transcripts, this causes data corruption on read after optimization.

Additional Locations (1)
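
To illustrate the naming mismatch, here is a hypothetical helper producing the layout the read path expects (for illustration only; not the PR's actual code):

```go
package main

import "fmt"

// compressedChunkName shows the naming the read path expects: the .zst
// suffix attaches to the base transcript name, and the chunk index goes
// after it (full.jsonl.zst.001) — not before it (full.jsonl.001.zst),
// which is what the optimize command currently produces for chunks.
func compressedChunkName(base string, index int) string {
	name := base + ".zst"
	if index == 0 {
		return name
	}
	return fmt.Sprintf("%s.%03d", name, index)
}

func main() {
	fmt.Println(compressedChunkName("full.jsonl", 0)) // full.jsonl.zst
	fmt.Println(compressedChunkName("full.jsonl", 1)) // full.jsonl.zst.001
}
```

A fix along these lines would have the migration strip the old chunk index, compress the reassembled transcript as a whole, and re-chunk the compressed bytes under the expected names.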


agentBlobHash, agentBlobErr := CreateBlobFromContent(s.repo, compressed)
if agentBlobErr == nil {
-	agentPath := taskPath + "agent-" + opts.AgentID + ".jsonl"
+	agentPath := taskPath + "agent-" + opts.AgentID + ".jsonl" + paths.CompressedSuffix

SessionFilePaths.Transcript references non-existent uncompressed file path

Medium Severity

filePaths.Transcript is set to paths.TranscriptFileName (full.jsonl), but writeTranscript now stores the file as paths.TranscriptCompressedFileName (full.jsonl.zst). The CheckpointSummary metadata JSON will contain a path that doesn't correspond to any actual file in the git tree, breaking any consumer that follows this path to locate the transcript.


ParentHashes: []plumbing.Hash{
ref.Hash(),
},
}

Optimize commit created with zero-value timestamp

Medium Severity

The object.Signature for the optimize commit omits the When field, resulting in a git commit with a year-0001 timestamp. The existing createCommit method in the codebase properly sets When: time.Now(). The nearby dead code now := plumbing.NewHashReference(...) / _ = now at lines 174–175 suggests timestamp handling was intended but not completed.


// is updated via the compressedTreePath output parameter (if non-nil).
func createRedactedBlobFromFile(repo *git.Repository, filePath, treePath string) (plumbing.Hash, filemode.FileMode, error) {
	hash, mode, _, err := createRedactedBlobFromFileWithCompression(repo, filePath, treePath)
	return hash, mode, err

Wrapper compresses JSONL but discards effective path

Medium Severity

createRedactedBlobFromFile wraps createRedactedBlobFromFileWithCompression but discards the effective tree path (the third return value). This means .jsonl files get their content compressed, but the caller (copyMetadataDir) stores the compressed blob under the original .jsonl path without the .zst suffix. The parallel functions addDirectoryToEntriesWithAbsPath and addDirectoryToChanges were correctly updated to use the WithCompression variant, but copyMetadataDir was not.

Additional Locations (1)


// Create new commit
authorName, authorEmail := checkpoint.GetGitAuthorFromRepo(repo)
now := plumbing.NewHashReference(refName, plumbing.ZeroHash) // placeholder
_ = now

Dead placeholder code left in optimize command

Low Severity

now := plumbing.NewHashReference(refName, plumbing.ZeroHash) creates a reference object that is immediately discarded with _ = now. This appears to be leftover development scaffolding — the variable name now and the // placeholder comment suggest it was meant to hold a timestamp but was never completed.


// CompressedName returns the filename with a .zst suffix appended.
func CompressedName(name string) string {
	return name + ".zst"
}

Unused exported functions in compression package

Low Severity

IsCompressedName and CompressedName are exported functions that are never called anywhere in the codebase outside their own test file. The codebase uses paths.CompressedSuffix and strings.HasSuffix directly instead of these helpers.


evisdren and others added 2 commits February 27, 2026 12:09
Push perf test (build tag: pushperf) measures real GitHub push times
for entire/checkpoints/v1 using real transcript data from the source
repo, with GIT_TRACE2_EVENT profiling to break down each phase
(ref negotiation, pack+send, remote processing).

Growth model test (build tag: growthmodel) projects data volumes,
push times, and GitHub size limit timelines across team sizes
(10/50/250/1000 devs) and time horizons (1-12 months).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summarizes growth model projections, push profiling results, GitHub
size limit timelines, platform-level storage estimates, and per-developer
unit economics. Includes potential mitigations for data scaling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>