perf(storage): release frame input buffers per goroutine#2869
Conversation
Move buffer ownership into the compression goroutine: the goroutine that reads bufPtr also returns it to inputPool via defer when it exits. Drops frame.input, part.releaseInputBuffers, and the explicit release calls in the read/upload loops. The previous checkpoint design held every part's input buffers until p.compress.Wait() returned in the upload loop, so buffers stayed pinned across the GCS upload latency. The pool kept missing and calling New, leaving allocations on par with or worse than no pool at all. Per-goroutine release caps the pool's working set at workers+1 regardless of stream length, so allocations and peak heap actually shrink. Adds compress_upload_pool_demo_test.go (delete before merge) that mirrors the pre-PR-2863 and PR #2863 designs verbatim and runs them side-by-side with the real compressStream from this branch on 256 MiB and 1 GiB inputs with a simulated 500 ms per-part upload latency. ``` go test -run TestPoolLifecycleDemo -v -timeout=10m -count=1 ./packages/shared/pkg/storage/ 2>&1 === RUN TestPoolLifecycleDemo === RUN TestPoolLifecycleDemo/256MiB compress_upload_pool_demo_test.go:473: input=256 MiB, frame=2048 KiB, part=50 MiB, workers=4, upload_delay=500ms compress_upload_pool_demo_test.go:524: main (no pool) : total_alloc= 668.0 MiB mallocs= 2037 heap_inuse_after= 837.9 MiB compress_upload_pool_demo_test.go:524: tomas (PR #2863, checkpoint) : total_alloc= 689.5 MiB mallocs= 2281 heap_inuse_after= 953.7 MiB compress_upload_pool_demo_test.go:524: this branch (per-goroutine) : total_alloc= 370.2 MiB mallocs= 1430 heap_inuse_after= 638.6 MiB compress_upload_pool_demo_test.go:527: --- compress_upload_pool_demo_test.go:528: tomas vs main: total_alloc +21.6 MiB mallocs +244 heap_inuse +115.9 MiB compress_upload_pool_demo_test.go:532: branch vs main: total_alloc -297.8 MiB mallocs -607 heap_inuse -199.2 MiB compress_upload_pool_demo_test.go:536: branch vs tomas:total_alloc -319.4 MiB mallocs -851 heap_inuse -315.1 MiB === RUN TestPoolLifecycleDemo/1024MiB compress_upload_pool_demo_test.go:473: input=1024 MiB, frame=2048 KiB, part=50 MiB, workers=4, upload_delay=500ms compress_upload_pool_demo_test.go:524: main (no pool) : total_alloc= 2341.9 MiB mallocs= 4794 heap_inuse_after= 2986.0 MiB compress_upload_pool_demo_test.go:524: tomas (PR #2863, checkpoint) : total_alloc= 2417.5 MiB mallocs= 5895 heap_inuse_after= 3462.5 MiB compress_upload_pool_demo_test.go:524: this branch (per-goroutine) : total_alloc= 1334.3 MiB mallocs= 4367 heap_inuse_after= 2396.1 MiB compress_upload_pool_demo_test.go:527: --- compress_upload_pool_demo_test.go:528: tomas vs main: total_alloc +75.6 MiB mallocs +1101 heap_inuse +476.5 MiB compress_upload_pool_demo_test.go:532: branch vs main: total_alloc -1007.7 MiB mallocs -427 heap_inuse -589.9 MiB compress_upload_pool_demo_test.go:536: branch vs tomas:total_alloc -1083.3 MiB mallocs -1528 heap_inuse -1066.4 MiB --- PASS: TestPoolLifecycleDemo (6.48s) --- PASS: TestPoolLifecycleDemo/256MiB (2.20s) --- PASS: TestPoolLifecycleDemo/1024MiB (4.29s) PASS ok github.com/e2b-dev/infra/packages/shared/pkg/storage 6.956s ```
❌ 3 Tests Failed:
View the full list of 6 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7352e31615
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
616cf63
into
valenta/storage-upload-copy-reduction
Move buffer ownership into the compression goroutine: the goroutine that reads
bufPtralso returns it toinputPoolvia defer when it exits. Dropsframe.input,part.releaseInputBuffers, and the explicit release calls in the read/upload loops.The previous checkpoint design held every part's input buffers until p.compress.Wait() returned in the upload loop, so buffers stayed pinned across the GCS upload latency. The pool kept missing and calling New, leaving allocations on par with or worse than no pool at all. Now, per-goroutine release caps the pool's working set at workers+1 regardless of stream length, so allocations and peak heap actually shrink.
Adds compress_upload_pool_demo_test.go (delete before merge) that mirrors the pre-PR-2863 and PR #2863 designs verbatim and runs them side-by-side with the real compressStream from this branch on 256 MiB and 1 GiB inputs with a simulated 500 ms per-part upload latency.