Skip to content

Add Rapid bucket template cache#2876

Draft
ValentaTomas wants to merge 13 commits into
mainfrom
valenta/rapid-bucket-cache
Draft

Add Rapid bucket template cache#2876
ValentaTomas wants to merge 13 commits into
mainfrom
valenta/rapid-bucket-cache

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented May 30, 2026

Adds a disabled-by-default Rapid bucket read-through cache for template artifacts. Rapid cache hits/writes update Redis asynchronously for chunk-level cleanup hints, while reads still use deterministic bucket paths and fall back to canonical storage.

Tested: go test ./packages/shared/pkg/storage ./packages/orchestrator/cmd/clean-rapid-cache

Add a feature-gated bucket-backed read-through cache for template artifacts as an alternative to the shared NFS cache.
@cla-bot cla-bot Bot added the cla-signed label May 30, 2026
@cursor
Copy link
Copy Markdown

cursor Bot commented May 30, 2026

PR Summary

Medium Risk
Changes template fetch paths and shared storage behavior; misconfiguration or aggressive cleanup could affect cache correctness or delete objects prematurely, though the feature is off by default and cleanup defaults to dry-run.

Overview
Adds an opt-in Rapid bucket read-through cache for template artifact reads: chunk reads try a dedicated GCS bucket first, fall back to canonical storage, and populate the bucket asynchronously while recording access in Redis for eviction. The orchestrator enables this via a new LaunchDarkly flag (default off) and prefers it over the existing NFS template cache when both could apply. GCP deploys a hourly Nomad batch job running clean-rapid-cache (index-driven deletes plus prefix scan, dry-run by default) and injects RAPID_BUCKET_CACHE_BUCKET_NAME into orchestrator env when a bucket is configured.

Reviewed by Cursor Bugbot for commit b61e373. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

In packages/shared/pkg/storage/storage_cache_rapid.go, calling c.wg.Go will cause a compilation error because sync.WaitGroup does not have a Go method. The background cache write-back goroutines do not need to be synchronized or waited on, so the function should be run directly in a goroutine and the unused wg field removed from the rapidCachedSeekable struct.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

❌ 3 Tests Failed:

Tests completed Failed Passed Skipped
2703 3 2700 7
View the full list of 5 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestEgressFirewallPrivateIPRangesAlwaysBlocked

Flake rate in main: 42.11% (Passed 734 times, Failed 534 times)

Stack Traces | 0s run time
=== RUN   TestEgressFirewallPrivateIPRangesAlwaysBlocked
=== PAUSE TestEgressFirewallPrivateIPRangesAlwaysBlocked
=== CONT  TestEgressFirewallPrivateIPRangesAlwaysBlocked
--- FAIL: TestEgressFirewallPrivateIPRangesAlwaysBlocked (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16

Flake rate in main: 42.07% (Passed 727 times, Failed 528 times)

Stack Traces | 0.73s run time
=== RUN   TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16
=== PAUSE TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16
=== CONT  TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16
    sandbox_network_out_test.go:466: Command [curl] output: event:{start:{pid:1360}}
    sandbox_network_out_test.go:466: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:78
        	            				.../api/sandboxes/sandbox_network_out_test.go:466
        	Error:      	"failed to execute command curl in sandbox itu0n7ny29hgxsmevmux5: invalid_argument: protocol error: incomplete envelope: unexpected EOF" does not contain "failed with exit code"
        	Test:       	TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16
        	Messages:   	Expected connection failure message
--- FAIL: TestEgressFirewallPrivateIPRangesAlwaysBlocked/link_local_169.254.0.0/16 (0.73s)
Executing command curl in sandbox i6l2yvkn8ii054wzr8eg2
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 42.80% (Passed 747 times, Failed 559 times)

Stack Traces | 94.7s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: i3zpti8nxqhrlh7bxti0b
    sandbox_list_test.go:327: Created sandbox 2/12: id2bx4iaralibm71m6cmg
    sandbox_list_test.go:327: Created sandbox 3/12: iqdoqkj66293r3ylcyr6c
    sandbox_list_test.go:327: Created sandbox 4/12: imowygcdeb3t08ulc5xnr
    sandbox_list_test.go:327: Created sandbox 5/12: it7qeuprdghjs2pkw00g1
    sandbox_list_test.go:327: Created sandbox 6/12: i11trsz9d5yrxu9og020b
    sandbox_list_test.go:327: Created sandbox 7/12: izcm4p742w7wobzqlqurm
    sandbox_list_test.go:327: Created sandbox 8/12: idwyoyczo8u8ig97be2py
    sandbox_list_test.go:327: Created sandbox 9/12: itee0bb2vpfnol2l13mre
    sandbox_list_test.go:327: Created sandbox 10/12: i0ovss785dtkj22m7yopg
    sandbox_list_test.go:327: Created sandbox 11/12: ipnge7c9hya0t6bp2uxxn
    sandbox_list_test.go:327: Created sandbox 12/12: imu7zkczoj83hz625i65m
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:340
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	"[]" should have 12 item(s), but has 0
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:330
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxListPaginationRunningLargerLimit
--- FAIL: TestSandboxListPaginationRunningLargerLimit (94.73s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 57.71% (Passed 740 times, Failed 1010 times)

Stack Traces | 63.9s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
Executing command bash in sandbox ig6hhljrmc76r6o3rpkaf (user: root)
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (63.88s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 57.83% (Passed 730 times, Failed 1001 times)

Stack Traces | 210s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1275}}
Executing command bash in sandbox ijhrr6zfh8k0hwajxuwxt (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory before tmpfs mount: 191 MB\nFree memory before tmpfs mount: 793 MB\nMemory to use in integrity test (60% of free, min 64MB): 475 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"475+0 records in\n475+0 records out\n498073600 bytes (498 MB, 475 MiB) copied, 7.85829 s, 63.4 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=475\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 7.60\n\tPercent of CPU this job got: 96%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:07.90\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2688\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 346\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 29\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 670 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i9wq6ov5jf8xbol2xcvs3
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1292}}
Executing command bash in sandbox i5nkawrxor1mc9mcmgrwz (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"775867632525fa68341add31b1cc6e257978ce3bffa7900ea626263fd1906927\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox i9wq6ov5jf8xbol2xcvs3
Executing command bash in sandbox i5nkawrxor1mc9mcmgrwz (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1295}}
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
Executing command bash in sandbox i9wq6ov5jf8xbol2xcvs3 (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i9wq6ov5jf8xbol2xcvs3: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (210.05s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Rapid cache not invalidated on delete
    • Added async deletion of rapid-cache/ prefixed objects in DeleteObjectsWithPrefix to match NFS cache cleanup behavior.
  • ✅ Resolved by another fix: Rapid cache orphaned on template delete
    • This bug is resolved by the same fix applied to bug 1c5b6c3a which adds cache invalidation to DeleteObjectsWithPrefix.
  • ✅ Fixed: Compressed rapid hits skip size check
    • Added size validation in openCompressed to check cached object size matches expected frame length, with automatic invalidation of mismatched entries.

Create PR

Or push these changes by commenting:

@cursor push ee088a1d9b
Preview (ee088a1d9b)
diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -22,6 +22,12 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
+	// Delete from cache bucket asynchronously (cached objects are under "rapid-cache/" prefix)
+	go func(ctx context.Context) {
+		cachePrefix := "rapid-cache/" + prefix
+		_ = p.cache.DeleteObjectsWithPrefix(ctx, cachePrefix)
+	}(context.WithoutCancel(ctx))
+
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)
 }
 
@@ -88,14 +94,26 @@
 
 	cachePath := makeFrameFilename(c.path, r)
 	if raw, err := c.openCache(ctx, cachePath, int64(r.Length)); err == nil {
-		dec, err := newDecompressingReadCloser(raw, frameTable.CompressionType())
-		if err != nil {
-			raw.Close()
+		// Validate cached object size matches expected frame length
+		size, sizeErr := c.getCacheSize(ctx, cachePath)
+		if sizeErr == nil && size == int64(r.Length) {
+			dec, err := newDecompressingReadCloser(raw, frameTable.CompressionType())
+			if err != nil {
+				raw.Close()
 
-			return nil, err
+				return nil, err
+			}
+
+			return dec, nil
 		}
-
-		return dec, nil
+		// Size mismatch or size check failed: close reader and delete bad cache entry
+		raw.Close()
+		if sizeErr == nil {
+			// Confirmed size mismatch: delete asynchronously to allow refetch
+			go func(ctx context.Context) {
+				_ = c.cache.DeleteObjectsWithPrefix(ctx, cachePath)
+			}(context.WithoutCancel(ctx))
+		}
 	}
 
 	raw, err := c.inner.OpenRangeReader(ctx, r.Offset, int64(r.Length), nil)
@@ -130,6 +148,15 @@
 	return obj.OpenRangeReader(ctx, 0, length, nil)
 }
 
+func (c *rapidCachedSeekable) getCacheSize(ctx context.Context, path string) (int64, error) {
+	obj, err := c.cache.OpenSeekable(ctx, path, UnknownSeekableObjectType)
+	if err != nil {
+		return 0, err
+	}
+
+	return obj.Size(ctx)
+}
+
 func (c *rapidCachedSeekable) writeCache(ctx context.Context, path string, data []byte) error {
 	blob, err := c.cache.OpenBlob(ctx, path, UnknownObjectType)
 	if err != nil {

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Comment thread packages/orchestrator/pkg/sandbox/template/cache.go
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Validate cached object sizes, clean cache prefixes with canonical deletes, and use plain goroutines for best-effort writeback.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Rapid cache skips read validation
    • Added validateReadParams method to rapidCachedSeekable to ensure offset alignment and prevent cache key collisions from unaligned reads.
  • ✅ Fixed: Cache delete blocks primary delete
    • Changed DeleteObjectsWithPrefix to ignore cache deletion failures by discarding the error and always proceeding to delete from primary storage.

Create PR

Or push these changes by commenting:

@cursor push 65a2bd76f7
Preview (65a2bd76f7)
diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -21,9 +21,7 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
-	if err := p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix); err != nil {
-		return err
-	}
+	_ = p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix)
 
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)
 }
@@ -64,6 +62,10 @@
 		return c.openCompressed(ctx, off, frameTable)
 	}
 
+	if err := c.validateReadParams(length, off); err != nil {
+		return nil, err
+	}
+
 	cachePath := fmt.Sprintf("%s/%012d-%d.bin", c.path, off/MemoryChunkSize, length)
 	if rc, err := c.openCache(ctx, cachePath, length); err == nil {
 		return rc, nil
@@ -114,6 +116,23 @@
 	return dec, nil
 }
 
+func (c *rapidCachedSeekable) validateReadParams(buffSize, offset int64) error {
+	if buffSize == 0 {
+		return ErrBufferTooSmall
+	}
+	if buffSize > MemoryChunkSize {
+		return ErrBufferTooLarge
+	}
+	if offset%MemoryChunkSize != 0 {
+		return ErrOffsetUnaligned
+	}
+	if (offset%MemoryChunkSize)+buffSize > MemoryChunkSize {
+		return ErrMultipleChunks
+	}
+
+	return nil
+}
+
 func (c *rapidCachedSeekable) Size(ctx context.Context) (int64, error) {
 	return c.inner.Size(ctx)
 }

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
@ValentaTomas ValentaTomas requested a review from jakubno May 31, 2026 02:18
Validate uncompressed rapid cache reads and avoid blocking canonical deletes on cache cleanup failures.
Add a disabled-by-default periodic GCS cleaner for rapid-cache objects so the cache can be bounded without relying on bucket lifecycle rules.
Make the rapid cache cleaner explicit about ignoring client close errors.
Satisfy orchestrator lint formatting for the dry-run branch.
Record Rapid cache chunk recency asynchronously so cleanup can evict cold chunks while reads still use deterministic bucket paths.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Cleanup deletes recently used cache
    • Added IsRecent check during bucket scan to skip objects that exist in Redis index, preventing deletion of hot cache entries whose GCS Updated timestamp is old but Redis score is recent.
  • ✅ Fixed: Successful index pass skips bucket scan
    • Modified clean function to always run both cleanFromIndex and cleanFromBucket passes, ensuring orphaned objects (in GCS but not in Redis) are discovered and removed even when the index pass deletes objects.
  • ✅ Fixed: Build delete leaves Redis index
    • Added EvictPrefix method to RapidCacheIndex interface and implemented it using Redis ZSCAN to bulk-remove matching entries from chunks ZSET and decrement build counters when DeleteObjectsWithPrefix is called.

Create PR

Or push these changes by commenting:

@cursor push 25445707df
Preview (25445707df)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -71,15 +71,21 @@
 		_ = client.Close()
 	}()
 
-	if deleted, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index); err != nil {
+	deletedIndex, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index)
+	if err != nil {
 		return err
-	} else if deleted > 0 {
-		log.Printf("summary dry_run=%t deleted=%d source=redis", dryRun, deleted)
+	}
 
+	if deletedIndex > 0 {
+		log.Printf("summary dry_run=%t deleted=%d source=redis", dryRun, deletedIndex)
+	}
+
+	remaining := maxDeletions - deletedIndex
+	if remaining <= 0 {
 		return nil
 	}
 
-	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun)
+	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, remaining, dryRun, index)
 }
 
 func cleanFromIndex(ctx context.Context, client *gcs.Client, bucket string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) (int, error) {
@@ -117,7 +123,7 @@
 	return deleted, nil
 }
 
-func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool) error {
+func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) error {
 	var scanned, matched, deleted int
 	objects := client.Bucket(bucket).Objects(ctx, &gcs.Query{Prefix: prefix})
 	for {
@@ -133,6 +139,13 @@
 		if !attrs.Updated.Before(cutoff) {
 			continue
 		}
+
+		if !dryRun {
+			if recent, err := index.IsRecent(ctx, attrs.Name); err == nil && recent {
+				continue
+			}
+		}
+
 		matched++
 		if deleted >= maxDeletions {
 			break
@@ -145,6 +158,7 @@
 		if err := client.Bucket(bucket).Object(attrs.Name).Delete(ctx); err != nil {
 			return fmt.Errorf("delete cache object: %w", err)
 		}
+		_ = index.Evict(ctx, attrs.Name, attrs.Size)
 		deleted++
 	}
 

diff --git a/packages/shared/pkg/storage/storage_cache_rapid.go b/packages/shared/pkg/storage/storage_cache_rapid.go
--- a/packages/shared/pkg/storage/storage_cache_rapid.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid.go
@@ -27,6 +27,7 @@
 }
 
 func (p *rapidCacheProvider) DeleteObjectsWithPrefix(ctx context.Context, prefix string) error {
+	_ = p.index.EvictPrefix(ctx, prefix)
 	_ = p.cache.DeleteObjectsWithPrefix(ctx, "rapid-cache/"+prefix)
 
 	return p.inner.DeleteObjectsWithPrefix(ctx, prefix)

diff --git a/packages/shared/pkg/storage/storage_cache_rapid_index.go b/packages/shared/pkg/storage/storage_cache_rapid_index.go
--- a/packages/shared/pkg/storage/storage_cache_rapid_index.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid_index.go
@@ -16,6 +16,8 @@
 	Admit(ctx context.Context, path string, size int64) error
 	Evict(ctx context.Context, path string, size int64) error
 	Candidates(ctx context.Context, before time.Time, limit int64) ([]string, error)
+	IsRecent(ctx context.Context, path string) (bool, error)
+	EvictPrefix(ctx context.Context, prefix string) error
 }
 
 type noopRapidCacheIndex struct{}
@@ -28,6 +30,8 @@
 func (noopRapidCacheIndex) Candidates(context.Context, time.Time, int64) ([]string, error) {
 	return nil, nil
 }
+func (noopRapidCacheIndex) IsRecent(context.Context, string) (bool, error) { return false, nil }
+func (noopRapidCacheIndex) EvictPrefix(context.Context, string) error      { return nil }
 
 type redisRapidCacheIndex struct {
 	redis redis.UniversalClient
@@ -110,6 +114,61 @@
 	}).Result()
 }
 
+func (i *redisRapidCacheIndex) IsRecent(ctx context.Context, path string) (bool, error) {
+	score := i.redis.ZScore(ctx, i.keys.chunks, path)
+	if score.Err() == redis.Nil {
+		return false, nil
+	}
+	if score.Err() != nil {
+		return false, score.Err()
+	}
+
+	return true, nil
+}
+
+func (i *redisRapidCacheIndex) EvictPrefix(ctx context.Context, prefix string) error {
+	script := redis.NewScript(`
+local cursor = "0"
+local prefix = ARGV[1]
+local chunks_key = KEYS[1]
+local build_bytes_key = KEYS[2]
+local build_chunks_key = KEYS[3]
+
+repeat
+  local scan_result = redis.call('ZSCAN', chunks_key, cursor, 'MATCH', prefix .. '*', 'COUNT', 100)
+  cursor = scan_result[1]
+  local members = scan_result[2]
+  
+  for i = 1, #members, 2 do
+    local path = members[i]
+    redis.call('ZREM', chunks_key, path)
+    
+    local path_without_prefix = string.gsub(path, '^rapid%-cache/', '')
+    local build_id = string.match(path_without_prefix, '^([^/]+)')
+    if build_id then
+      local obj_result = redis.call('HGET', build_bytes_key, build_id)
+      if obj_result then
+        redis.call('HINCRBY', build_chunks_key, build_id, -1)
+        local chunks_left = redis.call('HGET', build_chunks_key, build_id)
+        if tonumber(chunks_left) <= 0 then
+          redis.call('HDEL', build_bytes_key, build_id)
+          redis.call('HDEL', build_chunks_key, build_id)
+        end
+      end
+    end
+  end
+until cursor == "0"
+
+return 0
+`)
+
+	return script.Run(ctx, i.redis, []string{
+		i.keys.chunks,
+		i.keys.buildBytes,
+		i.keys.buildChunks,
+	}, "rapid-cache/"+prefix).Err()
+}
+
 func rapidCacheBuildID(path string) string {
 	path = strings.TrimPrefix(path, rapidCachePrefix)
 	buildID, _ := SplitPath(path)

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Evict zero size skews Redis stats
    • Added size > 0 check in Evict to prevent decrementing build counters when evicting ghost entries with zero size.
  • ✅ Fixed: Bucket deletes skip Redis eviction
    • Added index.Evict call in cleanFromBucket after GCS deletion to synchronize Redis index with bucket state.

Create PR

Or push these changes by commenting:

@cursor push cffa73b102
Preview (cffa73b102)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -79,7 +79,7 @@
 		return nil
 	}
 
-	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun)
+	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions, dryRun, index)
 }
 
 func cleanFromIndex(ctx context.Context, client *gcs.Client, bucket string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) (int, error) {
@@ -117,7 +117,7 @@
 	return deleted, nil
 }
 
-func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool) error {
+func cleanFromBucket(ctx context.Context, client *gcs.Client, bucket string, prefix string, cutoff time.Time, maxDeletions int, dryRun bool, index storage.RapidCacheIndex) error {
 	var scanned, matched, deleted int
 	objects := client.Bucket(bucket).Objects(ctx, &gcs.Query{Prefix: prefix})
 	for {
@@ -145,6 +145,7 @@
 		if err := client.Bucket(bucket).Object(attrs.Name).Delete(ctx); err != nil {
 			return fmt.Errorf("delete cache object: %w", err)
 		}
+		_ = index.Evict(ctx, attrs.Name, attrs.Size)
 		deleted++
 	}
 

diff --git a/packages/shared/pkg/storage/storage_cache_rapid_index.go b/packages/shared/pkg/storage/storage_cache_rapid_index.go
--- a/packages/shared/pkg/storage/storage_cache_rapid_index.go
+++ b/packages/shared/pkg/storage/storage_cache_rapid_index.go
@@ -92,7 +92,7 @@
 	buildID := rapidCacheBuildID(path)
 	pipe := i.redis.Pipeline()
 	pipe.ZRem(ctx, i.keys.chunks, path)
-	if buildID != "" {
+	if buildID != "" && size > 0 {
 		pipe.HIncrBy(ctx, i.keys.buildBytes, buildID, -size)
 		pipe.HIncrBy(ctx, i.keys.buildChunks, buildID, -1)
 	}

You can send follow-ups to the cloud agent here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go
Comment thread packages/shared/pkg/storage/storage_cache_rapid_index.go
Avoid deferred cleanup before fatal exit and keep Redis cleanup fallback explicit.
Use Redis touch scores to avoid deleting recently used chunks during bucket fallback cleanup and clean stale Redis entries after deletes.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Rapid cache delete errors ignored
    • Changed DeleteObjectsWithPrefix to return cache deletion errors instead of discarding them, ensuring both cache and inner deletion failures are properly propagated.
  • ✅ Fixed: LastAccess Redis errors ignored
    • Modified cleanFromBucket to check and return errors from LastAccess, preventing Redis failures from being treated as missing index entries that could lead to incorrect deletions.

Create PR

Or push these changes by commenting:

@cursor push d8236c266b

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/pkg/storage/storage_cache_rapid.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go
Apply gofumpt formatting to the rapid cache index helper.
Surface rapid prefix delete failures and avoid bucket fallback deletes when Redis recency cannot be checked.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Redis error skips bucket cleanup
    • Changed cleanFromBucket to fall back to GCS timestamp when Redis LastAccess fails, treating errors the same as missing keys.
  • ✅ Fixed: Index deletes skip access recheck
    • Added LastAccess recheck in cleanFromIndex before deletion to prevent removing objects that were touched after Candidates snapshot.
  • ✅ Fixed: Index cleanup error skips bucket scan
    • Changed clean function to log cleanFromIndex errors and continue to cleanFromBucket instead of returning immediately.

Create PR

Or push these changes by commenting:

@cursor push b58d31345b
Preview (b58d31345b)
diff --git a/packages/orchestrator/cmd/clean-rapid-cache/main.go b/packages/orchestrator/cmd/clean-rapid-cache/main.go
--- a/packages/orchestrator/cmd/clean-rapid-cache/main.go
+++ b/packages/orchestrator/cmd/clean-rapid-cache/main.go
@@ -73,7 +73,7 @@
 
 	deleted, err := cleanFromIndex(ctx, client, bucket, cutoff, maxDeletions, dryRun, index)
 	if err != nil {
-		return err
+		log.Printf("cleanFromIndex failed (continuing with bucket scan): %v", err)
 	}
 
 	return cleanFromBucket(ctx, client, bucket, prefix, cutoff, maxDeletions-deleted, dryRun, index)
@@ -99,6 +99,10 @@
 		if err != nil {
 			return deleted, fmt.Errorf("read cache object metadata: %w", err)
 		}
+		lastAccess, ok, err := index.LastAccess(ctx, path)
+		if err == nil && ok && lastAccess >= cutoff.Unix() {
+			continue
+		}
 		if dryRun {
 			deleted++
 
@@ -135,12 +139,9 @@
 			continue
 		}
 		lastAccess, ok, err := index.LastAccess(ctx, attrs.Name)
-		if err != nil {
+		if err == nil && ok && lastAccess >= cutoff.Unix() {
 			continue
 		}
-		if ok && lastAccess >= cutoff.Unix() {
-			continue
-		}
 		matched++
 		if deleted >= maxDeletions {
 			break

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit 1043f73. Configure here.

Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go
Comment thread packages/orchestrator/cmd/clean-rapid-cache/main.go Outdated
Make Redis index cleanup best-effort, recheck recency before index deletes, and let bucket fallback proceed when Redis is unavailable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant