Add rootfs reboot fallback for snapshots#2874

Closed
ValentaTomas wants to merge 13 commits into main from cova/explore-rootfs-memfile-dedup

ValentaTomas (Member) commented May 30, 2026

Add a rootfs reboot fallback so selected snapshot flows can skip memory restore while preserving normal resume behavior by default.

ValentaTomas and others added 9 commits May 26, 2026 01:41
cachedSeekable.StoreFile previously skipped NFS write-through whenever
compression was enabled, so even the orchestrator that just built a
template would miss NFS on the first read and hit GCS. Add a per-frame
FrameSink PutOption and hook it from the compressed StoreFile path so
each compressed frame is teed into a .frm file at its C-space offset as
it is produced, matching the layout used by the read-miss writeback.

Gated by the existing write-to-cache-on-writes feature flag.

Match the uncompressed write-through path: read MaxCacheWriterConcurrencyFlag
and gate concurrent writeToCache calls through a per-upload semaphore. Same
operator knob, same fallback-to-1 warning, no timeout, still fire-and-forget
via goCtx so the upload doesn't wait on NFS.
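One possible arrangement of the gating described above, sketched with a buffered channel as the semaphore. The `cacheGate` type and its methods are hypothetical. The sketch only illustrates the three stated properties: a limit that falls back to 1, no timeout, and background writes that the caller does not wait on.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// cacheGate bounds concurrent cache writes for one upload. Hypothetical type.
type cacheGate struct {
	slots chan struct{}
	wg    sync.WaitGroup
}

// newCacheGate clamps non-positive limits to 1, mirroring the fallback-to-1
// behavior mentioned in the commit message.
func newCacheGate(limit int) *cacheGate {
	if limit < 1 {
		limit = 1
	}
	return &cacheGate{slots: make(chan struct{}, limit)}
}

// Go starts write in the background and returns immediately. The slot is
// acquired inside the goroutine, so the caller never blocks on the cache.
func (g *cacheGate) Go(ctx context.Context, write func(context.Context)) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		g.slots <- struct{}{}        // acquire a slot, with no timeout
		defer func() { <-g.slots }() // release the slot
		write(ctx)
	}()
}

// Wait exists only so the demo can observe completion.
func (g *cacheGate) Wait() { g.wg.Wait() }

// peakConcurrency runs `writes` background writes through a gate with the
// given limit and reports the highest number observed running at once.
func peakConcurrency(limit, writes int) int32 {
	g := newCacheGate(limit)
	var cur, peak atomic.Int32
	for i := 0; i < writes; i++ {
		g.Go(context.Background(), func(context.Context) {
			n := cur.Add(1)
			for {
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			cur.Add(-1)
		})
	}
	g.Wait()
	return peak.Load()
}

func main() {
	fmt.Println("peak concurrent writes:", peakConcurrency(2, 50))
}
```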

Also widen FrameSink to take context.Context so the callback is a plain method
signature rather than something that has to be wrapped in a closure just to
carry ctx.

Upload.Run now adds CompressUseCaseContext(u.useCase) to the ctx that flows
through the whole upload, so downstream flag reads can target template builds
("build") and snapshot pauses ("pause") independently. Enables turning on
EnableWriteThroughCacheFlag for template builds only.

Uses the same LD kind already wired through resolveCompressConfig.

TestCachedSeekable_FrameSinkPopulatesNFS exercises the sink directly but
skips the StoreFile gating branch. Add two tests modeled on the existing
uncompressed TestCachedFileObjectProvider_WriteFromFileSystem:

- _WriteThrough: routes through StoreFile with compressed cfg + FF on,
  verifies every frame in the returned FrameTable lands at its expected
  .frm path on the temp NFS dir. Mock inner runs compressStream with
  the sink pulled from opts, mirroring fs/GCS backends.
- _FlagOff_NoSink: asserts no FrameSink is attached when the
  EnableWriteThroughCacheFlag is false.
@cla-bot cla-bot Bot added the cla-signed label May 30, 2026

cursor Bot commented May 30, 2026

PR Summary

High Risk
Touches core sandbox create, pause, checkpoint, and storage upload behavior; incorrect flag handling could lose memory state or boot guests without expected RAM restore.

Overview
Adds a disk-only snapshot path and a rootfs reboot resume path so customers can persist filesystem state without VM memory, then start again from that rootfs (cold boot) instead of restoring RAM. Public API gains optional reboot on connect/resume and memory on pause/snapshot; the orchestrator honors gRPC metadata to skip memfile capture/upload and to boot from rootfs when requested or when memory artifacts are missing. Checkpoint after a memoryless snapshot resumes via the same rootfs path. Supporting changes include guest sync before disk-only pause, ext4 root flags tuned for journal replay on reboot, and offline dedup measurement tooling plus design notes (not production dedup yet).

Reviewed by Cursor Bugbot for commit 21bbc9f.

gemini-code-assist Bot (Contributor) left a comment

Code Review

In packages/orchestrator/pkg/server/sandboxes.go, the constructed SandboxCreateRequest in the non-memory snapshot checkpoint path is missing the Sandbox configuration field, which will cause a nil pointer dereference when createSandboxFromRootfs attempts to retrieve the sandbox configuration via req.GetSandbox().

Comment thread packages/orchestrator/pkg/server/sandboxes.go
Reuse the stored sandbox config when checkpoint resumes through the rootfs reboot path.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Resume races rootfs reboot
    • Added check for memorySnapshotEnabled before falling back to rootfs reboot, preventing race condition when memory artifacts are still uploading.

Or push these changes by commenting:

@cursor push 665ade9913
Preview (665ade9913)
diff --git a/packages/orchestrator/pkg/server/sandboxes.go b/packages/orchestrator/pkg/server/sandboxes.go
--- a/packages/orchestrator/pkg/server/sandboxes.go
+++ b/packages/orchestrator/pkg/server/sandboxes.go
@@ -208,10 +208,12 @@
 			req.GetSandbox(),
 		)
 		if errors.Is(err, storage.ErrObjectNotExist) {
-			telemetry.ReportEvent(ctx, "memory snapshot files missing, rebooting from rootfs")
-			rebootFromRootfs = true
-			childSpan.SetAttributes(attribute.Bool("sandbox.reboot_from_rootfs", true))
-			sbx, err = s.createSandboxFromRootfs(ctx, template, config, runtime, req)
+			if !memorySnapshotEnabled(ctx) {
+				telemetry.ReportEvent(ctx, "memory snapshot disabled, rebooting from rootfs")
+				rebootFromRootfs = true
+				childSpan.SetAttributes(attribute.Bool("sandbox.reboot_from_rootfs", true))
+				sbx, err = s.createSandboxFromRootfs(ctx, template, config, runtime, req)
+			}
 		}
 	}
 	if err != nil {

Reviewed by Cursor Bugbot for commit 31db71f.

Comment thread packages/orchestrator/pkg/server/sandboxes.go

codecov Bot commented May 30, 2026

❌ 4 Tests Failed:

Tests completed: 2707 | Failed: 4 | Passed: 2703 | Skipped: 5
View the top 1 failed test(s) by shortest run time
::TestMain
Stack Traces | 0s run time
FAIL	github..../orchestrator/cmd/sample-dedup-gains [build failed]
View the full list of 3 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 57.75% (Passed 736 times, Failed 1006 times)

Stack Traces | 67.3s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (67.29s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 57.86% (Passed 726 times, Failed 997 times)

Stack Traces | 187s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1257}}
Executing command bash in sandbox id0g3mucsgexpafiqpmid (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 190 MB\nFree memory before tmpfs mount: 794 MB\nMemory to use in integrity test (60% of free, min 64MB): 476 MB\n"}}
Executing command bash in sandbox id0g3mucsgexpafiqpmid (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"476+0 records in\n476+0 records out\n499122176 bytes (499 MB, 476 MiB) copied, 2.12372 s, 235 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=476\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.09\n\tPercent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.12\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2648\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 343\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 14\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 670 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i81r8i4t1rjrmga4n2ox8
Executing command bash in sandbox i81r8i4t1rjrmga4n2ox8 (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1274}}
Executing command bash in sandbox ik88k60e4yx1r1hcrqgpq (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"e6e2e90ee4c7e490f94de4eb0976259e8c126cd268566e8712807dd66cbcc9ac\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox i81r8i4t1rjrmga4n2ox8
Executing command bash in sandbox ik88k60e4yx1r1hcrqgpq (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1277}}
Executing command bash in sandbox i81r8i4t1rjrmga4n2ox8 (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i81r8i4t1rjrmga4n2ox8: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (187.24s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxObjectNotFound

Flake rate in main: 42.21% (Passed 731 times, Failed 534 times)

Stack Traces | 0.01s run time
=== RUN   TestSandboxObjectNotFound
=== PAUSE TestSandboxObjectNotFound
=== CONT  TestSandboxObjectNotFound
    sandbox_object_not_found_test.go:65: 
        	Error Trace:	.../tests/orchestrator/sandbox_object_not_found_test.go:65
        	Error:      	Not equal: 
        	            	expected: 0x9
        	            	actual  : 0xd
        	Test:       	TestSandboxObjectNotFound
        	Messages:   	status code should be FailedPrecondition
--- FAIL: TestSandboxObjectNotFound (0.01s)

@ValentaTomas ValentaTomas marked this pull request as ready for review May 30, 2026 22:42
@ValentaTomas
Member Author

Superseded by the smaller clean-branch PR #2875.
