
perf(orchestrator): add memfile dedup density threshold #2862

Open
ValentaTomas wants to merge 4 commits into main from valenta/memfile-dedup-density-threshold

@ValentaTomas (Member) commented May 29, 2026

Adds disabled-by-default controls for memfile dedup fetch fragmentation. When enabled, dedup can promote the cheapest parent-hit pages into the current diff to keep the number of distinct non-empty backing fetch windows under a configured budget. Empty pages are never stored.

Config keys: maxFetchWindowsPerBlock, maxPromotedParentPagesPerBlock, and fetchRunWindowPages. All default to 0, which disables promotion (for fetchRunWindowPages, 0 selects the default window size).
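A minimal sketch of how the three keys could be modeled and resolved. The type name `DedupBudget` appears in the PR summary, but its field layout, the `Enabled`/`WindowPages` helpers, and the 4 MiB frame size used in the example are illustrative assumptions. It also assumes promotion requires both limits to be non-zero, which is how the summary reads ("the new limits are non-zero").

```go
package main

import "fmt"

// DedupBudget mirrors the three config keys from the PR description.
// Field layout and helper methods are hypothetical.
type DedupBudget struct {
	MaxFetchWindowsPerBlock        int // 0 = no window budget (promotion disabled)
	MaxPromotedParentPagesPerBlock int // 0 = no parent pages may be promoted
	FetchRunWindowPages            int // 0 = derive from the compression frame size
}

// Enabled reports whether promotion can happen at all.
// Assumption: both limits must be non-zero.
func (b DedupBudget) Enabled() bool {
	return b.MaxFetchWindowsPerBlock > 0 && b.MaxPromotedParentPagesPerBlock > 0
}

// WindowPages returns the fetch window size in pages, falling back to the
// number of pages in one compression frame when FetchRunWindowPages is 0.
func (b DedupBudget) WindowPages(frameSize, pageSize int) int {
	if b.FetchRunWindowPages > 0 {
		return b.FetchRunWindowPages
	}
	if n := frameSize / pageSize; n > 0 {
		return n
	}
	return 1 // a window always covers at least one page
}

func main() {
	var defaults DedupBudget
	fmt.Println(defaults.Enabled())                // false: all keys are 0
	fmt.Println(defaults.WindowPages(4<<20, 4096)) // 1024 pages in an illustrative 4 MiB frame

	tuned := DedupBudget{MaxFetchWindowsPerBlock: 8, MaxPromotedParentPagesPerBlock: 64, FetchRunWindowPages: 16}
	fmt.Println(tuned.Enabled(), tuned.WindowPages(4<<20, 4096)) // true 16
}
```

Keeping every field at its zero value preserves the prior behavior, so the feature can be rolled out by setting flags rather than by code changes.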

@cla-bot (bot) added the cla-signed label May 29, 2026
@cursor (bot) commented May 29, 2026

PR Summary

Medium Risk
Changes pause/snapshot memfile diff classification and which bytes are stored when dedup budgeting is enabled. Defaults are off, but mis-tuned flags could change diff size and restore I/O patterns.

Overview
Adds optional fetch-window budgeting to memfile diff dedup so snapshot export can cap how fragmented parent-backed reads are. When memfile-diff-dedup is on and the new limits are non-zero, compare classifies each page (empty, parent match, or must store in diff), counts the distinct fetch windows per block (parent build + window vs. current diff windows), and then promotes selected parent-equal pages into the diff output until the window count fits within maxFetchWindowsPerBlock, bounded by maxPromotedParentPagesPerBlock. When fetchRunWindowPages is 0, the window size defaults to the compression frame size.

DedupBudget is threaded through sync dedup, memfd async dedup, and the FC memory export on pause. Telemetry records promoted blocks and pages. Defaults keep the prior behavior (all budget fields set to 0 means no promotion).

Reviewed by Cursor Bugbot for commit 7eed0aa. Bugbot is set up for automated code reviews on this repo.

@codecov (bot) commented May 29, 2026

❌ 5 Tests Failed:

Tests completed: 2704, Failed: 5, Passed: 2699, Skipped: 7
Full list of 5 ❄️ flaky tests:
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 64.34% (Passed 745 times, Failed 1344 times)

Stack Traces | 21.3s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (21.33s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/8_remove_allow_keep_deny

Flake rate in main: 42.07% (Passed 727 times, Failed 528 times)

Stack Traces | 0.18s run time
=== RUN   TestUpdateNetworkConfig/8_remove_allow_keep_deny
Executing command curl in sandbox i4ve6itjob2u7w624dfy8
    sandbox_network_update_test.go:355: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:355: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:78
        	            				.../api/sandboxes/sandbox_network_update_test.go:87
        	            				.../api/sandboxes/sandbox_network_update_test.go:355
        	Error:      	"failed to execute command curl in sandbox i4ve6itjob2u7w624dfy8: invalid_argument: protocol error: incomplete envelope: unexpected EOF" does not contain "failed with exit code"
        	Test:       	TestUpdateNetworkConfig/8_remove_allow_keep_deny
        	Messages:   	Expected connection failure message
--- FAIL: TestUpdateNetworkConfig/8_remove_allow_keep_deny (0.18s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestCommandKillNextApp

Flake rate in main: 42.86% (Passed 736 times, Failed 552 times)

Stack Traces | 314s run time
=== RUN   TestCommandKillNextApp
=== PAUSE TestCommandKillNextApp
=== CONT  TestCommandKillNextApp
Executing command /bin/bash in sandbox ihespun4sp8aalf1zmpvv
    process_test.go:28: Command [npx] output: event:{start:{pid:1264}}
    process_test.go:28: Command [npx] output: event:{data:{stderr:"npm "}}
    process_test.go:28: Command [npx] output: event:{data:{stderr:"WARN exec The following package was not found and will be installed: create-next-app@16.2.6\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Creating a new Next.js app in .../home/user/nextapp.\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Using npm.\n\nInitializing project with template: app-tw \n\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\nInstalling dependencies:\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- next\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- react\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- react-dom\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\nInstalling devDependencies:\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- @tailwindcss/postcss\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- @types/node\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- @types/react\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- @types/react-dom\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- eslint\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- eslint-config-next\n- tailwindcss\n- typescript\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\n"}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:29: 
        	Error Trace:	.../tests/envd/process_test.go:29
        	Error:      	Received unexpected error:
        	            	failed to execute command npx in sandbox ibua0nwhpgaxjroqk5xhf: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestCommandKillNextApp
--- FAIL: TestCommandKillNextApp (313.86s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 57.71% (Passed 740 times, Failed 1010 times)

Stack Traces | 67.3s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (67.33s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 57.83% (Passed 730 times, Failed 1001 times)

Stack Traces | 202s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1271}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 191 MB\nFree memory before tmpfs mount: 793 MB\nMemory to use in integrity test (60% of free, min 64MB): 475 MB\n"}}
Executing command bash in sandbox i1mkj4jev76k43i28bsqh (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"475+0 records in\n475+0 records out\n498073600 bytes (498 MB, 475 MiB) copied, 2.21519 s, 225 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=475\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.20\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.22\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2732\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 345\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 24\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tPage size ("}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"byte"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s): "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 672 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i3dwmygcnn3ac3i211afz
Executing command bash in sandbox i3dwmygcnn3ac3i211afz (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1287}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"a7a3b474cecf5dd317eac9fa2b564881d86684e42ea399cea93ac5d1896d7757\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox i3dwmygcnn3ac3i211afz
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1290}}
Executing command bash in sandbox i3dwmygcnn3ac3i211afz (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i3dwmygcnn3ac3i211afz: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (202.40s)

@gemini-code-assist (bot, Contributor) left a comment

Code Review

There are no critical findings or correctness issues identified in this pull request.

@ValentaTomas marked this pull request as ready for review May 29, 2026 22:06
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d2a1dec8f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from 6d2a1de to dc32af5 May 29, 2026 23:13
@ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from dc32af5 to 45404d1 May 29, 2026 23:47
@levb (Contributor) left a comment

(still reviewing/understanding)

This approach scans in one direction and "locally": the decision on whether to fetch a parent frame (or store the page locally) is based solely on a single child's relationship to it. If a parent frame is referenced by many children, it might still be discarded by the single-child criterion.

For now, I am asking to move deduplication into its own file, with a more explicit hook for it in chunk.go. This would make it easier to compare strategies if a different one comes up in the future.
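One possible shape for such a hook, assuming dedup becomes a small interface that chunk.go calls once per block. The `Strategy` interface, the `Class` type, and both implementations are hypothetical names, not existing code in the repo. Keeping the decision behind one method lets alternative strategies be swapped in and compared on the same input.

```go
package main

import "fmt"

// Class is the per-page dedup classification (hypothetical).
type Class int

const (
	Empty  Class = iota // zero page, never stored
	Parent              // identical to the parent build
	Dirty               // differs from the parent, must be stored
)

// Strategy decides, for one block, which pages are written to the diff.
type Strategy interface {
	Store(classes []Class) []bool
}

// plainDedup stores only dirty pages (the behavior with the budget disabled).
type plainDedup struct{}

func (plainDedup) Store(classes []Class) []bool {
	out := make([]bool, len(classes))
	for i, c := range classes {
		out[i] = c == Dirty
	}
	return out
}

// noDedup stores every non-empty page, ignoring the parent entirely.
type noDedup struct{}

func (noDedup) Store(classes []Class) []bool {
	out := make([]bool, len(classes))
	for i, c := range classes {
		out[i] = c != Empty
	}
	return out
}

// storedPages is the single call site a chunk exporter would use.
func storedPages(s Strategy, classes []Class) int {
	n := 0
	for _, store := range s.Store(classes) {
		if store {
			n++
		}
	}
	return n
}

func main() {
	block := []Class{Dirty, Parent, Parent, Empty, Dirty}
	for _, s := range []Strategy{plainDedup{}, noDedup{}} {
		fmt.Printf("%T stores %d pages\n", s, storedPages(s, block))
	}
}
```

A budgeted promotion strategy would be a third implementation of the same interface, so its effect on stored pages could be measured side by side with these two.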

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@dobrac (Contributor) left a comment

The code flow is really complex. I would prefer simplifying it (not a blocker, though).

Please add more unit tests for the added functionality.

Comment thread packages/orchestrator/pkg/sandbox/fc/memory.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go
@ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch 2 times, most recently from eb72ac0 to 59c2c0d May 30, 2026 23:37
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59c2c0d958

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas force-pushed the valenta/memfile-dedup-density-threshold branch from 59c2c0d to e4f45c3 May 31, 2026 00:03
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4f45c3798

Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
@ValentaTomas requested review from dobrac and levb May 31, 2026 05:01

3 participants