feat(storage): rationalize read path OTEL #2831

Open: levb wants to merge 5 commits into lev-storage-rationalize from lev-gcs-read-ttfb-metric

@levb levb commented May 27, 2026

Builds on #2570, which should be merged first.

Adds an orchestrator.read.* metric family with consistent attributes
(file_type/source/codec/outcome) covering each stage of a read, plus
per-layer chunker and build-file timers, so dashboards can attribute
latency end-to-end from sandbox-visible read to backend fetch.

Does not remove any of the prior metrics; that will be done separately
after the dashboards are updated.

Metrics:

  • orchestrator.file.read_at - build.File.ReadAt; per-fault unit, aggregates all underlying mappings into one record
  • orchestrator.chunk.slice - Chunker.Slice; per-mapping unit, source=mmap on cache hit, else the backend that served the read
  • orchestrator.read.open - OpenRangeReader (open / TTFB)
  • orchestrator.read.read - source-read wall time, raw bytes
  • orchestrator.read.decompress - decompress wall time + uncompressed bytes
  • orchestrator.read.fetch - total fetch wall time + uncompressed bytes delivered
  • orchestrator.read.writeback - NFS cache writeback wall time + bytes
  • orchestrator.read.pipeline.efficiency - fetch / (open+read+decompress)
  • orchestrator.read.cache - NFS hit/miss/writeback events
  • orchestrator.read.inflight - concurrent fetches gauge

Spans:

  • chunk.fetch - runFetch goroutine span

@cla-bot cla-bot Bot added the cla-signed label May 27, 2026

codecov Bot commented May 27, 2026

❌ 6 Tests Failed:

Tests completed: 2682 | Failed: 6 | Passed: 2676 | Skipped: 5
View the full list of 6 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 43.04% (Passed 679 times, Failed 513 times)

Stack Traces | 40.6s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: il1uqi6k4femmyvjzdbnh
    sandbox_list_test.go:327: Created sandbox 2/12: iynz85pq2vhm7nzecml89
    sandbox_list_test.go:327: Created sandbox 3/12: ir5r9qqxlodvyk03tedyb
    sandbox_list_test.go:327: Created sandbox 4/12: it5ya0x9c8athritr3m24
    sandbox_list_test.go:327: Created sandbox 5/12: ixvor8ihcichylcxruiuo
    sandbox_list_test.go:327: Created sandbox 6/12: i6jk55e9zqzxlzf12ib9n
    sandbox_list_test.go:327: Created sandbox 7/12: i0awsvcidaxt4thshkb3q
    sandbox_list_test.go:327: Created sandbox 8/12: id7dkca4vtmeq3zlyj3fh
    sandbox_list_test.go:327: Created sandbox 9/12: iq8hpyvevn1xtq4phclry
    sandbox_list_test.go:327: Created sandbox 10/12: i1frtcrlwuoglke60rrd2
    sandbox_list_test.go:327: Created sandbox 11/12: iqbi6y089q890ixpdr3ra
    sandbox_list_test.go:327: Created sandbox 12/12: innxrnuo2epbpt27wsx16
--- FAIL: TestSandboxListPaginationRunningLargerLimit (40.64s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list

Flake rate in main: 43.39% (Passed 668 times, Failed 512 times)

Stack Traces | 0.01s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list
=== PAUSE TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list
=== CONT  TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list
    sandbox_list_test.go:339: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:339
        	Error:      	"[{0x159b9e5fe980 6532622b 2 1982 2026-05-28 22:44:38.57704133 +0000 UTC 0.6.1 512 0x159b9ec94200 innxrnuo2epbpt27wsx16 2026-05-28 22:44:08.57704133 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c3a8} {0x159b9e5fe9c0 6532622b 2 1982 2026-05-28 22:44:29.719252847 +0000 UTC 0.6.1 512 0x159b9ec94230 iqbi6y089q890ixpdr3ra 2026-05-28 22:43:59.719252847 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c3d8} {0x159b9e5fea00 6532622b 2 1982 2026-05-28 22:44:28.34354202 +0000 UTC 0.6.1 512 0x159b9ec94238 i1frtcrlwuoglke60rrd2 2026-05-28 22:43:58.34354202 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c408} {0x159b9e5fea40 6532622b 2 1982 2026-05-28 22:44:27.33804921 +0000 UTC 0.6.1 512 0x159b9ec94240 iq8hpyvevn1xtq4phclry 2026-05-28 22:43:57.33804921 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c438} {0x159b9e5fea80 6532622b 2 1982 2026-05-28 22:44:26.548858683 +0000 UTC 0.6.1 512 0x159b9ec94248 id7dkca4vtmeq3zlyj3fh 2026-05-28 22:43:56.548858683 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c468} {0x159b9e5feac0 6532622b 2 1982 2026-05-28 22:44:25.984175721 +0000 UTC 0.6.1 512 0x159b9ec94250 i0awsvcidaxt4thshkb3q 2026-05-28 22:43:55.984175721 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c498} {0x159b9e5feb00 6532622b 2 1982 2026-05-28 22:44:24.086771098 +0000 UTC 0.6.1 512 0x159b9ec94258 i6jk55e9zqzxlzf12ib9n 2026-05-28 22:43:54.086771098 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c4c8} {0x159b9e5feb40 6532622b 2 1982 2026-05-28 22:44:23.719559754 +0000 UTC 0.6.1 512 0x159b9ec94260 ixvor8ihcichylcxruiuo 2026-05-28 22:43:53.719559754 +0000 UTC running 2j6ly824owf4awgai1xo 0x159b9e89c4f8}]" should have 12 item(s), but has 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list
--- FAIL: TestSandboxListPaginationRunningLargerLimit/check_all_sandboxes_list (0.01s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit/check_paginated_list

Flake rate in main: 43.39% (Passed 668 times, Failed 512 times)

Stack Traces | 0.08s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit/check_paginated_list
=== PAUSE TestSandboxListPaginationRunningLargerLimit/check_paginated_list
=== CONT  TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:375: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:375
        	Error:      	Should NOT be empty, but was 
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:363: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:363
        	Error:      	Not equal: 
        	            	expected: "it5ya0x9c8athritr3m24"
        	            	actual  : "innxrnuo2epbpt27wsx16"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-it5ya0x9c8athritr3m24
        	            	+innxrnuo2epbpt27wsx16
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
        	Messages:   	page starting at 8 should start with sandbox it5ya0x9c8athritr3m24, token 
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:363: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:363
        	Error:      	Not equal: 
        	            	expected: "iynz85pq2vhm7nzecml89"
        	            	actual  : "i1frtcrlwuoglke60rrd2"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-iynz85pq2vhm7nzecml89
        	            	+i1frtcrlwuoglke60rrd2
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
        	Messages:   	page starting at 10 should start with sandbox iynz85pq2vhm7nzecml89, token MjAyNi0wNS0yOFQyMjo0Mzo1OS43MTkyNTI4NDdaX19pcWJpNnkwODlxODkwaXhwZHIzcmE=
    sandbox_list_test.go:368: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:368
        	Error:      	Not equal: 
        	            	expected: 12
        	            	actual  : 8
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
    sandbox_list_test.go:373: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:373
        	Error:      	Should be empty, but was MjAyNi0wNS0yOFQyMjo0Mzo1Ny4zMzgwNDkyMVpfX2lxOGhweXZldm4xeHRxNHBoY2xyeQ==
        	Test:       	TestSandboxListPaginationRunningLargerLimit/check_paginated_list
--- FAIL: TestSandboxListPaginationRunningLargerLimit/check_paginated_list (0.08s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 58.15% (Passed 673 times, Failed 935 times)

Stack Traces | 71.2s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (71.17s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 58.28% (Passed 663 times, Failed 926 times)

Stack Traces | 117s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1253}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 196 MB\nFree memory before tmpfs mount: 788 MB\nMemory to use in integrity test (80% of free, min 64MB): 630 MB\n"}}
Executing command bash in sandbox i2n9f6qkwsqzmr3k841ck (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"630+0 records in\n630+0 records out\n660602880 bytes (661 MB, 630 MiB) copied, 2.72625 s, 242 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=630\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.72\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.73\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2728\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 344\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 10\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 828 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox iq492v8qrnhh9hqz74law
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1270}}
Executing command bash in sandbox imzcqeik68772z666sgxo (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"9a763894c3ecc6f74321936202db17b3319c1995290c85e22583d5184c06dc03\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox iq492v8qrnhh9hqz74law
Executing command bash in sandbox imzcqeik68772z666sgxo (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1273}}
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
Executing command bash in sandbox iq492v8qrnhh9hqz74law (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox iq492v8qrnhh9hqz74law: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (116.64s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxAutoResumeViaProxy

Flake rate in main: 44.18% (Passed 667 times, Failed 528 times)

Stack Traces | 15.4s run time
=== RUN   TestSandboxAutoResumeViaProxy
=== PAUSE TestSandboxAutoResumeViaProxy
=== CONT  TestSandboxAutoResumeViaProxy
    auto_resume_test.go:97: [Status code: 502] Response body: {"sandboxId":"i0fx4xvowi4d5ym02n1m5","message":"The sandbox is running but port is not open","port":8000,"code":502}
    auto_resume_test.go:116: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:116
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestSandboxAutoResumeViaProxy
--- FAIL: TestSandboxAutoResumeViaProxy (15.39s)
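
When reading the check_paginated_list failure above, it can help to decode the pagination token printed in the "page starting at 10" message. The sketch below assumes the token is standard base64 over a "<timestamp>__<sandbox ID>" string; that layout is inferred from the decoded value only, not from the API code, and the helper name decodePageToken is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodePageToken base64-decodes a token and splits it at the first "__".
// The "<timestamp>__<sandbox ID>" layout is an assumption inferred from
// the token in the test log, not a documented format.
func decodePageToken(token string) (timestamp, sandboxID string, err error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", "", fmt.Errorf("decode token: %w", err)
	}
	ts, id, ok := strings.Cut(string(raw), "__")
	if !ok {
		return "", "", fmt.Errorf("token %q has no \"__\" separator", raw)
	}
	return ts, id, nil
}

func main() {
	// Token copied verbatim from the failing test output.
	ts, id, err := decodePageToken(
		"MjAyNi0wNS0yOFQyMjo0Mzo1OS43MTkyNTI4NDdaX19pcWJpNnkwODlxODkwaXhwZHIzcmE=")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts, id)
	// prints "2026-05-28T22:43:59.719252847Z iqbi6y089q890ixpdr3ra"
}
```

The decoded timestamp and ID match the iqbi6y089q890ixpdr3ra entry in the check_all_sandboxes_list output, which suggests the cursor points at the second sandbox in the 8-item list that was returned.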

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

Adds an orchestrator.read.* metric family with consistent attributes
(file_type/source/codec/outcome) covering each stage of a read, plus
per-layer chunker and build-file timers, so dashboards can attribute
latency end-to-end from sandbox-visible read to backend fetch.

Does not remove any of the prior metrics; that will be done separately
after the dashboards are updated.

Metrics:
  - orchestrator.file.read_at               build.File.ReadAt: per-fault
                                            unit, aggregates all underlying
                                            mappings into one record
  - orchestrator.chunk.slice                Chunker.Slice: per-mapping
                                            unit, source=mmap on cache hit,
                                            else the backend that served
  - orchestrator.read.open                  OpenRangeReader (open / TTFB)
  - orchestrator.read.read                  source-read wall, compressed bytes
  - orchestrator.read.decompress            decompress CPU + uncompressed bytes
  - orchestrator.read.fetch                 total fetch wall + bytes delivered
  - orchestrator.read.writeback             NFS cache writeback wall + bytes
  - orchestrator.read.pipeline.efficiency   fetch / (open+read+decompress)
  - orchestrator.read.cache                 NFS hit/miss/writeback events
  - orchestrator.read.inflight              concurrent fetches gauge

Spans:
  - chunk.fetch                             runFetch goroutine span
@levb levb force-pushed the lev-gcs-read-ttfb-metric branch from f1d1973 to f7a0b35 Compare May 28, 2026 14:31
@levb levb marked this pull request as ready for review May 28, 2026 16:25

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8b2cff410

Comment thread packages/shared/pkg/storage/storage_aws.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/storage.go Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2035d4a86c

Comment thread packages/shared/pkg/storage/storage_cache_seekable_compressed.go