improvement: fall back to disk storage when shared memory cannot allocate#9435
Conversation
`SharedMemoryStorage.store()` had no error handling around `shared_memory.SharedMemory(create=True, size=...)`, so an OSError (e.g. /dev/shm full) would crash the kernel and drop the virtual file. Add a `FallbackStorage` composite that tries each backend in order and falls back on `OSError`, plus a cross-process-safe `DiskStorage` backend; wire EDIT mode and the manager's ad-hoc cross-process reader to use both tiers.
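The tiered behavior described above can be sketched as follows. This is a hedged illustration of the idea, not marimo's actual code: `MemoryStorage` is a stand-in for the shared-memory backend, and the real `FallbackStorage` signatures may differ.

```python
from __future__ import annotations

import logging


class MemoryStorage:
    """Stand-in for SharedMemoryStorage; can simulate /dev/shm being full."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self._data: dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        if self.fail:
            # Simulates OSError from shared_memory.SharedMemory(create=True, ...)
            raise OSError("no space left on device")
        self._data[key] = data

    def read(self, key: str) -> bytes:
        return self._data[key]


class FallbackStorage:
    """Try each backend in order; fall back to the next on OSError."""

    def __init__(self, backends) -> None:
        self._backends = list(backends)

    def store(self, key: str, data: bytes) -> None:
        last_err: OSError | None = None
        for backend in self._backends:
            try:
                backend.store(key, data)
                return
            except OSError as e:
                # Log and fall through to the next tier.
                logging.warning("backend %r failed, falling back: %s", backend, e)
                last_err = e
        raise last_err if last_err is not None else OSError("no usable storage backend")
```

With this shape, a full shared-memory tier routes the write to the next backend instead of crashing the kernel, and only if every tier fails does the caller see an `OSError`.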
1 issue found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_runtime/virtual_file/storage.py">
<violation number="1" location="marimo/_runtime/virtual_file/storage.py:397">
P2: When all backends are stale, `last_err` stays `None` and `assert last_err is not None` raises an uninformative `AssertionError` (or, with `python -O`, `raise None` causes a `TypeError`). Replace the assertion with an explicit error that describes the failure mode.</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant K as Kernel Subprocess
    participant FS_K as FallbackStorage (Kernel)
    participant SMS as Shared Memory (/dev/shm)
    participant DS as DiskStorage (/tmp)
    participant S as Server Main Process
    participant MGR as VirtualFileStorageManager (Server)
    participant FS_S as NEW: Ad-hoc FallbackReader
    Note over K, DS: Write Path (Kernel)
    K->>FS_K: store(key, bytes)
    FS_K->>SMS: store(key, bytes)
    alt OSError (e.g. /dev/shm full)
        SMS-->>FS_K: Raise OSError
        FS_K->>FS_K: Catch error & log warning
        FS_K->>DS: NEW: store(key, bytes)
        Note right of DS: Atomic write & rename
        DS-->>FS_K: Success
    else Success
        SMS-->>FS_K: Success
    end
    Note over DS, S: Read Path (Cross-Process)
    S->>MGR: read(key)
    MGR->>FS_S: NEW: _cross_process_storage().read()
    FS_S->>SMS: CHANGED: has(key)
    Note right of SMS: Probes segment name by key
    alt Key found in SHM
        SMS-->>FS_S: True
        FS_S->>SMS: read(key)
        SMS-->>FS_S: bytes
    else Key not in SHM
        SMS-->>FS_S: False
        FS_S->>DS: NEW: has(key)
        Note right of DS: Check if file exists in /tmp/marimo-vfs
        DS-->>FS_S: True
        FS_S->>DS: NEW: read(key)
        DS-->>FS_S: bytes
    end
    FS_S-->>MGR: bytes
    MGR-->>S: bytes (served via /@file)
    Note over K, DS: Cleanup
    K->>FS_K: shutdown()
    FS_K->>SMS: shutdown()
    FS_K->>DS: shutdown()
    Note right of DS: Only removes keys owned by this kernel instance
```
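The "Atomic write & rename" step in the diagram can be sketched like this. The function name and directory layout are illustrative, not `DiskStorage`'s actual implementation:

```python
import os
import tempfile
from pathlib import Path


def atomic_write(base_dir: Path, key: str, data: bytes) -> Path:
    """Write `data` so that concurrent readers never see a partial file."""
    base_dir.mkdir(parents=True, exist_ok=True)
    final = base_dir / key
    # Write to a temp file in the same directory: os.replace is only atomic
    # within a single filesystem, so the temp file must live next to the target.
    fd, tmp = tempfile.mkstemp(dir=base_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Atomic on POSIX: a cross-process reader sees either the old file,
        # no file, or the complete new file -- never a half-written one.
        os.replace(tmp, final)
    except BaseException:
        os.unlink(tmp)
        raise
    return final
```

This matters for the read path above: the server's ad-hoc reader can probe `/tmp/marimo-vfs` at any moment without risking a torn read.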
Pull request overview
Adds a resilient, cross-process-capable virtual file storage stack for EDIT mode by introducing a disk-backed fallback when shared memory allocation fails (e.g., /dev/shm full), and wiring cross-process readers to probe both tiers.
Changes:
- Add `DiskStorage` (temp-dir backed) and `FallbackStorage` (tiered composite) virtual file storage backends.
- Update `SharedMemoryStorage.has()` to support cross-process existence probing by segment name.
- Use the fallback stack in kernel EDIT mode and in `VirtualFileStorageManager`'s ad-hoc cross-process reader; add comprehensive tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `marimo/_runtime/virtual_file/storage.py` | Implements `DiskStorage` + `FallbackStorage`, adds cross-process probing, and updates manager cross-process reads to probe both tiers. |
| `marimo/_runtime/context/kernel_context.py` | Switches EDIT mode storage to `FallbackStorage([SharedMemoryStorage(), DiskStorage()])`. |
| `marimo/_runtime/virtual_file/__init__.py` | Exports the new storage backends from the virtual_file package. |
| `tests/_runtime/test_storage.py` | Adds unit tests covering disk storage behavior and fallback routing/probing semantics. |
```python
        return False
    try:
        shm = shared_memory.SharedMemory(name=key)
    except FileNotFoundError:
```
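Fleshed out, the probing in the snippet above could look like this. This is a hedged sketch that assumes keys map directly to segment names; the real `SharedMemoryStorage.has()` may differ in how it derives the name:

```python
from multiprocessing import shared_memory


def shm_has(name: str) -> bool:
    """Probe for a shared-memory segment by name, across processes."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    except (OSError, ValueError):
        # Hostile or malformed names should read as "absent", not crash
        # the fallback probe (the broadened handling described below).
        return False
    # Attaching succeeded, so the segment exists; release our handle.
    shm.close()
    return True
```

Because attaching by name works from any process, the server can check the shared-memory tier without holding a reference created by the kernel subprocess.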
- Catch OSError in read_virtual_file{,_chunked} and surface as 404
instead of letting it crash the streaming response with a 500.
- Isolate per-backend failures in FallbackStorage.shutdown so one
bad backend doesn't block cleanup of the others.
- Reject path-traversal keys (slashes, "..", null bytes) in
DiskStorage to prevent escape via the /@file/{...:path} endpoint.
- Broaden SharedMemoryStorage.has() to catch OSError/ValueError so
hostile keys return False instead of crashing FallbackStorage probes.
- Replace the brittle `assert last_err is not None` in
FallbackStorage.store with an explicit OSError covering the
all-backends-stale case.
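The path-traversal rejection in the third bullet might look roughly like this; `is_safe_key` is a hypothetical name for illustration, not the function in the diff:

```python
def is_safe_key(key: str) -> bool:
    """Reject keys that could escape the storage directory."""
    if not key or "\x00" in key:
        return False
    # Forbid both separators so the check holds on POSIX and Windows.
    if "/" in key or "\\" in key:
        return False
    if key in (".", ".."):
        return False
    return True
```

Validating the bare key (rather than the resolved path) keeps the check cheap and independent of how the `/@file/{...:path}` endpoint decodes its input.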
3 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tests/_runtime/test_storage.py">
<violation number="1" location="tests/_runtime/test_storage.py:614">
P2: Use a valid marimo-shaped filename here; `any-name` is rejected before storage access, so this test never exercises the OSError fallback.</violation>
<violation number="2" location="tests/_runtime/test_storage.py:627">
P2: Use a valid marimo-shaped filename here; `any-name` is rejected before storage access, so this test never exercises the chunked OSError fallback.</violation>
<violation number="3" location="tests/_runtime/test_storage.py:767">
P3: Guard symlink creation here and skip when the platform/environment does not allow symlinks.</violation>
</file>
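For violation 3, one common pattern is to probe whether the environment permits symlink creation and skip the test otherwise. This is a generic sketch, not marimo's actual test code:

```python
import os
import tempfile


def can_symlink() -> bool:
    """Return True if this platform/environment allows creating symlinks."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        with open(target, "w"):
            pass
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # e.g. Windows without Developer Mode, or a restricted sandbox.
            return False
    return True
```

The test at `tests/_runtime/test_storage.py:767` could then be decorated with something like `pytest.mark.skipif(not can_symlink(), reason="symlinks unavailable")`.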
```python
self._base_dir = (
    Path(base_dir)
    if base_dir is not None
    else Path(tempfile.gettempdir()) / "marimo-vfs"
)
```
I think it might be worth verifying that this gets mapped to tmpfs on molab; otherwise this will be very slow when actually used. It currently maps to /tmp (expected), but our mounts may cause this to land on disk.
dmadisetti left a comment: not addressable here; pinged @mchav.
- Move the `_is_valid_vfile_name` check out of the `read_virtual_file_chunked` generator body so an invalid filename raises before `StreamingResponse` starts sending headers (was causing "Caught handled exception, but response already started" on every platform).
- Update `test_vfile_large_streaming` to use a filename matching the `random_filename` pattern enforced by the validator.
- Skip `test_disk_storage_read_chunked_survives_mid_stream_unlink` on Windows: the test relies on POSIX inode semantics (an open fd survives unlink), which Windows doesn't allow.
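The root cause of the first bullet is generic to Python generators: code inside a generator body runs only once iteration begins, which for a streaming response is after headers have already been sent. A minimal illustration (function names are hypothetical, with no web framework involved):

```python
def make_stream(valid: bool):
    # Anti-pattern: the check lives inside the generator body, so it only
    # fires when iteration starts -- too late for a clean 404.
    def gen():
        if not valid:
            raise ValueError("invalid filename")
        yield b"chunk"
    return gen()


def make_stream_fixed(valid: bool):
    # Fix: validate eagerly, before any streaming machinery is created.
    if not valid:
        raise ValueError("invalid filename")

    def gen():
        yield b"chunk"
    return gen()
```

Calling `make_stream(False)` succeeds silently and only errors on the first `next()`, mirroring how the unvalidated endpoint crashed mid-response; `make_stream_fixed(False)` raises immediately, where the error can still become a normal 404.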
Tests were using plain filenames like `"data.json"` and `"any-name"` that don't pass the `_VALID_VFILE_NAME` regex, causing 404s before reaching the storage layer.