Skip to content

improvement: fall back to disk storage when shared memory cannot allocate#9435

Draft
mscolnick wants to merge 6 commits intomainfrom
ms/fallback-virtual
Draft

improvement: fall back to disk storage when shared memory cannot allocate#9435
mscolnick wants to merge 6 commits intomainfrom
ms/fallback-virtual

Conversation

@mscolnick
Copy link
Copy Markdown
Contributor

@mscolnick mscolnick commented May 1, 2026

SharedMemoryStorage.store() had no error handling around shared_memory.SharedMemory(create=True, size=...), so an OSError (e.g. /dev/shm full) would crash the kernel and drop the virtual file. Add a FallbackStorage composite that tries each backend in order and falls back on OSError, plus a cross-process-safe DiskStorage backend; wire EDIT mode and the manager's ad-hoc cross-process reader to use both tiers.

`SharedMemoryStorage.store()` had no error handling around
`shared_memory.SharedMemory(create=True, size=...)`, so an OSError
(e.g. /dev/shm full) would crash the kernel and drop the virtual
file. Add a `FallbackStorage` composite that tries each backend in
order and falls back on `OSError`, plus a cross-process-safe
`DiskStorage` backend; wire EDIT mode and the manager's ad-hoc
cross-process reader to use both tiers.
Copilot AI review requested due to automatic review settings May 1, 2026 17:05
@vercel
Copy link
Copy Markdown

vercel Bot commented May 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment May 2, 2026 0:57am

Request Review

@mscolnick mscolnick requested a review from dmadisetti May 1, 2026 17:06
@mscolnick mscolnick added the enhancement New feature or request label May 1, 2026
@mscolnick mscolnick changed the title fix: fall back to disk storage when shared memory cannot allocate improvement: fall back to disk storage when shared memory cannot allocate May 1, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_runtime/virtual_file/storage.py">

<violation number="1" location="marimo/_runtime/virtual_file/storage.py:397">
P2: When all backends are stale, `last_err` stays `None` and `assert last_err is not None` raises an uninformative `AssertionError` (or, with `python -O`, `raise None` causes a `TypeError`). Replace the assertion with an explicit error that describes the failure mode.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant K as Kernel Subprocess
    participant FS_K as FallbackStorage (Kernel)
    participant SMS as Shared Memory (/dev/shm)
    participant DS as DiskStorage (/tmp)
    participant S as Server Main Process
    participant MGR as VirtualFileStorageManager (Server)
    participant FS_S as NEW: Ad-hoc FallbackReader

    Note over K, DS: Write Path (Kernel)
    K->>FS_K: store(key, bytes)
    FS_K->>SMS: store(key, bytes)
    
    alt OSError (e.g. /dev/shm full)
        SMS-->>FS_K: Raise OSError
        FS_K->>FS_K: Catch error & log warning
        FS_K->>DS: NEW: store(key, bytes)
        Note right of DS: Atomic write & rename
        DS-->>FS_K: Success
    else Success
        SMS-->>FS_K: Success
    end

    Note over DS, S: Read Path (Cross-Process)
    S->>MGR: read(key)
    MGR->>FS_S: NEW: _cross_process_storage().read()
    
    FS_S->>SMS: CHANGED: has(key)
    Note right of SMS: Probes segment name by key
    
    alt Key found in SHM
        SMS-->>FS_S: True
        FS_S->>SMS: read(key)
        SMS-->>FS_S: bytes
    else Key not in SHM
        SMS-->>FS_S: False
        FS_S->>DS: NEW: has(key)
        Note right of DS: Check if file exists in /tmp/marimo-vfs
        DS-->>FS_S: True
        FS_S->>DS: NEW: read(key)
        DS-->>FS_S: bytes
    end
    
    FS_S-->>MGR: bytes
    MGR-->>S: bytes (served via /@file)

    Note over K, DS: Cleanup
    K->>FS_K: shutdown()
    FS_K->>SMS: shutdown()
    FS_K->>DS: shutdown()
    Note right of DS: Only removes keys owned by this kernel instance
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread marimo/_runtime/virtual_file/storage.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a resilient, cross-process-capable virtual file storage stack for EDIT mode by introducing a disk-backed fallback when shared memory allocation fails (e.g., /dev/shm full), and wiring cross-process readers to probe both tiers.

Changes:

  • Add DiskStorage (temp-dir backed) and FallbackStorage (tiered composite) virtual file storage backends.
  • Update SharedMemoryStorage.has() to support cross-process existence probing by segment name.
  • Use the fallback stack in kernel EDIT mode and in VirtualFileStorageManager’s ad-hoc cross-process reader; add comprehensive tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
marimo/_runtime/virtual_file/storage.py Implements DiskStorage + FallbackStorage, adds cross-process probing, and updates manager cross-process reads to probe both tiers.
marimo/_runtime/context/kernel_context.py Switches EDIT mode storage to FallbackStorage([SharedMemoryStorage(), DiskStorage()]).
marimo/_runtime/virtual_file/__init__.py Exports the new storage backends from the virtual_file package.
tests/_runtime/test_storage.py Adds unit tests covering disk storage behavior and fallback routing/probing semantics.

Comment thread marimo/_runtime/virtual_file/storage.py
Comment thread marimo/_runtime/virtual_file/storage.py Outdated
return False
try:
shm = shared_memory.SharedMemory(name=key)
except FileNotFoundError:
mscolnick added 2 commits May 1, 2026 10:16
- Catch OSError in read_virtual_file{,_chunked} and surface as 404
  instead of letting it crash the streaming response with a 500.
- Isolate per-backend failures in FallbackStorage.shutdown so one
  bad backend doesn't block cleanup of the others.
- Reject path-traversal keys (slashes, "..", null bytes) in
  DiskStorage to prevent escape via the /@file/{...:path} endpoint.
- Broaden SharedMemoryStorage.has() to catch OSError/ValueError so
  hostile keys return False instead of crashing FallbackStorage probes.
- Replace the brittle `assert last_err is not None` in
  FallbackStorage.store with an explicit OSError covering the
  all-backends-stale case.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/_runtime/test_storage.py">

<violation number="1" location="tests/_runtime/test_storage.py:614">
P2: Use a valid marimo-shaped filename here; `any-name` is rejected before storage access, so this test never exercises the OSError fallback.</violation>

<violation number="2" location="tests/_runtime/test_storage.py:627">
P2: Use a valid marimo-shaped filename here; `any-name` is rejected before storage access, so this test never exercises the chunked OSError fallback.</violation>

<violation number="3" location="tests/_runtime/test_storage.py:767">
P3: Guard symlink creation here and skip when the platform/environment does not allow symlinks.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread tests/_runtime/test_storage.py Outdated
Comment thread tests/_runtime/test_storage.py Outdated
Comment thread tests/_runtime/test_storage.py
self._base_dir = (
Path(base_dir)
if base_dir is not None
else Path(tempfile.gettempdir()) / "marimo-vfs"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be verifying that this gets mapped to tmpfs on molab, otherwise this will be very slow when actually used. Currently maps to /tmp (expected) but our mounts should this to be on disk

dmadisetti
dmadisetti previously approved these changes May 1, 2026
Copy link
Copy Markdown
Collaborator

@dmadisetti dmadisetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment but not addressable here pinged @mchav

- Move _is_valid_vfile_name check out of the read_virtual_file_chunked
  generator body so an invalid filename raises before StreamingResponse
  starts sending headers (was causing "Caught handled exception, but
  response already started" on every platform).
- Update test_vfile_large_streaming to use a filename matching the
  random_filename pattern enforced by the validator.
- Skip test_disk_storage_read_chunked_survives_mid_stream_unlink on
  Windows: the test relies on POSIX inode semantics (open fd survives
  unlink) which Windows doesn't allow.
mscolnick added 2 commits May 1, 2026 17:32
Tests were using plain filenames like "data.json" and "any-name" that
don't pass the _VALID_VFILE_NAME regex, causing 404s before reaching
the storage layer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants