Skip to content

fix(runtime): restore sys.modules['__main__'] after run-mode kernel#9313

Closed
jacobcbeaudin wants to merge 2 commits intomarimo-team:mainfrom
jacobcbeaudin:fix/restore-main-module-run-mode
Closed

fix(runtime): restore sys.modules['__main__'] after run-mode kernel#9313
jacobcbeaudin wants to merge 2 commits intomarimo-team:mainfrom
jacobcbeaudin:fix/restore-main-module-run-mode

Conversation

@jacobcbeaudin
Copy link
Copy Markdown
Contributor

What

Closes #9293.

Run-mode (thread) kernels share sys.modules with the host process. patches.patch_main_module replaces sys.modules['__main__'] with a synthetic module and has no restore path, so after a SessionMode.RUN session ends, the host is left with the synthetic main. Any later code that reads sys.modules['__main__'] (notably multiprocessing.spawn.get_preparation_data during subprocess bootstrap, and ForkingPickler during target pickling) gets the wrong module. Three existing test-layer workarounds in this repo (test_asgi.py setUp/tearDown, test_session_manager.py::_preserve_main_module, conftest.py::_save_and_restore_main from #9286) currently paper over the leak.

How

  • marimo/_runtime/patches.py: adds save_main_module / restore_main_module, refcounted with a threading.Lock. First save captures the host's original __main__; last release restores it. Unbalanced save leaks the refcount for the process lifetime; unbalanced restore is a no-op. Debug-level logs on capture and restore so forensic traces can locate the save/restore points on a polluted host.
  • marimo/_session/managers/kernel.py:
    • KernelManagerImpl.__init__ initializes a per-instance flag and threading.Lock that together make the wrappers idempotent against double close_kernel and safe under concurrent callers.
    • Private methods _save_host_main_module, _restore_host_main_module, _stop_and_join_run_mode_kernel encapsulate the run-mode bookkeeping. start_kernel / close_kernel become short call sites that delegate to these methods.
    • close_kernel now bounds the kernel thread join at 5 seconds; on timeout it logs a warning and skips the __main__ restore rather than touching process state under a live cell.
    • start_kernel exception path balances the save if thread setup fails.
  • tests/_runtime/test_patches.py: 6 unit tests covering the refcount primitive (basic cycle, overlapping savers, over-release no-op, concurrent thread safety, absent __main__ edge case, end-to-end host spawn after save/patch/restore).
  • tests/_runtime/_patches_spawn_target.py: tiny helper module hosting a top-level noop_target function used by the end-to-end spawn test.
  • tests/_session/managers/test_kernel.py: 7 unit tests covering the application-layer wrappers (flag transition, idempotent save, save-without-restore leak contract, idempotent restore, concurrent-restore safety, restore-without-save no-op, full cycle integration).

Scope

  • EDIT-mode and IPC-mode kernel paths are unchanged (their mutation is contained in a subprocess that exits).
  • The pyodide patch_main_module call site is unchanged (WASM environment with no persistent host to pollute).
  • The process branch of close_kernel is unchanged (subprocess is terminated; host state restore is not meaningful).

Existing test-layer workarounds (test_asgi.py, test_session_manager.py::_preserve_main_module, conftest.py::_save_and_restore_main from #9286) are left in place; they can be retired in a follow-up once this lands.

Testing

Run on Linux (python:3.12.3-slim, uv sync --group dev --group test):

  • tests/_runtime/test_patches.py: all pass (11 pre-existing + 6 new; 1 skipped for polars).
  • tests/_session/managers/test_kernel.py: 7 new tests pass.
  • tests/_session/managers/: 23 pre-existing tests still pass.
  • tests/_messaging/test_thread_local_proxy.py: 21 tests pass.
  • tests/_server/export/test_exporter.py::test_run_until_completion_with_stop, ::test_run_until_completion_with_console_output: pass.

Not run: full tests/ suite, tests/_islands/, tests/_server/test_asgi.py, tests/_server/test_session_manager.py, frontend-dependent HTML export tests (need marimo/_static/index.html which is not produced by plain uv sync).

Notes for reviewers

  • Refcount pattern mirrors marimo/_messaging/thread_local_streams.py for consistency; I kept the module-level globals style rather than introducing a class wrapper.

  • Alternative implementations considered and rejected:

    • Try/finally wrapping launch_kernel's body (~290-line indent change; LIFO only).
    • patch_main_module_context around the kernel main loop (same indent cost; LIFO only).

    The refcount approach handles overlapping server sessions correctly, which the existing TODO inside patch_main_module already flags as a concern.

  • Happy to adjust naming, split tests, drop defensive ones, or reshape structure on review.

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Apr 21, 2026 9:09pm

Request Review

@jacobcbeaudin jacobcbeaudin force-pushed the fix/restore-main-module-run-mode branch from b28aff4 to 05db250 Compare April 21, 2026 20:56
Copy link
Copy Markdown
Contributor

@akshayka akshayka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jacobcbeaudin, the CI issues are resolved, so we (marimo team) no longer need this fix urgently.

Can you help me understand your motivation for attempting to fix this? Are you hitting this bug in real usage of marimo, or in tests?

If you are hitting it in real usage I will consider fixing it myself. This is not an easy area of marimo to contribute to.

@jacobcbeaudin
Copy link
Copy Markdown
Contributor Author

@akshayka thanks for the review, and understood on the complexity of this area. Motivation for the fix is on the issue: #9293 (comment).

Given #9304 is in flight and your point about this being a delicate area, closing this PR. If useful after #9304 lands, happy to contribute the two orthogonal pieces (end-to-end spawn-after-cycle test and the bounded close_kernel join) as separate small follow-ups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

patch_main_module leaks mutated sys.modules['__main__'] to the host

2 participants