Skip to content

Correlate scratchpad completion with run_id#9350

Open
manzt wants to merge 2 commits intomainfrom
fix/scratchpad-run-id-correlation
Open

Correlate scratchpad completion with run_id#9350
manzt wants to merge 2 commits intomainfrom
fix/scratchpad-run-id-correlation

Conversation

@manzt
Copy link
Copy Markdown
Collaborator

@manzt manzt commented Apr 23, 2026

Closes #9302
Fixes #9255, and flips the 4 xfail integration tests from #9342 flip to passing.

The ScratchCellListener used to fire its done sentinel on the scratch cell's idle status (+ 50ms grace for flushing stderr/stdout). Anything broadcast after that grace was silently dropped from the SSE stream.

These changes add an optional run_id to ExecuteScratchpadCommand and CompletedRunNotification. The api endpoint and MCP mint a UUID and pass it to both the command and the ScratchCellListener; the listener now fires its sentinel only on the matching CompletedRunNotification.

Note: The reason we couldn't just observe CompletedRunNotification directly (#9302) is because the CompletedRunNotificationn from session.instantiate trips up the listener early (separate run).


Summary by cubic

Correlate each scratchpad run with a run_id and wait for the matching CompletedRunNotification before finishing. This prevents early success in /api/kernel/execute, streams compile-time errors, and simplifies the done SSE.

  • Bug Fixes

    • Add optional run_id to ExecuteScratchpadCommand and CompletedRunNotification; /api/kernel/execute and the MCP code server mint a UUID and ScratchCellListener waits for the matching completion, ignoring others.
    • Always emit CompletedRunNotification in a finally so listeners don’t hang if run_scratchpad raises.
    • Stream compile-time errors to stderr and keep console output, so SyntaxError diagnostics are visible before done.
    • Add integration tests for ctx.create_cell validation and the skip_validation=True path to cover early RuntimeError reporting and graph-state failures.
  • Migration

    • done SSE now returns { success, output } only; the error field was removed. On failure output is { mimetype: "text/plain", data: "" }.
    • OpenAPI and generated TS types updated: CompletedRunNotification.run_id?, ExecuteScratchpadCommand.runId?.

Written for commit a01e152. Summary will update on new commits.

Copilot AI review requested due to automatic review settings April 23, 2026 21:55
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Apr 23, 2026 10:42pm

Request Review

@github-actions github-actions Bot added the bash-focus Area to focus on during release bug bash label Apr 23, 2026
@manzt manzt added the enhancement New feature or request label Apr 23, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves scratchpad execution streaming by correlating “completion” with a specific scratchpad run via a run_id, so the SSE listener doesn’t terminate early and drop downstream reactive errors (fixing #9255 and enabling previously-xfail integration scenarios to pass).

Changes:

  • Add optional run_id correlation to ExecuteScratchpadCommand and CompletedRunNotification, and plumb it through the HTTP execute endpoint + MCP execute_code.
  • Update ScratchCellListener to treat a matching CompletedRunNotification(run_id=...) as the completion sentinel (instead of scratch-cell idle), and ensure completion is always broadcast via finally.
  • Standardize the terminal SSE done payload to {success, output} and update unit/integration tests + OpenAPI schema accordingly.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
marimo/_server/scratchpad.py Switch listener completion sentinel to CompletedRunNotification filtered by run_id; change done event shape to {success, output}.
marimo/_runtime/runtime.py Always broadcast CompletedRunNotification(run_id=...) for scratchpad via try/finally to prevent listeners from blocking forever.
marimo/_runtime/commands.py Add `run_id: str
marimo/_messaging/notification.py Add optional run_id field to CompletedRunNotification.
marimo/_server/api/endpoints/execution.py Mint a UUID run_id per /api/kernel/execute call and pass it to both the command and listener.
marimo/_mcp/code_server/main.py Mint/pass run_id for MCP execute_code so it waits for the correct completion event.
tests/_server/test_scratchpad.py Update listener construction (requires run_id) and update expectations for the new done payload shape + completion sentinel.
tests/_server/test_scratchpad_integration.py Flip previously-xfail scenarios to passing and update SSE snapshots to the new behavior.
packages/openapi/api.yaml Document/add run_id on CompletedRunNotification and runId on ExecuteScratchpad* schemas.
packages/openapi/src/api.ts Regenerated TS types to include the new run_id / runId fields.
marimo/_schemas/generated/notifications.yaml Regenerated notification schema reflecting CompletedRunNotification.run_id (and related schema updates).

Comment on lines +214 to +218
``success`` is false when the scratch cell itself errored OR any
downstream cell captured by the listener errored. The actual error
detail was already streamed via ``stderr`` events earlier in the
response — ``done`` carries only the success bit plus the scratch
cell's rendered output on success (empty on failure).
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_done_event no longer includes any error payload for failures and assumes the traceback/detail was already streamed via preceding stderr events. That assumption doesn't hold for scratchpad compile errors (e.g. MarimoSyntaxError from _try_compiling_cell), which are broadcast via CellNotificationUtils.broadcast_error as output=MARIMO_ERROR without emitting any stderr console output. With the current SSE shape, /api/kernel/execute callers may only see {success: false, output: {…}} with no error detail. Consider emitting a synthetic stderr SSE event when the scratch cell output channel is MARIMO_ERROR (or reintroducing a minimal error/errors field in the done payload) so failures always include actionable diagnostics.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +28
``{success: false, output: {mimetype: "text/plain", data: ""}}``;
error detail arrives earlier in the stream as ``stderr`` SSE events.
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated module docstring states that failure cases always deliver error detail earlier via stderr SSE events and that the terminal done event is uniformly {success, output}. There isn't an integration test covering a scratchpad compile-time failure (e.g. a SyntaxError / MarimoSyntaxError from _try_compiling_cell), which historically may not emit console stderr. Adding a scenario like session.execute("x =") would validate the new contract and guard against silent failures.

Suggested change
``{success: false, output: {mimetype: "text/plain", data: ""}}``;
error detail arrives earlier in the stream as ``stderr`` SSE events.
``{success: false, output: {mimetype: "text/plain", data: ""}}``.
When the kernel emits error detail before ``done``, these snapshots
assert it through earlier SSE events such as ``stderr``.

Copilot uses AI. Check for mistakes.
Fixes #9255. The ``ScratchCellListener`` used to fire its done
sentinel on the scratch cell's ``idle`` status, relying on a 50ms
grace for trailing output. Anything broadcast after that grace (most
commonly an ``mo.state`` setter whose reactive descendants are
flushed *after* the scratch runner returns, per
``Kernel.run_scratchpad``) was silently dropped from the SSE stream —
``/api/kernel/execute`` would return ``success: true`` while a
downstream cell was in an exception state.

``ExecuteScratchpadCommand`` and ``CompletedRunNotification`` gain an
optional ``run_id``. ``/api/kernel/execute`` and the MCP code server
mint a UUID and pass it to both the command and the
``ScratchCellListener``; the listener now fires its sentinel only on
the matching ``CompletedRunNotification``. Unrelated completions
(from the ``session.instantiate`` call the endpoint makes first, or
from concurrent browser activity) are ignored instead of tripping
the listener early.

``handle_execute_scratchpad`` broadcasts its completion in a
``try/finally`` so a raising ``run_scratchpad`` can't leave the
listener blocked indefinitely.

The ``done`` SSE event is reshaped to a single ``{success, output}``
form. The ``error`` field is removed — the traceback is already in
preceding ``stderr`` events, so duplicating it on ``done`` was
redundant. On failure, ``output`` is ``{mimetype: "text/plain",
data: ""}``. ``execute-code.sh`` is unaffected: it reads
``.output.data // empty`` for the success path and ``.success`` for
the exit code.

The four xfail integration tests from #9342 flip to passing.
Adds two integration tests that cover the `ctx.create_cell` validation
surface: the default dry-run compile (raises early on multiply-defined
names with the `skip_validation` hint) and the `skip_validation=True`
bypass.
@manzt manzt force-pushed the fix/scratchpad-run-id-correlation branch from 6165fd9 to a01e152 Compare April 23, 2026 22:40
Comment on lines +625 to +683


def test_ctx_create_cell_multiply_defined(session: _Session) -> None:
"""``ctx.create_cell`` introducing a duplicate definition errors early.

The notebook already has ``x = 10`` in ``cell_a``. code_mode's
dry-run compile detects the new cell would multiply-define ``x``
and raises ``RuntimeError`` before any real mutation — the new
cell is never registered and ``x`` stays ``10``.
"""
session.setup_cells(["cell_a"], ["x = 10"])

lines = session.execute(
"import marimo._code_mode as cm\n"
"async with cm.get_context() as ctx:\n"
' ctx.create_cell("x = 20")',
)

assert lines == snapshot(
[
"event: stderr",
'data: {"data": "Traceback (most recent call last):\\n File \\"<marimo>/marimo/_runtime/executor.py\\", line N, in execute_cell_async\\n await eval(cell.body, glbls)\\n File \\"<tmp>\\", line 2, in <module>\\n async with cm.get_context() as ctx:\\n File \\"<marimo>/marimo/_code_mode/_context.py\\", line N, in __aexit__\\n self._dry_run_compile(ops)\\n File \\"<marimo>/marimo/_code_mode/_context.py\\", line N, in _dry_run_compile\\n raise RuntimeError(\\n )\\nRuntimeError: Multiply-defined names:\\n - \'x\' is already defined in cell \'cell_a\' (cell_a)\\n\\nTo skip validation, use: async with cm.get_context(skip_validation=True) as ctx\\n"}',
"",
"event: done",
'data: {"success": false, "output": {"mimetype": "text/plain", "data": ""}}',
"",
]
)


def test_ctx_create_cell_skip_validation(session: _Session) -> None:
"""``skip_validation=True`` bypasses the dry-run compile check.

Same setup as above, but the caller opts out of validation — no
``RuntimeError`` is raised and the new cell is registered, as
evidenced by the ``created cell`` stdout summary. The kernel's own
graph-validity pass still flags the resulting multiply-defined
state (visible to the listener as a child error, hence
``success: false``), but no traceback reaches stderr because the
new cell never runs — the error is a pure graph-state marker.
"""
session.setup_cells(["cell_a"], ["x = 10"])

lines = session.execute(
"import marimo._code_mode as cm\n"
"async with cm.get_context(skip_validation=True) as ctx:\n"
' ctx.create_cell("x = 20")',
)

assert lines == snapshot(
[
"event: stdout",
'data: {"data": "created cell \'<cid>\'\\n"}',
"",
"event: done",
'data: {"success": false, "output": {"mimetype": "text/plain", "data": ""}}',
"",
]
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscolnick some drive-by, additional code mode tests.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Bundle Report

Changes will increase total bundle size by 25.11MB (100.0%) ⬆️⚠️, exceeding the configured threshold of 5%.

Bundle name Size Change
marimo-esm 25.11MB 25.11MB (100%) ⬆️⚠️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bash-focus Area to focus on during release bug bash enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

marimo pair: Claude Code doesn't see cell errors automatically

3 participants