fix(frontend): make disconnect indicator clickable to reconnect#9410
Open
bestvibes wants to merge 2 commits intomarimo-team:mainfrom
Open
fix(frontend): make disconnect indicator clickable to reconnect#9410bestvibes wants to merge 2 commits intomarimo-team:mainfrom
bestvibes wants to merge 2 commits intomarimo-team:mainfrom
Conversation
Pulls in upstream fix partykit/partykit#322: `_connectLock` was leaked in the max-retries early-return of `_connect()`, so once partysocket exhausted its retry budget every subsequent `reconnect()` call was a silent no-op because the lock was permanently held. This made any manual or programmatic reconnect attempt fail after a long outage, forcing a full page reload to recover. Patch-version bump, no API surface changes.
When marimo runs over an SSH-forwarded tunnel and the WebSocket drops (tunnel reset, laptop sleep, NAT idle reaping), the only way to recover today is a full page reload. The kernel survives the WS drop in edit mode and the server already supports session resume on a fresh connection — only the frontend was forcing the reload. - Disconnect indicator is now a `<button>` when the connection is in a recoverable CLOSED state. Click invokes a new `reconnect()` exposed by `useMarimoKernelConnection` that resets the one-shot reconnect gate and kicks off `ws.reconnect()`. - `reconnect()` probes the runtime via `runtimeManager.isHealthy()` before reopening the socket so a click while the tunnel is dead fails in ~1s instead of burning partysocket's full retry budget. - The `onClose` default branch now transitions to CLOSED + KERNEL_DISCONNECTED when partysocket has exhausted `maxRetries`, instead of leaving the UI stuck on the spinner indefinitely. - Close-event classification extracted into a pure helper (`classifyCloseEvent`) with unit tests covering each terminal reason and the retry-budget exhaustion path. Closes marimo-team#9385
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ |
Author
|
I have read the CLA Document and I hereby sign the CLA |
Contributor
There was a problem hiding this comment.
No issues found across 13 files
Architecture diagram
sequenceDiagram
participant User as User (click)
participant Status as StatusOverlay (status.tsx)
participant Hook as useMarimoKernelConnection
participant CloseHandler as close-handler.ts
participant Transport as ReconnectingWebSocket (partysocket)
participant Runtime as Runtime Manager (/health)
participant Backend as Backend Server
Note over User,Backend: Manual Reconnect Flow (NEW)
User->>Status: click disconnect icon (<button>)
Status->>Status: NEW: button is rendered, onReconnect passed
Status->>Hook: NEW: reconnect()
Hook->>Hook: NEW: shouldTryReconnecting = true
Hook->>Hook: NEW: setConnection(CONNECTING)
Hook->>Runtime: NEW: isHealthy() → GET /health
alt Health check succeeds
Runtime-->>Hook: 200 OK
Note over Runtime,Hook: CHANGED: setDOMBaseUri() side effect
Hook->>Transport: NEW: ws.reconnect()
Transport->>Transport: NEW: partysocket bugfix (1.1.13) unlocks _connectLock
Transport->>Backend: WebSocket upgrade
Backend-->>Transport: onOpen
Transport-->>Hook: onOpen callback
Hook->>Hook: shouldTryReconnecting = false
Hook-->>Status: setConnection(OPEN)
else Health check fails (server unreachable)
Runtime-->>Hook: 4xx/5xx/timeout
Hook->>Hook: NEW: setConnection(CLOSED, KERNEL_DISCONNECTED)
Hook-->>Status: button stays disabled, icon shown
end
Note over Transport,Hook: Automatic Retry with Budget Tracking (CHANGED)
Transport->>Backend: WebSocket close event
Transport-->>Hook: onClose(event)
Hook->>CloseHandler: classifyCloseEvent(event, {retryCount, maxRetries})
alt Terminal reason (MARIMO_ALREADY_CONNECTED, MARIMO_SHUTDOWN, etc.)
CloseHandler-->>Hook: { kind: "terminal", status: CLOSED, closeTransport }
Hook->>Transport: ws.close() to prevent further reconnects
Hook-->>Status: setConnection(CLOSED with appropriate reason)
else Retry budget exhausted (retryCount >= MAX_RETRIES=10)
CloseHandler-->>Hook: { kind: "gave-up", status: CLOSED, KERNEL_DISCONNECTED }
Hook-->>Status: CHANGED: setConnection(CLOSED) instead of CONNECTING
Note over Hook,Status: Spinner replaced by clickable disconnect icon
else Transient close (retries remaining)
CloseHandler-->>Hook: { kind: "retry", status: CONNECTING }
Hook->>Transport: tryReconnecting(code, reason)
Transport->>Backend: automatic partysocket retry
end
Note over Status,User: Disconnect Icon Behavior (CHANGED)
alt Connection is CLOSED + KERNEL_DISCONNECTED + onReconnect provided
Status-->>User: NEW: <button> with "click to reconnect" tooltip
else Connection is CLOSED + no onReconnect
Status-->>User: disabled <button>
else Connection is CLOSED + ALREADY_RUNNING (canTakeover)
Status-->>User: LockedIcon (no reconnect button)
end
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📝 Summary
Closes #9385
This pull request was primarily authored by a coding agent.
Problem
When marimo is accessed over an SSH-forwarded tunnel (or any flaky link), the kernel WebSocket drops on tunnel reset or laptop sleep. The browser shows the broken-link indicator in the top-left, and the only recovery is a full page reload — disruptive because it resets scroll position, cell focus, and any unsaved client state, despite the kernel surviving the WS drop in edit mode and the server already supporting session resume on a fresh connection (
marimo/_server/api/endpoints/ws/ws_session_connector.py).The frontend was forcing the reload because of two compounding issues:
useMarimoKernelConnection.shouldTryReconnectingonly resets on a successfulonOpen. After partysocket's 10-retry budget is exhausted, the gate stays closed andws.reconnect()is never called again from our code.partysocket@1.1.10,_connectLockleaks in the_connect()max-retries early-return — even if the gate were reset,ws.reconnect()would silently no-op because the lock is permanently held. Fixed upstream in partykit#322, released as partysocket 1.1.13.onClosedefault branch always sets state toCONNECTING, even after partysocket has exhausted its retry budget — the spinner outlives the actual retry loop with no escape.The broken-link icon also gives the visual impression of being clickable but is a no-op
<div>today.What changed
This is two commits — a dependency bump (precondition) followed by the frontend fix.
chore(frontend): bump partysocket 1.1.10 → 1.1.13Patch-version bump pulling in the upstream
_connectLockfix. No API-surface changes — verified thatretryCount,reconnect(), andmaxRetriessignatures are identical between 1.1.10 and 1.1.13. The lockfile diff is exactly 6 lines, all about partysocket.fix(frontend): make disconnect indicator clickable to reconnect<button>(status.tsx). Native button gives Enter/Space activation and accessibility; previously it was a<div>with a tooltip. When the connection is in a recoverable CLOSED state, clicking calls a newreconnect()exposed byuseMarimoKernelConnection. When the connection isCLOSEDfor any other reason, the button isdisabled.reconnect()probes the runtime first via the existingruntimeManager.isHealthy()(/health). If the server is genuinely unreachable (e.g. the SSH tunnel is dead), we transition straight back toCLOSED + KERNEL_DISCONNECTEDin ~1s instead of burning partysocket's full retry budget on the silent click. The probe also resets the one-shot reconnect gate so subsequent automatic retries work.CLOSEDwhen partysocket has given up. In theonClosedefault branch, whenws.retryCount >= MAX_RETRIES, transition toCLOSED + KERNEL_DISCONNECTEDinstead ofCONNECTING. This makes the disconnect icon reappear and become clickable again, so the user can manually retry without reloading. Without this, the spinner stayed up forever even after partysocket had stopped trying.event.reasoninonCloseis extracted into a pureclassifyCloseEventhelper (close-handler.ts) so the regression behavior can be unit-tested. The hook is now thin glue: classify → setConnection → optionally close transport or calltryReconnecting.close-handler.test.ts(15 cases) covers each terminal reason plus the retry-budget-exhaustion path.status.test.tsx(3 cases) covers the click → callback wiring and the disabled state.Heads-up for review
runtimeManager.isHealthy()has a side effect (setDOMBaseUri()on success). It was previously only triggered during initial startup; now it can fire on each manual reconnect. Almost certainly a no-op in practice (URL hasn't changed) but flagging in case you want it gated.IConnectionTransport.retryCountwas added to the transport interface so the close handler can detect partysocket give-up. This leaks a partysocket-specific concept into a transport-agnostic interface; an alternative would be to have the transport emit a synthetic close event with a "gave-up" reason. Open to that direction if preferred — current approach was chosen for minimal diff."kernel not found"/ tooltip text reuse existing copy; only new string is the tooltip suffix"— click to reconnect".📋 Pre-Review Checklist
✅ Merge Checklist
I have read the CLA Document and I hereby sign the CLA