fix: Reap Dead Tmux Servers from SSE Poll Set#235
Merged
Conversation
…handling The daemon polled killed tmux sockets forever (a steady WARN drumbeat every ~2.5s) because the SSE poll loop only dropped a server on last-client-disconnect, never on socket death. Separately, the frontend never re-queried /api/servers after mount, so a user viewing a gone server saw frozen data with no indication it was gone. Fix (4 parts): - Add a shared tmux.IsServerGone sentinel helper; tmuxctl's matchesServerDeadText now delegates to it so the dead-server detection set is defined in exactly one place (Constitution III). - Make sseHub.poll reap dead servers: collect them during the snapshot iteration, then after the loop emit one server-gone SSE event and delete the server from the poll set and ALL per-server maps under a single write lock (never mid-range, never across FetchSessions). - Wire the frontend to react to server-gone: tear down the stream and re-query /api/servers so the absent server flips to the existing not-found view. - Add an onerror fallback that also re-queries servers, catching catastrophic socket deaths the backend could not signal. No new HTTP endpoints or verbs (Constitution IX preserved). Tests added on both ends.
There was a problem hiding this comment.
Pull request overview
Fixes runaway SSE polling against deleted tmux sockets by introducing shared dead-server detection, reaping dead servers from the SSE poll set (with an additive server-gone SSE event), and teaching the frontend to refresh /api/servers when a server disappears so the existing route guard can flip to the not-found view.
Changes:
- Backend: add
tmux.IsServerGone(err)and refactortmuxctlto delegate dead-server detection to the shared helper. - Backend: update SSE polling to detect dead-server fetch errors, emit a one-time
server-goneevent, and delete all per-server hub state so the server is no longer polled. - Frontend: handle
server-gone(and onerror fallback) by tearing down the stream and re-fetching/api/servers; add unit tests and update docs/memory.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/plan.md | Captures requirements/tasks/acceptance criteria for dead-server reaping + frontend refresh behavior. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/intake.md | Documents root cause analysis and the chosen “reap + server-gone event + refresh” design. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/.status.yaml | Tracks change pipeline status/metrics for this fix. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/.history.jsonl | Records pipeline stage transitions and commands for this change. |
| app/backend/internal/tmux/tmux.go | Adds shared IsServerGone(err) + sentinel list as single source of truth. |
| app/backend/internal/tmux/tmux_test.go | Adds unit coverage for IsServerGone sentinel matching. |
| app/backend/internal/tmuxctl/client.go | Removes local dead-server sentinels; delegates detection to tmux.IsServerGone. |
| app/backend/api/sse.go | Reaps dead servers during polling, emits server-gone, and clears per-server hub state. |
| app/backend/api/sse_test.go | Adds hub-level test verifying reap + server-gone emission + state cleanup. |
| app/frontend/src/contexts/session-context.tsx | Adds server-gone listener + onerror fallback to refresh servers and tear down stale streams. |
| app/frontend/src/contexts/session-context.test.tsx | Adds tests for server-gone handling and onerror → refresh fallback. |
| docs/memory/run-kit/ui-patterns.md | Documents server-gone → refreshServers → route-guard not-found flip and onerror fallback behavior. |
| docs/memory/run-kit/tmux-sessions.md | Documents new SSE poll-set lifecycle (connect vs. reap) and shared dead-server sentinel ownership. |
| docs/memory/run-kit/index.md | Updates memory index entries to reflect the new SSE reap + server-gone behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Meta
Pipeline: intake ✓ → apply ✓ → review ✓ → hydrate ✓ → ship → review-pr
Impact: +276/−21 code (excluding
fab/,docs/) · +802/−25 totalSummary
After a tmux server is killed, the rk daemon never stops polling its now-deleted socket — a steady WARN drumbeat every ~2.5s (
SSE poll error ... No such file or directory). The SSE poll loop only drops a server from its poll set on last-client-disconnect, never on socket death, so the dead server lingers inh.clientsand gets re-polled forever. Separately, the frontend never re-queries/api/serversafter mount, so a user viewing a gone server sees frozen session data with no indication the server is gone.Root cause
sseHub.pollderives its per-tick work-list fromh.clients. A server enters that map when a browser opensGET /api/sessions/stream?server=<name>and only leaves viaremoveClienton last-client-disconnect. Killing the tmux server does not disconnect the browser'sEventSource, so the dead server stays in the map; every tickFetchSessionsshells out totmux -L <name> list-sessions, the socket is gone, tmux exits 1, and the loop logs andcontinues — there is no server-liveness re-check after the initial connect. The frontend's existing pool-diff +resolveServerViewguard already turn a vanished server into a not-found view, but the frontend never re-queries the server list, so that path never fires.The fix (4 parts)
internal/tmux/tmux.go): new exportedIsServerGone(err error) boolbacked by a singleserverGoneTextsubstring set.tmuxctl.matchesServerDeadTextnow delegates to it, so dead-server detection is defined in exactly one place (Constitution III — Wrap, Don't Reinvent).api/sse.go): whenFetchSessionsreturns atmux.IsServerGoneerror, collect the server into a loop-localdeadServersslice during the snapshot iteration; after the loop, under a single write lock, emit oneevent: server-goneto that server's clients and delete it fromh.clientsand all per-server maps (cache,previousJSON,previousRealSessions,orderBootstrapAttempts,previousOrderJSON,perServerGen,eventDrivenServers). Mutation never happens mid-range over the snapshot, and the write lock is never held acrossFetchSessions. Re-registration is free — a reconnecting client re-adds the server viaaddClient, which re-spawns the goroutine when!h.polling.server-gonehandler (session-context.tsx): tear down the stream (clear timer, closeEventSource, drop the pool entry + state slice), then callfetchServers()to re-query/api/servers. The now-absent server drops from the list andresolveServerViewflips a viewer to the existingServerNotFoundview — no new UI component.session-context.tsx):markDisconnectednow also callsfetchServers(), catching catastrophic socket deaths the backend couldn't signal (e.g. daemon mid-restart). Theserver-goneevent is the sub-second fast path; the onerror refresh is the guaranteed-eventual (~3s) path. Both are idempotent.No new HTTP endpoints or verbs —
server-goneis an additive SSE event on the existingGET /api/sessions/streamchannel (Constitution IX preserved).Changes
internal/tmux/)api/sse.go)server-goneinSessionProvider(session-context.tsx)session-context.tsx)Testing
just test-backend— green (newIsServerGoneunit tests intmux_test.go; new SSE reap/server-gone-emission tests insse_test.go)just test-frontend— green (newserver-gone+ onerror-fallback tests insession-context.test.tsx)tsc --noEmit— green