fix: Reap Dead Tmux Servers from SSE Poll Set#235

Merged
sahil-noon merged 2 commits into main from 260603-gs2t-reap-dead-tmux-servers-sse
Jun 3, 2026

@sahil-noon
Collaborator

Meta

| ID | Type | Confidence | Plan | Review |
|----|------|------------|------|--------|
| gs2t | fix | 3.8/5.0 | 9/9 tasks, 18/18 acceptance ✓ | ✓ 1 cycle |

Pipeline: intake ✓ → apply ✓ → review ✓ → hydrate ✓ → ship → review-pr

Impact: +276/−21 code (excluding fab/, docs/) · +802/−25 total

Summary

After a tmux server is killed, the rk daemon never stops polling its now-deleted socket — a steady WARN drumbeat every ~2.5s (SSE poll error ... No such file or directory). The SSE poll loop only drops a server from its poll set on last-client-disconnect, never on socket death, so the dead server lingers in h.clients and gets re-polled forever. Separately, the frontend never re-queries /api/servers after mount, so a user viewing a gone server sees frozen session data with no indication the server is gone.

Root cause

sseHub.poll derives its per-tick work-list from h.clients. A server enters that map when a browser opens GET /api/sessions/stream?server=<name> and only leaves via removeClient on last-client-disconnect. Killing the tmux server does not disconnect the browser's EventSource, so the dead server stays in the map; every tick FetchSessions shells out to tmux -L <name> list-sessions, the socket is gone, tmux exits 1, and the loop logs and continues — there is no server-liveness re-check after the initial connect. The frontend's existing pool-diff + resolveServerView guard already turn a vanished server into a not-found view, but the frontend never re-queries the server list, so that path never fires.
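The lifecycle gap described above can be modelled in a few lines. This is a minimal sketch, not the real `sseHub`: the `hub` type, its `clients` counter map, and `workList` are hypothetical reductions used only to show that nothing on the poll path consults socket liveness.

```go
package main

import "sort"

// hub is a hypothetical, stripped-down stand-in for the real sseHub:
// it only tracks how many stream clients each server has.
type hub struct {
	clients map[string]int // server name -> connected client count
}

func newHub() *hub { return &hub{clients: map[string]int{}} }

// addClient registers a stream client; in this model it is the only
// way a server enters the poll set.
func (h *hub) addClient(server string) { h.clients[server]++ }

// removeClient drops one client; the server leaves the poll set only
// when its last client disconnects.
func (h *hub) removeClient(server string) {
	if h.clients[server] <= 1 {
		delete(h.clients, server)
		return
	}
	h.clients[server]--
}

// workList is what each poll tick iterates over. Nothing here checks
// whether the tmux socket still exists, so a killed server whose
// browser stream is still open stays listed.
func (h *hub) workList() []string {
	names := make([]string, 0, len(h.clients))
	for name := range h.clients {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

In this model, killing a tmux server changes nothing: only `removeClient` shrinks the map, and the browser's EventSource never triggers it.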

The fix (4 parts)

  • Shared sentinel helper (internal/tmux/tmux.go): new exported IsServerGone(err error) bool backed by a single serverGoneText substring set. tmuxctl.matchesServerDeadText now delegates to it, so dead-server detection is defined in exactly one place (Constitution III — Wrap, Don't Reinvent).
  • Reap in the poll loop (api/sse.go): when FetchSessions returns a tmux.IsServerGone error, collect the server into a loop-local deadServers slice during the snapshot iteration; after the loop, under a single write lock, emit one event: server-gone to that server's clients and delete it from h.clients and all per-server maps (cache, previousJSON, previousRealSessions, orderBootstrapAttempts, previousOrderJSON, perServerGen, eventDrivenServers). Mutation never happens mid-range over the snapshot, and the write lock is never held across FetchSessions. Re-registration is free — a reconnecting client re-adds the server via addClient, which re-spawns the goroutine when !h.polling.
  • Frontend server-gone handler (session-context.tsx): tear down the stream (clear timer, close EventSource, drop the pool entry + state slice), then call fetchServers() to re-query /api/servers. The now-absent server drops from the list and resolveServerView flips a viewer to the existing ServerNotFound view — no new UI component.
  • onerror fallback (session-context.tsx): markDisconnected now also calls fetchServers(), catching catastrophic socket deaths the backend couldn't signal (e.g. daemon mid-restart). The server-gone event is the sub-second fast path; the onerror refresh is the guaranteed-eventual (~3s) path. Both are idempotent.

No new HTTP endpoints or verbs — server-gone is an additive SSE event on the existing GET /api/sessions/stream channel (Constitution IX preserved).

Changes

  • Backend §1 — Shared dead-server detection helper (internal/tmux/)
  • Backend §2 — Reap dead servers in the SSE poll loop (api/sse.go)
  • Frontend §3 — Handle server-gone in SessionProvider (session-context.tsx)
  • Frontend §4 — onerror fallback (session-context.tsx)

Testing

  • just test-backend — green (new IsServerGone unit tests in tmux_test.go; new SSE reap/server-gone-emission tests in sse_test.go)
  • just test-frontend — green (new server-gone + onerror-fallback tests in session-context.test.tsx)
  • tsc --noEmit — green

sahil87 added 2 commits June 3, 2026 10:47
…handling

The daemon polled killed tmux sockets forever (a steady WARN drumbeat
every ~2.5s) because the SSE poll loop only dropped a server on
last-client-disconnect, never on socket death. Separately, the frontend
never re-queried /api/servers after mount, so a user viewing a gone
server saw frozen data with no indication it was gone.

Fix (4 parts):
- Add a shared tmux.IsServerGone sentinel helper; tmuxctl's
  matchesServerDeadText now delegates to it so the dead-server detection
  set is defined in exactly one place (Constitution III).
- Make sseHub.poll reap dead servers: collect them during the snapshot
  iteration, then after the loop emit one server-gone SSE event and
  delete the server from the poll set and ALL per-server maps under a
  single write lock (never mid-range, never across FetchSessions).
- Wire the frontend to react to server-gone: tear down the stream and
  re-query /api/servers so the absent server flips to the existing
  not-found view.
- Add an onerror fallback that also re-queries servers, catching
  catastrophic socket deaths the backend could not signal.

No new HTTP endpoints or verbs (Constitution IX preserved). Tests added
on both ends.
@sahil-noon sahil-noon requested a review from Copilot June 3, 2026 05:40
@sahil-noon sahil-noon marked this pull request as ready for review June 3, 2026 05:40

Copilot AI left a comment

Pull request overview

Fixes runaway SSE polling against deleted tmux sockets by introducing shared dead-server detection, reaping dead servers from the SSE poll set (with an additive server-gone SSE event), and teaching the frontend to refresh /api/servers when a server disappears so the existing route guard can flip to the not-found view.

Changes:

  • Backend: add tmux.IsServerGone(err) and refactor tmuxctl to delegate dead-server detection to the shared helper.
  • Backend: update SSE polling to detect dead-server fetch errors, emit a one-time server-gone event, and delete all per-server hub state so the server is no longer polled.
  • Frontend: handle server-gone (and onerror fallback) by tearing down the stream and re-fetching /api/servers; add unit tests and update docs/memory.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/plan.md | Captures requirements/tasks/acceptance criteria for dead-server reaping + frontend refresh behavior. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/intake.md | Documents root cause analysis and the chosen “reap + server-gone event + refresh” design. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/.status.yaml | Tracks change pipeline status/metrics for this fix. |
| fab/changes/260603-gs2t-reap-dead-tmux-servers-sse/.history.jsonl | Records pipeline stage transitions and commands for this change. |
| app/backend/internal/tmux/tmux.go | Adds shared IsServerGone(err) + sentinel list as single source of truth. |
| app/backend/internal/tmux/tmux_test.go | Adds unit coverage for IsServerGone sentinel matching. |
| app/backend/internal/tmuxctl/client.go | Removes local dead-server sentinels; delegates detection to tmux.IsServerGone. |
| app/backend/api/sse.go | Reaps dead servers during polling, emits server-gone, and clears per-server hub state. |
| app/backend/api/sse_test.go | Adds hub-level test verifying reap + server-gone emission + state cleanup. |
| app/frontend/src/contexts/session-context.tsx | Adds server-gone listener + onerror fallback to refresh servers and tear down stale streams. |
| app/frontend/src/contexts/session-context.test.tsx | Adds tests for server-gone handling and onerror → refresh fallback. |
| docs/memory/run-kit/ui-patterns.md | Documents server-gone → refreshServers → route-guard not-found flip and onerror fallback behavior. |
| docs/memory/run-kit/tmux-sessions.md | Documents new SSE poll-set lifecycle (connect vs. reap) and shared dead-server sentinel ownership. |
| docs/memory/run-kit/index.md | Updates memory index entries to reflect the new SSE reap + server-gone behavior. |

@sahil-noon sahil-noon merged commit c280b92 into main Jun 3, 2026
6 checks passed
@sahil-noon sahil-noon deleted the 260603-gs2t-reap-dead-tmux-servers-sse branch June 3, 2026 06:36