ds4-server: SSE keepalive during decode by Allen091080 · Pull Request #245 · antirez/ds4

Allen091080 · 2026-05-25T05:13:53Z

Summary

Small follow-up to f91c12b (prefill keepalive via prefill_display).

The prefill side is now well-covered. Decode can still go quiet for tens of seconds when:

the model is mid-thinking — <think>...</think> is open and no visible text has been flushed;
the model is accumulating a large tool_use input JSON, which is held back until the block closes.

sse_chunk only fires when there is actual streamable text, so the socket sees nothing during those stretches. Once the client's idle timeout (10-60 s on most HTTP libraries) elapses, the client tears down the socket, the next sse_chunk call records client stream write failed, and the turn errors out. This is the same failure shape that prompted issue #222 on the prefill side, just on the other end of the turn.

Fix

In the decode loop, when j->req.stream is set, emit a : decode\n\n SSE comment line at most every 15 s:

if (j->req.stream) {
    double now_kp = now_sec();
    if (now_kp - decode_last_keepalive >= 15.0) {
        static const char ka[] = ": decode\n\n";
        if (!send_all(j->fd, ka, sizeof(ka) - 1)) {
            finish = "error";
            snprintf(err, sizeof(err),
                     "client stream write failed during decode heartbeat");
            break;
        }
        decode_last_keepalive = now_kp;
    }
}
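The throttle in the snippet above can be exercised in isolation. The following is a minimal sketch, not the server's code: `fake_sink`, `fake_send_all`, and `decode_keepalive_tick` are hypothetical names introduced here, the clock is passed in as a plain `double` instead of calling `now_sec()`, and the write goes to an in-memory buffer instead of `j->fd`. It assumes `decode_last_keepalive` starts at the time decode begins, which the PR does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the socket: collects bytes in memory.
 * Setting `fail` simulates a client that has already gone away. */
typedef struct {
    char   buf[256];
    size_t len;
    bool   fail;
} fake_sink;

/* Hypothetical replacement for send_all(): true on success. */
bool fake_send_all(fake_sink *s, const char *p, size_t n) {
    if (s->fail || s->len + n > sizeof(s->buf)) return false;
    memcpy(s->buf + s->len, p, n);
    s->len += n;
    return true;
}

/* Emit ": decode\n\n" if at least `interval` seconds have elapsed since
 * *last. Returns false only when the write fails. *last advances only
 * after a successful write, mirroring the snippet above. */
bool decode_keepalive_tick(fake_sink *s, double now, double *last,
                           double interval) {
    static const char ka[] = ": decode\n\n";
    assert(s != NULL && last != NULL);
    if (now - *last < interval) return true;      /* not due yet */
    if (!fake_send_all(s, ka, sizeof(ka) - 1)) return false;
    *last = now;
    return true;
}
```

Calling `decode_keepalive_tick` once per decode iteration writes at most one comment per `interval`, however fast the loop spins. A `false` return is the point where the real loop would set `finish = "error"` and break.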

15 s mirrors the prefill cadence and sits inside common 30-60 s client idle thresholds. A failed write ends the turn via the existing client stream write failed path, so callers see no new failure mode.
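A comment line is a safe heartbeat because, in the server-sent events format, a line starting with `:` is ignored by the parser, and a blank line dispatches an event only if a `data` field was seen since the last dispatch. The sketch below illustrates that rule with a simplified scanner. `sse_counts` and `sse_scan` are hypothetical names, the `data:` payload in the usage note is illustrative and not ds4-server's actual output, and only LF line endings are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t comments;  /* lines starting with ':' (ignored by clients) */
    size_t data;      /* "data" field lines */
    size_t events;    /* blank lines that close an event carrying data */
} sse_counts;

/* Walk a NUL-terminated event stream and count what a client would see.
 * Simplified: LF-only line endings, other fields are skipped. */
sse_counts sse_scan(const char *stream) {
    sse_counts c = {0, 0, 0};
    size_t pending = 0;              /* data lines since last dispatch */
    const char *p = stream;
    assert(stream != NULL);
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (len == 0) {
            if (pending > 0) c.events++;   /* dispatch only with data */
            pending = 0;
        } else if (p[0] == ':') {
            c.comments++;                  /* keepalive or other comment */
        } else if (len >= 4 && strncmp(p, "data", 4) == 0 &&
                   (len == 4 || p[4] == ':')) {
            c.data++;
            pending++;
        }
        p += len;
        if (eol != NULL) p++;              /* step over the newline */
    }
    return c;
}
```

Scanning `": decode\n\n"` on its own yields one comment and zero events, so a heartbeat never surfaces as a message. The same scanner could be pointed at a captured stream to count how many `: decode` lines arrived during a long thinking phase.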

Scope (intentionally minimal)

No watchdog thread.
No _exit.
No struct fields added.
Only one new local variable (decode_last_keepalive).
Does not cover GPU/Metal kernel hangs inside ds4_session_* — out of scope.

Verification

Machine: MacBook Pro M5 Max, 128 GiB RAM
Backend: Metal
Model: DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf (q2-imatrix)
Server: ./ds4-server --host 0.0.0.0 --port 8000 --ctx 500000 --kv-disk-dir … --kv-disk-space-mb 204800

Clean make build, no new warnings.
./ds4_test --server passes (server: OK / ds4 tests: ok).
Streamed completions where the model thinks for 30+ s now show : decode comment lines on the wire every 15 s, no client disconnect.

Test plan

CI runs ./ds4_test --server
Manual: chat request that forces a long <think> phase or large tool_use input; observe : decode comment lines on the SSE wire and no client stream write failed errors at the end.

The prefill keepalive added in f027269 and refined in f91c12b (`prefill_display` events) keeps the connection alive while the model is processing input. Once decode starts, the connection can still go quiet for tens of seconds at a time:

* the model is mid-thinking: `<think>...</think>` is open and no visible text has been flushed to the client yet;
* the model is accumulating a large tool_use input JSON, which is held back until the block closes.

`sse_chunk` only fires when there is actual streamable text, so during those stretches no bytes go to the client. Once a client-side TCP idle-timeout (10-60 s on most HTTP libraries) elapses, the socket is torn down and the next `sse_chunk` call records `client stream write failed`, ending the turn with an error.

Add a small wall-clock keepalive in the decode loop: when `j->req.stream` is set, emit a `: decode\n\n` SSE comment line at most every 15 seconds. The 15 s cadence matches the prefill keepalive and sits comfortably inside common 30-60 s client idle thresholds. A failed write here ends the turn with the same `client stream write failed` reason the regular event writer uses, so callers see no new failure mode.

This is intentionally a small follow-up to f91c12b: no watchdog thread, no `_exit`, no new state outside the local variable `decode_last_keepalive`. It only addresses decode silence, not GPU stalls inside `ds4_session_*` calls.

Verified on macOS Metal, q2-imatrix GGUF:

- clean `make` build, no new warnings;
- `./ds4_test --server` passes;
- streamed completions during long thinking phases now see a `: decode\n\n` comment every 15 s on the wire instead of silence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Allen091080 wants to merge 1 commit into antirez:main from Allen091080:decode-stream-keepalive
