Follow-up from #99. The reconnect backoff/blackout decision logic in relayLoop (src/spider.zig) has no unit coverage, so regressions like the one fixed in #99 (productive sessions escalating into blackouts) can only be caught by running against live relays.
Problem
The productive / quick-disconnect / blackout classification is interleaved with interruptibleSleep, network calls, and milliTimestamp(), so it cannot be tested in isolation. #99 reached readLoop with the wrong event count and silently relocated the bug; a pure decision function would have caught both.
Fix direction
Extract a pure function, e.g.:
fn classifyOutcome(success: bool, last_session_events: u64, connection_duration: i64) Action
returning the next reconnect-delay / sleep / blackout decision, and unit-test the boundaries:
- productive short session (events > 0, duration < QUICK_DISCONNECT_MS) -> reset delay, no escalation
- unproductive quick disconnect (events == 0, duration < QUICK_DISCONNECT_MS) -> escalate
- long uptime, no events -> reset delay
- connection failure -> escalate
- Nth consecutive failure -> blackout boundary (currently ~5 failures with MAX_RECONNECT_DELAY_MS = 5m)
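The first four boundaries above can be sketched as a table of pure checks. This is only an illustration under stated assumptions: the Action variants (reset_delay, escalate) and the 30-second value for QUICK_DISCONNECT_MS are made up here, and the real constant and decision type in src/spider.zig may differ. The blackout boundary is left out because it depends on state (consecutive-failure count or current delay) that the suggested signature does not carry.

```zig
const std = @import("std");

// Assumed threshold, for illustration only; the real QUICK_DISCONNECT_MS is defined in src/spider.zig.
const QUICK_DISCONNECT_MS: i64 = 30_000;

// Hypothetical result type; the issue names `Action` but does not define its variants.
const Action = enum { reset_delay, escalate };

/// Classifies a finished connection attempt without sleeping, doing I/O, or reading the clock.
fn classifyOutcome(success: bool, last_session_events: u64, connection_duration: i64) Action {
    if (!success) return .escalate; // connection failure
    if (last_session_events > 0) return .reset_delay; // productive, however short
    if (connection_duration < QUICK_DISCONNECT_MS) return .escalate; // unproductive quick disconnect
    return .reset_delay; // long uptime, no events
}

test "productive short session resets the delay" {
    try std.testing.expectEqual(Action.reset_delay, classifyOutcome(true, 1, QUICK_DISCONNECT_MS - 1));
}

test "unproductive quick disconnect escalates" {
    try std.testing.expectEqual(Action.escalate, classifyOutcome(true, 0, QUICK_DISCONNECT_MS - 1));
}

test "long uptime without events resets the delay" {
    try std.testing.expectEqual(Action.reset_delay, classifyOutcome(true, 0, QUICK_DISCONNECT_MS));
}

test "connection failure escalates" {
    try std.testing.expectEqual(Action.escalate, classifyOutcome(false, 0, 0));
}
```

Saved to a standalone file, the sketch runs with zig test and needs no relay, socket, or timer. Covering the blackout boundary the same way would likely mean passing the consecutive-failure count (or the current delay) into the function as an extra argument.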
Notes
- stored-based productivity counting; the event count must include catch-up + negentropy + live read-loop events (see #99).
- error.ReadFailed root cause), but with #100 (Spider: error.ReadFailed drops wss upstream connections after ~20s) fixed, the long-uptime path becomes the common case, which these tests should also cover.