fix(esc): cancel streaming API call when abort signal trips#144
Open
ericleepi314 wants to merge 1 commit into
Open
fix(esc): cancel streaming API call when abort signal trips#144ericleepi314 wants to merge 1 commit into
ericleepi314 wants to merge 1 commit into
Conversation
Even after PRs #135/#140/#141/#143 wired ESC through subagents, headless SIGINT, the non-optional ``tool_context.abort_controller`` contract, and ripgrep mid-search, the user-visible ESC latency was still 20+ seconds when the model emitted a multi-tool_use response. Root cause: the provider's streaming HTTP read had no way to observe the abort controller. ``query.py`` passed no abort plumbing to ``chat_stream_response``; the only existing interrupt point was the ``on_text_chunk`` callback, which never fires for a turn that emits tool_use blocks without intervening text. The model could spend ~20s generating eight parallel ``Write`` blocks, and only after the stream returned naturally would the outer query loop check the abort signal and yield "Interrupted by user" for all eight. Thread an ``AbortSignal`` through ``chat_stream_response``. In the Anthropic provider, register a listener that calls ``stream.response.close()`` (the same close pattern the existing ``StreamWatchdog`` uses for idle timeout). Closing the underlying HTTP response causes the SDK's blocking socket read to raise in the consumer thread; the provider catches it, detects the abort via the signal state (not the exception class — different SDK versions raise different classes), and re-raises ``AbortError``. Three defenses against the registration race: * Pre-call fast-path bails when the signal was already tripped at call entry — skips the round-trip entirely. * Register-then-recheck after entering the stream context — closes the sub-microsecond window between an ``aborted`` read and a ``add_listener`` append where ``_fire`` could snapshot the listener list and silently drop our newly-appended listener. * Post-with-block recheck — surfaces a signal that fires between ``__exit__`` and ``return``. Plumbing: * ``query.py``: ``_call_model_sync`` grows ``abort_signal`` keyword and forwards to the provider; call site passes ``params.abort_controller.signal``. New ``except AbortError: raise`` in ``_call_model_sync`` and ``except AbortError: pass`` in the query loop route the cancel to the existing post-API abort-handling block in one place. * ``agent_loop.py``: ``_call_provider_for_turn`` grows ``cancel_signal`` and forwards as ``abort_signal=`` to the provider; ``run_agent_loop`` passes through. * ``minimax_provider.py`` / ``openai_compatible.py``: pre-call fast-path for parity (mid-stream listener for these providers is a future PR — same underlying anthropic SDK for Minimax makes it a small lift). Five regression tests pin the contract: pre-aborted fast-path skips ``_ensure_client``, mid-stream abort closes the stream and raises ``AbortError`` within <1s (vs the 20s+ symptom this PR fixes), unaborted streams return normally, ``abort_signal=None`` is a no-op for legacy callers, and the listener detaches after normal completion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
provider.chat_stream_responsehad no way to observe a tripped abort controller — the only interrupt point wason_text_chunk, which never fires for tool_use-only turnsAbortSignalthroughchat_stream_response; the Anthropic provider registers a listener that callsstream.response.close()(same pattern as the existing idle-timeoutStreamWatchdog), translating the resulting iterator raise toAbortErrorThe user-reported symptom
User pressed ESC during a turn where the model emitted 8 parallel
Writeblocks. The abort controller tripped immediately, butclient.messages.stream(...)happily read to EOF (~20s of model generation). Only then didquery.py's post-API abort check fire and yield "Interrupted by user" for all 8 tool_use blocks. With this fix, the listener forcibly closes the streaming HTTP response within ~50ms of the abort firing.Changes
src/providers/base.py— newabort_signal: AbortSignal | None = Noneparameter onchat_stream_response; docstring explains why text-chunk-based interruption isn't sufficient (tool_use-only turns produce no text chunks to observe).src/providers/anthropic_provider.py— three abort defenses:AbortErrorbefore any network work (skips the round-trip on turn-boundary aborts).abort_signal.add_listener(_close_stream_on_abort, once=True)immediately after entering the stream context. Listener callsstream.response.close(). Register-then-recheck ordering closes the sub-microsecond race where_firecould snapshot the listener list and drop a newly-appended listener.__exit__and return.except Exceptionblock consultsabort_signal.aborted(not exception class — SDK versions vary) and re-raisesAbortErrorwhen true.finallyblock detaches the listener so long-lived controllers (the REPL engine's, reused across many turns) don't accumulate dead listeners.src/providers/minimax_provider.py+src/providers/openai_compatible.py— pre-call fast-path only. Mid-stream listener for these providers is a future PR (Minimax shares the anthropic SDK, so it should be a small lift).src/query/query.py:_call_model_syncgrowsabort_signalkeyword and forwards to the provider.except AbortError: raisein_call_model_syncprevents the cancel from routing through the substring-classification error-message gauntlet (also stops the misleading[DIAG] EXCEPTIONlog).except AbortError: passin the query loop falls through to the existingif params.abort_controller.signal.aborted:block — cancellation processing happens in one place.src/tool_system/agent_loop.py—_call_provider_for_turngrowscancel_signalkeyword and forwards asabort_signal=;run_agent_looppasses through. Same fix benefits the TUI path (which usesrun_agent_loopviaAgentBridge).tests/test_provider_abort_signal.py(new) — 5 regression tests:test_pre_aborted_signal_short_circuits_before_request—_ensure_clientis never called for a pre-tripped signaltest_mid_stream_abort_closes_stream_and_raises_abort_error— synthetic 500ms stream + mid-stream abort returns in <1s (vs the 20s symptom this fixes);stream.response.close()is calledtest_uncancelled_stream_returns_normally— no regression for never-tripped signalstest_no_abort_signal_param_preserves_legacy_callers—abort_signal=Nonedefault keeps SDK consumers workingtest_listener_detached_after_normal_completion— pins that the listener is removed after normal exitTest plan
_call_model_sync)Out of scope (deferred to follow-ups)
🤖 Generated with Claude Code