Skip to content

fix(esc): cancel streaming API call when abort signal trips#144

Open
ericleepi314 wants to merge 1 commit into
mainfrom
fix/esc-cancel-streaming-api
Open

fix(esc): cancel streaming API call when abort signal trips#144
ericleepi314 wants to merge 1 commit into
mainfrom
fix/esc-cancel-streaming-api

Conversation

@ericleepi314
Copy link
Copy Markdown
Collaborator

Summary

The user-reported symptom

User pressed ESC during a turn where the model emitted 8 parallel Write blocks. The abort controller tripped immediately, but client.messages.stream(...) happily read to EOF (~20s of model generation). Only then did query.py's post-API abort check fire and yield "Interrupted by user" for all 8 tool_use blocks. With this fix, the listener forcibly closes the streaming HTTP response within ~50ms of the abort firing.

Changes

  • src/providers/base.py — new abort_signal: AbortSignal | None = None parameter on chat_stream_response; docstring explains why text-chunk-based interruption isn't sufficient (tool_use-only turns produce no text chunks to observe).
  • src/providers/anthropic_provider.py — three abort defenses:
    1. Pre-call fast-path: pre-tripped signal raises AbortError before any network work (skips the round-trip on turn-boundary aborts).
    2. Stream-close listener: registered via abort_signal.add_listener(_close_stream_on_abort, once=True) immediately after entering the stream context. Listener calls stream.response.close(). Register-then-recheck ordering closes the sub-microsecond race where _fire could snapshot the listener list and drop a newly-appended listener.
    3. Post-with-block recheck: catches a signal that fires between __exit__ and return.
    4. Exception translation: except Exception block consults abort_signal.aborted (not exception class — SDK versions vary) and re-raises AbortError when true.
    5. Cleanup: finally block detaches the listener so long-lived controllers (the REPL engine's, reused across many turns) don't accumulate dead listeners.
  • src/providers/minimax_provider.py + src/providers/openai_compatible.py — pre-call fast-path only. Mid-stream listener for these providers is a future PR (Minimax shares the anthropic SDK, so it should be a small lift).
  • src/query/query.py:
    • _call_model_sync grows abort_signal keyword and forwards to the provider.
    • New except AbortError: raise in _call_model_sync prevents the cancel from routing through the substring-classification error-message gauntlet (also stops the misleading [DIAG] EXCEPTION log).
    • New except AbortError: pass in the query loop falls through to the existing if params.abort_controller.signal.aborted: block — cancellation processing happens in one place.
  • src/tool_system/agent_loop.py_call_provider_for_turn grows cancel_signal keyword and forwards as abort_signal=; run_agent_loop passes through. Same fix benefits the TUI path (which uses run_agent_loop via AgentBridge).
  • tests/test_provider_abort_signal.py (new) — 5 regression tests:
    • test_pre_aborted_signal_short_circuits_before_request_ensure_client is never called for a pre-tripped signal
    • test_mid_stream_abort_closes_stream_and_raises_abort_error — synthetic 500ms stream + mid-stream abort returns in <1s (vs the 20s symptom this fixes); stream.response.close() is called
    • test_uncancelled_stream_returns_normally — no regression for never-tripped signals
    • test_no_abort_signal_param_preserves_legacy_callersabort_signal=None default keeps SDK consumers working
    • test_listener_detached_after_normal_completion — pins that the listener is removed after normal exit

Test plan

  • 5 new regression tests pass
  • 85 directly related tests pass (abort, query, streaming executor, subagent, esc-cancel propagation, tool context default)
  • 4409 broader related tests pass
  • Critic subagent review: APPROVE (after applying race-window fix, pre-call fast-path on other providers, explicit AbortError handler in _call_model_sync)
  • Manual: kick off a prompt that triggers a multi-Write parallel response, press ESC during the model's generation — should unwind in ~50ms, not 20+s

Out of scope (deferred to follow-ups)

  • Mid-stream cancellation for Minimax / GLM / OpenAI providers — pre-call fast-path is in place; the response-close listener pattern needs to be ported (Minimax uses the same anthropic SDK so it's a small lift; OpenAI/GLM need their own listener around the SDK's iterator).
  • MCP tools — out-of-process JSON-RPC; transport-level cancellation is a cross-cutting change across the MCP client.

🤖 Generated with Claude Code

Even after PRs #135/#140/#141/#143 wired ESC through subagents, headless
SIGINT, the non-optional ``tool_context.abort_controller`` contract, and
ripgrep mid-search, the user-visible ESC latency was still 20+ seconds
when the model emitted a multi-tool_use response. Root cause: the
provider's streaming HTTP read had no way to observe the abort
controller. ``query.py`` passed no abort plumbing to
``chat_stream_response``; the only existing interrupt point was the
``on_text_chunk`` callback, which never fires for a turn that emits
tool_use blocks without intervening text. The model could spend ~20s
generating eight parallel ``Write`` blocks, and only after the stream
returned naturally would the outer query loop check the abort signal
and yield "Interrupted by user" for all eight.

Thread an ``AbortSignal`` through ``chat_stream_response``. In the
Anthropic provider, register a listener that calls
``stream.response.close()`` (the same close pattern the existing
``StreamWatchdog`` uses for idle timeout). Closing the underlying HTTP
response causes the SDK's blocking socket read to raise in the
consumer thread; the provider catches it, detects the abort via the
signal state (not the exception class — different SDK versions raise
different classes), and re-raises ``AbortError``.

Three defenses against the registration race:
* Pre-call fast-path bails when the signal was already tripped at
  call entry — skips the round-trip entirely.
* Register-then-recheck after entering the stream context — closes
  the sub-microsecond window between an ``aborted`` read and a
  ``add_listener`` append where ``_fire`` could snapshot the
  listener list and silently drop our newly-appended listener.
* Post-with-block recheck — surfaces a signal that fires between
  ``__exit__`` and ``return``.

Plumbing:
* ``query.py``: ``_call_model_sync`` grows ``abort_signal`` keyword
  and forwards to the provider; call site passes
  ``params.abort_controller.signal``. New
  ``except AbortError: raise`` in ``_call_model_sync`` and
  ``except AbortError: pass`` in the query loop route the cancel
  to the existing post-API abort-handling block in one place.
* ``agent_loop.py``: ``_call_provider_for_turn`` grows
  ``cancel_signal`` and forwards as ``abort_signal=`` to the
  provider; ``run_agent_loop`` passes through.
* ``minimax_provider.py`` / ``openai_compatible.py``: pre-call
  fast-path for parity (mid-stream listener for these providers is
  a future PR — same underlying anthropic SDK for Minimax makes it
  a small lift).

Five regression tests pin the contract: pre-aborted fast-path skips
``_ensure_client``, mid-stream abort closes the stream and raises
``AbortError`` within <1s (vs the 20s+ symptom this PR fixes),
unaborted streams return normally, ``abort_signal=None`` is a
no-op for legacy callers, and the listener detaches after normal
completion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant