fix(esc): cancel streaming API call when abort signal trips by ericleepi314 · Pull Request #144 · agentforce314/clawcodex

ericleepi314 · 2026-05-15T10:02:20Z

Summary

ESC during a multi-tool_use streaming response was still taking 20+ seconds to actually stop, despite the prior abort plumbing landing in PRs fix(esc): propagate cancel signal into tool_context for subagents #135/refactor(context): make ToolContext.abort_controller non-optional #140/feat(headless): SIGINT mid-tool cancels via AbortController #141/feat(grep,glob): abort ripgrep mid-search instead of waiting out timeout #143
Root cause: provider.chat_stream_response had no way to observe a tripped abort controller — the only interrupt point was on_text_chunk, which never fires for tool_use-only turns
Thread AbortSignal through chat_stream_response; the Anthropic provider registers a listener that calls stream.response.close() (same pattern as the existing idle-timeout StreamWatchdog), translating the resulting iterator raise to AbortError

The user-reported symptom

User pressed ESC during a turn where the model emitted 8 parallel Write blocks. The abort controller tripped immediately, but client.messages.stream(...) happily read to EOF (~20s of model generation). Only then did query.py's post-API abort check fire and yield "Interrupted by user" for all 8 tool_use blocks. With this fix, the listener forcibly closes the streaming HTTP response within ~50ms of the abort firing.

Changes

src/providers/base.py — new abort_signal: AbortSignal | None = None parameter on chat_stream_response; docstring explains why text-chunk-based interruption isn't sufficient (tool_use-only turns produce no text chunks to observe).
src/providers/anthropic_provider.py — three abort defenses:
1. Pre-call fast-path: pre-tripped signal raises AbortError before any network work (skips the round-trip on turn-boundary aborts).
2. Stream-close listener: registered via abort_signal.add_listener(_close_stream_on_abort, once=True) immediately after entering the stream context. Listener calls stream.response.close(). Register-then-recheck ordering closes the sub-microsecond race where _fire could snapshot the listener list and drop a newly-appended listener.
3. Post-with-block recheck: catches a signal that fires between __exit__ and return.
4. Exception translation: except Exception block consults abort_signal.aborted (not exception class — SDK versions vary) and re-raises AbortError when true.
5. Cleanup: finally block detaches the listener so long-lived controllers (the REPL engine's, reused across many turns) don't accumulate dead listeners.
src/providers/minimax_provider.py + src/providers/openai_compatible.py — pre-call fast-path only. Mid-stream listener for these providers is a future PR (Minimax shares the anthropic SDK, so it should be a small lift).
src/query/query.py:
- _call_model_sync grows abort_signal keyword and forwards to the provider.
- New except AbortError: raise in _call_model_sync prevents the cancel from routing through the substring-classification error-message gauntlet (also stops the misleading [DIAG] EXCEPTION log).
- New except AbortError: pass in the query loop falls through to the existing if params.abort_controller.signal.aborted: block — cancellation processing happens in one place.
src/tool_system/agent_loop.py — _call_provider_for_turn grows cancel_signal keyword and forwards as abort_signal=; run_agent_loop passes through. Same fix benefits the TUI path (which uses run_agent_loop via AgentBridge).
tests/test_provider_abort_signal.py (new) — 5 regression tests:
- test_pre_aborted_signal_short_circuits_before_request — _ensure_client is never called for a pre-tripped signal
- test_mid_stream_abort_closes_stream_and_raises_abort_error — synthetic 500ms stream + mid-stream abort returns in <1s (vs the 20s symptom this fixes); stream.response.close() is called
- test_uncancelled_stream_returns_normally — no regression for never-tripped signals
- test_no_abort_signal_param_preserves_legacy_callers — abort_signal=None default keeps SDK consumers working
- test_listener_detached_after_normal_completion — pins that the listener is removed after normal exit

Test plan

5 new regression tests pass
85 directly related tests pass (abort, query, streaming executor, subagent, esc-cancel propagation, tool context default)
4409 broader related tests pass
Critic subagent review: APPROVE (after applying race-window fix, pre-call fast-path on other providers, explicit AbortError handler in _call_model_sync)
Manual: kick off a prompt that triggers a multi-Write parallel response, press ESC during the model's generation — should unwind in ~50ms, not 20+s

Out of scope (deferred to follow-ups)

Mid-stream cancellation for Minimax / GLM / OpenAI providers — pre-call fast-path is in place; the response-close listener pattern needs to be ported (Minimax uses the same anthropic SDK so it's a small lift; OpenAI/GLM need their own listener around the SDK's iterator).
MCP tools — out-of-process JSON-RPC; transport-level cancellation is a cross-cutting change across the MCP client.

🤖 Generated with Claude Code

Even after PRs #135/#140/#141/#143 wired ESC through subagents, headless SIGINT, the non-optional ``tool_context.abort_controller`` contract, and ripgrep mid-search, the user-visible ESC latency was still 20+ seconds when the model emitted a multi-tool_use response. Root cause: the provider's streaming HTTP read had no way to observe the abort controller. ``query.py`` passed no abort plumbing to ``chat_stream_response``; the only existing interrupt point was the ``on_text_chunk`` callback, which never fires for a turn that emits tool_use blocks without intervening text. The model could spend ~20s generating eight parallel ``Write`` blocks, and only after the stream returned naturally would the outer query loop check the abort signal and yield "Interrupted by user" for all eight. Thread an ``AbortSignal`` through ``chat_stream_response``. In the Anthropic provider, register a listener that calls ``stream.response.close()`` (the same close pattern the existing ``StreamWatchdog`` uses for idle timeout). Closing the underlying HTTP response causes the SDK's blocking socket read to raise in the consumer thread; the provider catches it, detects the abort via the signal state (not the exception class — different SDK versions raise different classes), and re-raises ``AbortError``. Three defenses against the registration race: * Pre-call fast-path bails when the signal was already tripped at call entry — skips the round-trip entirely. * Register-then-recheck after entering the stream context — closes the sub-microsecond window between an ``aborted`` read and a ``add_listener`` append where ``_fire`` could snapshot the listener list and silently drop our newly-appended listener. * Post-with-block recheck — surfaces a signal that fires between ``__exit__`` and ``return``. Plumbing: * ``query.py``: ``_call_model_sync`` grows ``abort_signal`` keyword and forwards to the provider; call site passes ``params.abort_controller.signal``. New ``except AbortError: raise`` in ``_call_model_sync`` and ``except AbortError: pass`` in the query loop route the cancel to the existing post-API abort-handling block in one place. * ``agent_loop.py``: ``_call_provider_for_turn`` grows ``cancel_signal`` and forwards as ``abort_signal=`` to the provider; ``run_agent_loop`` passes through. * ``minimax_provider.py`` / ``openai_compatible.py``: pre-call fast-path for parity (mid-stream listener for these providers is a future PR — same underlying anthropic SDK for Minimax makes it a small lift). Five regression tests pin the contract: pre-aborted fast-path skips ``_ensure_client``, mid-stream abort closes the stream and raises ``AbortError`` within <1s (vs the 20s+ symptom this PR fixes), unaborted streams return normally, ``abort_signal=None`` is a no-op for legacy callers, and the listener detaches after normal completion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(esc): cancel streaming API call when abort signal trips#144

fix(esc): cancel streaming API call when abort signal trips#144
ericleepi314 wants to merge 1 commit into
mainfrom
fix/esc-cancel-streaming-api

ericleepi314 commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericleepi314 commented May 15, 2026

Summary

The user-reported symptom

Changes

Test plan

Out of scope (deferred to follow-ups)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant