fix: Anthropic proxy timeout — native SSE streaming + hardened HTTP client#485

Draft
worldofgeese wants to merge 4 commits into spacedriveapp:main from worldofgeese:fix/anthropic-proxy-timeout-v033

Conversation

@worldofgeese worldofgeese commented Mar 24, 2026

Problem

When using the Anthropic API through a corporate proxy (e.g. https://$URL/claude/v1/messages), completions from Opus consistently fail:

  1. Client-side timeout (before this PR): the request failed with CompletionError: ProviderError: error sending request for url, because the 120s base reqwest client timeout expired before Opus finished thinking.
  2. Proxy-side timeout (after bumping client timeout): 504 Gateway Timeout. The non-streaming Anthropic path sends a single POST and waits for the entire response as one blob, so no intermediate data crosses the connection and the proxy kills it as idle.

Root Cause

Three issues compound:

  1. No Anthropic SSE streaming. The stream() method for ApiType::Anthropic calls attempt_completion() (non-streaming) then wraps the result in a fake stream via stream_from_completion_response(). Corporate proxies with gateway timeouts (typically 120–300s) kill the idle connection before Opus finishes generating.

  2. Base HTTP client timeout too low. Both reqwest::Client::builder() sites use a 120s flat timeout with no connect_timeout or tcp_keepalive.

  3. Error chain discarded. Every .map_err(|e| CompletionError::ProviderError(e.to_string())) only captures the top-level reqwest error. The actual cause (timeout, connection reset, proxy disconnect) is silently lost.

The Fix

Commit 1: Hardened HTTP client + error diagnostics

src/llm/manager.rs

  • Bump base timeout: 120s → 300s
  • Add connect_timeout(30s) — fail fast on connection establishment
  • Add tcp_keepalive(30s) — keeps proxy connections alive
  • Add pool_idle_timeout(90s) — prevents stale pooled connections
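
For reference, the four settings listed above map onto reqwest's ClientBuilder roughly as follows. This is a minimal sketch rather than the code in src/llm/manager.rs: the function name build_http_client is hypothetical, and it assumes the reqwest crate is already a dependency.

```rust
use std::time::Duration;

use reqwest::Client;

/// Hypothetical helper; the real builder sites live in src/llm/manager.rs.
fn build_http_client() -> Result<Client, reqwest::Error> {
    Client::builder()
        // Overall deadline for a request, from start until the body is read.
        .timeout(Duration::from_secs(300))
        // Deadline for establishing the connection only.
        .connect_timeout(Duration::from_secs(30))
        // Interval for TCP keepalive probes on open sockets.
        .tcp_keepalive(Duration::from_secs(30))
        // Drop pooled connections that have been idle for this long.
        .pool_idle_timeout(Duration::from_secs(90))
        .build()
}
```

TCP keepalive operates below HTTP, so whether a given proxy counts the probes as activity depends on the deployment. The per-request .timeout(...) on the request builder overrides the client-wide value for that call only.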

src/llm/model.rs

  • Add .timeout(STREAM_REQUEST_TIMEOUT_SECS) to call_anthropic() request builder (1800s, matching OpenAI)
  • Replace e.to_string() with format!("{e:#}") for full error cause chain preservation
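
Whether {e:#} expands the cause chain depends on the error type's Display implementation (anyhow::Error does; many other types ignore the alternate flag). A fallback that works for any std::error::Error is to walk source() by hand. The sketch below uses only the standard library; error_chain, SendFailed, and ProxyReset are hypothetical names used for illustration, not code from this PR.

```rust
use std::error::Error;
use std::fmt;

/// Joins an error and every `source()` beneath it into one line.
fn error_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

/// Hypothetical low-level cause, standing in for an io/hyper error.
#[derive(Debug)]
struct ProxyReset;

impl fmt::Display for ProxyReset {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("connection reset by proxy")
    }
}

impl Error for ProxyReset {}

/// Hypothetical top-level error whose Display ignores the `#` flag.
#[derive(Debug)]
struct SendFailed {
    cause: ProxyReset,
}

impl fmt::Display for SendFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("error sending request")
    }
}

impl Error for SendFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}
```

With these stand-in types, format!("{e:#}") yields only the top-level message, while error_chain(&e) also includes the underlying cause.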

Commit 2: Native Anthropic SSE streaming (~200 lines)

Adds stream_anthropic() that:

  • Sends "stream": true in the request body
  • Parses all Anthropic SSE event types: message_start, content_block_start/delta/stop, message_delta, message_stop, error, ping
  • Handles text, tool_use (incremental JSON accumulation), and thinking blocks
  • Tracks usage (input/cached/output tokens)
  • Handles OAuth tool name reverse-mapping
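
The framing step underneath that event handling can be sketched with the standard library alone. This is an illustration, not the PR's parser: SseBuffer, SseFrame, and parse_frame are hypothetical names, and the sketch simplifies the SSE format by dropping every carriage return and ignoring fields other than event and data.

```rust
/// One SSE frame: the `event:` name (if any) and the joined `data:` lines.
#[derive(Debug, PartialEq)]
struct SseFrame {
    event: Option<String>,
    data: String,
}

/// Buffers raw response bytes and emits a frame once its blank-line
/// terminator has arrived, so frames split across chunks are handled.
#[derive(Default)]
struct SseBuffer {
    pending: Vec<u8>,
}

impl SseBuffer {
    fn push(&mut self, chunk: &[u8]) -> Vec<SseFrame> {
        // Simplification: strip CR so CRLF and LF streams look the same.
        self.pending
            .extend(chunk.iter().copied().filter(|b| *b != b'\r'));
        let mut frames = Vec::new();
        while let Some(end) = self.pending.windows(2).position(|w| w == b"\n\n") {
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            // Decode only complete frames, so a multi-byte character split
            // across two chunks is never decoded half-way.
            let text = String::from_utf8_lossy(&raw);
            if let Some(frame) = parse_frame(&text) {
                frames.push(frame);
            }
        }
        frames
    }
}

fn parse_frame(raw: &str) -> Option<SseFrame> {
    let mut event = None;
    let mut data_lines: Vec<&str> = Vec::new();
    for line in raw.lines() {
        if line.starts_with(':') {
            continue; // SSE comment line
        }
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => event = Some(value.to_string()),
            "data" => data_lines.push(value),
            _ => {} // other fields and blank lines are ignored in this sketch
        }
    }
    if event.is_none() && data_lines.is_empty() {
        return None;
    }
    Some(SseFrame {
        event,
        data: data_lines.join("\n"),
    })
}
```

Each returned frame would then be dispatched on its event name (message_start, content_block_delta, ping, and so on), with the data payload parsed as JSON.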

Supporting changes:

  • anthropic/params.rs: Added body field to AnthropicRequest and anthropic_messages_url() helper
  • anthropic.rs: Re-exported new symbols

Commit 3: Compilation fixes + code review improvements

  • Fixed ReasoningContent::Text variant (struct syntax with signature field)
  • Fixed message_id to preserve Option<String> type
  • Fixed StreamingCompletionResponse constructor call
  • Added tracing::warn! for malformed tool JSON (was silently replaced with {})
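
The tool-input handling touched by the last item can be pictured roughly like this: partial_json fragments from input_json_delta events are concatenated per content block index, and the result is checked when the block stops. This is a sketch under assumptions, not the PR's code: ToolInputAccumulator and its methods are hypothetical, and the validity check is passed in as a closure so the example stays dependency-free (real code would presumably parse with a JSON library and log the warning at the point where the flag is set).

```rust
use std::collections::HashMap;

/// Collects `partial_json` fragments per content block index.
#[derive(Default)]
struct ToolInputAccumulator {
    partial: HashMap<usize, String>,
}

impl ToolInputAccumulator {
    /// Called on content_block_start for a tool_use block.
    fn on_start(&mut self, index: usize) {
        self.partial.insert(index, String::new());
    }

    /// Called on each input_json_delta for that block.
    fn on_delta(&mut self, index: usize, partial_json: &str) {
        self.partial.entry(index).or_default().push_str(partial_json);
    }

    /// Called on content_block_stop. Returns the JSON text to use and a
    /// flag saying whether the accumulated input was malformed and had to
    /// be replaced with `{}`. Returns None for an unknown block index.
    fn on_stop(
        &mut self,
        index: usize,
        is_valid_json: impl Fn(&str) -> bool,
    ) -> Option<(String, bool)> {
        let raw = self.partial.remove(&index)?;
        if raw.is_empty() {
            // A tool called without arguments sends no fragments at all.
            return Some(("{}".to_string(), false));
        }
        if is_valid_json(&raw) {
            Some((raw, false))
        } else {
            // The caller should surface this (e.g. a warning log) instead
            // of swallowing it silently.
            Some(("{}".to_string(), true))
        }
    }
}
```

Returning the flag instead of logging inside the accumulator keeps the sketch free of a logging dependency; the caller decides how to report the malformed input.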

Testing

  • cargo check --release: ✅ clean
  • cargo test: 623 passed, 2 failed (pre-existing on v0.3.3 — config::tests::test_llm_provider_tables_parse_with_env_and_lowercase_keys, config::tests::toml_round_trip_with_named_instances)
  • Based on release tag v0.3.3

Impact

  • Anthropic path: Real SSE streaming — continuous data keeps proxy connections alive
  • All providers: TCP keepalive + full error chains
  • No config schema changes, no new dependencies, no behavioral changes for non-proxy users

coderabbitai bot commented Mar 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


cuttlefish-server and others added 2 commits March 25, 2026 07:50
Increases default HTTP client timeout from 120s to 300s and adds
connect_timeout, tcp_keepalive, and pool_idle_timeout settings to
prevent corporate proxy idle-timeout kills during long-running LLM
completions.

- timeout: 120s → 300s (overall request timeout)
- connect_timeout: 30s (connection establishment)
- tcp_keepalive: 30s (TCP keepalive probes)
- pool_idle_timeout: 90s (connection pool cleanup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds native Server-Sent Events (SSE) streaming support for Anthropic API
to prevent corporate proxy 504 Gateway Timeout errors during long-running
completions. Previously, non-streaming requests would idle and trigger
proxy timeouts; now both call_anthropic() and stream_anthropic() use SSE
with continuous data flow to keep the connection alive.

## Key changes

### SSE streaming infrastructure
- Custom SSE client with auto-decompression disabled (no_gzip/no_brotli/
  no_deflate) to handle proxies that incorrectly advertise Content-Encoding
- build_anthropic_sse_request(): shared request builder with proper headers
  (Accept: text/event-stream, accept-encoding: identity)
- parse_anthropic_sse_event(): unified event parser returning type-safe enum
- AnthropicSseEvent enum: structured representation of all Anthropic SSE events

### Error handling improvements
- Full error cause chain preservation: e.to_string() → format!("{e:#}")
- Better error messages for failed streams, JSON parsing, and API errors
- Graceful handling of malformed SSE chunks with tracing

### Refactoring
- Eliminated ~90% code duplication between call_anthropic() and stream_anthropic()
- Both methods now share the same request builder and event parser
- Removed debug response header logging (was added for proxy diagnosis)

### OAuth tool name mapping
- Preserved reverse-mapping for Claude Code canonical ↔ original tool names
- Handled in both streaming and non-streaming paths

## Compatibility
- All existing behavior preserved (tests pass, no API changes)
- SSE used internally for call_anthropic() but returns full CompletionResponse
- stream_anthropic() yields events as before
- Proper handling of thinking blocks, tool calls, and text deltas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

worldofgeese force-pushed the fix/anthropic-proxy-timeout-v033 branch from 4e0f03c to 0d7deee on March 25, 2026 06:52

Adds tracing::warn! for content-type, content-encoding, and
transfer-encoding from the SSE response, plus a hex+text dump
of the first 128/200 bytes of the first SSE chunk. This will
reveal exactly what the proxy is sending.

Adds std::error::Error::source() to the 'Anthropic stream read failed'
error message. This will show the underlying hyper/h2/io error that
causes 'error decoding response body', helping identify whether the
failure is a proxy timeout, connection reset, or actual decode error.