feat(routing): EndpointModeAuto for runtime protocol auto-detection #1137

Open
0x0079 wants to merge 11 commits into main from feat/endpoint-mode-auto

@0x0079 0x0079 commented Jun 5, 2026

Collaborator

Summary

  • EndpointModeAuto: new auto option for OpenAIEndpointMode that detects per-model protocol support at runtime — no static config needed for aggregator providers (OpenRouter, SiliconFlow, etc.) hosting models with mixed Chat Completions / Responses API support
  • Runtime fallback: on the first request, tries the incoming protocol; if that fails with a retryable error, transparently retries with the alternate protocol. Only successful results are cached (24h TTL), so subsequent requests go directly to the working endpoint
  • E2E probe auto-detection: the probe subsystem also respects EndpointModeAuto, using the same cache for consistent behavior
  • Frontend: endpoint mode selector added to ProviderFormDialog with auto as a visible option
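
The runtime fallback described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `dispatchAuto`, `sendFunc`, `chatOnly`, and the plain map used as a cache are hypothetical stand-ins, and the real handlers work with gin contexts and a gated response writer.

```go
package main

import (
	"errors"
	"fmt"
)

// Protocol names an OpenAI-style upstream API.
type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// alternate returns the other protocol.
func alternate(p Protocol) Protocol {
	if p == Chat {
		return Responses
	}
	return Chat
}

// sendFunc performs one upstream attempt. retryable reports whether a failure
// could be caused by the protocol choice (hypothetical signature).
type sendFunc func(p Protocol) (retryable bool, err error)

// dispatchAuto uses the cached protocol when one exists; otherwise it tries
// the incoming protocol and falls back to the alternate on a retryable
// failure. Only a successful attempt is written to the cache.
func dispatchAuto(cache map[string]Protocol, key string, incoming Protocol, send sendFunc) (Protocol, error) {
	if p, ok := cache[key]; ok {
		_, err := send(p) // cache hit: go straight to the known-good endpoint
		return p, err
	}
	retryable, err := send(incoming)
	if err == nil {
		cache[key] = incoming
		return incoming, nil
	}
	if !retryable {
		return incoming, err // e.g. auth or rate-limit errors: no fallback
	}
	second := alternate(incoming)
	if _, err := send(second); err != nil {
		return second, err // both attempts failed: nothing is cached
	}
	cache[key] = second
	return second, nil
}

// chatOnly simulates a model that only supports Chat Completions.
func chatOnly(p Protocol) (bool, error) {
	if p == Chat {
		return false, nil
	}
	return true, errors.New("404: endpoint not supported")
}

func main() {
	cache := map[string]Protocol{}
	used, err := dispatchAuto(cache, "some-provider/some-model", Responses, chatOnly)
	fmt.Println(used, err, cache)
}
```

In this sketch a Responses request to a chat-only model ends up served over Chat, and the cache then holds `chat` for that key, so the next request skips the failing attempt.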

Design

  • EndpointCache (provider+model → protocol) with 24h TTL, success-only writes
  • The firstChunkGate pattern buffers the response until a commit/discard decision is made, so no bytes hit the wire on a failed first attempt
  • dispatchWithPriorityFailoverGated accepts an external gate to avoid nested gate commit signal issues
  • Error classification by exclusion: auth (401/403), rate limit (429), and content errors are non-retryable; everything else triggers fallback
  • Override flags retain highest priority over auto-detection
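
As an illustration of the first design point, here is a minimal sketch of a provider+model cache with a TTL and lazy expiry. The type and method names echo the PR, but the fields, the injectable clock, and the implementation details are assumptions, not the contents of endpoint_cache.go.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Protocol names the upstream API a model is known to accept.
type Protocol string

const (
	ProtocolChat      Protocol = "chat"
	ProtocolResponses Protocol = "responses"
)

type cacheKey struct{ provider, model string }

type cacheEntry struct {
	protocol Protocol
	expires  time.Time
}

// EndpointCache maps provider+model to the protocol that last succeeded.
// Hypothetical sketch: entries expire lazily on read.
type EndpointCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	now     func() time.Time // injectable clock, an assumption for testability
	entries map[cacheKey]cacheEntry
}

// NewEndpointCache builds an empty cache with the given TTL.
func NewEndpointCache(ttl time.Duration) *EndpointCache {
	return &EndpointCache{ttl: ttl, now: time.Now, entries: make(map[cacheKey]cacheEntry)}
}

// Set records a protocol that just succeeded. Callers are expected to call it
// only after a successful upstream response (success-only writes).
func (c *EndpointCache) Set(provider, model string, p Protocol) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[cacheKey{provider, model}] = cacheEntry{protocol: p, expires: c.now().Add(c.ttl)}
}

// Get returns the cached protocol, or false if it is missing or expired.
func (c *EndpointCache) Get(provider, model string) (Protocol, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[cacheKey{provider, model}]
	if !ok || !c.now().Before(e.expires) {
		return "", false
	}
	return e.protocol, true
}

func main() {
	cache := NewEndpointCache(24 * time.Hour)
	cache.Set("some-provider", "some-model", ProtocolChat)
	p, ok := cache.Get("some-provider", "some-model")
	fmt.Println(p, ok)
}
```

Because failures are never written, a transient outage cannot pin a model to the wrong protocol; the worst case is one extra attempt per request until something succeeds.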

Key files

| File | Change |
| --- | --- |
| ai/provider.go | EndpointModeAuto constant |
| internal/server/endpoint_cache.go | In-memory success cache |
| internal/server/endpoint_auto.go | Error classification + helpers |
| internal/server/failover_dispatch.go | Gated dispatch variant |
| internal/server/openai_chat.go | Auto fallback in Chat handler |
| internal/server/openai_responses.go | Auto fallback in Responses handler |
| internal/server/endpoint_resolution.go | Explicit auto case |
| internal/probe/e2e.go | Probe auto-detection with cache |
| frontend/src/components/ProviderFormDialog.tsx | UI selector |

Test plan

  • endpoint_cache_test.go: Get/Set, TTL expiry, concurrent safety
  • endpoint_auto_test.go: error classification (401→no retry, 404→retry, 429→no retry, context_length→no retry, unknown 500→retry)
  • endpoint_resolution_test.go: auto mode cases
  • Existing failover tests pass (signature unchanged)
  • Manual: configure provider as auto, verify chat-only model falls back correctly, cache populated on success

Generated by Claude Code

@0x0079 0x0079 force-pushed the feat/endpoint-mode-auto branch 5 times, most recently from 02f1a8c to 3f9ef19 Compare June 11, 2026 13:13
@0x0079 0x0079 force-pushed the feat/endpoint-mode-auto branch from 587d682 to 17806c9 Compare June 13, 2026 08:38
claude added 10 commits June 16, 2026 06:58
Third-party aggregator providers host models with mixed Chat/Responses
support under a single OpenAI-style API base. EndpointModeAuto lets the
gateway auto-detect per-model protocol support using the real request:
try the incoming protocol first, fall back to the alternate on failure,
and cache successful results per provider+model (24h TTL, success-only).

Key design: exclusion-based retry — auth (401/403), rate limit (429),
and content errors are never retried; all other failures trigger
protocol fallback.
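
A minimal sketch of exclusion-based classification, assuming content errors are recognised by a `context_length` substring in the upstream error body (the marker comes from the test plan; matching it as a substring is an assumption). The function name is illustrative and the real helper may take different inputs.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// contentErrorMarkers lists substrings treated as content errors.
// Assumption: the real classifier may inspect structured error codes instead.
var contentErrorMarkers = []string{"context_length"}

// isNonRetryableForProtocolSwitch reports whether a failed attempt should NOT
// trigger a retry on the alternate protocol. Anything not excluded here is
// treated as possibly protocol-related and therefore retried.
func isNonRetryableForProtocolSwitch(status int, body string) bool {
	switch status {
	case http.StatusUnauthorized, http.StatusForbidden, http.StatusTooManyRequests:
		return true // auth and rate-limit failures are independent of protocol
	}
	lower := strings.ToLower(body)
	for _, marker := range contentErrorMarkers {
		if strings.Contains(lower, marker) {
			return true // the request itself is at fault, not the endpoint
		}
	}
	return false
}

func main() {
	cases := []struct {
		status int
		body   string
	}{
		{401, ""},
		{404, "not found"},
		{429, ""},
		{400, "context_length_exceeded"},
		{500, "internal error"},
	}
	for _, c := range cases {
		fmt.Printf("%d %q -> non-retryable=%v\n", c.status, c.body, isNonRetryableForProtocolSwitch(c.status, c.body))
	}
}
```

Classifying by exclusion keeps the retry path open for unfamiliar failures, which suits aggregators whose error shapes for an unsupported endpoint are not uniform.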

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…ntModeAuto

Add UI controls for selecting OpenAI endpoint mode per provider (auto,
chat, responses, both) with i18n support. Wire openai_endpoint_mode
through create/update provider API fields and handler logic. Update
design doc with Auto mode architecture details.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…back

E2E probe now handles providers with EndpointModeAuto by trying chat
first, falling back to responses on retryable failure, and caching
successful results via the shared EndpointCache.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…solveAutoTarget

- Move error classification to shared client.IsNonRetryableForProtocolSwitch,
  replacing duplicate isProbeNonRetryable and isNonRetryableForProtocolFallback
- Extract resolveAutoTarget helper to deduplicate auto-mode branch in both
  OpenAI Chat and Responses handlers (~30 lines each → single call)
- Collapse split success check in dispatchWithAutoFallback into one expression
- Fix probe cache-hit failure: skip directly to alternate protocol instead of
  retrying the same endpoint that just failed
- Fix gofmt alignment in server_types.go

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
Auto endpoint detection is now the zero-value behavior for OpenAI-style
providers. Users don't need to configure it — rule extensions
(openai_endpoint_override) handle the per-rule escape hatch.

- Zero value / empty OpenAIEndpointMode → auto (was: chat-only)
- Explicit "chat"/"responses"/"both" modes still honored for providers
  with hard constraints (Codex → responses, OAuth issuers, etc.)
- Add ai.IsAutoEndpointMode helper for zero-value + explicit "auto"
- Remove frontend endpoint mode selector (ProviderFormDialog, i18n)
- Remove openai_endpoint_mode from Create/Update provider API
- Update endpoint_resolution.go with explicit chat case + auto default
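
For illustration, the zero-value handling could look like the sketch below. It assumes OpenAIEndpointMode is a string type whose values match the mode names listed above; the actual definitions in ai/provider.go may differ.

```go
package main

import "fmt"

// OpenAIEndpointMode is assumed here to be a string-backed type.
type OpenAIEndpointMode string

const (
	EndpointModeUnknown   OpenAIEndpointMode = "" // zero value
	EndpointModeAuto      OpenAIEndpointMode = "auto"
	EndpointModeChat      OpenAIEndpointMode = "chat"
	EndpointModeResponses OpenAIEndpointMode = "responses"
	EndpointModeBoth      OpenAIEndpointMode = "both"
)

// IsAutoEndpointMode treats both the zero value and an explicit "auto" as
// auto-detection, so providers with no configured mode get the new default.
func IsAutoEndpointMode(m OpenAIEndpointMode) bool {
	return m == EndpointModeUnknown || m == EndpointModeAuto
}

func main() {
	var unset OpenAIEndpointMode
	fmt.Println(IsAutoEndpointMode(unset), IsAutoEndpointMode(EndpointModeAuto), IsAutoEndpointMode(EndpointModeChat))
}
```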

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
Under multi-service failover, dispatchWithPriorityFailoverGated may serve
the request from a fallback provider after the initial one fails. The
auto-endpoint cache previously wrote the entry against the initially
selected provider, pinning it to a protocol it never confirmed — e.g. a
chat-only provider failing /responses with 404, a fallback provider
succeeding, and the failed protocol being cached for the chat-only
provider for the full TTL. Each subsequent request then burned a
guaranteed-failing upstream call before failover rescued it, and the
cache-hit path (no gate) never self-corrected.

dispatchWithPriorityFailoverGated now returns the provider/model of the
final attempt; dispatchWithAutoFallback caches against that identity.
Also extracts the duplicated success predicate into gateSucceeded.
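
A simplified sketch of the fix, assuming a failover loop over candidates in priority order. `target`, `dispatchWithFailover`, `dispatchAndCache`, and `simulatedSend` are hypothetical names used only to show why the cache key must be the identity that actually served the request.

```go
package main

import (
	"errors"
	"fmt"
)

type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// target identifies one provider+model pair (hypothetical type).
type target struct{ Provider, Model string }

// dispatchWithFailover tries candidates in priority order and returns the
// target of the final attempt along with its error, if any.
func dispatchWithFailover(candidates []target, p Protocol, send func(target, Protocol) error) (target, error) {
	if len(candidates) == 0 {
		return target{}, errors.New("no candidates")
	}
	var last target
	var err error
	for _, c := range candidates {
		last = c
		if err = send(c, p); err == nil {
			return c, nil
		}
	}
	return last, err
}

// dispatchAndCache writes the cache entry against the target that served the
// request, never against the one that was merely selected first.
func dispatchAndCache(cache map[target]Protocol, candidates []target, p Protocol, send func(target, Protocol) error) error {
	served, err := dispatchWithFailover(candidates, p, send)
	if err != nil {
		return err
	}
	cache[served] = p
	return nil
}

var (
	chatOnlyTarget = target{Provider: "provider-a", Model: "model-x"}
	fallbackTarget = target{Provider: "provider-b", Model: "model-x"}
)

// simulatedSend fails Responses on the chat-only provider and succeeds otherwise.
func simulatedSend(t target, p Protocol) error {
	if t == chatOnlyTarget && p == Responses {
		return errors.New("404")
	}
	return nil
}

func main() {
	cache := map[target]Protocol{}
	err := dispatchAndCache(cache, []target{chatOnlyTarget, fallbackTarget}, Responses, simulatedSend)
	fmt.Println(err, cache)
}
```

Here the chat-only provider never receives a `responses` entry, so it is not pinned to a protocol it did not confirm.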

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
… Responses)

On a cache miss, the auto-mode first attempt previously always mirrored
the incoming API. For scenarios whose client ecosystem is natively
Responses-based — Codex — providers overwhelmingly speak Responses, so
mirroring a Chat ingress wastes the first round trip. resolveAutoTarget
now consults scenarioPreferredProtocol: codex (profile suffixes
normalized via Base()) leads with Responses; all other scenarios keep
mirroring the incoming API. Override and cache precedence are unchanged.
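
The precedence described here might be sketched as follows. Everything below is an assumption made for illustration: the `Scenario` type, the `name:profile` suffix format that `Base()` strips, and the use of an empty `Protocol` to mean "not set" are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// Scenario is a hypothetical scenario identifier such as "codex" or
// "codex:work", where the part after ':' is a profile suffix (assumed format).
type Scenario string

// Base strips the assumed profile suffix.
func (s Scenario) Base() Scenario {
	if i := strings.IndexByte(string(s), ':'); i >= 0 {
		return s[:i]
	}
	return s
}

// scenarioPreferredProtocol leads with Responses for codex and mirrors the
// incoming API for every other scenario.
func scenarioPreferredProtocol(s Scenario, incoming Protocol) Protocol {
	if s.Base() == "codex" {
		return Responses
	}
	return incoming
}

// resolveAutoTarget picks the first-attempt protocol: an override wins, then a
// cached result, then the scenario preference. Empty values mean "not set".
func resolveAutoTarget(override, cached Protocol, s Scenario, incoming Protocol) Protocol {
	switch {
	case override != "":
		return override
	case cached != "":
		return cached
	default:
		return scenarioPreferredProtocol(s, incoming)
	}
}

func main() {
	fmt.Println(resolveAutoTarget("", "", "codex:work", Chat))
	fmt.Println(resolveAutoTarget("", "", "other", Chat))
	fmt.Println(resolveAutoTarget("", Chat, "codex", Responses))
}
```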

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…thAutoFallback

Cover the remaining flow paths: first-attempt success is cached, fallback
success caches the alternate protocol, nothing is cached when both fail,
non-retryable errors (401, rate limit) skip fallback, status 0 is not
retried, gin errors are cleared between attempts, and non-streaming
buffered responses succeed.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
… flag

Add `auto_endpoint` experimental flag (default OFF) to control the
runtime protocol auto-detection feature. When disabled, providers with
EndpointModeAuto/Unknown fall through to the standard ResolveOpenAIEndpoint
path, eliminating risk from the new fallback logic until it's been
validated in production.

Backend: extension key, config setter, server/probe gate checks.
Frontend: toggle on Experimental Features page with i18n (en/zh).

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…ackage

Move pure endpoint resolution logic out of the server package into
its own module (internal/endpoint). This separates policy (what
protocol to use) from mechanism (gin gate, failover dispatch).

Moved: ResolveOpenAIEndpoint, EndpointOverride/ParseEndpointOverride,
Cache (was EndpointCache), and auto-detection helpers
(AlternateOpenAIProtocol, ScenarioPreferredProtocol, ResolveAutoTarget, etc.).

Kept in server: dispatchWithAutoFallback, firstChunkGate, and all
gin/failover-bound mechanism code.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
@0x0079 0x0079 force-pushed the feat/endpoint-mode-auto branch from 3d33661 to 112a94a Compare June 16, 2026 07:03
- Remove LogModeOverrideIgnored: zero callers (was dead even before the
  extraction; the move had promoted it from unexported to public).
- Unexport package-internal helpers (parseEndpointOverride,
  incomingToTarget, overrideToTarget, scenarioPreferredProtocol, the
  endpointOverride enum). The package's real surface is now just
  ResolveOpenAIEndpoint, ResolveAutoTarget, AlternateOpenAIProtocol,
  the IncomingAPI* types, and Cache/NewCache.
- Collapse the byte-identical Both/Auto branches in ResolveOpenAIEndpoint
  into a single incomingToTarget call.
- Drop trivial mapping tests now covered transitively.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf