feat(routing): EndpointModeAuto for runtime protocol auto-detection #1137
Open
0x0079 wants to merge 11 commits into
Third-party aggregator providers host models with mixed Chat/Responses support under a single OpenAI-style API base. EndpointModeAuto lets the gateway auto-detect per-model protocol support using the real request: try the incoming protocol first, fall back to the alternate protocol on failure, and cache successful results per provider+model (24h TTL, success-only). Key design: exclusion-based retry. Auth (401/403), rate-limit (429), and content errors are never retried; all other failures trigger protocol fallback. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…ntModeAuto Add UI controls for selecting OpenAI endpoint mode per provider (auto, chat, responses, both) with i18n support. Wire openai_endpoint_mode through create/update provider API fields and handler logic. Update design doc with Auto mode architecture details. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…back E2E probe now handles providers with EndpointModeAuto by trying chat first, falling back to responses on retryable failure, and caching successful results via the shared EndpointCache. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…solveAutoTarget

- Move error classification to shared client.IsNonRetryableForProtocolSwitch, replacing duplicate isProbeNonRetryable and isNonRetryableForProtocolFallback
- Extract resolveAutoTarget helper to deduplicate auto-mode branch in both OpenAI Chat and Responses handlers (~30 lines each → single call)
- Collapse split success check in dispatchWithAutoFallback into one expression
- Fix probe cache-hit failure: skip directly to alternate protocol instead of retrying the same endpoint that just failed
- Fix gofmt alignment in server_types.go

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
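A shared classifier like the one this commit centralizes might look roughly like the sketch below. The signature, the lowercase function name, and the way a content error is recognized (a substring match on a hypothetical error code) are assumptions; only the outcomes (401/403/429, content errors, and status 0 do not switch protocol; 404 and unknown 500 do) come from the commit messages and test plan in this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// isNonRetryableForProtocolSwitch reports whether a failed attempt should
// NOT be retried on the alternate protocol. Hypothetical signature: the
// real helper lives in the client package and may take different inputs.
func isNonRetryableForProtocolSwitch(status int, errCode string) bool {
	switch status {
	case 0:
		// No upstream status at all; this sketch treats that as no
		// evidence about protocol support, so it does not switch.
		return true
	case http.StatusUnauthorized, http.StatusForbidden, http.StatusTooManyRequests:
		// Auth and rate-limit failures would fail on either protocol.
		return true
	}
	// Content errors: matching on "context_length" is an assumption.
	return strings.Contains(errCode, "context_length")
}

func main() {
	fmt.Println(isNonRetryableForProtocolSwitch(404, ""))
	fmt.Println(isNonRetryableForProtocolSwitch(429, ""))
}
```

Because the rule is exclusion-based, any status not listed falls through to "retry on the alternate protocol".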
Auto endpoint detection is now the zero-value behavior for OpenAI-style providers. Users don't need to configure it; rule extensions (openai_endpoint_override) handle the per-rule escape hatch.

- Zero value / empty OpenAIEndpointMode → auto (was: chat-only)
- Explicit "chat"/"responses"/"both" modes still honored for providers with hard constraints (Codex → responses, OAuth issuers, etc.)
- Add ai.IsAutoEndpointMode helper for zero-value + explicit "auto"
- Remove frontend endpoint mode selector (ProviderFormDialog, i18n)
- Remove openai_endpoint_mode from Create/Update provider API
- Update endpoint_resolution.go with explicit chat case + auto default

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
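The zero-value rule can be illustrated with a small sketch. `OpenAIEndpointMode`, `EndpointModeAuto`, and `IsAutoEndpointMode` are named in this PR; the other constant names and the exact function signature are assumptions made for the example.

```go
package main

import "fmt"

// OpenAIEndpointMode selects which upstream protocol a provider speaks.
type OpenAIEndpointMode string

const (
	EndpointModeAuto      OpenAIEndpointMode = "auto"
	EndpointModeChat      OpenAIEndpointMode = "chat"      // assumed constant name
	EndpointModeResponses OpenAIEndpointMode = "responses" // assumed constant name
	EndpointModeBoth      OpenAIEndpointMode = "both"      // assumed constant name
)

// IsAutoEndpointMode treats both the zero value and an explicit "auto"
// as auto-detection, so unconfigured providers get the new behavior.
func IsAutoEndpointMode(m OpenAIEndpointMode) bool {
	return m == "" || m == EndpointModeAuto
}

func main() {
	var unset OpenAIEndpointMode
	fmt.Println(IsAutoEndpointMode(unset), IsAutoEndpointMode(EndpointModeChat))
}
```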
Under multi-service failover, dispatchWithPriorityFailoverGated may serve the request from a fallback provider after the initial one fails. The auto-endpoint cache previously wrote the entry against the initially selected provider, pinning it to a protocol it never confirmed — e.g. a chat-only provider failing /responses with 404, a fallback provider succeeding, and the failed protocol being cached for the chat-only provider for the full TTL. Each subsequent request then burned a guaranteed-failing upstream call before failover rescued it, and the cache-hit path (no gate) never self-corrected. dispatchWithPriorityFailoverGated now returns the provider/model of the final attempt; dispatchWithAutoFallback caches against that identity. Also extracts the duplicated success predicate into gateSucceeded. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
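The fix can be pictured with a simplified model of failover that reports which provider served the final attempt. Everything here (`target`, `result`, `dispatchWithFailover`, `dispatchAndCache`, the map-based cache) is a hypothetical reduction for illustration, not the PR's code.

```go
package main

import "fmt"

// target identifies one provider+model candidate in the failover chain.
type target struct{ Provider, Model string }

// result reports who handled the final attempt and whether it succeeded.
type result struct {
	Served target
	OK     bool
}

// dispatchWithFailover tries targets in priority order and returns the
// identity of the last attempt made, not the first one selected.
func dispatchWithFailover(targets []target, call func(target) bool) result {
	var res result
	for _, t := range targets {
		res = result{Served: t, OK: call(t)}
		if res.OK {
			break
		}
	}
	return res
}

// dispatchAndCache writes the cache entry against the provider that
// actually confirmed the protocol, and only on success.
func dispatchAndCache(cache map[target]string, targets []target, protocol string, call func(target) bool) result {
	res := dispatchWithFailover(targets, call)
	if res.OK {
		cache[res.Served] = protocol
	}
	return res
}

func main() {
	cache := map[target]string{}
	targets := []target{{"chat-only", "m1"}, {"fallback", "m2"}}
	// The chat-only provider rejects the attempt; the fallback accepts it.
	accepts := func(t target) bool { return t.Provider == "fallback" }
	res := dispatchAndCache(cache, targets, "responses", accepts)
	fmt.Println(res.Served.Provider, len(cache))
}
```

With the pre-fix behavior, the entry would have been written under the first target, which never confirmed the protocol.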
… Responses) On a cache miss, the auto-mode first attempt previously always mirrored the incoming API. For scenarios whose client ecosystem is natively Responses-based — Codex — providers overwhelmingly speak Responses, so mirroring a Chat ingress wastes the first round trip. resolveAutoTarget now consults scenarioPreferredProtocol: codex (profile suffixes normalized via Base()) leads with Responses; all other scenarios keep mirroring the incoming API. Override and cache precedence are unchanged. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
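The first-attempt selection can be sketched as a precedence chain. The ordering between override and cache shown here (override first) is an assumption, as are the function signatures; the sketch also takes an already-normalized scenario name rather than modelling `Base()`.

```go
package main

import "fmt"

type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// scenarioPreferredProtocol returns the protocol a scenario's client
// ecosystem natively speaks, or "" when there is no preference.
func scenarioPreferredProtocol(scenario string) Protocol {
	if scenario == "codex" {
		return Responses
	}
	return ""
}

// resolveAutoTarget picks the protocol for the first attempt.
// Assumed precedence: rule override, then cached result, then scenario
// preference, then mirroring the incoming API.
func resolveAutoTarget(override, cached Protocol, scenario string, incoming Protocol) Protocol {
	if override != "" {
		return override
	}
	if cached != "" {
		return cached
	}
	if p := scenarioPreferredProtocol(scenario); p != "" {
		return p
	}
	return incoming
}

func main() {
	// Cache miss, Chat ingress, codex scenario: lead with Responses.
	fmt.Println(resolveAutoTarget("", "", "codex", Chat))
}
```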
…thAutoFallback Cover the remaining flow paths: first-attempt success caching, fallback success caching alternate protocol, both-fail no-cache, non-retryable errors (401, rate limit) skip fallback, status-0 no-retry, gin errors cleared between attempts, non-streaming buffered success. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
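The flow paths these tests cover can be summarized in a reduced model where each attempt is fully buffered before a commit or discard decision. The names (`upstream`, `dispatchWithFallback`, `nonRetryable`, `succeeded`) and the status-only classifier are assumptions; the real dispatcher works against gin and the first-chunk gate.

```go
package main

import "fmt"

type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

func alternate(p Protocol) Protocol {
	if p == Chat {
		return Responses
	}
	return Chat
}

// upstream performs one buffered attempt and returns status and body.
type upstream func(p Protocol) (status int, body string)

// nonRetryable mirrors the exclusion list in simplified, status-only form.
func nonRetryable(status int) bool {
	switch status {
	case 0, 401, 403, 429:
		return true
	}
	return false
}

func succeeded(status int) bool { return status >= 200 && status < 300 }

// dispatchWithFallback returns the protocol of the final attempt, its
// buffered body, and whether it succeeded. The caller caches on success.
func dispatchWithFallback(first Protocol, call upstream) (Protocol, string, bool) {
	status, body := call(first)
	if succeeded(status) || nonRetryable(status) {
		return first, body, succeeded(status) // commit the first attempt
	}
	// Discard the buffered failure; nothing was written to the client.
	alt := alternate(first)
	status, body = call(alt)
	return alt, body, succeeded(status)
}

func main() {
	chatOnly := func(p Protocol) (int, string) {
		if p == Responses {
			return 404, "not found"
		}
		return 200, "hello"
	}
	served, body, ok := dispatchWithFallback(Responses, chatOnly)
	fmt.Println(served, body, ok)
}
```

When both attempts fail, this sketch returns the second failure and the caller writes nothing to the cache, matching the "both-fail no-cache" case.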
… flag Add `auto_endpoint` experimental flag (default OFF) to control the runtime protocol auto-detection feature. When disabled, providers with EndpointModeAuto/Unknown fall through to the standard ResolveOpenAIEndpoint path, eliminating risk from the new fallback logic until it's been validated in production. Backend: extension key, config setter, server/probe gate checks. Frontend: toggle on Experimental Features page with i18n (en/zh). https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
…ackage Move pure endpoint resolution logic out of the server package into its own module (internal/endpoint). This separates policy (what protocol to use) from mechanism (gin gate, failover dispatch). Moved: ResolveOpenAIEndpoint, EndpointOverride/ParseEndpointOverride, Cache (was EndpointCache), and auto-detection helpers (AlternateOpenAIProtocol, ScenarioPreferredProtocol, ResolveAutoTarget, etc.). Kept in server: dispatchWithAutoFallback, firstChunkGate, and all gin/failover-bound mechanism code. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
- Remove LogModeOverrideIgnored: zero callers (was dead even before the extraction; the move had promoted it from unexported to public).
- Unexport package-internal helpers (parseEndpointOverride, incomingToTarget, overrideToTarget, scenarioPreferredProtocol, the endpointOverride enum). The package's real surface is now just ResolveOpenAIEndpoint, ResolveAutoTarget, AlternateOpenAIProtocol, the IncomingAPI* types, and Cache/NewCache.
- Collapse the byte-identical Both/Auto branches in ResolveOpenAIEndpoint into a single incomingToTarget call.
- Drop trivial mapping tests now covered transitively.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
Summary
- `auto` option for `OpenAIEndpointMode` that detects per-model protocol support at runtime; no static config needed for aggregator providers (OpenRouter, SiliconFlow, etc.) hosting models with mixed Chat Completions / Responses API support
- E2E probe handles `EndpointModeAuto`, using the same cache for consistent behavior
- Frontend provider form lists `auto` as a visible option

Design
- `EndpointCache` (provider+model → protocol) with 24h TTL, success-only writes
- `firstChunkGate` pattern buffers the response until the commit/discard decision; no bytes hit the wire on a failed first attempt
- `dispatchWithPriorityFailoverGated` accepts an external gate to avoid nested gate commit signal issues

Key files
- `ai/provider.go`: `EndpointModeAuto` constant
- `internal/server/endpoint_cache.go`
- `internal/server/endpoint_auto.go`
- `internal/server/failover_dispatch.go`
- `internal/server/openai_chat.go`
- `internal/server/openai_responses.go`
- `internal/server/endpoint_resolution.go`: `auto` case
- `internal/probe/e2e.go`
- `frontend/src/components/ProviderFormDialog.tsx`

Test plan
- `endpoint_cache_test.go`: Get/Set, TTL expiry, concurrent safety
- `endpoint_auto_test.go`: error classification (401→no retry, 404→retry, 429→no retry, context_length→no retry, unknown 500→retry)
- `endpoint_resolution_test.go`: auto mode cases
- With mode `auto`: verify chat-only model falls back correctly, cache populated on success

Generated by Claude Code