feat(routing): EndpointModeAuto for runtime protocol auto-detection by 0x0079 · Pull Request #1137 · tingly-dev/tingly-box

0x0079 · 2026-06-05T06:47:34Z

Summary

- `EndpointModeAuto`: a new `auto` option for `OpenAIEndpointMode` that detects per-model protocol support at runtime, so aggregator providers (OpenRouter, SiliconFlow, etc.) that host models with mixed Chat Completions / Responses API support need no static config
- Runtime fallback: the first request tries the incoming protocol; if it fails with a retryable error, the gateway transparently retries with the alternate protocol. Only successful results are cached (24h TTL), so subsequent requests hit the right endpoint directly
- E2E probe auto-detection: the probe subsystem also respects `EndpointModeAuto` and uses the same cache for consistent behavior
- Frontend: endpoint mode selector added to `ProviderFormDialog`, with `auto` as a visible option

Design

- `EndpointCache` (provider+model → protocol) with a 24h TTL and success-only writes
- `firstChunkGate` pattern buffers the response until a commit/discard decision, so no bytes hit the wire on a failed first attempt
- `dispatchWithPriorityFailoverGated` accepts an external gate to avoid commit-signal issues with nested gates
- Error classification by exclusion: auth (401/403), rate limit (429), and content errors are non-retryable; everything else triggers fallback
- Override flags retain highest priority over auto-detection
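
As a rough illustration of the cache described above, here is a minimal sketch of a provider+model → protocol map with a TTL. The type layout, the `Protocol` values, and the `Set`/`Get` signatures are assumptions made for this example; the PR text only states the key, the 24h TTL, and that writes happen on success only.

```go
package main

import (
	"sync"
	"time"
)

// DefaultTTL mirrors the 24h TTL stated in the PR description.
const DefaultTTL = 24 * time.Hour

// Protocol is an illustrative type; the real one is not shown in the PR.
type Protocol string

const (
	ProtocolChat      Protocol = "chat"
	ProtocolResponses Protocol = "responses"
)

// cacheKey identifies one provider+model pair.
type cacheKey struct{ provider, model string }

type cacheEntry struct {
	protocol Protocol
	expires  time.Time
}

// EndpointCache remembers which protocol last succeeded for a provider+model.
type EndpointCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	entries map[cacheKey]cacheEntry
}

func NewEndpointCache(ttl time.Duration) *EndpointCache {
	return &EndpointCache{ttl: ttl, entries: make(map[cacheKey]cacheEntry)}
}

// Set records a protocol that just succeeded. Callers are expected to invoke
// it only after a successful upstream response (success-only writes).
func (c *EndpointCache) Set(provider, model string, p Protocol) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[cacheKey{provider, model}] = cacheEntry{protocol: p, expires: time.Now().Add(c.ttl)}
}

// Get returns the cached protocol, or false if the entry is absent or expired.
func (c *EndpointCache) Get(provider, model string) (Protocol, bool) {
	c.mu.RLock()
	e, ok := c.entries[cacheKey{provider, model}]
	c.mu.RUnlock()
	if !ok || !time.Now().Before(e.expires) {
		return "", false
	}
	return e.protocol, true
}
```

Because failures are never written, a protocol that stops working simply ages out after the TTL rather than being overwritten by a negative entry.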

Key files

| File | Change |
| --- | --- |
| `ai/provider.go` | `EndpointModeAuto` constant |
| `internal/server/endpoint_cache.go` | In-memory success cache |
| `internal/server/endpoint_auto.go` | Error classification + helpers |
| `internal/server/failover_dispatch.go` | Gated dispatch variant |
| `internal/server/openai_chat.go` | Auto fallback in Chat handler |
| `internal/server/openai_responses.go` | Auto fallback in Responses handler |
| `internal/server/endpoint_resolution.go` | Explicit `auto` case |
| `internal/probe/e2e.go` | Probe auto-detection with cache |
| `frontend/src/components/ProviderFormDialog.tsx` | UI selector |

Test plan

- `endpoint_cache_test.go`: Get/Set, TTL expiry, concurrent safety
- `endpoint_auto_test.go`: error classification (401→no retry, 404→retry, 429→no retry, context_length→no retry, unknown 500→retry)
- `endpoint_resolution_test.go`: auto mode cases
- Existing failover tests pass (signature unchanged)
- Manual: configure a provider as `auto`, verify that a chat-only model falls back correctly and that the cache is populated on success
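
The classification cases listed in the test plan can be sketched as a single exclusion check. This is a hedged illustration, not the code in `endpoint_auto.go`: the function name, its signature, and the body-substring matching are assumptions. Only the `context_length` marker comes from the test plan; `content_policy` is an invented second example of a content error.

```go
package main

import (
	"net/http"
	"strings"
)

// contentErrorMarkers lists substrings treated as content errors.
// "context_length" appears in the test plan; "content_policy" is an
// assumed extra marker for illustration only.
var contentErrorMarkers = []string{"context_length", "content_policy"}

// nonRetryableForProtocolSwitch reports whether a failed attempt should NOT
// be retried on the alternate protocol. Everything not excluded here is
// treated as a possible protocol mismatch and triggers fallback.
func nonRetryableForProtocolSwitch(status int, body string) bool {
	switch status {
	case http.StatusUnauthorized, http.StatusForbidden, http.StatusTooManyRequests:
		return true // auth and rate-limit errors fail on either protocol
	}
	lower := strings.ToLower(body)
	for _, marker := range contentErrorMarkers {
		if strings.Contains(lower, marker) {
			return true // the request itself is the problem, not the endpoint
		}
	}
	return false
}
```

Classifying by exclusion means an unfamiliar upstream error shape (for example, an aggregator returning 500 for an unsupported endpoint) still leads to a fallback attempt.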

Generated by Claude Code

Third-party aggregator providers host models with mixed Chat/Responses support under a single OpenAI-style API base. EndpointModeAuto lets the gateway auto-detect per-model protocol support using the real request: try the incoming protocol first, fall back to the alternate on failure, and cache successful results per provider+model (24h TTL, success-only). Key design: exclusion-based retry. Auth (401/403), rate limit (429), and content errors are never retried; all other failures trigger protocol fallback. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
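
The try, fall back, cache-on-success flow described in this commit could look roughly like the sketch below. It is deliberately stripped down: `dispatchAuto`, the `send` callback, and the plain map cache are hypothetical stand-ins, response buffering is omitted, and content-error classification is left out. Treating status 0 as non-retryable follows the "status-0 no-retry" case named in a later commit message.

```go
package main

// Protocol names an upstream API flavor; the real type is not shown in the PR.
type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// alternate returns the other protocol.
func alternate(p Protocol) Protocol {
	if p == Chat {
		return Responses
	}
	return Chat
}

// retryable applies the exclusion rule to a failed status code. Auth and
// rate-limit statuses, and status 0 (no upstream status), stop the fallback.
func retryable(status int) bool {
	switch status {
	case 0, 401, 403, 429:
		return false
	}
	return true
}

func succeeded(status int) bool { return status >= 200 && status < 300 }

// dispatchAuto sends one request via send, which returns the upstream HTTP
// status. It returns the protocol of the last attempt and its status.
// The map is not safe for concurrent use; a real cache would need locking.
func dispatchAuto(cache map[string]Protocol, key string, incoming Protocol, send func(Protocol) int) (Protocol, int) {
	if cached, ok := cache[key]; ok {
		return cached, send(cached) // cache hit: use the known-good protocol
	}
	first := incoming
	status := send(first)
	if succeeded(status) {
		cache[key] = first
		return first, status
	}
	if !retryable(status) {
		return first, status // surface the original error, no fallback
	}
	alt := alternate(first)
	status = send(alt)
	if succeeded(status) {
		cache[key] = alt // success-only write
	}
	return alt, status
}
```

With a chat-only upstream and a Responses request, the first call costs two upstream attempts and every later call within the TTL costs one.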

…ntModeAuto

Add UI controls for selecting OpenAI endpoint mode per provider (auto, chat, responses, both) with i18n support. Wire openai_endpoint_mode through create/update provider API fields and handler logic. Update design doc with Auto mode architecture details. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

…back

E2E probe now handles providers with EndpointModeAuto by trying chat first, falling back to responses on retryable failure, and caching successful results via the shared EndpointCache. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

…solveAutoTarget

- Move error classification to shared client.IsNonRetryableForProtocolSwitch, replacing the duplicate isProbeNonRetryable and isNonRetryableForProtocolFallback
- Extract resolveAutoTarget helper to deduplicate the auto-mode branch in both the OpenAI Chat and Responses handlers (~30 lines each → single call)
- Collapse the split success check in dispatchWithAutoFallback into one expression
- Fix probe cache-hit failure: skip directly to the alternate protocol instead of retrying the same endpoint that just failed
- Fix gofmt alignment in server_types.go

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

Auto endpoint detection is now the zero-value behavior for OpenAI-style providers. Users don't need to configure it; rule extensions (openai_endpoint_override) handle the per-rule escape hatch.

- Zero value / empty OpenAIEndpointMode → auto (was: chat-only)
- Explicit "chat"/"responses"/"both" modes still honored for providers with hard constraints (Codex → responses, OAuth issuers, etc.)
- Add ai.IsAutoEndpointMode helper for zero-value + explicit "auto"
- Remove frontend endpoint mode selector (ProviderFormDialog, i18n)
- Remove openai_endpoint_mode from Create/Update provider API
- Update endpoint_resolution.go with explicit chat case + auto default

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
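
A helper that treats both the zero value and an explicit "auto" as auto mode might look like the following. The string-backed type and the constant names other than `EndpointModeAuto` are assumptions; the mode values themselves ("auto", "chat", "responses", "both") are the ones named in the commit messages.

```go
package main

// OpenAIEndpointMode is assumed here to be a string-backed type.
type OpenAIEndpointMode string

const (
	EndpointModeAuto      OpenAIEndpointMode = "auto"
	EndpointModeChat      OpenAIEndpointMode = "chat"
	EndpointModeResponses OpenAIEndpointMode = "responses"
	EndpointModeBoth      OpenAIEndpointMode = "both"
)

// IsAutoEndpointMode reports whether a provider should use runtime
// auto-detection: either the field was never set (zero value) or it was
// set to "auto" explicitly.
func IsAutoEndpointMode(m OpenAIEndpointMode) bool {
	return m == "" || m == EndpointModeAuto
}
```

Making the zero value mean auto lets existing provider records, which have no stored mode, pick up the new behavior without a migration.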

Under multi-service failover, dispatchWithPriorityFailoverGated may serve the request from a fallback provider after the initial one fails. The auto-endpoint cache previously wrote the entry against the initially selected provider, pinning it to a protocol it never confirmed — e.g. a chat-only provider failing /responses with 404, a fallback provider succeeding, and the failed protocol being cached for the chat-only provider for the full TTL. Each subsequent request then burned a guaranteed-failing upstream call before failover rescued it, and the cache-hit path (no gate) never self-corrected. dispatchWithPriorityFailoverGated now returns the provider/model of the final attempt; dispatchWithAutoFallback caches against that identity. Also extracts the duplicated success predicate into gateSucceeded. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
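
The fix can be pictured with a small sketch in which the failover loop returns the identity that actually served the request and the cache write uses that identity. All names here (`target`, `failover`, `dispatchAndCache`) are hypothetical, and the real dispatch carries far more state.

```go
package main

// target identifies one provider+model candidate in the failover chain.
type target struct{ Provider, Model string }

// failover tries each target in priority order and returns the one that
// served the request, so callers know whose behavior was actually observed.
func failover(targets []target, send func(target) bool) (target, bool) {
	for _, t := range targets {
		if send(t) {
			return t, true
		}
	}
	return target{}, false
}

// dispatchAndCache records the protocol against the serving target only.
// Caching against targets[0] instead would reproduce the bug described
// above: a provider pinned to a protocol it never confirmed.
func dispatchAndCache(cache map[target]string, targets []target, protocol string, send func(target) bool) bool {
	served, ok := failover(targets, send)
	if ok {
		cache[served] = protocol
	}
	return ok
}
```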

… Responses)

On a cache miss, the auto-mode first attempt previously always mirrored the incoming API. For scenarios whose client ecosystem is natively Responses-based, such as Codex, providers overwhelmingly speak Responses, so mirroring a Chat ingress wastes the first round trip. resolveAutoTarget now consults scenarioPreferredProtocol: codex (profile suffixes normalized via Base()) leads with Responses; all other scenarios keep mirroring the incoming API. Override and cache precedence are unchanged. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
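
A hedged sketch of the scenario preference follows. The `Scenario` type, the `firstAttempt` function, and in particular the ":" profile-suffix separator are assumptions for illustration; the commit message only says that suffixes are normalized via `Base()` and that codex leads with Responses.

```go
package main

import "strings"

// Protocol names an upstream API flavor (illustrative type).
type Protocol string

const (
	Chat      Protocol = "chat"
	Responses Protocol = "responses"
)

// Scenario is a hypothetical scenario identifier that may carry a profile suffix.
type Scenario string

// Base strips a profile suffix. The ":" separator is an assumption made for
// this sketch; the real suffix format is not shown in the PR.
func (s Scenario) Base() Scenario {
	base, _, _ := strings.Cut(string(s), ":")
	return Scenario(base)
}

// firstAttempt picks the protocol to try on a cache miss: codex leads with
// Responses, every other scenario mirrors the incoming API.
func firstAttempt(s Scenario, incoming Protocol) Protocol {
	if s.Base() == "codex" {
		return Responses
	}
	return incoming
}
```

Overrides and cache hits are resolved before this point, so the preference only affects the first attempt for an unseen provider+model.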

…thAutoFallback

Cover the remaining flow paths: first-attempt success caching, fallback success caching alternate protocol, both-fail no-cache, non-retryable errors (401, rate limit) skip fallback, status-0 no-retry, gin errors cleared between attempts, non-streaming buffered success. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf
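
Several of these paths depend on the gate holding a response back until the attempt is judged. The sketch below shows the general commit/discard idea over a plain `http.ResponseWriter`. It is a simplification under stated assumptions: the real `firstChunkGate` works with gin and gates on the first chunk of a stream, while this version buffers the whole response, and `gatedWriter` and `memoryWriter` are names invented for the example.

```go
package main

import (
	"bytes"
	"net/http"
)

// gatedWriter buffers headers, status, and body until Commit or Discard.
type gatedWriter struct {
	dst    http.ResponseWriter
	header http.Header
	status int
	body   bytes.Buffer
}

func newGatedWriter(dst http.ResponseWriter) *gatedWriter {
	return &gatedWriter{dst: dst, header: make(http.Header), status: http.StatusOK}
}

func (g *gatedWriter) Header() http.Header         { return g.header }
func (g *gatedWriter) WriteHeader(code int)        { g.status = code }
func (g *gatedWriter) Write(p []byte) (int, error) { return g.body.Write(p) }

// Commit flushes the buffered attempt to the real writer.
func (g *gatedWriter) Commit() error {
	for k, v := range g.header {
		g.dst.Header()[k] = v
	}
	g.dst.WriteHeader(g.status)
	_, err := g.dst.Write(g.body.Bytes())
	return err
}

// Discard drops the buffered attempt so the next one starts clean.
func (g *gatedWriter) Discard() {
	g.header = make(http.Header)
	g.status = http.StatusOK
	g.body.Reset()
}

// memoryWriter is a tiny in-memory http.ResponseWriter for trying the gate.
type memoryWriter struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func (m *memoryWriter) Header() http.Header {
	if m.header == nil {
		m.header = make(http.Header)
	}
	return m.header
}
func (m *memoryWriter) WriteHeader(code int)        { m.status = code }
func (m *memoryWriter) Write(p []byte) (int, error) { return m.body.Write(p) }
```

A failed first attempt writes into the gate and is discarded; only the attempt that succeeds is committed, so the client never sees bytes from the failed protocol.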

… flag

Add `auto_endpoint` experimental flag (default OFF) to control the runtime protocol auto-detection feature. When disabled, providers with EndpointModeAuto/Unknown fall through to the standard ResolveOpenAIEndpoint path, eliminating risk from the new fallback logic until it's been validated in production. Backend: extension key, config setter, server/probe gate checks. Frontend: toggle on Experimental Features page with i18n (en/zh). https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

…ackage

Move pure endpoint resolution logic out of the server package into its own module (internal/endpoint). This separates policy (what protocol to use) from mechanism (gin gate, failover dispatch). Moved: ResolveOpenAIEndpoint, EndpointOverride/ParseEndpointOverride, Cache (was EndpointCache), and auto-detection helpers (AlternateOpenAIProtocol, ScenarioPreferredProtocol, ResolveAutoTarget, etc.). Kept in server: dispatchWithAutoFallback, firstChunkGate, and all gin/failover-bound mechanism code. https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

- Remove LogModeOverrideIgnored: zero callers (was dead even before the extraction; the move had promoted it from unexported to public).
- Unexport package-internal helpers (parseEndpointOverride, incomingToTarget, overrideToTarget, scenarioPreferredProtocol, the endpointOverride enum). The package's real surface is now just ResolveOpenAIEndpoint, ResolveAutoTarget, AlternateOpenAIProtocol, the IncomingAPI* types, and Cache/NewCache.
- Collapse the byte-identical Both/Auto branches in ResolveOpenAIEndpoint into a single incomingToTarget call.
- Drop trivial mapping tests now covered transitively.

https://claude.ai/code/session_01N9JnUTVt41NcGxYyj3xewf

0x0079 mentioned this pull request Jun 5, 2026

refactor(probe): remove client probe methods, dispatch through real-traffic paths #1138

Merged

4 tasks

0x0079 force-pushed the feat/endpoint-mode-auto branch 5 times, most recently from 02f1a8c to 3f9ef19 Compare June 11, 2026 13:13

0x0079 force-pushed the feat/endpoint-mode-auto branch from 587d682 to 17806c9 Compare June 13, 2026 08:38

claude added 10 commits June 16, 2026 06:58

0x0079 force-pushed the feat/endpoint-mode-auto branch from 3d33661 to 112a94a Compare June 16, 2026 07:03

0x0079 wants to merge 11 commits into main from feat/endpoint-mode-auto
