Detail Bug Report
https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_7f6563c7-4a73-4ce2-bff7-bd3f20325cb6
Introduced in #10 by @WilliamAGH on Feb 6, 2026
Summary
- Context: When the failure is an OpenAIServiceException with status 429, OpenAiProviderRoutingService.recordProviderFailure() calls RateLimitService.recordRateLimitFromOpenAiServiceException(), which resolves retry timing strictly from headers.
- Bug: RateLimitDecisionException thrown from recordProviderFailure() bypasses both provider protection mechanisms: (1) primary backoff and (2) circuit breaker state.
- Actual vs. expected: Actual: the provider remains immediately "available" after a 429 without timing headers, and the thrown RateLimitDecisionException escapes the caller’s catch block (no fallback attempted). Expected: some provider protection should still be applied (at minimum primary backoff), and routing should continue (e.g., attempt fallback when eligible).
- Impact: Provider appears "available" immediately after a 429 without timing headers, receiving retry traffic that should be blocked.
Code with Bug
// `OpenAiProviderRoutingService.recordProviderFailure()`
public void recordProviderFailure(RateLimitService.ApiProvider provider, Throwable throwable) {
    if (throwable instanceof OpenAIServiceException serviceException
            && serviceException.statusCode() == HTTP_TOO_MANY_REQUESTS) {
        rateLimitService.recordRateLimitFromOpenAiServiceException(provider, serviceException);
        // <-- BUG 🔴 throws RateLimitDecisionException; skips primary backoff + bubbles up
    }
    if (provider == configuredPrimaryProvider() && shouldBackoffPrimary(throwable)) {
        markPrimaryBackoff();
    }
}
// `RateLimitService.recordRateLimitFromOpenAiServiceException()`
public void recordRateLimitFromOpenAiServiceException(ApiProvider provider, OpenAIServiceException exception) {
    // ...
    RateLimitDecision decision = decisionResolver.resolveFromOpenAiHeaders(requiredException.headers());
    // <-- BUG 🔴 can throw RateLimitDecisionException; applyRateLimit() never runs
    applyRateLimit(requiredProvider, decision);
}
// `OpenAIStreamingService.complete()`
} catch (RuntimeException completionException) {
    lastProviderFailure = completionException;
    providerRoutingService.recordProviderFailure(activeProvider, completionException);
    // <-- BUG 🔴 RateLimitDecisionException thrown here escapes the catch; fallback code below is never reached
    // NEVER REACHED:
    boolean hasNextProvider = providerIndex + 1 < availableProviders.size();
    if (hasNextProvider && isCompletionFallbackEligible(completionException)) {
        continue;
    }
    return Mono.error(completionException);
}
Explanation
- If a 429 response lacks timing headers, decisionResolver.resolveFromOpenAiHeaders(...) throws RateLimitDecisionException.
- Because recordProviderFailure() does not catch that exception, execution never reaches the subsequent markPrimaryBackoff() branch, leaving primaryBackoffUntilEpochMs unchanged (default 0).
- The circuit breaker is also not updated because applyRateLimit() is skipped, so ProviderCircuitState.recordRateLimit() is never called and circuitOpen remains false.
- Availability checks (isPrimaryInBackoff() and rateLimitService.isProviderAvailable()) therefore both treat the provider as available, causing immediate retry traffic after a 429.
- Additionally, the exception is thrown from within OpenAIStreamingService.complete()’s catch block, so it escapes the try/catch entirely and prevents fallback behavior.
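The last point is easy to miss: a catch clause only guards its own try block, so anything thrown inside the catch body leaves the whole statement. The following is a minimal, self-contained sketch of that control flow, not the project's code. CatchEscapeDemo, DecisionException, recordFailure() and route() are hypothetical stand-ins for the real classes and methods named above.

```java
import java.util.List;

public class CatchEscapeDemo {
    // Hypothetical stand-in for RateLimitDecisionException.
    static class DecisionException extends RuntimeException {
        DecisionException(String message) {
            super(message);
        }
    }

    // Stand-in for recordProviderFailure(): always throws, as it would
    // for a 429 that carries no timing headers.
    static void recordFailure(String provider, RuntimeException cause) {
        throw new DecisionException("no timing headers for " + provider);
    }

    // Stand-in for the provider loop in complete().
    static String route(List<String> providers) {
        for (int i = 0; i < providers.size(); i++) {
            String provider = providers.get(i);
            try {
                // Simulate the provider call failing with a 429.
                throw new IllegalStateException("429 from " + provider);
            } catch (RuntimeException completionException) {
                // Throws from inside the catch body, so control leaves
                // the try/catch and the loop immediately.
                recordFailure(provider, completionException);
                boolean hasNext = i + 1 < providers.size();
                if (hasNext) {
                    continue; // fallback to the next provider: never reached
                }
                return "failed after trying all providers";
            }
        }
        return "no providers";
    }

    public static void main(String[] args) {
        try {
            System.out.println(route(List.of("primary", "secondary")));
        } catch (DecisionException escaped) {
            // prints "escaped: no timing headers for primary"
            System.out.println("escaped: " + escaped.getMessage());
        }
    }
}
```

The secondary provider is never tried, even though it is next in the list, which mirrors the missing fallback described above.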
Recommended Fix
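Catch and swallow RateLimitDecisionException inside recordProviderFailure() so primary backoff still applies (without mutating circuit-breaker timing when headers are absent). Because the exception no longer propagates, the catch block in OpenAIStreamingService.complete() also reaches its fallback check: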
public void recordProviderFailure(RateLimitService.ApiProvider provider, Throwable throwable) {
    if (throwable instanceof OpenAIServiceException serviceException
            && serviceException.statusCode() == HTTP_TOO_MANY_REQUESTS) {
        try {
            rateLimitService.recordRateLimitFromOpenAiServiceException(provider, serviceException);
        } catch (RateLimitDecisionException e) {
            log.warn(
                "[{}] Rate-limit timing unavailable ({}), continuing with backoff",
                provider.getName(),
                e.getMessage());
            // Continue to markPrimaryBackoff() - do not re-throw
        }
    }
    if (provider == configuredPrimaryProvider() && shouldBackoffPrimary(throwable)) {
        markPrimaryBackoff();
    }
}
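To check the intended behavior in isolation, the fix can be modelled with a small, self-contained sketch. Everything here is an assumption made for illustration: the class MissingHeaderBackoffDemo, the DecisionException stand-in, the use of a retry-after entry as the only timing header, and the 30-second backoff value are not taken from the real services.

```java
import java.util.Map;

public class MissingHeaderBackoffDemo {
    // Hypothetical stand-in for RateLimitDecisionException.
    static class DecisionException extends RuntimeException {
        DecisionException(String message) {
            super(message);
        }
    }

    // Assumed backoff window, chosen only for this sketch.
    static final long ASSUMED_BACKOFF_MS = 30_000L;

    long primaryBackoffUntilEpochMs = 0L;
    long circuitOpenUntilEpochMs = 0L;

    // Stand-in for strict header-only resolution: throws when the
    // (assumed) retry-after entry is absent.
    static long resolveRetryAfterMs(Map<String, String> headers) {
        String retryAfter = headers.get("retry-after");
        if (retryAfter == null) {
            throw new DecisionException("missing retry-after header");
        }
        return Long.parseLong(retryAfter) * 1000L;
    }

    // Simplified model of the fixed recordProviderFailure() for a 429.
    void recordProviderFailure(Map<String, String> headers, long nowMs) {
        try {
            circuitOpenUntilEpochMs = nowMs + resolveRetryAfterMs(headers);
        } catch (DecisionException e) {
            // Timing unavailable: leave circuit timing untouched and
            // do not re-throw, so the backoff below still runs.
        }
        primaryBackoffUntilEpochMs = nowMs + ASSUMED_BACKOFF_MS;
    }

    boolean isPrimaryInBackoff(long nowMs) {
        return nowMs < primaryBackoffUntilEpochMs;
    }

    public static void main(String[] args) {
        MissingHeaderBackoffDemo demo = new MissingHeaderBackoffDemo();

        // 429 without timing headers: backoff applies, circuit timing unchanged.
        demo.recordProviderFailure(Map.of(), 1_000L);
        System.out.println(demo.isPrimaryInBackoff(1_001L)); // prints true
        System.out.println(demo.circuitOpenUntilEpochMs);    // prints 0

        // 429 with a timing header: circuit timing is updated as before.
        demo.recordProviderFailure(Map.of("retry-after", "5"), 2_000L);
        System.out.println(demo.circuitOpenUntilEpochMs);    // prints 7000
    }
}
```

A regression test for the real service could assert the same two properties: after a header-less 429 the primary is in backoff, and recordProviderFailure() returns normally so the caller can attempt fallback.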
History
This bug was introduced in commit fa355bd (strict header-only timing resolution now throws RateLimitDecisionException when headers are missing; the exception propagates and bypasses both primary backoff and circuit breaker updates).