
[Detail Bug] LLM routing: rate-limit exceptions with missing timing headers bypass backoff and circuit breaker #62

Description

@detail-app

Detail Bug Report

https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_7f6563c7-4a73-4ce2-bff7-bd3f20325cb6

Introduced in #10 by @WilliamAGH on Feb 6, 2026

Summary

  • Context: When OpenAIServiceException is a 429, OpenAiProviderRoutingService.recordProviderFailure() calls RateLimitService.recordRateLimitFromOpenAiServiceException(), which resolves retry timing strictly from headers.
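  • Bug: When a 429 carries no timing headers, recordRateLimitFromOpenAiServiceException() throws RateLimitDecisionException. The exception propagates out of recordProviderFailure() and bypasses BOTH provider protection mechanisms: (1) primary backoff and (2) circuit-breaker state.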
  • Actual vs. expected: Actual: the provider remains immediately "available" after a 429 without timing headers, and the thrown RateLimitDecisionException escapes the caller's catch block, so no fallback is attempted. Expected: some provider protection is still applied (at minimum primary backoff), and routing continues (e.g., a fallback is attempted when eligible).
  • Impact: Provider appears "available" immediately after a 429 without timing headers, receiving retry traffic that should be blocked.
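To make "header-only" resolution concrete, here is a minimal Java sketch of a resolver that refuses to guess a delay. It is illustrative only: the class and method names are hypothetical, it reads just the `retry-after` header in its delta-seconds form, and the project's actual decisionResolver may consult other headers (for example OpenAI's `x-ratelimit-reset-*` headers).

```java
import java.util.Map;

// Hypothetical sketch of strict, header-only retry timing resolution.
// Names are illustrative; this is not the project's decisionResolver.
public class StrictRetryTimingDemo {

    /** Hypothetical stand-in for the project's RateLimitDecisionException. */
    static final class RateLimitDecisionException extends RuntimeException {
        RateLimitDecisionException(String message) {
            super(message);
        }
    }

    /**
     * Returns the retry delay in seconds taken from the headers, or throws.
     * Assumes lowercase header keys and only the delta-seconds form of Retry-After.
     */
    static long resolveRetryAfterSeconds(Map<String, String> headers) {
        String retryAfter = headers.get("retry-after");
        if (retryAfter != null) {
            return Long.parseLong(retryAfter.trim());
        }
        // Strict mode: no header means no decision, and there is no default delay.
        throw new RateLimitDecisionException("429 response carried no retry timing headers");
    }

    public static void main(String[] args) {
        System.out.println(resolveRetryAfterSeconds(Map.of("retry-after", "7"))); // 7
        try {
            resolveRetryAfterSeconds(Map.of());
        } catch (RateLimitDecisionException e) {
            System.out.println("throws: " + e.getMessage());
        }
    }
}
```

The second call in `main` is the case this issue is about: the exception is the only signal the caller receives, so everything downstream depends on whether the caller catches it.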

Code with Bug

// `OpenAiProviderRoutingService.recordProviderFailure()`
public void recordProviderFailure(RateLimitService.ApiProvider provider, Throwable throwable) {
    if (throwable instanceof OpenAIServiceException serviceException
            && serviceException.statusCode() == HTTP_TOO_MANY_REQUESTS) {
        rateLimitService.recordRateLimitFromOpenAiServiceException(provider, serviceException);
        // <-- BUG 🔴 throws RateLimitDecisionException; skips primary backoff + bubbles up
    }

    if (provider == configuredPrimaryProvider() && shouldBackoffPrimary(throwable)) {
        markPrimaryBackoff();
    }
}

// `RateLimitService.recordRateLimitFromOpenAiServiceException()`
public void recordRateLimitFromOpenAiServiceException(ApiProvider provider, OpenAIServiceException exception) {
    // ...
    RateLimitDecision decision = decisionResolver.resolveFromOpenAiHeaders(requiredException.headers());
    // <-- BUG 🔴 can throw RateLimitDecisionException; applyRateLimit() never runs
    applyRateLimit(requiredProvider, decision);
}

// `OpenAIStreamingService.complete()` (excerpt from the provider loop)
// ...
} catch (RuntimeException completionException) {
    lastProviderFailure = completionException;
    providerRoutingService.recordProviderFailure(activeProvider, completionException);
    // <-- BUG 🔴 RateLimitDecisionException thrown here escapes the catch; fallback code below is never reached

    // NEVER REACHED:
    boolean hasNextProvider = providerIndex + 1 < availableProviders.size();
    if (hasNextProvider && isCompletionFallbackEligible(completionException)) {
        continue;
    }
    return Mono.error(completionException);
}
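The control-flow problem in `complete()` can be reproduced without any OpenAI dependency. The sketch below is a simplified, hypothetical model: its `recordProviderFailure` always throws, standing in for the header-less 429 path, and the 429 itself is simulated with an `IllegalStateException`. Because the exception is raised inside the `catch` block, no handler of that same `try` statement can intercept it, so it leaves the loop before the fallback provider is tried.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal reproduction of the catch-block escape; all names are hypothetical.
public class CatchBlockEscapeDemo {

    static final class RateLimitDecisionException extends RuntimeException {
        RateLimitDecisionException(String message) {
            super(message);
        }
    }

    /** Stand-in for the buggy recordProviderFailure(): throws on a header-less 429. */
    static void recordProviderFailure(String provider, List<String> log) {
        log.add("record:" + provider);
        throw new RateLimitDecisionException("missing timing headers");
    }

    /** Simplified provider loop modeled on complete(): try each provider, fall back on failure. */
    static String complete(List<String> providers, List<String> log) {
        for (int providerIndex = 0; providerIndex < providers.size(); providerIndex++) {
            String activeProvider = providers.get(providerIndex);
            try {
                log.add("call:" + activeProvider);
                throw new IllegalStateException("429 from " + activeProvider); // simulated 429
            } catch (RuntimeException completionException) {
                recordProviderFailure(activeProvider, log); // throws; the lines below never run
                boolean hasNextProvider = providerIndex + 1 < providers.size();
                if (hasNextProvider) {
                    continue;
                }
                return "error:" + completionException.getMessage();
            }
        }
        return "no providers";
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try {
            complete(List.of("primary", "fallback"), log);
        } catch (RateLimitDecisionException escaped) {
            System.out.println("escaped: " + escaped.getMessage());
        }
        System.out.println(log); // [call:primary, record:primary]
    }
}
```

The log never contains `call:fallback`: the request fails with the routing bookkeeping exception instead of the original 429, and the second provider is never attempted.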

Explanation

  • If a 429 response lacks timing headers, decisionResolver.resolveFromOpenAiHeaders(...) throws RateLimitDecisionException.
  • Because recordProviderFailure() does not catch that exception, execution never reaches the subsequent markPrimaryBackoff() branch, leaving primaryBackoffUntilEpochMs unchanged (default 0).
  • The circuit breaker is also not updated because applyRateLimit() is skipped, so ProviderCircuitState.recordRateLimit() is never called and circuitOpen remains false.
  • Availability checks (isPrimaryInBackoff() and rateLimitService.isProviderAvailable()) therefore both treat the provider as available, causing immediate retry traffic after a 429.
  • Additionally, the exception is thrown from within OpenAIStreamingService.complete()’s catch block, so it escapes the try/catch entirely and prevents fallback behavior.
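The availability consequence can be modeled with two fields. This is a hypothetical simplification: `ProviderGates` folds `isPrimaryInBackoff()` and `rateLimitService.isProviderAvailable()` into one object, and the 30-second window used in `main` is an assumed value, not the project's setting.

```java
// Hypothetical model of the two availability gates; not the project's classes.
public class AvailabilityGateDemo {

    static final class ProviderGates {
        long primaryBackoffUntilEpochMs = 0L; // default 0: no backoff recorded
        boolean circuitOpen = false;          // flipped only when recordRateLimit() runs

        boolean isPrimaryInBackoff(long nowEpochMs) {
            return nowEpochMs < primaryBackoffUntilEpochMs;
        }

        boolean isProviderAvailable() {
            return !circuitOpen;
        }

        /** The provider receives traffic only if neither gate blocks it. */
        boolean acceptsTraffic(long nowEpochMs) {
            return !isPrimaryInBackoff(nowEpochMs) && isProviderAvailable();
        }
    }

    public static void main(String[] args) {
        ProviderGates gates = new ProviderGates();
        long now = System.currentTimeMillis();
        // After a header-less 429 neither field changes, so traffic is still accepted.
        System.out.println(gates.acceptsTraffic(now)); // true
        // If markPrimaryBackoff() had run (assumed 30 s window), the primary would be skipped.
        gates.primaryBackoffUntilEpochMs = now + 30_000L;
        System.out.println(gates.acceptsTraffic(now)); // false
    }
}
```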

Recommended Fix

Catch RateLimitDecisionException inside recordProviderFailure(), log it, and do not rethrow it, so primary backoff still applies and the caller's fallback logic still runs (circuit-breaker timing is left untouched when headers are absent):

public void recordProviderFailure(RateLimitService.ApiProvider provider, Throwable throwable) {
    if (throwable instanceof OpenAIServiceException serviceException
            && serviceException.statusCode() == HTTP_TOO_MANY_REQUESTS) {
        try {
            rateLimitService.recordRateLimitFromOpenAiServiceException(provider, serviceException);
        } catch (RateLimitDecisionException e) {
            log.warn(
                    "[{}] Rate-limit timing unavailable ({}), continuing with backoff",
                    provider.getName(),
                    e.getMessage());
            // Continue to markPrimaryBackoff() - do not re-throw
        }
    }

    if (provider == configuredPrimaryProvider() && shouldBackoffPrimary(throwable)) {
        markPrimaryBackoff();
    }
}
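The effect of the recommended fix can be checked with the same kind of model. In this hypothetical sketch the header-less 429 is logged rather than rethrown, `markPrimaryBackoff()` is reduced to setting a timestamp with an assumed 30-second window, and `shouldBackoffPrimary()` is assumed to return true for a 429. With those assumptions, the primary ends up in backoff and the loop reaches the fallback provider.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the recommended fix; names and the backoff window are assumptions.
public class FixedRoutingDemo {

    static final class RateLimitDecisionException extends RuntimeException {
        RateLimitDecisionException(String message) {
            super(message);
        }
    }

    static final long ASSUMED_BACKOFF_MS = 30_000L; // illustrative, not the project's value

    final String primary;
    final List<String> log = new ArrayList<>();
    long primaryBackoffUntilEpochMs = 0L;

    FixedRoutingDemo(String primary) {
        this.primary = primary;
    }

    /** Stand-in for recordRateLimitFromOpenAiServiceException() on a header-less 429. */
    void recordRateLimitFromHeaders(String provider) {
        throw new RateLimitDecisionException("no timing headers for " + provider);
    }

    /** Fixed recordProviderFailure(): the decision failure is logged, not rethrown. */
    void recordProviderFailure(String provider, long nowEpochMs) {
        try {
            recordRateLimitFromHeaders(provider);
        } catch (RateLimitDecisionException e) {
            log.add("warn:" + e.getMessage());
        }
        // Assumes shouldBackoffPrimary() is true for a 429; this stands in for markPrimaryBackoff().
        if (provider.equals(primary)) {
            primaryBackoffUntilEpochMs = nowEpochMs + ASSUMED_BACKOFF_MS;
        }
    }

    /** Provider loop: the primary returns a simulated 429, the fallback succeeds. */
    String complete(List<String> providers, long nowEpochMs) {
        for (int i = 0; i < providers.size(); i++) {
            String active = providers.get(i);
            try {
                if (active.equals(primary)) {
                    throw new IllegalStateException("429 from " + active);
                }
                return "ok:" + active;
            } catch (RuntimeException completionException) {
                recordProviderFailure(active, nowEpochMs);
                if (i + 1 < providers.size()) {
                    continue; // fallback is now reachable
                }
                return "error:" + completionException.getMessage();
            }
        }
        return "no providers";
    }

    public static void main(String[] args) {
        FixedRoutingDemo router = new FixedRoutingDemo("primary");
        long now = 1_000_000L;
        System.out.println(router.complete(List.of("primary", "fallback"), now)); // ok:fallback
        System.out.println(router.primaryBackoffUntilEpochMs > now);                // true
        System.out.println(router.log); // [warn:no timing headers for primary]
    }
}
```

As in the recommended fix, nothing here opens the circuit breaker; only the primary's backoff protects it, so a non-primary provider that returns a header-less 429 would still be treated as available.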

History

This bug was introduced in commit fa355bd (strict header-only timing resolution now throws RateLimitDecisionException when headers are missing; the exception propagates and bypasses both primary backoff and circuit breaker updates).

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests