Hi! We are using ai-gateway-provider with @ai-sdk/openai and noticed that transient upstream failures sometimes aren't retried. I did some digging and my findings are:
When the upstream provider fails, the gateway sometimes returns HTTP 400 with its own error format:
{"success":false,"result":[],"messages":[],"error":[{"code":2005,"message":"Failed to get response from provider"}]}
Rather than forwarding the upstream response, the gateway replaces it with this. This means:
- The AI SDK can't determine retryability from the original upstream status code
- The AI SDK treats 400 as non-retryable (isRetryable defaults to statusCode >= 500), so the request fails permanently
In contrast, when the gateway passes through the upstream's raw response (e.g. a 503 from OpenAI), the SDK correctly retries and succeeds. We've observed both behaviors for the same type of request.
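To illustrate why the two cases diverge, here is a minimal sketch of the retry decision as described above. `isRetryableStatus` is a hypothetical helper, not the AI SDK's API, and it is simplified to the `statusCode >= 500` check (the SDK's real default may also cover a few specific 4xx codes). The response bodies are never consulted, which is exactly the problem: the gateway's 400 is rejected on status alone.

```typescript
// Hypothetical helper mirroring the retry rule described above
// (simplified: only statusCode >= 500 counts as retryable).
function isRetryableStatus(statusCode: number): boolean {
  return statusCode >= 500;
}

// Two responses observed for the same kind of request.
const gatewayReplaced = {
  status: 400,
  body: '{"success":false,"result":[],"messages":[],"error":[{"code":2005,"message":"Failed to get response from provider"}]}',
};
const passedThrough = { status: 503, body: "Service Unavailable" };

for (const r of [gatewayReplaced, passedThrough]) {
  // Only the status code drives the decision; the body is ignored.
  console.log(r.status, isRetryableStatus(r.status) ? "retry" : "fail permanently");
}
// prints:
// 400 fail permanently
// 503 retry
```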
Two things we noticed:
- HTTP semantics: Would 502 be more appropriate for code 2005? A 400 implies a client error, which makes every HTTP client and retry mechanism in the stack treat it as non-retryable. And even when the upstream really did return a 400, the original status is lost.
- ai-gateway-provider: processModelRequest handles error codes 2001 and 2009 on 400 responses, but code 2005 falls through and is passed on to @ai-sdk/openai. That package can't parse the gateway error format ({"error":[...]} vs the expected {"error":{"message":"..."}}), so it falls back to deciding retryability from the 400 status code alone.
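As a stopgap on our side, a check along these lines could treat a 400 carrying gateway code 2005 as retryable. This is only a sketch under our own assumptions: `parseGatewayErrorCodes`, `shouldRetry` and `TRANSIENT_GATEWAY_CODES` are hypothetical names, it does not reflect how processModelRequest is actually implemented, and only code 2005 comes from what we observed. It distinguishes the gateway's array-shaped `error` field from the object-shaped one that @ai-sdk/openai expects.

```typescript
// Hypothetical set of gateway codes treated as transient upstream failures.
// Only 2005 comes from our observations; extend as needed.
const TRANSIENT_GATEWAY_CODES = new Set<number>([2005]);

// Returns the gateway error codes if the body uses the gateway format
// ({"error":[...]}), or null for anything else (e.g. an OpenAI-style
// {"error":{"message":"..."}} object, or a non-JSON body).
function parseGatewayErrorCodes(body: string): number[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const error = (parsed as { error?: unknown }).error;
  if (!Array.isArray(error)) return null;
  return error
    .map((e) => (e as { code?: unknown }).code)
    .filter((c): c is number => typeof c === "number");
}

// Retry on 5xx as before, but also on a response whose body
// carries a transient gateway error code.
function shouldRetry(status: number, body: string): boolean {
  if (status >= 500) return true;
  const codes = parseGatewayErrorCodes(body);
  return codes !== null && codes.some((c) => TRANSIENT_GATEWAY_CODES.has(c));
}

const gatewayBody =
  '{"success":false,"result":[],"messages":[],"error":[{"code":2005,"message":"Failed to get response from provider"}]}';
console.log(shouldRetry(400, gatewayBody)); // true
console.log(shouldRetry(400, '{"error":{"message":"bad request"}}')); // false
console.log(shouldRetry(503, "Service Unavailable")); // true
```

If the gateway returned 502 for code 2005 instead, the first branch alone would cover this case and no body parsing would be needed.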
I hope I got this right. We'd like to hear your thoughts on this, and whether it's fixable.
If it helps, here are two log IDs from when this happened:
01KJSPSTZK2FRN0746W7HAGVT7
01KJA5ZAA082QHT1CXZT645NY7