Detail Bug Report
https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_079f3fb5-3984-4848-85a5-1ae010e9b95f
Introduced in #3 by @WilliamAGH on Jan 24, 2026
Summary
- Context: OpenAIStreamingService orchestrates LLM calls via the OpenAI Responses API, handling both streaming and non-streaming (complete) paths.
- Bug: The complete() method and executeStreamingRequest() method never inspect the Response.status() field, so responses with status=FAILED or status=INCOMPLETE are treated identically to successful ones: rateLimitService.recordSuccess() is called and partial/empty text is returned silently.
- Actual vs. expected: When the API returns HTTP 200 with status=FAILED or status=INCOMPLETE, the code should surface an error (or at minimum not record success). Instead it records a successful rate-limit outcome and returns whatever text exists in the output array, which may be empty or truncated.
- Impact: A status=INCOMPLETE response with partial valid JSON (e.g., {"order":[0,3]} from RerankerService's 128-token budget) can be parsed as legitimate, silently producing wrong reranking results. Additionally, false recordSuccess() calls reset circuit-breaker state and zero the consecutive-failure counter for non-429 failures.
Code with Bug
// Non-streaming path (complete method, ~line 223-226):
Response completion =
        providerCandidate.client().responses().create(requestParameters, requestOptions);
rateLimitService.recordSuccess(activeProvider); // <-- BUG 🔴 records success without checking response.status()

// Streaming path (executeStreamingRequest, ~line 343-345):
.doOnComplete(() -> {
    log.debug("[LLM] Stream completed successfully (providerId={})", activeProvider.ordinal());
    rateLimitService.recordSuccess(activeProvider); // <-- BUG 🔴 fires on any stream termination, even ResponseFailedEvent/ResponseIncompleteEvent
})

// Text extraction (extractTextFromResponse, ~line 399-417):
private String extractTextFromResponse(Response response) {
    if (response == null) {
        return "";
    }
    StringBuilder outputBuilder = new StringBuilder();
    for (ResponseOutputItem outputItem : response.output()) {
        // ... extracts text without ever checking response.status()
    }
    return outputBuilder.toString();
}

// Streaming text extraction (extractTextDelta, ~line 395-396):
private Optional<String> extractTextDelta(ResponseStreamEvent event) {
    return event.outputTextDelta().map(ResponseTextDeltaEvent::delta);
    // <-- BUG 🔴 never checks event.failed() or event.incomplete() terminal events
}
Explanation
- The OpenAI Responses API can return HTTP 200 while the response object indicates a terminal non-success state (status=failed / status=incomplete).
- In the non-streaming path, the code immediately records success after responses().create(...) and then extracts output text without validating response.status(). This allows partial/empty output to be returned as if the call succeeded.
- In the streaming path, the OpenAI Java SDK yields response.failed and response.incomplete as normal terminal events (not exceptions). Because the code only hooks doOnComplete(), it records success when the stream ends regardless of whether the terminal event was completed, failed, or incomplete.
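The streaming point above can be illustrated with a minimal, self-contained sketch. It assumes nothing about the real SDK: EventKind, StreamEvent, StreamOutcome, and drain are hypothetical stand-ins, not OpenAI Java SDK types, and a plain List replaces the reactive stream. The idea is only that the consumer remembers which terminal event ended the stream instead of treating "the stream ended" as success.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the OpenAI Java SDK event types.
class TerminalEventGate {
    enum EventKind { TEXT_DELTA, COMPLETED, FAILED, INCOMPLETE }

    record StreamEvent(EventKind kind, String delta) {}

    /** Text accumulated so far plus the terminal event kind (null if the stream ended without one). */
    record StreamOutcome(String text, EventKind terminal) {
        boolean succeeded() {
            // Only an explicit COMPLETED terminal event counts as success.
            return terminal == EventKind.COMPLETED;
        }
    }

    static StreamOutcome drain(List<StreamEvent> events) {
        StringBuilder text = new StringBuilder();
        EventKind terminal = null;
        for (StreamEvent event : events) {
            if (event.kind() == EventKind.TEXT_DELTA) {
                text.append(event.delta());
            } else {
                terminal = event.kind(); // remember how the stream actually ended
                break;
            }
        }
        return new StreamOutcome(text.toString(), terminal);
    }
}
```

Under this sketch a caller would record a rate-limit success only when succeeded() returns true, and would treat FAILED, INCOMPLETE, or a missing terminal event as an error.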
Codebase Inconsistency
RerankerService only throws when the parsed ordering is empty; it does not verify that all documents were reranked. This means a truncated-but-valid JSON payload like {"order":[0,3]} (possible under the 128-token output budget) produces a non-empty but incomplete ordering with no error.
for (Integer documentIndex : orderResponse.order()) {
    if (documentIndex == null) {
        continue; // Silently skips nulls
    }
    if (documentIndex >= 0 && documentIndex < documents.size()) {
        reordered.add(documents.get(documentIndex)); // Silently skips out-of-bounds
    }
}
return reordered; // Returns whatever was valid - NO COUNT CHECK
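One possible hardening of the loop above is sketched here, assuming the caller expects every document to appear exactly once in the ordering (the report does not confirm that requirement). RerankOrderCheck and applyOrder are hypothetical names, not code from the repository. The sketch rejects short, out-of-range, null, or duplicate orderings instead of silently skipping them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of RerankerService.
class RerankOrderCheck {
    /** Reorders documents, rejecting any ordering that is not a full permutation of the indices. */
    static <T> List<T> applyOrder(List<T> documents, List<Integer> order) {
        if (order.size() != documents.size()) {
            throw new IllegalStateException(
                "Ordering covers " + order.size() + " of " + documents.size() + " documents");
        }
        Set<Integer> seen = new HashSet<>();
        List<T> reordered = new ArrayList<>(documents.size());
        for (Integer index : order) {
            // Null, out-of-bounds, and repeated indices are errors rather than silent skips.
            if (index == null || index < 0 || index >= documents.size() || !seen.add(index)) {
                throw new IllegalStateException("Invalid or duplicate index in ordering: " + index);
            }
            reordered.add(documents.get(index));
        }
        return reordered;
    }
}
```

With this check, a truncated payload such as {"order":[0,3]} for four documents would raise an error instead of yielding a two-element result.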
RateLimitService.recordSuccess() resets backoff/circuit state, so misclassifying FAILED/INCOMPLETE as success removes escalation for non-429 failures.
state.setConsecutiveFailures(0);
state.setBackoffMultiplier(1.0);
state.setCircuitOpen(false);
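To make the effect of a false success concrete, here is a toy model of the state above. Only the three resets in recordSuccess() mirror the quoted code; the threshold of three failures, the doubling backoff, and the recordFailure() method are assumptions made for illustration, since the report does not show the real failure-handling policy.

```java
// Toy model for illustration; threshold and failure policy are hypothetical.
class BackoffStateDemo {
    static final int OPEN_AFTER_FAILURES = 3; // assumed value, not from the report

    int consecutiveFailures = 0;
    double backoffMultiplier = 1.0;
    boolean circuitOpen = false;

    // Assumed failure handling: count the failure, double the backoff, open at the threshold.
    void recordFailure() {
        consecutiveFailures++;
        backoffMultiplier *= 2.0;
        if (consecutiveFailures >= OPEN_AFTER_FAILURES) {
            circuitOpen = true;
        }
    }

    // Mirrors the three resets quoted above.
    void recordSuccess() {
        consecutiveFailures = 0;
        backoffMultiplier = 1.0;
        circuitOpen = false;
    }
}
```

In this model, the sequence failure, failure, misclassified success, failure leaves the circuit closed with a failure count of one, whereas three uninterrupted failures would open it.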
Recommended Fix
- In the non-streaming path, validate that response.status() is COMPLETED before extracting output; otherwise throw (and do not call recordSuccess()).
- In the streaming path, handle terminal events explicitly: if event.failed() or event.incomplete() is present, fail the stream; only call rateLimitService.recordSuccess(...) when a completed terminal event is observed.
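The non-streaming half of the fix could look roughly like the following sketch. It deliberately avoids the real SDK: Status, LlmResponse, and completeOrThrow are hypothetical stand-ins, and the success recorder is passed as a plain Runnable rather than the actual RateLimitService. What it shows is the ordering: check the status first, and only then record success and return text.

```java
// Hypothetical stand-ins for illustration; these are NOT the OpenAI Java SDK types.
class StatusGate {
    enum Status { COMPLETED, FAILED, INCOMPLETE }

    record LlmResponse(Status status, String text) {}

    /**
     * Returns the response text only for COMPLETED responses.
     * Success is recorded after the status check, never before it.
     */
    static String completeOrThrow(LlmResponse response, Runnable recordSuccess) {
        if (response == null || response.status() != Status.COMPLETED) {
            String status = (response == null) ? "null" : String.valueOf(response.status());
            throw new IllegalStateException("LLM response did not complete: status=" + status);
        }
        recordSuccess.run();
        return response.text();
    }
}
```

A caller would wrap this in its existing error handling so that the thrown exception follows the same failure path as a transport error, which keeps partial output from reaching downstream parsers.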
History
This bug was introduced in commit 6c3bdca. The commit migrated from the OpenAI Chat Completions API to the Responses API but failed to account for the new Response.status field. The Chat Completions API returned content directly or threw exceptions, so recordSuccess() after a non-exception response was correct. The Responses API introduced a status field that can be FAILED or INCOMPLETE while still returning HTTP 200 with partial content — but the migration simply extracted text without checking status.