[Detail Bug] LLM responses with status FAILED/INCOMPLETE are treated as success, returning partial output and resetting rate-limit state

# Detail Bug Report

https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_079f3fb5-3984-4848-85a5-1ae010e9b95f

Introduced in [#3](https://github.com/WilliamAGH/java-chat/pull/3) by @WilliamAGH on Jan 24, 2026

# Summary
- **Context**: `OpenAIStreamingService` orchestrates LLM calls via the OpenAI Responses API, handling both streaming and non-streaming (complete) paths.
- **Bug**: The `complete()` method and `executeStreamingRequest()` method never inspect the `Response.status()` field, so responses with `status=FAILED` or `status=INCOMPLETE` are treated identically to successful ones — `rateLimitService.recordSuccess()` is called and partial/empty text is returned silently.
- **Actual vs. expected**: When the API returns HTTP 200 with `status=FAILED` or `status=INCOMPLETE`, the code should surface an error (or at minimum not record success). Instead it records a successful rate-limit outcome and returns whatever text exists in the output array, which may be empty or truncated.
- **Impact**: A `status=INCOMPLETE` response with partial valid JSON (e.g., `{"order":[0,3]}` from RerankerService's 128-token budget) can be parsed as legitimate, silently producing wrong reranking results. Additionally, false `recordSuccess()` calls reset circuit-breaker state and zero the consecutive-failure counter for non-429 failures.

# Code with Bug
```java
// Non-streaming path (complete method, ~line 223-226):
Response completion =
        providerCandidate.client().responses().create(requestParameters, requestOptions);
rateLimitService.recordSuccess(activeProvider);  // <-- BUG 🔴 records success without checking response.status()
```

```java
// Streaming path (executeStreamingRequest, ~line 343-345):
.doOnComplete(() -> {
    log.debug("[LLM] Stream completed successfully (providerId={})", activeProvider.ordinal());
    rateLimitService.recordSuccess(activeProvider);  // <-- BUG 🔴 fires on any normal stream completion, even when the terminal event was ResponseFailedEvent/ResponseIncompleteEvent
})
```

```java
// Text extraction (extractTextFromResponse, ~line 399-417):
private String extractTextFromResponse(Response response) {
    if (response == null) {
        return "";
    }
    StringBuilder outputBuilder = new StringBuilder();
    for (ResponseOutputItem outputItem : response.output()) {
        // ... extracts text without ever checking response.status()
    }
    return outputBuilder.toString();
}
```

```java
// Streaming text extraction (extractTextDelta, ~line 395-396):
private Optional<String> extractTextDelta(ResponseStreamEvent event) {
    return event.outputTextDelta().map(ResponseTextDeltaEvent::delta);
    // <-- BUG 🔴 never checks event.failed() or event.incomplete() terminal events
}
```

# Explanation
- The OpenAI Responses API can return HTTP 200 while the response object indicates a terminal non-success state (`status=failed` / `status=incomplete`).
- In the non-streaming path, the code immediately records success after `responses().create(...)` and then extracts output text without validating `response.status()`. This allows partial/empty output to be returned as if the call succeeded.
- In the streaming path, the OpenAI Java SDK yields `response.failed` and `response.incomplete` as normal terminal events (not exceptions). Because the code only hooks `doOnComplete()`, it records success when the stream ends regardless of whether the terminal event was `completed`, `failed`, or `incomplete`.
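The gap between "the HTTP call returned" and "the response completed" in the non-streaming path can be sketched as follows. This is a minimal illustration, not the project's code: `StubStatus`, `StubResponse`, `LlmResponseNotCompletedException`, and `ResponseStatusGate` are hypothetical stand-ins, the status is modeled as a plain enum (the SDK's actual accessor shape may differ), and `onSuccess` stands in for `rateLimitService.recordSuccess(activeProvider)`.

```java
import java.util.List;

// Hypothetical stand-in for the SDK's response status values.
enum StubStatus { COMPLETED, FAILED, INCOMPLETE }

// Hypothetical stand-in for the SDK's Response: a status plus output text fragments.
record StubResponse(StubStatus status, List<String> outputText) {}

// Hypothetical exception type; not part of the original code.
class LlmResponseNotCompletedException extends RuntimeException {
    LlmResponseNotCompletedException(String message) {
        super(message);
    }
}

final class ResponseStatusGate {
    private ResponseStatusGate() {}

    /**
     * Returns the joined output text only when the response status is COMPLETED.
     * onSuccess runs only after the status check passes, so a FAILED or
     * INCOMPLETE response never records a successful outcome.
     */
    static String textIfCompleted(StubResponse response, Runnable onSuccess) {
        if (response == null || response.status() != StubStatus.COMPLETED) {
            String status = (response == null) ? "null response" : String.valueOf(response.status());
            throw new LlmResponseNotCompletedException("LLM response not completed: " + status);
        }
        onSuccess.run();
        return String.join("", response.outputText());
    }
}
```

With this ordering, a truncated payload such as `{"order":[0,3]}` carried by an `INCOMPLETE` response raises an exception instead of reaching the JSON parser.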

## Codebase Inconsistency
- `RerankerService` only throws when the parsed ordering is empty; it does not verify that all documents were reranked. This means a truncated-but-valid JSON payload like `{"order":[0,3]}` (possible under the 128-token output budget) produces a non-empty but incomplete ordering with no error.

```java
for (Integer documentIndex : orderResponse.order()) {
    if (documentIndex == null) {
        continue;  // Silently skips nulls
    }
    if (documentIndex >= 0 && documentIndex < documents.size()) {
        reordered.add(documents.get(documentIndex));  // Silently skips out-of-bounds
    }
}
return reordered;  // Returns whatever was valid - NO COUNT CHECK
```

- `RateLimitService.recordSuccess()` resets backoff/circuit state, so misclassifying FAILED/INCOMPLETE as success removes escalation for non-429 failures.

```java
state.setConsecutiveFailures(0);
state.setBackoffMultiplier(1.0);
state.setCircuitOpen(false);
```
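For the reranker gap noted above, one possible completeness check is sketched here. It assumes a policy the report does not prescribe: invalid entries are still skipped as in the original loop, duplicates are dropped, and the call fails if the resulting ordering does not cover every document. `RerankOrderValidator` and `reorderComplete` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RerankOrderValidator {
    private RerankOrderValidator() {}

    /**
     * Applies the model-provided order to documents and rejects partial orderings.
     * Null and out-of-range indices are skipped (as in the original loop), duplicates
     * are ignored, and an exception is thrown when any document is left unranked.
     */
    static <T> List<T> reorderComplete(List<T> documents, List<Integer> order) {
        Set<Integer> validIndices = new LinkedHashSet<>(); // keeps first-seen order
        for (Integer documentIndex : order) {
            if (documentIndex == null) {
                continue;
            }
            if (documentIndex >= 0 && documentIndex < documents.size()) {
                validIndices.add(documentIndex);
            }
        }
        if (validIndices.size() != documents.size()) {
            throw new IllegalStateException(
                    "Rerank order covers " + validIndices.size()
                            + " of " + documents.size() + " documents");
        }
        List<T> reordered = new ArrayList<>(documents.size());
        for (int documentIndex : validIndices) {
            reordered.add(documents.get(documentIndex));
        }
        return reordered;
    }
}
```

A caller that prefers degraded results over an error could catch the exception and fall back to the original document order; that choice is outside what the report specifies.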

# Recommended Fix
- In the non-streaming path, validate `response.status()` is `COMPLETED` before extracting output; otherwise throw (and do not call `recordSuccess()`).
- In the streaming path, handle terminal events explicitly: if `event.failed()` or `event.incomplete()` is present, fail the stream; only call `rateLimitService.recordSuccess(...)` when a `completed` terminal event is observed.
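The streaming recommendation can be sketched without the SDK or Reactor as a small tracker that classifies each event. Everything here is an assumption for illustration: `StubEventKind`, `StubStreamEvent`, `LlmStreamNotCompletedException`, and `StreamTerminalTracker` are hypothetical stand-ins for the SDK's event types, and `onSuccess` stands in for `rateLimitService.recordSuccess(activeProvider)`. Treating a stream that ends without any terminal event as "not successful" is also an assumed policy.

```java
import java.util.Optional;

// Hypothetical stand-in for the kinds of events the stream can yield.
enum StubEventKind { TEXT_DELTA, COMPLETED, FAILED, INCOMPLETE }

// Hypothetical stand-in for a stream event; delta is only set for TEXT_DELTA.
record StubStreamEvent(StubEventKind kind, String delta) {}

// Hypothetical exception type; not part of the original code.
class LlmStreamNotCompletedException extends RuntimeException {
    LlmStreamNotCompletedException(StubEventKind kind) {
        super("LLM stream ended with terminal event " + kind);
    }
}

final class StreamTerminalTracker {
    private boolean completedSeen;

    /** Returns the text delta, if any; throws on a failed or incomplete terminal event. */
    Optional<String> accept(StubStreamEvent event) {
        switch (event.kind()) {
            case TEXT_DELTA:
                return Optional.ofNullable(event.delta());
            case COMPLETED:
                completedSeen = true;
                return Optional.empty();
            default: // FAILED or INCOMPLETE
                throw new LlmStreamNotCompletedException(event.kind());
        }
    }

    /**
     * Intended for the stream's completion hook: runs onSuccess only if a
     * COMPLETED terminal event was observed, and reports whether it did.
     */
    boolean onStreamEnd(Runnable onSuccess) {
        if (completedSeen) {
            onSuccess.run();
        }
        return completedSeen;
    }
}
```

In a Reactor pipeline, calling something like `accept` from a mapping operator would turn the thrown exception into an error signal, so the completion hook would not run for failed or incomplete streams.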

# History
This bug was introduced in commit 6c3bdca. The commit migrated from the OpenAI Chat Completions API to the Responses API but failed to account for the new `Response.status` field. The Chat Completions API returned content directly or threw exceptions, so `recordSuccess()` after a non-exception response was correct. The Responses API introduced a status field that can be `FAILED` or `INCOMPLETE` while still returning HTTP 200 with partial content — but the migration simply extracted text without checking status.
