
fix: reconstruct OpenAI Responses SSE output items#414

Open
tomasmach wants to merge 6 commits into spacedriveapp:main from tomasmach:fix/openai-chatgpt-responses-sse

Conversation


@tomasmach tomasmach commented Mar 12, 2026

Summary

  • reconstruct OpenAI ChatGPT Responses SSE output items from streamed response.output_item, content-part, text-delta, and function-call-argument events before parsing the final response
  • merge the reconstructed output back into response.completed so message-only completion shells from ChatGPT Codex no longer fail memory persistence flows in channels like Telegram
  • add regression tests covering message-text and function-call reconstruction from streamed Responses events
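The reconstruction idea described above can be sketched roughly as follows. This is a minimal, dependency-free illustration of folding streamed Responses SSE events into per-output-index buffers — the `OutputItem` struct, the tuple-based event shape, and the `accumulate` function are illustrative stand-ins, not the PR's actual helpers (the real parser works on `data:` JSON lines):

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the PR's accumulated output state.
#[derive(Default, Debug)]
struct OutputItem {
    text: String,
    function_args: String,
}

// Fold (event_type, output_index, delta_payload) triples into per-index items.
fn accumulate(events: &[(&str, usize, &str)]) -> BTreeMap<usize, OutputItem> {
    let mut items: BTreeMap<usize, OutputItem> = BTreeMap::new();
    for &(kind, index, delta) in events {
        let item = items.entry(index).or_default();
        match kind {
            // Delta events append incrementally.
            "response.output_text.delta" => item.text.push_str(delta),
            "response.function_call_arguments.delta" => item.function_args.push_str(delta),
            // A `.done` event carries the full value; keep it only if it is
            // more complete than what was accumulated so far.
            "response.output_text.done" if delta.len() > item.text.len() => {
                item.text = delta.to_string();
            }
            _ => {}
        }
    }
    items
}

fn main() {
    let events = [
        ("response.output_text.delta", 0usize, "Hel"),
        ("response.output_text.delta", 0, "lo"),
        ("response.function_call_arguments.delta", 1, "{\"city\":"),
        ("response.function_call_arguments.delta", 1, "\"Prague\"}"),
    ];
    let items = accumulate(&events);
    assert_eq!(items[&0].text, "Hello");
    assert_eq!(items[&1].function_args, "{\"city\":\"Prague\"}");
}
```

The reconstructed items are then available when `response.completed` arrives, so a bare completion shell can be enriched instead of rejected.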

Changed Files

  • src/llm/model.rs — teach the Responses SSE parser to accumulate streamed output state and merge it with completed-response metadata
  • src/llm/model.rs — add regression tests for message content reconstruction and function call reconstruction

Verification

  • ./scripts/preflight.sh ✅ passed
  • ./scripts/gate-pr.sh ✅ passed
  • cargo test parse_openai_ ✅ passed
  • cargo build --release ✅ passed locally with embedded frontend during reproduction setup

Root Cause

  • ChatGPT OAuth requests use the OpenAI Responses SSE path, but some response.completed payloads include only a bare message shell, without the text or function-call payload that arrived earlier in the stream events
  • the old parser trusted only response.completed, so it surfaced an "empty or unsupported response from OpenAI ChatGPT Responses API" error instead of reconstructing the assistant output

Residual Risk

  • this change assumes the streamed Responses events remain internally consistent; if OpenAI introduces new output-item event types, the parser will still fall back to the existing completed-response parsing behavior

Note

This fix significantly improves robustness of the OpenAI Responses SSE parser by implementing stateful stream reconstruction. The parser now accumulates output events across response.output_item.added, response.content_part.added/done, response.output_text.delta/done, response.refusal.delta/done, and response.function_call_arguments.delta/done events before finalizing the response. This ensures incomplete response.completed payloads (which only contain metadata shells) are enriched with content captured during streaming. Two regression tests validate both message text reconstruction and function call argument reconstruction flows.

Written by Tembo for commit 35904669f. This will update automatically on new commits.


coderabbitai bot commented Mar 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6dcc7863-3227-4347-8dc1-40442846368b

📥 Commits

Reviewing files that changed from the base of the PR and between 27aaa19 and d88882c.

📒 Files selected for processing (1)
  • src/llm/model.rs

Walkthrough

Replaces the SSE parser in src/llm/model.rs with an accumulator that reconstructs OpenAI-style streaming Responses by processing multiple data: events into output_items, adds JSON merge/shape helpers, expands error handling for provider-stream errors and missing data, and includes unit tests for message and function_call delta reconstruction; no public API signature changes.

Changes

Cohort / File(s) Summary
SSE reconstruction & JSON helpers
src/llm/model.rs
Replaces parse_openai_responses_sse_response with a streaming-aware SSE accumulator that collects data: events into an output_items map and finalizes a consolidated Responses payload. Adds private JSON utilities (json_value_is_empty, merge_json_value, ensure_json_object, ensure_json_array_field, ensure_json_array_index), response builders (ensure_responses_output_item, ensure_responses_content_part, append_json_string_field, set_json_string_field), merging/collection helpers (merge_responses_output_items, collect_responses_output_items), and logic to handle error events, missing SSE data, and reconstruction failure. Adds unit tests for streamed message output_text accumulation and streamed function_call arguments accumulation.
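To make the merge step concrete, here is a dependency-free mini-model of what a `merge_json_value`-style helper does: recursively merge a reconstructed value into the completed-response value, letting non-empty source fields fill empty target fields. The real helper operates on `serde_json::Value`; the `Val` enum here is a stand-in to keep the sketch self-contained, and the exact merge rules in the PR may differ:

```rust
// Tiny JSON-like model: just strings and objects, enough to show the merge.
#[derive(Clone, Debug, PartialEq)]
enum Val {
    Str(String),
    Obj(Vec<(String, Val)>),
}

fn is_empty(v: &Val) -> bool {
    match v {
        Val::Str(s) => s.is_empty(),
        Val::Obj(fields) => fields.is_empty(),
    }
}

// Recursively fill empty target fields from the reconstructed source.
fn merge(target: &mut Val, source: &Val) {
    match (target, source) {
        (Val::Obj(t_fields), Val::Obj(s_fields)) => {
            for (key, s_val) in s_fields {
                match t_fields.iter_mut().find(|(k, _)| k == key) {
                    Some((_, t_val)) => merge(t_val, s_val),
                    None => t_fields.push((key.clone(), s_val.clone())),
                }
            }
        }
        (t, s) => {
            // Leaf case: only overwrite an empty target.
            if is_empty(t) && !is_empty(s) {
                *t = s.clone();
            }
        }
    }
}

fn main() {
    // A bare completion shell with an empty text field...
    let mut completed = Val::Obj(vec![("text".to_string(), Val::Str(String::new()))]);
    // ...enriched by the value reconstructed from stream deltas.
    let reconstructed = Val::Obj(vec![("text".to_string(), Val::Str("Hello".to_string()))]);
    merge(&mut completed, &reconstructed);
    assert_eq!(
        completed,
        Val::Obj(vec![("text".to_string(), Val::Str("Hello".to_string()))])
    );
}
```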

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: reconstruct OpenAI Responses SSE output items' directly and clearly summarizes the main change: implementing SSE output reconstruction in the OpenAI Responses parser.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about the SSE reconstruction implementation, root cause, verification steps, and residual risks.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


src/llm/model.rs Outdated
return Vec::new();
};

let mut ordered_items = vec![serde_json::Value::Null; max_index + 1];

Filling gaps with null means the final response.output can contain null entries if indices ever get sparse, which can make downstream parsing/reporting noisy. If you don’t rely on preserving sparse indices, you can just collect the existing items in key order (BTreeMap already keeps them ordered).

Suggested change
let mut ordered_items = vec![serde_json::Value::Null; max_index + 1];
fn collect_responses_output_items(
output_items: &BTreeMap<usize, serde_json::Value>,
) -> Vec<serde_json::Value> {
output_items.values().cloned().collect()
}
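The suggestion relies on `BTreeMap` iterating in ascending key order, so items come out in output-index order without null placeholders for sparse indices. A small demonstration (the item values here are placeholder strings, not real output items):

```rust
use std::collections::BTreeMap;

fn main() {
    let mut items: BTreeMap<usize, &str> = BTreeMap::new();
    // Inserted out of order, with a gap at index 1.
    items.insert(2, "function_call");
    items.insert(0, "message");

    // values() walks keys in ascending order; the gap simply collapses.
    let ordered: Vec<&str> = items.values().cloned().collect();
    assert_eq!(ordered, vec!["message", "function_call"]);
}
```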


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/llm/model.rs (1)

2851-2858: Consider adding a comment explaining the streaming merge heuristic.

The string merge logic has specialized semantics for streaming text reconstruction—it only replaces when the source is longer and starts with the target. This non-obvious behavior could confuse future maintainers.

         (serde_json::Value::String(target_text), serde_json::Value::String(source_text)) => {
+            // Streaming merge: only replace if source extends target (common in delta accumulation)
+            // or if target is empty
             if target_text.is_empty()
                 || (source_text.len() > target_text.len()
                     && source_text.starts_with(target_text.as_str()))
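
The heuristic that the comment documents can be isolated like this — a later snapshot only replaces the accumulated text when it extends it (or when nothing has been accumulated yet), so shorter partial updates never clobber a longer reconstruction. The `merge_text` function is a simplified stand-in for the string arm of `merge_json_value`:

```rust
// Streaming merge heuristic on plain strings: replace only if the source
// extends the target, or if the target is still empty.
fn merge_text(target: &mut String, source: &str) {
    if target.is_empty()
        || (source.len() > target.len() && source.starts_with(target.as_str()))
    {
        *target = source.to_string();
    }
}

fn main() {
    let mut text = String::from("Hel");
    merge_text(&mut text, "Hello");   // extends the prefix -> replaced
    merge_text(&mut text, "He");      // shorter snapshot -> ignored
    merge_text(&mut text, "Goodbye"); // longer but not an extension -> ignored
    assert_eq!(text, "Hello");
}
```
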
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/model.rs` around lines 2851 - 2858, Add a concise comment above the
match arm handling (serde_json::Value::String(target_text),
serde_json::Value::String(source_text)) explaining the streaming-merge
heuristic: we only replace target_text when the incoming source_text is longer
and starts_with(target_text) to support incremental/streamed text reconstruction
(where newer chunks append to an earlier partial value) and to avoid overwriting
with shorter partial updates; mention that empty target_text is also replaced.
Keep the comment short, reference the variables target_text and source_text, and
place it immediately above that block in src/llm/model.rs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c9b908f-229e-4027-a620-9a31f19fe850

📥 Commits

Reviewing files that changed from the base of the PR and between 4fe99c6 and 3590466.

📒 Files selected for processing (1)
  • src/llm/model.rs

Co-authored-by: tembo[bot] <208362400+tembo[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llm/model.rs`:
- Around line 3004-3012: There is a stray duplicated block of code that
reimplements the append logic after the append_json_string_field function has
already closed, causing an unmatched delimiter and compilation failure; remove
the duplicated block (the second occurrence that starts with let object =
ensure_json_object(value); and the subsequent operations on field_name/suffix)
so only the proper append_json_string_field implementation remains, ensuring no
extra closing brace or duplicated logic referencing ensure_json_object,
field_name, or suffix is left in the file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15ac7337-5b59-486f-b648-99268ff8fb48

📥 Commits

Reviewing files that changed from the base of the PR and between 3590466 and 91befa6.

📒 Files selected for processing (1)
  • src/llm/model.rs

tomasmach and others added 4 commits March 12, 2026 17:13
- Simplify collect_responses_output_items to use BTreeMap::values().cloned().collect()
  instead of filling sparse indices with null (per Tembo review)
- Add doc comments to all 13 new helper functions for docstring coverage
- Add explanatory comment on merge_json_value streaming heuristic (per CodeRabbit review)