Fix responses transformer to properly close reasoning before message content #466

Open
kwanLeeFrmVi wants to merge 3 commits into decolua:master from kwanLeeFrmVi:fix/responses-transformer-reasoning-transition

@kwanLeeFrmVi
Contributor

Problem

When a model emits native reasoning_content (e.g., ollama/glm-5, ollama/kimi-k2.5), the Responses API transformer did not transition cleanly from reasoning to message content. The reasoning stream stayed "open" until finish_reason arrived, so events were emitted out of order, which confused clients and prevented the message from rendering after the reasoning.

Root Cause

In createResponsesApiTransformStream, closeReasoning() was only called at:

  1. </thinking> tag detection (for tag-based reasoning)
  2. finish_reason arrival
  3. Stream flush

When native reasoning_content was followed by content, none of these triggers fired at the transition, so nothing closed the reasoning item before message events began.

Fix

Added early reasoning closure when content arrives after reasoning_content:

if (delta.content) {
  // Close reasoning if we had reasoning_content and now have content
  if (state.reasoningId && !state.reasoningDone) {
    closeReasoning(controller);
  }
  // ... handle content
}
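To make the ordering problem concrete, here is a minimal, self-contained simulation. It is not the project's transformer: the event labels, the `simulate` helper, and the `earlyClose` flag are hypothetical and exist only to compare the event order with and without the early close.

```javascript
// Simplified stand-in for the transformer. Event labels and state shape are
// illustrative assumptions, not the real Responses API event names.
function simulate(deltas, earlyClose = true) {
  const events = [];
  const state = { reasoningId: null, reasoningDone: false };

  // Emits the close event once; later calls are no-ops.
  const closeReasoning = () => {
    if (state.reasoningId && !state.reasoningDone) {
      events.push("reasoning.done");
      state.reasoningDone = true;
    }
  };

  for (const delta of deltas) {
    if (delta.reasoning_content) {
      if (!state.reasoningId) {
        state.reasoningId = "rs_1"; // hypothetical id
        events.push("reasoning.added");
      }
      events.push("reasoning.delta");
    }
    if (delta.content) {
      if (earlyClose) closeReasoning(); // the transition this PR adds
      events.push("message.delta");
    }
    if (delta.finish_reason) {
      closeReasoning(); // previously the first point where reasoning closed
      events.push("completed");
    }
  }
  return events;
}

const deltas = [
  { reasoning_content: "thinking..." },
  { content: "Hello" },
  { finish_reason: "stop" },
];

const fixedOrder = simulate(deltas, true);
const oldOrder = simulate(deltas, false);
console.log(fixedOrder.join(" -> "));
console.log(oldOrder.join(" -> "));
```

With `earlyClose` enabled, the close event precedes the first message delta. With it disabled, the close event only appears once finish_reason arrives, after message content has already been emitted.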

…fter reasoning blocks

- Auto-close reasoning block when switching from reasoning_content to regular content
- Use incremented output_index (reasoningIndex + 1) for message items after reasoning
- Apply msgIdx correction to both text content and tool_calls handling
- Prevent output_index collision between reasoning and subsequent message items
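The index correction described in this commit can be sketched as follows. Only the `reasoningIndex + 1` rule comes from the commit notes; the state shape, the helper names, and the choice of index 0 when no reasoning item exists are assumptions made for illustration.

```javascript
// Hypothetical helper: picks the output_index for a message item.
function nextMessageIndex(state) {
  // Assumption: with no reasoning item, the message takes index 0.
  if (state.reasoningId == null) return 0;
  // From the commit notes: message items follow the reasoning item.
  return state.reasoningIndex + 1;
}

// Builds the list of output items a stream would announce.
function buildItems(state) {
  const items = [];
  if (state.reasoningId != null) {
    items.push({ type: "reasoning", output_index: state.reasoningIndex });
  }
  items.push({ type: "message", output_index: nextMessageIndex(state) });
  return items;
}

const withReasoning = buildItems({ reasoningId: "rs_1", reasoningIndex: 0 });
const withoutReasoning = buildItems({ reasoningId: null, reasoningIndex: 0 });
console.log(withReasoning, withoutReasoning);
```

Because the message index is derived from the reasoning index instead of being fixed, the two items cannot share an output_index.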
@kwanLeeFrmVi force-pushed the fix/responses-transformer-reasoning-transition branch from ee9d427 to ab03398 on April 1, 2026 09:06
…_tokens

- Convert OpenAI reasoning_effort (low/medium/high) to Claude thinking config
- Map low→1024, medium→8192, high→32768 budget_tokens
- Use DEFAULT_BUDGET_TOKENS fallback for thinking.budget_tokens when not specified
- Import DEFAULT_BUDGET_TOKENS from runtimeConfig
@kwanLeeFrmVi
Contributor Author

Description for commit #972363

Fix: Add mandatory budget_tokens for Claude thinking config and reasoning_effort support

Problem

Claude-compatible providers return a 422 error when thinking is enabled without budget_tokens:

Failed to deserialize the JSON body into the target type: thinking: missing field budget_tokens at line 1 column 111088

Additionally, when Claude Code clients sent reasoning_effort (e.g., reasoning_effort: "high"), it was not converted to a corresponding thinking.budget_tokens value.

Solution

open-sse/config/runtimeConfig.js

  • Added DEFAULT_BUDGET_TOKENS = 16000 constant for default thinking budget

open-sse/translator/request/openai-to-claude.js

  • Import DEFAULT_BUDGET_TOKENS from runtime config
  • Fix: Always include budget_tokens in thinking object (was previously omitted when not explicitly provided)
  • Feature: Convert reasoning_effort to thinking.budget_tokens:
    • low → 1024 tokens
    • medium → 8192 tokens
    • high → 32768 tokens
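The mapping above might look roughly like the sketch below. The constant value and the three budgets are taken from this description; the `buildThinking` helper, its request shape, and the rule that an explicit thinking object takes precedence over reasoning_effort are assumptions, not the code in openai-to-claude.js.

```javascript
const DEFAULT_BUDGET_TOKENS = 16000; // value stated in this PR

// Budgets listed in this PR for each reasoning_effort level.
const EFFORT_TO_BUDGET = new Map([
  ["low", 1024],
  ["medium", 8192],
  ["high", 32768],
]);

// Hypothetical helper: derives a Claude thinking config from an
// OpenAI-style request body.
function buildThinking(request) {
  const { thinking, reasoning_effort: effort } = request;

  // Assumption: an explicit thinking object wins over reasoning_effort.
  if (thinking) {
    return {
      type: thinking.type ?? "enabled",
      // Always include budget_tokens so the provider does not reject the body.
      budget_tokens: thinking.budget_tokens ?? DEFAULT_BUDGET_TOKENS,
    };
  }
  if (EFFORT_TO_BUDGET.has(effort)) {
    return { type: "enabled", budget_tokens: EFFORT_TO_BUDGET.get(effort) };
  }
  return undefined; // no thinking requested
}

console.log(buildThinking({ reasoning_effort: "high" }));
console.log(buildThinking({ thinking: { type: "enabled" } }));
```

A Map is used for the lookup so that unexpected strings such as "constructor" cannot match inherited object properties.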

Files Changed

  • open-sse/config/runtimeConfig.js
  • open-sse/translator/request/openai-to-claude.js

… budget_tokens for older models

- Detect Claude 4.6 models via model name pattern (4-6 or 4.6)
- Map reasoning_effort to output_config.effort + adaptive thinking for 4.6
- Keep budget_tokens mapping (low→1024, medium→8192, high→32768) for older Claude models
- Default to adaptive thinking type for 4.6 when explicit thinking config provided without type
- Conditionally include budget_tokens only when present in explicit thinking config
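A rough sketch of the model-dependent branching in this commit is shown below. The regular expression, the helper names, and the model names in the usage lines are illustrative assumptions. Only the "4-6 or 4.6" pattern, the adaptive thinking type, output_config.effort, and the budget values come from the commit notes.

```javascript
// Budgets kept for older Claude models, as listed in the commit notes.
const LEGACY_EFFORT_BUDGETS = { low: 1024, medium: 8192, high: 32768 };

// Assumed pattern: matches "4-6" or "4.6" anywhere in the model name.
function isClaude46(model) {
  return /4[-.]6/.test(String(model));
}

// Hypothetical helper: translates reasoning_effort into Claude parameters.
function effortToClaudeParams(model, effort) {
  const known = Object.prototype.hasOwnProperty.call(LEGACY_EFFORT_BUDGETS, effort);
  if (!known) return {}; // unknown or missing effort: add nothing

  if (isClaude46(model)) {
    // 4.6 models: adaptive thinking plus an effort hint, no budget_tokens.
    return { thinking: { type: "adaptive" }, output_config: { effort } };
  }
  // Older models: fixed token budget per effort level.
  return {
    thinking: { type: "enabled", budget_tokens: LEGACY_EFFORT_BUDGETS[effort] },
  };
}

// Example model names below are made up for the demonstration.
console.log(effortToClaudeParams("claude-sonnet-4-6", "high"));
console.log(effortToClaudeParams("claude-sonnet-4-5", "high"));
```

A loose substring pattern like this can also match unrelated names that happen to contain "4-6", so a real implementation may want a stricter check.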