@zxzinn zxzinn commented Dec 11, 2025

Description

This PR exposes thinking/reasoning content from Claude 4 Sonnet with extended thinking mode through additional_kwargs["thinking_delta"] during streaming in BedrockConverse.

Problem

When using BedrockConverse with Claude 4 Sonnet (model ID: us.anthropic.claude-sonnet-4-20250514-v1:0) with thinking mode enabled, the thinking/reasoning content is only available in the raw response object via event.raw['contentBlockDelta']['delta']['reasoningContent']['text']. This makes it difficult for downstream consumers (like FunctionAgent) to access thinking content in a standard way.

Solution

Store thinking delta in ChatResponse.additional_kwargs["thinking_delta"] during streaming, following the existing pattern used by other LLMs for metadata like tool_calls and annotations.

Key changes:

  • Modified stream_chat() to populate additional_kwargs["thinking_delta"] when reasoning content is present
  • Modified astream_chat() with the same logic
  • Thinking content is still accumulated in ThinkingBlock in the final message
  • Added comprehensive unit tests
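The routing logic described above can be sketched as a simplified, self-contained simulation. This is not the actual BedrockConverse source: the event dict mirrors the `event.raw['contentBlockDelta']` shape quoted in the Problem section, and the plain `response` dict is a hypothetical stand-in for `ChatResponse`.

```python
# Sketch of the contentBlockDelta handling added to stream_chat().
# The response dict is a stand-in for ChatResponse (not the real class).

def handle_delta(event):
    """Route one stream event into either delta or thinking_delta."""
    delta = event["contentBlockDelta"]["delta"]
    response = {"delta": None, "additional_kwargs": {}}
    if "reasoningContent" in delta:
        # Thinking tokens: surfaced via additional_kwargs["thinking_delta"]
        response["additional_kwargs"]["thinking_delta"] = (
            delta["reasoningContent"]["text"]
        )
    elif "text" in delta:
        # Regular answer tokens: surfaced via the normal delta field
        response["delta"] = delta["text"]
    return response

thinking_event = {
    "contentBlockDelta": {"delta": {"reasoningContent": {"text": "Let me add..."}}}
}
text_event = {"contentBlockDelta": {"delta": {"text": "42"}}}

print(handle_delta(thinking_event)["additional_kwargs"]["thinking_delta"])
print(handle_delta(text_event)["delta"])
```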

Example Usage

Basic Streaming with Thinking Content

from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core.base.llms.types import ChatMessage, MessageRole

llm = BedrockConverse(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    thinking={"type": "enabled", "budget_tokens": 1024},
    temperature=1,  # Required for thinking mode
)

messages = [ChatMessage(role=MessageRole.USER, content="What is 15 + 27?")]

# Stream and access thinking deltas in real-time
for response in llm.stream_chat(messages):
    # Access thinking content (reasoning process)
    if "thinking_delta" in response.additional_kwargs:
        thinking = response.additional_kwargs["thinking_delta"]
        print(f"Thinking: {thinking}", end="", flush=True)

    # Access text content (final answer)
    if response.delta:
        print(f"Answer: {response.delta}", end="", flush=True)

Collecting Complete Thinking Content

from llama_index.core.base.llms.types import ThinkingBlock

responses = list(llm.stream_chat(messages))

# Collect all thinking deltas
thinking_content = [
    r.additional_kwargs["thinking_delta"]
    for r in responses
    if "thinking_delta" in r.additional_kwargs
]
full_thinking = "".join(thinking_content)

# Or access accumulated thinking from final message
final_response = responses[-1]
thinking_blocks = [
    b for b in final_response.message.blocks
    if isinstance(b, ThinkingBlock)
]
if thinking_blocks:
    accumulated_thinking = thinking_blocks[0].content

Async Streaming

import asyncio

async def main():
    # astream_chat must be awaited to obtain the async generator
    async for response in await llm.astream_chat(messages):
        if "thinking_delta" in response.additional_kwargs:
            print(f"Thinking: {response.additional_kwargs['thinking_delta']}")
        if response.delta:
            print(f"Answer: {response.delta}")

asyncio.run(main())

Important Notes:

  • temperature=1 is required when using extended thinking mode
  • thinking_delta and delta are mutually exclusive in each response
  • Final message's ThinkingBlock contains the complete accumulated thinking content
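The mutual-exclusivity note above means a consumer can route each response into exactly one of two buffers. A minimal self-contained sketch, using plain dicts as hypothetical stand-ins for the streamed `ChatResponse` objects:

```python
# Simulated stream: each response carries either a thinking delta or a
# text delta, never both (per the mutual-exclusivity guarantee above).

def split_stream(responses):
    """Separate thinking tokens and answer tokens into two strings."""
    thinking, answer = [], []
    for r in responses:
        if "thinking_delta" in r["additional_kwargs"]:
            thinking.append(r["additional_kwargs"]["thinking_delta"])
        elif r["delta"]:
            answer.append(r["delta"])
    return "".join(thinking), "".join(answer)

stream = [
    {"delta": None, "additional_kwargs": {"thinking_delta": "15 + 27 "}},
    {"delta": None, "additional_kwargs": {"thinking_delta": "= 42"}},
    {"delta": "The answer is 42.", "additional_kwargs": {}},
]
print(split_stream(stream))  # ('15 + 27 = 42', 'The answer is 42.')
```

The joined thinking deltas should match the complete content of the final message's ThinkingBlock.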

New Package?

  • [ ] Yes
  • [x] No

Version Bump?

  • [x] Yes (0.12.2 → 0.12.3)
  • [ ] No

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How Has This Been Tested?

  • I added new unit tests to cover this change
    • Unit tests (tests/test_thinking_delta.py):
      • test_thinking_delta_populated_in_stream_chat: Verifies thinking_delta is correctly populated in additional_kwargs
      • test_thinking_delta_none_for_non_thinking_content: Ensures None for regular text without thinking
      • test_thinking_block_in_message_blocks: Validates ThinkingBlock accumulation in final message
    • Integration test (tests/test_llms_bedrock_converse.py):
      • test_bedrock_converse_thinking_delta_in_additional_kwargs: Real AWS Bedrock API test verifying thinking_delta in both sync and async streaming

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Add thinking_delta field to ChatResponse to capture
reasoning content deltas during streaming. Extract reasoning text
into a separate variable for clarity and populate thinking_delta in
stream responses. Update version to 0.12.3 and add comprehensive
tests for thinking delta functionality.
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 11, 2025
message: ChatMessage
raw: Optional[Any] = None
delta: Optional[str] = None
thinking_delta: Optional[str] = Field(
Collaborator commented:

Id rather not add this. Lets abuse additional_kwargs for now like other llms do. At some point we need to expose streaming content blocks instead

Contributor Author (@zxzinn) replied:

@logan-markewich
I've updated the implementation and added the usage example in the PR description.

Move thinking_delta from a dedicated field to additional_kwargs
to simplify the ChatResponse structure. Update bedrock_converse
streaming methods and tests to access thinking_delta via
additional_kwargs instead of the direct field.
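The access-pattern change described in that follow-up commit can be illustrated with a stand-in object (`SimpleNamespace` here is illustrative, not the real `ChatResponse` class):

```python
from types import SimpleNamespace

# Stand-in for a streamed ChatResponse after the refactor: no dedicated
# thinking_delta attribute, only the additional_kwargs dict.
response = SimpleNamespace(additional_kwargs={"thinking_delta": "step 1..."})

# Old (removed): response.thinking_delta
# New: look up the key in additional_kwargs; .get() returns None when
# the chunk carries regular text instead of thinking content.
thinking = response.additional_kwargs.get("thinking_delta")
print(thinking)  # step 1...
```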