fix: bedrock converse thinking block issue #20355
Description
This PR exposes thinking/reasoning content from Claude 4 Sonnet in extended thinking mode through `additional_kwargs["thinking_delta"]` during streaming in `BedrockConverse`.
Problem
When using `BedrockConverse` with Claude 4 Sonnet (model ID `us.anthropic.claude-sonnet-4-20250514-v1:0`) in thinking mode, the thinking/reasoning content is only available in the raw response object via `event.raw['contentBlockDelta']['delta']['reasoningContent']['text']`. This makes it difficult for downstream consumers (like `FunctionAgent`) to access thinking content in a standard way.
Solution
Store the thinking delta in `ChatResponse.additional_kwargs["thinking_delta"]` during streaming, following the existing pattern other LLMs use for metadata such as `tool_calls` and `annotations`.
Key changes:
- `stream_chat()`: populate `additional_kwargs["thinking_delta"]` when reasoning content is present
- `astream_chat()`: apply the same logic
- Retain the accumulated `ThinkingBlock` in the final message
Example Usage
Basic Streaming with Thinking Content
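A minimal sketch of consuming the stream, assuming each streamed `ChatResponse` exposes `delta` and `additional_kwargs` as described in the Solution section. `iter_stream_events` and `fake_response` are hypothetical helpers, not part of llama-index; `fake_response` only stands in for real responses so the sketch runs without AWS credentials.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, Optional, Tuple


def fake_response(delta: Optional[str] = None, thinking: Optional[str] = None) -> Any:
    """Hypothetical stand-in for a streamed ChatResponse (only the fields used here)."""
    return SimpleNamespace(delta=delta, additional_kwargs={"thinking_delta": thinking})


def iter_stream_events(responses: Iterable[Any]) -> Iterator[Tuple[str, str]]:
    """Yield ("thinking", text) and/or ("text", text) for each non-empty piece."""
    for response in responses:
        # thinking_delta may be missing or None for ordinary text chunks.
        thinking = response.additional_kwargs.get("thinking_delta")
        if thinking:
            yield ("thinking", thinking)
        # delta may be None or "" while the model is still thinking.
        if response.delta:
            yield ("text", response.delta)


# Fake stream for illustration; with a real model use llm.stream_chat(messages).
stream = [
    fake_response(thinking="Let me "),
    fake_response(thinking="think."),
    fake_response(delta="Hello!"),
]
for kind, text in iter_stream_events(stream):
    print(f"[{kind}] {text}")
```

With a real model, pass the generator returned by `llm.stream_chat(messages)` instead of the fake list, where `llm` is a `BedrockConverse` instance with thinking enabled (its constructor arguments are not shown here). Checking both fields independently keeps the loop correct even if a chunk ever carried both.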
Collecting Complete Thinking Content
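A sketch of accumulating the full reasoning and answer text from one stream, under the same assumption about the response shape. `collect_stream` and `fake_response` are hypothetical helpers. Per the Important Notes, the joined thinking text should correspond to the `ThinkingBlock` in the final message; that comparison is not part of this sketch.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional, Tuple


def fake_response(delta: Optional[str] = None, thinking: Optional[str] = None) -> Any:
    """Hypothetical stand-in for a streamed ChatResponse (only the fields used here)."""
    return SimpleNamespace(delta=delta, additional_kwargs={"thinking_delta": thinking})


def collect_stream(responses: Iterable[Any]) -> Tuple[str, str]:
    """Consume a stream and return (full_thinking, full_answer)."""
    thinking_parts: List[str] = []
    answer_parts: List[str] = []
    for response in responses:
        thinking = response.additional_kwargs.get("thinking_delta")
        if thinking:
            thinking_parts.append(thinking)
        if response.delta:
            answer_parts.append(response.delta)
    # Join once at the end instead of repeated string concatenation.
    return "".join(thinking_parts), "".join(answer_parts)


stream = [
    fake_response(thinking="2 + 2 "),
    fake_response(thinking="is 4."),
    fake_response(delta="The answer "),
    fake_response(delta="is 4."),
]
full_thinking, full_answer = collect_stream(stream)
print("Thinking:", full_thinking)
print("Answer:", full_answer)
```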
Async Streaming
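An async variant, assuming `astream_chat()` is awaited to obtain an async generator of `ChatResponse` objects (the usual llama-index pattern). `acollect_stream`, `fake_astream`, and `fake_response` are hypothetical helpers; the fake generator only simulates a stream so the sketch runs offline.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Iterable, List, Optional, Tuple


def fake_response(delta: Optional[str] = None, thinking: Optional[str] = None) -> Any:
    """Hypothetical stand-in for a streamed ChatResponse (only the fields used here)."""
    return SimpleNamespace(delta=delta, additional_kwargs={"thinking_delta": thinking})


async def fake_astream(chunks: Iterable[Any]) -> AsyncIterator[Any]:
    """Hypothetical async generator standing in for `await llm.astream_chat(...)`."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control to the event loop, as a network stream would
        yield chunk


async def acollect_stream(responses: AsyncIterator[Any]) -> Tuple[str, str]:
    """Consume an async stream and return (full_thinking, full_answer)."""
    thinking_parts: List[str] = []
    answer_parts: List[str] = []
    async for response in responses:
        thinking = response.additional_kwargs.get("thinking_delta")
        if thinking:
            thinking_parts.append(thinking)
        if response.delta:
            answer_parts.append(response.delta)
    return "".join(thinking_parts), "".join(answer_parts)


chunks = [fake_response(thinking="Hmm..."), fake_response(delta="Done.")]
print(asyncio.run(acollect_stream(fake_astream(chunks))))
```

Against a real model, the call would look like `thinking, answer = await acollect_stream(await llm.astream_chat(messages))` inside an async function.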
Important Notes:
- `temperature=1` is required when using extended thinking mode
- `thinking_delta` and `delta` are mutually exclusive in each response
- `ThinkingBlock` contains the complete accumulated thinking content
New Package?
Version Bump?
Type of Change
How Has This Been Tested?
- `test_thinking_delta_populated_in_stream_chat`: verifies that `thinking_delta` is correctly populated in `additional_kwargs`
- `test_thinking_delta_none_for_non_thinking_content`: ensures `thinking_delta` is `None` for regular text without thinking
- `test_thinking_block_in_message_blocks`: validates `ThinkingBlock` accumulation in the final message
- `test_bedrock_converse_thinking_delta_in_additional_kwargs`: real AWS Bedrock API test verifying `thinking_delta` in both sync and async streaming
Suggested Checklist:
- Run `uv run make format; uv run make lint` to appease the lint gods