
fix: max_tokens limits #244

Merged: Lavriz merged 2 commits into main from fix-claude-tokens on May 25, 2025


@Lavriz Lavriz commented May 25, 2025

Issue

API requests were failing with the error "max_tokens: 128000 > 32000, which is the maximum allowed number of output tokens for claude-opus-4-20250514"; see #241 (comment).

See the model limits for input/output: 128k output is no longer available.

Claude may not use the entire budget allocated, especially at ranges above 32k.

budget_tokens must be set to a value less than max_tokens.
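
The two constraints above can be checked before a request is sent. The following is a minimal sketch, not code from this PR: the OUTPUT_LIMITS table and the checkTokenSettings helper are hypothetical names. The Opus figure comes from the error message; the Sonnet model ID and its 64k figure are assumptions based on the limits listed in this PR.

```typescript
// Hypothetical pre-flight check; names are illustrative, not the PR's code.
const OUTPUT_LIMITS: Record<string, number | undefined> = {
  "claude-opus-4-20250514": 32_000, // from the API error message above
  "claude-sonnet-4-20250514": 64_000, // assumed model ID; 64k per this PR
};

// Returns a list of human-readable problems; an empty list means the settings pass.
function checkTokenSettings(
  model: string,
  maxTokens: number,
  budgetTokens?: number,
): string[] {
  const problems: string[] = [];
  const limit = OUTPUT_LIMITS[model];
  if (limit === undefined) {
    problems.push(`unknown model: ${model}`);
    return problems;
  }
  if (maxTokens > limit) {
    // Same comparison the API reported: requested value exceeds the output cap.
    problems.push(`max_tokens: ${maxTokens} > ${limit}`);
  }
  if (budgetTokens !== undefined && budgetTokens >= maxTokens) {
    problems.push(
      `budget_tokens (${budgetTokens}) must be less than max_tokens (${maxTokens})`,
    );
  }
  return problems;
}
```

Calling checkTokenSettings("claude-opus-4-20250514", 128000) reports the same "128000 > 32000" mismatch that the API rejected.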

Changes

  • Set correct max_tokens limits: 32k for Opus, 64k for Sonnet
  • Opus thinking: 20k thinking budget, 12k output
  • Sonnet thinking: 32k thinking budget, 32k output
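
The split between thinking budget and visible output is simple subtraction, since the thinking budget counts toward max_tokens. A small sketch under that assumption; ThinkingLimits, THINKING_LIMITS, and answerTokens are hypothetical names, and the actual code in src/providers/anthropic.ts may be shaped differently.

```typescript
// Hypothetical shape; only the numbers are taken from this PR.
interface ThinkingLimits {
  maxTokens: number; // cap on total output (thinking + visible answer)
  budgetTokens: number; // portion reserved for thinking
}

const THINKING_LIMITS: Record<"opus" | "sonnet", ThinkingLimits> = {
  opus: { maxTokens: 32_000, budgetTokens: 20_000 },
  sonnet: { maxTokens: 64_000, budgetTokens: 32_000 },
};

// Tokens left for the visible answer once the thinking budget is set aside.
function answerTokens({ maxTokens, budgetTokens }: ThinkingLimits): number {
  if (budgetTokens >= maxTokens) {
    throw new RangeError("budget_tokens must be less than max_tokens");
  }
  return maxTokens - budgetTokens;
}

console.log(answerTokens(THINKING_LIMITS.opus)); // 12000
console.log(answerTokens(THINKING_LIMITS.sonnet)); // 32000
```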


Lavriz commented May 25, 2025

@claude review this PR


claude Bot commented May 25, 2025

Claude finished @Lavriz's task


PR Review Complete ✅

This PR correctly fixes the token limits issue for Claude models. Here's my detailed review:

Review Summary

LGTM - This change properly addresses the API failures by setting correct token limits according to Anthropic's current model specifications.

Code Analysis

Fixes Applied (src/providers/anthropic.ts:65-83):

  • Opus limits: Set to 32k max_tokens (down from 128k)
  • Sonnet limits: Set to 64k max_tokens
  • Thinking budgets: Properly allocated within max_tokens constraints
    • Opus thinking: 20k budget + 12k output = 32k total
    • Sonnet thinking: 32k budget + 32k output = 64k total

Technical Review

Strengths:

  1. Correct token allocation: The math works correctly - thinking budget + output never exceeds max_tokens
  2. Model mapping preserved: The logic correctly maps thinking variants to base models (src/providers/anthropic.ts:44-48)
  3. Temperature override: Properly sets temperature: 1 for thinking models (src/providers/anthropic.ts:78)
  4. Clean conditional logic: Well-structured if/else blocks handle all model variants
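
To illustrate points 2 and 3, here is a rough sketch of how a thinking variant might be mapped to its base model with the temperature override applied. This is an assumption-laden reconstruction, not the contents of src/providers/anthropic.ts: the variant names follow those used in this PR, while buildQuery, the Query interface, and the Sonnet model ID string are hypothetical.

```typescript
// Hypothetical sketch; variant names follow this PR, everything else is assumed.
type Variant =
  | "Claude_4_Opus"
  | "Claude_4_Opus_thinking"
  | "Claude_4_Sonnet"
  | "Claude_4_Sonnet_thinking";

interface Query {
  model: string;
  max_tokens: number;
  temperature?: number;
  thinking?: { type: "enabled"; budget_tokens: number };
}

function buildQuery(variant: Variant): Query {
  const isOpus =
    variant === "Claude_4_Opus" || variant === "Claude_4_Opus_thinking";
  const isThinking =
    variant === "Claude_4_Opus_thinking" ||
    variant === "Claude_4_Sonnet_thinking";

  // Thinking variants share the base model ID; only the extra fields differ.
  const query: Query = {
    model: isOpus ? "claude-opus-4-20250514" : "claude-sonnet-4-20250514",
    max_tokens: isOpus ? 32_000 : 64_000,
  };

  if (isThinking) {
    query.temperature = 1; // override applied for thinking models, per the review
    query.thinking = {
      type: "enabled",
      budget_tokens: isOpus ? 20_000 : 32_000,
    };
  }
  return query;
}
```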

Code Quality:

  • Clean, readable implementation
  • Consistent formatting and naming
  • No breaking changes to the API interface

Verification Against Anthropic Docs

The token limits align with Anthropic's model specifications:

  • Claude Opus: 32k output limit ✅
  • Claude Sonnet: 64k output limit ✅
  • Thinking budgets must be < max_tokens ✅

Minor Observations

  • The default max_tokens: 8_192 at line 52 gets overridden for all Claude 4 models, which is correct behavior
  • No concerns with the anthropic-beta header handling for thinking models

Recommendation

Approve and merge - This fix resolves the immediate API failures and implements correct token limits according to current Anthropic model specifications.

Todo List

  • Read repository setup instructions (CLAUDE.md) - No CLAUDE.md found
  • Analyze the changed file src/providers/anthropic.ts
  • Review token limits against Anthropic documentation
  • Check thinking budget implementation
  • Provide comprehensive review feedback

@Lavriz Lavriz self-assigned this May 25, 2025
@Lavriz Lavriz requested review from evgenydmitriev and jalmonter and removed request for evgenydmitriev and jalmonter May 25, 2025 08:17

Lavriz commented May 25, 2025

@claude explicitly configure max_tokens for each supported Claude 4 model instead of using a default value (8_192), since we don't support any other Claude models.


claude Bot commented May 25, 2025

Claude finished @Lavriz's task


Todo List

  • Read current anthropic.ts implementation
  • Remove default max_tokens value (8_192)
  • Set explicit max_tokens for each Claude 4 model
  • Verify all model variants are covered
  • Commit changes to PR branch

Changes Made ✅

  • Removed the default max_tokens: 8_192 from the query object
  • Created explicit max_tokens configuration for each Claude 4 model variant:
    • Claude_4_Opus_thinking: 32,000 tokens
    • Claude_4_Sonnet_thinking: 64,000 tokens
    • Claude_4_Opus: 32,000 tokens
    • Claude_4_Sonnet: 64,000 tokens
  • Added error handling for unsupported models with a clear error message
  • Simplified the thinking mode configuration logic

The code now explicitly configures max_tokens for each supported model without relying on any default value. Each Claude 4 model variant has its own explicit configuration as requested.
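
A sketch of what "explicit configuration with no default" could look like. The variant names and values are the ones listed above; the function name maxTokensFor and the error text are hypothetical and may not match commit f7b3bbf.

```typescript
// Hypothetical helper; values come from the list above, names are illustrative.
function maxTokensFor(model: string): number {
  switch (model) {
    case "Claude_4_Opus_thinking":
    case "Claude_4_Opus":
      return 32_000;
    case "Claude_4_Sonnet_thinking":
    case "Claude_4_Sonnet":
      return 64_000;
    default:
      // No fallback such as 8_192: an unknown model is treated as a configuration error.
      throw new Error(`Unsupported Claude model: ${model}`);
  }
}
```

Failing loudly on an unrecognized model means a newly added variant cannot silently inherit a limit that was never checked against its documented output cap.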

Commit: f7b3bbf


Branch: fix-claude-tokens

- Remove default max_tokens value (8_192) that was being overridden
- Set explicit max_tokens for each supported Claude 4 model variant
- Add error handling for unsupported models
- Simplify thinking mode configuration logic

Co-authored-by: Lavriz <Lavriz@users.noreply.github.com>
@Lavriz Lavriz requested review from evgenydmitriev and jalmonter and removed request for jalmonter May 25, 2025 08:29
@Lavriz Lavriz merged commit a87b3d5 into main May 25, 2025
2 of 3 checks passed
@Lavriz Lavriz deleted the fix-claude-tokens branch May 25, 2025 17:14

Lavriz commented May 25, 2025

@Kseymur fixed, thanks!!

@Lavriz Lavriz restored the fix-claude-tokens branch May 25, 2025 17:44
@Lavriz Lavriz deleted the fix-claude-tokens branch May 25, 2025 17:56