revert: Token estimation values should use existing auto-calculation #29
Draft
Conversation
- Add estimated_output_tokens, vlm_estimated_output_tokens, vlm_max_tokens, and vlm_image_tokens_estimate to LLMConfig
- Add image_tokens_estimate to DeepSeekOCRConfig
- Update multi_tenant.py to use config values instead of hardcoded ones
- Update deepseek_ocr_client.py to use config values instead of hardcoded ones
- Update env.example to document the new environment variables
Closes #issue
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix hardcoded token estimation values for accurate rate limiting
feat: Extract hardcoded token estimation values to configuration
Dec 15, 2025
BukeLy
requested changes
Dec 15, 2025
Owner
BukeLy
left a comment
These estimation settings should not be added this way. Per the design goal, the concurrency should be calculated from these two environment variables:
LLM_REQUESTS_PER_MINUTE=800    # maximum requests per minute (default 800)
LLM_TOKENS_PER_MINUTE=40000    # maximum tokens per minute, input + output (default 40000)
env.example (outdated)
Comment on lines 34 to 41
# --- LLM token estimation settings (used for rate limiting) ---
# Estimated output token counts used in rate-limit calculations.
# Overestimating limits concurrency; underestimating may trigger 429 errors.
# LLM_ESTIMATED_OUTPUT_TOKENS=3000        # LLM output estimate (entity extraction ~3000 tokens, default 3000)
# LLM_VLM_ESTIMATED_OUTPUT_TOKENS=500     # VLM output estimate (image descriptions are short, default 500)
# LLM_VLM_MAX_TOKENS=500                  # maximum VLM API output tokens (default 500)
# LLM_VLM_IMAGE_TOKENS_ESTIMATE=200       # VLM per-image input token estimate (default 200)
Owner
Contributor
Author
There is indeed a conflict. These configuration options have been reverted in e2cb6e7.
rate_limiter.py's avg_tokens_map already contains built-in token estimates for each service, and the concurrency is calculated automatically from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE, so there is no need to expose these internal values separately.
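For example, with the defaults (800 requests and 40,000 tokens per minute) and the limiter's built-in LLM estimate of roughly 3,500 tokens per request, the token budget allows about 40000 / 3500 ≈ 11 concurrent requests per minute, well below the request cap, so the token budget is the binding limit.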
src/multi_tenant.py (outdated)
Comment on lines 67 to 72
# Token estimation configuration
self.llm_estimated_output_tokens = config.llm.estimated_output_tokens
self.vlm_estimated_output_tokens = config.llm.vlm_estimated_output_tokens
self.vlm_max_tokens = config.llm.vlm_max_tokens
self.vlm_image_tokens_estimate = config.llm.vlm_image_tokens_estimate
Per the reviewer's feedback, these estimation values should be calculated automatically from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE rather than added as extra configuration options. rate_limiter.py already has an avg_tokens_map used to auto-calculate concurrency. Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Contributor
Author
All of the changes have been reverted in e2cb6e7. The rate-limiting concurrency should indeed be calculated from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE.
Copilot
AI
changed the title
feat: Extract hardcoded token estimation values to configuration
revert: Token estimation values should use existing auto-calculation
Dec 15, 2025
Owner
@copilot How should this be improved going forward? The underlying issue still exists; how should it be solved?
Contributor
Author
Suggested improvement: consolidate the scattered hardcoded token estimates in a single place so they stay consistent with the existing LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE auto-calculation. Would you like me to implement this approach?
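As a rough illustration of what that consolidation could look like, here is a minimal Python sketch. The names (TOKEN_ESTIMATES, estimate_for) and module layout are assumptions for illustration, not existing code in this repository; the numbers reuse values quoted elsewhere in this thread.

# Hypothetical sketch: keep every token estimate in one place instead of
# literals scattered across multi_tenant.py and deepseek_ocr_client.py.
TOKEN_ESTIMATES = {
    "llm_output": 3000,        # entity-extraction output estimate
    "vlm_output": 500,         # image-description output estimate
    "vlm_image_input": 200,    # per-image input estimate
    "ds_ocr_request": 3500,    # DeepSeek-OCR per-request estimate
}

def estimate_for(key: str) -> int:
    # Callers look the value up here, so rate_limiter.py can keep deriving
    # concurrency from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE
    # without introducing any new environment variables.
    return TOKEN_ESTIMATES[key]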
After review, the proposed changes to extract hardcoded token estimation values to configuration have been reverted.
Reason
The rate limiting concurrency is already automatically calculated using existing environment variables:
- LLM_REQUESTS_PER_MINUTE (default: 800)
- LLM_TOKENS_PER_MINUTE (default: 40000)

The rate_limiter.py module contains an internal avg_tokens_map that provides token estimation values for each service (LLM: 3500, Embedding: 20000, Rerank: 500, DS_OCR: 3500). These internal values are used to auto-calculate the appropriate concurrency level.
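To make the auto-calculation concrete, the sketch below derives a per-service concurrency cap from the two environment variables and the averages listed above. It is an illustration only; the function name auto_concurrency and the exact formula are assumptions, not the actual rate_limiter.py implementation.

import os

# Per-service average token estimates, mirroring the values listed above.
AVG_TOKENS_MAP = {
    "llm": 3500,
    "embedding": 20000,
    "rerank": 500,
    "ds_ocr": 3500,
}

def auto_concurrency(service: str) -> int:
    # Read the rate-limit budget from the existing environment variables.
    rpm = int(os.getenv("LLM_REQUESTS_PER_MINUTE", "800"))
    tpm = int(os.getenv("LLM_TOKENS_PER_MINUTE", "40000"))
    # The token budget caps how many average-sized requests fit in a minute;
    # the request budget caps it further. The smaller bound wins.
    return max(1, min(rpm, tpm // AVG_TOKENS_MAP[service]))

Keeping the map internal means operators only ever tune the two environment variables, which matches the conclusion below.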
Conclusion
No additional configuration is needed. The existing LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE environment variables, combined with the internal token estimation in rate_limiter.py, already provide the correct behavior for rate limiting.