revert: Token estimation values should use existing auto-calculation #29
Draft
Conversation
- Add estimated_output_tokens, vlm_estimated_output_tokens, vlm_max_tokens, and vlm_image_tokens_estimate to LLMConfig
- Add image_tokens_estimate to DeepSeekOCRConfig
- Update multi_tenant.py to use config values instead of hardcoded ones
- Update deepseek_ocr_client.py to use config values instead of hardcoded ones
- Update env.example to document the new environment variables
Closes #issue
Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Fix hardcoded token estimation values for accurate rate limiting
feat: Extract hardcoded token estimation values to configuration
Dec 15, 2025
BukeLy
requested changes
Dec 15, 2025
Owner
BukeLy
left a comment
These estimation settings should not be added this way. Per the design goal, the concurrency should be calculated from these two environment variables:
LLM_REQUESTS_PER_MINUTE=800    # maximum requests per minute (default 800)
LLM_TOKENS_PER_MINUTE=40000    # maximum tokens per minute, input + output (default 40000)
env.example (outdated)
Comment on lines 34 to 41
# --- LLM token estimation settings (used for rate limiting) ---
# Estimated output token counts used in rate-limit calculations.
# Overestimating limits concurrency; underestimating may trigger 429 errors.
# LLM_ESTIMATED_OUTPUT_TOKENS=3000        # LLM output estimate (entity extraction ~3000 tokens, default 3000)
# LLM_VLM_ESTIMATED_OUTPUT_TOKENS=500     # VLM output estimate (image descriptions are short, default 500)
# LLM_VLM_MAX_TOKENS=500                  # maximum VLM API output tokens (default 500)
# LLM_VLM_IMAGE_TOKENS_ESTIMATE=200       # VLM per-image input token estimate (default 200)
Owner
Contributor
Author
There is indeed a conflict. These configuration options have been reverted in e2cb6e7.
rate_limiter.py's avg_tokens_map already contains built-in token estimates for each service, and the concurrency is calculated automatically from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE, so there is no need to expose these internal values separately.
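For example, with the defaults (800 requests and 40,000 tokens per minute) and the limiter's built-in LLM estimate of roughly 3,500 tokens per request, the token budget allows about 40000 / 3500 ≈ 11 concurrent requests per minute, well below the request cap, so the token budget is the binding limit.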
src/multi_tenant.py (outdated)
Comment on lines 67 to 72
# Token estimation configuration
self.llm_estimated_output_tokens = config.llm.estimated_output_tokens
self.vlm_estimated_output_tokens = config.llm.vlm_estimated_output_tokens
self.vlm_max_tokens = config.llm.vlm_max_tokens
self.vlm_image_tokens_estimate = config.llm.vlm_image_tokens_estimate
Per the reviewer's feedback, these estimation values should be calculated automatically from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE rather than added as extra configuration options. rate_limiter.py already has an avg_tokens_map used to auto-calculate concurrency. Co-authored-by: BukeLy <19304666+BukeLy@users.noreply.github.com>
Contributor
Author
All of the changes have been reverted in e2cb6e7. The rate-limiting concurrency should indeed be calculated from LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE.
Copilot
AI
changed the title
feat: Extract hardcoded token estimation values to configuration
revert: Token estimation values should use existing auto-calculation
Dec 15, 2025
Owner
@copilot How should this be improved going forward? The underlying issue still exists; how should it be solved?
Contributor
Author
Suggested improvement: consolidate the scattered hardcoded token estimates in a single place so they stay consistent with the existing LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE auto-calculation. Would you like me to implement this approach?
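As a rough illustration of what that consolidation could look like, here is a minimal Python sketch. The names (TOKEN_ESTIMATES, estimate_for) and module layout are assumptions for illustration, not existing code in this repository; the numbers reuse values quoted elsewhere in this thread.

# Hypothetical sketch: keep every token estimate in one place instead of
# literals scattered across multi_tenant.py and deepseek_ocr_client.py.
TOKEN_ESTIMATES = {
    "llm_output": 3000,        # entity-extraction output estimate
    "vlm_output": 500,         # image-description output estimate
    "vlm_image_input": 200,    # per-image input estimate
    "ds_ocr_request": 3500,    # DeepSeek-OCR per-request estimate
}

def estimate_for(key: str) -> int:
    # Callers look the value up here, so rate_limiter.py can keep deriving
    # concurrency from LLM_REQUESTS_PER_MINUTE / LLM_TOKENS_PER_MINUTE
    # without introducing any new environment variables.
    return TOKEN_ESTIMATES[key]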
After review, the proposed changes to extract hardcoded token estimation values to configuration have been reverted.
Reason
The rate limiting concurrency is already automatically calculated using existing environment variables:
- LLM_REQUESTS_PER_MINUTE (default: 800)
- LLM_TOKENS_PER_MINUTE (default: 40000)

The rate_limiter.py module contains an internal avg_tokens_map that provides token estimation values for each service (LLM: 3500, Embedding: 20000, Rerank: 500, DS_OCR: 3500). These internal values are used to auto-calculate the appropriate concurrency level.
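To make the auto-calculation concrete, the sketch below derives a per-service concurrency cap from the two environment variables and the averages listed above. It is an illustration only; the function name auto_concurrency and the exact formula are assumptions, not the actual rate_limiter.py implementation.

import os

# Per-service average token estimates, mirroring the values listed above.
AVG_TOKENS_MAP = {
    "llm": 3500,
    "embedding": 20000,
    "rerank": 500,
    "ds_ocr": 3500,
}

def auto_concurrency(service: str) -> int:
    # Read the rate-limit budget from the existing environment variables.
    rpm = int(os.getenv("LLM_REQUESTS_PER_MINUTE", "800"))
    tpm = int(os.getenv("LLM_TOKENS_PER_MINUTE", "40000"))
    # The token budget caps how many average-sized requests fit in a minute;
    # the request budget caps it further. The smaller bound wins.
    return max(1, min(rpm, tpm // AVG_TOKENS_MAP[service]))

Keeping the map internal means operators only ever tune the two environment variables, which matches the conclusion below.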
Conclusion
No additional configuration is needed. The existing LLM_REQUESTS_PER_MINUTE and LLM_TOKENS_PER_MINUTE environment variables, combined with the internal token estimation in rate_limiter.py, already provide the correct behavior for rate limiting.