feat(proxy): add max_parallel_requests for model/user/team levels #17839
Conversation
- Add `get_key_model_max_parallel_requests()` helper in auth_utils.py
- Add `user_max_parallel_requests` and `team_max_parallel_requests` fields to UserAPIKeyAuth
- Wire model-level limits in parallel_request_limiter.py (line 323 TODO)
- Wire user-level limits in parallel_request_limiter.py (line 366 TODO)
- Wire team-level limits in parallel_request_limiter.py (line 394 TODO)
- Add 10 unit tests for helper function and type fields

Closes BerriAI#17824
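The helper named above might look roughly like this; a minimal sketch, assuming the key's metadata carries a `model_max_parallel_requests` mapping of model name to limit (the actual signature in auth_utils.py may differ):

```python
# Hypothetical sketch of get_key_model_max_parallel_requests(); the real
# implementation in auth_utils.py may take different arguments. Assumes the
# key metadata holds a "model_max_parallel_requests" dict of model -> limit.
from typing import Optional


def get_key_model_max_parallel_requests(
    metadata: Optional[dict], model: Optional[str]
) -> Optional[int]:
    """Return the per-model parallel-request limit for this key, if any."""
    if not metadata or not model:
        return None
    model_limits = metadata.get("model_max_parallel_requests") or {}
    limit = model_limits.get(model)
    return int(limit) if limit is not None else None


# Example: a key allowing 5 concurrent gpt-4 calls
meta = {"model_max_parallel_requests": {"gpt-4": 5, "gpt-3.5-turbo": 20}}
print(get_key_model_max_parallel_requests(meta, "gpt-4"))     # 5
print(get_key_model_max_parallel_requests(meta, "claude-3"))  # None
```

Returning `None` (rather than raising) for unconfigured models lets the limiter skip the check entirely when no limit applies.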
@peterkc can you add a Loom of key-model max parallel requests working as expected?
JiangJiaWei1103 left a comment:
Overall LGTM. I'll test this integration in our use case. Thanks!
@krrishdholakia Happy to add a Loom demo! A couple of quick clarifications to make sure I cover what you're looking for:

- v3 Limiter Support
- Demo Scope: Want me to also include team-level limits and the priority override (key > team)?
- Port model-level max_parallel_requests support to v3 rate limiter
- Add documentation with mermaid priority flowchart
- Covers key-level, team-level, and priority override patterns
Summary
Implements per-model, per-user, and per-team `max_parallel_requests` limits in the proxy rate limiter, addressing three TODOs in `parallel_request_limiter.py`:

- Model-level: per-model limits via key metadata (`model_max_parallel_requests`) or `model_max_budget`
- User-level: new `user_max_parallel_requests` field on `UserAPIKeyAuth`
- Team-level: new `team_max_parallel_requests` field on `UserAPIKeyAuth`

Changes

- `auth_utils.py`: `get_key_model_max_parallel_requests()` helper
- `_types.py`: `user_max_parallel_requests` and `team_max_parallel_requests` fields
- `parallel_request_limiter.py`: wire model-, user-, and team-level limits
- `parallel_request_limiter_v3.py`: port model-level support to the v3 limiter
- `test_max_parallel_requests.py`: unit tests

Documentation
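The three wiring points in `parallel_request_limiter.py` might apply their limits along these lines; an illustrative sketch only, with all names hypothetical (each scope is checked against its own active-request count, not merged into a single limit):

```python
# Illustrative sketch of the three limit checks described above. The real
# limiter reads counts from the proxy's cache; names here are hypothetical.
from typing import Optional


def check_parallel_limit(scope: str, current_active: int, limit: Optional[int]) -> None:
    """Reject the request if this scope is already at its configured limit."""
    if limit is not None and current_active >= limit:
        raise RuntimeError(f"{scope}: max parallel requests reached ({limit})")


def admit_request(
    key_model_active: int, key_model_limit: Optional[int],
    user_active: int, user_limit: Optional[int],
    team_active: int, team_limit: Optional[int],
) -> None:
    # Mirrors the three TODO sites: model-, user-, and team-level checks.
    check_parallel_limit("key-model", key_model_active, key_model_limit)
    check_parallel_limit("user", user_active, user_limit)
    check_parallel_limit("team", team_active, team_limit)
```

A `None` limit at any level means that level is unconfigured and imposes no restriction.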
📄 Model-Specific Max Parallel Requests: new doc.
Config Examples

Model-level (via API key metadata):

```json
{
  "model_max_parallel_requests": {
    "gpt-4": 5,
    "gpt-3.5-turbo": 20
  }
}
```

User-level: set `user_max_parallel_requests` on the user record.

Team-level: set `team_max_parallel_requests` on the team record.

Test plan
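To make the concurrency behavior above concrete, here is a self-contained sketch of limiting with an in-memory counter, the kind of behavior the unit tests would exercise. The real limiter uses the proxy's cache; the class and key format here are hypothetical:

```python
# Minimal in-memory illustration of parallel-request limiting. A request is
# admitted only while the scope's active count is below its limit; everything
# here (class name, scope-key format) is hypothetical.
from typing import Dict, Optional


class InMemoryParallelLimiter:
    def __init__(self) -> None:
        self.active: Dict[str, int] = {}

    def acquire(self, scope_key: str, limit: Optional[int]) -> bool:
        """Admit the request unless the scope is at its parallel limit."""
        current = self.active.get(scope_key, 0)
        if limit is not None and current >= limit:
            return False
        self.active[scope_key] = current + 1
        return True

    def release(self, scope_key: str) -> None:
        """Decrement the active count when a request finishes."""
        self.active[scope_key] = max(0, self.active.get(scope_key, 0) - 1)


limiter = InMemoryParallelLimiter()
admitted = [limiter.acquire("key123:gpt-4", limit=2) for _ in range(3)]
print(admitted)  # [True, True, False] -- third call exceeds the limit of 2
```

Releasing a slot after a request completes frees capacity for the next caller, which is the property a test plan for this feature would want to assert.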
Closes #17824