@peterkc commented Dec 11, 2025

Summary

Implements per-model, per-user, and per-team max_parallel_requests limits in the proxy rate limiter, addressing three TODOs in parallel_request_limiter.py.

  • Model-level: Configurable via key metadata (model_max_parallel_requests) or model_max_budget
  • User-level: New user_max_parallel_requests field on UserAPIKeyAuth
  • Team-level: New team_max_parallel_requests field on UserAPIKeyAuth

Changes

| File | Change |
| --- | --- |
| auth_utils.py | Added `get_key_model_max_parallel_requests()` helper |
| _types.py | Added `user_max_parallel_requests` and `team_max_parallel_requests` fields |
| parallel_request_limiter.py | Wired all 3 TODO locations (v1 limiter) |
| parallel_request_limiter_v3.py | Added model-level support to v3 limiter |
| test_max_parallel_requests.py | 10 unit tests |
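
A minimal sketch of what the new helper could look like, assuming the limit lives in key metadata under `model_max_parallel_requests` with a `model_max_budget` fallback (the merged implementation in auth_utils.py may differ):

```python
from typing import Optional


def get_key_model_max_parallel_requests(
    metadata: dict, model: Optional[str]
) -> Optional[int]:
    """Resolve a per-model parallel-request limit from key metadata.

    Hedged sketch: argument names and the model_max_budget fallback
    shape are assumptions, not the merged code.
    """
    if not model:
        return None
    # Preferred source: explicit per-model map in key metadata.
    per_model = metadata.get("model_max_parallel_requests") or {}
    if model in per_model:
        return per_model[model]
    # Fallback: a model_max_budget entry may carry its own limit.
    budget = (metadata.get("model_max_budget") or {}).get(model)
    if isinstance(budget, dict):
        return budget.get("max_parallel_requests")
    return None
```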

Documentation

📄 Model-Specific Max Parallel Requests - New doc with:

  • Quick start guide
  • Key, team, and priority override examples
  • Mermaid flowchart showing priority resolution (see the sketch below)
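
As a sketch of the resolution order the flowchart documents (per the discussion below, a key-level model limit wins over the team-level one; the function name is illustrative):

```python
from typing import Optional


def resolve_max_parallel_requests(
    key_limit: Optional[int],
    team_limit: Optional[int],
) -> Optional[int]:
    # Key-level model limit takes priority; fall back to the team limit.
    # None means "no cap configured at this level".
    return key_limit if key_limit is not None else team_limit
```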

Config Examples

Model-level (via API key metadata):

```json
{
  "model_max_parallel_requests": {
    "gpt-4": 5,
    "gpt-3.5-turbo": 20
  }
}
```
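
For example, the metadata above could be attached when issuing a key through the proxy's `/key/generate` endpoint (base URL and master key below are placeholders):

```python
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",  # placeholder proxy URL
    headers={"Authorization": "Bearer sk-1234"},  # placeholder master key
    json={
        "metadata": {
            "model_max_parallel_requests": {"gpt-4": 5, "gpt-3.5-turbo": 20}
        }
    },
)
print(resp.json()["key"])  # the newly issued virtual key
```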

User-level: Set user_max_parallel_requests on the user record
Team-level: Set team_max_parallel_requests on the team record
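
In code, the two new fields land on `UserAPIKeyAuth` (added to _types.py by this PR); a construction sketch with illustrative values:

```python
from litellm.proxy._types import UserAPIKeyAuth

auth = UserAPIKeyAuth(
    api_key="sk-...",               # the virtual key being checked
    max_parallel_requests=100,      # existing key-level cap
    user_max_parallel_requests=10,  # new: per-user cap
    team_max_parallel_requests=50,  # new: per-team cap
)
```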

Test plan

  • Unit tests pass (10/10)
  • v3 limiter support added
  • Documentation added
  • CI passes

Closes #17824 ([Feature]: Support max_parallel_requests in proxy-level parallel request limiter)

- Add `get_key_model_max_parallel_requests()` helper in auth_utils.py
- Add `user_max_parallel_requests` and `team_max_parallel_requests` fields to UserAPIKeyAuth
- Wire model-level limits in parallel_request_limiter.py (line 323 TODO)
- Wire user-level limits in parallel_request_limiter.py (line 366 TODO)
- Wire team-level limits in parallel_request_limiter.py (line 394 TODO)
- Add 10 unit tests for helper function and type fields

Closes BerriAI#17824
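
A hedged sketch of the wiring pattern at those TODO sites, reusing the helper sketched above (function and variable names are hypothetical, and the limiter's counter bookkeeping is omitted):

```python
from typing import Optional

from fastapi import HTTPException

# Helper added by this PR in auth_utils.py; the import path is assumed here.
from litellm.proxy.auth.auth_utils import get_key_model_max_parallel_requests


def check_model_parallel_limit(
    metadata: dict, model: Optional[str], current_parallel_requests: int
) -> None:
    """Raise a 429 if the key's per-model parallel limit is exhausted."""
    limit = get_key_model_max_parallel_requests(metadata or {}, model=model)
    if limit is not None and current_parallel_requests >= limit:
        raise HTTPException(
            status_code=429,
            detail=f"Max parallel requests reached for model {model}",
        )
```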

@peterkc marked this pull request as ready for review December 12, 2025 01:29
@krrishdholakia commented:

@peterkc can you add a loom of key model max parallel requests working as expected?

@JiangJiaWei1103 left a comment:

Overall LGTM. I'll test this integration in our use case. Thanks!

peterkc commented Dec 13, 2025

@krrishdholakia Happy to add a Loom demo! A couple of quick clarifications to make sure I cover what you're looking for:

v3 Limiter Support
While testing, I noticed the PR modifies parallel_request_limiter.py (v1), but the default is v3 (parallel_request_limiter_v3.py). I've drafted a small patch (~15 lines) to port model-level max_parallel_requests to v3 as well. Should I:

  1. Include the v3 port in this PR? (Recommended: ensures the feature works out of the box)
  2. Demo with v1 only using LEGACY_MULTI_INSTANCE_RATE_LIMITING=true?

Demo Scope
Planning to show:

  • Key-level model limits (metadata.model_max_parallel_requests)
  • Model isolation (GPT-4 limit=2 doesn't affect GPT-3.5 limit=5)
  • 429 responses when limits are exceeded

Want me to also include team-level limits and the priority override (key > team)?
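
For the 429 check, a small driver along these lines could exercise the limit (proxy URL, key, and model name are placeholders; assumes a key with gpt-4 capped at 2):

```python
import asyncio

import httpx


async def fire(client: httpx.AsyncClient) -> int:
    resp = await client.post(
        "http://localhost:4000/chat/completions",  # placeholder proxy URL
        headers={"Authorization": "Bearer sk-test-key"},  # placeholder key
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "hi"}],
        },
    )
    return resp.status_code


async def main() -> None:
    # Fire 5 concurrent requests against a limit of 2; the overflow
    # should come back as 429s while the first 2 proceed.
    async with httpx.AsyncClient(timeout=30) as client:
        codes = await asyncio.gather(*(fire(client) for _ in range(5)))
    print(codes)  # expect a mix of 200s and 429s


asyncio.run(main())
```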

- Port model-level max_parallel_requests support to v3 rate limiter
- Add documentation with mermaid priority flowchart
- Covers key-level, team-level, and priority override patterns