feat: Fix vLLM placement group conflicts in Ray clusters and add local mode… #669

Draft
ffrujeri wants to merge 2 commits into main from
ffrujeri/local-vllm-model-placement-groups-fix

@ffrujeri
Contributor

What does this PR do?

Fixes vLLM placement group conflicts in Ray clusters and adds support for local model paths.

This PR patches vLLM's v1 engine to handle multiple Ray placement groups, preventing crashes when running multiple vLLM instances in the same cluster. It also adds support for using locally stored models instead of always downloading from HuggingFace.

Issues

List issues that this PR closes (syntax):

Changes in this PR

  • vLLM Placement Group Patch: _patch_vllm_placement_group_filter()

    • Filters out placement group node resource keys (e.g., node:IP_group_N_hash)
    • Allows multiple vLLM instances to coexist in the same Ray cluster
    • Adds unique timestamp-based suffixes to avoid placement group name conflicts
    • Includes comprehensive logging for debugging placement group creation
  • Local Model Support: Modified download_model()

    • Checks if model path exists locally before attempting HuggingFace download
    • Skips download step for local models, improving startup time
  • Debugging Infrastructure: _cleanup_stale_placement_groups()

    • Logs existing placement groups when debug=True
    • Helps diagnose placement group conflicts and resource allocation issues
  • Import Updates

    • Added list_placement_groups from ray.util.state for placement group inspection
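
The filtering, naming, and local-path checks listed above can be illustrated with a minimal, dependency-free sketch. This is not the code from the PR: the helper names (`filter_placement_group_keys`, `unique_placement_group_name`, `resolve_model_path`), the regular expression, and the `download` callable are all hypothetical, and the key pattern is inferred only from the `node:IP_group_N_hash` example in the description.

```python
import os
import re
import time
from typing import Callable, Dict, Optional

# Assumed key shape, inferred from the "node:IP_group_N_hash" example;
# the optional bundle index also tolerates a "node:IP_group_hash" form.
_PG_NODE_KEY = re.compile(r"node:.+_group_(?:\d+_)?[0-9a-f]+")


def filter_placement_group_keys(resources: Dict[str, float]) -> Dict[str, float]:
    """Drop placement-group-scoped node keys and keep everything else."""
    return {k: v for k, v in resources.items() if not _PG_NODE_KEY.fullmatch(k)}


def unique_placement_group_name(base: str, now_ns: Optional[int] = None) -> str:
    """Append a timestamp-based suffix so repeated launches get distinct names."""
    stamp = time.time_ns() if now_ns is None else now_ns
    return f"{base}_{stamp}"


def resolve_model_path(model: str, download: Callable[[str], str]) -> str:
    """Return an existing local path as-is; otherwise defer to the downloader."""
    if os.path.exists(model):
        return model
    return download(model)


if __name__ == "__main__":
    # Hypothetical resource map; only the plain node key and GPU should survive.
    resources = {
        "node:10.0.0.1": 1.0,
        "node:10.0.0.1_group_0_ab12cd34": 0.001,
        "GPU": 8.0,
    }
    print(sorted(filter_placement_group_keys(resources)))
    print(unique_placement_group_name("vllm_pg"))
```

Passing the downloader in as a callable keeps the sketch independent of any particular Hugging Face client, and the optional `now_ns` argument makes the suffix deterministic in tests.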

…l support.

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Felipe Vieira Frujeri <ffrujeri@nvidia.com>
@ffrujeri changed the title from "Fix vLLM placement group conflicts in Ray clusters and add local mode…" to "feat: Fix vLLM placement group conflicts in Ray clusters and add local mode…" on Feb 12, 2026