
Add support for VLMs in vLLM #1099

Open
tengomucho wants to merge 4 commits into main from on-vlm-vllm

Conversation

@tengomucho
Collaborator

What does this PR do?

Enable support for VLMs in vLLM, and add a test to show functionality.

This will also be called by the vLLM integration.
This prepares for VLM support.
Contributor

Copilot AI left a comment


Pull request overview

Adds initial vision-language model (VLM) support to the Optimum Neuron vLLM integration by introducing a multimodal runner path and extending the vLLM service test client/tests to exercise image-conditioned generation.

Changes:

  • Add a VLM-specific vLLM runner that extracts pixel_values from vLLM multimodal features and passes them into Neuron models during prefill.
  • Introduce an OptimumNeuronModelForImageTextToText wrapper for vLLM model loading and forward/prefill APIs.
  • Extend vLLM service fixtures/client and add a new service test for generation with images.
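The first bullet describes extracting `pixel_values` from vLLM's multimodal features and passing them to the Neuron model during prefill. A minimal sketch of that wiring, assuming a per-request multimodal kwargs dict (the names `build_prefill_inputs`, `multimodal_kwargs`, and `prefill_inputs` are illustrative, not the PR's actual API):

```python
# Illustrative sketch: how a VLM runner might merge image features into the
# prefill inputs. Only `pixel_values` is consumed; text-only requests pass
# an empty multimodal dict and get a plain text prefill.

def build_prefill_inputs(input_ids, multimodal_kwargs):
    """Combine text token ids with optional image features for prefill.

    `multimodal_kwargs` stands in for the multimodal feature dict that
    vLLM hands to the model runner for each request.
    """
    prefill_inputs = {"input_ids": input_ids}
    pixel_values = multimodal_kwargs.get("pixel_values")
    if pixel_values is not None:
        # Image-conditioned request: forward the image features as well.
        prefill_inputs["pixel_values"] = pixel_values
    return prefill_inputs
```

Keeping this merge in one helper lets the same prefill path serve both text-only and image-conditioned requests.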

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file (File: Description):
tests/vllm/service/test_vllm_service_generate.py Refactors service fixture creation and adds a VLM image generation service test.
tests/fixtures/llm/vllm_service.py Extends the OpenAI-compatible test client with image-capable chat sampling helpers.
optimum/neuron/vllm/runner.py Adds a VLM runner variant and refactors token generation helpers for reuse.
optimum/neuron/vllm/model_loader.py Adds a VLM model wrapper and introduces a task mapping entry for image-text-to-text.
optimum/neuron/models/inference/backend/modules/decoder/vlm_decoder.py Exposes a reusable prepare_vlm_prefill() for chunked prefill image feature preparation.
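The test client gains image-capable chat helpers. For an OpenAI-compatible server like the one under test, an image-conditioned chat request is a standard chat-completions body whose user message mixes `text` and `image_url` content parts. A small sketch of building such a payload (the helper name `image_chat_payload` is an assumption, not the fixture's actual API):

```python
# Illustrative sketch: request body for an image-conditioned chat completion
# against an OpenAI-compatible endpoint, as a VLM service test might send.

def image_chat_payload(model, prompt, image_url, max_tokens=64):
    """Build a chat-completions body with one text part and one image part."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
```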

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

A custom runner and a model loader class are added to handle
vision-language models (VLMs) served through vLLM.
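Per the review summary, the model loader adds a task mapping entry for image-text-to-text so the right wrapper class is chosen per task. A hedged sketch of that dispatch pattern (the class and dict names here are placeholders, not the PR's actual identifiers):

```python
# Illustrative sketch: task-to-wrapper dispatch in a model loader.
# The classes are stubs standing in for the real wrappers.

class OptimumNeuronModelForCausalLM:
    """Stub for the text-only model wrapper."""

class OptimumNeuronModelForImageTextToText:
    """Stub for the VLM wrapper added by this PR."""

# Hypothetical mapping; the PR adds the image-text-to-text entry.
TASK_TO_MODEL_CLASS = {
    "text-generation": OptimumNeuronModelForCausalLM,
    "image-text-to-text": OptimumNeuronModelForImageTextToText,
}

def get_model_class(task):
    """Resolve the wrapper class for a vLLM task, failing loudly otherwise."""
    try:
        return TASK_TO_MODEL_CLASS[task]
    except KeyError:
        raise ValueError(f"Unsupported task: {task!r}") from None
```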
