
Add support for VLMs in vLLM #1099

Open
tengomucho wants to merge 4 commits into main from on-vlm-vllm

Conversation

@tengomucho
Collaborator

What does this PR do?

Enable support for VLMs in vLLM, and add a test to show functionality.

This will also be called by the vLLM integration.
This prepares for VLM support.
Contributor

Copilot AI left a comment


Pull request overview

Adds initial vision-language model (VLM) support to the Optimum Neuron vLLM integration by introducing a multimodal runner path and extending the vLLM service test client/tests to exercise image-conditioned generation.

Changes:

  • Add a VLM-specific vLLM runner that extracts pixel_values from vLLM multimodal features and passes them into Neuron models during prefill.
  • Introduce an OptimumNeuronModelForImageTextToText wrapper for vLLM model loading and forward/prefill APIs.
  • Extend vLLM service fixtures/client and add a new service test for generation with images.
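The first bullet describes extracting `pixel_values` from vLLM's multimodal features and passing them to the Neuron model during prefill. A minimal sketch of that wiring, assuming a per-request multimodal kwargs dict (the names `build_prefill_inputs`, `multimodal_kwargs`, and `prefill_inputs` are illustrative, not the PR's actual API):

```python
# Illustrative sketch: how a VLM runner might merge image features into the
# prefill inputs. Only `pixel_values` is consumed; text-only requests pass
# an empty multimodal dict and get a plain text prefill.

def build_prefill_inputs(input_ids, multimodal_kwargs):
    """Combine text token ids with optional image features for prefill.

    `multimodal_kwargs` stands in for the multimodal feature dict that
    vLLM hands to the model runner for each request.
    """
    prefill_inputs = {"input_ids": input_ids}
    pixel_values = multimodal_kwargs.get("pixel_values")
    if pixel_values is not None:
        # Image-conditioned request: forward the image features as well.
        prefill_inputs["pixel_values"] = pixel_values
    return prefill_inputs
```

Keeping this merge in one helper lets the same prefill path serve both text-only and image-conditioned requests.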

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file (File: Description):
tests/vllm/service/test_vllm_service_generate.py Refactors service fixture creation and adds a VLM image generation service test.
tests/fixtures/llm/vllm_service.py Extends the OpenAI-compatible test client with image-capable chat sampling helpers.
optimum/neuron/vllm/runner.py Adds a VLM runner variant and refactors token generation helpers for reuse.
optimum/neuron/vllm/model_loader.py Adds a VLM model wrapper and introduces a task mapping entry for image-text-to-text.
optimum/neuron/models/inference/backend/modules/decoder/vlm_decoder.py Exposes a reusable prepare_vlm_prefill() for chunked prefill image feature preparation.
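The test client gains image-capable chat helpers. For an OpenAI-compatible server like the one under test, an image-conditioned chat request is a standard chat-completions body whose user message mixes `text` and `image_url` content parts. A small sketch of building such a payload (the helper name `image_chat_payload` is an assumption, not the fixture's actual API):

```python
# Illustrative sketch: request body for an image-conditioned chat completion
# against an OpenAI-compatible endpoint, as a VLM service test might send.

def image_chat_payload(model, prompt, image_url, max_tokens=64):
    """Build a chat-completions body with one text part and one image part."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
```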

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

A custom runner and a model loader class are added to handle
vision-language models (VLMs) served through vLLM.
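Per the review summary, the model loader adds a task mapping entry for image-text-to-text so the right wrapper class is chosen per task. A hedged sketch of that dispatch pattern (the class and dict names here are placeholders, not the PR's actual identifiers):

```python
# Illustrative sketch: task-to-wrapper dispatch in a model loader.
# The classes are stubs standing in for the real wrappers.

class OptimumNeuronModelForCausalLM:
    """Stub for the text-only model wrapper."""

class OptimumNeuronModelForImageTextToText:
    """Stub for the VLM wrapper added by this PR."""

# Hypothetical mapping; the PR adds the image-text-to-text entry.
TASK_TO_MODEL_CLASS = {
    "text-generation": OptimumNeuronModelForCausalLM,
    "image-text-to-text": OptimumNeuronModelForImageTextToText,
}

def get_model_class(task):
    """Resolve the wrapper class for a vLLM task, failing loudly otherwise."""
    try:
        return TASK_TO_MODEL_CLASS[task]
    except KeyError:
        raise ValueError(f"Unsupported task: {task!r}") from None
```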
