Add TransformerEngine FP8 for image encoder #45
ybgao-nvidia wants to merge 7 commits into reduce-vit-kernel-gaps from
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger a full CI run by default. Instead, it would only run a small and essential subset of CI tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
This PR enables FP8 ViT attention using TransformerEngine with the cuDNN backend.
It makes the following changes:
- Adds `TE_FP8` as a new `AttentionBackendEnum` and registers it in the CUDA platform's supported ViT backends and in Qwen3-VL's allowed backends
- Implements `_forward_te_fp8` in `MMEncoderAttention` using TE's `DotProductAttention` with BSHD format, FP8 autocast via `DelayedScaling`, head-dimension padding (to multiples of 16), and seqlen bucketing to avoid cuDNN graph recompilation (see the first sketch below)
- Adds `_execute_encoder_one_by_one_eager` in `gpu_model_runner.py` because TE does not support the THD format for FP8; each single-sequence THD tensor is reinterpreted as BSHD with B=1, S=T to avoid an expensive layout conversion (see the second sketch below)
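For reference, here is a minimal, self-contained sketch of the FP8 attention path described above. It is not the PR's code: the head count, head dimension, recipe settings, and the `fp8_vit_attention` helper name are illustrative, and it assumes a TransformerEngine version whose `DelayedScaling` recipe exposes `fp8_dpa`, whose `DotProductAttention` accepts BSHD inputs, and which returns a head-merged `[B, S, H * D]` output.

```python
# Minimal sketch (not the PR's code) of FP8 ViT attention via TransformerEngine.
# Assumptions: a recent TE with DelayedScaling(fp8_dpa=True) and BSHD support;
# NUM_HEADS / HEAD_DIM are illustrative placeholders, not Qwen3-VL's real config.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

NUM_HEADS, HEAD_DIM = 16, 80            # example ViT shape
PADDED_DIM = -(-HEAD_DIM // 16) * 16    # pad head dim up to a multiple of 16 for cuDNN FP8

dpa = te.DotProductAttention(
    num_attention_heads=NUM_HEADS,
    kv_channels=PADDED_DIM,
    qkv_format="bshd",                  # [batch, seq, heads, head_dim] layout
    attn_mask_type="no_mask",           # ViT attention is full / bidirectional
)

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, fp8_dpa=True)

def fp8_vit_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: [B, S, H, D] in fp16/bf16; returns [B, S, H, D]."""
    pad = PADDED_DIM - q.shape[-1]
    if pad > 0:
        # Zero-padding the head dimension leaves softmax(QK^T)V unchanged in the
        # original columns, so the result can simply be sliced back afterwards.
        q, k, v = (torch.nn.functional.pad(t, (0, pad)) for t in (q, k, v))
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = dpa(q, k, v)              # assumed head-merged output: [B, S, H * PADDED_DIM]
    out = out.view(out.shape[0], out.shape[1], NUM_HEADS, PADDED_DIM)
    return out[..., :HEAD_DIM]
```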
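A second sketch covers the two shape tricks from the last two bullets: reinterpreting a single packed THD sequence as BSHD with B=1 and S=T (a zero-copy view), and rounding sequence lengths up to fixed buckets so cuDNN does not rebuild its attention graph for every new length. The function names and bucket sizes below are made up for illustration.

```python
import torch

def thd_single_seq_to_bshd(x: torch.Tensor) -> torch.Tensor:
    # x: [T, H, D], one encoder sequence in packed THD layout. Adding a leading
    # batch dim of 1 reinterprets it as BSHD [1, T, H, D] without copying data,
    # which is why the one-by-one eager path avoids a layout conversion.
    return x.unsqueeze(0)

def round_to_bucket(seqlen: int, buckets=(256, 512, 1024, 2048, 4096)) -> int:
    # Round the sequence length up to a fixed bucket so the cuDNN attention graph
    # built for that bucket is reused instead of recompiled per image size.
    # (Bucket values here are illustrative, not the PR's.)
    for b in buckets:
        if seqlen <= b:
            return b
    return seqlen
```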