Update GatherBlockQuantized to support 2-bits #28530
Open
HectorSVC wants to merge 5 commits into
Contributor
Pull request overview
This PR extends GatherBlockQuantized to support 2-bit uint8-packed data, updating schema/docs, CPU/WebGPU implementations, and tests.
Changes:
- Adds 2-bit packing/dequantization support for CPU and WebGPU paths.
- Updates contrib operator schema and generated docs to include 2-bit uint8 support.
- Adds CPU and WebGPU tests for 2-bit uint8 no-zero-point cases.
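As a rough illustration of the 2-bit unpacking and dequantization that the first bullet refers to, here is a minimal sketch. It is not the code from this PR: `Extract2Bit` and `Dequantize2Bit` are hypothetical helper names, and the bit order (four values per byte, first value in the lowest-order bits) is an assumption. The zero point is passed in explicitly rather than assuming any particular default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the ONNX Runtime implementation.
// Extracts the 2-bit value at logical index `i` from uint8-packed data.
// Assumption: four values per byte, first value in the lowest-order bits.
inline uint8_t Extract2Bit(const std::vector<uint8_t>& packed, size_t i) {
  assert(i / 4 < packed.size());
  const unsigned shift = static_cast<unsigned>(i % 4) * 2;
  return static_cast<uint8_t>((packed[i / 4] >> shift) & 0x3);
}

// Dequantizes `count` values that share one scale and one zero point:
// out[i] = (q[i] - zero_point) * scale.
inline std::vector<float> Dequantize2Bit(const std::vector<uint8_t>& packed,
                                         size_t count, float scale,
                                         int zero_point) {
  std::vector<float> out(count);
  for (size_t i = 0; i < count; ++i) {
    const int q = static_cast<int>(Extract2Bit(packed, i));
    out[i] = static_cast<float>(q - zero_point) * scale;
  }
  return out;
}
```

Under these assumptions, the byte `0xE4` (binary `11100100`) holds the values 0, 1, 2, 3 in that order, so one byte carries four quantized elements.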
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/contrib_ops/cpu/quantization/gather_block_quantized.cc | Adds CPU 2-bit uint8 extraction and default zero-point handling. |
| onnxruntime/contrib_ops/webgpu/quantization/gather_block_quantized.cc | Adds WebGPU shader logic and shape handling for 2-bit packed uint8 data. |
| onnxruntime/contrib_ops/webgpu/quantization/gather_block_quantized.h | Expands WebGPU bits validation to include 2. |
| onnxruntime/core/graph/contrib_ops/contrib_defs.cc | Updates schema documentation and zero-point shape inference for packed components. |
| docs/ContribOperators.md | Updates public operator documentation for 2-bit support. |
| onnxruntime/test/contrib_ops/gather_block_quantized_op_test.cc | Adds 2-bit packing helper logic and new CPU/WebGPU test cases. |
tianleiwu
reviewed
May 20, 2026
Contributor
Review Summary
The core zero-point indexing issues from the prior review iteration have been properly fixed in both CPU and WebGPU paths. The 2-bit extraction logic and the scale_row / q_in_row decomposition for zero-point addressing are correct.
Two suggestions:
- Inconsistent `is_int8` vs `is_uint8` guards: The 2-bit code paths in `ComputeInternal` mix `is_int8` (which covers both INT8 and UINT8) and `is_uint8` inconsistently. Since `bits==2` is only valid for uint8, using `is_uint8` throughout would make the intent clearer.
- Missing test coverage for 2-bit zero_points: Both new tests only exercise the default-zero-point path. The 2-bit zero-point unpacking logic (the most complex new code) has no dedicated test.
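To make the zero-point addressing concern concrete, the sketch below shows one way a `scale_row` / `q_in_row` decomposition can locate a packed 2-bit zero point. It is an illustration under stated assumptions, not the PR's code: `ZeroPointAt` is a hypothetical name, and the layout (zero points packed four per byte along the last axis, each row padded to a whole byte, first value in the lowest-order bits) is assumed rather than taken from the schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the ONNX Runtime implementation.
// Returns the 2-bit zero point that pairs with the scale at flat index
// `scale_index`, where scales have `blocks_per_row` entries per row.
// Assumed layout: four zero points per byte, rows padded to whole bytes.
inline uint8_t ZeroPointAt(const std::vector<uint8_t>& packed_zp,
                           size_t scale_index, size_t blocks_per_row) {
  assert(blocks_per_row > 0);
  const size_t bytes_per_row = (blocks_per_row + 3) / 4;  // ceil division
  const size_t scale_row = scale_index / blocks_per_row;  // which row
  const size_t q_in_row = scale_index % blocks_per_row;   // position in row
  const size_t byte_index = scale_row * bytes_per_row + q_in_row / 4;
  assert(byte_index < packed_zp.size());
  const unsigned shift = static_cast<unsigned>(q_in_row % 4) * 2;
  return static_cast<uint8_t>((packed_zp[byte_index] >> shift) & 0x3);
}
```

With `blocks_per_row = 3`, each row occupies one padded byte, so a naive `scale_index / 4` lookup would read the wrong byte for the second row. A dedicated test with a row length that is not a multiple of four would exercise exactly this case.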
Description
Update GatherBlockQuantized to support 2-bit quantized data.
Updated the op schema and implemented 2-bit support in the CPU and WebGPU EPs.
2-bit data helps make the model smaller.