Implement 4over6 NVFP4 recipe #2972
Open: zianglih wants to merge 57 commits into NVIDIA:main from zianglih:4over6
57 commits
19b6b08 Initial implementation (zianglih)
7b0b2d0 Make 4over6 compile time for dequant (zianglih)
1e5b6ad Expand 1d fwd+bwd test (zianglih)
99660fc Refactor (zianglih)
cb2e0a3 Clean up (zianglih)
2c066f9 Clean up (zianglih)
69e8f3a Add gemm test (zianglih)
009e651 Add more tests and fix offload (zianglih)
3153fc3 Fix offload (zianglih)
e31b758 Clean up arg (zianglih)
fcd526c Add more test (zianglih)
100c378 Add more tests (zianglih)
1c9f26b Clean up test (zianglih)
93fe922 Refactor cuh kernel impl (zianglih)
f4e4a4e Further extract (zianglih)
b3f59ee Clean up (zianglih)
31decf9 Add recipe_id (zianglih)
2fa6b8c Fix failing unit tests (zianglih)
7df2db0 Clean up test (zianglih)
ce85be2 Clean up (zianglih)
1b68038 Refactor ref (zianglih)
bb722a3 Update comments and docs (zianglih)
fe18a1e Drop unnecessary test_sanity workaround (zianglih)
522e93e Refactor `QuantizerRole` (zianglih)
782b7ee Allow separate recipe 4over6 config (zianglih)
d9cd12c Support 2d (zianglih)
708c1ec Refactor 2d (zianglih)
4d31f18 Clean up anti pattern (zianglih)
dfc15f2 Enforce 4over6 consistency (zianglih)
9453670 Update comments (zianglih)
6d871da Update docs (zianglih)
f8338e8 Fix test (zianglih)
c9bc921 Drop test_fusible_ops (zianglih)
00ba694 Revert "Drop test_fusible_ops" (zianglih)
3252d4e Refactor test_fusible_ops (zianglih)
3f33c1d Refactor ref and extend cpp test (zianglih)
8607e03 Clean up cpp test (zianglih)
d3dbf34 Minor comment (zianglih)
565f33f Drop doc (zianglih)
54b4da8 Explicit handle conditional smem buffer (zianglih)
fa09200 Further clean up (zianglih)
e57e8be More templates (zianglih)
a1df319 Simplify cpp (zianglih)
21720da Drop write back lifting (zianglih)
b1d073a Add MAE and dedicated fast math env var (zianglih)
0392708 Harden cpp test (zianglih)
0b77a37 Add warning and err fast math coverage (zianglih)
81e579e Fold test case and clean up cpp test (zianglih)
1e311ef Initial 448 vs 256 implementation (zianglih)
38a1c4c Use e4m3 max instead of boolean, more template (zianglih)
3cdd9d9 Add benchmark script and minor optimization (zianglih)
7deba75 Use standalone kernels (zianglih)
93dbf2b Use cp async (zianglih)
8819d12 Add benchmark script (zianglih)
24e417b Minor fix after rebase (zianglih)
472e5b8 Naming consistency (zianglih)
83e2308 Remove 4over6 benchmark (zianglih)
This test is okay, but it would provide much more confidence if the NVFP4 quantization tests compared against a CPU reference impl.
Extended `tests/cpp/operator/test_cast_nvfp4_transpose.cu` coverage in 3bb42b1.
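For readers following the thread, a CPU reference of the kind requested above could look roughly like the sketch below. It is not the code added in 3bb42b1, and it is deliberately simplified: it only rounds values to the FP4 E2M1 grid (ties to even) with one plain floating-point scale per block. It skips the FP8 encoding of the block scale, any global scale, and the 4over6 logic this PR introduces. The names `round_e2m1`, `quantize_block`, and `mean_abs_error` are hypothetical.

```python
import math

# Magnitudes representable in FP4 E2M1, in increasing code order.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_e2m1(v):
    """Round v to the nearest E2M1 value, saturating at 6 and breaking ties to even."""
    mag = min(abs(v), E2M1_GRID[-1])
    best = 0
    for i in range(1, len(E2M1_GRID)):
        d_new = abs(E2M1_GRID[i] - mag)
        d_old = abs(E2M1_GRID[best] - mag)
        # On an exact tie, prefer the even code index (mantissa bit 0).
        if d_new < d_old or (d_new == d_old and i % 2 == 0):
            best = i
    return math.copysign(E2M1_GRID[best], v)


def quantize_block(block):
    """Quantize then dequantize one block with a single scale (assumption: amax maps to 6)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0 for _ in block]
    scale = amax / E2M1_GRID[-1]
    return [round_e2m1(x / scale) * scale for x in block]


def mean_abs_error(block):
    """Mean absolute error between a block and its quantize/dequantize round trip."""
    deq = quantize_block(block)
    return sum(abs(a - b) for a, b in zip(block, deq)) / len(block)
```

A GPU test could then compare kernel output element by element against `quantize_block`, with whatever tolerance the real scale encoding requires.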