perf: MXFP8 training with fp8_param_gather#1969

Draft
guyueh1 wants to merge 11 commits into NVIDIA-NeMo:main from guyueh1:mxfp8_train

guyueh1 commented Feb 17, 2026

What does this PR do?

Adds support for MXFP8 training with fp8_param_gather=True, which reduces memory usage.
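
As a rough illustration of why an 8-bit parameter format can save memory, the sketch below compares the storage for one tensor in MXFP8 against BF16. It assumes the OCP Microscaling MXFP8 layout (one byte per element plus one shared 8-bit scale per block of 32 elements) and a BF16 baseline of two bytes per element. The helper names `mxfp8_bytes` and `bf16_bytes` are hypothetical and not part of this PR, and the estimate ignores any extra copies (for example, separately scaled transposed data) that a real implementation may keep.

```python
def mxfp8_bytes(num_elements: int, block_size: int = 32) -> int:
    """Bytes for one MXFP8 copy: 1 byte per element + 1 scale byte per block.

    Assumption: blocks of 32 elements with an 8-bit shared scale, per the
    OCP Microscaling format. Hypothetical helper, not code from this PR.
    """
    num_blocks = -(-num_elements // block_size)  # ceiling division
    return num_elements + num_blocks


def bf16_bytes(num_elements: int) -> int:
    """Bytes for the same tensor stored in BF16 (2 bytes per element)."""
    return 2 * num_elements


if __name__ == "__main__":
    n = 4096 * 4096  # illustrative weight-matrix size, chosen arbitrarily
    ratio = mxfp8_bytes(n) / bf16_bytes(n)
    print(f"MXFP8 / BF16 storage ratio: {ratio:.6f}")  # 0.515625
```

Under these assumptions a single MXFP8 copy takes roughly half the space of its BF16 counterpart. The saving actually achieved by this PR depends on which buffers are kept in each format.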

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

guyueh1 and others added 11 commits February 5, 2026 16:48
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: root <root@gpu-254.slurm-workers-slurm.slurm.svc.cluster.local>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: root <root@gpu-462.slurm-workers-slurm.slurm.svc.cluster.local>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
guyueh1 changed the title from "feat: MXFP8 training with fp8_param_gather" to "perf: MXFP8 training with fp8_param_gather" Feb 17, 2026
guyueh1 self-assigned this Feb 17, 2026
guyueh1 added the labels super-v3, deepseek (Related to deepseek 671b), and Performance (Related to improving performance) Feb 17, 2026