
support swiglu a4w4 moe #674

Open
XiaobingSuper wants to merge 5 commits into ROCm:main from XiaobingSuper:xiaobing/swiglu_moe

Conversation

XiaobingSuper (Contributor) commented Apr 30, 2026

Motivation

Add ATOM-side support for GPT-OSS SwiGLU A4W4/MXFP4 MoE weight layout selection.
The AITER PR for GPT-OSS SwiGLU MoE supports both the legacy A16W4-style layout and a generic MXFP4 preshuffled layout. ATOM needs to prepare GPT-OSS MoE weights/scales in the matching layout before dispatching into AITER.

Technical Details

  • Add GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT switch in ATOM.
    • Default is off (0) to preserve the original/legacy path.
    • Set to 1 to use the new generic MXFP4 layout.
  • Add generic MXFP4 scale shuffle helper.
    • Handles scale tensors by flattening the combined [expert, row] axes before e8m0_shuffle.
    • Restores the original scale shape after shuffling.
  • Update GPT-OSS MXFP4 SwiGLU weight loading path.
    • Legacy mode keeps the original A16W4-style shuffle_weight_a16w4 / shuffle_scale_a16w4 path.
    • Generic mode uses shuffle_weights(...) for w13/w2 and the new generic scale shuffle helper.
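The scale-shuffle step described above (flatten the combined [expert, row] axes, shuffle, restore the shape) can be sketched as follows. This is a minimal illustration only: NumPy stands in for torch, and `e8m0_shuffle_stub` is a shape-preserving placeholder for AITER's `fp4_utils.e8m0_shuffle`, whose actual byte permutation is not reproduced here.

```python
import numpy as np

def e8m0_shuffle_stub(scale_2d):
    # Stand-in for aiter's fp4_utils.e8m0_shuffle. The real kernel permutes
    # e8m0 scale bytes into the preshuffled layout the MXFP4 GEMM expects;
    # it preserves the tensor shape, so identity suffices for this sketch.
    return scale_2d

def shuffle_generic_mxfp4_weight_scale(scale):
    # Flatten every leading axis (expert, row, ...) into a single row
    # dimension so the shuffle sees the combined [expert, row] axis,
    # then restore the original shape afterwards.
    rows = 1
    for dim in scale.shape[:-1]:
        rows *= dim
    shuffled = e8m0_shuffle_stub(scale.reshape(rows, scale.shape[-1]))
    return shuffled.reshape(scale.shape)

# Example: per-expert e8m0 scales of shape [experts, rows, cols].
scale = np.arange(8 * 64 * 32, dtype=np.uint8).reshape(8, 64, 32)
out = shuffle_generic_mxfp4_weight_scale(scale)
print(out.shape)  # (8, 64, 32)
```

Because the reshape pair is lossless, the helper is layout-only: element values are untouched and the caller sees the original tensor shape back.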

Test Plan

  • Validate legacy/default path:
    • Run without setting GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT.
    • Confirm original GPT-OSS SwiGLU MXFP4 behavior is preserved.
  • Validate generic path:
    • Run with GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT=1.
    • Confirm ATOM prepares weights/scales in the generic MXFP4 layout expected by AITER.
  • Run GPT-OSS MoE accuracy/performance smoke tests through the ATOM/vLLM benchmark path.
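The two test configurations above differ only in the environment variable. The gate added by this PR treats only the literal string "1" as enabled, so unset or any other value falls back to the legacy path:

```python
import os

def use_generic_swiglu_mxfp4_layout() -> bool:
    # Mirrors _use_generic_swiglu_mxfp4_layout() from this PR: only the
    # exact string "1" enables the generic MXFP4 layout; unset, "0", or
    # any other value (e.g. "true") keeps the legacy A16W4-style path.
    return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

os.environ.pop("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", None)
print(use_generic_swiglu_mxfp4_layout())  # False (legacy/default path)

os.environ["GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT"] = "1"
print(use_generic_swiglu_mxfp4_layout())  # True (generic MXFP4 layout)
```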

Notes

This PR only changes ATOM-side weight/scale/bias preparation. The kernel dispatch and tuned GPT-OSS FlyDSL/CK-Tile behavior are handled in the corresponding AITER PR.

Test Result

Submission Checklist

XiaobingSuper force-pushed the xiaobing/swiglu_moe branch from 629d26c to 31b4b7a (May 6, 2026, 11:57)
XiaobingSuper marked this pull request as ready for review (May 6, 2026, 12:34)
Copilot AI review requested due to automatic review settings (May 6, 2026, 12:34)
XiaobingSuper requested a review from valarLip (May 6, 2026, 12:40)

Copilot AI left a comment


Pull request overview

This PR adds an environment-variable-controlled switch in Mxfp4MoEMethod.process_weights_after_loading() to support an alternate (generic) preshuffle layout for MXFP4 (fp4x2) MoE layers using the SwiGLU activation, while preserving a legacy A16W4-style layout behind a fallback branch.

Changes:

  • Add GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT gate to choose between generic vs legacy SwiGLU shuffle paths.
  • Introduce _shuffle_generic_mxfp4_weight_scale() helper to shuffle MXFP4 weight scales for the generic preshuffle layout.
  • Update the SwiGLU weight/scale processing to use shuffle_weights() + the new scale shuffler when the generic path is enabled.
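Reduced to a runnable sketch, the gating described above looks like the following. The helper names come from the PR; `shuffle_weights` and `shuffle_weight_a16w4` are hypothetical no-op stubs here, standing in for the real ATOM/AITER preshuffle routines:

```python
import os

def shuffle_weights(*tensors):
    # Stand-in for the generic MXFP4 preshuffle routine (no-op stub).
    return tensors

def shuffle_weight_a16w4(w):
    # Stand-in for the legacy A16W4-style shuffle (no-op stub).
    return w

def _use_generic_swiglu_mxfp4_layout() -> bool:
    return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

def process_swiglu_weights(w13, w2):
    # Mirrors the branch added to process_weights_after_loading():
    # generic preshuffle when the env var is "1", legacy path otherwise.
    if _use_generic_swiglu_mxfp4_layout():
        return shuffle_weights(w13, w2)
    return shuffle_weight_a16w4(w13), shuffle_weight_a16w4(w2)
```

With the stubs both branches are value-preserving, so the sketch only demonstrates the control flow, not the actual layout transforms.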


Comment thread: atom/model_ops/moe.py (lines +61 to +63)

    def _use_generic_swiglu_mxfp4_layout() -> bool:
        return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

Comment thread: atom/model_ops/moe.py (lines +894 to +901)

    if _use_generic_swiglu_mxfp4_layout():
        # New GPT-OSS A4W4 Swiglu path: use the same generic preshuffle
        # layout for bf16 and fp4x2 activations.
        shuffle_weights(layer.w13_weight, layer.w2_weight)
        shuffled_w13_scale, shuffled_w2_scale = (
            _shuffle_generic_mxfp4_weight_scale(layer.w13_weight_scale),
            _shuffle_generic_mxfp4_weight_scale(layer.w2_weight_scale),
        )
Comment thread: atom/model_ops/moe.py (lines +72 to +78)

    # Generic preshuffle packs the combined [expert, row] axis, not experts alone.
    rows = 1
    for dim in scale.shape[:-1]:
        rows *= dim
    return fp4_utils.e8m0_shuffle(scale.reshape(rows, scale.shape[-1])).reshape(
        scale.shape
    )

XiaobingSuper force-pushed the xiaobing/swiglu_moe branch from 717a527 to 58fe8a3 (May 7, 2026, 06:06)
Copilot AI review requested due to automatic review settings (May 8, 2026, 01:56)

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread: atom/model_ops/moe.py

    def _use_generic_swiglu_mxfp4_layout() -> bool:
        return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"
Comment thread: atom/model_ops/moe.py (lines +896 to +903)

    if _use_generic_swiglu_mxfp4_layout():
        # New GPT-OSS A4W4 Swiglu path: use the same generic preshuffle
        # layout for bf16 and fp4x2 activations.
        shuffle_weights(layer.w13_weight, layer.w2_weight)
        shuffled_w13_scale, shuffled_w2_scale = (
            _shuffle_generic_mxfp4_weight_scale(layer.w13_weight_scale),
            _shuffle_generic_mxfp4_weight_scale(layer.w2_weight_scale),
        )