
support swiglu a4w4 moe #674

Open
XiaobingSuper wants to merge 5 commits into ROCm:main from XiaobingSuper:xiaobing/swiglu_moe

Conversation

XiaobingSuper (Contributor) commented Apr 30, 2026

Motivation

Add ATOM-side support for GPT-OSS SwiGLU A4W4/MXFP4 MoE weight layout selection.
The AITER PR for GPT-OSS SwiGLU MoE supports both the legacy A16W4-style layout and a generic MXFP4 preshuffled layout. ATOM needs to prepare GPT-OSS MoE weights/scales in the matching layout before dispatching into AITER.

Technical Details

  • Add GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT switch in ATOM.
    • Default is off (0) to preserve the original/legacy path.
    • Set to 1 to use the new generic MXFP4 layout.
  • Add generic MXFP4 scale shuffle helper.
    • Handles scale tensors by flattening the combined [expert, row] axes before e8m0_shuffle.
    • Restores the original scale shape after shuffling.
  • Update GPT-OSS MXFP4 SwiGLU weight loading path.
    • Legacy mode keeps the original A16W4-style shuffle_weight_a16w4 / shuffle_scale_a16w4 path.
    • Generic mode uses shuffle_weights(...) for w13/w2 and the new generic scale shuffle helper.
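The scale-shuffle step described above (flatten the combined [expert, row] axes, shuffle, restore the shape) can be sketched as follows. This is a minimal illustration only: NumPy stands in for torch, and `e8m0_shuffle_stub` is a shape-preserving placeholder for AITER's `fp4_utils.e8m0_shuffle`, whose actual byte permutation is not reproduced here.

```python
import numpy as np

def e8m0_shuffle_stub(scale_2d):
    # Stand-in for aiter's fp4_utils.e8m0_shuffle. The real kernel permutes
    # e8m0 scale bytes into the preshuffled layout the MXFP4 GEMM expects;
    # it preserves the tensor shape, so identity suffices for this sketch.
    return scale_2d

def shuffle_generic_mxfp4_weight_scale(scale):
    # Flatten every leading axis (expert, row, ...) into a single row
    # dimension so the shuffle sees the combined [expert, row] axis,
    # then restore the original shape afterwards.
    rows = 1
    for dim in scale.shape[:-1]:
        rows *= dim
    shuffled = e8m0_shuffle_stub(scale.reshape(rows, scale.shape[-1]))
    return shuffled.reshape(scale.shape)

# Example: per-expert e8m0 scales of shape [experts, rows, cols].
scale = np.arange(8 * 64 * 32, dtype=np.uint8).reshape(8, 64, 32)
out = shuffle_generic_mxfp4_weight_scale(scale)
print(out.shape)  # (8, 64, 32)
```

Because the reshape pair is lossless, the helper is layout-only: element values are untouched and the caller sees the original tensor shape back.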

Test Plan

  • Validate legacy/default path:
    • Run without setting GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT.
    • Confirm original GPT-OSS SwiGLU MXFP4 behavior is preserved.
  • Validate generic path:
    • Run with GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT=1.
    • Confirm ATOM prepares weights/scales in the generic MXFP4 layout expected by AITER.
  • Run GPT-OSS MoE accuracy/performance smoke tests through the ATOM/vLLM benchmark path.
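The two test configurations above differ only in the environment variable. The gate added by this PR treats only the literal string "1" as enabled, so unset or any other value falls back to the legacy path:

```python
import os

def use_generic_swiglu_mxfp4_layout() -> bool:
    # Mirrors _use_generic_swiglu_mxfp4_layout() from this PR: only the
    # exact string "1" enables the generic MXFP4 layout; unset, "0", or
    # any other value (e.g. "true") keeps the legacy A16W4-style path.
    return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

os.environ.pop("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", None)
print(use_generic_swiglu_mxfp4_layout())  # False (legacy/default path)

os.environ["GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT"] = "1"
print(use_generic_swiglu_mxfp4_layout())  # True (generic MXFP4 layout)
```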

Notes

This PR only changes ATOM-side weight/scale/bias preparation. The kernel dispatch and tuned GPT-OSS FlyDSL/CK-Tile behavior are handled in the corresponding AITER PR.

Test Result

Submission Checklist

XiaobingSuper force-pushed the xiaobing/swiglu_moe branch from 629d26c to 31b4b7a (May 6, 2026, 11:57)
XiaobingSuper marked this pull request as ready for review (May 6, 2026, 12:34)
Copilot AI review requested due to automatic review settings (May 6, 2026, 12:34)
XiaobingSuper requested a review from valarLip (May 6, 2026, 12:40)

Copilot AI left a comment


Pull request overview

This PR adds an environment-variable-controlled switch in Mxfp4MoEMethod.process_weights_after_loading() to support an alternate (generic) preshuffle layout for MXFP4 (fp4x2) MoE layers using the SwiGLU activation, while preserving a legacy A16W4-style layout behind a fallback branch.

Changes:

  • Add GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT gate to choose between generic vs legacy SwiGLU shuffle paths.
  • Introduce _shuffle_generic_mxfp4_weight_scale() helper to shuffle MXFP4 weight scales for the generic preshuffle layout.
  • Update the SwiGLU weight/scale processing to use shuffle_weights() + the new scale shuffler when the generic path is enabled.
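Reduced to a runnable sketch, the gating described above looks like the following. The helper names come from the PR; `shuffle_weights` and `shuffle_weight_a16w4` are hypothetical no-op stubs here, standing in for the real ATOM/AITER preshuffle routines:

```python
import os

def shuffle_weights(*tensors):
    # Stand-in for the generic MXFP4 preshuffle routine (no-op stub).
    return tensors

def shuffle_weight_a16w4(w):
    # Stand-in for the legacy A16W4-style shuffle (no-op stub).
    return w

def _use_generic_swiglu_mxfp4_layout() -> bool:
    return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

def process_swiglu_weights(w13, w2):
    # Mirrors the branch added to process_weights_after_loading():
    # generic preshuffle when the env var is "1", legacy path otherwise.
    if _use_generic_swiglu_mxfp4_layout():
        return shuffle_weights(w13, w2)
    return shuffle_weight_a16w4(w13), shuffle_weight_a16w4(w2)
```

With the stubs both branches are value-preserving, so the sketch only demonstrates the control flow, not the actual layout transforms.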


Comment thread: atom/model_ops/moe.py (lines +61 to +63)

    def _use_generic_swiglu_mxfp4_layout() -> bool:
        return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"

Comment thread: atom/model_ops/moe.py (lines +894 to +901)

    if _use_generic_swiglu_mxfp4_layout():
        # New GPT-OSS A4W4 Swiglu path: use the same generic preshuffle
        # layout for bf16 and fp4x2 activations.
        shuffle_weights(layer.w13_weight, layer.w2_weight)
        shuffled_w13_scale, shuffled_w2_scale = (
            _shuffle_generic_mxfp4_weight_scale(layer.w13_weight_scale),
            _shuffle_generic_mxfp4_weight_scale(layer.w2_weight_scale),
        )
Comment thread: atom/model_ops/moe.py (lines +72 to +78)

    # Generic preshuffle packs the combined [expert, row] axis, not experts alone.
    rows = 1
    for dim in scale.shape[:-1]:
        rows *= dim
    return fp4_utils.e8m0_shuffle(scale.reshape(rows, scale.shape[-1])).reshape(
        scale.shape
    )

XiaobingSuper force-pushed the xiaobing/swiglu_moe branch from 717a527 to 58fe8a3 (May 7, 2026, 06:06)
Copilot AI review requested due to automatic review settings (May 8, 2026, 01:56)

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread: atom/model_ops/moe.py

    def _use_generic_swiglu_mxfp4_layout() -> bool:
        return os.environ.get("GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT", "0") == "1"
Comment thread: atom/model_ops/moe.py (lines +896 to +903)

    if _use_generic_swiglu_mxfp4_layout():
        # New GPT-OSS A4W4 Swiglu path: use the same generic preshuffle
        # layout for bf16 and fp4x2 activations.
        shuffle_weights(layer.w13_weight, layer.w2_weight)
        shuffled_w13_scale, shuffled_w2_scale = (
            _shuffle_generic_mxfp4_weight_scale(layer.w13_weight_scale),
            _shuffle_generic_mxfp4_weight_scale(layer.w2_weight_scale),
        )