
Dsv4 moe flydsl #718

Open
amd-ruitang3 wants to merge 3 commits into main from dsv4_moe_flydsl_

Conversation

@amd-ruitang3
Contributor

Motivation

Use the flydsl MoE implementation; based on zane_moe_flydsl.

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings (May 8, 2026 08:02)

Copilot AI left a comment


Pull request overview

This PR updates DeepSeek-V4’s MoE path to support flydsl-style MoE execution by introducing a gate layout mode (interleaved) and adjusting weight post-processing so the kernel receives the layout it expects. It also prevents incorrect fusion of the shared expert when the routed and shared quant dtypes differ.

Changes:

  • Add flydsl GateMode plumbing and set the DeepSeek-V4 MoE to use INTERLEAVE gating.
  • Disable “fuse shared expert into routed” when the routed and shared quant dtypes don’t match, by explicitly comparing the two layers’ quant configs (a hedged sketch follows this list).
  • Add a gfx950-specific FP4 (per_1x32) weight/scale shuffle branch keyed off gate_mode, and forward gate_mode/swiglu_limit into the fused_moe(...) call (non-modular path).
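
For illustration, a minimal sketch of the fusion-enable check described in the second bullet. The helper `_quant_dtype_of`, the function name `can_fuse_shared_expert`, and the `quant_config.quant_dtype` attribute path are assumptions for readability, not the PR’s actual code:

```python
def _quant_dtype_of(layer):
    # Hypothetical helper: read the quant dtype off a layer's quant config, if any.
    cfg = getattr(layer, "quant_config", None)
    return getattr(cfg, "quant_dtype", None)


def can_fuse_shared_expert(routed_experts, shared_experts) -> bool:
    # Fuse the shared expert into the routed experts only when both sides
    # quantize with the same dtype; a mismatch would make the fused path incorrect.
    routed = _quant_dtype_of(routed_experts)
    return routed is not None and routed == _quant_dtype_of(shared_experts)
```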

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • atom/models/deepseek_v4.py: Sets MoE gate_mode to interleaved and refines shared-expert fusion enablement based on routed/shared quant dtype match.
  • atom/model_ops/moe.py: Adds the flydsl GateMode import, forwards gate_mode/swiglu_limit into the aiter MoE kernel (direct path), and introduces a gfx950+INTERLEAVE FP4 shuffle path.


from aiter.dist.parallel_state import get_tensor_model_parallel_world_size
from aiter.ops.topk import top_k_per_row_decode, top_k_per_row_prefill
from aiter.ops.triton.fp8_mqa_logits import fp8_mqa_logits
from aiter.ops.flydsl.moe_common import GateMode
Comment thread atom/model_ops/moe.py
QuantizationConfig,
get_current_atom_config,
)
from aiter.ops.flydsl.moe_common import GateMode
Comment thread atom/model_ops/moe.py
Comment on lines 1092 to 1099
doweight_stage1=apply_router_weight_on_input,
hidden_pad=self.hidden_pad,
intermediate_pad=self.intermediate_pad,
bias1=layer.w13_bias,
bias2=layer.w2_bias,
swiglu_limit=getattr(layer, "swiglu_limit", 0.0),
gate_mode=getattr(layer, "gate_mode", GateMode.SEPARATED.value),
)
Comment thread atom/model_ops/moe.py
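
Note the getattr defaults in the call above: a layer that never sets these attributes keeps the previous behavior (swiglu_limit of 0.0 and the SEPARATED gate layout). A layer opts into the new path simply by carrying the attributes; a minimal sketch, where the helper `enable_interleaved_gating` and setting the attributes at load time are assumptions about how the model code wires this up (only GateMode and its members come from the diff):

```python
from aiter.ops.flydsl.moe_common import GateMode


def enable_interleaved_gating(layer, swiglu_limit: float = 0.0) -> None:
    # Hypothetical opt-in: mark the layer so the fused_moe call above forwards
    # the interleaved gate layout instead of the SEPARATED default.
    layer.gate_mode = GateMode.INTERLEAVE.value
    layer.swiglu_limit = swiglu_limit  # 0.0 matches the getattr fallback above
```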
Comment on lines +904 to +911
# New on gfx950: FP4 (per_1x32, fp4x2) weights for layers using the interleaved
# gate layout take the a16w4 shuffle path instead of the default shuffle.
elif (
    get_gfx() == "gfx950"
    and self.quant_type == QuantType.per_1x32
    and self.quant_dtype == dtypes.fp4x2
    and not self.use_triton
    and getattr(layer, "gate_mode", None) == GateMode.INTERLEAVE.value
):
    layer.w13_weight.data = shuffle_weight_a16w4(
