
Dsv4 moe flydsl #718

Open
amd-ruitang3 wants to merge 3 commits into main from dsv4_moe_flydsl_

Conversation

@amd-ruitang3
Contributor

Motivation

Use the flydsl MoE implementation; based on zane_moe_flydsl.

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings (May 8, 2026 08:02)

Copilot AI left a comment


Pull request overview

This PR updates DeepSeek-V4’s MoE path to support flydsl-style MoE execution by introducing a gate layout mode (interleaved) and adjusting weight post-processing so the kernel receives the layout it expects. It also prevents incorrect fusion of the shared expert when the routed and shared quant dtypes differ.

Changes:

  • Add flydsl GateMode plumbing and set the DeepSeek-V4 MoE to use INTERLEAVE gating.
  • Disable “fuse shared expert into routed” when the routed and shared quant dtypes don’t match, by explicitly comparing the two layers’ quant configs (a hedged sketch follows this list).
  • Add a gfx950-specific FP4 (per_1x32) weight/scale shuffle branch keyed off gate_mode, and forward gate_mode/swiglu_limit into the fused_moe(...) call (non-modular path).
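
For illustration, a minimal sketch of the fusion-enable check described in the second bullet. The helper `_quant_dtype_of`, the function name `can_fuse_shared_expert`, and the `quant_config.quant_dtype` attribute path are assumptions for readability, not the PR’s actual code:

```python
def _quant_dtype_of(layer):
    # Hypothetical helper: read the quant dtype off a layer's quant config, if any.
    cfg = getattr(layer, "quant_config", None)
    return getattr(cfg, "quant_dtype", None)


def can_fuse_shared_expert(routed_experts, shared_experts) -> bool:
    # Fuse the shared expert into the routed experts only when both sides
    # quantize with the same dtype; a mismatch would make the fused path incorrect.
    routed = _quant_dtype_of(routed_experts)
    return routed is not None and routed == _quant_dtype_of(shared_experts)
```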

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • atom/models/deepseek_v4.py: Sets MoE gate_mode to interleaved and refines shared-expert fusion enablement based on routed/shared quant dtype match.
  • atom/model_ops/moe.py: Adds the flydsl GateMode import, forwards gate_mode/swiglu_limit into the aiter MoE kernel (direct path), and introduces a gfx950+INTERLEAVE FP4 shuffle path.


from aiter.dist.parallel_state import get_tensor_model_parallel_world_size
from aiter.ops.topk import top_k_per_row_decode, top_k_per_row_prefill
from aiter.ops.triton.fp8_mqa_logits import fp8_mqa_logits
from aiter.ops.flydsl.moe_common import GateMode
Comment thread atom/model_ops/moe.py
QuantizationConfig,
get_current_atom_config,
)
from aiter.ops.flydsl.moe_common import GateMode
Comment thread atom/model_ops/moe.py
Comment on lines 1092 to 1099
doweight_stage1=apply_router_weight_on_input,
hidden_pad=self.hidden_pad,
intermediate_pad=self.intermediate_pad,
bias1=layer.w13_bias,
bias2=layer.w2_bias,
swiglu_limit=getattr(layer, "swiglu_limit", 0.0),
gate_mode=getattr(layer, "gate_mode", GateMode.SEPARATED.value),
)
Comment thread atom/model_ops/moe.py
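
Note the getattr defaults in the call above: a layer that never sets these attributes keeps the previous behavior (swiglu_limit of 0.0 and the SEPARATED gate layout). A layer opts into the new path simply by carrying the attributes; a minimal sketch, where the helper `enable_interleaved_gating` and setting the attributes at load time are assumptions about how the model code wires this up (only GateMode and its members come from the diff):

```python
from aiter.ops.flydsl.moe_common import GateMode


def enable_interleaved_gating(layer, swiglu_limit: float = 0.0) -> None:
    # Hypothetical opt-in: mark the layer so the fused_moe call above forwards
    # the interleaved gate layout instead of the SEPARATED default.
    layer.gate_mode = GateMode.INTERLEAVE.value
    layer.swiglu_limit = swiglu_limit  # 0.0 matches the getattr fallback above
```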
Comment on lines +904 to +911
# New on gfx950: FP4 (per_1x32, fp4x2) weights for layers using the interleaved
# gate layout take the a16w4 shuffle path instead of the default shuffle.
elif (
    get_gfx() == "gfx950"
    and self.quant_type == QuantType.per_1x32
    and self.quant_dtype == dtypes.fp4x2
    and not self.use_triton
    and getattr(layer, "gate_mode", None) == GateMode.INTERLEAVE.value
):
    layer.w13_weight.data = shuffle_weight_a16w4(
