Dsv4 moe flydsl #718
Open
amd-ruitang3 wants to merge 3 commits into main from
Conversation
Pull request overview
This PR updates DeepSeek-V4’s MoE path to support flydsl-style MoE execution by introducing a gate layout mode (interleaved) and adjusting weight post-processing so the expected kernel layout is used, while also preventing incorrect fusion of shared experts when routed/shared quant dtypes differ.
Changes:
- Add flydsl `GateMode` plumbing and set DeepSeek-V4 MoE to use `INTERLEAVE` gating.
- Disable "fuse shared expert into routed" when routed vs. shared quant dtypes don't match, by explicitly comparing layer quant configs.
- Add a gfx950-specific FP4 (`per_1x32`) weight/scale shuffle branch keyed off `gate_mode`, and forward `gate_mode`/`swiglu_limit` into the `fused_moe(...)` call (non-modular path).
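To make the layout change concrete, here is a minimal standalone sketch of what "separated" vs. "interleaved" gate/up packing of a fused `w13` weight could mean. The `GateMode` enum and `split_gate_up` helper below are illustrative stand-ins, not the actual `aiter.ops.flydsl.moe_common` API:

```python
from enum import Enum

class GateMode(Enum):
    # Hypothetical mirror of aiter's GateMode; values are illustrative.
    SEPARATED = 0   # w13 rows = [g0, g1, ..., u0, u1, ...]
    INTERLEAVE = 1  # w13 rows = [g0, u0, g1, u1, ...]

def split_gate_up(w13_rows, mode):
    """Recover the gate and up projections from a fused w13 layout."""
    half = len(w13_rows) // 2
    if mode is GateMode.SEPARATED:
        return w13_rows[:half], w13_rows[half:]
    # Interleaved: even rows are gate, odd rows are up.
    return w13_rows[0::2], w13_rows[1::2]

gate, up = split_gate_up(["g0", "u0", "g1", "u1"], GateMode.INTERLEAVE)
# gate -> ["g0", "g1"], up -> ["u0", "u1"]
```

A kernel that assumes one layout will silently read garbage if fed the other, which is why the weight post-processing must match the `gate_mode` the kernel expects.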
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `atom/models/deepseek_v4.py` | Sets MoE `gate_mode` to interleaved and refines shared-expert fusion enablement based on routed/shared quant dtype match. |
| `atom/model_ops/moe.py` | Adds flydsl `GateMode` import, forwards `gate_mode`/`swiglu_limit` into the aiter MoE kernel (direct path), and introduces a gfx950+INTERLEAVE FP4 shuffle path. |
```python
from aiter.dist.parallel_state import get_tensor_model_parallel_world_size
from aiter.ops.topk import top_k_per_row_decode, top_k_per_row_prefill
from aiter.ops.triton.fp8_mqa_logits import fp8_mqa_logits
from aiter.ops.flydsl.moe_common import GateMode
```

```python
    QuantizationConfig,
    get_current_atom_config,
)
from aiter.ops.flydsl.moe_common import GateMode
```
Comment on lines 1092 to 1099
```python
    doweight_stage1=apply_router_weight_on_input,
    hidden_pad=self.hidden_pad,
    intermediate_pad=self.intermediate_pad,
    bias1=layer.w13_bias,
    bias2=layer.w2_bias,
    swiglu_limit=getattr(layer, "swiglu_limit", 0.0),
    gate_mode=getattr(layer, "gate_mode", GateMode.SEPARATED.value),
)
```
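The `getattr(..., default)` pattern in the call above keeps layers that predate the new attributes working: if a layer never had `gate_mode` or `swiglu_limit` set, the kernel still receives sane defaults. A minimal sketch of that behavior, using a stub in place of the real `fused_moe` kernel (the stub and default values are illustrative, not aiter's actual API):

```python
def fused_moe_stub(**kwargs):
    # Hypothetical stand-in for the fused MoE kernel entry point;
    # it just echoes the two kwargs this PR forwards.
    return kwargs["gate_mode"], kwargs["swiglu_limit"]

class LegacyLayer:
    """A layer object created before gate_mode/swiglu_limit existed."""
    pass

layer = LegacyLayer()
gate_mode, limit = fused_moe_stub(
    # Fall back to defaults when the attributes are absent.
    gate_mode=getattr(layer, "gate_mode", "separated"),
    swiglu_limit=getattr(layer, "swiglu_limit", 0.0),
)
# gate_mode -> "separated", limit -> 0.0
```

Layers that do set `layer.gate_mode` (as DeepSeek-V4 now does) override the default transparently.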
Comment on lines +904 to +911
```python
elif (
    get_gfx() == "gfx950"
    and self.quant_type == QuantType.per_1x32
    and self.quant_dtype == dtypes.fp4x2
    and not self.use_triton
    and getattr(layer, "gate_mode", None) == GateMode.INTERLEAVE.value
):
    layer.w13_weight.data = shuffle_weight_a16w4(
```
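The branch above only fires when every condition lines up: gfx950 hardware, `per_1x32` FP4 quantization, the non-Triton path, and an interleaved gate layout. Factoring the guard into a predicate makes the dispatch easier to reason about and test; the sketch below uses plain strings in place of the real `QuantType`/`dtypes`/`GateMode` values, so all names are illustrative:

```python
def use_fp4_interleave_shuffle(gfx, quant_type, quant_dtype, use_triton, gate_mode):
    """Mirror of the PR's gfx950 FP4 shuffle condition (values illustrative)."""
    return (
        gfx == "gfx950"
        and quant_type == "per_1x32"
        and quant_dtype == "fp4x2"
        and not use_triton
        and gate_mode == "interleave"  # layers without gate_mode pass None, which fails here
    )

# Fires only for the exact gfx950 + FP4 + interleave combination:
hit = use_fp4_interleave_shuffle("gfx950", "per_1x32", "fp4x2", False, "interleave")
miss = use_fp4_interleave_shuffle("gfx950", "per_1x32", "fp4x2", False, None)
# hit -> True, miss -> False
```

Note that `getattr(layer, "gate_mode", None)` in the original code means layers that never opted into interleaving fall through to the existing shuffle paths.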
Motivation
Use the flydsl MoE execution path; based on the zane_moe_flydsl branch.
Technical Details
Test Plan
Test Result
Submission Checklist