
Conversation

@Wq-dd (Contributor) commented on Dec 31, 2025

No description provided.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Wq-dd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the performance and clarity of Cambricon MLU operations. It integrates a dedicated MLU-optimized RMS normalization, fine-tunes Flash Attention 2 calls for better efficiency, and standardizes the platform's naming convention. Together, these changes make the codebase for MLU-accelerated workloads more optimized and easier to maintain.

Highlights

  • MLU RMS Normalization: Introduced a new MLU-optimized Root Mean Square (RMS) normalization implementation, mlu_rms_norm, to leverage specialized hardware capabilities for improved performance. This includes a new RMSWeightTemplate and integration into the platform's registry.
  • Flash Attention 2 (FA2) Optimization: Optimized existing Flash Attention 2 calls on MLU devices by adjusting sequence-length handling parameters and explicitly setting the compute data type to torch.half for more efficient attention computation (see the sketch after this list).
  • Platform Renaming and Consistency: Refactored the platform identifier from mlu to cambricon_mlu across configuration files, scripts, and internal platform checks to ensure consistent naming and better clarity for MLU-specific components.
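
For the Flash Attention 2 item above, the dtype part of the change can be pictured as follows. This is a generic sketch: run_attention_half and attention_fn are hypothetical stand-ins rather than functions from this repository; only the torch.half casting mirrors the described change.

    import torch

    # Cast Q/K/V to half precision before the MLU flash-attention call,
    # then cast the result back to the model's inference dtype.
    def run_attention_half(q, k, v, attention_fn, infer_dtype=torch.bfloat16):
        out = attention_fn(q.half(), k.half(), v.half())
        return out.to(infer_dtype)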


@Wq-dd requested a review from helloyongyang on December 31, 2025 at 10:38
@gemini-code-assist bot left a comment

Code Review

This pull request adds support for MLU RMS norm, optimizes FlashAttention calls for MLU, and renames the mlu platform to cambricon_mlu. The changes are generally moving in the right direction. However, I've identified a few areas for improvement. There's a significant code duplication issue with the new norm_template.py file, which should be addressed by reusing the existing common template. Additionally, I've pointed out several instances of duplicated logic that can be refactored to enhance code clarity and maintainability. Please see the detailed comments for specific suggestions.

    return DTYPE_MAP[RUNNING_FLAG]


class RMSWeightTemplate(metaclass=ABCMeta):

Severity: high

This RMSWeightTemplate class appears to be a duplicate of the one defined in lightx2v/common/ops/norm/rms_norm_weight.py. To improve maintainability and avoid code duplication, this new file should be removed. The MluRmsNormWeight class in lightx2v_platform/ops/norm/cambricon_mlu/mlu_rms_norm.py should instead import the template from the common location, for example: from lightx2v.common.ops.norm.rms_norm_weight import RMSWeightTemplate.
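
A minimal sketch of that suggestion is below. The file path and the RMSWeightTemplate import come from the comment above; the "tmo" import name and the exact method body are assumptions based on the snippets later in this review.

    # lightx2v_platform/ops/norm/cambricon_mlu/mlu_rms_norm.py -- sketch only
    import torch_mlu_ops as tmo  # assumption: the "tmo" alias used below refers to this package

    # Reuse the shared template instead of redefining it in a new norm_template.py.
    from lightx2v.common.ops.norm.rms_norm_weight import RMSWeightTemplate


    class MluRmsNormWeight(RMSWeightTemplate):
        def apply(self, input_tensor):
            ...  # MLU-specific tmo.fused_rms_norm call, as shown later in this review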

Comment on lines 192 to +195
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

The logic for setting self.attn_rms_type is duplicated across two branches of the if/else statement. This can be simplified to improve readability and reduce redundancy.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)
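
For reference, the refactor resolves to the same defaults as the original branches. A standalone sketch, with a plain dict standing in for self.config:

    config = {"sf_config": True}  # no explicit "rms_type" set
    default_rms_type = "self_forcing" if config.get("sf_config", False) else "sgl-kernel"
    attn_rms_type = config.get("rms_type", default_rms_type)
    print(attn_rms_type)  # -> "self_forcing", matching the old if-branch default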

Comment on lines 229 to +232
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

The logic for setting self.attn_rms_type is duplicated in both the if and else blocks. This can be refactored to be more concise and easier to maintain by determining the default value first and then calling self.config.get once.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)

Comment on lines 396 to +399
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

This if/else block contains duplicated logic for getting the rms_type from the configuration. You can simplify this by defining the default value based on sf_config and then calling config.get only once. This will make the code cleaner and more maintainable.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)

Comment on lines +16 to +41
    def apply(self, input_tensor):
        if self.sensitive_layer_dtype != self.infer_dtype:
            output = tmo.fused_rms_norm(
                input_tensor.float(),
                residual=None,
                gamma=self.weight,
                beta=None,
                bias=None,
                eps=self.eps,
                store_output_before_norm=False,
                quant_scale=None,
                out=None,
            ).to(self.infer_dtype)
        else:
            output = tmo.fused_rms_norm(
                input_tensor,
                residual=None,
                gamma=self.weight,
                beta=None,
                bias=None,
                eps=self.eps,
                store_output_before_norm=False,
                quant_scale=None,
                out=None,
            )
        return output

Severity: medium

The apply method has significant code duplication between the if and else blocks. The call to tmo.fused_rms_norm is almost identical in both cases. This can be refactored to remove redundancy, making the code cleaner and easier to maintain.

    def apply(self, input_tensor):
        input_data = input_tensor
        if self.sensitive_layer_dtype != self.infer_dtype:
            input_data = input_tensor.float()

        output = tmo.fused_rms_norm(
            input_data,
            residual=None,
            gamma=self.weight,
            beta=None,
            bias=None,
            eps=self.eps,
            store_output_before_norm=False,
            quant_scale=None,
            out=None,
        )

        if self.sensitive_layer_dtype != self.infer_dtype:
            output = output.to(self.infer_dtype)

        return output
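
If the two dtype checks in that suggestion feel repetitive, the same behavior also fits behind a single flag. A sketch assuming the same attributes and the same tmo.fused_rms_norm signature as above:

    def apply(self, input_tensor):
        needs_cast = self.sensitive_layer_dtype != self.infer_dtype
        output = tmo.fused_rms_norm(
            input_tensor.float() if needs_cast else input_tensor,
            residual=None,
            gamma=self.weight,
            beta=None,
            bias=None,
            eps=self.eps,
            store_output_before_norm=False,
            quant_scale=None,
            out=None,
        )
        return output.to(self.infer_dtype) if needs_cast else output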

@helloyongyang merged commit 31b27ee into main on Jan 5, 2026
2 checks passed
@helloyongyang deleted the mlu_opt branch on January 5, 2026 at 07:39