
Conversation

@Wq-dd (Contributor) commented on Dec 31, 2025

No description provided.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Wq-dd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the performance and clarity of Cambricon MLU operations. It integrates a dedicated MLU-optimized RMS normalization, fine-tunes Flash Attention 2 calls for better efficiency, and standardizes the platform's naming convention. Together, these changes make the codebase for MLU-accelerated workloads more optimized and easier to maintain.

Highlights

  • MLU RMS Normalization: Introduced a new MLU-optimized Root Mean Square (RMS) normalization implementation, mlu_rms_norm, to leverage specialized hardware capabilities for improved performance. This includes a new RMSWeightTemplate and integration into the platform's registry.
  • Flash Attention 2 (FA2) Optimization: Optimized existing Flash Attention 2 calls on MLU devices by adjusting sequence-length handling parameters and explicitly setting the compute data type to torch.half for more efficient attention computation (see the sketch after this list).
  • Platform Renaming and Consistency: Refactored the platform identifier from mlu to cambricon_mlu across configuration files, scripts, and internal platform checks to ensure consistent naming and better clarity for MLU-specific components.
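
For the Flash Attention 2 item above, the dtype part of the change can be pictured as follows. This is a generic sketch: run_attention_half and attention_fn are hypothetical stand-ins rather than functions from this repository; only the torch.half casting mirrors the described change.

    import torch

    # Cast Q/K/V to half precision before the MLU flash-attention call,
    # then cast the result back to the model's inference dtype.
    def run_attention_half(q, k, v, attention_fn, infer_dtype=torch.bfloat16):
        out = attention_fn(q.half(), k.half(), v.half())
        return out.to(infer_dtype)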


@Wq-dd requested a review from helloyongyang on December 31, 2025 at 10:38
@gemini-code-assist bot left a comment

Code Review

This pull request adds support for MLU RMS norm, optimizes FlashAttention calls for MLU, and renames the mlu platform to cambricon_mlu. The changes are generally moving in the right direction. However, I've identified a few areas for improvement. There's a significant code duplication issue with the new norm_template.py file, which should be addressed by reusing the existing common template. Additionally, I've pointed out several instances of duplicated logic that can be refactored to enhance code clarity and maintainability. Please see the detailed comments for specific suggestions.

    return DTYPE_MAP[RUNNING_FLAG]


class RMSWeightTemplate(metaclass=ABCMeta):

Severity: high

This RMSWeightTemplate class appears to be a duplicate of the one defined in lightx2v/common/ops/norm/rms_norm_weight.py. To improve maintainability and avoid code duplication, this new file should be removed. The MluRmsNormWeight class in lightx2v_platform/ops/norm/cambricon_mlu/mlu_rms_norm.py should instead import the template from the common location, for example: from lightx2v.common.ops.norm.rms_norm_weight import RMSWeightTemplate.
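
A minimal sketch of that suggestion is below. The file path and the RMSWeightTemplate import come from the comment above; the "tmo" import name and the exact method body are assumptions based on the snippets later in this review.

    # lightx2v_platform/ops/norm/cambricon_mlu/mlu_rms_norm.py -- sketch only
    import torch_mlu_ops as tmo  # assumption: the "tmo" alias used below refers to this package

    # Reuse the shared template instead of redefining it in a new norm_template.py.
    from lightx2v.common.ops.norm.rms_norm_weight import RMSWeightTemplate


    class MluRmsNormWeight(RMSWeightTemplate):
        def apply(self, input_tensor):
            ...  # MLU-specific tmo.fused_rms_norm call, as shown later in this review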

Comment on lines 192 to +195
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

The logic for setting self.attn_rms_type is duplicated across two branches of the if/else statement. This can be simplified to improve readability and reduce redundancy.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)
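
For reference, the refactor resolves to the same defaults as the original branches. A standalone sketch, with a plain dict standing in for self.config:

    config = {"sf_config": True}  # no explicit "rms_type" set
    default_rms_type = "self_forcing" if config.get("sf_config", False) else "sgl-kernel"
    attn_rms_type = config.get("rms_type", default_rms_type)
    print(attn_rms_type)  # -> "self_forcing", matching the old if-branch default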

Comment on lines 229 to +232
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

The logic for setting self.attn_rms_type is duplicated in both the if and else blocks. This can be refactored to be more concise and easier to maintain by determining the default value first and then calling self.config.get once.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)

Comment on lines 396 to +399
        if self.config.get("sf_config", False):
-           self.attn_rms_type = "self_forcing"
+           self.attn_rms_type = self.config.get("rms_type", "self_forcing")
        else:
-           self.attn_rms_type = "sgl-kernel"
+           self.attn_rms_type = self.config.get("rms_type", "sgl-kernel")

Severity: medium

This if/else block contains duplicated logic for getting the rms_type from the configuration. You can simplify this by defining the default value based on sf_config and then calling config.get only once. This will make the code cleaner and more maintainable.

        default_rms_type = "self_forcing" if self.config.get("sf_config", False) else "sgl-kernel"
        self.attn_rms_type = self.config.get("rms_type", default_rms_type)

Comment on lines +16 to +41
    def apply(self, input_tensor):
        if self.sensitive_layer_dtype != self.infer_dtype:
            output = tmo.fused_rms_norm(
                input_tensor.float(),
                residual=None,
                gamma=self.weight,
                beta=None,
                bias=None,
                eps=self.eps,
                store_output_before_norm=False,
                quant_scale=None,
                out=None,
            ).to(self.infer_dtype)
        else:
            output = tmo.fused_rms_norm(
                input_tensor,
                residual=None,
                gamma=self.weight,
                beta=None,
                bias=None,
                eps=self.eps,
                store_output_before_norm=False,
                quant_scale=None,
                out=None,
            )
        return output

Severity: medium

The apply method has significant code duplication between the if and else blocks. The call to tmo.fused_rms_norm is almost identical in both cases. This can be refactored to remove redundancy, making the code cleaner and easier to maintain.

    def apply(self, input_tensor):
        input_data = input_tensor
        if self.sensitive_layer_dtype != self.infer_dtype:
            input_data = input_tensor.float()

        output = tmo.fused_rms_norm(
            input_data,
            residual=None,
            gamma=self.weight,
            beta=None,
            bias=None,
            eps=self.eps,
            store_output_before_norm=False,
            quant_scale=None,
            out=None,
        )

        if self.sensitive_layer_dtype != self.infer_dtype:
            output = output.to(self.infer_dtype)

        return output
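
If the two dtype checks in that suggestion feel repetitive, the same behavior also fits behind a single flag. A sketch assuming the same attributes and the same tmo.fused_rms_norm signature as above:

    def apply(self, input_tensor):
        needs_cast = self.sensitive_layer_dtype != self.infer_dtype
        output = tmo.fused_rms_norm(
            input_tensor.float() if needs_cast else input_tensor,
            residual=None,
            gamma=self.weight,
            beta=None,
            bias=None,
            eps=self.eps,
            store_output_before_norm=False,
            quant_scale=None,
            out=None,
        )
        return output.to(self.infer_dtype) if needs_cast else output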

@helloyongyang merged commit 31b27ee into main on Jan 5, 2026
2 checks passed
@helloyongyang deleted the mlu_opt branch on January 5, 2026 at 07:39