Skip to content

fix(megatron): align moe_router_load_balancing_type to single-string Literal (#8480)#9600

Closed
yuchenwang3 wants to merge 1 commit into
modelscope:mainfrom
yuchenwang3:fix/moe-load-balancing-type-annotation
Closed

fix(megatron): align moe_router_load_balancing_type to single-string Literal (#8480)#9600
yuchenwang3 wants to merge 1 commit into
modelscope:mainfrom
yuchenwang3:fix/moe-load-balancing-type-annotation

Conversation

@yuchenwang3

Copy link
Copy Markdown
Contributor

What

Fixes the type inconsistency reported in #8480.

moe_router_load_balancing_type was declared as Optional[List[str]] in swift/megatron/arguments/megatron_args.py, but everywhere else in the stack it is a single string:

  • mcore_bridge ModelConfig.moe_router_load_balancing_type is Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn', 'none'].
  • swift/megatron/trainers/base.py compares it as a string (config.moe_router_load_balancing_type == 'aux_loss', etc.).

Because the arg was a List[str], --moe_router_load_balancing_type none (the documented way to disable load balancing) parsed to ['none'], which then never matches the string comparisons downstream — so the disable path and the metric logging were effectively broken, and the type was inconsistent with ModelConfig.

Change

Make the arg a string Literal matching mcore_bridge's ModelConfig (keeping Optional/None for the "not set → inherit from config.json" default):

moe_router_load_balancing_type: Optional[Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn',
                                                 'none']] = None

Now --moe_router_load_balancing_type none becomes the string 'none' and disables load balancing as documented, and the value is type-consistent with ModelConfig and the string comparisons in base.py.

Note

Megatron-core's own transformer_config.py types this as Union[str, List[str]] (it can take a list of per-something types), but mcore_bridge.ModelConfig — which ms-swift actually feeds — exposes only the single-string form, and ms-swift has no list-handling code for it. So aligning to the single-string Literal matches the real stack behavior and resolves #8480. If exposing the list form is ever wanted, that would be a separate feature (arg + ModelConfig + base.py all need to handle lists).

…Literal

It was declared Optional[List[str]] but mcore_bridge ModelConfig and the
comparisons in trainers/base.py treat it as a single string, so
--moe_router_load_balancing_type none parsed to ['none'] and never matched
downstream (the documented disable path was broken). Match the ModelConfig
Literal. Fixes modelscope#8480.

Signed-off-by: yuchenwang3 <eang333cms@gmail.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refines the type hint for moe_router_load_balancing_type in megatron_args.py to use a specific Literal instead of a generic list of strings. The feedback suggests formatting this long type hint across multiple lines to improve readability and adhere to PEP 8 line length guidelines.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +589 to +590
moe_router_load_balancing_type: Optional[Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn',
'none']] = None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability and adherence to PEP 8 style guidelines for long lines, consider formatting this long Literal type hint across multiple lines. This makes it easier to read and maintain.

Suggested change
moe_router_load_balancing_type: Optional[Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn',
'none']] = None
moe_router_load_balancing_type: Optional[Literal[
'aux_loss',
'seq_aux_loss',
'global_aux_loss',
'sinkhorn',
'none',
]] = None
References
  1. PEP 8 suggests limiting all lines to a maximum of 79 characters (or 99 for some projects) for better readability. Long lines can be broken over multiple lines by wrapping expressions in parentheses, brackets, or braces. (link)

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

@yuchenwang3

Copy link
Copy Markdown
Contributor Author

Thanks @Jintao-Huang — I see #130 handles moe_router_load_balancing_type compatibility on the mcore-bridge side.

This PR is the ms-swift-arg-side of the same issue: the user-facing arg was typed Optional[List[str]], while ModelConfig and trainers/base.py treat it as a single string, so --moe_router_load_balancing_type none parsed to ['none'] and never matched the string comparisons (the documented disable path was broken). If mcore-bridge#130 already normalizes the arg type end-to-end, I'm happy to close this; otherwise it complements #130 on the ms-swift side. Whichever you prefer.

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

Hi! Already fixed in this PR:

#9603

@yuchenwang3

Copy link
Copy Markdown
Contributor Author

Thanks @Jintao-Huang! Keeping the Union[str, List[str]] form and normalizing the single-element list (plus fixing the base.py comparison and the matching moe_aux_loss_coeff list) is cleaner than the collapse-to-string I proposed here — it preserves multi-strategy support, which mine would have dropped. Glad it's fixed in #9603.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

moe_router_load_balancing_type has inconsistent type declarations across different code locations

2 participants