fix(megatron): align moe_router_load_balancing_type to single-string Literal (#8480)#9600
Conversation
…Literal It was declared Optional[List[str]] but mcore_bridge ModelConfig and the comparisons in trainers/base.py treat it as a single string, so --moe_router_load_balancing_type none parsed to ['none'] and never matched downstream (the documented disable path was broken). Match the ModelConfig Literal. Fixes modelscope#8480. Signed-off-by: yuchenwang3 <eang333cms@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request refines the type hint for moe_router_load_balancing_type in megatron_args.py to use a specific Literal instead of a generic list of strings. The feedback suggests formatting this long type hint across multiple lines to improve readability and adhere to PEP 8 line length guidelines.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| moe_router_load_balancing_type: Optional[Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn', | ||
| 'none']] = None |
There was a problem hiding this comment.
For better readability and adherence to PEP 8 style guidelines for long lines, consider formatting this long Literal type hint across multiple lines. This makes it easier to read and maintain.
| moe_router_load_balancing_type: Optional[Literal['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn', | |
| 'none']] = None | |
| moe_router_load_balancing_type: Optional[Literal[ | |
| 'aux_loss', | |
| 'seq_aux_loss', | |
| 'global_aux_loss', | |
| 'sinkhorn', | |
| 'none', | |
| ]] = None |
References
- PEP 8 suggests limiting all lines to a maximum of 79 characters (or 99 for some projects) for better readability. Long lines can be broken over multiple lines by wrapping expressions in parentheses, brackets, or braces. (link)
|
Thanks @Jintao-Huang — I see #130 handles This PR is the ms-swift-arg-side of the same issue: the user-facing arg was typed |
|
Hi! Already fixed in this PR: |
|
Thanks @Jintao-Huang! Keeping the |
What
Fixes the type inconsistency reported in #8480.
moe_router_load_balancing_typewas declared asOptional[List[str]]inswift/megatron/arguments/megatron_args.py, but everywhere else in the stack it is a single string:mcore_bridgeModelConfig.moe_router_load_balancing_typeisLiteral['aux_loss', 'seq_aux_loss', 'global_aux_loss', 'sinkhorn', 'none'].swift/megatron/trainers/base.pycompares it as a string (config.moe_router_load_balancing_type == 'aux_loss', etc.).Because the arg was a
List[str],--moe_router_load_balancing_type none(the documented way to disable load balancing) parsed to['none'], which then never matches the string comparisons downstream — so the disable path and the metric logging were effectively broken, and the type was inconsistent withModelConfig.Change
Make the arg a string
Literalmatchingmcore_bridge'sModelConfig(keepingOptional/Nonefor the "not set → inherit fromconfig.json" default):Now
--moe_router_load_balancing_type nonebecomes the string'none'and disables load balancing as documented, and the value is type-consistent withModelConfigand the string comparisons inbase.py.Note
Megatron-core's own
transformer_config.pytypes this asUnion[str, List[str]](it can take a list of per-something types), butmcore_bridge.ModelConfig— which ms-swift actually feeds — exposes only the single-string form, and ms-swift has no list-handling code for it. So aligning to the single-stringLiteralmatches the real stack behavior and resolves #8480. If exposing the list form is ever wanted, that would be a separate feature (arg +ModelConfig+base.pyall need to handle lists).