WIP: Add MoE config support and correctness fixes for GLM MoE training #1210
tyler-griggs wants to merge 3 commits into main from
Conversation
Expose 5 MoE runtime config flags as first-class fields on MegatronConfig:
- moe_token_dispatcher_type (replaces hardcoded "alltoall")
- moe_router_load_balancing_type (replaces hardcoded "none")
- moe_grouped_gemm (enables fused expert computation)
- moe_router_score_function (sigmoid for GLM/DeepSeek-V3 models)
- moe_router_enable_expert_bias (expert bias for load balancing)

Correctness fixes:
- Set config.grad_scale_func = optimizer.scale_loss on the Megatron model config for proper mixed-precision loss scaling (latent bug: a no-op with the bf16 default, but needed for fp16/MoE aux losses)
- Vary the random seed by PP rank (seed + 100 * pp_rank) to decorrelate dropout masks across pipeline stages
- Add pause_generation/resume_generation for non-colocated weight sync to prevent inference engines from seeing partially-updated weights

Adds a GLM4.7-30B-A3B MoE example config and launch script. Adds CPU unit tests for the config fields and the weight sync lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
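The seed decorrelation described above can be sketched as follows (a minimal illustration of the `seed + 100 * pp_rank` scheme from the commit message; the helper name is hypothetical, not the actual SkyRL function):

```python
def seed_for_pp_rank(base_seed: int, pp_rank: int) -> int:
    """Derive a per-pipeline-stage RNG seed (hypothetical helper name).

    Offsetting the base seed by 100 per pipeline-parallel rank gives each
    stage an independent RNG stream, so dropout masks are no longer
    identical across pipeline stages.
    """
    return base_seed + 100 * pp_rank

# Four pipeline stages with base seed 42 get distinct seeds:
print([seed_for_pp_rank(42, r) for r in range(4)])  # [42, 142, 242, 342]
```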
- Register Glm4MoeLiteForCausalLM as a trivial DeepSeekV3Bridge subclass (same approach as slime: the architecture is identical). This allows AutoBridge.from_hf_pretrained() to handle GLM-4.7-Flash models.
- Bump megatron-bridge from 04e370ee (Jan 14) to b058b662 (HEAD, 252 commits). Includes DeepSeek-V3 H100 large-scale config fixes, a num_query_groups mapping fix, a MoE FlexDispatcher backend fix, and memory savings for MoE param_l2_norm.
- Fix the GLM config: correct model ID (zai-org/GLM-4.7-Flash) and correct expert counts (64 routed experts, 4 active per token).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
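As an illustration of the "trivial subclass" idea: since GLM-4.7-Flash shares DeepSeek-V3's architecture, the new bridge adds no mapping logic of its own. The real megatron-bridge registration mechanism (decorator, registry, or otherwise) is not shown here; every name below except `Glm4MoeLiteForCausalLM` and `DeepSeekV3Bridge` is hypothetical.

```python
class DeepSeekV3Bridge:
    """Stand-in for the existing DeepSeek-V3 bridge in megatron-bridge."""


class Glm4MoeLiteBridge(DeepSeekV3Bridge):
    """GLM-4.7-Flash uses DeepSeek-V3's architecture, so the subclass
    inherits all weight-mapping behavior unchanged."""


# Hypothetical architecture-name -> bridge registry, of the kind something
# like AutoBridge.from_hf_pretrained() would consult when reading
# config.architectures from the HF checkpoint.
ARCH_TO_BRIDGE = {
    "DeepseekV3ForCausalLM": DeepSeekV3Bridge,
    "Glm4MoeLiteForCausalLM": Glm4MoeLiteBridge,
}

bridge_cls = ARCH_TO_BRIDGE["Glm4MoeLiteForCausalLM"]
print(bridge_cls.__name__)  # Glm4MoeLiteBridge
```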
Code Review
This pull request introduces significant enhancements for training Mixture-of-Experts (MoE) models with the Megatron backend. The changes are well-structured, including new MoE configuration options, important correctness fixes for mixed-precision training and distributed seeding, and safer weight synchronization for non-colocated setups. The addition of a new example configuration and comprehensive unit tests is commendable. My main feedback is a minor maintainability suggestion for the example shell script to reduce configuration duplication.
trainer.policy.megatron_config.moe_token_dispatcher_type=$MOE_TOKEN_DISPATCHER \
trainer.policy.megatron_config.moe_router_load_balancing_type=$MOE_ROUTER_LB \
trainer.policy.megatron_config.moe_grouped_gemm=$MOE_GROUPED_GEMM \
trainer.policy.megatron_config.moe_router_score_function=$MOE_ROUTER_SCORE_FN \
trainer.policy.megatron_config.moe_router_enable_expert_bias=$MOE_ROUTER_EXPERT_BIAS \
trainer.ref.megatron_config.moe_token_dispatcher_type=$MOE_TOKEN_DISPATCHER \
trainer.ref.megatron_config.moe_router_load_balancing_type=$MOE_ROUTER_LB \
trainer.ref.megatron_config.moe_grouped_gemm=$MOE_GROUPED_GEMM \
trainer.ref.megatron_config.moe_router_score_function=$MOE_ROUTER_SCORE_FN \
trainer.ref.megatron_config.moe_router_enable_expert_bias=$MOE_ROUTER_EXPERT_BIAS \
These MoE configuration parameters are duplicated for both the policy and ref models. This is a maintainability risk, as any changes to the policy's MoE config must be manually mirrored for the reference model. A mismatch could lead to subtle bugs due to model inconsistency.
Consider leveraging Hydra's interpolation, as done in the corresponding YAML configuration (skyrl/train/config/examples/glm4_7_30b_moe_grpo.yaml), to ensure the reference model's configuration is always in sync with the policy model's. If this script is intended to override the YAML, you could remove the trainer.ref.* overrides and let them be inherited from the policy config via interpolation in the base YAML config.
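A sketch of that interpolation pattern (field names are taken from the script above; the actual layout of glm4_7_30b_moe_grpo.yaml may differ):

```yaml
trainer:
  policy:
    megatron_config:
      moe_token_dispatcher_type: alltoall
      moe_router_load_balancing_type: none
      moe_grouped_gemm: true
      moe_router_score_function: sigmoid
      moe_router_enable_expert_bias: true
  ref:
    megatron_config:
      # Interpolation keeps the ref model in lockstep with the policy:
      # a single CLI override of the policy value updates both models.
      moe_token_dispatcher_type: ${trainer.policy.megatron_config.moe_token_dispatcher_type}
      moe_router_load_balancing_type: ${trainer.policy.megatron_config.moe_router_load_balancing_type}
      moe_grouped_gemm: ${trainer.policy.megatron_config.moe_grouped_gemm}
      moe_router_score_function: ${trainer.policy.megatron_config.moe_router_score_function}
      moe_router_enable_expert_bias: ${trainer.policy.megatron_config.moe_router_enable_expert_bias}
```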
@claude review
Set provider.moe_grouped_gemm unconditionally instead of guarding with a truthiness check. The previous `if megatron_config.moe_grouped_gemm:` would skip the assignment when the value is False, making it impossible to override a provider's auto-detected True default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
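A minimal reproduction of the override bug described in this commit (class and field names are simplified stand-ins for the real provider/config objects):

```python
from dataclasses import dataclass


@dataclass
class Provider:
    # The provider auto-detects grouped GEMM as beneficial, defaulting to True.
    moe_grouped_gemm: bool = True


@dataclass
class MegatronConfig:
    moe_grouped_gemm: bool = False  # user explicitly disables grouped GEMM


provider, cfg = Provider(), MegatronConfig()

# Buggy: the truthiness guard skips the assignment when the override is
# False, so the provider's auto-detected True silently survives.
if cfg.moe_grouped_gemm:
    provider.moe_grouped_gemm = cfg.moe_grouped_gemm
assert provider.moe_grouped_gemm is True  # the user's False was dropped

# Fixed: assign unconditionally, so False is an honored override.
provider.moe_grouped_gemm = cfg.moe_grouped_gemm
assert provider.moe_grouped_gemm is False
```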
Summary
Lays the groundwork for training GLM MoE models (30B-350B) on SkyRL with the Megatron backend. Based on a comprehensive gap analysis between SkyRL and slime (see gaps/MASTER_GAP_ANALYSIS.md).

MoE Config Fields (TODO-1)
- New MegatronConfig dataclass fields: moe_token_dispatcher_type, moe_router_load_balancing_type, moe_grouped_gemm, moe_router_score_function, moe_router_enable_expert_bias
- Replaces hardcoded values in megatron_worker.py with config-driven values
- transformer_config_kwargs pass-through

GLM-4.7-Flash Bridge Registration (TODO-2)
- Registers Glm4MoeLiteForCausalLM as a trivial DeepSeekV3Bridge subclass (same approach as slime — the architecture is identical)
- Allows AutoBridge.from_hf_pretrained() to handle GLM-4.7-Flash models

Megatron-Bridge Bump
- Bumps megatron-bridge from 04e370ee (Jan 14) to b058b662 (HEAD, +252 commits)
- Includes num_query_groups mapping fix, MoE FlexDispatcher backend fix, memory savings for MoE param_l2_norm

Correctness Fixes (TODO-3)
- Sets config.grad_scale_func = optimizer.scale_loss for proper mixed-precision loss scaling (latent bug: a no-op with bf16 but needed for fp16/MoE aux losses)
- Varies the random seed by PP rank (seed + 100 * pp_rank) to decorrelate dropout masks across pipeline stages
- Adds pause_generation/resume_generation for non-colocated weight sync to prevent inference engines from seeing partially-updated weights

GLM-4.7-Flash Example Config (TODO-4)
- zai-org/GLM-4.7-Flash (64 routed experts, 4 active per token, ~3B active params)

Remaining Blockers for Full RL Loop
- >=5.0 for the Glm4MoeLiteForCausalLM class (need to test if trust_remote_code=True works on 4.x)
- >=0.14 for GLM inference engine support (only blocks rollout generation, not Megatron training)

GPU Validation Needed
Test plan