
Register GLM-4.7-Flash bridge and bump megatron-bridge#1214

Draft
tyler-griggs wants to merge 1 commit into main from tgriggs/glm47-bridge-registration


Conversation

@tyler-griggs
Member

@tyler-griggs tyler-griggs commented Feb 25, 2026

Summary

Enables Megatron backend support for GLM-4.7-Flash (zai-org/GLM-4.7-Flash) by registering its architecture with AutoBridge and bumping megatron-bridge to pick up 252 commits of fixes.

Bridge registration

  • Registers Glm4MoeLiteForCausalLM as a trivial DeepSeekV3Bridge subclass
  • GLM-4.7-Flash shares DeepSeek-V3's architecture (MLA + MoE), so the same bridge handles all weight conversion
  • Even the latest upstream Megatron-Bridge HEAD does not register this model type

megatron-bridge bump

  • 04e370ee (Jan 14 2026) → b058b662 (HEAD, +252 commits)
  • Key fixes: DeepSeek-V3 H100 large-scale config, num_query_groups mapping, MoE FlexDispatcher backend, memory savings for MoE param_l2_norm


Register Glm4MoeLiteForCausalLM as a trivial DeepSeekV3Bridge subclass.
GLM-4.7-Flash (zai-org/GLM-4.7-Flash) uses the identical architecture
as DeepSeek-V3 (MLA + MoE), so the same bridge handles weight conversion.
Even the latest upstream Megatron-Bridge does not register this model.

Bump megatron-bridge from 04e370ee (Jan 14) to b058b662 (HEAD, +252
commits). Key fixes included:
- DeepSeek-V3 H100 large-scale config fix
- DeepSeek num_query_groups mapping correction
- MoE FlexDispatcher backend fix
- Memory savings for MoE param_l2_norm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
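The commit relies on a decorator-based registry: a source architecture string maps to a bridge class, and a trivial subclass inherits all of the parent's conversion logic. The following is a minimal self-contained sketch of that pattern only; the registry, decorator, and class names here are illustrative stand-ins, not the actual megatron-bridge API:

```python
# Toy sketch of the decorator-based bridge-registry pattern.
# BRIDGE_REGISTRY and register_bridge are illustrative, not megatron-bridge APIs.
BRIDGE_REGISTRY = {}

def register_bridge(source, model_type):
    """Map a HuggingFace architecture name to a bridge class."""
    def decorator(cls):
        BRIDGE_REGISTRY[source] = {"bridge": cls, "model_type": model_type}
        return cls
    return decorator

class DeepSeekV3Bridge:
    """Stand-in for the real DeepSeek-V3 weight-conversion bridge."""

# GLM-4.7-Flash shares DeepSeek-V3's MLA + MoE layout, so an empty
# subclass reuses all of the parent's conversion logic.
@register_bridge(source="Glm4MoeLiteForCausalLM", model_type="glm4_moe_lite")
class GLM47FlashBridge(DeepSeekV3Bridge):
    pass

print(BRIDGE_REGISTRY["Glm4MoeLiteForCausalLM"]["model_type"])  # glm4_moe_lite
```

This is why the PR's registered class body is just `pass`: registering the subclass under the new architecture name is the entire change.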
@tyler-griggs tyler-griggs force-pushed the tgriggs/glm47-bridge-registration branch from 04bd877 to d4e5ee9 Compare February 25, 2026 18:50
@tyler-griggs tyler-griggs marked this pull request as ready for review February 26, 2026 00:04
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables support for the GLM-4.7-Flash model by registering its architecture with AutoBridge and updating the megatron-bridge dependency to a more recent commit. The changes are clear and well-justified. The registration of Glm4MoeLiteForCausalLM by subclassing DeepSeekV3Bridge is a clean approach, leveraging the architectural similarities. The dependency bump incorporates numerous fixes from upstream. I have one suggestion regarding long-term maintainability of the model registration.

Comment on lines +51 to +68
try:
    from megatron.bridge.models.conversion.model_bridge import MegatronModelBridge
    from megatron.bridge.models.deepseek.deepseek_v3_bridge import DeepSeekV3Bridge
    from megatron.bridge.models.mla_provider import MLAModelProvider
    from megatron.core.models.gpt.gpt_model import GPTModel

    @MegatronModelBridge.register_bridge(
        source="Glm4MoeLiteForCausalLM",
        target=GPTModel,
        provider=MLAModelProvider,
        model_type="glm4_moe_lite",
    )
    class _GLM47FlashBridge(DeepSeekV3Bridge):
        pass

except ImportError:
    pass  # megatron-bridge not installed (e.g. CPU-only environment)

Contributor


medium

This runtime registration is a good way to add support for the new model. To improve long-term maintainability and ensure this registration is available to the wider community, consider contributing this change upstream to the NVIDIA-NeMo/Megatron-Bridge repository. This would remove the need for this local patch in the future.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


@tyler-griggs tyler-griggs marked this pull request as draft February 26, 2026 00:46
