
4D Parallelism Qwen3.5 MoE #20

Open
tomiock wants to merge 28 commits into main from dev-moe

@tomiock (Member) commented May 2, 2026

Current status:

  • 4D Parallelism (TP+PP+EP+DDP) on MoE models
  • Tested on 8 devices locally
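For reference, here is a minimal sketch of how 8 ranks could be laid out on a 4D (PP, DP, EP, TP) mesh. The degrees (pp=2, dp=1, ep=2, tp=2) and the helper name `build_mesh` are hypothetical and not taken from this branch. With 8 devices, at least one of the four dimensions must have degree 1, since 2·2·2·2 = 16. Ranks are assigned row-major with TP innermost, which is a common convention because it keeps TP peers on adjacent GPUs, but here it is an assumption.

```python
from itertools import product


def build_mesh(dims):
    """Map each rank to mesh coordinates and list the process groups per dimension.

    Hypothetical helper: `dims` is an ordered dict of name -> degree,
    outermost dimension first, so the last entry (here TP) varies fastest.
    """
    names = list(dims)
    # product() varies its last argument fastest, giving a row-major rank order.
    coords = {
        rank: dict(zip(names, c))
        for rank, c in enumerate(product(*(range(dims[n]) for n in names)))
    }
    groups = {}
    for name in names:
        # Ranks that agree on every *other* coordinate form one group along `name`.
        buckets = {}
        for rank, c in coords.items():
            key = tuple(c[n] for n in names if n != name)
            buckets.setdefault(key, []).append(rank)
        groups[name] = sorted(buckets.values())
    return coords, groups


if __name__ == "__main__":
    coords, groups = build_mesh({"pp": 2, "dp": 1, "ep": 2, "tp": 2})
    for name, g in groups.items():
        print(name, g)
```

With these degrees, rank 5 sits at pp=1, ep=0, tp=1, and its TP peer is rank 4. In PyTorch, a similar row-major layout can be requested with `torch.distributed.device_mesh.init_device_mesh("cuda", (2, 1, 2, 2), mesh_dim_names=("pp", "dp", "ep", "tp"))`.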

Missing features:

  • DP and EP should be able to share the mesh
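One common way to let DP and EP share the mesh is to carve the EP groups out of the DP dimension instead of giving EP its own axis. Non-expert gradients are then still all-reduced over the full DP group, while expert gradients are all-reduced only among the ranks that hold the same experts. The sketch below assumes that design, contiguous EP groups, and an even split of experts. `split_dp_for_ep` and `owned_experts` are hypothetical names, not code from this PR.

```python
def split_dp_for_ep(dp_ranks, ep):
    """Fold an EP degree into one DP group (hypothetical helper).

    Returns (ep_groups, expert_dp_groups):
      ep_groups        -- contiguous chunks of `ep` ranks that shard the experts
      expert_dp_groups -- ranks holding the same expert shard, used to
                          all-reduce expert gradients
    """
    if len(dp_ranks) % ep != 0:
        raise ValueError("DP degree must be divisible by the EP degree")
    ep_groups = [dp_ranks[i:i + ep] for i in range(0, len(dp_ranks), ep)]
    # The j-th rank of every EP group owns the same experts.
    expert_dp_groups = [dp_ranks[j::ep] for j in range(ep)]
    return ep_groups, expert_dp_groups


def owned_experts(ep_local_rank, num_experts, ep):
    """Experts owned by one rank, assuming an even, contiguous split."""
    if num_experts % ep != 0:
        raise ValueError("num_experts must be divisible by the EP degree")
    per_rank = num_experts // ep
    start = ep_local_rank * per_rank
    return list(range(start, start + per_rank))


if __name__ == "__main__":
    ep_groups, expert_dp = split_dp_for_ep([0, 1, 2, 3], ep=2)
    print("EP groups:", ep_groups)
    print("expert DP groups:", expert_dp)
    print("experts on EP-local rank 1:", owned_experts(1, num_experts=8, ep=2))
```

With 4 DP ranks and ep=2, non-expert gradients still reduce over [0, 1, 2, 3], while each expert's gradients reduce over only 2 replicas ([0, 2] or [1, 3]).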

Needs fix:

  • Memory fluctuations caused by varying image resolution in the ViT (needs revision)
  • All MoE layers are initialized on all devices
  • Qwen3.5 dense models are broken
  • 1F1B is not correctly implemented because of the microbatching
  • Loading pre-trained MoE weights
  • PP checkpoints are broken
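To help check the 1F1B item, here is a minimal sketch of the standard non-interleaved 1F1B order (the one described for PipeDream-Flush and used in Megatron-LM) for a single stage. Stage `s` of `S` runs `min(S - s - 1, M)` warm-up forwards over its `M` microbatches, then alternates one forward and one backward, then drains the remaining backwards. The function names are hypothetical, and the sketch ignores communication. PyTorch also ships a reference implementation as `torch.distributed.pipelining.Schedule1F1B`.

```python
def one_f_one_b(stage, num_stages, num_microbatches):
    """Return the ("F" | "B", microbatch) order for one pipeline stage (hypothetical helper)."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, fwd, bwd = [], 0, 0
    # Warm-up: fill the pipeline with forwards only.
    for _ in range(warmup):
        ops.append(("F", fwd))
        fwd += 1
    # Steady state: one forward followed by one backward.
    for _ in range(num_microbatches - warmup):
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1
    # Cool-down: drain the backwards of the warm-up microbatches.
    for _ in range(warmup):
        ops.append(("B", bwd))
        bwd += 1
    return ops


def peak_in_flight(ops):
    """Largest number of microbatches whose activations are held at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    for s in range(4):
        ops = one_f_one_b(s, num_stages=4, num_microbatches=8)
        print(s, " ".join(f"{k}{m}" for k, m in ops), "peak:", peak_in_flight(ops))
```

With this order, stage `s` holds activations for at most `min(S - s, M)` microbatches at once, which is the memory property 1F1B is meant to provide.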

tomiock self-assigned this May 2, 2026
