feat: add hybridep by hemildesai · Pull Request #1333 · NVIDIA-NeMo/Automodel

hemildesai · 2026-02-19T02:49:52Z

Wandb - https://wandb.ai/Nemo-automodel/automodel-moe-dispatcher

This PR adds HybridEP support for MoE token dispatch and updates dependency/docker wiring needed to run it reliably.

Changelog

Add HybridEP backend support to MoE flex token dispatch:
- Introduce _HybridEPManager in nemo_automodel/components/moe/megatron/token_dispatcher.py.
- Add moe_flex_dispatcher_backend ("deepep" or "hybridep"), plus backend-specific SM settings.
- Add HybridEP preprocessing path that converts top-k indices to multihot routing metadata.
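The preprocessing step in the last bullet can be sketched as follows. This is a minimal pure-Python illustration of turning per-token top-k expert indices into a multihot routing map, not the code from this PR. The function name `topk_to_multihot` and the list-of-lists layout are assumptions made for the example; the actual implementation presumably works on tensors.

```python
from typing import List, Tuple


def topk_to_multihot(
    topk_idx: List[List[int]],
    topk_probs: List[List[float]],
    num_experts: int,
) -> Tuple[List[List[bool]], List[List[float]]]:
    """Hypothetical helper: expand top-k routing into dense per-expert rows.

    topk_idx[t][k]   -> expert chosen in slot k for token t
    topk_probs[t][k] -> routing weight of that choice
    Returns (routing_map, dense_probs), each shaped [num_tokens][num_experts].
    """
    routing_map: List[List[bool]] = []
    dense_probs: List[List[float]] = []
    for idx_row, prob_row in zip(topk_idx, topk_probs):
        mask = [False] * num_experts
        probs = [0.0] * num_experts
        for expert, prob in zip(idx_row, prob_row):
            if not 0 <= expert < num_experts:
                raise ValueError(f"expert index {expert} out of range")
            mask[expert] = True   # token is routed to this expert
            probs[expert] = prob  # keep the routing weight alongside the mask
        routing_map.append(mask)
        dense_probs.append(probs)
    return routing_map, dense_probs


routing_map, dense_probs = topk_to_multihot(
    topk_idx=[[0, 2], [1, 3]],
    topk_probs=[[0.7, 0.3], [0.6, 0.4]],
    num_experts=4,
)
```

Each row of `routing_map` has exactly k entries set, one per selected expert, which is the dense form a dispatcher can consume instead of a list of indices.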
Extend fused all-to-all utilities in nemo_automodel/components/moe/megatron/fused_a2a.py:
- Add set_deepep_num_sms.
- Add HybridEP dispatch/combine autograd wrappers and buffer initialization/reset helpers.
Extend backend config and MoE wiring:
- BackendConfig.dispatcher now accepts "hybridep".
- Add dispatcher_num_sms.
- Treat "hybridep" as valid with te/gmm experts.
- Pass dispatcher backend + SM settings through MoE, GroupedExpertsDeepEP, and GroupedExpertsTE.
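The config rules listed above could look roughly like the sketch below. `BackendConfigSketch` is a hypothetical stand-in, not the real `BackendConfig`. The defaults, the `None` meaning for `dispatcher_num_sms`, and the choice to raise `ValueError` on an invalid combination are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Values named in the changelog; any other options the real config accepts
# are not modeled here.
FLEX_DISPATCHERS = ("deepep", "hybridep")
FLEX_EXPERT_BACKENDS = ("te", "gmm")


@dataclass
class BackendConfigSketch:
    """Hypothetical stand-in for the dispatcher-related config fields."""

    dispatcher: str = "deepep"
    experts: str = "te"
    # Assumption: None means "keep the backend's own default SM count".
    dispatcher_num_sms: Optional[int] = None

    def __post_init__(self) -> None:
        # Flex dispatchers are only treated as valid with te/gmm experts.
        if (
            self.dispatcher in FLEX_DISPATCHERS
            and self.experts not in FLEX_EXPERT_BACKENDS
        ):
            raise ValueError(
                f"dispatcher={self.dispatcher!r} requires experts in "
                f"{FLEX_EXPERT_BACKENDS}, got {self.experts!r}"
            )
        if self.dispatcher_num_sms is not None and self.dispatcher_num_sms <= 0:
            raise ValueError("dispatcher_num_sms must be a positive integer")


cfg = BackendConfigSketch(dispatcher="hybridep", experts="gmm", dispatcher_num_sms=16)
```

Validating in `__post_init__` means a bad dispatcher/experts pairing fails when the config is built rather than deep inside the MoE layer.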
Add unit tests for HybridEP paths:
- tests/unit_tests/moe/test_backend_config.py
- tests/unit_tests/moe/test_experts.py
- tests/unit_tests/moe/test_layers.py
Update dependencies:
- Bump deep_ep to 7febc6e25660af0f54d95dd781ecdcd62265ecca (v1.2.1+7febc6e metadata).
- Add Linux-only override: nvidia-cudnn-cu12==9.19.0.56; sys_platform == 'linux'.
- Update uv.lock accordingly.
Fix Docker update behavior:
- docker/common/update_pyproject_pytorch.sh now removes existing override-dependencies and reinserts docker/common/uv-pytorch.toml under [tool.uv] to avoid duplicate/incorrect override blocks.
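The remove-then-reinsert behavior described for the Docker script can be illustrated with the Python sketch below. The actual change is in a shell script, so this is only a model of the text transformation. The function name `replace_override_block` is hypothetical, and the sketch assumes the inserted file contains a single `override-dependencies = [...]` array whose closing `]` ends a line.

```python
def replace_override_block(pyproject_text: str, override_block: str) -> str:
    """Drop any existing override-dependencies array, then insert
    `override_block` directly under the [tool.uv] header."""
    kept = []
    skipping = False
    for line in pyproject_text.splitlines():
        if not skipping and line.lstrip().startswith("override-dependencies"):
            skipping = True  # start of an existing override array
        if skipping:
            # Assumption: the array ends on the first line that ends with "]".
            if line.rstrip().endswith("]"):
                skipping = False
            continue
        kept.append(line)

    out = []
    inserted = False
    for line in kept:
        out.append(line)
        if not inserted and line.strip() == "[tool.uv]":
            out.extend(override_block.strip("\n").splitlines())
            inserted = True
    if not inserted:
        raise ValueError("no [tool.uv] table found")
    return "\n".join(out) + "\n"


original = (
    "[project]\n"
    'name = "x"\n'
    "\n"
    "[tool.uv]\n"
    "override-dependencies = [\n"
    '  "torch==2.0",\n'
    "]\n"
    "foo = 1\n"
)
block = (
    "override-dependencies = [\n"
    "  \"nvidia-cudnn-cu12==9.19.0.56; sys_platform == 'linux'\",\n"
    "]\n"
)
updated = replace_override_block(original, block)
```

Because the old array is always removed before the new one is inserted, running the transformation twice yields the same file, which is the property that prevents duplicate override blocks.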

Additional Information

Closes #891 (Support HybridEP)

Signed-off-by: Hemil Desai <hemild@nvidia.com>

copy-pr-bot · 2026-02-19T02:49:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hemildesai · 2026-02-19T02:50:48Z

/ok to test 8d0be12


hemildesai · 2026-02-19T07:30:18Z

/ok to test 6233747

feat: add hybridep

8d0be12

Signed-off-by: Hemil Desai <hemild@nvidia.com>

hemildesai requested review from a team, HuiyingLi, ZhiyuLi-Nvidia, adil-a and akoumpa as code owners February 19, 2026 02:49

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 02:51 Inactive

copy-pr-bot bot temporarily deployed to test February 19, 2026 02:51 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci February 19, 2026 03:20 Failure

hemildesai and others added 3 commits February 19, 2026 07:15

fix

bd013be

Signed-off-by: Hemil Desai <hemild@nvidia.com>

fix

0d36c5d

Signed-off-by: Hemil Desai <hemild@nvidia.com>

Update uv lock

6233747

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 07:30 Inactive

copy-pr-bot bot temporarily deployed to test February 19, 2026 07:30 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 07:51 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 08:00 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 08:16 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci February 19, 2026 08:16 Failure

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 08:16 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci February 19, 2026 17:36 Inactive

feat: add hybridep #1333
hemildesai wants to merge 4 commits into main from hemil/hybrid-ep-2