Treat MambaLayer as an FSDP unit module#4633

Draft
wujingyue wants to merge 6 commits into NVIDIA:main from wujingyue:mamba

Conversation

Contributor

@wujingyue wujingyue commented May 5, 2026

Based on #4717.


copy-pr-bot Bot commented May 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

wujingyue and others added 6 commits May 10, 2026 23:30
Buckets whose parameter group has no FSDP unit (fsdp_unit_id is None) must remain
persistently allocated, because their parameters can be read across
module boundaries (e.g. by fused kernels that bypass nn.Module.forward).
Mark only unit-owned buckets as releasable during reset, and scope the
"EMPTY after reset" assertion to those same buckets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
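
A minimal sketch of the rule described in the commit message above, not the PR's actual code. `Bucket`, `BucketStatus`, and `reset` are hypothetical stand-ins invented for illustration; only the `fsdp_unit_id is None` check and the "EMPTY after reset" wording come from the message itself.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class BucketStatus(Enum):
    EMPTY = auto()  # unsharded buffer has been released
    READY = auto()  # unsharded buffer is allocated


@dataclass
class Bucket:
    # Hypothetical stand-in for a parameter-group bucket.
    name: str
    fsdp_unit_id: Optional[int]  # None means the group belongs to no FSDP unit
    status: BucketStatus = BucketStatus.READY

    @property
    def releasable(self) -> bool:
        # Only unit-owned buckets may be released; the rest stay allocated
        # because their parameters can be read across module boundaries.
        return self.fsdp_unit_id is not None


def reset(buckets: List[Bucket]) -> None:
    # Release only the unit-owned buckets.
    for bucket in buckets:
        if bucket.releasable:
            bucket.status = BucketStatus.EMPTY
    # Scope the "EMPTY after reset" check to those same buckets.
    assert all(
        b.status is BucketStatus.EMPTY for b in buckets if b.releasable
    )


buckets = [Bucket("mamba_layer", fsdp_unit_id=0), Bucket("shared", fsdp_unit_id=None)]
reset(buckets)
```

In this sketch the unit-owned bucket ends up `EMPTY` after `reset`, while the bucket with `fsdp_unit_id=None` stays `READY` and is skipped by the assertion.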
… as an FSDP unit module

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wujingyue wujingyue changed the title from "Minimal repro for Mamba's resharding" to "Treat MambaLayer as an FSDP unit module" May 13, 2026
Member

@cspades cspades left a comment


Tested this (without CG) on my CG branch with a simple HybridModel MEMEMEME, and it works. Were you able to reproduce the original bug, and why does making non-units persistent fix it? Is it basically just that some pointer gets dereferenced after the first reset and we AG the wrong variable? More details would be super nice!
