optimize cp gdn #128
Code Review
This pull request optimizes the GatedDeltaNet module by introducing a fused THD AlltoAll path (using a single AlltoAll and sequence permutation) when _build_thd_cp_a2a_perm is available, falling back to the per-sequence loop otherwise. Feedback suggests using the local variable cp_size instead of self.cp_size in the fallback path to ensure consistency and avoid potential AttributeErrors.
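The equivalence between the fused path and the per-sequence loop can be illustrated with a minimal sketch on plain Python lists. It rests on an assumption the PR does not confirm: that after the AlltoAll the buffer is laid out rank-major (all of rank 0's tokens, then rank 1's, and so on), with each rank holding a contiguous `len(seq) // cp_size` slice of every sequence. `build_perm` and `reorder_with_loop` are hypothetical names for illustration and are not the PR's `_build_thd_cp_a2a_perm`.

```python
def build_perm(cu_seqlens, cp_size):
    """Indices that gather a rank-major buffer into sequence-major order.

    Assumed layout (hypothetical, not taken from the PR): the buffer holds
    rank 0's tokens for every sequence, then rank 1's, and so on.
    """
    # Per-rank boundaries, mirroring the `cu_seqlens // cp_size` in the diff.
    local = [b // cp_size for b in cu_seqlens]
    tokens_per_rank = local[-1]
    perm = []
    for s in range(len(local) - 1):        # output is ordered by sequence
        for r in range(cp_size):           # then by rank within a sequence
            base = r * tokens_per_rank     # start of rank r's block
            perm.extend(range(base + local[s], base + local[s + 1]))
    return perm


def reorder_with_loop(buf, cu_seqlens, cp_size):
    """Reference path: slice each rank's block sequence by sequence."""
    local = [b // cp_size for b in cu_seqlens]
    tokens_per_rank = local[-1]
    rank_blocks = [
        buf[r * tokens_per_rank:(r + 1) * tokens_per_rank]
        for r in range(cp_size)
    ]
    out = []
    for s in range(len(local) - 1):
        for block in rank_blocks:
            out.extend(block[local[s]:local[s + 1]])
    return out


# Two sequences ("a" with 4 tokens, "b" with 6) split across 2 ranks.
cu_seqlens = [0, 4, 10]
cp_size = 2
buf = ["a0", "a1", "b0", "b1", "b2", "a2", "a3", "b3", "b4", "b5"]

perm = build_perm(cu_seqlens, cp_size)   # [0, 1, 5, 6, 2, 3, 4, 7, 8, 9]
fused = [buf[i] for i in perm]           # list analogue of index_select(0, idx)
looped = reorder_with_loop(buf, cu_seqlens, cp_size)
```

Under these assumptions both paths produce the same sequence-major ordering; the fused variant replaces the nested slicing with one precomputed gather, which is presumably where the speedup comes from.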
        qkvzba = qkvzba.index_select(0, thd_cp_a2a_idx)
    else:
        # Fallback: per-sequence loop
        unpacked_qkvzba = _unpack_sequence(qkvzba, cu_seqlens // self.cp_size, dim=0)
Use the local variable cp_size instead of self.cp_size, both for consistency with the rest of the method and to avoid a potential AttributeError if self.cp_size is not defined on the parent class.
-        unpacked_qkvzba = _unpack_sequence(qkvzba, cu_seqlens // self.cp_size, dim=0)
+        unpacked_qkvzba = _unpack_sequence(qkvzba, cu_seqlens // cp_size, dim=0)
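The failure mode behind this suggestion can be reproduced in isolation. The sketch below is a minimal illustration, assuming a class that receives `cp_size` as a local value and never stores it on the instance; `Module`, `forward_local`, `forward_attr`, and the list-based `unpack_sequence` are hypothetical stand-ins, not the PR's code.

```python
def unpack_sequence(packed, cu_seqlens):
    """Hypothetical stand-in: split a packed list at cumulative boundaries."""
    return [packed[s:e] for s, e in zip(cu_seqlens[:-1], cu_seqlens[1:])]


class Module:
    """Hypothetical module that never sets `cp_size` as an attribute."""

    def forward_local(self, packed, cu_seqlens, cp_size):
        # Uses the local value, so it works regardless of instance state.
        local = [b // cp_size for b in cu_seqlens]
        return unpack_sequence(packed, local)

    def forward_attr(self, packed, cu_seqlens, cp_size):
        # Reads `self.cp_size`, which does not exist on this class.
        local = [b // self.cp_size for b in cu_seqlens]
        return unpack_sequence(packed, local)


m = Module()
ok = m.forward_local(list(range(5)), [0, 4, 10], 2)   # [[0, 1], [2, 3, 4]]

try:
    m.forward_attr(list(range(5)), [0, 4, 10], 2)
    raised = False
except AttributeError:
    raised = True
```

Because the fallback branch only runs when the fused helper is unavailable, an error of this kind could stay hidden in setups that always take the fused path.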
No description provided.