Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_rl/distributed/batched_data_dict.py`:
- Around lines 572-582: the PackedTensor branch currently skips assignment when
shard_ranges is empty, leaving aggregated_shards[shard_idx][k] missing and
causing a downstream KeyError. Fix the `elif isinstance(v, PackedTensor)` block
in batched_data_dict.py to detect empty shard_ranges (or empty packed_slices)
and assign `PackedTensor.empty_like(v)` instead of skipping the key or
attempting a concat/slice on empty lists, so that aggregated_shards always
contains the key with an empty-like PackedTensor representation.
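The suggested fix can be sketched as follows. This is a minimal illustrative stand-in, not the real nemo_rl code: the `PackedTensor` class, its `slices` field, and the `aggregate_key` helper below are invented for the example; only the `empty_like` idea comes from the review comment.

```python
# Hypothetical sketch of the suggested fix. PackedTensor here is a toy
# stand-in for nemo_rl's class, not its actual implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PackedTensor:
    slices: List[list] = field(default_factory=list)

    @classmethod
    def empty_like(cls, other: "PackedTensor") -> "PackedTensor":
        # An empty representation that preserves the type, so downstream
        # lookups of aggregated_shards[shard_idx][k] still succeed.
        return cls(slices=[])

def aggregate_key(v: PackedTensor, shard_ranges: list) -> PackedTensor:
    if not shard_ranges:
        # Previously this case skipped assignment entirely, leaving the
        # key missing and causing a KeyError downstream.
        return PackedTensor.empty_like(v)
    return PackedTensor(slices=[v.slices[i] for i in shard_ranges])

v = PackedTensor(slices=[[1, 2], [3]])
assert aggregate_key(v, []).slices == []      # empty shard: key still valid
assert aggregate_key(v, [1]).slices == [[3]]  # non-empty shard: normal path
```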
What does this PR do?

Optimize `BatchedDataDict.shard_by_batch_size()` by replacing expensive per-chunk concats with single-allocation shard aggregation (a pre-allocated tensor plus a single PackedTensor concat), reducing allocation overhead.

Issues

Closes #2001.
Additional Information
I tried a single concat for both tensors and `PackedTensor`, but the overhead of storing N chunk `torch.Tensor`s before the concat made it slower than the baseline at smaller chunk sizes. Instead, we pre-allocate a per-shard output tensor and fill it with `copy_`.

Performance benchmark
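The pre-allocation idea above can be illustrated with a small sketch. NumPy stands in for torch here, and the chunk shapes and helper names are invented for the example; the PR's actual code operates on `BatchedDataDict` shards, not bare arrays.

```python
# Illustrative sketch: pre-allocate one output buffer per shard and copy
# chunk slices into it, instead of materializing N chunks and concatenating.
import numpy as np

def gather_with_concat(chunks):
    # Baseline approach: concatenation allocates the output and copies
    # every chunk in one library call, but callers must first hold all
    # N chunk arrays alive simultaneously.
    return np.concatenate(chunks, axis=0)

def gather_preallocated(chunks):
    # Optimized approach: allocate the full output once, then copy each
    # chunk into its slice (analogous to torch's tensor.copy_).
    total = sum(c.shape[0] for c in chunks)
    out = np.empty((total,) + chunks[0].shape[1:], dtype=chunks[0].dtype)
    offset = 0
    for c in chunks:
        out[offset:offset + c.shape[0]] = c
        offset += c.shape[0]
    return out

chunks = [np.arange(4.0).reshape(2, 2), np.arange(4.0, 10.0).reshape(3, 2)]
assert np.array_equal(gather_preallocated(chunks), gather_with_concat(chunks))
```

Both paths produce the same result; the pre-allocated variant trades the single `concatenate` call for explicit slice copies into a buffer allocated up front.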
(benchmark script and cuda/cpu result tabs omitted)

Testing
Before your PR is "Ready for review"
Pre checks: