Conversation
…genrm change) Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>
a1b530f to
4ea3217
Compare
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>
This reverts commit 4e2f0a4. Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
📝 WalkthroughWalkthroughThe PR introduces async GRPO training with Nemo Gym environment integration. It adds configuration options for async training, implements conditional branching between synchronous and asynchronous execution paths, extends rollout utilities to support Nemo Gym environments, enhances error handling for context length issues, and adds comprehensive functional testing. Changes
Sequence Diagram(s)sequenceDiagram
participant Script as Training Script
participant Config as Configuration
participant TrainLogic as Training Logic
participant AsyncRollout as Async Rollout Manager
participant NemoGym as Nemo Gym Environment
participant vLLM as vLLM Worker
Script->>Config: Load async_grpo config
Config-->>Script: async_grpo.enabled status
alt Async GRPO Enabled
Script->>TrainLogic: Invoke async_grpo_train
TrainLogic->>AsyncRollout: Start async rollout generation
AsyncRollout->>NemoGym: Check _should_use_nemo_gym()
alt Use Nemo Gym
AsyncRollout->>NemoGym: run_async_nemo_gym_rollout()
NemoGym->>vLLM: Generate responses
vLLM-->>NemoGym: Response batch
NemoGym-->>AsyncRollout: final_batch + metrics
else Use Multi-turn Rollout
AsyncRollout->>vLLM: run_async_multi_turn_rollout()
vLLM-->>AsyncRollout: Rollout results
end
AsyncRollout-->>TrainLogic: Processed trajectories
else Synchronous GRPO
Script->>TrainLogic: Invoke grpo_train
TrainLogic->>vLLM: Standard training rollout
vLLM-->>TrainLogic: Results
end
TrainLogic-->>Script: Training complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/nemo_gym/run_grpo_nemo_gym.py (1)
237-256: Feature validation could be more robust.The validation logic uses hardcoded branching based on feature name (
use_dynamic_samplingvs. everything else). If a future feature is added tounsupported_featuresthat's a boolean (not a dict withenabled), theelsebranch would raise aTypeErrorinstead of a clearNotImplementedError.Consider a more uniform approach:
♻️ Suggested refactor
- for feature in unsupported_features: - if feature not in config["grpo"]: - continue - - if feature == "use_dynamic_sampling": - if config["grpo"][feature]: - raise NotImplementedError( - f"{feature} is not supported with async GRPO" - ) - else: - if config["grpo"][feature]["enabled"]: - raise NotImplementedError( - f"{feature} is not supported with async GRPO" - ) + for feature in unsupported_features: + if feature not in config["grpo"]: + continue + val = config["grpo"][feature] + is_enabled = val if isinstance(val, bool) else val.get("enabled", False) + if is_enabled: + raise NotImplementedError( + f"{feature} is not supported with async GRPO" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/nemo_gym/run_grpo_nemo_gym.py` around lines 237 - 256, The current validation loop around unsupported_features in run_grpo_nemo_gym.py branches on feature name and assumes non-boolean features are dicts with an "enabled" key; update the loop that iterates over unsupported_features to fetch the value via config["grpo"].get(feature) and normalize it to a boolean (e.g., if it's a dict, use value.get("enabled", False); otherwise coerce with bool(value)); then raise NotImplementedError only when the normalized enabled flag is True. This change centralizes the check and prevents TypeError when future unsupported features are simple booleans.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_rl/algorithms/async_utils.py`:
- Line 654: The variable env_cfg assigned in async_utils.py via env_cfg =
self.master_config.get("env") is unused and should be removed; delete that
assignment (or replace with a deliberately ignored name like _env if you want to
signal intention) so Ruff F841 no longer flags it, leaving any needed logic that
depends on self.master_config.get("env") unchanged elsewhere in the file.
---
Nitpick comments:
In `@examples/nemo_gym/run_grpo_nemo_gym.py`:
- Around line 237-256: The current validation loop around unsupported_features
in run_grpo_nemo_gym.py branches on feature name and assumes non-boolean
features are dicts with an "enabled" key; update the loop that iterates over
unsupported_features to fetch the value via config["grpo"].get(feature) and
normalize it to a boolean (e.g., if it's a dict, use value.get("enabled",
False); otherwise coerce with bool(value)); then raise NotImplementedError only
when the normalized enabled flag is True. This change centralizes the check and
prevents TypeError when future unsupported features are simple booleans.
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>
What does this PR do ?
This PR adds async RL to the gym path and adds a functional test to test that path that tests a slimmed down workspace assistant enviroment.
This PR also makes several QoL logging improvements:
and replace with a single line warning for brevity:
This wasn't an issue for multiturn examples where we hit the model length since there was some intermediate generations, but if the very first one was too long, there wasn't an error instructing users what to do.
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
closes #1928
closes #1948
Usage
# Add a code snippet demonstrating how to use thisBefore your PR is "Ready for review"
Pre checks:
Additional Information