
feat: async grpo + nemo gym #1985

Open
terrykong wants to merge 6 commits into main from tk/async-gym

Conversation


@terrykong terrykong commented Feb 18, 2026

What does this PR do?

This PR adds async RL to the gym path and adds a functional test that exercises that path with a slimmed-down workspace-assistant environment.

This PR also makes several QoL logging improvements:

  1. Suppress the numerous `strict` field complaints (see the illustrative log-filter sketch below this list):
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
... (the same warning repeats for every request)
  2. Suppress the verbose max sequence-length tracebacks:
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     ) = await self._preprocess_chat(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/workspaces/nemo-rl/nemo_rl/models/generation/vllm/vllm_worker_async.py", line 374, in _preprocess_chat
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     res = await super()._preprocess_chat(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1133, in _preprocess_chat
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     prompt_inputs = await self._tokenize_prompt_input_async(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 990, in _tokenize_prompt_input_async
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     async for result in self._tokenize_prompt_inputs_async(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1011, in _tokenize_prompt_inputs_async
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     yield await self._normalize_prompt_text_to_input(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 881, in _normalize_prompt_text_to_input
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     return self._validate_input(request, input_ids, input_text)
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 962, in _validate_input
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     raise ValueError(

and replace them with a single-line warning for brevity:

(VllmAsyncGenerationWorker pid=2143685) WARNING:nemo_rl.models.generation.vllm.vllm_worker_async:vllm_worker_async.py:392: Prompt exceeds max_model_len: This model's maximum context length is 512 tokens. However, your request has 4202 input tokens. Please reduce the length of the input messages.
  3. Add an error message explaining exactly what happens when vLLM's max model length is shorter than the very first prompt. In that case vLLM returns nothing, Gym then returns an empty response, and this previously crashed RL. Now it prints:
raise ValueError(
    f"NeMo Gym returned a result with no generation data. "
    f"This typically means the prompt for the first turn already exceeds the vLLM max_model_len, "
    f"so vLLM rejected the request before any tokens could be generated.\n"
    f"  Prompt length: {len(prompt_token_ids)} tokens.\n"
    f"  → Fix: increase `policy.max_total_sequence_length` and `policy.generation.vllm_cfg.max_model_len` "
    f"to a value larger than {len(prompt_token_ids)}."
)

This wasn't an issue for multi-turn examples that hit the model length limit, since there were already some intermediate generations; but when the very first prompt was too long, there was no error instructing users what to do.
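
Suppression of this kind is typically implemented with a Python logging filter. The sketch below is illustrative only and is not the exact filter this PR adds to vllm_worker_async.py; in particular, the logger name it attaches to is an assumption:

import logging


class DropIgnoredStrictFieldWarning(logging.Filter):
    """Drop vLLM's repeated per-request 'strict' field warnings (illustrative only)."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False suppresses the record; every other record passes through.
        return "ignored: {'strict'}" not in record.getMessage()


# Assumed logger name: vLLM's OpenAI protocol module logs under the "vllm" namespace.
logging.getLogger("vllm.entrypoints.openai.protocol").addFilter(
    DropIgnoredStrictFieldWarning()
)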


Issues

Issues that this PR closes:

closes #1928
closes #1948

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
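
As a purely illustrative sketch of how the new options fit together: the key names below come from the async_grpo block this PR adds to the workplace-assistant config, but the values, the nesting under the grpo section, and the inline comments are assumptions rather than code from this PR.

# Hypothetical sketch; whether async_grpo nests under "grpo" is an assumption.
config = {"grpo": {}}  # stand-in for the YAML config after loading
config["grpo"]["async_grpo"] = {
    "enabled": True,  # run_grpo_nemo_gym.py then dispatches to async_grpo_train instead of grpo_train
    "max_trajectory_age_steps": 1,  # presumably bounds how stale a sampled trajectory may be
    "in_flight_weight_updates": True,
    "recompute_kv_cache_after_weight_updates": True,
}

With the block absent or enabled set to false, the script keeps the existing synchronous GRPO path.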

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@terrykong terrykong changed the title from "feat: async gym + genRM" to "feat: async nemo gym" on Feb 18, 2026
…genrm change)

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
HeyyyyyyG and others added 3 commits February 18, 2026 06:45
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
This reverts commit 4e2f0a4.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong marked this pull request as ready for review February 20, 2026 07:55
@terrykong terrykong requested review from a team as code owners February 20, 2026 07:55
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Feb 20, 2026
@terrykong terrykong requested review from bxyu-nvidia and yfw February 20, 2026 08:00

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

The PR introduces async GRPO training with Nemo Gym environment integration. It adds configuration options for async training, implements conditional branching between synchronous and asynchronous execution paths, extends rollout utilities to support Nemo Gym environments, enhances error handling for context length issues, and adds comprehensive functional testing.

Changes

  • Configuration (examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml): Added async_grpo configuration block with enabled, max_trajectory_age_steps, in_flight_weight_updates, and recompute_kv_cache_after_weight_updates parameters.
  • Training Execution (examples/nemo_gym/run_grpo_nemo_gym.py): Introduced conditional async GRPO training path with validation guards for unsupported features, dynamic import of async_grpo_train, and fallback to synchronous training.
  • Async Rollout Utilities (nemo_rl/algorithms/async_utils.py, nemo_rl/environments/nemo_gym.py): Extended per-prompt group rollout to conditionally route through Nemo Gym-based rollout; added enhanced error handling for empty generation data with actionable diagnostics.
  • vLLM Integration (nemo_rl/models/generation/vllm/vllm_worker_async.py): Added targeted error handling for maximum context length exceptions, log suppression filters to reduce noise, and increased HTTP server keep-alive timeout to 120 seconds.
  • Functional Testing (tests/functional/L1_Functional_Tests_GPU.sh, tests/functional/grpo_async_gym.sh): Added a new async GRPO Nemo Gym functional test script with dataset preparation, model configuration, coverage tracking, and KL divergence metric validation.

Sequence Diagram(s)

sequenceDiagram
    participant Script as Training Script
    participant Config as Configuration
    participant TrainLogic as Training Logic
    participant AsyncRollout as Async Rollout Manager
    participant NemoGym as Nemo Gym Environment
    participant vLLM as vLLM Worker
    
    Script->>Config: Load async_grpo config
    Config-->>Script: async_grpo.enabled status
    
    alt Async GRPO Enabled
        Script->>TrainLogic: Invoke async_grpo_train
        TrainLogic->>AsyncRollout: Start async rollout generation
        AsyncRollout->>NemoGym: Check _should_use_nemo_gym()
        alt Use Nemo Gym
            AsyncRollout->>NemoGym: run_async_nemo_gym_rollout()
            NemoGym->>vLLM: Generate responses
            vLLM-->>NemoGym: Response batch
            NemoGym-->>AsyncRollout: final_batch + metrics
        else Use Multi-turn Rollout
            AsyncRollout->>vLLM: run_async_multi_turn_rollout()
            vLLM-->>AsyncRollout: Rollout results
        end
        AsyncRollout-->>TrainLogic: Processed trajectories
    else Synchronous GRPO
        Script->>TrainLogic: Invoke grpo_train
        TrainLogic->>vLLM: Standard training rollout
        vLLM-->>TrainLogic: Results
    end
    
    TrainLogic-->>Script: Training complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

asyncRL

Suggested reviewers

  • parthchadha
  • yuki-97
🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 14.29%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes (⚠️ Warning): The PR introduces major changes (+220/-23 lines), including async GRPO features and Nemo Gym integration, but the PR description lacks documented test results, performance metrics, or validation evidence. Resolution: add test results demonstrating async GRPO convergence validation, performance metrics, functional test pass confirmation, and regression analysis for the synchronous GRPO path to the PR description.
✅ Passed checks (4 passed)
  • Linked Issues check (✅ Passed): The PR implements async GRPO with Nemo Gym support as indicated by the issue #1928 [super-pr] reference. Changes include async configuration, conditional async/sync training paths, Nemo Gym rollout integration, and new functional tests covering the async workflow.
  • Out of Scope Changes check (✅ Passed): All code changes are directly related to implementing async GRPO with Nemo Gym: configuration additions, training path branching, worker rollout integration, error handling improvements, and functional tests, all aligned with the stated objective.
  • Description check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title "feat: async grpo + nemo gym" directly matches the main changes in the pull request, which introduce async GRPO training with Nemo Gym integration across multiple files.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/nemo_gym/run_grpo_nemo_gym.py (1)

237-256: Feature validation could be more robust.

The validation logic uses hardcoded branching based on feature name (use_dynamic_sampling vs. everything else). If a future feature is added to unsupported_features that's a boolean (not a dict with enabled), the else branch would raise a TypeError instead of a clear NotImplementedError.

Consider a more uniform approach:

♻️ Suggested refactor
-        for feature in unsupported_features:
-            if feature not in config["grpo"]:
-                continue
-
-            if feature == "use_dynamic_sampling":
-                if config["grpo"][feature]:
-                    raise NotImplementedError(
-                        f"{feature} is not supported with async GRPO"
-                    )
-            else:
-                if config["grpo"][feature]["enabled"]:
-                    raise NotImplementedError(
-                        f"{feature} is not supported with async GRPO"
-                    )
+        for feature in unsupported_features:
+            if feature not in config["grpo"]:
+                continue
+            val = config["grpo"][feature]
+            is_enabled = val if isinstance(val, bool) else val.get("enabled", False)
+            if is_enabled:
+                raise NotImplementedError(
+                    f"{feature} is not supported with async GRPO"
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/nemo_gym/run_grpo_nemo_gym.py` around lines 237 - 256, The current
validation loop around unsupported_features in run_grpo_nemo_gym.py branches on
feature name and assumes non-boolean features are dicts with an "enabled" key;
update the loop that iterates over unsupported_features to fetch the value via
config["grpo"].get(feature) and normalize it to a boolean (e.g., if it's a dict,
use value.get("enabled", False); otherwise coerce with bool(value)); then raise
NotImplementedError only when the normalized enabled flag is True. This change
centralizes the check and prevents TypeError when future unsupported features
are simple booleans.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/algorithms/async_utils.py`:
- Line 654: The variable env_cfg assigned in async_utils.py via env_cfg =
self.master_config.get("env") is unused and should be removed; delete that
assignment (or replace with a deliberately ignored name like _env if you want to
signal intention) so Ruff F841 no longer flags it, leaving any needed logic that
depends on self.master_config.get("env") unchanged elsewhere in the file.

---

Nitpick comments:
In `@examples/nemo_gym/run_grpo_nemo_gym.py`:
- Around line 237-256: The current validation loop around unsupported_features
in run_grpo_nemo_gym.py branches on feature name and assumes non-boolean
features are dicts with an "enabled" key; update the loop that iterates over
unsupported_features to fetch the value via config["grpo"].get(feature) and
normalize it to a boolean (e.g., if it's a dict, use value.get("enabled",
False); otherwise coerce with bool(value)); then raise NotImplementedError only
when the normalized enabled flag is True. This change centralizes the check and
prevents TypeError when future unsupported features are simple booleans.

terrykong and others added 2 commits February 21, 2026 01:43
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 21, 2026
@terrykong terrykong enabled auto-merge (squash) February 21, 2026 04:44
@terrykong terrykong changed the title from "feat: async nemo gym" to "feat: async grpo + nemo gym" on Feb 21, 2026

Labels

CI:L1 Run doctests, unit tests, and functional tests


Development

Successfully merging this pull request may close these issues:

  • [super-pr] log data in async path
  • [super-pr] Async RL + Nemo gym changes

3 participants