feat: async grpo + nemo gym by terrykong · Pull Request #1985 · NVIDIA-NeMo/RL

terrykong · 2026-02-18T06:05:35Z

What does this PR do ?

This PR adds async RL to the gym path and adds a functional test to test that path that tests a slimmed down workspace assistant enviroment.

This PR also makes several QoL logging improvements:

suppress numerous strict field complaints

(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}
(VllmAsyncGenerationWorker pid=1980549) WARNING 02-19 20:28:04 [protocol.py:126] The following fields were present in the request but ignored: {'strict'}

supress max seqlen logging

(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     ) = await self._preprocess_chat(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/workspaces/nemo-rl/nemo_rl/models/generation/vllm/vllm_worker_async.py", line 374, in _preprocess_chat
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     res = await super()._preprocess_chat(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1133, in _preprocess_chat
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     prompt_inputs = await self._tokenize_prompt_input_async(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 990, in _tokenize_prompt_input_async
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     async for result in self._tokenize_prompt_inputs_async(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1011, in _tokenize_prompt_inputs_async
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     yield await self._normalize_prompt_text_to_input(
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 881, in _normalize_prompt_text_to_input
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     return self._validate_input(request, input_ids, input_text)
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]   File "/opt/ray_venvs/nemo_rl.models.generation.vllm.vllm_worker_async.VllmAsyncGenerationWorker/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 962, in _validate_input
(VllmAsyncGenerationWorker pid=2124680) ERROR 02-19 20:51:17 [serving_chat.py:257]     raise ValueError(

and replace with a single line warning for brevity:

(VllmAsyncGenerationWorker pid=2143685) WARNING:nemo_rl.models.generation.vllm.vllm_worker_async:vllm_worker_async.py:392: Prompt exceeds max_model_len: This model's maximum context length is 512 tokens. However, your request has 4202 input tokens. Please reduce the length of the input messages.

Also add a log explaining exactly what can happen if vllm's model length is shorter than the first prompt. In this case vllm returns nothing and we gym returns an empty response which crashed RL before. Now it'll print:

raise ValueError(
                f"NeMo Gym returned a result with no generation data. "
                f"This typically means the prompt for the first turn already exceeds the vLLM max_model_len, "
                f"so vLLM rejected the request before any tokens could be generated.\n"
                f"  Prompt length: {len(prompt_token_ids)} tokens.\n"
                f"  → Fix: increase `policy.max_total_sequence_length` and `policy.generation.vllm_cfg.max_model_len` "
                f"to a value larger than {len(prompt_token_ids)}."
            )

This wasn't an issue for multiturn examples where we hit the model length since there was some intermediate generations, but if the very first one was too long, there wasn't an error instructing users what to do.

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

closes #1928
closes #1948

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

…genrm change) Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

This reverts commit 4e2f0a4. Signed-off-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>

coderabbitai · 2026-02-20T08:00:50Z

📝 Walkthrough

Walkthrough

The PR introduces async GRPO training with Nemo Gym environment integration. It adds configuration options for async training, implements conditional branching between synchronous and asynchronous execution paths, extends rollout utilities to support Nemo Gym environments, enhances error handling for context length issues, and adds comprehensive functional testing.

Changes

Cohort / File(s)	Summary
Configuration `examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml`	Added `async_grpo` configuration block with `enabled`, `max_trajectory_age_steps`, `in_flight_weight_updates`, and `recompute_kv_cache_after_weight_updates` parameters.
Training Execution `examples/nemo_gym/run_grpo_nemo_gym.py`	Introduced conditional async GRPO training path with validation guards for unsupported features, dynamic import of `async_grpo_train`, and fallback to synchronous training.
Async Rollout Utilities `nemo_rl/algorithms/async_utils.py`, `nemo_rl/environments/nemo_gym.py`	Extended per-prompt group rollout to conditionally route through Nemo Gym-based rollout; added enhanced error handling for empty generation data with actionable diagnostics.
vLLM Integration `nemo_rl/models/generation/vllm/vllm_worker_async.py`	Added targeted error handling for maximum context length exceptions, log suppression filters to reduce noise, and increased HTTP server keep-alive timeout to 120 seconds.
Functional Testing `tests/functional/L1_Functional_Tests_GPU.sh`, `tests/functional/grpo_async_gym.sh`	Added new async GRPO Nemo Gym functional test script with dataset preparation, model configuration, coverage tracking, and KL divergence metric validation.

Sequence Diagram(s)

sequenceDiagram
    participant Script as Training Script
    participant Config as Configuration
    participant TrainLogic as Training Logic
    participant AsyncRollout as Async Rollout Manager
    participant NemoGym as Nemo Gym Environment
    participant vLLM as vLLM Worker
    
    Script->>Config: Load async_grpo config
    Config-->>Script: async_grpo.enabled status
    
    alt Async GRPO Enabled
        Script->>TrainLogic: Invoke async_grpo_train
        TrainLogic->>AsyncRollout: Start async rollout generation
        AsyncRollout->>NemoGym: Check _should_use_nemo_gym()
        alt Use Nemo Gym
            AsyncRollout->>NemoGym: run_async_nemo_gym_rollout()
            NemoGym->>vLLM: Generate responses
            vLLM-->>NemoGym: Response batch
            NemoGym-->>AsyncRollout: final_batch + metrics
        else Use Multi-turn Rollout
            AsyncRollout->>vLLM: run_async_multi_turn_rollout()
            vLLM-->>AsyncRollout: Rollout results
        end
        AsyncRollout-->>TrainLogic: Processed trajectories
    else Synchronous GRPO
        Script->>TrainLogic: Invoke grpo_train
        TrainLogic->>vLLM: Standard training rollout
        vLLM-->>TrainLogic: Results
    end
    
    TrainLogic-->>Script: Training complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

cp: feat: add capability to update weights inflight during generation (1381) into r0.4.0 #1464: Adds the same async_grpo configuration parameters (in_flight_weight_updates, recompute_kv_cache_after_weight_updates) and implements async GRPO runtime execution paths.
feat: Necessary changes for Gym GRPO tutorial #1630: Modifies the Nemo-Gym GRPO stack including the same YAML configuration file and async GRPO training/worker paths.
feat: Adding perf metrics #1183: Changes async_grpo_train and async rollout mechanics that directly support this PR's new async GRPO training flow.

Suggested labels

asyncRL

Suggested reviewers

parthchadha
yuki-97

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes	⚠️ Warning	PR introduces major changes (+220/-23 lines) including async GRPO features and Nemo Gym integration, but the PR description lacks documented test results, performance metrics, or validation evidence.	Add test results demonstrating async GRPO convergence validation, performance metrics, functional test pass confirmation, and regression analysis for synchronous GRPO path to PR description.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR implements async GRPO with Nemo Gym support as indicated by issue `#1928` [super-pr] reference. Changes include async configuration, conditional async/sync training paths, Nemo Gym rollout integration, and new functional tests covering the async workflow.
Out of Scope Changes check	✅ Passed	All code changes are directly related to implementing async GRPO with Nemo Gym: configuration additions, training path branching, worker rollout integration, error handling improvements, and functional tests—all aligned with the stated objective.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: async grpo + nemo gym' directly matches the main changes in the pull request, which introduce async GRPO training with Nemo Gym integration across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch tk/async-gym

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

examples/nemo_gym/run_grpo_nemo_gym.py (1)

237-256: Feature validation could be more robust.

The validation logic uses hardcoded branching based on feature name (use_dynamic_sampling vs. everything else). If a future feature is added to unsupported_features that's a boolean (not a dict with enabled), the else branch would raise a TypeError instead of a clear NotImplementedError.

Consider a more uniform approach:

♻️ Suggested refactor

-        for feature in unsupported_features:
-            if feature not in config["grpo"]:
-                continue
-
-            if feature == "use_dynamic_sampling":
-                if config["grpo"][feature]:
-                    raise NotImplementedError(
-                        f"{feature} is not supported with async GRPO"
-                    )
-            else:
-                if config["grpo"][feature]["enabled"]:
-                    raise NotImplementedError(
-                        f"{feature} is not supported with async GRPO"
-                    )
+        for feature in unsupported_features:
+            if feature not in config["grpo"]:
+                continue
+            val = config["grpo"][feature]
+            is_enabled = val if isinstance(val, bool) else val.get("enabled", False)
+            if is_enabled:
+                raise NotImplementedError(
+                    f"{feature} is not supported with async GRPO"
+                )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@examples/nemo_gym/run_grpo_nemo_gym.py` around lines 237 - 256, The current
validation loop around unsupported_features in run_grpo_nemo_gym.py branches on
feature name and assumes non-boolean features are dicts with an "enabled" key;
update the loop that iterates over unsupported_features to fetch the value via
config["grpo"].get(feature) and normalize it to a boolean (e.g., if it's a dict,
use value.get("enabled", False); otherwise coerce with bool(value)); then raise
NotImplementedError only when the normalized enabled flag is True. This change
centralizes the check and prevents TypeError when future unsupported features
are simple booleans.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/algorithms/async_utils.py`:
- Line 654: The variable env_cfg assigned in async_utils.py via env_cfg =
self.master_config.get("env") is unused and should be removed; delete that
assignment (or replace with a deliberately ignored name like _env if you want to
signal intention) so Ruff F841 no longer flags it, leaving any needed logic that
depends on self.master_config.get("env") unchanged elsewhere in the file.

---

Nitpick comments:
In `@examples/nemo_gym/run_grpo_nemo_gym.py`:
- Around line 237-256: The current validation loop around unsupported_features
in run_grpo_nemo_gym.py branches on feature name and assumes non-boolean
features are dicts with an "enabled" key; update the loop that iterates over
unsupported_features to fetch the value via config["grpo"].get(feature) and
normalize it to a boolean (e.g., if it's a dict, use value.get("enabled",
False); otherwise coerce with bool(value)); then raise NotImplementedError only
when the normalized enabled flag is True. This change centralizes the check and
prevents TypeError when future unsupported features are simple booleans.

nemo_rl/algorithms/async_utils.py

Signed-off-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong changed the title ~~feat: async gym + genRM~~ feat: async nemo gym Feb 18, 2026

FROM 72e87b1bcff34bc1c2a67f352202d3a2cdbb3b84 async+gym (but missing …

4ea3217

…genrm change) Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong force-pushed the tk/async-gym branch from a1b530f to 4ea3217 Compare February 18, 2026 06:32

HeyyyyyyG and others added 3 commits February 18, 2026 06:45

add genrm rlhf to gym (just the non genrm things)

4e2f0a4

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

Revert "add genrm rlhf to gym (just the non genrm things)"

e925ca6

This reverts commit 4e2f0a4. Signed-off-by: Terry Kong <terryk@nvidia.com>

fix a lot of logging stuff and add functional

8e2952f

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong marked this pull request as ready for review February 20, 2026 07:55

terrykong requested review from a team as code owners February 20, 2026 07:55

terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Feb 20, 2026

terrykong temporarily deployed to nemo-ci February 20, 2026 07:55 — with GitHub Actions Inactive

terrykong requested review from bxyu-nvidia and yfw February 20, 2026 08:00

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

nemo_rl/algorithms/async_utils.py Show resolved Hide resolved

terrykong temporarily deployed to nemo-ci February 20, 2026 10:22 — with GitHub Actions Inactive

terrykong had a problem deploying to nemo-ci February 20, 2026 12:49 — with GitHub Actions Failure

terrykong and others added 2 commits February 21, 2026 01:43

add workplace download

e82518a

Signed-off-by: Terry Kong <terryk@nvidia.com>

log data in async path

affd793

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 21, 2026

terrykong enabled auto-merge (squash) February 21, 2026 04:44

terrykong temporarily deployed to nemo-ci February 21, 2026 04:44 — with GitHub Actions Inactive

yfw approved these changes Feb 21, 2026

View reviewed changes

terrykong temporarily deployed to nemo-ci February 21, 2026 07:12 — with GitHub Actions Inactive

terrykong changed the title ~~feat: async nemo gym~~ feat: async grpo + nemo gym Feb 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: async grpo + nemo gym#1985

feat: async grpo + nemo gym#1985
terrykong wants to merge 6 commits intomainfrom
tk/async-gym

terrykong commented Feb 18, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 20, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

terrykong commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Uh oh!

coderabbitai bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

terrykong commented Feb 18, 2026 •

edited

Loading

coderabbitai bot commented Feb 20, 2026 •

edited

Loading