19 changes: 17 additions & 2 deletions .dockerignore
@@ -24,14 +24,29 @@ ckpts/
coverage.json
.coverage*
test_assets/
.nrl_remote_map.json
.nrl_remote_state.json
# Test byproducts
tests/functional/*/

# Gym
/3rdparty/Gym-workspace/Gym/cache/uv/
/3rdparty/Gym-workspace/Gym/res*/*/.venv/
/3rdparty/Gym-workspace/Gym/res*/*/.venv/
/3rdparty/Gym-workspace/Gym/.venv/

# Cache
uv_cache/
hf_home/
hf_datasets_cache/
*logs/
datasets/
/datasets/
wandb/
checkpoints/
results/
code_snapshots/
code_snapshots*/
.cache/

# Runtime env
*runtime_env.yaml
!default_runtime_env.yaml
4 changes: 2 additions & 2 deletions .gitignore
@@ -21,11 +21,11 @@ ckpts/
# Test
coverage.json
.coverage*
unit_results.json
unit_results/
test_assets/
.nrl_remote_map.json
.nrl_remote_state.json
# Test byproducts
tests/functional/*/

# Cache
uv_cache/
2 changes: 1 addition & 1 deletion 3rdparty/Gym-workspace/Gym
Submodule Gym updated 211 files
42 changes: 25 additions & 17 deletions docker/Dockerfile
@@ -83,6 +83,10 @@ ENV RAY_USAGE_STATS_ENABLED=0
# need to be compiled, so NeMo RL has an implementation in nemo_rl/utils/venv.py that does it once per node as opposed to once per task.
ENV RAY_ENABLE_UV_RUN_RUNTIME_ENV=0
ENV NEMO_RL_VENV_DIR=/opt/ray_venvs
ENV NEMO_GYM_VENV_DIR=/opt/gym_venvs
# Config paths (relative to repo root) whose NeMo Gym venvs should be prefetched.
# Override to prefetch venvs for different configs, or set to empty to skip.
ARG NEMO_GYM_PREFETCH_CONFIGS="examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml examples/nemo_gym/grpo_nanov3.yaml"


FROM base AS hermetic
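The NEMO_GYM_PREFETCH_CONFIGS ARG declared above is consumed later in the hermetic RUN block: a `-n` test decides whether the prefetch runs at all, and the unquoted `$NEMO_GYM_PREFETCH_CONFIGS` is word-split into one argument per config. A minimal Python model of those two rules, for illustration only (`parse_prefetch_configs` is a hypothetical name; nothing in the build calls it):

```python
from typing import List, Optional


def parse_prefetch_configs(value: Optional[str]) -> Optional[List[str]]:
    """Model how the hermetic RUN step consumes NEMO_GYM_PREFETCH_CONFIGS.

    Hypothetical helper for illustration only. Returns None when the step is
    skipped, otherwise the argv list the unquoted expansion would produce.
    """
    # `[[ -n "${NEMO_GYM_PREFETCH_CONFIGS:-}" ]]`: unset or empty means skip.
    if not value:
        return None
    # Unquoted `$VAR` is word-split on whitespace (default IFS). Bash globbing
    # is ignored here; it only matters if a path contains wildcard characters.
    return value.split()


default = (
    "examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml "
    "examples/nemo_gym/grpo_nanov3.yaml"
)
print(parse_prefetch_configs(default))  # two config paths
print(parse_prefetch_configs(""))  # -> None
print(parse_prefetch_configs("   "))  # -> []
```

One consequence: `docker build --build-arg NEMO_GYM_PREFETCH_CONFIGS= ...` skips the step, but a whitespace-only value passes the `-n` test and then expands to zero arguments, which `argparse` rejects because `configs` is declared with `nargs="+"`.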
@@ -112,23 +116,22 @@ ENV UV_LINK_MODE=copy
# Ensure DeepEP is built for H100 and B200 (also mcore inference unified memory API now invokes a torch API that requires these to be set)
ENV TORCH_CUDA_ARCH_LIST="9.0 10.0"

# First copy only the dependency files
COPY --from=nemo-rl pyproject.toml uv.lock ./
# Copy in the top level __init__.py/package_info.py since build-custom-vllm.sh needs the nemo_rl package to exist.
COPY --from=nemo-rl nemo_rl/__init__.py nemo_rl/package_info.py ./nemo_rl/
COPY --from=nemo-rl tools/build-custom-vllm.sh ./tools/build-custom-vllm.sh
COPY --from=nemo-rl tools/build-custom-flashinfer.sh ./tools/build-custom-flashinfer.sh
COPY --from=nemo-rl --link research/ ./research/
COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/
# Copy in source from build context (defaults to cloned repo, can be overridden)
COPY --from=nemo-rl . /opt/nemo-rl
# Unshallow the repo to get the full history (in the case it was from the scratch layer).
# Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
# so do a quick check before trying to unshallow.
RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true

RUN --mount=type=ssh <<"EOF" bash -exu
uv venv --seed
# The custom build scripts will alter the pyproject.toml and uv.lock
if [[ -n "${BUILD_CUSTOM_VLLM:-}" ]]; then
bash tools/build-custom-vllm.sh ${BUILD_CUSTOM_VLLM_URL} ${BUILD_CUSTOM_VLLM_REF} ${BUILD_CUSTOM_VLLM_PRECOMPILED_WHEEL_LOCATION}
UV_LINK_MODE=symlink bash tools/build-custom-vllm.sh ${BUILD_CUSTOM_VLLM_URL} ${BUILD_CUSTOM_VLLM_REF} ${BUILD_CUSTOM_VLLM_PRECOMPILED_WHEEL_LOCATION}
source 3rdparty/vllm/nemo-rl.env
fi
if [[ -n "${BUILD_CUSTOM_FLASHINFER:-}" ]]; then
bash tools/build-custom-flashinfer.sh ${BUILD_CUSTOM_FLASHINFER_URL} ${BUILD_CUSTOM_FLASHINFER_REF}
UV_LINK_MODE=symlink bash tools/build-custom-flashinfer.sh ${BUILD_CUSTOM_FLASHINFER_URL} ${BUILD_CUSTOM_FLASHINFER_REF}
fi
# uv sync has a more reliable resolver than simple uv pip install which can fail

@@ -148,6 +151,18 @@ uv sync --link-mode symlink --locked --extra mcore --no-install-project
uv sync --link-mode symlink --locked --extra automodel --no-install-project
uv sync --link-mode symlink --locked --all-groups --no-install-project

# Prefetch NeMo Gym internal venvs (for gym servers like code_gen, math, etc.)
if [[ -n "${NEMO_GYM_PREFETCH_CONFIGS:-}" ]]; then
UV_LINK_MODE=symlink uv run python examples/nemo_gym/prefetch_venvs.py $NEMO_GYM_PREFETCH_CONFIGS
fi

# Remove /tmp/ray because the previous script starts up a local ray cluster which creates a session
# that we can just clean up.
rm -rf /tmp/ray
Comment on lines +154 to +161
Contributor

⚠️ Potential issue | 🔴 Critical

Prefetch script and nemo_rl source are both absent in the hermetic stage — build breaks by default

Two problems combine to break the Docker build:

  1. Missing file: examples/nemo_gym/prefetch_venvs.py is never copied in the hermetic stage (the COPY commands at lines 120–126 only bring in pyproject.toml, uv.lock, two nemo_rl entry files, and tools/). Python will exit with "No such file or directory", failing the RUN layer.

  2. Incomplete nemo_rl source: Even if the file were present, the script imports nemo_rl.environments.nemo_gym, nemo_rl.distributed.virtual_cluster, etc. None of these sub-packages are present in hermetic (only nemo_rl/__init__.py and nemo_rl/package_info.py are copied), so the imports would raise ModuleNotFoundError.

Because a build argument declared in a stage is automatically inherited by child stages, NEMO_GYM_PREFETCH_CONFIGS still holds its non-empty default when the if condition is evaluated in hermetic, so this code path executes on every default build.

Recommended fix: Move the prefetch block to the release stage (after line 194, where the full source tree is copied), and re-declare the ARG there — matching the pattern already used for nemo_rl/utils/prefetch_venvs.py.

🐛 Proposed fix — move prefetch to `release` stage

Remove lines 155–162 from the hermetic RUN block, then in the release stage add:

 # Re-declare build args for this stage
+ARG NEMO_GYM_PREFETCH_CONFIGS
 ARG SKIP_VLLM_BUILD
 ...

 # (after the full COPY at line 194 and after uv sync installs nemo_rl)
+# Prefetch NeMo Gym internal venvs (for gym servers like code_gen, math, etc.)
+RUN if [ -n "${NEMO_GYM_PREFETCH_CONFIGS:-}" ]; then \
+        UV_LINK_MODE=symlink uv run python examples/nemo_gym/prefetch_venvs.py \
+            $NEMO_GYM_PREFETCH_CONFIGS; \
+        rm -rf /tmp/ray; \
+    fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/Dockerfile` around lines 155 - 162, The hermetic stage runs a prefetch
step referencing examples/nemo_gym/prefetch_venvs.py and
NEMO_GYM_PREFETCH_CONFIGS but that script and nemo_rl subpackages aren't copied
into hermetic, causing the build to fail; remove the prefetch RUN block from the
hermetic stage and relocate it into the release stage (after the full source
copy) and re-declare ARG NEMO_GYM_PREFETCH_CONFIGS there so the prefetch
invocation (UV_LINK_MODE=symlink uv run python
examples/nemo_gym/prefetch_venvs.py $NEMO_GYM_PREFETCH_CONFIGS) executes only
when the full source (including nemo_rl and examples) is present.
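The second failure mode above can be made explicit with a fail-fast import check that names what is missing instead of surfacing a bare ModuleNotFoundError mid-build. A minimal sketch using only `importlib.util.find_spec`; `missing_modules` is a hypothetical helper. Because `find_spec` imports parent packages for dotted names, the check is not side-effect free, and if it were called from prefetch_venvs.py it could not detect the first failure mode, since that script would not exist.

```python
import importlib.util
from typing import List


def missing_modules(names: List[str]) -> List[str]:
    """Return the subset of `names` that this interpreter cannot locate.

    Hypothetical pre-flight check. For a dotted name, find_spec imports the
    parent package first, so a missing parent raises ModuleNotFoundError
    instead of returning None; both outcomes are reported as missing.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print(missing_modules(["json", "json.decoder", "not_a_real_pkg_xyz.sub"]))
# -> ['not_a_real_pkg_xyz.sub']
```

Passing it the module names that prefetch_venvs.py imports, such as "nemo_rl.environments.nemo_gym" and "nemo_rl.distributed.virtual_cluster", would return whichever of them the current stage cannot locate.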


# Prune unreachable cache entries
uv cache prune

# Remove the aiohttp in this uv cache dir to fully address CVE GHSA-mqqc-3gqh-h2x8
# The ray install will include the older aiohttp version in its cache
find /root/.cache/uv -type d -path "*ray/_private/runtime_env/agent/thirdparty_files/aiohttp*" -exec rm -rf {} +
@@ -176,13 +191,6 @@ LABEL com.nvidia.build.ref="${NVIDIA_BUILD_REF}"

ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

# Copy in source from build context (defaults to cloned repo, can be overridden)
# Exclude pyproject.toml and uv.lock since those may be altered by build-custom-vllm.sh
COPY --from=nemo-rl --exclude=pyproject.toml --exclude=uv.lock . /opt/nemo-rl
# Unshallow the repo to get the full history (in the case it was from the scratch layer).
# Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
# so do a quick check before trying to unshallow.
RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true
RUN <<"EOF" bash -exu
NEGATIVE_FILTERS=""
if [[ -n "${SKIP_VLLM_BUILD:-}" ]]; then
134 changes: 134 additions & 0 deletions examples/nemo_gym/prefetch_venvs.py
@@ -0,0 +1,134 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Contributor

⚠️ Potential issue | 🟡 Minor

Copyright year should be 2026 for this new file

As per coding guidelines, the copyright header for new files should carry the current year (2026).

✏️ Proposed fix
-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION.  All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/nemo_gym/prefetch_venvs.py` at line 1, The file's copyright header
currently reads "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights
reserved."—update that header to use the current year by replacing "2025" with
"2026" so the top-of-file copyright line reflects 2026.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Prefetch NeMo Gym internal venvs by doing a dry run of NemoGym initialization.

This complements nemo_rl/utils/prefetch_venvs.py (which prefetches Ray actor venvs)
by also triggering NeMo Gym's own internal venv creation for its servers (code_gen,
math, etc.). It reuses the real code path (create_env -> NemoGym.__init__) with
dry_run=True so no actual policy model is needed.
"""

import argparse
import sys

import ray
from omegaconf import OmegaConf

from nemo_rl.distributed.virtual_cluster import init_ray
from nemo_rl.environments.nemo_gym import (
    NemoGymConfig,
    get_nemo_gym_uv_cache_dir,
    get_nemo_gym_venv_dir,
)
from nemo_rl.environments.utils import create_env
from nemo_rl.utils.config import load_config, register_omegaconf_resolvers


def prefetch_nemo_gym_venvs(config_paths: list[str]) -> None:
    """Prefetch NeMo Gym venvs for each config by doing a dry-run initialization.

    Args:
        config_paths: List of paths to NeMo RL config files that contain
            an env.nemo_gym section.
    """
    register_omegaconf_resolvers()
    init_ray()

    succeeded = []
    failed = []

    for config_path in config_paths:
        print(f"\n{'=' * 60}")
        print(f"Processing config: {config_path}")
        print("=" * 60)

        try:
            config = load_config(config_path)
            config = OmegaConf.to_container(config, resolve=True)

            nemo_gym_dict = dict(config["env"]["nemo_gym"])
            nemo_gym_dict["dry_run"] = True
            uv_cache_dir = get_nemo_gym_uv_cache_dir()
            if uv_cache_dir is not None:
                nemo_gym_dict.setdefault("uv_cache_dir", uv_cache_dir)
            uv_venv_dir = get_nemo_gym_venv_dir()
            if uv_venv_dir is not None:
                nemo_gym_dict.setdefault("uv_venv_dir", uv_venv_dir)

            nemo_gym_config = NemoGymConfig(
                model_name="dummy-model",
                base_urls=["http://localhost:8000"],
                initial_global_config_dict=nemo_gym_dict,
            )

            print("Creating NeMo Gym environment (dry_run=True)...")
            nemo_gym = create_env(env_name="nemo_gym", env_config=nemo_gym_config)

            print("Waiting for NeMo Gym to finish initialization...")
            ray.get(nemo_gym.health_check.remote())
Comment on lines +79 to +80
Contributor

⚠️ Potential issue | 🟠 Major

ray.get() without a timeout can cause an infinite hang in Docker builds

If NemoGym.__init__ hangs during dry_run initialization (e.g., network issue, external process deadlock), the Docker build layer stalls with no way to recover.

🛡️ Proposed fix — add a generous build-time timeout
-            ray.get(nemo_gym.health_check.remote())
+            ray.get(nemo_gym.health_check.remote(), timeout=1800)  # 30-minute cap for Docker builds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/nemo_gym/prefetch_venvs.py` around lines 79 - 80, The current
blocking call ray.get(nemo_gym.health_check.remote()) can hang indefinitely
during Docker builds; modify the call to include a generous timeout (e.g.,
timeout=300) and wrap it in a try/except to catch ray.exceptions.GetTimeoutError
(and other Ray exceptions) so you can log/raise a clear error or exit cleanly
instead of stalling the build; locate the call to nemo_gym.health_check.remote()
in prefetch_venvs.py and update that invocation and its error handling
accordingly.
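The suggested fix can be sketched as a small wrapper that bounds the wait and turns a timeout into an error that names the cause. This is a minimal sketch, not code from the PR: `wait_for_health_check` is a hypothetical name, and `ray.get` and `ray.exceptions.GetTimeoutError` are passed in as parameters so the example runs without a Ray cluster. The review proposes 1800 s in the fix and 300 s in the agent prompt; the right bound depends on how long the slowest venv build takes.

```python
from typing import Any, Callable


def wait_for_health_check(
    get: Callable[..., Any],
    ref: Any,
    timeout_s: float,
    timeout_exc: type,
) -> Any:
    """Block on `ref` for at most `timeout_s` seconds, then fail loudly.

    Hypothetical helper. With Ray, `get` would be ray.get and `timeout_exc`
    would be ray.exceptions.GetTimeoutError; they are injected here so the
    sketch runs without a cluster.
    """
    try:
        return get(ref, timeout=timeout_s)
    except timeout_exc as exc:
        # Chain the original timeout so the traceback still shows its origin.
        raise RuntimeError(
            f"NeMo Gym did not become healthy within {timeout_s:.0f}s; "
            "aborting instead of stalling the Docker build"
        ) from exc
```

In `prefetch_nemo_gym_venvs` the call would become `wait_for_health_check(ray.get, nemo_gym.health_check.remote(), 1800, ray.exceptions.GetTimeoutError)`. The per-config `except Exception` there already catches the resulting `RuntimeError`, so a timeout is recorded under `Failed` and the script still ends with `sys.exit(1)`.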

            print("NeMo Gym initialized successfully.")

            # TODO: Hangs... (DONT MERGE UNTIL FIXED - but kill may be fine)
            # print("Shutting down NeMo Gym environment...")
            # ray.get(nemo_gym.shutdown.remote())
            print("Killing NeMo Gym actor...")
            ray.kill(nemo_gym)

            succeeded.append(config_path)
            print(f"Done with config: {config_path}")

        except Exception as e:
            print(f"Error processing {config_path}: {e}")
            failed.append((config_path, str(e)))

    print(f"\n{'=' * 60}")
    print("NeMo Gym venv prefetch summary")
    print("=" * 60)
    print(f" Succeeded: {len(succeeded)}")
    for path in succeeded:
        print(f" - {path}")
    if failed:
        print(f" Failed: {len(failed)}")
        for path, err in failed:
            print(f" - {path}: {err}")

    if failed:
        sys.exit(1)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Prefetch NeMo Gym internal venvs via dry-run initialization.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""\
Examples:
# Prefetch venvs for a single config
uv run python examples/nemo_gym/prefetch_venvs.py \\
examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml

# Prefetch venvs for multiple configs sequentially
uv run python examples/nemo_gym/prefetch_venvs.py \\
examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml \\
examples/nemo_gym/grpo_qwen3_30ba3b_instruct.yaml
""",
    )
    parser.add_argument(
        "configs",
        nargs="+",
        help="One or more NeMo RL config file paths containing an env.nemo_gym section.",
    )
    args = parser.parse_args()

    prefetch_nemo_gym_venvs(args.configs)
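The TODO in the loop above replaces the hanging `shutdown` call with `ray.kill`. One possible middle ground, sketched under the assumption that a bounded graceful attempt is acceptable during a build: request shutdown, wait a fixed time, and fall back to killing the actor. `shutdown_or_kill` is a hypothetical helper, and the Ray callables are injected so the sketch runs without a cluster.

```python
from typing import Any, Callable


def shutdown_or_kill(
    actor: Any,
    request_shutdown: Callable[[Any], Any],
    get: Callable[..., Any],
    kill: Callable[[Any], None],
    timeout_s: float,
    timeout_exc: type,
) -> bool:
    """Ask `actor` to shut down; force-kill it if that does not finish in time.

    Returns True if the graceful shutdown completed, False if `kill` was used.
    Assumed Ray wiring: request_shutdown=lambda a: a.shutdown.remote(),
    get=ray.get, kill=ray.kill, timeout_exc=ray.exceptions.GetTimeoutError.
    """
    try:
        # Bounded wait on the graceful path.
        get(request_shutdown(actor), timeout=timeout_s)
        return True
    except timeout_exc:
        # Graceful shutdown hung: fall back to the forceful path the TODO uses.
        kill(actor)
        return False
```

Ray's documentation notes that `ray.kill` does not run `atexit` handlers installed in the actor, so any cleanup the graceful path would perform is skipped on the fallback branch; that is the cost of the fallback.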
11 changes: 10 additions & 1 deletion examples/nemo_gym/run_grpo_nemo_gym.py
@@ -43,6 +43,8 @@
from nemo_rl.distributed.virtual_cluster import init_ray
from nemo_rl.environments.nemo_gym import (
    NemoGymConfig,
    get_nemo_gym_uv_cache_dir,
    get_nemo_gym_venv_dir,
    setup_nemo_gym_config,
)
from nemo_rl.environments.utils import create_env
@@ -207,10 +209,17 @@ def main() -> None:
    is_trajectory_collection = (
        config["env"]["nemo_gym"].pop("is_trajectory_collection", False) or False
    )
    nemo_gym_dict = config["env"]["nemo_gym"]
    uv_cache_dir = get_nemo_gym_uv_cache_dir()
    if uv_cache_dir is not None:
        nemo_gym_dict.setdefault("uv_cache_dir", uv_cache_dir)
    uv_venv_dir = get_nemo_gym_venv_dir()
    if uv_venv_dir is not None:
        nemo_gym_dict.setdefault("uv_venv_dir", uv_venv_dir)
    nemo_gym_config = NemoGymConfig(
        model_name=policy_generation.cfg["model_name"],
        base_urls=policy_generation.dp_openai_server_base_urls,
        initial_global_config_dict=config["env"]["nemo_gym"],
        initial_global_config_dict=nemo_gym_dict,
    )
    nemo_gym = create_env(env_name="nemo_gym", env_config=nemo_gym_config)
    # Blocking wait for NeMo-Gym to spin up
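run_grpo_nemo_gym.py and prefetch_venvs.py repeat the same six lines. Because they use `setdefault`, a `uv_cache_dir` or `uv_venv_dir` already present in the YAML takes precedence over the container defaults. A sketch of that logic as one shared helper, assuming it would live next to the two getters in nemo_gym.py (the name `apply_nemo_gym_dir_defaults` is hypothetical, and it takes the directory values as arguments rather than calling `get_nemo_gym_uv_cache_dir()`/`get_nemo_gym_venv_dir()` so it stays side-effect free):

```python
from typing import Any, Dict, Optional


def apply_nemo_gym_dir_defaults(
    nemo_gym_dict: Dict[str, Any],
    uv_cache_dir: Optional[str],
    uv_venv_dir: Optional[str],
) -> Dict[str, Any]:
    """Fill uv_cache_dir/uv_venv_dir only when the config does not set them.

    Hypothetical shared helper. None means "outside a container" or "env var
    unset", so the key is left out and Gym falls back to its local default.
    Mutates and returns `nemo_gym_dict`.
    """
    if uv_cache_dir is not None:
        nemo_gym_dict.setdefault("uv_cache_dir", uv_cache_dir)
    if uv_venv_dir is not None:
        nemo_gym_dict.setdefault("uv_venv_dir", uv_venv_dir)
    return nemo_gym_dict


cfg = {"uv_venv_dir": "/custom/venvs"}
apply_nemo_gym_dir_defaults(cfg, "/root/.cache/uv", "/opt/gym_venvs")
print(cfg)  # -> {'uv_venv_dir': '/custom/venvs', 'uv_cache_dir': '/root/.cache/uv'}
```

Mutating in place matches run_grpo_nemo_gym.py, which edits `config["env"]["nemo_gym"]` directly; prefetch_venvs.py would keep passing its `dict(...)` copy so the loaded config is untouched.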
28 changes: 27 additions & 1 deletion nemo_rl/environments/nemo_gym.py
@@ -11,8 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import subprocess
from pathlib import Path
from typing import Any, Dict, List, TypedDict
from typing import Any, Dict, List, Optional, TypedDict

import ray
import torch
@@ -23,6 +25,30 @@
from nemo_rl.utils.timer import Timer


def get_nemo_gym_uv_cache_dir() -> Optional[str]:
    """Return the uv cache directory inside a container, or None outside one.

    Inside a container (NRL_CONTAINER=1), returns the uv cache location so Gym
    stores its caches in the expected shared path. Returns None outside a
    container, meaning the caller should omit this arg and let Gym create the
    cache locally (the default when you may not be able to write to /opt).
    """
    if not os.environ.get("NRL_CONTAINER"):
        return None
    return subprocess.check_output(["uv", "cache", "dir"]).decode().strip()
Contributor

⚠️ Potential issue | 🟡 Minor

Partial executable path for uv subprocess (S607)

subprocess.check_output(["uv", ...]) resolves uv via PATH, making the call susceptible to PATH-based spoofing. Additionally there is no timeout, so if uv hangs the caller blocks indefinitely.

🛡️ Proposed fix
+import shutil
 ...
-    return subprocess.check_output(["uv", "cache", "dir"]).decode().strip()
+    uv_bin = shutil.which("uv") or "uv"
+    return subprocess.check_output([uv_bin, "cache", "dir"], timeout=30).decode().strip()
🧰 Tools
🪛 Ruff (0.15.1)

[error] 38-38: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/environments/nemo_gym.py` at line 38, The subprocess call uses a
relative executable name and no timeout; change the call site where
subprocess.check_output(["uv", "cache", "dir"]) is invoked so it resolves an
absolute executable path (use shutil.which to find "uv" and raise/log if
missing) and replace check_output with subprocess.run (or check_output with
timeout) supplying a reasonable timeout and check=True, capturing stdout/stderr;
also handle and log FileNotFoundError and subprocess.TimeoutExpired so the
caller doesn't block or get spoofed by PATH.
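A minimal sketch of the function along the lines the review asks for (absolute executable path, a timeout, explicit handling of a missing `uv`), assuming that failing loudly inside a container is preferable to silently returning None. `resolve_uv_cache_dir` is a hypothetical name, not the PR's function. Note that `shutil.which` itself searches `PATH`, so this pins the resolved path and bounds the wait rather than removing trust in `PATH` entirely.

```python
import os
import shutil
import subprocess
from typing import Optional


def resolve_uv_cache_dir(timeout_s: float = 30.0) -> Optional[str]:
    """Return `uv cache dir` when NRL_CONTAINER is set, otherwise None.

    Hypothetical hardened variant of get_nemo_gym_uv_cache_dir.
    """
    if not os.environ.get("NRL_CONTAINER"):
        return None
    # shutil.which still consults PATH; it fixes the absolute path that will be
    # executed and turns "uv is missing" into an early, explicit error.
    uv_bin = shutil.which("uv")
    if uv_bin is None:
        raise FileNotFoundError("NRL_CONTAINER is set but `uv` was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit; the timeout
    # raises subprocess.TimeoutExpired instead of blocking the caller forever.
    result = subprocess.run(
        [uv_bin, "cache", "dir"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout.strip()
```

Letting `TimeoutExpired` and `CalledProcessError` propagate keeps the sketch short; a caller that prefers Gym's local default could catch them and return None instead.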



def get_nemo_gym_venv_dir() -> Optional[str]:
    """Return the NeMo Gym venv directory from NEMO_GYM_VENV_DIR, or None.

    Returns the value of NEMO_GYM_VENV_DIR if set, otherwise None. When None
    the caller should omit this arg and let Gym create venvs locally (the
    default when a container is not used since you may not be able to write
    to /opt).
    """
    return os.environ.get("NEMO_GYM_VENV_DIR")


class NemoGymConfig(TypedDict):
    model_name: str
    base_urls: List[str]