
Fsiino/pipecleaning rl envs#744

Draft
fsiino-nvidia wants to merge 108 commits into main from fsiino/pipecleaning-rl-envs

Conversation

@fsiino-nvidia
Contributor

No description provided.


copy-pr-bot bot commented Feb 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

fsiino-nvidia force-pushed the fsiino/pipecleaning-rl-envs branch from d7e72a1 to 9d5845a on February 21, 2026 at 01:59

copy-pr-bot bot commented Feb 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

fsiino-nvidia and others added 27 commits February 20, 2026 20:19
Adds the `ng_pip_list` command to show the underlying `uv pip list` output for the specified environment.
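
As a rough illustration of what the command does, the sketch below shells out to `uv pip list` for a given environment's interpreter. This is a hypothetical sketch, not the real implementation: the `--python` flag usage and the interpreter path are assumptions, and the actual command resolves the environment from Gym's own configuration.

```python
# Hypothetical sketch only; ng_pip_list itself resolves the environment from
# Gym's configuration rather than taking an interpreter path directly.
import subprocess

def pip_list(venv_python: str) -> str:
    """Return the `uv pip list` output for the environment owning `venv_python`."""
    result = subprocess.run(
        ["uv", "pip", "list", "--python", venv_python],  # --python selects the env
        check=True, capture_output=True, text=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(pip_list(".venv/bin/python"))
```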

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This implements the `ng_status` command to list all running servers on
the system and ping them for a health check.
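
A minimal sketch of the idea, assuming a fixed set of server base URLs and a `/health` route (both assumptions for illustration; the real command discovers its servers from the running Gym deployment):

```python
import asyncio
import httpx

async def check(client: httpx.AsyncClient, name: str, base_url: str) -> tuple[str, bool]:
    # A server counts as healthy if its (assumed) /health route answers 200.
    try:
        resp = await client.get(f"{base_url}/health", timeout=5.0)
        return name, resp.status_code == 200
    except httpx.HTTPError:
        return name, False

async def main() -> None:
    servers = {  # placeholder names/ports for illustration
        "head": "http://localhost:8000",
        "math_with_judge": "http://localhost:8001",
    }
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(check(client, n, u) for n, u in servers.items()))
    for name, healthy in results:
        print(f"{name}: {'healthy' if healthy else 'unreachable'}")

if __name__ == "__main__":
    asyncio.run(main())
```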

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Peter Jin <pjin@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a function-calling resources server based on
https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
We need to set the `uv pip install` Python flag in Colab environments when
launching servers.

usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true`

Defaults to false.

For #370

Needed for notebook here:
https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a section for single-step training with Unsloth and TRL.

Not sure whether these should be broken into separate sections. Left as one
since the same notebook works for both, but it could be confusing.

Also not sure whether we should add more info about multi-step, which is
(hopefully) coming soon.

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Removes TRL from the docs, leaving just Unsloth.

It was unclear that they go together.

We will add a TRL section when we have a standalone TRL notebook, or a
section in TRL's docs as well.

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079

Initially in #290, `response_class=PlainTextResponse` was added to the
`/global_config_dict_yaml` endpoint of the HeadServer in an attempt to debug
parsing server info for the `ng_status` command. This led to a parsing error
in `load_from_global_config`. That command now uses its own separate
endpoint, `server_instances`, so the `PlainTextResponse` needs to be removed.
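
For context, here is a minimal FastAPI sketch of the endpoint after the change; the handler body and config contents are placeholders. Without the `PlainTextResponse` override, the YAML text comes back as FastAPI's default JSON-encoded string, which is evidently what `load_from_global_config` expects to parse.

```python
import yaml
from fastapi import FastAPI

app = FastAPI()

# Placeholder config; the real HeadServer serves its global config dict.
GLOBAL_CONFIG = {"servers": {"math_with_judge": {"port": 8001}}}

@app.get("/global_config_dict_yaml")
def global_config_dict_yaml() -> str:
    # Default JSON response: the YAML arrives as a quoted JSON string, rather
    # than raw text/plain as it did with response_class=PlainTextResponse.
    return yaml.safe_dump(GLOBAL_CONFIG)
```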

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The overall coverage failure threshold is 95%, and test coverage is too low
for train_data_utils, which brings down the overall coverage of the
ng_dev_test suite. This covers some of those lingering test cases to bring it
from 89% to 97%.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This PR enables running Gym on Aviary environments. The two main
concepts:

- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`; spawns and manages multiple environments
  - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places
  - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because:
    - Aviary environments define their transition function in `step()`. Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly)
    - Aviary tools are not guaranteed to be available until `reset()` is called.
  - A `/close` endpoint is added to tear down resources
- `AviaryAgent`: analogous to `SimpleAgent`, but:
  - Request is an integer index (which is forwarded to `AviaryResourcesServer`). In general, we expect `env.reset()` to provide the first messages, not the calling code
  - All tool calls are sent to `/step`
  - We rely on the environment to tell us when we're done

Two concrete Aviary datasets/environments are integrated: GSM8k with a
calculator environment and BixBench with a notebook environment. Adding
new ones is pretty lightweight (most of the code in `notebook_app.py` is
from defining a BixBench-compatible environment, not the integration).
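
A sketch of the rollout shape this implies, with hypothetical request/response fields and a `/reset` route (the description above only names `/step` and `/close`, so treat everything else as illustrative, not the actual `AviaryAgent` code):

```python
import httpx

def run_episode(resources_url: str, task_idx: int, policy) -> float:
    """Drive one Aviary-style episode; `policy` maps messages to tool calls."""
    with httpx.Client(base_url=resources_url, timeout=60.0) as client:
        # env.reset() provides the first messages, not the calling code.
        state = client.post("/reset", json={"task_idx": task_idx}).json()
        done, reward = False, 0.0
        while not done:
            tool_calls = policy(state["messages"])
            # Every tool call goes through /step so the env's transition runs.
            state = client.post(
                "/step", json={"task_idx": task_idx, "tool_calls": tool_calls}
            ).json()
            done = state["done"]            # the environment tells us when we're done
            reward = state.get("reward", reward)
        client.post("/close", json={"task_idx": task_idx})  # tear down resources
    return reward
```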

---------

Signed-off-by: Siddharth Narayanan <sid@futurehouse.org>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com>
Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Adds a more descriptive README, reward profiling, and an option for
fractional or binary reward.
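
A hypothetical illustration of the fractional-vs-binary option; the server's actual reward computation is not shown in this description, and the function and counters below are placeholders.

```python
def score(num_passed: int, num_total: int, binary_reward: bool) -> float:
    """Toy reward: all-or-nothing when binary, partial credit otherwise."""
    frac = num_passed / num_total if num_total else 0.0
    if binary_reward:
        return 1.0 if frac == 1.0 else 0.0
    return frac

assert score(3, 4, binary_reward=False) == 0.75
assert score(3, 4, binary_reward=True) == 0.0
```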

Signed-off-by: abukharin-nv <abukharin@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
This PR adds new environments for SWE tasks. The environments can be
used for single-step patch generation, test generation, and
LLM-as-a-judge. They have been tested on instances from SWE-bench,
SWE-Gym, and SWE-rebench. The patch and test generation environments run
them against unit tests in a containerized environment (Singularity).
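
Roughly, the containerized check looks like the sketch below; the image path, bind mount, and test command are placeholders, and the real harness does considerably more (applying the patch, collecting per-test results, etc.).

```python
import subprocess

def tests_pass(image_sif: str, repo_dir: str, test_cmd: str) -> bool:
    """Run the repo's unit tests inside a Singularity image and report success."""
    proc = subprocess.run(
        [
            "singularity", "exec",
            "--bind", f"{repo_dir}:/workspace",  # mount the patched repo
            image_sif,
            "bash", "-lc", f"cd /workspace && {test_cmd}",
        ],
        capture_output=True, text=True,
    )
    return proc.returncode == 0
```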

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Integrates a new dataset using the existing equivalency LLM judge resource
server.

Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash
License:
https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE
Train: 8040 unique samples
Validation: 50 unique, randomly sampled from train
Augmentation on the source (minimal): Added system prompt, output
formatting requirement

Example of env validation:
- base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint)
- Step 30 -> 12.50% on Terminal Bench Core
- https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm

Train:  nl2bash-super-train-0901.jsonl
Validation:  nl2bash-super-validation-0901.jsonl

https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/
```
ng_download_dataset_from_gitlab \
    +dataset_name=nl2bash-equivalency-judge \
    +version=0.0.1 \
    +artifact_fpath=nl2bash-super-train-0901.jsonl \
    +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl
```

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# Make `agent_name` optional in CLI rollout collection

## Summary

Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to
use `agent_ref` from each data row instead.

## Motivation

The NeMo-RL training code already respects per-row `agent_ref`, but the
Gym CLI (`ng_collect_rollouts`) required a single hardcoded
`agent_name`. This prevented multi-agent rollout collection via CLI.

## Changes

- `rollout_collection.py`: Made `agent_name` field optional with
`default=None`
- Use `config.agent_name` if specified; otherwise fall back to
`row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

## Behavior

| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |
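
A minimal sketch of the fallback order described above (the row shape follows the Gym JSONL schema's `agent_ref` field; the function and variable names are illustrative, not the actual `rollout_collection.py` code):

```python
def resolve_agent_name(config_agent_name: str | None, row: dict) -> str:
    if config_agent_name is not None:
        return config_agent_name              # explicit +agent_name=... wins
    agent_ref = row.get("agent_ref") or {}
    if agent_ref.get("name"):
        return agent_ref["name"]              # per-row agent
    raise ValueError(
        "No agent name available: pass +agent_name=... or add agent_ref.name to each row."
    )
```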

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The default artifact paths for the math_with_judge resource server don't
match the filenames of the provided dataset
(nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging
Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main).
This results in an error when attempting to download the files
automatically from Hugging Face. The artifact paths for both training
and validation need to be updated to the names shown on Hugging Face so
the download works.

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The competitive coding resource config is missing a Hugging Face
identifier, which prevents it from being downloaded from Hugging Face
using the data preparation tools.

To reproduce without the HF identifier, run the following:

```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```

This will throw a warning:

```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```

And eventually this error:

```
Traceback (most recent call last):
  File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
    sys.exit(prepare_data())
             ^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
    data_processor.run(global_config_dict)
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
    dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
    state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
    for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
    with open(dataset_config.jsonl_fpath) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```

This fix will download the validation file as intended and resolve the
errors.

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
The train and val data paths are swapped in the config. This PR updates
them.

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
# PR: Add ns_tools Resources Server

## Description

Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.

**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)

## Verifier Delegation

The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):

```
ns_tools.verify(request)
    → POST to math_with_judge/verify
    → returns reward from math_with_judge
```

This allows using NeMo Skills tools while leveraging existing
verification infrastructure.
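
A sketch of the pass-through, assuming an httpx client and a configured verifier base URL; the real server's request/response models are richer than a bare dict, so treat this as illustrative only.

```python
import httpx

async def verify(request_payload: dict, verifier_base_url: str) -> dict:
    """Delegate verification to the configured verifier (e.g. math_with_judge)."""
    async with httpx.AsyncClient(base_url=verifier_base_url, timeout=120.0) as client:
        resp = await client.post("/verify", json=request_payload)
        resp.raise_for_status()
        return resp.json()  # expected to carry the reward computed by the verifier
```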

## Example Data Format

```json
{
  "id": "aime25-0",
  "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
  "expected_answer": "70",
  "verifier_type": "math_with_judge",
  "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "Solve the following math problem..."}
    ],
    "tools": [{
      "type": "function",
      "name": "stateful_python_code_exec",
      "description": "Execute Python code in a stateful environment.",
      "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"]
      }
    }]
  }
}
```

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary

- Adds new `math_formal_lean` resource server for Lean4 formal theorem
proving
- Implements a `/verify` endpoint that compiles proofs via a sandbox
container and returns reward 1.0/0.0 (a sketch follows after this list)
- Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned
prompt format
- Comprehensive test suite (31 tests)
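
A sketch of that reward rule, with a hypothetical sandbox compile API (the `/compile` route and `success` field are assumptions); the actual `sandbox_client.py` and the proof assembly in `proof_utils.py` differ in detail.

```python
import httpx

async def verify_proof(sandbox_url: str, theorem_statement: str, model_proof: str) -> float:
    """Reward 1.0 if the assembled Lean4 proof compiles in the sandbox, else 0.0."""
    lean_source = theorem_statement + "\n" + model_proof
    async with httpx.AsyncClient(timeout=300.0) as client:
        resp = await client.post(f"{sandbox_url}/compile", json={"source": lean_source})
        resp.raise_for_status()
        compiled = bool(resp.json().get("success", False))
    return 1.0 if compiled else 0.0
```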

## Components

| File | Description |
|------|-------------|
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

## Test plan

- [x] Unit tests pass (31/31)
- [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5
samples)
- [x] Tested with gpt-5.1-codex-max model
- [x] Pre-commit lint checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Per title. This PR retains the current default of returning transitions,
but it is reasonable to change that default to match the other Gym
agents.

Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Refactors the equivalency LLM judge resource server into another
judge-based resource server. Main changes include removing the regex logic
and cleaning up the related configs.

Train data for this environment is still TBD, but a working version:
Data source: Sliced terminus prompts from different sources
train_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl`
validation_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl`
example train config:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml`

Example of env validation:

base model: early sft checkpoint of nano v3
(`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`)
Step 50 -> 21.25% on Terminal Bench Core
https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi

Next steps:
Will expand this PR with configurable verification options including
string matching, string similarity and openapi-based output schema
validation.

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added new doc directories/article stubs for the topics identified in the
0.2.0 IA, and generated an initial pass of the structure and some starter
content. This will let contributors focus on the topic itself rather than
the site build/toctree elements. **Feel free to blow away any initial
content in these pages**.

All stubbed pages have been marked with 🟡 in the toctree for easy
discovery. Remove the 🟡 once a page is finished.

<img width="1800" height="1009" alt="image"
src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db"
/>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Added a complete example of preparing a custom dataset for use with NeMo
Gym. The tutorial walks through downloading a dataset from Hugging Face (or
adapting one from another source), adding the "responses_create_params"
field, writing a new resource server config, and preparing the data with
"ng_prepare_data". It can serve as a guide for taking most arbitrary
text-based datasets and converting them into a format compatible with NeMo
Gym for post-training.
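
A condensed sketch of the `responses_create_params` step, following the field layout shown in the example data earlier in this thread; the dataset rows, prompt text, and output path below are placeholders, not the tutorial's actual dataset.

```python
import json

def add_responses_create_params(rows: list[dict], out_path: str) -> None:
    """Wrap each row's question into the messages NeMo Gym will send to the model."""
    with open(out_path, "w") as f:
        for row in rows:
            row["responses_create_params"] = {
                "input": [{"role": "user", "content": row["question"]}],
            }
            f.write(json.dumps(row) + "\n")

add_responses_create_params(
    [{"question": "What is 2 + 2?", "expected_answer": "4"}],
    "train.jsonl",
)
```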

Signed-off-by: Robert Clark <roclark@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
cmunley1 and others added 28 commits February 20, 2026 20:19
Enables using Environments Hub envs in NeMo Gym with NeMo RL for
training.

#446

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary
- Adds `.claude/skills/add-benchmark/` with a guided workflow for
contributing new benchmarks and training environments
- Covers the full lifecycle: scaffolding, data preparation, `verify()`
implementation, YAML config, testing, reward profiling, and PR
submission
- Includes `references/patterns.md` with code templates for resource
servers, agents, Ray subprocess execution, external tool auto-install,
and dataset registry workflows
- All content is generic (no benchmark-specific references)

## Test plan
- [x] Verify skill files render correctly on GitHub
- [x] Spot-check code patterns against existing resource servers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jeff Farris <jfarris@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
## Summary
- Adds `CLAUDE.md` to provide project context for Claude Code sessions
- Covers architecture, CLI commands, configuration patterns, JSONL data
schema, benchmark contribution workflow, code style, async patterns,
external tool auto-install, and cluster gotchas
- All content is generic (no benchmark-specific references)

## Test plan
- [x] Verify CLAUDE.md renders correctly on GitHub
- [x] Spot-check CLI commands against `pyproject.toml` entry points

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jeff Farris <jfarris@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…llection

Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…r handling, simplified judge prompt

Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…racted answer for judge, add truncation and warmup support

Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…er normalization, numeric fallback, max_steps limit

Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@cw-dfw-cs-001-login-01.cm.cluster>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…l-envs

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
fsiino-nvidia force-pushed the fsiino/pipecleaning-rl-envs branch from 3c21c09 to d4efed2 on February 21, 2026 at 04:23
…l-envs

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

# Conflicts:
#	resources_servers/math_with_judge/configs/math_with_judge.yaml