feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. #674

Open

ffrujeri wants to merge 5 commits into ffrujeri/multi-node-local-vllm from ffrujeri/genrm-model

Conversation

@ffrujeri
Contributor

@ffrujeri ffrujeri commented Feb 11, 2026

What does this PR do?

Adds GenRM support via a dedicated Response API Model package. The package provides a single local variant: a GenRM model that uses a locally managed vLLM server (it downloads the model and starts vLLM, e.g. via Ray). It handles GenRM-specific roles (response_1, response_2, principle) without changing the base LocalVLLMModel. Aligns with genrm_reward_model_refactoring.md.

Issues

Related to PR #523. Part of #516.

Architecture

Package layout (responses_api_models/genrm_model/)

genrm_model/
├── __init__.py
├── app.py
├── pyproject.toml
├── setup.py
├── README.md
├── configs/
│   └── genrm_model.yaml
└── tests/
    └── test_app.py      # 2 tests (config)

Components

  • GenRMModelConfig: extends LocalVLLMModelConfig with supports_principle_role.
  • GenRMConverter: extends VLLMConverter, overrides _format_message() for custom roles (response_1, response_2, principle); see the sketch after this list.
  • GenRMModelMixin: provides get_converter() for GenRM; used by GenRMModel.
  • GenRMModel: GenRMModelMixin + LocalVLLMModel — downloads the model and starts a vLLM server (e.g. via Ray); same message formatting and config surface as the base local vLLM model, with GenRM-specific options.
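
For orientation, here is a minimal, self-contained sketch of how these pieces relate. The class and method names come from this PR; the stub base classes and all signatures are assumptions, since the real base APIs are not shown in this description.

# Sketch of the component wiring described above. The stub base classes stand
# in for the real LocalVLLMModelConfig / VLLMConverter / LocalVLLMModel, whose
# APIs are not reproduced in this PR description.
class LocalVLLMModelConfig:
    pass  # stand-in for the base local vLLM model config

class VLLMConverter:
    def _format_message(self, message: dict) -> dict:
        return message  # stand-in for the base OpenAI-style formatting

class LocalVLLMModel:
    pass  # stand-in; the real class downloads the model and starts vLLM (e.g. via Ray)

GENRM_ROLES = ("principle", "response_1", "response_2")

class GenRMModelConfig(LocalVLLMModelConfig):
    # Extra switch on top of the base config (field style here is assumed).
    supports_principle_role: bool = True

class GenRMConverter(VLLMConverter):
    # Override only the per-message formatting so GenRM-specific roles survive.
    def _format_message(self, message: dict) -> dict:
        if message.get("role") in GENRM_ROLES:
            return {"role": message["role"], "content": message["content"]}
        return super()._format_message(message)

class GenRMModelMixin:
    # Supplies the GenRM-aware converter; everything else is inherited.
    def get_converter(self) -> GenRMConverter:
        return GenRMConverter()

class GenRMModel(GenRMModelMixin, LocalVLLMModel):
    # MRO resolves get_converter() to the mixin; model download and vLLM
    # server startup come from LocalVLLMModel.
    pass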

Type support (nemo_gym/openai_utils.py)

  • NeMoGymChatCompletionCustomRoleMessageParam: TypedDict for non-standard roles (sketched below).
  • Extended role literals in NeMoGymEasyInputMessage, NeMoGymMessage, and NeMoGymChatCompletionMessageParam.
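
For illustration, the custom-role message param could look roughly like this. This is a sketch only; the actual definition in nemo_gym/openai_utils.py is not reproduced here and may carry additional fields.

# Sketch of the shape described above, not the actual nemo_gym definition.
from typing import Literal, TypedDict

class NeMoGymChatCompletionCustomRoleMessageParam(TypedDict):
    # Non-standard roles used for GenRM pairwise comparison.
    role: Literal["principle", "response_1", "response_2"]
    content: str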

Usage

Config key: genrm_model under responses_api_models, with entrypoint: app.py. Supports supports_principle_role, return_token_id_information, uses_reasoning_parser, model, vllm_serve_kwargs, vllm_serve_env_vars, etc. See configs/genrm_model.yaml.

Import:

from responses_api_models.genrm_model.app import GenRMModel, GenRMModelConfig

Request roles: user, principle, response_1, response_2. Model returns comparison output (e.g. score_1, score_2, ranking). Intended for use by the GenRM Compare Resource Server (Phase 2).
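
The same kind of request can also be sent from Python, mirroring the curl call in the Testing section below (a sketch; the host and port depend on where the server comes up):

import requests

# Roles follow the GenRM convention above: user prompt, judging principle,
# and the two candidate responses to compare.
payload = {
    "input": [
        {"role": "user", "content": "What is the capital of France?", "type": "message"},
        {"role": "principle", "content": "Judge which response is better.", "type": "message"},
        {"role": "response_1", "content": "The capital of France is Paris.", "type": "message"},
        {"role": "response_2", "content": "Paris is the capital city of France.", "type": "message"},
    ],
    "temperature": 0.0,
    "max_output_tokens": 512,
}
resp = requests.post("http://127.0.0.1:11093/v1/responses", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["output"])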

Testing

  • Launching a multi-node config
    genrm_model:
      responses_api_models:
        local_vllm_model:
          entrypoint: app.py
          model: <MODEL_PATH>
          uses_reasoning_parser: True
          return_token_id_information: False
          debug: true
          hf_home: null
          
          vllm_serve_env_vars:
            VLLM_RAY_DP_PACK_STRATEGY: strict
          
          vllm_serve_kwargs:
            tensor_parallel_size: 8
            data_parallel_size: 2
            data_parallel_size_local: 1
            pipeline_parallel_size: 1
            reasoning_parser: deepseek_r1
            gpu_memory_utilization: 0.85
            max_model_len: 60000
            model_loader_extra_config:
              enable_multithread_load: true
              num_threads: 108

We get a running server:


All 1 / 1 servers ready! Polling every 60s

####################################################################################################
#
# Server Instances
#
####################################################################################################

[1] genrm_model (responses_api_models/genrm_model)
{
    'config_path': 'genrm_model',
    'dir_path': (
        '/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/Gym/responses_api_mo'
        'dels/genrm_model'
    ),
    'entrypoint': 'app.py',
    'host': '127.0.0.1',
    'name': 'genrm_model',
    'pid': 36569,
    'port': 11093,
    'process_name': 'genrm_model',
    'server_type': 'responses_api_models',
    'url': 'http://127.0.0.1:11093',
}
####################################################################################################

And a successful response:

curl -s -X POST http://127.0.0.1:11093/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "What is the capital of France?", "type": "message"},
      {"role": "principle", "content": "Judge which response is better.", "type": "message"},
      {"role": "response_1", "content": "The capital of France is Paris.", "type": "message"},
      {"role": "response_2", "content": "Paris is the capital city of France.", "type": "message"}
    ],
    "temperature": 0.0,
    "max_output_tokens": 512
  }'
{"id":"resp_fa1765e1cf34473da829e6890bd6422e","created_at":1771381677.0,"error":null,"incomplete_details":{"reason":"max_output_tokens"},"instructions":null,"metadata":null,"model":"/lustre/fsw/portfolios/llmservice/users/jiaqiz/models/qwen235b_principle_comparison_genrm_step1230","object":"response","output":[{"id":"rs_36feaac12c3c48f9b1268fd594a66afa","summary":[{"text":"\nOkay, let's see. I need to evaluate these two responses from the assistants. The user asked, \"What is the capital of France?\" and both responses are pretty straightforward.\n\nFirst, looking at Response 1: \"The capital of France is Paris.\" That's a direct answer. It's clear and correct. No extra fluff, just the fact. \n\nThen Response 2: \"Paris is the capital city of France.\" Also correct. It's structured a bit differently, starting with Paris instead of the question structure. But both are accurate.\n\nNow, checking the scoring guidelines. For helpfulness, both seem to be exactly what the user asked for. The user wanted the capital, and both provided it without any errors. So maybe both should get a 5? But wait, the guidelines say 5 is \"Extremely Helpful - Completely aligned with what the user was asking for.\" Since both answers are correct and concise, they both deserve 5s. \n\nBut the comparative ranking part is tricky. The options are from 1 to 6, where 1 means Response 1 is much better, and 6 means Response 2 is much better. Since both answers are almost identical in content and correctness, the difference is minimal. The only difference is the sentence structure. Response 1 uses \"The capital of France is Paris,\" which directly mirrors the question structure. The user asked \"What is the capital of France?\" so answering with \"The capital of France is Paris\" matches the question's phrasing. Response 2 says \"Paris is the capital city of France,\" which is also correct but slightly rephrased. \n\nIn terms of helpfulness, maybe Response 1 is a tiny bit better because it directly answers the question as phrased. But is that significant enough? The scoring guidelines for comparative ranking have options like \"slightly better.\" So maybe Response 1 is slightly better (ranking 2) or they're equal. Wait, the ranking options don't have a tie. The middle option is 3: Response 1 is slightly better than Response 2? Wait no, looking back: the comparative ranking options are 1-6 where 3 is \"Response 1 is slightly better than Response 2\" and 4 is \"Response 2 is slightly better than Response 1\". Wait, no: the list says:\n\n1: R1 much better than R2\n\n2: R1 better than R2\n\n3: R1 slightly better than R2","type":"summary_text"}],"type":"reasoning","encrypted_content":null}],"parallel_tool_calls":true,"temperature":0.0,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"conversation":null,"max_output_tokens":512,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":null,"text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null}root@gpu-659:/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/nemo-rl-internal#

@copy-pr-bot

copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ffrujeri ffrujeri changed the title feat: Adds GenRM (Generative Reward Model) Response API Model with support for custom roles used in pairwise response comparison. feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. Feb 12, 2026
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from c22967f to dd172a0 Compare February 18, 2026 02:33
@ffrujeri ffrujeri changed the base branch from main to ffrujeri/multi-node-local-vllm February 18, 2026 02:36
@ffrujeri ffrujeri marked this pull request as ready for review February 18, 2026 02:38
@ffrujeri ffrujeri requested a review from a team as a code owner February 18, 2026 02:38
@bxyu-nvidia bxyu-nvidia linked an issue Feb 18, 2026 that may be closed by this pull request
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 7d3a839 to 2971f31 Compare February 18, 2026 16:58
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from dd172a0 to 7458914 Compare February 18, 2026 16:58

Development

Successfully merging this pull request may close these issues.

feat: Reward model support
