feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. #674

Open

ffrujeri wants to merge 5 commits into ffrujeri/multi-node-local-vllm from ffrujeri/genrm-model

Conversation

@ffrujeri
Contributor

@ffrujeri ffrujeri commented Feb 11, 2026

What does this PR do?

Adds GenRM support via a dedicated Response API Model package. The package provides a single local variant: a GenRM model that uses a locally managed vLLM server (it downloads the model and starts vLLM, e.g. via Ray). It handles GenRM-specific roles (response_1, response_2, principle) without changing the base LocalVLLMModel. Aligns with genrm_reward_model_refactoring.md.

Issues

Related to PR #523. Part of #516.

Architecture

Package layout (responses_api_models/genrm_model/)

genrm_model/
├── __init__.py
├── app.py
├── pyproject.toml
├── setup.py
├── README.md
├── configs/
│   └── genrm_model.yaml
└── tests/
    └── test_app.py      # 2 tests (config)

Components

  • GenRMModelConfig: extends LocalVLLMModelConfig with supports_principle_role.
  • GenRMConverter: extends VLLMConverter, overrides _format_message() for custom roles (response_1, response_2, principle); see the sketch after this list.
  • GenRMModelMixin: provides get_converter() for GenRM; used by GenRMModel.
  • GenRMModel: GenRMModelMixin + LocalVLLMModel — downloads the model and starts a vLLM server (e.g. via Ray); same message formatting and config surface as the base local vLLM model, with GenRM-specific options.
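
For orientation, here is a minimal, self-contained sketch of how these pieces relate. The class and method names come from this PR; the stub base classes and all signatures are assumptions, since the real base APIs are not shown in this description.

# Sketch of the component wiring described above. The stub base classes stand
# in for the real LocalVLLMModelConfig / VLLMConverter / LocalVLLMModel, whose
# APIs are not reproduced in this PR description.
class LocalVLLMModelConfig:
    pass  # stand-in for the base local vLLM model config

class VLLMConverter:
    def _format_message(self, message: dict) -> dict:
        return message  # stand-in for the base OpenAI-style formatting

class LocalVLLMModel:
    pass  # stand-in; the real class downloads the model and starts vLLM (e.g. via Ray)

GENRM_ROLES = ("principle", "response_1", "response_2")

class GenRMModelConfig(LocalVLLMModelConfig):
    # Extra switch on top of the base config (field style here is assumed).
    supports_principle_role: bool = True

class GenRMConverter(VLLMConverter):
    # Override only the per-message formatting so GenRM-specific roles survive.
    def _format_message(self, message: dict) -> dict:
        if message.get("role") in GENRM_ROLES:
            return {"role": message["role"], "content": message["content"]}
        return super()._format_message(message)

class GenRMModelMixin:
    # Supplies the GenRM-aware converter; everything else is inherited.
    def get_converter(self) -> GenRMConverter:
        return GenRMConverter()

class GenRMModel(GenRMModelMixin, LocalVLLMModel):
    # MRO resolves get_converter() to the mixin; model download and vLLM
    # server startup come from LocalVLLMModel.
    pass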

Type support (nemo_gym/openai_utils.py)

  • NeMoGymChatCompletionCustomRoleMessageParam: TypedDict for non-standard roles (sketched below).
  • Extended role literals in NeMoGymEasyInputMessage, NeMoGymMessage, and NeMoGymChatCompletionMessageParam.
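
For illustration, the custom-role message param could look roughly like this. This is a sketch only; the actual definition in nemo_gym/openai_utils.py is not reproduced here and may carry additional fields.

# Sketch of the shape described above, not the actual nemo_gym definition.
from typing import Literal, TypedDict

class NeMoGymChatCompletionCustomRoleMessageParam(TypedDict):
    # Non-standard roles used for GenRM pairwise comparison.
    role: Literal["principle", "response_1", "response_2"]
    content: str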

Usage

Config key: genrm_model under responses_api_models, with entrypoint: app.py. Supports supports_principle_role, return_token_id_information, uses_reasoning_parser, model, vllm_serve_kwargs, vllm_serve_env_vars, etc. See configs/genrm_model.yaml.

Import:

from responses_api_models.genrm_model.app import GenRMModel, GenRMModelConfig

Request roles: user, principle, response_1, response_2. Model returns comparison output (e.g. score_1, score_2, ranking). Intended for use by the GenRM Compare Resource Server (Phase 2).
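
The same kind of request can also be sent from Python, mirroring the curl call in the Testing section below (a sketch; the host and port depend on where the server comes up):

import requests

# Roles follow the GenRM convention above: user prompt, judging principle,
# and the two candidate responses to compare.
payload = {
    "input": [
        {"role": "user", "content": "What is the capital of France?", "type": "message"},
        {"role": "principle", "content": "Judge which response is better.", "type": "message"},
        {"role": "response_1", "content": "The capital of France is Paris.", "type": "message"},
        {"role": "response_2", "content": "Paris is the capital city of France.", "type": "message"},
    ],
    "temperature": 0.0,
    "max_output_tokens": 512,
}
resp = requests.post("http://127.0.0.1:11093/v1/responses", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["output"])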

Testing

  • Launching a multi-node config
    genrm_model:
      responses_api_models:
        local_vllm_model:
          entrypoint: app.py
          model: <MODEL_PATH>
          uses_reasoning_parser: True
          return_token_id_information: False
          debug: true
          hf_home: null
          
          vllm_serve_env_vars:
            VLLM_RAY_DP_PACK_STRATEGY: strict
          
          vllm_serve_kwargs:
            tensor_parallel_size: 8
            data_parallel_size: 2
            data_parallel_size_local: 1
            pipeline_parallel_size: 1
            reasoning_parser: deepseek_r1
            gpu_memory_utilization: 0.85
            max_model_len: 60000
            model_loader_extra_config:
              enable_multithread_load: true
              num_threads: 108

We get a running server:


All 1 / 1 servers ready! Polling every 60s

####################################################################################################
#
# Server Instances
#
####################################################################################################

[1] genrm_model (responses_api_models/genrm_model)
{
    'config_path': 'genrm_model',
    'dir_path': (
        '/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/Gym/responses_api_mo'
        'dels/genrm_model'
    ),
    'entrypoint': 'app.py',
    'host': '127.0.0.1',
    'name': 'genrm_model',
    'pid': 36569,
    'port': 11093,
    'process_name': 'genrm_model',
    'server_type': 'responses_api_models',
    'url': 'http://127.0.0.1:11093',
}
####################################################################################################

And a successful response:

curl -s -X POST http://127.0.0.1:11093/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "What is the capital of France?", "type": "message"},
      {"role": "principle", "content": "Judge which response is better.", "type": "message"},
      {"role": "response_1", "content": "The capital of France is Paris.", "type": "message"},
      {"role": "response_2", "content": "Paris is the capital city of France.", "type": "message"}
    ],
    "temperature": 0.0,
    "max_output_tokens": 512
  }'
{"id":"resp_fa1765e1cf34473da829e6890bd6422e","created_at":1771381677.0,"error":null,"incomplete_details":{"reason":"max_output_tokens"},"instructions":null,"metadata":null,"model":"/lustre/fsw/portfolios/llmservice/users/jiaqiz/models/qwen235b_principle_comparison_genrm_step1230","object":"response","output":[{"id":"rs_36feaac12c3c48f9b1268fd594a66afa","summary":[{"text":"\nOkay, let's see. I need to evaluate these two responses from the assistants. The user asked, \"What is the capital of France?\" and both responses are pretty straightforward.\n\nFirst, looking at Response 1: \"The capital of France is Paris.\" That's a direct answer. It's clear and correct. No extra fluff, just the fact. \n\nThen Response 2: \"Paris is the capital city of France.\" Also correct. It's structured a bit differently, starting with Paris instead of the question structure. But both are accurate.\n\nNow, checking the scoring guidelines. For helpfulness, both seem to be exactly what the user asked for. The user wanted the capital, and both provided it without any errors. So maybe both should get a 5? But wait, the guidelines say 5 is \"Extremely Helpful - Completely aligned with what the user was asking for.\" Since both answers are correct and concise, they both deserve 5s. \n\nBut the comparative ranking part is tricky. The options are from 1 to 6, where 1 means Response 1 is much better, and 6 means Response 2 is much better. Since both answers are almost identical in content and correctness, the difference is minimal. The only difference is the sentence structure. Response 1 uses \"The capital of France is Paris,\" which directly mirrors the question structure. The user asked \"What is the capital of France?\" so answering with \"The capital of France is Paris\" matches the question's phrasing. Response 2 says \"Paris is the capital city of France,\" which is also correct but slightly rephrased. \n\nIn terms of helpfulness, maybe Response 1 is a tiny bit better because it directly answers the question as phrased. But is that significant enough? The scoring guidelines for comparative ranking have options like \"slightly better.\" So maybe Response 1 is slightly better (ranking 2) or they're equal. Wait, the ranking options don't have a tie. The middle option is 3: Response 1 is slightly better than Response 2? Wait no, looking back: the comparative ranking options are 1-6 where 3 is \"Response 1 is slightly better than Response 2\" and 4 is \"Response 2 is slightly better than Response 1\". Wait, no: the list says:\n\n1: R1 much better than R2\n\n2: R1 better than R2\n\n3: R1 slightly better than R2","type":"summary_text"}],"type":"reasoning","encrypted_content":null}],"parallel_tool_calls":true,"temperature":0.0,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"conversation":null,"max_output_tokens":512,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":null,"text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null}root@gpu-659:/scratch/fsw/portfolios/llmservice/projects/llmservice_modelalignment_ppo/users/ffrujeri/nemo-rl-internal#

@copy-pr-bot

copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ffrujeri ffrujeri changed the title feat: Adds GenRM (Generative Reward Model) Response API Model with support for custom roles used in pairwise response comparison. feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. Feb 12, 2026
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from c22967f to dd172a0 Compare February 18, 2026 02:33
@ffrujeri ffrujeri changed the base branch from main to ffrujeri/multi-node-local-vllm February 18, 2026 02:36
@ffrujeri ffrujeri marked this pull request as ready for review February 18, 2026 02:38
@ffrujeri ffrujeri requested a review from a team as a code owner February 18, 2026 02:38
@bxyu-nvidia bxyu-nvidia linked an issue Feb 18, 2026 that may be closed by this pull request
@ffrujeri ffrujeri force-pushed the ffrujeri/multi-node-local-vllm branch from 7d3a839 to 2971f31 Compare February 18, 2026 16:58
@ffrujeri ffrujeri force-pushed the ffrujeri/genrm-model branch from dd172a0 to 7458914 Compare February 18, 2026 16:58

Development

Successfully merging this pull request may close these issues.

feat: Reward model support
