feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. #674
Open
ffrujeri wants to merge 5 commits into ffrujeri/multi-node-local-vllm from
What does this PR do?
Adds GenRM support via a dedicated Response API Model package. The package provides a single local variant: a GenRM model that uses a locally managed vLLM server (download model + start vLLM, e.g. via Ray). It handles GenRM-specific roles (`response_1`, `response_2`, `principle`) without changing the base `LocalVLLMModel`. Aligns with `genrm_reward_model_refactoring.md`.

Issues
Related to PR #523. Part of #516.
Architecture
Package layout: `responses_api_models/genrm_model/`

Components

- `LocalVLLMModelConfig` with `supports_principle_role`.
- `VLLMConverter`: overrides `_format_message()` for custom roles (`response_1`, `response_2`, `principle`).
- `get_converter()` for GenRM; used by `GenRMModel`.
- `GenRMModelMixin` + `LocalVLLMModel`: downloads the model and starts a vLLM server (e.g. via Ray); same message formatting and config surface as the base local vLLM model, with GenRM-specific options.

Type support (`nemo_gym/openai_utils.py`)

- `NeMoGymEasyInputMessage`, `NeMoGymMessage`, and `NeMoGymChatCompletionMessageParam`.

Usage
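For orientation, the custom-role handling described under Components might look roughly like the following sketch. This is not the PR's code: the class name, the template tags, and the plain-dict message shape are assumptions for illustration; the actual implementation overrides `_format_message()` on `VLLMConverter`.

```python
# Hypothetical sketch of custom-role message formatting for GenRM.
# Tags and message shape are illustrative assumptions, not the real code.
GENRM_ROLES = {"response_1", "response_2", "principle"}
STANDARD_ROLES = {"system", "user", "assistant"}


class GenRMConverterSketch:
    """Formats chat messages, additionally accepting GenRM's extra roles."""

    def _format_message(self, message: dict) -> str:
        role, content = message["role"], message["content"]
        if role not in GENRM_ROLES | STANDARD_ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        # GenRM roles get the same header-style rendering as standard
        # roles, so the judge model sees e.g. "<|response_1|>" as a
        # distinct turn instead of the request being rejected.
        return f"<|{role}|>\n{content}\n<|end|>\n"
```

The key design point from the PR description is that this extension lives in the GenRM package's converter, so the base `LocalVLLMModel` stays unchanged.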
Config key: `genrm_model` under `responses_api_models`, with `entrypoint: app.py`. Supports `supports_principle_role`, `return_token_id_information`, `uses_reasoning_parser`, `model`, `vllm_serve_kwargs`, `vllm_serve_env_vars`, etc. See `configs/genrm_model.yaml`.

Import:
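As a configuration illustration, a sketch of what `configs/genrm_model.yaml` might contain, using only the keys named above. The nesting and all values are placeholders, not the actual file:

```yaml
# Hypothetical sketch; see configs/genrm_model.yaml for the real file.
responses_api_models:
  genrm_model:
    entrypoint: app.py
    supports_principle_role: true
    return_token_id_information: false
    uses_reasoning_parser: false
    model: my-org/my-genrm-model   # placeholder model id
    vllm_serve_kwargs: {}
    vllm_serve_env_vars: {}
```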
Request roles: `user`, `principle`, `response_1`, `response_2`. The model returns comparison output (e.g. `score_1`, `score_2`, `ranking`). Intended for use by the GenRM Compare Resource Server (Phase 2).

Testing
We get a runnable server:
And a successful response.
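As an illustration of the request shape, a pairwise-comparison request might carry messages like the following sketch. The `input` key, the plain-dict message shape, and the example contents are assumptions; the four roles and the output fields `score_1`/`score_2`/`ranking` come from this PR description.

```python
# Hypothetical request body for the GenRM model; the message schema
# mirrors the NeMoGym input-message types extended in this PR.
request_body = {
    "input": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "principle", "content": "Prefer concise, factually correct answers."},
        {"role": "response_1", "content": "Paris."},
        {"role": "response_2", "content": "The capital of France is Lyon."},
    ],
}

roles = [message["role"] for message in request_body["input"]]
```

The model would then emit a comparison of `response_1` and `response_2` under the stated `principle` (e.g. scores or a ranking), which the GenRM Compare Resource Server consumes in Phase 2.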