
[train][tests] Add tests for RemoteInferenceClient #1211

Open
SumanthRH wants to merge 22 commits into main from migrate-inference-engine-test

Conversation

@SumanthRH
Member

SumanthRH commented Feb 25, 2026

What does this PR do?

Adds generation, error handling, and chat-template tests for the new inference servers codepath. With this PR, we should have test coverage for all the features around basic full-parameter training with the inference servers (with E2E tests for the same in progress).

Changes

  1. Error handling: Previously, if errors were raised by vLLM servers, the error message was not propagated to the client because we didn't read the body. This PR changes RemoteInferenceClient to read the error message details before raising an HTTP error.
  2. Served model name support: Propagated cfg.generator.served_model_name to the vLLM args for the new inference server codepath.
  3. Generation, error handling, and chat template tests: Added all relevant tests from test_inference_engine_client_http_endpoint.py for the new inference server codepath.
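The error-handling change in (1) can be sketched as follows. This is a hypothetical minimal helper, not the actual RemoteInferenceClient code; it just illustrates reading the response body before raising, so the server's error detail survives on the client side:

```python
import json


def raise_for_status_with_detail(status_code: int, body: bytes) -> None:
    """Raise an error that carries the server's own message.

    Sketch only: parse the response body *before* raising, so a vLLM
    server's error detail is included in the client-side exception.
    """
    if status_code < 400:
        return
    try:
        detail = json.loads(body).get("message", body.decode())
    except (ValueError, UnicodeDecodeError):
        # Non-JSON or undecodable body: fall back to a lossy decode.
        detail = body.decode(errors="replace")
    raise RuntimeError(f"HTTP {status_code}: {detail}")
```

Without a step like this, the client would typically surface only a bare status code (e.g. "400 Bad Request") and drop the server's explanation.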


Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the RemoteInferenceClient by enhancing error handling to propagate detailed messages from vLLM servers. It also adds support for served_model_name in the vLLM CLI arguments, allowing for more flexible model identification. Crucially, new tests have been added to cover generation, error handling, and chat template functionalities for the new inference server codepath, increasing test coverage and robustness. Minor updates to the CI script and test utilities ensure these new features are properly integrated and tested.

router = None
server_group = None
-if _SKYRL_USE_NEW_INFERENCE:
+if use_new_inference_servers or (use_new_inference_servers is None and _SKYRL_USE_NEW_INFERENCE):
Collaborator

why not keep the _SKYRL_USE_NEW_INFERENCE as the gate and control it in the tests?

or if you want to change this part, I would just do

if use_new_inference_servers is True: ...

Then in the caller of those other places I would do

InferenceEngineState.create(..., use_new_inference_servers=os.environ.get("_SKYRL_USE_NEW_INFERENCE", False))

Member Author

These are tests specifically written for the new inference codepath. I don't think it makes sense to plan for both possible values of _SKYRL_USE_NEW_INFERENCE.

That is the main reason why the tests use the new inference codepath by default and are placed in inference_servers/

model=MODEL_QWEN3,
sleep_level=1,
engine_init_kwargs={"chat_template": TEMPLATE_PATH} if use_custom_template else None,
use_new_inference_servers=True,
Collaborator

do we need this? Can we just run all the inference_servers/ tests with the env var?

Member Author

@SumanthRH Feb 25, 2026

Can we just run all the inference_servers/ tests with the env var?

inference_servers/ tests are meant to test the new codepath in any case. They are written specifically for the new codepath, so I don't think it makes sense to have any gating with the env var.

Collaborator

It's more of a nit, but either way you'd want to change these tests after this path becomes the default, because this component in particular is shared between the old and new paths. So the question is which one is simpler. I think all the inference_servers/ tests should have the env var set automatically, so the underlying modules change behavior based on the env var and you don't do anything special at the interfaces.

assert not ("error" in output or output.get("object", "") == "error"), f"Error in output: {output}"
assert len(outputs) == num_samples
for response_data in outputs:
if test_type == "litellm":
Collaborator

when do we need litellm test_type?

Member Author

Please see the new tests. The helper here is from the older tests for the inference engine client's HTTP endpoint.

I've ported over the LiteLLM tests as well now, which makes this codepath useful in the new test file.

Collaborator

It feels irrelevant to SkyRL to test this in general. As long as you test OpenAI compatibility, isn't LiteLLM compatible with that?

Member Author

LiteLLM supports additional sampling parameters that vLLM supports, and some of our first-party integrations like Harbor rely on this, so I believe it is good to have the tests here.
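To illustrate the distinction: the standard OpenAI chat-completions schema does not define sampling parameters like top_k or min_p, but vLLM's OpenAI-compatible server accepts them, and LiteLLM can forward them. A hypothetical helper (not SkyRL code) that merges such vLLM-only extensions into an OpenAI-style request payload:

```python
def build_chat_payload(model: str, messages: list, **vllm_extras) -> dict:
    """Build an OpenAI-style chat payload, merging vLLM-specific
    sampling extensions (e.g. top_k, min_p) alongside the standard
    fields. The extras ride in the same JSON body; a strict
    OpenAI-only client would reject them as unknown parameters."""
    return {"model": model, "messages": messages, **vllm_extras}
```

A plain OpenAI-compatibility test would never exercise these extra fields, which is the gap the LiteLLM tests cover.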

Contributor

devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 8 additional findings in Devin Review.

@SumanthRH
Member Author

SumanthRH commented Feb 26, 2026

I've now refactored the test file test_new_inference_generation.py into two groups of tests:

Group A: Generation and error handling tests that interact directly with the router's OpenAI-compatible endpoints via requests or LiteLLM.

We use LiteLLM for passing some sampling parameters that are not supported by OpenAI's chat completion client.

Group B: Generation and error handling tests that use the RemoteInferenceClient.

Both of these are important right now.

Once we make progress on migration #845, we can remove the generation APIs in the RemoteInferenceClient and have Generators interact directly with the router's HTTP endpoints. I've added a TODO for this case.

Collaborator

CharlieFRuan left a comment

Some minor comments / questions. Thank you Sumanth!

uv run --isolated --extra dev --extra fsdp pytest tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_new_inference_generation.py -m vllm -v
"""

# TODO (sumanthrh) (RemoteInferenceClient data-plane-deprecation): Remove the tests in Group B once we migrate all generation interactions to the router's HTTP API.
Collaborator

Does this mean SkyRLGymGenerator would post HTTP requests as well?

Member Author

Yes, that is correct! This would also make it more natural for users to bring in their own generators.



@pytest.mark.vllm
def test_error_handling(vllm_server: InferenceEngineState):
Collaborator

Comparing against test_http_endpoint_error_handling in tests/backends/skyrl_train/gpu/gpu_ci/test_inference_engine_client_http_endpoint.py:

  • We guard stream==False in the old path. Do we / should we support streaming in the new codepath?

Member Author

There's no reason why streaming shouldn't work for the router's HTTP endpoint. Let me add a test for this to be sure!
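For context on what such a streaming test would parse: OpenAI-compatible servers stream chat completions as server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. A minimal client-side parser (hypothetical helper, assuming that standard framing) could look like:

```python
import json


def collect_stream_content(sse_lines) -> str:
    """Join the content deltas from OpenAI-style `data:` SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines / SSE comments / keep-alives
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

A streaming test against the router would feed the response's line iterator through a parser like this and assert the joined text is non-empty and error-free.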
