[train][tests] Add tests for RemoteInferenceClient #1211
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Code Review
This pull request introduces significant improvements to the RemoteInferenceClient by enhancing error handling to propagate detailed messages from vLLM servers. It also adds support for served_model_name in the vLLM CLI arguments, allowing for more flexible model identification. Crucially, new tests have been added to cover generation, error handling, and chat template functionalities for the new inference server codepath, increasing test coverage and robustness. Minor updates to the CI script and test utilities ensure these new features are properly integrated and tested.
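As background for the `served_model_name` change: vLLM exposes a `--served-model-name` CLI flag that lets the server advertise a model ID different from the HF path. A hedged sketch of how the new config option might map onto it — the helper name and `cfg` layout below are illustrative assumptions, not the PR's actual code:

```python
# Illustrative only: how cfg.generator.served_model_name could map onto
# vLLM's --served-model-name CLI flag. Helper name and cfg layout are
# assumptions, not the actual SkyRL implementation.
def build_vllm_cli_args(cfg) -> list[str]:
    args = ["--model", cfg.generator.model]
    if getattr(cfg.generator, "served_model_name", None):
        # Clients can then address the server by this alias instead of the HF path.
        args += ["--served-model-name", cfg.generator.served_model_name]
    return args
```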
```diff
 router = None
 server_group = None
-if _SKYRL_USE_NEW_INFERENCE:
+if use_new_inference_servers or (use_new_inference_servers is None and _SKYRL_USE_NEW_INFERENCE):
```
Why not keep `_SKYRL_USE_NEW_INFERENCE` as the gate and control it in the tests? Or, if you want to change this part, I would just do

`if use_new_inference_servers is True: ...`

Then in the callers at those other places I would do

`InferenceEngineState.create(..., use_new_inference_servers=os.environ.get("_SKYRL_USE_NEW_INFERENCE", False))`
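A minimal sketch of the suggested caller-side gating. One caveat worth noting: `os.environ` values are strings, so an explicit comparison is safer than passing the raw `get(..., False)` result through (the `"1"` sentinel below is an assumption about how the var is set):

```python
import os

# Suggested gate made explicit. Env var values are strings, so compare
# against a sentinel instead of relying on os.environ.get(..., False),
# which returns a truthy string whenever the variable is set at all.
use_new = os.environ.get("_SKYRL_USE_NEW_INFERENCE", "0") == "1"

# Caller-side opt-in (other arguments elided):
# InferenceEngineState.create(..., use_new_inference_servers=use_new)
```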
These are tests written specifically for the new inference codepath, so I don't think it makes sense to plan for both possible values of `_SKYRL_USE_NEW_INFERENCE`. That is the main reason the tests use the new inference codepath by default and live in `inference_servers/`.
```python
model=MODEL_QWEN3,
sleep_level=1,
engine_init_kwargs={"chat_template": TEMPLATE_PATH} if use_custom_template else None,
use_new_inference_servers=True,
```
Do we need this? Can we just run all the `inference_servers/` tests with the env var?
> Can we just run all the `inference_servers/` tests with the env var?

The `inference_servers/` tests are meant to test the new codepath in any case. They are written specifically for the new codepath, so I don't think it makes sense to have any gating with the env var.
It's more like a nit, but either way you'd want to change these tests after this path becomes the default, because this component in particular is shared between the old and new paths. So the question is which approach is simpler. I think all the `inference_servers/` tests should have the env var set automatically, so the underlying modules change behavior based on the env var and nothing special happens at the interfaces.
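One way to realize "set the env var automatically for every test in `inference_servers/`" is an autouse fixture in a directory-local `conftest.py`. A minimal sketch — the conftest placement is a suggestion, not the PR's actual change, and it assumes the gate is read at call time rather than at module import time:

```python
# tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/conftest.py (hypothetical)
import pytest


@pytest.fixture(autouse=True)
def _use_new_inference_servers(monkeypatch):
    # Every test in this directory runs with the new-codepath gate enabled,
    # so modules under test flip behavior without special-casing interfaces.
    monkeypatch.setenv("_SKYRL_USE_NEW_INFERENCE", "1")
```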
```python
assert not ("error" in output or output.get("object", "") == "error"), f"Error in output: {output}"
assert len(outputs) == num_samples
for response_data in outputs:
    if test_type == "litellm":
```
When do we need the `litellm` test_type?
Please see the new tests. The helper here is from the older tests for the inference engine client's HTTP endpoint. I've now ported over the LiteLLM tests as well, which makes this codepath useful in the new test file.
It feels irrelevant to SkyRL to test this in general. As long as you test OpenAI compatibility, isn't LiteLLM compatible with that?
LiteLLM supports additional sampling parameters that vLLM supports, and some of our first-party integrations like Harbor rely on this, so I believe it is good to have the tests here (see the sketch below).
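As an illustration of the point about extra sampling parameters: LiteLLM can forward vLLM knobs like `top_k` and `min_p` that the stock OpenAI chat client does not model. The model name and URL below are placeholders, and exact passthrough behavior depends on the LiteLLM version and provider config:

```python
import litellm

# top_k / min_p are vLLM sampling knobs that the plain OpenAI client does not
# expose as typed parameters; LiteLLM forwards them to the server.
response = litellm.completion(
    model="hosted_vllm/Qwen/Qwen3-0.6B",  # placeholder model name
    api_base="http://localhost:8000/v1",  # placeholder router URL
    messages=[{"role": "user", "content": "Hello!"}],
    top_k=20,
    min_p=0.1,
)
print(response.choices[0].message.content)
```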
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
I've now refactored the test file.

Group A: Generation and error handling tests that interact directly with the router's OpenAI-compatible endpoints. We use LiteLLM for passing some sampling parameters that are not supported by OpenAI's chat completion client.

Group B: Generation and error handling tests that use the `RemoteInferenceClient` generation APIs.

Both of these are important right now. Once we make progress in the migration #845, we can remove the generation APIs in the remote inference client and have Generators interact directly with the router's HTTP endpoints. I've added a TODO for this case.
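For Group A, the tests can talk to the router exactly like any OpenAI client. A minimal sketch of that interaction style — the URL, API key, and model name are placeholders, not the fixtures the tests actually use:

```python
from openai import OpenAI

# Group A style: hit the router's OpenAI-compatible endpoint directly.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder URL
completion = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",  # placeholder model name
    messages=[{"role": "user", "content": "Say hi."}],
)
assert completion.choices[0].message.content
```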
CharlieFRuan left a comment:

Some minor comments / questions. Thank you Sumanth!
```python
uv run --isolated --extra dev --extra fsdp pytest tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_new_inference_generation.py -m vllm -v
"""

# TODO (sumanthrh) (RemoteInferenceClient data-plane-deprecation): Remove the tests in Group B once we migrate all generation interactions to the router's HTTP API.
```
Does this mean `SkyRLGymGenerator` would post HTTP requests as well?
Yes, that is correct! This would also make it more natural for users to bring in their own generators.
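A hedged sketch of what a generator posting directly to the router's HTTP API could look like after the #845 migration. The endpoint shape follows the OpenAI chat completions spec; the function and its wiring are hypothetical:

```python
import httpx


async def generate_turn(router_url: str, model: str, messages: list[dict]) -> str:
    # Hypothetical post-migration flow: the generator posts straight to the
    # router's OpenAI-compatible endpoint instead of going through
    # RemoteInferenceClient's generation API.
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{router_url}/v1/chat/completions",
            json={"model": model, "messages": messages},
            timeout=60.0,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```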
```python
@pytest.mark.vllm
def test_error_handling(vllm_server: InferenceEngineState):
```
Comparing against `test_http_endpoint_error_handling` in `tests/backends/skyrl_train/gpu/gpu_ci/test_inference_engine_client_http_endpoint.py`:

- We guard `stream == False` in the old path. Do we / should we support streaming in the new codepath?
There's no reason why streaming shouldn't work for the router's HTTP endpoint. Let me add a test for this to be sure!
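A minimal sketch of what such a streaming check could look like against the router's OpenAI-compatible endpoint. The fixture wiring, URL, and model name are placeholders, not the test that was actually added:

```python
import pytest
from openai import OpenAI


@pytest.mark.vllm
def test_streaming_generation(vllm_server):  # fixture wiring is a placeholder
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder URL
    stream = client.chat.completions.create(
        model="Qwen/Qwen3-0.6B",  # placeholder model name
        messages=[{"role": "user", "content": "Count to three."}],
        stream=True,
    )
    # Collect non-empty deltas; some chunks carry no choices or no content.
    chunks = [
        c.choices[0].delta.content
        for c in stream
        if c.choices and c.choices[0].delta.content
    ]
    assert chunks, "expected at least one non-empty streamed chunk"
```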
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
What does this PR do?
Adds generation, error handling, and chat template tests for the new inference servers codepath. With this PR, we should have test coverage for all the features around basic full-param training with the inference servers (E2E tests for the same are in progress).
Changes
- `RemoteInferenceClient` will read the error message details before raising an HTTP error (sketched below).
- Adds `cfg.generator.served_model_name` to the vLLM args for the new inference server codepath.
- Ports tests from `test_inference_engine_client_http_endpoint.py` for the new inference server codepath.
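To illustrate the first change, a hedged sketch of reading the server's error body before raising. The helper name, error-body shape, and exception type are illustrative, not the actual `RemoteInferenceClient` code:

```python
import httpx


def _raise_with_detail(resp: httpx.Response) -> None:
    # Illustrative only: surface the vLLM server's error message instead of
    # a bare "500 Internal Server Error".
    if resp.status_code >= 400:
        try:
            detail = resp.json().get("error", {}).get("message", resp.text)
        except ValueError:
            # Body was not JSON; fall back to the raw text.
            detail = resp.text
        raise RuntimeError(f"Inference server error {resp.status_code}: {detail}")
```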