Conversation
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Code Review
This pull request addresses two separate issues: a failing documentation build and a GPU out-of-memory error in the CI tests. The documentation fix correctly escapes special characters in an MDX file to prevent parsing errors. The OOM fix reduces the gpu_memory_utilization for vLLM in a specific test, which is a reasonable approach to make the test pass on CI machines with less GPU memory. The changes are logical and well-explained. I have one minor suggestion to improve code readability by replacing a magic number with a named constant.
cfg.generator.inference_engine.run_engines_locally = True
# NOTE: We reduce the gpu memory used by vLLM because of the colocated tests
# that can OOM on L4s. For more details, see: https://github.com/NovaSky-AI/SkyRL/pull/1221
cfg.generator.inference_engine.gpu_memory_utilization = 0.7
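The reviewer's named-constant suggestion could look something like the sketch below. The constant name and the `apply_ci_overrides` helper are hypothetical, not part of the PR; the real config is an OmegaConf object, stubbed here with `SimpleNamespace` for illustration.

```python
from types import SimpleNamespace

# Hypothetical named constant replacing the magic number 0.7.
# Lowered so colocated tests fit on L4 GPUs without OOM during weight sync.
L4_GPU_MEMORY_UTILIZATION = 0.7

def apply_ci_overrides(cfg):
    """Apply the CI-specific inference engine overrides from this PR."""
    cfg.generator.inference_engine.run_engines_locally = True
    cfg.generator.inference_engine.gpu_memory_utilization = L4_GPU_MEMORY_UTILIZATION
    return cfg

# Minimal stand-in config for illustration only.
cfg = SimpleNamespace(
    generator=SimpleNamespace(
        inference_engine=SimpleNamespace(
            run_engines_locally=False, gpu_memory_utilization=0.9
        )
    )
)
apply_ci_overrides(cfg)
```

A named constant makes the L4-specific tuning discoverable in one place if other tests later need the same override.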
Added another fix in this PR:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <skyrl_gym.envs.search.env.SearchEnv object at 0x7755712f7aa0>
action = 'In <think> <information>Francis William Aston, Edward Mills Purcell, and William Bo推un that the first Nobel Prize in ...mil von Behring and Robert Koch.</information> </think>\n\nTherefore, the answer is <answer>Emil von Behring</answer>.'
def _validate_action(self, action: str):
stop_tags = ["</search>", "</answer>"]
# TODO (sumanthrh): This assertion should really be that the *last token* generated contains <answer>.
# The last token generated can have additional punctuation characters like periods, etc.
action = action.rstrip("\n") # strip out any trailing newlines and periods
for tag in stop_tags:
if tag in action:
> assert action.split(tag, 1)[1] == "", (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
f"{tag} detected in the response but it is not the last string generated. "
f"Use {stop_tags} as stop strings in the configuration."
)
E AssertionError: </answer> detected in the response but it is not the last string generated. Use ['</search>', '</answer>'] as stop strings in the configuration.
skyrl-gym/skyrl_gym/envs/search/env.py:75: AssertionError
The root cause is that the Qwen model generated a full stop after the stop tag in this case, so the stop tag is not the last string in the response. I've added a fix to the validation code for now. In the future, the assertion should be more relaxed.
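A more relaxed version of the assertion, along the lines the TODO suggests, might strip trailing whitespace and punctuation after the stop tag before asserting. This is a sketch of the idea, not necessarily the exact code merged in this PR:

```python
def validate_action_relaxed(action: str, stop_tags=("</search>", "</answer>")) -> None:
    """Assert that any stop tag present is effectively the last output.

    Unlike the strict equality check, this tolerates trailing whitespace
    and punctuation (e.g. the full stop the Qwen model appended here).
    """
    for tag in stop_tags:
        if tag in action:
            trailing = action.split(tag, 1)[1]
            assert trailing.strip(" .!\n") == "", (
                f"{tag} detected in the response but it is not the last string generated. "
                f"Use {list(stop_tags)} as stop strings in the configuration."
            )
```

With this version, `"...<answer>Emil von Behring</answer>."` passes, while genuine extra text after the tag still fails.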
What does this PR do?
Fixes both failing CI pipelines: the docs workflow and the failing GPU tests. Technically these could be separated into two PRs, but the scope of the changes is small, so I'm combining both fixes here.
Docs fix
This was a simple issue of an unescaped `>` symbol in the MDX file. MDX combines Markdown with JSX, so bare `>` symbols need to be escaped.
GPU test OOM
`test_policy_local_engines_e2e.py` was failing for `_SKYRL_USE_NEW_INFERENCE=1` on CI (4xL4s) with a GPU OOM during weight sync. Only the tests for the colocated engine and trainer failed.
When I added the test, I had only tested on H100s, which explains why this wasn't caught earlier.
Surprisingly, the new inference codepath used the exact same inference engine configuration as the old codepath, for which tests were passing without OOM. The only difference was that the old codepath performs a redundant `sleep` + `wake_up` after engine initialization (see `SkyRL/skyrl-train/tests/gpu/utils.py`, lines 513 to 515 at `b041937`).
When I added the same `sleep` + `wake_up` calls to the new inference codepath, the tests passed! I looked into the memory usage differences on H100s: adding the redundant `sleep` + `wake_up` calls led to a ~6 GB saving of memory. I believe this is related to clearing CUDA cache memory that is used temporarily during init.
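The redundant cycle described above can be sketched as a small helper, assuming a vLLM-style engine that exposes `sleep(level=...)` and `wake_up()`. This illustrates the pattern, not the actual SkyRL test code:

```python
def sleep_wake_cycle(engine):
    """Put the engine to sleep and immediately wake it after init.

    In vLLM's sleep mode, sleep(level=1) offloads model weights to CPU
    and discards the KV cache, releasing GPU memory held by the engine;
    wake_up() restores it. As a side effect, this cycle appears to free
    temporary allocations made during engine initialization.
    """
    engine.sleep(level=1)
    engine.wake_up()
```

The PR deliberately avoids relying on this cycle, since it papers over the real memory headroom problem rather than fixing it.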
The Fix
For the test itself, the fix is to simply reduce the `gpu_memory_utilization` parameter passed to vLLM so that it is appropriate for L4s. I prefer this over adding a redundant `sleep` + `wake_up`, which is more brittle. I've reduced the GPU memory utilization to 0.7 and confirmed that the tests pass on 4xL4 with this change.
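As a rough sanity check on the numbers: an L4 has 24 GB of memory, and vLLM pre-allocates approximately `gpu_memory_utilization` of it. The helper below is illustrative only, and the 0.9 baseline is an assumption (vLLM's usual default), not a value stated in this PR:

```python
def vllm_reserved_gb(total_gb: float, gpu_memory_utilization: float) -> float:
    """Approximate GPU memory vLLM pre-allocates for weights + KV cache."""
    return total_gb * gpu_memory_utilization

# On a 24 GB L4, dropping gpu_memory_utilization from 0.9 to 0.7
# leaves ~4.8 GB of extra headroom per GPU for weight sync.
headroom = vllm_reserved_gb(24, 0.9) - vllm_reserved_gb(24, 0.7)
```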