Use the last LoRA path in the vLLM inference engine instead of "dummy_lora_path"#1188

Open
ebronstein wants to merge 2 commits into NovaSky-AI:main from ebronstein:dummy_lora_path

Conversation

@ebronstein
Contributor

@ebronstein ebronstein commented Feb 20, 2026

Problem description

When running fully async training with LoRA, vLLM sometimes crashes with:

FileNotFoundError: [Errno 2] No such file or directory: '/dummy_lora_path/adapter_config.json'

The error originates in vLLM's LRUCacheWorkerLoRAManager.add_adapter() (in vllm/lora/worker_manager.py), which is called during generation when the worker tries to activate a LoRA adapter for an incoming request. The config uses max_loras=1 (set in create_ray_wrapped_inference_engines_from_config in main_base.py), meaning the worker's LRUCacheWorkerLoRAManager can only hold one adapter at a time.

My understanding is that with async training, a generation request may arrive after the LoRA adapter has been evicted from the cache and before the new adapter has been loaded (e.g., during weight sync). On this cache miss, LRUCacheWorkerLoRAManager.add_adapter() falls back to loading from the lora_path in the LoRARequest, which is "dummy_lora_path".
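The eviction-then-miss behavior can be illustrated with a toy capacity-1 LRU cache (a hypothetical sketch of the failure mode, not vLLM's actual implementation):

```python
from collections import OrderedDict

class TinyLoRACache:
    """Toy capacity-1 LRU cache mirroring the failure mode described above
    (illustrative only; names do not match vLLM's worker_manager)."""

    def __init__(self, capacity: int = 1):
        self.capacity = capacity
        self.cache: OrderedDict[int, str] = OrderedDict()

    def add_adapter(self, lora_id: int, lora_path: str) -> str:
        if lora_id in self.cache:
            self.cache.move_to_end(lora_id)  # cache hit: refresh LRU order
            return self.cache[lora_id]
        # Cache miss: fall back to loading from lora_path -- this is where
        # '/dummy_lora_path' fails if the placeholder was never replaced.
        adapter = f"loaded-from:{lora_path}"
        self.cache[lora_id] = adapter
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least-recently used
        return adapter
```

With max_loras=1, adding any second adapter evicts the first, so a later request for the first adapter must reload it from whatever path its LoRARequest carries.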

Proposed fix

This PR saves the last used LoRA path and uses that as the default instead of "dummy_lora_path". The LoRA adapter weights are saved to a persistent directory on disk during each weight sync, so the path should remain valid throughout training.
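A minimal sketch of the idea (class and method names are illustrative, not the PR's exact code):

```python
class BaseVLLMInferenceEngine:
    """Illustrative sketch: remember the last successfully loaded LoRA path
    so fallback loads resolve to a real on-disk adapter directory."""

    _DUMMY_LORA_PATH = "/dummy_lora_path"

    def __init__(self):
        # Updated after each successful adapter load during weight sync.
        self._last_lora_path = None

    def _get_lora_path_for_request(self) -> str:
        # Prefer the most recently synced adapter directory; the dummy
        # placeholder is only used before the first weight sync.
        return self._last_lora_path or self._DUMMY_LORA_PATH

    def add_lora(self, lora_path: str) -> None:
        # ... build a LoRARequest and hand it to the engine here ...
        self._last_lora_path = lora_path
```

Since the adapter weights are written to a persistent directory at each sync, the cached path stays valid for the lifetime of the run.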

Full stack trace

(AsyncVLLMInferenceEngine pid=1010235) (EngineCore_0 pid=1010386) Process EngineCore_0:
Traceback (most recent call last):
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 103, in _load_adapter
    peft_helper = PEFTHelper.from_local_dir(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/peft_helper.py", line 107, in from_local_dir
    with open(lora_config_path) as f:
         ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/dummy_lora_path/adapter_config.json'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nas/ucb/ebronstein/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/nas/ucb/ebronstein/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
    raise e
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 693, in run_engine_core
    engine_core.run_busy_loop()
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop
    self._process_engine_step()
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step
    outputs, model_executed = self.step_fn()
                              ^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 288, in step
    model_output = self.execute_model_with_error_logging(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging
    raise err
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging
    return model_fn(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1522, in execute_model
    max_query_len) = (self._prepare_inputs(scheduler_output))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 939, in _prepare_inputs
    self.set_active_loras(self.input_batch, num_scheduled_tokens)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 84, in set_active_loras
    return self._set_active_loras(prompt_lora_mapping, token_lora_mapping,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 73, in _set_active_loras
    self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 167, in set_active_adapters
    set_active_adapters_worker(requests, mapping, self._apply_adapters,
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/adapter_commons/utils.py", line 55, in set_active_adapters_worker
    apply_adapters_func(requests)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in _apply_adapters
    self.add_adapter(lora)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 240, in add_adapter
    lora = self._load_adapter(lora_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 136, in _load_adapter
    raise ValueError(
ValueError: Loading lora 549119024 failed: No adapter found for /dummy_lora_path


Contributor

@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a FileNotFoundError during asynchronous LoRA training by using the last known LoRA path as a fallback. However, a security audit identified two high-severity Path Traversal vulnerabilities in skyrl-train/skyrl_train/inference_engines/vllm/vllm_engine.py. These vulnerabilities arise from the use of an unsanitized lora_path variable when loading LoRA adapters, which could allow an attacker to read arbitrary files from the local filesystem. It is recommended to sanitize lora_path to ensure it resolves within an expected directory. Additionally, there's a high-priority blocking call within an asynchronous method that needs to be addressed to prevent performance issues, and a medium-priority suggestion to refactor duplicated logic for better maintainability.

Comment on lines 270 to +272
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = self.llm.llm_engine.add_lora(lora_request)
self._last_lora_path = lora_path

security-high

This section introduces a high-severity Path Traversal vulnerability. The lora_path parameter, used in _load_lora_from_disk and passed to LoRARequest and self.llm.llm_engine.add_lora, is unsanitized. This allows an attacker to access arbitrary files via malicious paths (e.g., ../../../../etc/passwd). The newly added line self._last_lora_path = lora_path also propagates this tainted path. Furthermore, the synchronous call self.llm.llm_engine.add_lora(lora_request) blocks the event loop within this async method, which can lead to performance issues. It is crucial to sanitize lora_path and ensure the add_lora call is non-blocking.
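The reviewer's sanitization suggestion could be sketched as follows (a hypothetical helper; the function name and the `allowed_root` parameter are assumptions, not part of this PR):

```python
import os

def sanitize_lora_path(lora_path: str, allowed_root: str) -> str:
    """Resolve lora_path and require it to stay under allowed_root.

    Rejects traversal attempts such as '../../../../etc/passwd' by
    comparing the fully resolved path against the allowed directory.
    """
    resolved = os.path.realpath(lora_path)
    root = os.path.realpath(allowed_root)
    # commonpath collapses to a shorter prefix (e.g. "/") when the
    # resolved path escapes the allowed root.
    if os.path.commonpath([resolved, root]) != root:
        raise ValueError(f"LoRA path {lora_path!r} escapes {allowed_root!r}")
    return resolved
```

A check like this could run once where `lora_path` enters the engine, before it is stored in `_last_lora_path` or passed to `LoRARequest`.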

Comment on lines 404 to +406
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = await self.llm.add_lora(lora_request)
self._last_lora_path = lora_path

security-high

The lora_path parameter in the _load_lora_from_disk function is used to construct a file path for loading LoRA adapters without proper sanitization. An attacker could provide a malicious path (e.g., ../../../../etc/passwd) to access arbitrary files on the filesystem. This is a path traversal vulnerability. The vulnerable code is in the _load_lora_from_disk method, where lora_path is passed to LoRARequest and then used by self.llm.add_lora. The newly added line self._last_lora_path = lora_path also propagates this tainted path.

# dummy_lora_path for placeholder (actual loading done in add_lora())
# Use last loaded LoRA path or a dummy path for placeholder
# (actual loading done in add_lora())
lora_path = self._last_lora_path or "/dummy_lora_path"

medium

To improve maintainability and reduce code duplication, consider extracting this logic into a helper method in the BaseVLLMInferenceEngine class. The magic string "/dummy_lora_path" and the fallback logic are also used in AsyncVLLMInferenceEngine._collect_outputs.

For example, you could add a method to BaseVLLMInferenceEngine:

_DUMMY_LORA_PATH = "/dummy_lora_path"

def _get_lora_path_for_request(self) -> str:
    return self._last_lora_path or self._DUMMY_LORA_PATH

Then you can call self._get_lora_path_for_request() here and in the other location to centralize the logic.

@SumanthRH SumanthRH self-assigned this Feb 26, 2026
@SumanthRH
Member

Hi @ebronstein. Thanks for the PR!

We are currently working on completing the migration to the new skyrl/ package: #1145

We will get to this PR right after! (est ~ a few days)

In the meantime, it would be good if you could port your PR to use the new skyrl/ package!
