Use the last LoRA path in the vLLM inference engine instead of "dummy_lora_path"#1188

Open
ebronstein wants to merge 2 commits into NovaSky-AI:main from ebronstein:dummy_lora_path

Conversation

@ebronstein
Contributor

@ebronstein ebronstein commented Feb 20, 2026

Problem description

When running fully async training with LoRA, vLLM sometimes crashes with:

FileNotFoundError: [Errno 2] No such file or directory: '/dummy_lora_path/adapter_config.json'

The error originates in vLLM's LRUCacheWorkerLoRAManager.add_adapter() (in vllm/lora/worker_manager.py), which is called during generation when the worker tries to activate a LoRA adapter for an incoming request. The config uses max_loras=1 (set in create_ray_wrapped_inference_engines_from_config in main_base.py), meaning the worker's LRUCacheWorkerLoRAManager can only hold one adapter at a time.

My understanding is that with async training, a generation request may arrive after the LoRA adapter has been evicted from the cache and before the new adapter has been loaded (e.g., during weight sync). On this cache miss, LRUCacheWorkerLoRAManager.add_adapter() falls back to loading from the lora_path in the LoRARequest, which is "dummy_lora_path".
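The eviction-then-miss behavior can be illustrated with a toy capacity-1 LRU cache (a hypothetical sketch of the failure mode, not vLLM's actual implementation):

```python
from collections import OrderedDict

class TinyLoRACache:
    """Toy capacity-1 LRU cache mirroring the failure mode described above
    (illustrative only; names do not match vLLM's worker_manager)."""

    def __init__(self, capacity: int = 1):
        self.capacity = capacity
        self.cache: OrderedDict[int, str] = OrderedDict()

    def add_adapter(self, lora_id: int, lora_path: str) -> str:
        if lora_id in self.cache:
            self.cache.move_to_end(lora_id)  # cache hit: refresh LRU order
            return self.cache[lora_id]
        # Cache miss: fall back to loading from lora_path -- this is where
        # '/dummy_lora_path' fails if the placeholder was never replaced.
        adapter = f"loaded-from:{lora_path}"
        self.cache[lora_id] = adapter
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least-recently used
        return adapter
```

With max_loras=1, adding any second adapter evicts the first, so a later request for the first adapter must reload it from whatever path its LoRARequest carries.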

Proposed fix

This PR saves the last used LoRA path and uses that as the default instead of "dummy_lora_path". The LoRA adapter weights are saved to a persistent directory on disk during each weight sync, so the path should remain valid throughout training.
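A minimal sketch of the idea (class and method names are illustrative, not the PR's exact code):

```python
class BaseVLLMInferenceEngine:
    """Illustrative sketch: remember the last successfully loaded LoRA path
    so fallback loads resolve to a real on-disk adapter directory."""

    _DUMMY_LORA_PATH = "/dummy_lora_path"

    def __init__(self):
        # Updated after each successful adapter load during weight sync.
        self._last_lora_path = None

    def _get_lora_path_for_request(self) -> str:
        # Prefer the most recently synced adapter directory; the dummy
        # placeholder is only used before the first weight sync.
        return self._last_lora_path or self._DUMMY_LORA_PATH

    def add_lora(self, lora_path: str) -> None:
        # ... build a LoRARequest and hand it to the engine here ...
        self._last_lora_path = lora_path
```

Since the adapter weights are written to a persistent directory at each sync, the cached path stays valid for the lifetime of the run.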

Full stack trace

(AsyncVLLMInferenceEngine pid=1010235) (EngineCore_0 pid=1010386) Process EngineCore_0:
Traceback (most recent call last):
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 103, in _load_adapter
    peft_helper = PEFTHelper.from_local_dir(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/peft_helper.py", line 107, in from_local_dir
    with open(lora_config_path) as f:
         ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/dummy_lora_path/adapter_config.json'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nas/ucb/ebronstein/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/nas/ucb/ebronstein/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 704, in run_engine_core
    raise e
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 693, in run_engine_core
    engine_core.run_busy_loop()
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop
    self._process_engine_step()
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step
    outputs, model_executed = self.step_fn()
                              ^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 288, in step
    model_output = self.execute_model_with_error_logging(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging
    raise err
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging
    return model_fn(scheduler_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
    output = self.collective_rpc("execute_model",
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3007, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
    output = self.model_runner.execute_model(scheduler_output,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1522, in execute_model
    max_query_len) = (self._prepare_inputs(scheduler_output))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 939, in _prepare_inputs
    self.set_active_loras(self.input_batch, num_scheduled_tokens)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 84, in set_active_loras
    return self._set_active_loras(prompt_lora_mapping, token_lora_mapping,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/v1/worker/lora_model_runner_mixin.py", line 73, in _set_active_loras
    self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 167, in set_active_adapters
    set_active_adapters_worker(requests, mapping, self._apply_adapters,
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/adapter_commons/utils.py", line 55, in set_active_adapters_worker
    apply_adapters_func(requests)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 227, in _apply_adapters
    self.add_adapter(lora)
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 240, in add_adapter
    lora = self._load_adapter(lora_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nas/ucb/ebronstein/venvs/code-assistant/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 136, in _load_adapter
    raise ValueError(
ValueError: Loading lora 549119024 failed: No adapter found for /dummy_lora_path


Contributor

@devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a FileNotFoundError during asynchronous LoRA training by using the last known LoRA path as a fallback. However, a security audit identified two high-severity Path Traversal vulnerabilities in skyrl-train/skyrl_train/inference_engines/vllm/vllm_engine.py. These vulnerabilities arise from the use of an unsanitized lora_path variable when loading LoRA adapters, which could allow an attacker to read arbitrary files from the local filesystem. It is recommended to sanitize lora_path to ensure it resolves within an expected directory. Additionally, there's a high-priority blocking call within an asynchronous method that needs to be addressed to prevent performance issues, and a medium-priority suggestion to refactor duplicated logic for better maintainability.

Comment on lines 270 to +272
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = self.llm.llm_engine.add_lora(lora_request)
self._last_lora_path = lora_path

security-high

This section introduces a high-severity Path Traversal vulnerability. The lora_path parameter, used in _load_lora_from_disk and passed to LoRARequest and self.llm.llm_engine.add_lora, is unsanitized. This allows an attacker to access arbitrary files via malicious paths (e.g., ../../../../etc/passwd). The newly added line self._last_lora_path = lora_path also propagates this tainted path. Furthermore, the synchronous call self.llm.llm_engine.add_lora(lora_request) blocks the event loop within this async method, which can lead to performance issues. It is crucial to sanitize lora_path and ensure the add_lora call is non-blocking.
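The reviewer's sanitization suggestion could be sketched as follows (a hypothetical helper; the function name and the `allowed_root` parameter are assumptions, not part of this PR):

```python
import os

def sanitize_lora_path(lora_path: str, allowed_root: str) -> str:
    """Resolve lora_path and require it to stay under allowed_root.

    Rejects traversal attempts such as '../../../../etc/passwd' by
    comparing the fully resolved path against the allowed directory.
    """
    resolved = os.path.realpath(lora_path)
    root = os.path.realpath(allowed_root)
    # commonpath collapses to a shorter prefix (e.g. "/") when the
    # resolved path escapes the allowed root.
    if os.path.commonpath([resolved, root]) != root:
        raise ValueError(f"LoRA path {lora_path!r} escapes {allowed_root!r}")
    return resolved
```

A check like this could run once where `lora_path` enters the engine, before it is stored in `_last_lora_path` or passed to `LoRARequest`.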

Comment on lines 404 to +406
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = await self.llm.add_lora(lora_request)
self._last_lora_path = lora_path

security-high

The lora_path parameter in the _load_lora_from_disk function is used to construct a file path for loading LoRA adapters without proper sanitization. An attacker could provide a malicious path (e.g., ../../../../etc/passwd) to access arbitrary files on the filesystem. This is a path traversal vulnerability. The vulnerable code is in the _load_lora_from_disk method, where lora_path is passed to LoRARequest and then used by self.llm.add_lora. The newly added line self._last_lora_path = lora_path also propagates this tainted path.

# dummy_lora_path for placeholder (actual loading done in add_lora())
# Use last loaded LoRA path or a dummy path for placeholder
# (actual loading done in add_lora())
lora_path = self._last_lora_path or "/dummy_lora_path"

medium

To improve maintainability and reduce code duplication, consider extracting this logic into a helper method in the BaseVLLMInferenceEngine class. The magic string "/dummy_lora_path" and the fallback logic are also used in AsyncVLLMInferenceEngine._collect_outputs.

For example, you could add a method to BaseVLLMInferenceEngine:

_DUMMY_LORA_PATH = "/dummy_lora_path"

def _get_lora_path_for_request(self) -> str:
    return self._last_lora_path or self._DUMMY_LORA_PATH

Then you can call self._get_lora_path_for_request() here and in the other location to centralize the logic.

@SumanthRH SumanthRH self-assigned this Feb 26, 2026
@SumanthRH
Member

Hi @ebronstein. Thanks for the PR!

We are currently working on completing the migration to the new skyrl/ package: #1145

We will get to this PR right after! (est ~ a few days)

In the meantime, it would be good if you could port your PR to use the new skyrl/ package!
