Use the last LoRA path in the vLLM inference engine instead of "dummy_lora_path"#1188
ebronstein wants to merge 2 commits into NovaSky-AI:main from
Conversation
…tead of "dummy_lora_path".
Code Review
This pull request aims to fix a FileNotFoundError during asynchronous LoRA training by using the last known LoRA path as a fallback. However, a security audit identified two high-severity Path Traversal vulnerabilities in skyrl-train/skyrl_train/inference_engines/vllm/vllm_engine.py. These vulnerabilities arise from the use of an unsanitized lora_path variable when loading LoRA adapters, which could allow an attacker to read arbitrary files from the local filesystem. It is recommended to sanitize lora_path to ensure it resolves within an expected directory. Additionally, there's a high-priority blocking call within an asynchronous method that needs to be addressed to prevent performance issues, and a medium-priority suggestion to refactor duplicated logic for better maintainability.
```python
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = self.llm.llm_engine.add_lora(lora_request)
self._last_lora_path = lora_path
```
This section introduces a high-severity Path Traversal vulnerability. The lora_path parameter, used in _load_lora_from_disk and passed to LoRARequest and self.llm.llm_engine.add_lora, is unsanitized. This allows an attacker to access arbitrary files via malicious paths (e.g., ../../../../etc/passwd). The newly added line self._last_lora_path = lora_path also propagates this tainted path. Furthermore, the synchronous call self.llm.llm_engine.add_lora(lora_request) blocks the event loop within this async method, which can lead to performance issues. It is crucial to sanitize lora_path and ensure the add_lora call is non-blocking.
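One way to keep the event loop responsive, sketched below with a stub engine (the `add_lora` method here is a stand-in for the real synchronous call, not vLLM's actual API), is to offload the blocking call to a worker thread with `asyncio.to_thread`:

```python
import asyncio

class _BlockingEngine:
    """Stand-in for the synchronous engine; add_lora here is hypothetical."""
    def add_lora(self, lora_request):
        # Simulates blocking work such as reading adapter weights from disk.
        return f"loaded:{lora_request}"

async def load_lora_nonblocking(engine, lora_request):
    # Offload the blocking add_lora call to a worker thread so the
    # asyncio event loop stays responsive while the adapter loads.
    return await asyncio.to_thread(engine.add_lora, lora_request)

result = asyncio.run(load_lora_nonblocking(_BlockingEngine(), "adapter-1"))
print(result)  # loaded:adapter-1
```

Whether this is safe depends on the engine's thread-safety guarantees, so it is a sketch of the pattern rather than a drop-in fix.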
```python
lora_request = LoRARequest(lora_name=f"{lora_id}", lora_int_id=lora_id, lora_path=lora_path)
result = await self.llm.add_lora(lora_request)
self._last_lora_path = lora_path
```
The lora_path parameter in the _load_lora_from_disk function is used to construct a file path for loading LoRA adapters without proper sanitization. An attacker could provide a malicious path (e.g., ../../../../etc/passwd) to access arbitrary files on the filesystem. This is a path traversal vulnerability. The vulnerable code is in the _load_lora_from_disk method, where lora_path is passed to LoRARequest and then used by self.llm.add_lora. The newly added line self._last_lora_path = lora_path also propagates this tainted path.
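One way to address this, sketched below with a hypothetical `allowed_root` parameter (not part of the engine's current API), is to resolve the path and reject anything that escapes the expected adapter directory:

```python
from pathlib import Path

def sanitize_lora_path(lora_path: str, allowed_root: str) -> Path:
    """Resolve lora_path and ensure it stays inside allowed_root.

    Raises ValueError for paths that escape the expected adapter
    directory (e.g. via ../ segments or unrelated absolute paths).
    """
    root = Path(allowed_root).resolve()
    candidate = Path(lora_path)
    if not candidate.is_absolute():
        candidate = root / candidate
    candidate = candidate.resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"LoRA path escapes {root}: {lora_path}")
    return candidate
```

The same check would need to run anywhere `lora_path` enters the engine, including the `self._last_lora_path` fallback added by this PR.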
```diff
-# dummy_lora_path for placeholder (actual loading done in add_lora())
+# Use last loaded LoRA path or a dummy path for placeholder
+# (actual loading done in add_lora())
 lora_path = self._last_lora_path or "/dummy_lora_path"
```
To improve maintainability and reduce code duplication, consider extracting this logic into a helper method in the BaseVLLMInferenceEngine class. The magic string "/dummy_lora_path" and the fallback logic are also used in AsyncVLLMInferenceEngine._collect_outputs.
For example, you could add a method to BaseVLLMInferenceEngine:
```python
_DUMMY_LORA_PATH = "/dummy_lora_path"

def _get_lora_path_for_request(self) -> str:
    return self._last_lora_path or self._DUMMY_LORA_PATH
```

Then you can call `self._get_lora_path_for_request()` here and in the other location to centralize the logic.
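As a self-contained illustration of how this fallback behaves (the class below is a minimal stub, not the real `BaseVLLMInferenceEngine`; only the attribute names follow the suggestion above):

```python
class _EngineStub:
    """Minimal stand-in showing the last-path-or-dummy fallback."""
    _DUMMY_LORA_PATH = "/dummy_lora_path"

    def __init__(self):
        self._last_lora_path = None

    def _get_lora_path_for_request(self) -> str:
        # Fall back to the dummy placeholder only before any LoRA
        # adapter has been loaded; afterwards reuse the real path.
        return self._last_lora_path or self._DUMMY_LORA_PATH

engine = _EngineStub()
before = engine._get_lora_path_for_request()  # "/dummy_lora_path"
engine._last_lora_path = "/ckpt/step_10/lora"
after = engine._get_lora_path_for_request()   # "/ckpt/step_10/lora"
```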
Hi @ebronstein. Thanks for the PR! We are currently working on completing the migration to the new … and will get to this PR right after (est. ~a few days). In the meantime, it would be good if you could port your PR to use the new …
Problem description
When running fully async training with LoRA, vLLM sometimes crashes with:
The error originates in vLLM's `LRUCacheWorkerLoRAManager.add_adapter()` (in `vllm/lora/worker_manager.py`), which is called during generation when the worker tries to activate a LoRA adapter for an incoming request. The config uses `max_loras=1` (set in `create_ray_wrapped_inference_engines_from_config` in `main_base.py`), meaning the worker's `LRUCacheWorkerLoRAManager` can only hold one adapter at a time.

My understanding is that when using async training, a generation request may arrive when the LoRA adapter cache has been evicted and the new adapter hasn't been loaded yet (e.g., during weight sync). This cache miss makes `LRUCacheWorkerLoRAManager.add_adapter()` fall back to loading from the `lora_path` in the `LoRARequest`, which is `"dummy_lora_path"`.

Proposed fix
This PR saves the last used LoRA path and uses that as the default instead of `"dummy_lora_path"`. The LoRA adapter weights are saved to a persistent directory on disk during each weight sync, so the path should remain valid throughout training.

Full stack trace