feat(workspaces): add opinionated llm-inference workspace (#176)
Merged
Conversation
Contributor
I've rebased the branch and directly fixed some minor issues. Please have a look at my commits and check whether that matches what you expected. We could discuss adding a short interactive flow, e.g. asking the user for the Hugging Face token and model name instead of requiring them as flags (similar to how it's done for the marimo or jupyter workspaces), but otherwise I would approve this PR.
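One way such an interactive fallback could look, as a stdlib-only sketch (the actual marimo/jupyter flows may be implemented differently; `resolve` and its parameters are hypothetical names):

```python
import getpass

def resolve(value, prompt_text, secret=False, prompt_fn=None):
    """Return the flag value if given; otherwise ask the user interactively.

    Sketch only: mirrors the idea of prompting for --huggingface-token and
    --model-name when they are not passed on the command line.
    """
    if value is not None:
        return value
    if prompt_fn is None:
        # Hide input for secrets such as the Hugging Face token.
        prompt_fn = getpass.getpass if secret else input
    return prompt_fn(f"{prompt_text}: ")

# Flags win; the prompt is only reached for missing values.
token = resolve("hf_example_token", "Hugging Face token", secret=True)
model = resolve(None, "Model name", prompt_fn=lambda _: "Qwen/Qwen3-1.7B")
```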
Re-integrate LLM Inference Workspace (llm-d based, experimental)
Summary
This PR re-integrates the LLM inference workspace, now built on top of the llm-d reference stack.
The feature should be considered experimental.
Compared to other “exalsius-native” workspaces, the setup is not yet as streamlined, but it already enables Exalsius members to deploy and serve LLMs directly inside a workspace.
The goal is to provide an early, practical way to run model inference workloads while we iterate on UX and integration.
What’s included

Workspace template integration
- New workspace template type: LLM_D
- New configurator: LLMInferenceConfigurator
  - Handles model references in the hf://<repo>/<model> form
  - GPU count (= num_gpus)

CLI command
New command: exls workspaces deploy llm-inference

Options:
- --huggingface-token
- --model-name
- --num-gpus
- --wait-for-ready

Includes model reference handling (<repo>/<model>)

Example usage

```shell
exls workspaces deploy llm-inference \
  --model-name Qwen/Qwen3-1.7B \
  --hf-token $HF_TOKEN \
  --num-gpus 2 \
  --wait-for-ready
```

Behavior
The configurator automatically translates these inputs into template variables, which removes most manual editing from the deployment process.
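As a rough illustration of that translation (the function and variable names below are assumptions for the sketch, not the actual keys consumed by the llm-d chart):

```python
def build_template_values(model_name: str, hf_token: str, num_gpus: int) -> dict:
    """Map CLI inputs to workspace template variables.

    Illustrative sketch only: the key names here are hypothetical and
    chosen to show the shape of the translation, not the real schema.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return {
        # Model reference in the hf://<repo>/<model> form.
        "model_uri": f"hf://{model_name}",
        "huggingface_token": hf_token,
        "num_gpus": num_gpus,
    }

values = build_template_values("Qwen/Qwen3-1.7B", "hf_example_token", 2)
print(values["model_uri"])  # hf://Qwen/Qwen3-1.7B
```

With a mapping like this, the user only supplies the flags shown in the example above; everything else is derived.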
Helm chart
This workspace integrates the existing chart:
https://github.com/exalsius/exalsius-workspace-hub/tree/main/workspace-templates/llm-inference/llm-d-model
Notes