
feat(workspaces): add opinionated llm-inference workspace#176

Merged
mcd01 merged 13 commits into main from
feat/llm_inference_workspace
Feb 13, 2026
Conversation


@mcd01 mcd01 commented Feb 2, 2026

Re-integrate LLM Inference Workspace (llm-d based, experimental)

Summary

This PR re-integrates the LLM inference workspace, now built on top of the llm-d reference stack.

The feature should be considered experimental.
Compared to other “exalsius-native” workspaces, the setup is not yet as streamlined, but it already enables Exalsius members to deploy and serve LLMs directly inside a workspace.

The goal is to provide an early, practical way to run model inference workloads while we iterate on UX and integration.


What’s included

Workspace template integration

  • Add new integrated template: LLM_D
  • Add default editing comments for the template
  • Register template alongside existing ones (Jupyter, Marimo, Dev Pod, Dist Training)

New configurator

  • Introduce LLMInferenceConfigurator
  • Automatically configures:
    • Hugging Face token secret
    • Model URI (hf://<repo>/<model>)
    • Model labels
    • InferencePool label matching
    • Tensor parallelism (= num_gpus)
    • Accelerator type (AMD / NVIDIA)
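To make the mapping concrete, here is a minimal sketch of how such a configurator could translate the CLI inputs into template values. The class and field names below are illustrative assumptions, not the actual `LLMInferenceConfigurator` implementation:

```python
# Hypothetical sketch of the input-to-template mapping; names are
# illustrative and do not reflect the real exalsius code.
from dataclasses import dataclass


@dataclass
class LLMInferenceValues:
    hf_token_secret: str     # name of the secret holding the HF token
    model_uri: str           # artifact URI consumed by llm-d
    model_label: str         # label used for InferencePool matching
    tensor_parallelism: int  # one shard per GPU
    accelerator_type: str    # "nvidia" or "amd"


def configure(model_name: str, num_gpus: int, gpu_vendor: str) -> LLMInferenceValues:
    repo, model = model_name.split("/", 1)
    return LLMInferenceValues(
        hf_token_secret="hf-token-secret",
        model_uri=f"hf://{repo}/{model}",
        model_label=model.lower(),
        tensor_parallelism=num_gpus,
        accelerator_type=gpu_vendor.lower(),
    )
```

The key point is that tensor parallelism is derived directly from `--num-gpus` and the accelerator type from the cluster's GPU vendor, so the user never edits these template variables by hand.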

CLI command

New command:

exls workspaces deploy llm-inference

Options:

  • --huggingface-token
  • --model-name
  • --num-gpus
  • --wait-for-ready

Includes:

  • Model name validation (<repo>/<model>)
  • GPU count validation
  • Resource selection based on cluster
  • Standard confirmation + deployment flow
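The validation steps could look roughly like the following sketch (illustrative only; the actual CLI checks and error messages may differ):

```python
# Illustrative validation helpers; not the actual exls implementation.
import re

# A model name must look like <repo>/<model>, e.g. "Qwen/Qwen3-1.7B".
MODEL_NAME_RE = re.compile(r"^[\w.-]+/[\w.-]+$")


def validate_model_name(name: str) -> str:
    if not MODEL_NAME_RE.match(name):
        raise ValueError(f"model name must look like <repo>/<model>, got {name!r}")
    return name


def validate_num_gpus(value: int) -> int:
    if value < 1:
        raise ValueError("num-gpus must be a positive integer")
    return value
```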

Example usage

exls workspaces deploy llm-inference \
  --model-name Qwen/Qwen3-1.7B \
  --hf-token $HF_TOKEN \
  --num-gpus 2 \
  --wait-for-ready

Behavior

The configurator automatically translates inputs into template variables:

Input        Effect
model name   Sets model artifact URI + labels
HF token     Creates auth secret reference
num GPUs     Sets tensor parallelism
GPU vendor   Sets accelerator type (nvidia/amd)

This removes most manual editing from the deployment process.


Helm chart

This workspace integrates the existing chart:

https://github.com/exalsius/exalsius-workspace-hub/tree/main/workspace-templates/llm-inference/llm-d-model


Notes

  • Experimental feature
  • Not yet as polished as other workspace types
  • Intended for early adopters and internal testing
  • Backwards compatible (pure addition)

@mcd01 mcd01 requested a review from alek-thunder February 6, 2026 17:06
@mcd01 mcd01 self-assigned this Feb 6, 2026
@mcd01 mcd01 force-pushed the feat/llm_inference_workspace branch from ec23cb7 to 7aa45fe on February 6, 2026 17:08
@mcd01 mcd01 marked this pull request as ready for review February 6, 2026 17:12
@mcd01 mcd01 changed the title feat: add opinionated llm-inference workspace feat(workspaces): add opinionated llm-inference workspace Feb 6, 2026
@srnbckr srnbckr force-pushed the feat/llm_inference_workspace branch from bff3596 to cd0a266 on February 12, 2026 13:20

srnbckr commented Feb 12, 2026

I've rebased the branch and fixed some minor issues directly. Please have a look at my commits and check whether that matches what you had in mind.

We could discuss adding a short interactive flow, e.g. prompting the user for the Hugging Face token and model name instead of requiring them as flags (similar to how it's done for the marimo or jupyter workspaces), but otherwise I would approve this PR.

@mcd01 mcd01 merged commit 0f9428b into main Feb 13, 2026
1 check failed
@mcd01 mcd01 deleted the feat/llm_inference_workspace branch February 13, 2026 13:49
