
Commit 7202087

defconfigs: Add composable fragments for Lambda Labs vLLM deployment
This introduces a fragment-based approach to defconfig composition, allowing
users to combine infrastructure provisioning with workflow configurations.

Two new config fragments are added to defconfigs/configs/:

  - lambdalabs-gpu-1x-a10.config: Terraform configuration for Lambda Labs
    A10 GPU instance provisioning with automatic region inference and SSH
    key generation.

  - vllm-production-stack-gpu.config: vLLM production stack configuration
    with GPU-accelerated inference, Kubernetes deployment via minikube,
    monitoring, autoscaling, and benchmarking capabilities.

These fragments are combined into a new defconfig,
lambdalabs-vllm-gpu-1x-a10, which enables end-to-end deployment: provision
a Lambda Labs A10 GPU instance ($0.75/hr) and deploy the vLLM production
stack for LLM inference workloads.

The fragment approach allows users to compose configurations by combining
infrastructure providers (Lambda Labs, AWS, Azure, bare metal) with
different workflows (vLLM, fstests, blktests) without maintaining separate
defconfigs for every combination.

Example usage:

  make defconfig-lambdalabs-vllm-gpu-1x-a10
  make bringup          # Provisions Lambda Labs A10 GPU instance
  make vllm             # Deploys vLLM production stack
  make vllm-benchmark   # Run performance benchmarks

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
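As the third diff below shows, the combined defconfig is in effect the two fragments concatenated under a descriptive header, so composing a new variant by hand reduces to something like the following sketch (the output path mirrors this commit's layout; the exact composition mechanism kdevops wires up may differ):

  cat defconfigs/configs/lambdalabs-gpu-1x-a10.config \
      defconfigs/configs/vllm-production-stack-gpu.config \
      >> defconfigs/lambdalabs-vllm-gpu-1x-a10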
1 parent 5f9fa38 commit 7202087

File tree

3 files changed: +172 -0 lines

  defconfigs/configs/lambdalabs-gpu-1x-a10.config
  defconfigs/configs/vllm-production-stack-gpu.config
  defconfigs/lambdalabs-vllm-gpu-1x-a10
defconfigs/configs/lambdalabs-gpu-1x-a10.config (new file: 8 additions, 0 deletions)

@@ -0,0 +1,8 @@
+# Lambda Labs GPU 1x A10 instance configuration
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_A10=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y
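After "make bringup" with this fragment, the provisioned host should expose the A10 over the generated SSH config; a quick sanity check might look like this (the host alias is a placeholder, not something this commit defines):

  ssh <lambdalabs-host> nvidia-smi --query-gpu=name,memory.total --format=csv

This should report an NVIDIA A10 and its 24GB of memory.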
defconfigs/configs/vllm-production-stack-gpu.config (new file: 61 additions, 0 deletions)

@@ -0,0 +1,61 @@
+# vLLM Production Stack with GPU support
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# vLLM Production Stack with Kubernetes
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_MINIKUBE=y
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+
+# Production Stack components
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=y
+CONFIG_VLLM_PROD_STACK_MIN_REPLICAS=2
+CONFIG_VLLM_PROD_STACK_MAX_REPLICAS=5
+CONFIG_VLLM_PROD_STACK_TARGET_GPU_UTILIZATION=80
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# GPU configuration - EXPLICITLY DISABLED CPU INFERENCE
+# CONFIG_VLLM_USE_CPU_INFERENCE is not set
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_GPU_TYPE=""
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.5"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# Engine configuration for GPU
+CONFIG_VLLM_REPLICA_COUNT=1
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_MAX_MODEL_LEN=1024
+CONFIG_VLLM_DTYPE="auto"
+
+# Router and observability
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
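Once deployed, the router fronts an OpenAI-compatible API on CONFIG_VLLM_API_PORT; a minimal smoke test could look like this, assuming the production-stack chart's usual <release>-router-service naming (an assumption, not something this commit pins down):

  # Forward the router locally, then exercise the OpenAI-compatible endpoint
  kubectl -n vllm-system port-forward svc/vllm-prod-router-service 8000:80 &
  curl http://localhost:8000/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'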
defconfigs/lambdalabs-vllm-gpu-1x-a10 (new file: 103 additions, 0 deletions)

@@ -0,0 +1,103 @@
+#
+# Lambda Labs vLLM Production Stack - 1x A10 GPU ($0.75/hr)
+#
+# This combines:
+#   - defconfigs/configs/lambdalabs-gpu-1x-a10.config (Terraform provisioning)
+#   - defconfigs/configs/vllm-production-stack-gpu.config (vLLM deployment)
+#
+# Provisions a Lambda Labs GPU instance with NVIDIA A10 (24GB) and deploys
+# the vLLM production stack for LLM inference workloads.
+#
+# ============================================================================
+# NVIDIA GPU COMPATIBILITY (CUDA):
+# ============================================================================
+#
+# vLLM v0.10.x uses FlashInfer CUDA kernels that require NVIDIA GPUs with
+# compute capability >= 8.0. Older NVIDIA GPUs will fail with:
+#   "RuntimeError: TopPSamplingFromProbs failed with error code
+#    too many resources requested for launch"
+#
+# NVIDIA A10 Compatibility:
+#   - Compute Capability: 8.6 ✓ COMPATIBLE
+#   - Memory: 24GB GDDR6
+#   - Cost: $0.75/hour on Lambda Labs
+#   - Perfect for: Production LLM inference, fine-tuning
+#
+# ============================================================================
+# Usage:
+#   make defconfig-lambdalabs-vllm-gpu-1x-a10
+#   make bringup          # Provisions A10 GPU instance
+#   make vllm             # Deploys vLLM production stack
+#   make vllm-benchmark   # Run performance benchmarks
+# ============================================================================
+#
+# Lambda Labs GPU 1x A10 instance configuration
+CONFIG_TERRAFORM=y
+CONFIG_TERRAFORM_LAMBDALABS=y
+CONFIG_TERRAFORM_LAMBDALABS_REGION_SMART_INFER=y
+CONFIG_TERRAFORM_LAMBDALABS_INSTANCE_TYPE_GPU_1X_A10=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_OVERWRITE=y
+CONFIG_TERRAFORM_SSH_CONFIG_GENKEY_EMPTY_PASSPHRASE=y
+
+# vLLM Production Stack with GPU support
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_VLLM=y
+
+# vLLM Production Stack with Kubernetes
+CONFIG_VLLM_PRODUCTION_STACK=y
+CONFIG_VLLM_K8S_MINIKUBE=y
+CONFIG_VLLM_VERSION_STABLE=y
+CONFIG_VLLM_ENGINE_IMAGE_TAG="v0.10.2"
+CONFIG_VLLM_HELM_RELEASE_NAME="vllm-prod"
+CONFIG_VLLM_HELM_NAMESPACE="vllm-system"
+
+# Production Stack components
+CONFIG_VLLM_PROD_STACK_REPO="https://vllm-project.github.io/production-stack"
+CONFIG_VLLM_PROD_STACK_CHART_VERSION="latest"
+CONFIG_VLLM_PROD_STACK_ROUTER_IMAGE="ghcr.io/vllm-project/production-stack/router"
+CONFIG_VLLM_PROD_STACK_ROUTER_TAG="latest"
+CONFIG_VLLM_PROD_STACK_ENABLE_MONITORING=y
+CONFIG_VLLM_PROD_STACK_ENABLE_AUTOSCALING=y
+CONFIG_VLLM_PROD_STACK_MIN_REPLICAS=2
+CONFIG_VLLM_PROD_STACK_MAX_REPLICAS=5
+CONFIG_VLLM_PROD_STACK_TARGET_GPU_UTILIZATION=80
+
+# Model configuration
+CONFIG_VLLM_MODEL_URL="facebook/opt-125m"
+CONFIG_VLLM_MODEL_NAME="opt-125m"
+
+# GPU configuration - EXPLICITLY DISABLED CPU INFERENCE
+# CONFIG_VLLM_USE_CPU_INFERENCE is not set
+CONFIG_VLLM_REQUEST_GPU=1
+CONFIG_VLLM_GPU_TYPE=""
+CONFIG_VLLM_GPU_MEMORY_UTILIZATION="0.5"
+CONFIG_VLLM_TENSOR_PARALLEL_SIZE=1
+
+# Engine configuration for GPU
+CONFIG_VLLM_REPLICA_COUNT=1
+CONFIG_VLLM_REQUEST_CPU=8
+CONFIG_VLLM_REQUEST_MEMORY="16Gi"
+CONFIG_VLLM_MAX_MODEL_LEN=1024
+CONFIG_VLLM_DTYPE="auto"
+
+# Router and observability
+CONFIG_VLLM_ROUTER_ENABLED=y
+CONFIG_VLLM_ROUTER_ROUND_ROBIN=y
+CONFIG_VLLM_OBSERVABILITY_ENABLED=y
+CONFIG_VLLM_GRAFANA_PORT=3000
+CONFIG_VLLM_PROMETHEUS_PORT=9090
+
+# API configuration
+CONFIG_VLLM_API_PORT=8000
+CONFIG_VLLM_API_KEY=""
+CONFIG_VLLM_HF_TOKEN=""
+
+# Benchmarking
+CONFIG_VLLM_BENCHMARK_ENABLED=y
+CONFIG_VLLM_BENCHMARK_DURATION=60
+CONFIG_VLLM_BENCHMARK_CONCURRENT_USERS=10
+CONFIG_VLLM_BENCHMARK_RESULTS_DIR="/data/vllm-benchmark"
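With monitoring enabled, Grafana and Prometheus are expected on the ports configured above; a sketch for reaching them locally (the service names here are guesses based on common chart defaults, so adjust to what actually gets installed):

  kubectl -n vllm-system port-forward svc/vllm-prod-grafana 3000:3000 &
  kubectl -n vllm-system port-forward svc/vllm-prod-prometheus-server 9090:9090 &

Benchmark output from "make vllm-benchmark" lands under the configured CONFIG_VLLM_BENCHMARK_RESULTS_DIR, /data/vllm-benchmark.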
