Docker compose deployment #1
base: main
Changes from all commits
701e882
ab2915f
4aa26e9
bd23140
4339f5a
bc2e2f7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # Optional | ||
| # LLM= | ||
| # EMB_MODEL= | ||
| LOCAL_DOCKER=True | ||
| HUGGINGFACEHUB_API_TOKEN= | ||
| LLM_MAX_CONTEXT_LENGTH=32768 | ||
| LLM_SWAP_SPACE=16 | ||
| LLM_CPU_OFFLOAD_SPACE=8 | ||
| DATASET=cinderella | ||
| OUT_DIR=result/cinderella_vllm | ||
| SAVE_DIR=outputs/cinderella_vllm | ||
| OPENAI_API_KEY=DUMMY_KEY |
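A minimal sketch, not part of this PR, of how the values in the example environment file above could be sanity-checked before starting the stack. The helper names `parse_env` and `check_env` are hypothetical, and the PR does not say which variables are mandatory, so the specific checks (numeric limits, non-empty Hugging Face token) are assumptions.

```python
from typing import Dict, List

# Contents of the example environment file as shown in the diff above.
ENV_EXAMPLE = """\
# Optional
# LLM=
# EMB_MODEL=
LOCAL_DOCKER=True
HUGGINGFACEHUB_API_TOKEN=
LLM_MAX_CONTEXT_LENGTH=32768
LLM_SWAP_SPACE=16
LLM_CPU_OFFLOAD_SPACE=8
DATASET=cinderella
OUT_DIR=result/cinderella_vllm
SAVE_DIR=outputs/cinderella_vllm
OPENAI_API_KEY=DUMMY_KEY
"""

# Variables this sketch assumes must be non-negative integers.
NUMERIC_KEYS = ("LLM_MAX_CONTEXT_LENGTH", "LLM_SWAP_SPACE", "LLM_CPU_OFFLOAD_SPACE")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; the rules here are assumptions."""
    problems: List[str] = []
    for key in NUMERIC_KEYS:
        if not values.get(key, "").isdigit():
            problems.append(f"{key} should be a non-negative integer")
    if not values.get("HUGGINGFACEHUB_API_TOKEN"):
        problems.append("HUGGINGFACEHUB_API_TOKEN is empty")
    return problems


env = parse_env(ENV_EXAMPLE)
problems = check_env(env)
for problem in problems:
    print(problem)
```

Run against the unmodified example file, the only thing this sketch reports is the empty token, which is a reminder to fill it in if the chosen models need one.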
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # Install essential OS-level dependencies | ||
| RUN apt-get update && \ | ||
| apt-get install -y --no-install-recommends \ | ||
| curl \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Install requirements | ||
| COPY requirements.txt requirements.txt | ||
| RUN pip install -r requirements.txt | ||
| COPY . . | ||
|
|
||
| # Set environment variables | ||
| ENV PYTHONPATH=/app | ||
| ENV PYTHONUNBUFFERED=1 | ||
| ENV PORT=7373 | ||
| ENV HOSTNAME=0.0.0.0 | ||
|
|
||
| # Set CUDA-related environment variables | ||
| ENV NVIDIA_VISIBLE_DEVICES=all | ||
| ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility,video | ||
| ENV HF_HUB_ENABLE_HF_TRANSFER=1 | ||
|
|
||
| # Default command - will be overridden by specific environment Dockerfiles | ||
| CMD ["python", "main_docker.py"] |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -210,6 +210,113 @@ netstat -tlnp | grep 8000 | |||||
| curl http://localhost:8000/v1/models | ||||||
| ``` | ||||||
|
|
||||||
| ### Method 3: Using Local docker compose deployment (main_docker.py) ⚡ | ||||||
| This method deploys everything needed locally. Namely: | ||||||
| 1. vLLM openai server for language model inference | ||||||
| 2. 🤗 HugginFace's Text Embeddings Inference (HF TEI) | ||||||
| 3. como-app | ||||||
|
|
||||||
| #### Requirements | ||||||
| - docker | ||||||
| - nvidia device plugin | ||||||
|
|
||||||
| #### 1. Configure inference services 📝 | ||||||
|
|
||||||
| ```bash | ||||||
| cp .env.example .env | ||||||
| ``` | ||||||
| After cpoying the example environment file adjust environment variables as wanted. | ||||||
Suggested change:
| - After cpoying the example environment file adjust environment variables as wanted. |
| + After copying the example environment file adjust environment variables as wanted. |
Copilot AI, Apr 14, 2026:
The step numbering in this section jumps from "2" to "4". Renumber this heading to keep the instructions sequential (e.g., make this "#### 3. Check Deployment Services' Status"), so users can follow the setup steps without confusion.
Suggested change:
| - #### 4. Check Deployment Services' Status 🔍 |
| + #### 3. Check Deployment Services' Status 🔍 |
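A rough sketch of what the "Check Deployment Services' Status" step could look like when scripted rather than done by hand. It is not taken from the PR. The URLs are assumptions: host port 8011 comes from the compose port mapping for the embeddings service, `/health` is the Text Embeddings Inference health route, and port 8000 with `/v1/models` is the vLLM endpoint used earlier in the README. The helper names `http_ok` and `wait_for_services` are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Dict

# Assumed host-side endpoints; adjust to match the actual port mappings.
SERVICES: Dict[str, str] = {
    "embeddings": "http://localhost:8011/health",
    "vllm": "http://localhost:8000/v1/models",
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_services(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Poll each service until it responds or the attempts run out."""
    status = {name: False for name in services}
    for attempt in range(attempts):
        for name, url in services.items():
            if not status[name]:
                status[name] = probe(url)
        if all(status.values()):
            break
        if attempt < attempts - 1:
            sleep(delay)
    return status
```

Calling `wait_for_services(SERVICES)` after `docker compose up -d` returns a dictionary such as `{"embeddings": True, "vllm": False}`, which makes it easy to see which container is still loading its model. The `probe` and `sleep` parameters exist so the polling loop can be exercised without a running stack.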
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,169 @@ | ||||||
| volumes: | ||||||
| hf_cache: | ||||||
|
|
||||||
| services: | ||||||
| como-app: | ||||||
| container_name: como-app | ||||||
| build: . | ||||||
| # image: nvidia/cuda:11.8.0-base-ubuntu22.04 | ||||||
| # The key part for CDI GPU access: | ||||||
| device_cgroup_rules: | ||||||
| - "c 195:* rmw" | ||||||
| - "c 236:* rmw" | ||||||
| devices: | ||||||
| - nvidia.com/gpu=all | ||||||
| env_file: .env | ||||||
| environment: | ||||||
| - EMB_MODEL=${EMB_MODEL} | ||||||
| ipc: host | ||||||
| depends_on: | ||||||
| embeddings: | ||||||
| condition: service_healthy | ||||||
| vllm: | ||||||
| condition: service_healthy | ||||||
|
|
||||||
| embeddings: | ||||||
| deploy: | ||||||
| replicas: 1 | ||||||
| image: ghcr.io/huggingface/text-embeddings-inference:86-1.8 | ||||||
| volumes: | ||||||
| - hf_cache:/.hf_cache | ||||||
| ports: | ||||||
| - 8011:8080 | ||||||
| ipc: host | ||||||
| container_name: embeddings | ||||||
| # Replace runtime with devices for CDI | ||||||
| devices: | ||||||
| - nvidia.com/gpu=all | ||||||
| device_cgroup_rules: | ||||||
| - "c 195:* rmw" | ||||||
| - "c 236:* rmw" | ||||||
| environment: | ||||||
| - EMB_MODEL=${EMB_MODEL} | ||||||
| - NVIDIA_VISIBLE_DEVICES=all | ||||||
| - NVIDIA_DRIVER_CAPABILITIES=compute,utility | ||||||
| - USE_FLASH_ATTENTION=True | ||||||
| - HF_HUB_ENABLE_HF_TRANSFER=1 | ||||||
| - HF_HOME=/.hf_cache | ||||||
| - RUST_LOG=info | ||||||
| # Performance tuning for embedding models | ||||||
| - OMP_NUM_THREADS=8 | ||||||
| - MKL_NUM_THREADS=8 | ||||||
| - TOKENIZERS_PARALLELISM=true | ||||||
| restart: no | ||||||
| env_file: .env | ||||||
| command: | ||||||
| [ | ||||||
| "--model-id", | ||||||
| "${EMB_MODEL:-nomic-ai/nomic-embed-text-v1.5}", | ||||||
| "--hostname", | ||||||
| "0.0.0.0", | ||||||
| "--port", | ||||||
| "8080", | ||||||
| "--huggingface-hub-cache", | ||||||
| "/.hf_cache", | ||||||
| "--tokenization-workers", | ||||||
| "16", | ||||||
| "--max-concurrent-requests", | ||||||
| "1024", | ||||||
| "--max-batch-tokens", | ||||||
| "32768", | ||||||
| "--max-batch-requests", | ||||||
| "256", | ||||||
| "--max-client-batch-size", | ||||||
| "64", | ||||||
| "--auto-truncate", | ||||||
| "--payload-limit", | ||||||
| "4000000", | ||||||
Suggested change:
| - "4000000", |
| + "4000000" |
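The `--max-client-batch-size` and `--payload-limit` flags in the embeddings command above limit what a single client request may contain. The following is a minimal client-side sketch, assuming the service is reached on host port 8011 (from the `8011:8080` mapping) and that requests go to the Text Embeddings Inference `/embed` route with an `inputs` list. The helper names `batched` and `embed_payloads` are hypothetical and do not appear in the PR.

```python
import json
from typing import Iterator, List

MAX_CLIENT_BATCH_SIZE = 64   # mirrors --max-client-batch-size
PAYLOAD_LIMIT = 4_000_000    # mirrors --payload-limit (bytes)
EMBED_URL = "http://localhost:8011/embed"  # assumed host-side URL


def batched(texts: List[str], size: int = MAX_CLIENT_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of texts with at most size items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def embed_payloads(texts: List[str], size: int = MAX_CLIENT_BATCH_SIZE) -> List[bytes]:
    """Build one JSON request body per batch, rejecting oversized bodies."""
    payloads: List[bytes] = []
    for batch in batched(texts, size):
        body = json.dumps({"inputs": batch}).encode("utf-8")
        if len(body) > PAYLOAD_LIMIT:
            raise ValueError("request body exceeds the configured payload limit")
        payloads.append(body)
    return payloads


texts = [f"sentence {i}" for i in range(150)]
payloads = embed_payloads(texts)
print(len(payloads), "requests needed for", len(texts), "texts")
```

Each body could then be sent as a POST to `EMBED_URL` with a `Content-Type: application/json` header. Splitting on the client side keeps every request within the server's configured limits instead of relying on the server to reject oversized batches.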
Copilot AI, Apr 14, 2026:
`restart: no` is ambiguous in YAML (often parsed as a boolean) while Compose expects a string policy value. Quote it (`restart: "no"`) or omit `restart` entirely to avoid config parsing surprises across YAML/Compose versions.
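To illustrate the ambiguity, here is a small sketch that flags unquoted `key: value` pairs whose value belongs to the YAML 1.1 boolean set (`yes`, `no`, `on`, `off`, `true`, `false`, `y`, `n` and their capitalised forms). It is a line-based heuristic written for this comment, not a YAML parser, and the function name `ambiguous_scalars` is hypothetical. Whether a particular Compose version accepts the unquoted form is not verified here.

```python
import re
from typing import List, Tuple

# Plain scalars that the YAML 1.1 boolean type resolves to true or false.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)


def ambiguous_scalars(yaml_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, key, value) for unquoted boolean-like values."""
    hits: List[Tuple[int, str, str]] = []
    for number, raw in enumerate(yaml_text.splitlines(), start=1):
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if YAML11_BOOL.match(value):
            hits.append((number, key.strip(), value))
    return hits


snippet = """\
embeddings:
  restart: no
  ipc: host
  environment:
    - TOKENIZERS_PARALLELISM=true
"""

for number, key, value in ambiguous_scalars(snippet):
    print(f"line {number}: {key}: {value} may be read as a boolean; consider quoting it")
```

Quoting the value, as in `restart: "no"`, makes the scalar an explicit string, so the check no longer flags it.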
Spelling: "HugginFace's" should be "Hugging Face's".