- Host the MLflow tracking server (fill in the credentials and S3 endpoint for your artifact store):
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>
export MLFLOW_S3_ENDPOINT_URL=<your-s3-endpoint-url>
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root s3://<your-bucket>/artifacts \
  --host 0.0.0.0 \
  --port 5000
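- Sanity check: confirm the tracking server is up before moving on (assumes it is reachable on localhost; /health is MLflow's built-in liveness endpoint):
curl http://localhost:5000/health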
- Build the Docker image for vLLM serving:
sudo docker build -t lora-vllm .
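- Verify the build produced a local image with the expected tag (the docker run step below must reference this same tag):
sudo docker images lora-vllm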
- [Optional] Expose the MLflow server over public HTTP using a tunnel such as ngrok or zrok; rough examples follow (syntax per the tools' public docs, verify against your installed versions):
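ngrok http 5000
# or
zrok share public localhost:5000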
- Test the Docker image (the image name must match the tag built above; the empty values are for you to fill in):
docker run --rm \
  -e MODEL_NAME="thinking" \
  -e MODEL_VERSION="5" \
  -e MODEL_ALIAS="champion" \
  -e MLFLOW_TRACKING_URI="" \
  -e AWS_ACCESS_KEY_ID="" \
  -e AWS_SECRET_ACCESS_KEY="" \
  -e MLFLOW_S3_ENDPOINT_URL="" \
  -e MLFLOW_TRACKING_USERNAME="admin" \
  -e MLFLOW_TRACKING_PASSWORD="" \
  -e VLLM_LOGGING_LEVEL=DEBUG \
  --gpus all \
  -p 8000:8000 \
  lora-vllm
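- Once the container is running, vLLM serves an OpenAI-compatible API on port 8000; a quick smoke test (assumes localhost):
curl http://localhost:8000/v1/models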
- Host the deployment API:
uvicorn serve_vllm_api:app --host 0.0.0.0 --port 6789
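- If serve_vllm_api is a FastAPI app (an assumption; any ASGI app works with uvicorn), its interactive API docs should be available at http://localhost:6789/docs.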
- Send a request to the deployment API to start the vLLM container (fill in the empty fields with your MLflow and S3 settings):
curl -X POST http://localhost:6789/start-vllm \
  -H "Content-Type: application/json" \
  -d '{"MODEL_NAME": "initial-sft", "MODEL_VERSION": "latest", "MLFLOW_TRACKING_URI": "", "AWS_ACCESS_KEY_ID": "", "AWS_SECRET_ACCESS_KEY": "", "MLFLOW_S3_ENDPOINT_URL": "", "VLLM_LOGGING_LEVEL": "DEBUG"}'
- Now see the magic on port 8000: the model is served behind vLLM's OpenAI-compatible API. An example request is sketched below.
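This sketch assumes the container registers the adapter under the MODEL_NAME passed above ("initial-sft" here); adjust the "model" field to whatever name the server reports under /v1/models:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "initial-sft", "messages": [{"role": "user", "content": "Hello!"}]}'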