
#102 vLLM subclass implementation as inference backend and example updated. #107

Open
solankinitish wants to merge 1 commit into vitalops:main from solankinitish:issue-102-vllm-backend


@solankinitish


Closes #102

Problem

The existing VLLM implementation (added in #106) was incomplete: it imported httpx mid-class to fetch max_model_len from the vLLM server at init time. As a result, instantiation failed without a live server, and httpx became an undeclared dependency.

Changes

  • Reimplemented VLLM as a clean subclass following the same pattern as Ollama: api_base and max_tokens are passed explicitly, with no network calls at init
  • max_tokens defaults to 4096 and can be overridden by the user to match their model's context length
  • Rate limiting and batch distribution handled by the existing LLM infrastructure
  • Added vLLM to the LLM provider section in examples/Getting_started.ipynb
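
The init-time behavior described in the list above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `BaseLLM` and `VLLMSketch` are hypothetical stand-ins for datatune's shared LLM base class and the new VLLM subclass, and their attribute names are assumptions. The point is that the constructor only stores configuration, so it succeeds without a running server.

```python
class BaseLLM:
    """Hypothetical stand-in for the shared LLM base class (not datatune's real API)."""

    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name
        self.kwargs = kwargs


class VLLMSketch(BaseLLM):
    """Hypothetical subclass: stores configuration only, no network calls at init."""

    def __init__(self, model_name: str, api_base: str,
                 max_tokens: int = 4096, **kwargs):
        super().__init__(model_name, **kwargs)
        # Both values come from the caller, so nothing is fetched from the server.
        self.api_base = api_base
        self.max_tokens = max_tokens


# Construction works offline; max_tokens falls back to the 4096 default.
llm = VLLMSketch(
    "mistralai/Mistral-7B-Instruct-v0.1",
    api_base="http://localhost:8000/v1",
)
```

A user whose model has a longer context window would pass a larger `max_tokens` instead of relying on the default.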

Usage

from datatune.llm.llm import VLLM

llm = VLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    api_base="http://localhost:8000/v1",
    max_tokens=4096
)
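
If a server is already running, the context length could be looked up separately and then passed in as `max_tokens`, which keeps the constructor itself free of network calls. The sketch below is not part of this PR. It assumes the server's model listing is available at `{api_base}/models` and that each entry carries `id` and `max_model_len` fields; both helper names are hypothetical, and the sample payload is made up for illustration.

```python
import json
from typing import Optional
from urllib.request import urlopen


def extract_max_model_len(payload: dict, model_name: str) -> Optional[int]:
    """Return max_model_len for model_name from a model-listing payload, if present."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_name:
            value = entry.get("max_model_len")
            return int(value) if value is not None else None
    return None


def fetch_max_model_len(api_base: str, model_name: str,
                        timeout: float = 5.0) -> Optional[int]:
    """Query the server explicitly; call this only when a server is known to be up."""
    url = f"{api_base.rstrip('/')}/models"
    with urlopen(url, timeout=timeout) as resp:
        return extract_max_model_len(json.load(resp), model_name)


# Made-up payload in the assumed response shape, used to exercise the parser offline.
sample = {"data": [{"id": "mistralai/Mistral-7B-Instruct-v0.1",
                    "max_model_len": 32768}]}
context_len = extract_max_model_len(sample, "mistralai/Mistral-7B-Instruct-v0.1")
```

Using only the standard library here avoids adding the undeclared httpx dependency mentioned under Problem.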

