Nano vLLM with vLLM v1's request scheduling strategy and chunked prefill
Updated Jan 26, 2026 · Python
Lightweight LLM inference engine inspired by nano-vllm, with a radix-tree-based prefix cache, tensor and pipeline parallelism (TP and PP), CUDA graphs, an OpenAI API, async scheduling, and more.
High-throughput TTS server based on vLLM continuous batching. Built for VoxCPM2 and future Transformer TTS models. Optimized for cloud deployment and multi-tenant serving.
Personal text-to-speech web app powered by VoxCPM2: voice design, controllable cloning, and ultimate cloning. Next.js on Vercel + Modal GPU.