Releases: zenprocess/servingcard
v0.1.0 — First release
servingcard v0.1.0
Hardware-specific LLM serving configurations. Model cards for serving.
What's included
- Spec: ServingCard YAML v1.0 with JSON Schema
- CLI: servingcard benchmark, servingcard apply, servingcard validate, servingcard info
- Registry: 3 seed configs (Qwen3-coder on NVIDIA GB10)
  - FP8 + Eagle3 speculative: 69 tok/s
  - FP8 baseline: 42 tok/s
  - NVFP4: 42 tok/s, 262K context
- PawBench integration: benchmark harness for producing serving cards
- 57 tests, CI on Python 3.10/3.11/3.12
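To put the seed config throughput numbers listed above in relative terms, here is a minimal sketch that computes each config's speedup over the FP8 baseline. The figures are copied from this release note; the dictionary layout and variable names are illustrative only and are not part of the ServingCard spec.

```python
# Throughput figures (tokens/second) copied from the release notes.
# The dict structure is an assumption made for this illustration.
throughput = {
    "FP8 + Eagle3 speculative": 69,
    "FP8 baseline": 42,
    "NVFP4": 42,
}

baseline = throughput["FP8 baseline"]

# Speedup of each config relative to the FP8 baseline.
speedups = {name: tok_s / baseline for name, tok_s in throughput.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

By this arithmetic, Eagle3 speculative decoding comes out at roughly 1.64x the FP8 baseline, while NVFP4 matches the baseline throughput.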
Install
git clone https://github.com/zenprocess/servingcard
cd servingcard/packages/python
pip install -e .
Quick start
# Apply a community config
servingcard apply qwen3-coder/gb10-fp8-eagle3-spec3
# Benchmark your setup
servingcard benchmark --model qwen3-coder --hardware nvidia-gb10 --endpoint http://localhost:8000
PyPI package coming in v0.2.0.
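For scripting the benchmark command from the quick start (for example, across several endpoints), a small wrapper that builds the argument list may help. This is a sketch only: `benchmark_command` is a hypothetical helper, not part of the servingcard package, and it uses only the flags shown in the quick start.

```python
import shlex


def benchmark_command(model: str, hardware: str, endpoint: str) -> list[str]:
    """Build the argv for `servingcard benchmark`.

    Hypothetical helper; it only uses the flags shown in the quick start.
    """
    return [
        "servingcard", "benchmark",
        "--model", model,
        "--hardware", hardware,
        "--endpoint", endpoint,
    ]


cmd = benchmark_command("qwen3-coder", "nvidia-gb10", "http://localhost:8000")

# Print the shell-quoted command for inspection before running it.
print(shlex.join(cmd))
```

With servingcard installed, the list can be passed to `subprocess.run(cmd, check=True)` to execute the benchmark.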