
Releases: zenprocess/servingcard

v0.1.0 — First release

26 Mar 23:42


servingcard v0.1.0

Hardware-specific LLM serving configurations. Model cards for serving.

What's included

  • Spec: ServingCard YAML v1.0 with JSON Schema
  • CLI: servingcard benchmark, servingcard apply, servingcard validate, servingcard info
  • Registry: 3 seed configs (Qwen3-coder on NVIDIA GB10)
    • FP8 + Eagle3 speculative decoding: 69 tok/s
    • FP8 baseline: 42 tok/s
    • NVFP4: 42 tok/s, 262K context
  • PawBench integration: benchmark harness for producing serving cards
  • 57 tests, CI on Python 3.10/3.11/3.12
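To put the registry numbers in perspective, the snippet below computes each seed config's throughput relative to the FP8 baseline. It uses only the tok/s figures listed above. The `speedup` helper is illustrative and is not part of the servingcard CLI or package.

```python
# Throughput figures (tok/s) from the v0.1.0 registry: Qwen3-coder on NVIDIA GB10.
REGISTRY_TOK_S = {
    "FP8 + Eagle3 speculative decoding": 69.0,
    "FP8 baseline": 42.0,
    "NVFP4": 42.0,
}


def speedup(config: str, baseline: str = "FP8 baseline") -> float:
    """Return the throughput of `config` as a multiple of `baseline`."""
    return REGISTRY_TOK_S[config] / REGISTRY_TOK_S[baseline]


if __name__ == "__main__":
    for name in REGISTRY_TOK_S:
        print(f"{name}: {speedup(name):.2f}x")
```

On these numbers, Eagle3 speculative decoding gives roughly a 1.64× throughput gain over the FP8 baseline, while NVFP4 matches the baseline throughput and its card additionally lists a 262K context.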

Install

```shell
git clone https://github.com/zenprocess/servingcard
cd servingcard/packages/python
pip install -e .
```

Quick start

```shell
# Apply a community config
servingcard apply qwen3-coder/gb10-fp8-eagle3-spec3

# Benchmark your setup
servingcard benchmark --model qwen3-coder --hardware nvidia-gb10 --endpoint http://localhost:8000
```
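For a quick, independent sanity check of an endpoint's throughput before running `servingcard benchmark`, a rough tokens-per-second figure can be taken by hand. This is a minimal sketch, not how servingcard or PawBench measures throughput. It assumes the endpoint is an OpenAI-compatible server (for example vLLM) that exposes `/v1/completions` and reports `usage.completion_tokens` in its response; the helper names `tokens_per_second` and `measure` are hypothetical.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def measure(endpoint: str, model: str, prompt: str, max_tokens: int = 256) -> float:
    """Time one non-streaming request and return end-to-end tok/s.

    ASSUMPTION: the server exposes an OpenAI-compatible /v1/completions route
    and includes usage.completion_tokens in its JSON response.
    """
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    # Supplying `data` makes urllib send a POST request.
    req = urllib.request.Request(
        f"{endpoint.rstrip('/')}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    elapsed = time.perf_counter() - start
    # The timer wraps the whole request, so prompt processing is included.
    return tokens_per_second(payload["usage"]["completion_tokens"], elapsed)
```

Call it as `measure("http://localhost:8000", model, prompt)` with the model name your server actually serves. Because the timer wraps the whole request, the result includes prompt processing and will read somewhat lower than a pure decode-rate figure; a single request also says nothing about behavior under concurrent load.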

PyPI package coming in v0.2.0.