v0.1.1 — Initial release
What's included
- GGUF quantization (Q2_K through Q8_0) via llama.cpp
- GPTQ quantization (INT4/INT8) via gptqmodel — runs on Kaggle T4
- TFLite conversion via onnx2tf — runs on Google Colab
- Real benchmark: tok/s + perplexity via llama-cpp-python
- Simulated mobile benchmark: MAC count + latency across 7 SoC profiles
- Pareto frontier chart (interactive Plotly HTML)
- Gradio web UI with 4 tabs
- CLI: `run` and `ui` commands
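The simulated mobile benchmark boils down to latency ≈ MAC count ÷ effective SoC throughput. A minimal sketch of that arithmetic is below; the SoC profiles, utilization factors, and function name are illustrative assumptions, not the tool's actual profiles or implementation.

```python
# Hypothetical sketch of a MAC-count-based latency estimate.
# The profiles and utilization factors are assumed values for
# illustration, not the tool's real SoC data.

# Peak throughput in MACs/second, with an assumed fraction of peak
# that is actually sustained during inference.
SOC_PROFILES = {
    "mid-range": {"peak_macs_per_s": 2.0e12, "utilization": 0.30},
    "flagship":  {"peak_macs_per_s": 8.0e12, "utilization": 0.35},
}

def estimate_latency_ms(total_macs: float, profile: dict) -> float:
    """Latency = work / effective throughput, in milliseconds."""
    effective_macs_per_s = profile["peak_macs_per_s"] * profile["utilization"]
    return total_macs / effective_macs_per_s * 1e3

# Rough example: assume ~0.5 GMACs per generated token.
lat = estimate_latency_ms(0.5e9, SOC_PROFILES["mid-range"])
```

Repeating this across a set of SoC profiles gives the per-device latency spread without needing the physical hardware.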
Verified on
- Qwen2-0.5B, GTX 1060 6GB, Windows, CPU inference
Known limitations
- TFLite not supported on Windows (use Colab notebook)
- GPTQ requires 16GB+ VRAM locally (use Kaggle notebook)