BitNet-rs is a high-performance Rust inference engine for 1-bit BitNet LLMs.
- SIMD/CUDA/Metal/Vulkan kernels — AVX2/AVX-512/NEON on CPU; CUDA (`gpu`), Metal (`metal`, macOS), Vulkan (`vulkan`), and Intel Arc OpenCL (`opencl`) GPU backends
- Multiple quantization formats — I2_S BitNet32-F16, I2_S QK256 (GGML 256-element blocks), TL1, TL2, IQ2_S via FFI
- Cross-validation — per-token cosine-similarity comparison against Microsoft's C++ reference (>0.99)
- Honest-compute receipts — schema v1.0.0 with 8 validation gates; `compute_path` must be `"real"`
- Chat templates — 59+ template variants (LLaMA-3, Phi-4, Qwen, Gemma, Mistral, DeepSeek, and more); auto-detected from GGUF metadata or tokenizer path
- SLM model support — load and run Phi-4, Qwen, Gemma, Mistral, LLaMA, and SmolLM2 via SafeTensors (quickstart guide)
- SafeTensors → GGUF export — `bitnet-st2gguf` preserves F16 LayerNorm weights
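The ternary formats above all come down to packing weights from {-1, 0, +1} into 2-bit codes plus per-block scales. A toy packing routine for intuition only — this is not the actual I2_S layout, which uses 32- or 256-element blocks with format-specific headers and scale storage:

```rust
// Toy 2-bit ternary packing: maps -1 → 0b00, 0 → 0b01, +1 → 0b10 and
// packs four weights per byte. Real I2_S blocks also carry F16 scales.
fn pack_ternary(weights: &[i8]) -> Vec<u8> {
    weights
        .chunks(4)
        .map(|chunk| {
            let mut byte = 0u8;
            for (i, &w) in chunk.iter().enumerate() {
                let code = (w + 1) as u8; // -1, 0, +1 → 0, 1, 2
                byte |= code << (2 * i);
            }
            byte
        })
        .collect()
}

fn unpack_ternary(packed: &[u8], n: usize) -> Vec<i8> {
    (0..n)
        .map(|i| (((packed[i / 4] >> (2 * (i % 4))) & 0b11) as i8) - 1)
        .collect()
}

fn main() {
    let weights = [-1i8, 0, 1, 1, 0, -1, 1, 0];
    let packed = pack_ternary(&weights);
    assert_eq!(packed.len(), 2); // four weights per byte
    assert_eq!(unpack_ternary(&packed, weights.len()), weights);
}
```

The 4:1 byte ratio (before scales) is where the ~16× weight-size reduction versus F32 comes from.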
v0.2.1-dev (pre-alpha): QK256 uses scalar kernels (~0.1 tok/s on 2B models); use `--max-tokens` 4–16 for validation. AVX2 dequantization is merged; a ≥3× uplift is planned. Significant correctness, performance, and validation work remains.
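The cross-validation gate listed above compares Rust and C++ logits per token using cosine similarity. A minimal sketch of that metric — the function name and the sample logits are illustrative, not BitNet-rs API:

```rust
/// Cosine similarity between two logit vectors.
/// Returns a value in [-1, 1]; 1.0 means identical direction.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Hypothetical per-token logits from the Rust engine and the C++ reference.
    let rust_logits = [0.10_f32, 0.95, -0.20, 0.33];
    let cpp_logits = [0.11_f32, 0.94, -0.19, 0.35];
    let sim = cosine_similarity(&rust_logits, &cpp_logits);
    assert!(sim > 0.99); // the >0.99 gate the project enforces
}
```

A per-token check like this catches divergence at the step where it first appears, rather than only in the final output string.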
```bash
# 1. Download a model
cargo run -p xtask -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf

# 2. Run inference (always specify --no-default-features --features cpu|gpu)
RUST_LOG=warn cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- run \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
  --prompt "What is 2+2?" --max-tokens 8

# 3. Interactive chat
RUST_LOG=warn cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- chat \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json
```

Workspace default features are empty — always pass `--no-default-features --features cpu` (or `gpu`). `bitnet-cli` defaults to `cpu,full-cli` when built standalone.
| Feature | State | Notes |
|---|---|---|
| CPU inference — I2_S BitNet32 | ✅ | Production path; 10–20× faster than QK256 scalar |
| CPU inference — I2_S QK256 | ✅ | Scalar kernels (~0.1 tok/s on 2B); AVX2 foundation merged |
| GPU inference — CUDA | 🔶 | Scaffolded; receipt validation pending |
| GPU inference — Metal | 🧪 | Feature gate + kernel stubs; not validated end-to-end |
| GPU inference — Vulkan | 🧪 | Runtime probing compiled; not validated end-to-end |
| GPU inference — Intel oneAPI | 🧪 | Intel CPU/GPU feature gate; not validated end-to-end |
| AMD ROCm detection | 🧪 | Device detection only; inference kernels not yet validated |
| GPU HAL — multi-backend | 🔧 | bitnet-gpu-hal: OpenCL, Vulkan, Metal, ROCm backends; ~780 tests (scaffold; CPU-only validation) |
| Interactive chat (REPL) | ✅ | /help, /clear, /metrics, auto-template detection |
| Cross-validation vs C++ | ✅ | Cosine similarity > 0.99, per-token comparison |
| Honest-compute receipts | ✅ | Schema v1.0.0, 8 validation gates |
| Strict mode | ✅ | Runtime guards prevent mock fallback |
| SafeTensors → GGUF export | ✅ | bitnet-st2gguf with F16 LayerNorm preservation |
| Server / HTTP API | 🚧 | Health endpoints wired; inference endpoints have TODOs |
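The receipt gating in the table works by refusing any run whose compute path was mocked. A toy sketch of one such check — the struct and function names are hypothetical, and this shows only the `compute_path` gate out of the real schema's 8:

```rust
// Toy model of an honest-compute receipt check (illustrative, not the
// v1.0.0 schema). Only the compute_path gate is sketched here.
struct Receipt {
    compute_path: String,
}

fn validate(receipt: &Receipt) -> Result<(), String> {
    if receipt.compute_path != "real" {
        return Err(format!(
            "compute_path must be \"real\", got {:?}",
            receipt.compute_path
        ));
    }
    Ok(())
}

fn main() {
    let good = Receipt { compute_path: "real".into() };
    let bad = Receipt { compute_path: "mock".into() };
    assert!(validate(&good).is_ok());
    assert!(validate(&bad).is_err());
}
```

The point of gating at validation time is that a mocked fallback produces a receipt that fails loudly instead of silently passing as real compute.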
BitNet-rs supports inference on multiple GPU platforms:
| Backend | Feature Flag | Status | Hardware |
|---|---|---|---|
| NVIDIA CUDA | `--features gpu` | 🔶 Alpha | GeForce/Tesla/A100+ |
| Intel Arc (OpenCL) | `--features opencl` | 🧪 Experimental | Arc A770/A750 |
| AMD ROCm | `--features rocm` | 🧪 Scaffold | Unvalidated target: RDNA3-class AMD GPUs |
| Vulkan | `--features vulkan` | 🧪 Scaffold | Any Vulkan 1.3 GPU |
| Apple Metal | `--features metal` | 🧪 Scaffold | M1/M2/M3+ |
| WebGPU | N/A (sub-crate only) | 🧪 Experimental | Browser/wgpu (`bitnet-wgpu`) |
| CPU (SIMD) | `--features cpu` | ✅ Production | x86-64/ARM64 |
```bash
# Install Intel compute runtime (Ubuntu)
sudo apt install intel-opencl-icd clinfo

# Build with Intel GPU support
cargo build --release --no-default-features --features opencl,full-cli

# Run inference
cargo run --release --no-default-features --features opencl,full-cli -- run \
  --model models/model.gguf --device opencl --prompt "Hello" --max-tokens 32
```

See docs/INTEL_GPU_SETUP.md for detailed setup instructions.
```bash
--device auto     # Auto-detect best available (default)
--device cpu      # Force CPU
--device cuda     # Force NVIDIA CUDA
--device opencl   # Force Intel OpenCL
--device vulkan   # Force Vulkan
```

Data flows top-to-bottom through the workspace:
```text
bitnet-tokenizers ──────────────────────────────────────┐
                                                        │
bitnet-models (GGUF loader, dual I2_S flavor detection) │
  └── bitnet-quantization (I2_S / TL1 / TL2 / IQ2_S)    │
        └── bitnet-kernels (AVX2 / AVX-512 / NEON / CUDA)
                          ▼
            bitnet-inference (autoregressive engine)
              ├── bitnet-logits (temperature / top-k / top-p)
              ├── bitnet-sampling (greedy, nucleus, repetition penalty)
              ├── bitnet-generation (decode loop, stop criteria)
              ├── bitnet-prompt-templates (59+ template variants; auto-detection)
              └── bitnet-receipts (honest-compute receipt schema)
                          │
           ┌──────────────┴──────────────┐
       bitnet-cli                  bitnet-server
```
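The logits → sampling stages in the pipeline above can be sketched as temperature scaling, top-k filtering, then selection. The function names are illustrative, not the `bitnet-logits` / `bitnet-sampling` API:

```rust
/// Keep the k largest logits; push everything else to -inf so it can
/// never be sampled. Illustrative top-k filtering, not project code.
fn top_k_mask(logits: &[f32], k: usize) -> Vec<f32> {
    let mut sorted: Vec<f32> = logits.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a));
    let cutoff = sorted[k - 1];
    logits
        .iter()
        .map(|&l| if l >= cutoff { l } else { f32::NEG_INFINITY })
        .collect()
}

/// Temperature-scale the logits, then pick the argmax (greedy decoding).
fn greedy(logits: &[f32], temperature: f32) -> usize {
    logits
        .iter()
        .map(|&l| l / temperature)
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let logits = [1.2_f32, 3.4, 0.7, 2.9];
    let filtered = top_k_mask(&logits, 2); // keeps indices 1 and 3
    assert_eq!(greedy(&filtered, 0.8), 1); // index of the largest logit
}
```

Nucleus (top-p) sampling and repetition penalty slot in between the mask and the final draw in the same way: each stage transforms the logit vector before the next.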
SRP microcrates (bitnet-logits, bitnet-sampling, bitnet-generation, bitnet-engine-core, bitnet-device-probe, bitnet-gguf, bitnet-prompt-templates, bitnet-receipts) keep coupling low and are re-exported from their original locations for zero breaking changes.
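The zero-breaking-change re-export pattern can be shown with plain modules standing in for the real crates (module and function names here are hypothetical):

```rust
// Facade/re-export sketch: functionality lives in a focused module
// (standing in for e.g. bitnet-logits), and the original location
// re-exports it so existing callers see no breaking change.
mod logits {
    pub fn apply_temperature(logits: &mut [f32], t: f32) {
        for l in logits.iter_mut() {
            *l /= t;
        }
    }
}

mod inference {
    // The old path keeps working via a re-export.
    pub use crate::logits::apply_temperature;
}

fn main() {
    let mut logits = [2.0_f32, 4.0];
    // Callers still use the original path:
    inference::apply_temperature(&mut logits, 2.0);
    assert_eq!(logits, [1.0, 2.0]);
}
```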
- `bitnet-opencl` — Intel GPU compute via OpenCL 3.0
- `bitnet-vulkan` — Cross-vendor Vulkan compute
- `bitnet-wgpu` / `bitnet-wgpu-runner` — WebGPU/WGSL compute shaders
- `bitnet-rocm` — AMD ROCm/HIP backend
- `bitnet-metal` — Apple Metal compute
- `bitnet-gpu-hal` — Unified Hardware Abstraction Layer (includes Level Zero backend module)
Organised by Diátaxis:
| Section | Contents |
|---|---|
| Tutorials | Getting started, first inference, tokenizer discovery |
| How-to | Install, run inference, export GGUF, cross-validate, validate models |
| Explanation | Architecture, quantization formats, dual-backend cross-val, feature flags |
| Reference | CLI flags, environment variables, API, quantization support |
Key guides: Quickstart · SLM models · Environment variables · GPU setup · Intel GPU setup · C++ cross-validation · Quantization support · Validation gates · Honest-compute receipts · QK256 usage · macOS 26 Apple Silicon roadmap
```bash
cargo build --no-default-features --features cpu   # CPU (development)
cargo build --no-default-features --features gpu   # GPU (requires CUDA 12.x)

RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=thin" \
  cargo build --release --no-default-features --features cpu,full-cli   # optimised release

# Nix (reproducible, identical to CI)
nix develop && nix build .#bitnet-cli && nix flake check
```

| Flag | Purpose |
|---|---|
| `cpu` | SIMD-optimised CPU inference (AVX2 / AVX-512 / NEON) |
| `gpu` | Umbrella GPU feature — enables all compiled GPU backends |
| `cuda` | CUDA acceleration (preferred; requires CUDA 12.x); backward-compat alias for `gpu` |
| `metal` | Metal GPU backend (macOS/iOS Apple Silicon) |
| `vulkan` | Vulkan compute backend (cross-platform) |
| `ffi` | C++ FFI bridge for cross-validation |
| `fixtures` | GGUF fixture-based integration tests (test-only) |
| `full-cli` | Enable all CLI subcommands |
| `rocm` | AMD ROCm/HIP inference backend (experimental; kernels not yet validated end-to-end) |
| `npu` | NPU detection via `bitnet-device-probe` |
| `opencl` | Intel Arc OpenCL backend (experimental; `bitnet-opencl` crate) |
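These flags are consumed through `cfg` gates in Rust. A minimal, self-contained sketch of feature-gated backend selection (the function name is illustrative; only the predicate itself is the project's convention):

```rust
// Illustrative feature-gated backend selection. With neither the
// `gpu` nor the `cuda` feature enabled, only the CPU path compiles.
#[cfg(any(feature = "gpu", feature = "cuda"))]
fn backend_name() -> &'static str {
    "gpu"
}

#[cfg(not(any(feature = "gpu", feature = "cuda")))]
fn backend_name() -> &'static str {
    "cpu"
}

fn main() {
    // Built without GPU features, this prints "cpu".
    println!("{}", backend_name());
}
```

Pairing `gpu` and `cuda` in one predicate keeps the `cuda` backward-compat alias from silently diverging from the umbrella `gpu` feature.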
Always use the unified GPU predicate in Rust code:

```rust
#[cfg(any(feature = "gpu", feature = "cuda"))]
```

```bash
# Run all enabled tests (recommended — 5-minute timeout)
cargo nextest run --workspace --no-default-features --features cpu

# CI profile (4 threads, no retries)
cargo nextest run --profile ci

# Skip slow QK256 scalar-kernel tests
BITNET_SKIP_SLOW_TESTS=1 cargo nextest run --workspace --no-default-features --features cpu

# BDD compile-coverage check
cargo run -p xtask -- grid-check

# Fixture-based integration tests
cargo test -p bitnet-models --test qk256_dual_flavor_tests --no-default-features --features fixtures

# Lint before pushing
cargo fmt --all && cargo clippy --all-targets --no-default-features --features cpu -- -D warnings

# Quick local CI smoke test (replicates the 4 required CI gates)
./ci/local.sh
```

The suite has tens of thousands of tests spanning unit, property-based (proptest), snapshot (insta), fixture, fuzz (109 targets; 37 in the nightly CI matrix), and BDD grid categories. ~2,800+ tests are intentionally `#[ignore]`-d — TDD scaffolds, resource-gated tests, slow tests, and crossval tests. See the `#[ignore = "..."]` justification strings.
See docs/development/test-suite.md for full details.
See CONTRIBUTING.md. Issues and pull requests welcome.
Before opening a PR, run:
```bash
# Option 1: Quick local CI smoke test (recommended)
./ci/local.sh

# Option 2: Manual checks
cargo fmt --all && cargo clippy --all-targets --no-default-features --features cpu -- -D warnings
cargo nextest run --workspace --no-default-features --features cpu
```

Note: ~2,800+ tests are intentionally `#[ignore]`-d. This is expected — they are TDD scaffolds, resource-gated tests (model files, GPU hardware), slow tests, and crossval tests. See the `#[ignore = "..."]` justification strings.
Dual-licensed under MIT and Apache 2.0.