Minimal Rust benchmarking suite for Number Theoretic Transforms (NTTs) over two STARK-style fields: BabyBear (31-bit) and Goldilocks (64-bit).
This project implements:
- A generic
NttFieldtrait - Radix-2 Cooley–Tukey NTT (forward)
- Inverse NTT (INTT)
- Parallel NTT using
rayon - Criterion benches for sizes up to
2^16
Inspired by Jim Posen’s talk High-Performance Engineering for SNARKs (ZK Hack S3M2).
-
BabyBear — 31-bit prime
– Fast 32-bit muls
– High cache density
– Great for CPU SIMD (NEON/SSE/AVX) -
Goldilocks — 64-bit prime
– Efficient reduction rule
– Requires 128-bit intermediates
– Common in many modern proof systems
- In-place radix-2 Cooley–Tukey
- Iterative stages with bit-reversal permutation
- Generic over any
NttField - Shared codepath for forward and inverse NTT
- Parallel version via
rayon::par_chunks_mut
Run:
RUSTFLAGS="-C target-cpu=native -C opt-level=3" cargo benchMeasured mean latencies (n = 2^10 … 2^16):
- BabyBear is 10–17× faster than Goldilocks in serial mode.
- Parallel NTT only wins for large domains (≥ 2^16).
- Both fields reach ~2× speedup on a 4-core CPU at n = 65,536.
See the full write-up:
👉 https://reymom.xyz/blog/cryptography/2025-11-16-ntt-bench
cargo test
cargo run --bin main- ZK Hack S3M2 — High-Performance Engineering for SNARKs
- Additive NTT (Fractalyze)
- NTT overview: https://2π.com/23/ntt/