Skip to content

reymom/ntt-bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ntt-bench

Minimal Rust benchmarking suite for Number Theoretic Transforms (NTTs) over two STARK-style fields: BabyBear (31-bit) and Goldilocks (64-bit).

This project implements:

  • A generic NttField trait
  • Radix-2 Cooley–Tukey NTT (forward)
  • Inverse NTT (INTT)
  • Parallel NTT using rayon
  • Criterion benches for sizes up to 2^16

Inspired by Jim Posen’s talk High-Performance Engineering for SNARKs (ZK Hack S3M2).

🔧 Features

Fields

  • BabyBear — 31-bit prime
    – Fast 32-bit muls
    – High cache density
    – Great for CPU SIMD (NEON/SSE/AVX)

  • Goldilocks — 64-bit prime
    – Efficient reduction rule
    – Requires 128-bit intermediates
    – Common in many modern proof systems

NTT Implementation

  • In-place radix-2 Cooley–Tukey
  • Iterative stages with bit-reversal permutation
  • Generic over any NttField
  • Shared codepath for forward and inverse NTT
  • Parallel version via rayon::par_chunks_mut

📊 Benchmarks

Run:

RUSTFLAGS="-C target-cpu=native -C opt-level=3" cargo bench

Measured mean latencies (n = 2^10 … 2^16):

  • BabyBear is 10–17× faster than Goldilocks in serial mode.
  • Parallel NTT only wins for large domains (≥ 2^16).
  • Both fields reach ~2× speedup on a 4-core CPU at n = 65,536.

See the full write-up:

👉 https://reymom.xyz/blog/cryptography/2025-11-16-ntt-bench

🏗 Running Tests

cargo test
cargo run --bin main

📚 Background Reading

  • ZK Hack S3M2 — High-Performance Engineering for SNARKs
  • Additive NTT (Fractalyze)
  • NTT overview: https://2π.com/23/ntt/

About

Minimal Rust NTT benchmarking suite over BabyBear and Goldilocks fields.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages