ZeroPyBench is a Python benchmarking library with zero overhead, designed for multidimensional performance analysis.
- Context manager API: Benchmark any code block with `with bench(...): ...`
- Multidimensional: Tag benchmarks with arbitrary keyword arguments
- Zero overhead: Code is passed directly to `timeit.Timer`; no wrapper function
- Auto-scaling: Automatically determines the number of iterations for reliable measurements
- Multiple exports: CSV, Parquet, Markdown
- Plotting: Built-in visualization with matplotlib
```python
from zeropybench import Benchmark

bench = Benchmark()

for n in [100, 1000, 10000]:
    data = list(range(n))
    with bench(method='sum', n=n):
        sum(data)
    with bench(method='len', n=n):
        len(data)
```

Output:

```
method=sum, n=100: 575.124 ns ± 3.35% (median of 7 runs, 500000 loops each)
method=len, n=100: 19.037 ns ± 0.85% (median of 7 runs, 20000000 loops each)
method=sum, n=1000: 2.961 µs ± 36.70% (median of 7 runs, 50000 loops each)
method=len, n=1000: 19.844 ns ± 38.63% (median of 7 runs, 10000000 loops each)
method=sum, n=10000: 50.208 µs ± 9.89% (median of 7 runs, 5000 loops each)
method=len, n=10000: 28.686 ns ± 1.22% (median of 7 runs, 20000000 loops each)
```
```python
print(bench)
```

```
┌───┬────────┬────────┬────────────────────────────┬───────────┐
│   ┆ method ┆ n      ┆ median_execution_time (ns) ┆ ± (%)     │
╞═══╪════════╪════════╪════════════════════════════╪═══════════╡
│ 0 ┆ sum    ┆ 100    ┆ 575.124442                 ┆ 3.353129  │
│ 1 ┆ len    ┆ 100    ┆ 19.036998                  ┆ 0.854601  │
│ 2 ┆ sum    ┆ 1_000  ┆ 2_961.25732                ┆ 36.698258 │
│ 3 ┆ len    ┆ 1_000  ┆ 19.844193                  ┆ 38.63371  │
│ 4 ┆ sum    ┆ 10_000 ┆ 50_207.584997              ┆ 9.894165  │
│ 5 ┆ len    ┆ 10_000 ┆ 28.686439                  ┆ 1.22376   │
└───┴────────┴────────┴────────────────────────────┴───────────┘
```
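The `± (%)` column expresses the spread across repeats relative to the median. One plausible formulation, shown here as an illustration rather than ZeroPyBench's exact formula:

```python
from statistics import median

def relative_spread_pct(times):
    """Half the min-max spread across repeats, as a percentage of the median."""
    m = median(times)
    return (max(times) - min(times)) / 2 / m * 100

# Hypothetical per-loop times (ns) from 7 repeats of the same benchmark.
runs = [575.1, 580.2, 573.9, 590.4, 576.0, 574.5, 578.8]
print(f"{median(runs):.3f} ns ± {relative_spread_pct(runs):.2f}%")
```

A large `± (%)` value, like the 36.70% for `sum` at `n=1000` above, signals a noisy measurement that may warrant more repeats or a quieter machine.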
ZeroPyBench automatically detects JAX arrays and optimizes benchmarking accordingly:
```python
import jax.numpy as jnp
from zeropybench import Benchmark

bench = Benchmark(repeat=20, verbose=True)

x = jnp.ones(1_000_000)
y = jnp.ones(1_000_000)

with bench():
    x + y
```

Output:

```
Setup code:
@jax.jit
def __bench_func(x, y):
    return x + y

Benchmarked code:
__bench_func(x, y).block_until_ready()

943.426 µs ± 3.98% (median of 20 runs, 500 loops each)
```
When JAX code is detected, ZeroPyBench:
- Wraps the code in a JIT-compiled function to measure optimized execution
- Separates compilation from execution by reporting `compilation_time` separately
- Captures the StableHLO representation of the compiled function in the `hlo` field
- Uses `block_until_ready` to ensure accurate timing of asynchronous operations
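The same pattern can be reproduced by hand with plain JAX; the sketch below (using a stand-in function `add`, not ZeroPyBench code) shows why each step matters:

```python
import time

import jax
import jax.numpy as jnp

@jax.jit
def add(x, y):
    return x + y

x = jnp.ones(1_000_000)
y = jnp.ones(1_000_000)

# The first call traces and compiles the function; block_until_ready
# waits for the asynchronous computation to finish before timing stops.
t0 = time.perf_counter()
add(x, y).block_until_ready()
first_execution_time = time.perf_counter() - t0

# Later calls dispatch the already-compiled executable, so this
# measures optimized execution only.
t0 = time.perf_counter()
add(x, y).block_until_ready()
warm_time = time.perf_counter() - t0

# The StableHLO text of the lowered computation is also inspectable.
hlo = add.lower(x, y).as_text()
print(f"first: {first_execution_time * 1e6:.1f} µs, warm: {warm_time * 1e6:.1f} µs")
```

Without `block_until_ready`, JAX's asynchronous dispatch would return before the computation finishes, and the measured time would reflect only the dispatch cost.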
The benchmark report includes additional fields for JAX:
- `first_execution_time`: Time of the initial (possibly uncompiled) execution
- `compilation_time`: Time to lower and compile the function
- `generated_code_size`: Total size of the generated machine code in bytes, including embedded constants
- `temp_size`: Size of the preallocated temporary buffer arena in bytes, excluding input arguments, outputs, and constants
- `hlo`: The StableHLO text representation of the compiled computation
```python
report = bench[0]
print(report['compilation_time'])  # e.g., 12345.67 ns
print(report['hlo'][:100])         # HLO module "jit___bench_func" ...
```

Installation:

```
pip install zeropybench
```

```python
# Export results
bench.write_csv('results.csv')
bench.write_parquet('results.parquet')
bench.write_markdown('results.md')

# Plot results
bench.plot()
bench.write_plot('results.pdf')
```

Configuration:

```python
Benchmark(
    repeat=7,                     # Number of measurement repetitions
    min_duration_per_repeat=0.2,  # Minimum duration per repeat (seconds)
    verbose=True,                 # Print the setup and benchmarked code
)
```

License: MIT