diff --git a/benchmarks/microbenchmarks/README.md b/benchmarks/microbenchmarks/README.md
new file mode 100644
index 000000000..1f8ca3ee6
--- /dev/null
+++ b/benchmarks/microbenchmarks/README.md
@@ -0,0 +1,66 @@
+# Transformer Engine Microbenchmarks
+
+This directory contains lightweight Python microbenchmarks for selected
+Transformer Engine kernels and helper scripts for comparing benchmark CSVs.
+
+## Benchmarks
+
+- `benchmark_gemm.py`: dense BF16 GEMM benchmark
+- `benchmark_gemm_fp8.py`: dense FP8 GEMM benchmark using `fp8_autocast`
+- `benchmark_grouped_gemm.py`: grouped GEMM benchmark for MoE-style shapes
+- `benchmark_casting.py`: BF16 `<->` FP8 casting benchmark
+- `benchmark_normalization.py`: LayerNorm and RMSNorm benchmark
+
+Run a benchmark directly from this directory. Pass `--csv` to write results.
+When no filename is provided, `run_benchmarks` derives the CSV name from the
+benchmark script file name.
+
+```bash
+python benchmark_gemm.py --csv
+python benchmark_grouped_gemm.py --csv grouped_results.csv
+```
+
+## Shared configuration
+
+Common benchmark settings live in `utils.py`.
+
+- `M_SIZE_LIST`: default token-count sweep for dense and elementwise kernels
+- `DTYPE_LIST`: shared dtype sweep for TE activation benchmarks
+- `MODEL_CONFIGS`: dense GEMM model shapes
+- `MODEL_HIDDEN_SIZES`: hidden sizes for elementwise kernels
+
+Grouped GEMM keeps its own smaller M sweep because its working set scales with
+expert count `B` in addition to `M`.
+
+## Adding a benchmark
+
+Use `run_benchmarks(test_cases, bench_fn, param_columns)`.
+
+- `test_cases` is a list of dictionaries containing benchmark inputs.
+- `param_columns` lists the case fields that should appear in stdout headers
+  and CSV output.
+- `bench_fn(**case)` must return a list of metric records created by
+  `make_metric_record(...)` or `make_forward_backward_metric_records(...)`.
+
+Each metric record represents one benchmark line such as `GEMM Forward`. The
+runner prints that line to stdout and expands it into two CSV columns:
+
+- `
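The `run_benchmarks` contract described in the README can be illustrated with a minimal, self-contained sketch. The stand-in `run_benchmarks`, `make_metric_record`, and `bench_fn` below are hypothetical stdlib-only approximations of the structure; the real helpers live in `benchmarks/microbenchmarks/utils.py` and their exact signatures and column layout may differ.

```python
# Hypothetical stand-ins illustrating the run_benchmarks contract; the real
# helpers live in benchmarks/microbenchmarks/utils.py and may differ.
import csv
import io
import time


def make_metric_record(name, time_ms):
    # One record per benchmark line (e.g. "GEMM Forward"); the runner
    # expands each record into CSV columns.
    return {"name": name, "time_ms": time_ms}


def bench_fn(m, n, k):
    # A real benchmark would launch a TE kernel here; this stand-in just
    # times a trivial loop so the example runs anywhere.
    start = time.perf_counter()
    _ = sum(range(m * n // max(k, 1)))
    elapsed_ms = (time.perf_counter() - start) * 1e3
    return [make_metric_record("GEMM Forward", elapsed_ms)]


def run_benchmarks(test_cases, bench_fn, param_columns):
    # Minimal sketch of the runner: print a stdout header per case and
    # collect one CSV row per metric record.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(param_columns + ["metric", "time_ms"])
    for case in test_cases:
        print(", ".join(f"{c}={case[c]}" for c in param_columns))
        for record in bench_fn(**case):
            writer.writerow(
                [case[c] for c in param_columns]
                + [record["name"], f"{record['time_ms']:.3f}"]
            )
    return out.getvalue()


test_cases = [
    {"m": 1024, "n": 4096, "k": 4096},
    {"m": 2048, "n": 4096, "k": 4096},
]
csv_text = run_benchmarks(test_cases, bench_fn, ["m", "n", "k"])
print(csv_text.splitlines()[0])  # -> m,n,k,metric,time_ms
```

Keeping `bench_fn` decoupled from the runner this way is what lets each benchmark script define only its shapes and kernel launch while sharing the stdout/CSV plumbing.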