A collection of GPU experiments and benchmarks for my personal understanding and research.
- ThunderKittens
- CUDA 12.8+
- NVIDIA Hopper (H100) or Blackwell (B200) GPUs
- Python 3.11+ with PyTorch 2.8+ and pybind11
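
The requirements above can be sanity-checked before building. The following is a minimal sketch, not part of this repo: `version_tuple` and `check_environment` are hypothetical helper names, and the compute-capability mapping (major 9 for H100, major 10 for B200) is an assumption baked into the check. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the locally installed toolkit, and that pybind11 and ThunderKittens are not checked here.

```python
import re
import sys


def version_tuple(text):
    """Turn a string such as '2.8.0+cu128' into (2, 8); only major.minor is kept."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    if sys.version_info[:2] < (3, 11):
        problems.append("Python 3.11+ is required")

    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems

    if version_tuple(torch.__version__) < (2, 8):
        problems.append(f"PyTorch 2.8+ is required, found {torch.__version__}")

    # None on CPU-only builds; otherwise the CUDA version PyTorch was built with.
    cuda = torch.version.cuda
    if cuda is None or version_tuple(cuda) < (12, 8):
        problems.append(f"CUDA 12.8+ is required, PyTorch reports {cuda}")

    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible")
    else:
        major, minor = torch.cuda.get_device_capability()
        # Assumption: major 9 corresponds to Hopper (H100), major 10 to B200.
        if major not in (9, 10):
            problems.append(f"expected an H100 or B200, found sm_{major}{minor}")

    return problems


if __name__ == "__main__":
    found = check_environment()
    for problem in found:
        print("warning:", problem)
    if not found:
        print("environment looks OK")
```

Running the script prints one warning per unmet requirement, so it can be used as a quick pre-flight step before the build.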
- Run `git submodule update --init --recursive`.
- In the desired subdirectory, edit the Makefile to target the correct source file, build configuration, and run settings.
- Run `make run`.
- hopper/: Experiments targeting H100
- blackwell/: Experiments targeting B200
I try my best to keep things organized, but please don’t expect perfect structure / accurate comments / fully working code / etc. Sometimes I make incorrect observations, realize the mistake later, and forget to update the code that led to it.