Build native cubins for Ada (sm_89) and Hopper (sm_90) #62
CMAKE_CUDA_ARCHITECTURES was 75 only (Turing/T4 CI runner), so the wheel ships an sm_75 cubin + compute_75 PTX and nothing else. On a non-Turing GPU the driver must JIT that PTX, and a JIT requires a driver at least as new as the build toolkit. On an L4 (sm_89) with Dataflow's R535 / CUDA 12.2 driver, JITing this package's CUDA 12.x PTX fails at kernel launch with cudaErrorUnsupportedPtxVersion (code 222).

Add sm_89 (Ada: L4, RTX 40-series) and sm_90 (Hopper: H100/H200) so those GPUs load native SASS with no JIT, regardless of whether the driver could JIT the PTX. Native cubins are not subject to the PTX JIT's driver-version requirement, so an R535-class driver runs them fine.

Verified locally: ptxas compiles this package's PTX to both sm_89 and sm_90, and a beamform on sm_89 hardware (RTX 4090) produces bit-identical output to the JIT'd sm_75+PTX path.
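The architecture list maps to concrete nvcc targets. The sketch below (hypothetical helper `gencode_flags`, not CMake's own code) approximates the `--generate-code` flags CMake derives from `CMAKE_CUDA_ARCHITECTURES`, following CMake's documented rule that a bare number embeds both a cubin and PTX, `-real` only the cubin, and `-virtual` only the PTX. The exact flag spelling CMake emits may differ between versions.

```python
def gencode_flags(archs: str) -> list[str]:
    """Approximate nvcc --generate-code flags for a CMAKE_CUDA_ARCHITECTURES value.

    Accepts space- or semicolon-separated entries such as "75 89 90" or
    "75-real;90-virtual". Special values ("all", "native", ...) are not handled.
    """
    flags = []
    for entry in archs.replace(";", " ").split():
        num, _, suffix = entry.partition("-")
        if not num.isdigit() or suffix not in ("", "real", "virtual"):
            raise ValueError(f"unsupported architecture entry: {entry!r}")
        codes = []
        if suffix in ("", "virtual"):
            codes.append(f"compute_{num}")  # embedded PTX, JIT-able on newer GPUs
        if suffix in ("", "real"):
            codes.append(f"sm_{num}")       # native cubin (SASS) for this arch
        flags.append(f"--generate-code=arch=compute_{num},code=[{','.join(codes)}]")
    return flags


if __name__ == "__main__":
    for value in ("75", "75 89 90"):
        print(value, "->", gencode_flags(value))
```

Under this model, `75` alone yields a single `sm_75` cubin plus `compute_75` PTX; `75 89 90` adds `sm_89` and `sm_90` cubins (and the matching PTX).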
Code Review by Qodo
1. Unpinned CMake for sm_90
```cmake
# 90 = Hopper (H100/H200)
# https://developer.nvidia.com/cuda-gpus
set(CMAKE_CUDA_ARCHITECTURES 75)        # before
set(CMAKE_CUDA_ARCHITECTURES 75 89 90)  # after
```
🐞 Bug · ☼ Reliability
The PR adds sm_89/sm_90 to CMAKE_CUDA_ARCHITECTURES, but the wheel build configuration doesn’t install/pin CMake, so the build can fail depending on the builder image’s preinstalled CMake version. This makes wheel/CI builds fragile and environment-dependent after this change.
Agent Prompt
## Issue description
The project now hard-codes `CMAKE_CUDA_ARCHITECTURES` to include `89` and `90`, but the wheel build flow does not explicitly install CMake (nor a minimum version). As a result, the wheel build may succeed or fail depending on the CMake version baked into the manylinux/builder image.
## Issue Context
- CMake is a build-time dependency and the repo already treats it as such in developer workflows (e.g., `make compile`).
- `cibuildwheel` installs `scikit-build-core`, `nanobind`, and `ninja`, but not `cmake`.
## Fix Focus Areas
- Add `cmake` (with a minimum version) to the build environment used for wheel builds (and ideally to PEP517 build requirements too).
### Recommended changes
1. Update `pyproject.toml`:
- Add `cmake>=<min_version>` (and optionally `ninja`) to `[build-system].requires`, **or**
- Add `cmake>=<min_version>` to `[tool.cibuildwheel].before-build` (since you already install other build tools there).
2. Keep versions aligned with the CUDA builder images used in CI.
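Alongside the packaging change, a fail-fast check in the build script would turn a confusing mid-build failure into a clear error. This is a minimal sketch; `parse_cmake_version` and `check_cmake` are hypothetical helpers, not part of this repo, and the minimum is a parameter because the review leaves it as `<min_version>`.

```python
import re
import shutil
import subprocess


def parse_cmake_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized cmake --version output: {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def check_cmake(minimum: tuple[int, int, int]) -> tuple[int, int, int]:
    """Raise if the cmake found on PATH is older than `minimum`; return its version."""
    exe = shutil.which("cmake")
    if exe is None:
        raise RuntimeError("cmake not found on PATH")
    out = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    ).stdout
    found = parse_cmake_version(out)
    if found < minimum:  # tuples compare element-wise: major, then minor, then patch
        raise RuntimeError(f"cmake {found} is older than required {minimum}")
    return found
```

For example, `check_cmake((3, 18, 0))` rejects anything older than CMake 3.18, the release that introduced `CMAKE_CUDA_ARCHITECTURES`; the floor this project actually needs may be higher.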
## Fix Focus Areas (exact locations)
- pyproject.toml[1-3]
- pyproject.toml[230-252]
- CMakeLists.txt[55-68]
- Makefile[42-50]
Code review by qodo was updated up to the latest commit f1bbe33
Problem
`CMAKE_CUDA_ARCHITECTURES` is `75` only (Turing/T4, for the GitHub CI runner), so the published wheel ships an `sm_75` cubin plus `compute_75` PTX, and nothing else.

On a non-Turing GPU there's no matching cubin, so the driver must JIT the PTX at load time. A PTX JIT requires a driver at least as new as the toolkit that emitted the PTX (this package's is CUDA 12.x). That breaks on Google Cloud Dataflow, whose GPU workers run an R535 / CUDA 12.2 driver.
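The load-time decision described above can be modeled in a few lines. This is a simplified sketch, not the driver's actual algorithm: it assumes a cubin runs only on devices with the same major compute capability and an equal or higher minor one, that the driver otherwise JITs the newest compatible PTX, and it reduces the JIT check to the rule stated above (driver CUDA version >= the toolkit that emitted the PTX). `FatBinary`, `load_outcome`, and the CUDA 12.4 toolkit version in the example are hypothetical.

```python
from dataclasses import dataclass

CC = tuple[int, int]  # compute capability (major, minor), e.g. (8, 9) for sm_89


@dataclass(frozen=True)
class FatBinary:
    cubins: tuple[CC, ...]  # native SASS targets, e.g. ((7, 5),) for sm_75
    ptx: tuple[CC, ...]     # embedded PTX targets, e.g. ((7, 5),) for compute_75
    ptx_toolkit: CC         # CUDA version that emitted the PTX (assumed value)


def load_outcome(fb: FatBinary, device: CC, driver_cuda: CC) -> str:
    """Simplified model of which image a driver would use on `device`."""
    # A cubin runs on the same major CC with an equal or higher minor CC.
    native = [cc for cc in fb.cubins if cc[0] == device[0] and cc[1] <= device[1]]
    if native:
        major, minor = max(native)
        return f"native sm_{major}{minor}"
    # Otherwise fall back to the newest PTX whose virtual arch is <= the device.
    usable = [cc for cc in fb.ptx if cc <= device]
    if not usable:
        return "no compatible image"
    major, minor = max(usable)
    # Simplified JIT rule: the driver must be at least as new as the PTX's toolkit.
    if driver_cuda < fb.ptx_toolkit:
        return f"JIT compute_{major}{minor} fails: cudaErrorUnsupportedPtxVersion"
    return f"JIT compute_{major}{minor}"


if __name__ == "__main__":
    old = FatBinary(cubins=((7, 5),), ptx=((7, 5),), ptx_toolkit=(12, 4))
    new = FatBinary(
        cubins=((7, 5), (8, 9), (9, 0)),
        ptx=((7, 5), (8, 9), (9, 0)),
        ptx_toolkit=(12, 4),
    )
    r535 = (12, 2)  # Dataflow's R535 / CUDA 12.2 driver
    print(load_outcome(old, (8, 9), r535))  # L4 before the fix
    print(load_outcome(new, (8, 9), r535))  # L4 after the fix
```

Under these assumptions an `sm_86` part still lands on the `compute_75` PTX even with the new list, since an `sm_89` cubin does not run on a lower minor version.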
`sm_89` → no native cubin → driver tries to JIT `compute_75` PTX

Why it wasn't caught
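The failure needs two things at once: a GPU that isn't `sm_75` (which forces a JIT) and a driver too old to JIT the PTX. Neither dev nor CI hits both:

- `sm_75` (CI): `sm_75` cubin, no JIT
- `sm_120`: no matching cubin, PTX is JIT'd
- `sm_89`: no matching cubin, PTX is JIT'd

Fix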
`set(CMAKE_CUDA_ARCHITECTURES 75 89 90)`

- `75`: Turing (T4 / CI), unchanged
- `89`: Ada (L4, RTX 40-series)
- `90`: Hopper (H100/H200)

Verification
On `sm_89` hardware (RTX 4090):

- `ptxas` compiles this package's PTX cleanly to both `sm_89` and `sm_90`.
- The `sm_89`-native `_cuda_impl.so` produces bit-identical output (max abs diff 0.0) to the original `sm_75`+PTX (JIT'd) path: same math, just no JIT.

Notes
0.1.2) is needed for downstream consumers (mergelabs pins `mach-beamform`) to pick this up.
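Once a release is out, a consumer can confirm that the installed extension actually carries the new cubins before bumping its pin. A minimal sketch, assuming the CUDA toolkit's `cuobjdump` is on `PATH` and that it names each embedded image with an `sm_NN` tag; `list_elf`, `embedded_archs`, and `missing_cubins` are hypothetical helpers, and the sample line is illustrative rather than captured from a real build.

```python
import re
import subprocess

# Matches tags such as "sm_75", "sm_90a" or "compute_89" in cuobjdump listings.
_ARCH_TAG = re.compile(r"(?:sm|compute)_(\d+)")


def embedded_archs(listing: str) -> list[int]:
    """Sorted, de-duplicated architecture numbers named in a cuobjdump listing."""
    return sorted({int(n) for n in _ARCH_TAG.findall(listing)})


def missing_cubins(elf_listing: str, wanted: list[int]) -> list[int]:
    """Entries of `wanted` with no native cubin in a `--list-elf` listing."""
    have = set(embedded_archs(elf_listing))
    return [arch for arch in wanted if arch not in have]


def list_elf(so_path: str) -> str:
    """Run `cuobjdump --list-elf` on a shared object (needs the CUDA toolkit)."""
    result = subprocess.run(
        ["cuobjdump", "--list-elf", so_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Illustrative listing line only; pass list_elf("path/to/extension.so") in practice.
    sample = "ELF file    1: ext.1.sm_75.cubin\n"
    print(embedded_archs(sample))
    print(missing_cubins(sample, [75, 89, 90]))
```

An empty result from `missing_cubins(list_elf(path), [75, 89, 90])` would mean every target has a native cubin.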