Build native cubins for Ada (sm_89) and Hopper (sm_90) by cmcneil · Pull Request #62 · Forest-Neurotech/mach

cmcneil · 2026-06-02T18:13:55Z

Problem

CMAKE_CUDA_ARCHITECTURES is 75 only (Turing/T4, for the GitHub CI runner), so the published wheel ships an sm_75 cubin plus compute_75 PTX — and nothing else.

On a non-Turing GPU there's no matching cubin, so the driver must JIT the PTX at load time. A PTX JIT requires a driver at least as new as the toolkit that emitted the PTX (this package's is CUDA 12.x). That breaks on Google Cloud Dataflow, whose GPU workers run an R535 / CUDA 12.2 driver.

L4 = sm_89 → no native cubin → driver tries to JIT compute_75 PTX
the PTX is ISA 8.3 (CUDA 12.3); R535 only parses ≤ 8.2 (CUDA 12.2)

Why it wasn't caught

The failure needs two things at once: a GPU that isn't sm_75 (forces a JIT) and a driver too old to JIT the PTX. Neither dev nor CI hits both:

Environment	GPU	Driver	Path	Result
GitHub CI	T4 `sm_75`	—	native `sm_75` cubin, no JIT	✅
pathfinder	RTX 5090 `sm_120`	R580 / CUDA 13	JITs PTX (new driver can)	✅
Dataflow	L4 `sm_89`	R535 / CUDA 12.2	must JIT, old driver can't	❌ 222

Fix

set(CMAKE_CUDA_ARCHITECTURES 75 89 90)

75 Turing (T4 / CI) — unchanged
89 Ada (L4, A40, RTX 40-series)
90 Hopper (H100/H200)

Verification

On sm_89 hardware (RTX 4090):

ptxas compiles this package's PTX cleanly to both sm_89 and sm_90.
A beamform with a locally-rebuilt sm_89-native _cuda_impl.so produces bit-identical output (max abs diff 0.0) to the original sm_75+PTX (JIT'd) path — same math, just no JIT.

Notes

A release (e.g. 0.1.2) is needed for downstream consumers (mergelabs pins mach-beamform) to pick this up.

CMAKE_CUDA_ARCHITECTURES was 75 only (Turing/T4 CI runner), so the wheel ships an sm_75 cubin + compute_75 PTX and nothing else. On a non-Turing GPU the driver must JIT that PTX, and a JIT requires a driver at least as new as the build toolkit. On an L4 (sm_89) with Dataflow's R535 / CUDA 12.2 driver, JITing this package's CUDA 12.x PTX fails at kernel launch with cudaErrorUnsupportedPtxVersion (code 222). Add sm_89 (Ada: L4, RTX 40-series) and sm_90 (Hopper: H100/H200) so those GPUs load native SASS with no JIT, independent of driver version. Native cubins are gated only by hardware support, so an R535-class driver runs them fine. Verified locally: ptxas compiles this package's PTX to both sm_89 and sm_90, and a beamform on sm_89 hardware (RTX 4090) produces bit-identical output to the JIT'd sm_75+PTX path.

qodo-code-review · 2026-06-02T18:14:02Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0)

1. Unpinned CMake for sm_90 🐞 Bug ☼ Reliability

Description

The PR adds sm_89/sm_90 to CMAKE_CUDA_ARCHITECTURES, but the wheel build configuration doesn’t
install/pin CMake, so the build can fail depending on the builder image’s preinstalled CMake
version. This makes wheel/CI builds fragile and environment-dependent after this change.

Code

CMakeLists.txt[67]

Evidence

The PR change requires CMake to handle additional CUDA architectures, but the repo’s wheel build
configuration does not ensure CMake is present/up-to-date (it’s neither in PEP517 build requirements
nor installed in cibuildwheel’s pre-build step). The repository’s developer build path explicitly
installs CMake, indicating it’s a required build tool that CI should also pin/install.

CMakeLists.txt[55-68]
pyproject.toml[1-3]
pyproject.toml[230-252]
Makefile[42-50]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The project now hard-codes `CMAKE_CUDA_ARCHITECTURES` to include `89` and `90`, but the wheel build flow does not explicitly install CMake (nor a minimum version). As a result, the wheel build may succeed or fail depending on the CMake version baked into the manylinux/builder image.
## Issue Context
- CMake is a build-time dependency and the repo already treats it as such in developer workflows (e.g., `make compile`).
- `cibuildwheel` installs `scikit-build-core`, `nanobind`, and `ninja`, but not `cmake`.
## Fix Focus Areas
- Add `cmake` (with a minimum version) to the build environment used for wheel builds (and ideally to PEP517 build requirements too).
### Recommended changes
1. Update `pyproject.toml`:
- Add `cmake>=<min_version>` (and optionally `ninja`) to `[build-system].requires`, **or**
- Add `cmake>=<min_version>` to `[tool.cibuildwheel].before-build` (since you already install other build tools there).
2. Keep versions aligned with the CUDA builder images used in CI.
## Fix Focus Areas (exact locations)
- pyproject.toml[1-3]
- pyproject.toml[230-252]
- CMakeLists.txt[55-68]
- Makefile[42-50]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Spurious driver-age warning 🐞 Bug ◔ Observability

Description

Even after adding sm_89/sm_90 cubins specifically to avoid PTX JIT on older drivers, the extension
still warns at import-time whenever driver < NVCC version, which can be misleading/noisy when the
native cubin path works. This can also become an actual failure in environments that treat warnings
as errors.

Code

CMakeLists.txt[R55-67]

Evidence

The PR change documents that native cubins are being added specifically to avoid PTX JIT on older
drivers, but the runtime code still warns solely based on driver vs NVCC version and runs the check
on every import regardless of whether a matching cubin exists.

CMakeLists.txt[55-67]
src/mach/kernel.cu[351-388]
src/mach/kernel.cu[1211-1220]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`checkCudaDriverCompatibility()` currently warns whenever `cudaDriverGetVersion()` reports a driver version older than the NVCC version used to compile the module. With this PR, we now intentionally ship native cubins for the deployed architectures (75/89/90) to avoid PTX JIT on older drivers; the unconditional driver-vs-NVCC warning becomes misleading/noisy.
## Issue Context
- The new CMake comment explicitly states native cubins avoid PTX JIT requirements on older drivers.
- The nanobind module calls `checkCudaDriverCompatibility()` on every import.
## Fix Focus Areas
- src/mach/kernel.cu[351-388]
- src/mach/kernel.cu[1211-1220]
- CMakeLists.txt[55-67]
## Suggested implementation direction
- Only warn about driver < NVCC when the current device does *not* have a matching compiled cubin (i.e., when a PTX JIT is likely/required).
- To do that robustly, pass the compiled architecture list into the binary via `target_compile_definitions` (e.g., `BUILT_CUDA_ARCHS="75;89;90"`) and compare it against `cudaGetDeviceProperties()` major/minor at import time.
- Alternatively, soften the message to explicitly state it only matters when PTX JIT is needed (no matching cubin).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. NVCC min gate mismatched 🐞 Bug ⚙ Maintainability

Description

NVCC_MIN_VERSION is still set to 11.0 even though the project’s CI/wheel builds are standardized
on CUDA 12.x and the PR now targets Ada/Hopper explicitly, so the configure-time guard no longer
reflects the project’s real toolchain expectations. This increases the chance that unsupported build
environments get past configuration and fail later in compilation.

Code

CMakeLists.txt[67]

Evidence

The repo’s own CI and wheel build configuration standardize on CUDA 12.x, while the CMake gate still
permits NVCC 11.0; after adding more modern architecture targets, the mismatch becomes more likely
to confuse builders and defer errors to later compilation steps instead of failing at configure time
with a clear requirement.

CMakeLists.txt[55-80]
.github/actions/setup-cuda-python-env/action.yml[13-21]
pyproject.toml[248-252]
README.md[36-51]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The CMake configure-time NVCC version gate (`NVCC_MIN_VERSION`) is set to `11.0`, but the repo’s CI and wheel builds clearly target CUDA 12.x toolchains. With the PR adding explicit Ada/Hopper targets, it’s better to fail fast with a minimum NVCC version that matches the supported build toolchains.
## Issue Context
- CI installs CUDA 12.6.3 for GPU tests.
- Wheel builds use a manylinux image tagged with CUDA 12.3.
- CMake currently allows NVCC 11.0.
## Fix Focus Areas
- Update `NVCC_MIN_VERSION` to match the minimum CUDA toolkit used/supported by the project’s build pipelines (e.g., 12.3 if that’s the wheel image baseline).
- Keep README/build docs consistent with the enforced minimum.
## Fix Focus Areas (exact locations)
- CMakeLists.txt[55-80]
- .github/actions/setup-cuda-python-env/action.yml[13-21]
- pyproject.toml[248-252]
- README.md[36-51]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-06-02T18:18:20Z

+#   90  = Hopper  (H100/H200)
 # https://developer.nvidia.com/cuda-gpus
-set(CMAKE_CUDA_ARCHITECTURES 75)
+set(CMAKE_CUDA_ARCHITECTURES 75 89 90)


1. Unpinned cmake for sm_90 🐞 Bug ☼ Reliability

The PR adds sm_89/sm_90 to CMAKE_CUDA_ARCHITECTURES, but the wheel build configuration doesn’t install/pin CMake, so the build can fail depending on the builder image’s preinstalled CMake version. This makes wheel/CI builds fragile and environment-dependent after this change.

Agent Prompt

## Issue description The project now hard-codes `CMAKE_CUDA_ARCHITECTURES` to include `89` and `90`, but the wheel build flow does not explicitly install CMake (nor a minimum version). As a result, the wheel build may succeed or fail depending on the CMake version baked into the manylinux/builder image. ## Issue Context - CMake is a build-time dependency and the repo already treats it as such in developer workflows (e.g., `make compile`). - `cibuildwheel` installs `scikit-build-core`, `nanobind`, and `ninja`, but not `cmake`. ## Fix Focus Areas - Add `cmake` (with a minimum version) to the build environment used for wheel builds (and ideally to PEP517 build requirements too). ### Recommended changes 1. Update `pyproject.toml`: - Add `cmake>=<min_version>` (and optionally `ninja`) to `[build-system].requires`, **or** - Add `cmake>=<min_version>` to `[tool.cibuildwheel].before-build` (since you already install other build tools there). 2. Keep versions aligned with the CUDA builder images used in CI. ## Fix Focus Areas (exact locations) - pyproject.toml[1-3] - pyproject.toml[230-252] - CMakeLists.txt[55-68] - Makefile[42-50]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-06-02T18:18:22Z

Code review by qodo was updated up to the latest commit f1bbe33

charlesbmi

LGTM, thanks!

qodo-code-review Bot reviewed Jun 2, 2026

View reviewed changes

Shrink excessive comment.

ad6cf68

cmcneil requested review from charlesbmi and dperdios June 2, 2026 18:36

charlesbmi approved these changes Jun 2, 2026

View reviewed changes

cmcneil removed the request for review from dperdios June 2, 2026 19:05

cmcneil merged commit e21b5fe into main Jun 2, 2026
8 checks passed

cmcneil deleted the build-native-sm89-sm90 branch June 2, 2026 19:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Build native cubins for Ada (sm_89) and Hopper (sm_90)#62

Build native cubins for Ada (sm_89) and Hopper (sm_90)#62
cmcneil merged 2 commits into
mainfrom
build-native-sm89-sm90

cmcneil commented Jun 2, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot Jun 2, 2026

Uh oh!

cmcneil Jun 2, 2026

Uh oh!

qodo-code-review Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

charlesbmi left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cmcneil commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Why it wasn't caught

Fix

Verification

Notes

Uh oh!

qodo-code-review Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

cmcneil Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charlesbmi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cmcneil commented Jun 2, 2026 •

edited

Loading

qodo-code-review Bot commented Jun 2, 2026 •

edited

Loading

qodo-code-review Bot commented Jun 2, 2026 •

edited

Loading