Cuda Python #28
Ptt param change
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This is a pretty significant change to the code base. It's a totally different API and hopefully more Python-user oriented. I could have done most of this with the old setup, but I didn't like depending on pybind and a C compiler, and I also just wanted to try out CUDA Python. I hope these changes will get more people using this! I still find this software super useful, but it has not caught on yet in the wider community. We are planning to set the current master to version 1.0, then merge this and call it 2.0. Then we plan to put this on PyPI. Does this sound good to you two?
Pull request overview
This PR represents a major architectural refactoring that removes the C++/pybind11 build pipeline in favor of CUDA Python with JIT compilation. The changes include:
- Migration from C++ bindings to pure Python with CUDA Python for GPU code compilation
- Streamlined Python API with new `GPUTracker` context manager and direction getter classes
- Improved PTT algorithm implementation with better parallelization
- Separation of bootstrapping code from probabilistic/PTT tracking code
- Simplified build system using setuptools instead of scikit-build-core
Key Changes:
- Removes ~2400 lines of C++/CUDA code and replaces them with organized Python + CUDA modules
- Introduces runtime compilation via CUDA Python instead of pre-compilation
- Adds new helper classes (BootDirectionGetter, ProbDirectionGetter, PttDirectionGetter)
- Updates API to be more user-friendly with better batching and TRX support
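
To make the batching point concrete, here is a minimal sketch of splitting a seed list into fixed-size batches before handing each one to the GPU. It is not the code in this PR: `iter_seed_batches`, `Seed`, and the batch size are hypothetical names and values chosen for illustration.

```python
from typing import Iterator, Sequence, Tuple

# Hypothetical seed type: one (x, y, z) coordinate per seed.
Seed = Tuple[float, float, float]


def iter_seed_batches(seeds: Sequence[Seed], batch_size: int) -> Iterator[Sequence[Seed]]:
    """Yield consecutive slices of `seeds`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(seeds), batch_size):
        yield seeds[start:start + batch_size]


# Ten illustrative seeds split into batches of four: the last batch is shorter.
seeds = [(float(i), 0.0, 0.0) for i in range(10)]
batch_lengths = [len(batch) for batch in iter_seed_batches(seeds, batch_size=4)]
print(batch_lengths)  # [4, 4, 2]
```

Batching this way keeps peak GPU memory bounded by the batch size rather than by the total number of seeds, which is presumably why the API handles it for the user.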
Reviewed changes
Copilot reviewed 28 out of 31 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| setup.py | New build script that auto-generates Python constants from globals.h |
| pyproject.toml | Updated to use setuptools with cuda-python dependencies |
| cuslines/__init__.py | New package entry point exposing GPUTracker and direction getters |
| cuslines/cuda_python/*.py | New Python modules implementing GPU tracking with CUDA Python |
| cuslines/cuda_c/*.cu | Refactored CUDA kernels separated by algorithm type |
| run_gpu_streamlines.py | Simplified example using new API |
| globals.h | Updated constants including REAL_SIZE change to 4 (float32) |
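
As a rough illustration of what "auto-generates Python constants from globals.h" could look like, the sketch below parses simple `#define NAME value` lines and renders them as Python assignments. It is an assumption about the approach, not the `setup.py` in this PR: `parse_defines`, `render_module`, and every macro in the sample header other than `REAL_SIZE` are hypothetical.

```python
import re

# Matches object-like macros such as "#define REAL_SIZE 4".
# Function-like macros ("#define F(a) ...") do not match because the name
# must be followed by whitespace.
_DEFINE_RE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)\s+(.+?)\s*$")


def parse_defines(header_text: str) -> dict:
    """Collect numeric #define constants from C header text."""
    constants = {}
    for line in header_text.splitlines():
        line = line.split("//", 1)[0]  # drop trailing // comments
        match = _DEFINE_RE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            constants[name] = int(raw, 0)  # base 0 accepts decimal and 0x... hex
        except ValueError:
            try:
                constants[name] = float(raw)
            except ValueError:
                continue  # skip anything that is not a plain number
    return constants


def render_module(constants: dict) -> str:
    """Render the parsed constants as Python source, one assignment per line."""
    return "".join(f"{name} = {value!r}\n" for name, value in constants.items())


# Illustrative header text; only REAL_SIZE = 4 is taken from this PR.
sample_header = """
#define REAL_SIZE 4   // float32
#define EXAMPLE_THREADS 0x40
#define EXAMPLE_STEP 0.5
#define EXAMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))
"""

constants = parse_defines(sample_header)
print(render_module(constants), end="")
```

Generating the Python module from the header keeps the two sides from drifting apart, at the cost of only handling plain numeric macros in this simplified form.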
This is a large PR that does a lot of things:
(1) Removes the C++ and pybind layer in favor of CUDA Python. This means a C compiler is no longer needed, and the package can now be pip installed and should be put on PyPI. For now, CUDA Python also uses NVIDIA's runtime compiler (NVRTC). This seems to work: compiling takes only a second or two, and the compiled code is not meaningfully slower. However, we can easily switch back to nvcc if we want to, because CUDA Python can also load code that is already compiled. As part of removing the C++ code and dependencies, I have also removed the dump streamlines call, and I am working on doing that in Python instead. For now, anyone with a large tractography should really be using TRX anyway, which is implemented.
(2) Streamlines the Python API, so that users can more easily pass in their data and let GPUstreamlines handle things like TRX generation, batching, and some of the model parameters.
(3) Updates the PTT algorithm, which is now working great! After this PR, I can start optimizing it and adding features, which will hopefully attract more users.
(4) Refactors the CUDA C code (without really changing it) to separate the original bootstrapping code from the PTT and probabilistic code. These approaches require different inputs and are different enough that I think it makes sense to keep them separate in the code base. From the user's perspective, this change does not matter.
(5) Sets floats to single precision by default. This can be changed back to double in the header files at any time. For now, this increases performance without meaningfully changing streamline outputs.
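
For a sense of scale on point (5), the sketch below rounds a value through 4-byte and 8-byte floats using only the standard library. Only `REAL_SIZE` and its value of 4 come from this PR; the `REAL_FORMATS` mapping and `round_to_real` are illustrative names, and this shows per-value rounding only, not the error accumulated over a full tracking run.

```python
import struct

# Hypothetical mapping from a REAL_SIZE value (in bytes) to a struct format code:
# "f" is a 4-byte IEEE 754 float, "d" is an 8-byte double.
REAL_FORMATS = {4: "f", 8: "d"}


def round_to_real(value: float, real_size: int) -> float:
    """Round a Python float to the precision implied by `real_size` bytes."""
    fmt = REAL_FORMATS[real_size]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
err32 = abs(round_to_real(x, 4) - x)  # rounding error introduced by float32
err64 = abs(round_to_real(x, 8) - x)  # Python floats are already doubles

print(f"float32 error: {err32:.3e}")
print(f"float64 error: {err64:.3e}")
```

The float32 error for a value near 0.1 is on the order of 1e-9, far below typical voxel sizes, which is consistent with the observation that outputs do not change meaningfully.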