Add ROCm support for xobjects. #166
Conversation
|
TODO: Test the same setup on different PCs with different ROCm versions, ideally with newer GPUs that support ROCm 7. ROCm 7 allows this pre-built wheel to be used, which significantly reduces installation complexity: https://rocm.blogs.amd.com/artificial-intelligence/cupy-v13/README.html |
|
DONE: Add documentation on procedure to set up ROCm and build CuPy from source in the xsuite docs |
|
BUG: When running pytest, GPU memory is not freed between tests. A patch might be required for this. EDIT: This appears to happen on NVIDIA as well. |
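One possible workaround for the memory issue, sketched here as an assumption and not part of this PR, is to release CuPy's cached memory pools after each test. The helper below uses `get_default_memory_pool()` and `get_default_pinned_memory_pool()` from CuPy; the names `release_cupy_memory` and `make_fake_cupy` are hypothetical, and the fake module exists only so the helper can be exercised without a GPU.

```python
import gc
from types import SimpleNamespace


def release_cupy_memory(cupy_module):
    """Return cached device and pinned-host blocks held by CuPy's default pools.

    Only blocks that are no longer referenced can be released, so Python
    garbage is collected first.
    """
    gc.collect()
    cupy_module.get_default_memory_pool().free_all_blocks()
    cupy_module.get_default_pinned_memory_pool().free_all_blocks()


def make_fake_cupy(calls):
    """Hypothetical stand-in for the cupy module, recording which pools were freed."""
    pool = SimpleNamespace(free_all_blocks=lambda: calls.append("device"))
    pinned = SimpleNamespace(free_all_blocks=lambda: calls.append("pinned"))
    return SimpleNamespace(
        get_default_memory_pool=lambda: pool,
        get_default_pinned_memory_pool=lambda: pinned,
    )


# Possible use in a conftest.py (assumption, untested on ROCm):
#
#     import cupy, pytest
#
#     @pytest.fixture(autouse=True)
#     def free_gpu_memory_after_test():
#         yield
#         release_cupy_memory(cupy)
```

Whether this is enough depends on what still holds references to the buffers after a test; if contexts or arrays are kept alive by module-level objects, the pools cannot release them.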
|
Related: |
szymonlopaciuk
left a comment
This is very good, I don't see why we shouldn't merge it as-is, as it definitely won't disturb the current workflows.
|
Successfully tested this on GSI HPC with an AMD MI100 and ROCm version 6.8.5 using the container prepared by @ekatralis here:
The lattice is a simple FODO lattice (Drift, Multipole, Drift, Multipole), tracking 1e+06 particles over 1000 turns; the experiment was repeated N=5 times to estimate the uncertainty.
Full output |
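The methodology described above (repeat the same tracking run N=5 times and report the spread) can be sketched as a small timing harness. This is a minimal illustration, not the script used for the reported numbers: `time_runs` and `dummy_tracking` are hypothetical names, and the dummy workload stands in for the actual tracking call.

```python
import statistics
import time


def time_runs(run_once, n_repeats=5):
    """Call run_once() n_repeats times and return (mean, sample std, durations) in seconds."""
    durations = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        # On a GPU context, run_once must block until the kernels have
        # finished, otherwise only the launch overhead is measured.
        run_once()
        durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    std = statistics.stdev(durations) if n_repeats > 1 else 0.0
    return mean, std, durations


def dummy_tracking():
    """Hypothetical CPU-only stand-in for tracking particles through the lattice."""
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    mean, std, _ = time_runs(dummy_tracking, n_repeats=5)
    print(f"{mean:.6f} s +/- {std:.6f} s over 5 runs")
```

Using the sample standard deviation (`statistics.stdev`) is one reasonable choice for N=5; the original comment does not state which uncertainty estimate was used.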
|
Very interesting! The speed of CuPy is promising but still painful. Do you have an explanation? Can you do the same exercise, but with NVIDIA? |
A plausible explanation for these results is that we are using an older version of ROCm (6.x) and building CuPy from source. On ROCm 7.x, AMD has its own CuPy fork (which is supposed to be merged in v14) that should offer improved performance: https://rocm.blogs.amd.com/artificial-intelligence/cupy-v13/README.html
I repeated the same test on a Titan V (TR 2970WX CPU) for NVIDIA using the same methodology (average over 5 runs):
Full output
For reference, here is the same test on a Radeon VII (TR 1950X CPU) as well:
Full output |
Description
This pull request adds ROCm support for xobjects when using ContextCupy(). It includes changes to the headers so that they are compatible with the ROCm definitions. This has been tested in the following configuration:
CuPy can be configured as follows:
The xobjects tests are passing. The xtrack tests are passing as well.
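As a quick sanity check before creating a ContextCupy(), one can confirm which backend the installed CuPy build targets. The sketch below relies on the `cupy.cuda.runtime.is_hip` flag that CuPy exposes; `describe_cupy_backend` and `make_fake_cupy` are hypothetical helper names, and the fake module is only there so the check can be run on a machine without CuPy.

```python
from types import SimpleNamespace


def describe_cupy_backend(cupy_module):
    """Return "ROCm/HIP" if the given CuPy build targets HIP, else "CUDA"."""
    runtime = cupy_module.cuda.runtime
    # is_hip is True for ROCm builds; default to False if the flag is absent.
    return "ROCm/HIP" if bool(getattr(runtime, "is_hip", False)) else "CUDA"


def make_fake_cupy(is_hip):
    """Hypothetical stand-in mimicking only the attribute path used above."""
    return SimpleNamespace(cuda=SimpleNamespace(runtime=SimpleNamespace(is_hip=is_hip)))


if __name__ == "__main__":
    try:
        import cupy  # requires a working CuPy installation
    except ImportError:
        cupy = make_fake_cupy(is_hip=False)
        print("CuPy not installed; using a stand-in module")
    print("CuPy backend:", describe_cupy_backend(cupy))
```

If the reported backend is ROCm/HIP, `xo.ContextCupy()` is expected to run on the AMD device with the changes in this pull request.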
Checklist
Mandatory:
Optional: