Skip to content

HIP Library, main branch (2025.11.13.)#1193

Open
krasznaa wants to merge 3 commits intoacts-project:mainfrom
krasznaa:HIPBackend-main-20251113
Open

HIP Library, main branch (2025.11.13.)#1193
krasznaa wants to merge 3 commits intoacts-project:mainfrom
krasznaa:HIPBackend-main-20251113

Conversation

@krasznaa
Copy link
Copy Markdown
Member

This all started out with me trying to see if CoPilot would be able to do all of this automatically / by itself. Unfortunately that all worked very poorly, so I rather did the whole migration by hand in the end. 😦

But still, I wanted to start making a traccc::hip library finally.

To start things off, I converted traccc::cuda::clusterization_algorithm into a traccc::hip::clusterization_algorithm. Code-wise it was pretty much a copy-paste, with a cuda => hip replacement applied. The only tricky part was to convince the build system to behave more or less reliably.

Which for me specifically it by now does. But I'll need some additional eyes as well. @StewMH, please have a look. Specifically at what ( did to the setup of rocThrust. As it was giving me a world of hurt. 😦

For now I didn't do anything specific to the CI setup. But (for me a bit) surprisingly TRACCC_BUILD_HIP is already exercised by the "ALPAKA HIP SYCL" build. 😕 @StewMH, should we take things a bit more apart? The Alpaka build should really only test builds through Alpaka. I'd rather test the build of "native HIP code" in a separate CI job. Or rather, ideally we'd add 2 new CI jobs. One building the HIP code with its AMD backend, and another one using the NVIDIA backend. As one can very easily break either one of these without breaking the other. 😦

@krasznaa krasznaa added the hip Changes related to AMD HIP label Nov 13, 2025
@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 08d35fa to dcedeab Compare November 13, 2025 16:00
@StewMH
Copy link
Copy Markdown
Contributor

StewMH commented Nov 17, 2025

Hi Attila,

Great to see this. TRACCC_BUILD_HIP is enabled for Alpaka-HIP so that the rocThrust tests are also built. Didn't seem worth having its own build just for this, but now that we have all this code it definitely makes sense to test it properly.

@stephenswat
Copy link
Copy Markdown
Member

We need to have a serious conversation about the number of backends that we have now. If this PR goes in, we have four different methods of targetting AMD GPUs (HIP, Kokkos, Alpaka, SYCL), or which three (HIP, Kokkos, Alpaka) deliver exactly the same code to the ROCm compiler and can therefore, almost by definition, not lead to any meaningful performance benefits.

As with all the backends for traccc, this puts the maintenance burden on the people actually developing the algorithms. We've gone from having to duplicate all our changes twice to three times to four times and now to five times. It's not a sustainable development model.

@StewMH
Copy link
Copy Markdown
Contributor

StewMH commented Nov 17, 2025

Alpaka and hip tests work fine for me, without getting into bigger issues.

@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from dcedeab to 9abb85d Compare November 17, 2025 15:16
@krasznaa krasznaa marked this pull request as ready for review November 17, 2025 15:16
@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 9abb85d to e410533 Compare November 20, 2025 08:43
@sonarqubecloud
Copy link
Copy Markdown

@krasznaa
Copy link
Copy Markdown
Member Author

krasznaa commented Jan 16, 2026

Unfortunately I was having a very difficult time building traccc::hip::kernels::ccl_kernel using ROCm 6.2 on Ubuntu 24.04. (It's a weird incompatibility between ROCm 6.2's version of clang and the GCC 14 version that it finds on my private machine.) But with ROCm 7.1 the build succeeds happily. So as long as the CI is happy as well, I don't want to complicate things further.

So... 🤞

@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 931e7bf to 18f06c2 Compare January 17, 2026 12:01
Added mostly infrastructure code to it at first, along with a
clusterization algorithm. All absolutely based on the CUDA code,
with very minimal changes.
@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 18f06c2 to 30e9eee Compare January 21, 2026 09:39
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hip Changes related to AMD HIP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants