Add dejagnu tests for cooperative group GWS debugging#116
Add dejagnu tests for cooperative group GWS debugging#116spatrang wants to merge 1 commit intoamd-stagingfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds new ROCm dejagnu coverage to exercise ROCgdb debugging of cooperative-group HIP kernels that synchronize via GWS, covering both single-device this_grid().sync() and multi-device this_multi_grid().sync() scenarios.
Changes:
- Introduces a single-device cooperative-kernel test that breaks before/after
grid.sync()and validates waves/dispatch visibility. - Introduces a multi-device cooperative-kernel non-stop test that breaks inside a multi-grid kernel and runs through grid + multi-grid barriers to completion.
- Adds two HIP C++ test programs that implement the cooperative-group synchronization patterns and validate results on the host side.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.exp | DejaGnu test for single-device cooperative kernel debugging around this_grid().sync(). |
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.cpp | HIP program implementing single-device cooperative grid.sync() and host-side validation. |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp | DejaGnu non-stop test for multi-device cooperative kernel debugging through this_grid().sync() + this_multi_grid().sync(). |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.cpp | HIP program implementing multi-device cooperative launch with cross-device aggregation and validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
af78fac to
33f1926
Compare
lancesix
left a comment
There was a problem hiding this comment.
Hi,
Thanks a lot for this, this is a great starting point.
My main concern for now is gfx110x. We do not support debugging cooperative group on those (documented limitation), the testcase should look for them to not FAIL. This is known that the test will not pass even if the arch do support GWS.
I have added a couple of small comments, I'll get back to a more detailed review after the gfx11 concern has been addressed.
|
|
||
| constexpr unsigned int group_size = 64; | ||
| constexpr unsigned int num_groups = 2; | ||
| constexpr int total_threads = group_size * num_groups; |
There was a problem hiding this comment.
Why are group_size and num_groups unsigned, but not total_threads?
| } | ||
| -re "\\\[Inferior 1 \[^\r\n\]* exited normally\\\]\[^\r\n\]*\r\n$::gdb_prompt " { | ||
| # Program skipped without our diagnostic line, but still | ||
| # exited cleanly. Treat as unsupported. |
There was a problem hiding this comment.
Which case is this covering? Shouldn't there be an XFAIL if the environment is broken to a point where we do not see the message?
|
|
||
| # The kernel must hit the pre-sync breakpoint. | ||
| if {![continue_to_kernel_or_skip "stop before grid.sync"]} { | ||
| return |
There was a problem hiding this comment.
We should report UNSUPPORTED here.
| # architectures). | ||
| set waves [info_thread_get_wave_list] | ||
| gdb_assert {[llength $waves] >= 2} \ | ||
| "at least two AMDGPU waves are present pre-sync" |
There was a problem hiding this comment.
Can there be a race where, when running on wave32, we just have the 2 waves of a single workgroup, but not the waves of the other?
| [gdb_get_line_number "before-sync line"] allow-pending | ||
|
|
||
| if {![continue_to_kernel_or_skip "stop in cooperative kernel"]} { | ||
| return |
| hipLaunchCooperativeKernelMultiDevice. The kernel uses both | ||
| cooperative_groups::this_grid ().sync () (intra-device GWS) and | ||
| cooperative_groups::this_multi_grid ().sync () (cross-device sync), | ||
| matching the historical CQE GWS test exercised on multiple ASICs. |
There was a problem hiding this comment.
I would probably say "historical out of tree" or something, CQE does not really mean much outside of AMD.
| constexpr int MAX_GPUS = 8; | ||
| constexpr int N_PER_DEVICE = 256; | ||
| constexpr unsigned int group_size = 64; | ||
| constexpr unsigned int num_groups = 2; |
There was a problem hiding this comment.
Why mix signed and unsigned?
| /* Use small but non-trivial launch dimensions so the test runs quickly | ||
| while still creating multiple waves participating in the GWS | ||
| barrier. GROUP_SIZE = 64 means 1 wave on wave64 architectures and | ||
| 2 waves on wave32 architectures, for a total of 2 (or 4) waves. |
There was a problem hiding this comment.
can,t we check at runtime if we have wave32 or wave64? Could also build the testsuite with -mwavefrontsize64 to force wave64.
| if {[build_executable "failed to prepare" $testfile $srcfile {debug hip}]} { | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
We do not support debugging cooperative groups on gfx110x (see documented limitations). Testing this on gfx1100, I see:
FAIL: gdb.rocm/coop-group-grid-sync.exp: test_break_around_grid_sync: stop before grid.sync (the program exited)
FAIL: gdb.rocm/coop-group-grid-sync.exp: test_threads_in_coop_kernel: stop in cooperative kernel (the program exited)
The testcase should probably detect for those devices and issue a XFAIL.
Summary
Add dejagnu coverage for debugging AMD GPU cooperative-group kernels —
i.e. kernels launched via
hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevicethat synchronize at the grid /multi-grid level. On AMD GPUs these synchronization primitives are
implemented in hardware via Global Wave Sync (GWS), and they have a
distinct wave/scheduling model that has historically only been covered by
out-of-tree tests. This PR brings that coverage into the dejagnu testsuite
so it runs as part of the regular ROCgdb regression suite.
Tests added
gdb.rocm/coop-group-grid-sync.{cpp,exp}cooperative_groups::this_grid().sync()(intra-device GWS), launched viahipLaunchCooperativeKernel.gdb.rocm/coop-group-multi-grid-sync.{cpp,exp}this_grid().sync()andcooperative_groups::this_multi_grid().sync()(intra + cross-device GWS), launched viahipLaunchCooperativeKernelMultiDevice.What gets verified
coop-group-grid-sync.exp— two sub-tests:test_break_around_grid_syncgrid.sync()inside a cooperative dispatch.AMDGPU Wavethreads are stopped (waves participatingin the GWS barrier).
info dispatcheslists the cooperative dispatch.grid.sync()and continue: it mustfire (proves GWS-protected code can be debugged across the barrier).
test_threads_in_coop_kernelAMDGPU Waveparked inside the kernel, switch to it andconfirm
bt 1reports a frame insidecoop_grid_sync_kernel.coop-group-multi-grid-sync.exp— runs in non-stop mode:continue -a &, confirm a kernel-side breakpoint fires insidecoop_multi_grid_sync_kernel. Per-GPU child breakpoint instances(
Breakpoint X.Y) are observed for every participating GPU.this_grid().sync()andthis_multi_grid().sync()release correctlyunder the debugger.
The host-side post-conditions in the
.cppprograms additionally validatethe cooperative semantics numerically (cross-workgroup data dependency for
the single-device case, cross-device sum aggregation for the multi-device
case), so any regression in GWS behavior under the debugger turns into a
test failure rather than a silent miscompare.
Skip / unsupported handling
The tests degrade cleanly on systems that cannot run them:
cooperativeLaunch; if unsupported theprogram prints a recognizable message and exits, and the
.expmarksthe test
UNSUPPORTED.>= 2GPUs andcooperativeMultiDeviceLaunchon every device. It is also gated bythe existing
hip_devices_support_debug_multi_processrequirement.Any of those missing →
UNSUPPORTED.No new dejagnu helpers are required; both
.expfiles use existinginfrastructure in
lib/rocm.exp.Out of scope / follow-ups
Intentionally left out of this PR; happy to extend if reviewers ask:
next/step/stepi) acrossgrid.sync()/mgrid.sync()boundaries.lane apply/ per-lane register inspection while waves are parked atthe GWS barrier.