Skip to content

Add dejagnu tests for cooperative group GWS debugging#116

Open
spatrang wants to merge 1 commit intoamd-stagingfrom
users/spatrang/coop-group-gws-tests
Open

Add dejagnu tests for cooperative group GWS debugging#116
spatrang wants to merge 1 commit intoamd-stagingfrom
users/spatrang/coop-group-gws-tests

Conversation

@spatrang
Copy link
Copy Markdown

@spatrang spatrang commented May 7, 2026

Summary

Add dejagnu coverage for debugging AMD GPU cooperative-group kernels —
i.e. kernels launched via hipLaunchCooperativeKernel /
hipLaunchCooperativeKernelMultiDevice that synchronize at the grid /
multi-grid level. On AMD GPUs these synchronization primitives are
implemented in hardware via Global Wave Sync (GWS), and they have a
distinct wave/scheduling model that has historically only been covered by
out-of-tree tests. This PR brings that coverage into the dejagnu testsuite
so it runs as part of the regular ROCgdb regression suite.

Tests added

File Scenario
gdb.rocm/coop-group-grid-sync.{cpp,exp} Single-device cooperative kernel using cooperative_groups::this_grid().sync() (intra-device GWS), launched via hipLaunchCooperativeKernel.
gdb.rocm/coop-group-multi-grid-sync.{cpp,exp} Multi-device cooperative kernel using both this_grid().sync() and cooperative_groups::this_multi_grid().sync() (intra + cross-device GWS), launched via hipLaunchCooperativeKernelMultiDevice.

What gets verified

coop-group-grid-sync.exp — two sub-tests:

  • test_break_around_grid_sync
    • Hit a breakpoint before grid.sync() inside a cooperative dispatch.
    • Confirm multiple AMDGPU Wave threads are stopped (waves participating
      in the GWS barrier).
    • Confirm info dispatches lists the cooperative dispatch.
    • Move the breakpoint to after grid.sync() and continue: it must
      fire (proves GWS-protected code can be debugged across the barrier).
    • Continue to clean program exit.
  • test_threads_in_coop_kernel
    • For every AMDGPU Wave parked inside the kernel, switch to it and
      confirm bt 1 reports a frame inside coop_grid_sync_kernel.

coop-group-multi-grid-sync.exp — runs in non-stop mode:

  • After continue -a &, confirm a kernel-side breakpoint fires inside
    coop_multi_grid_sync_kernel. Per-GPU child breakpoint instances
    (Breakpoint X.Y) are observed for every participating GPU.
  • Continue all threads to program exit, which only succeeds if both
    this_grid().sync() and this_multi_grid().sync() release correctly
    under the debugger.

The host-side post-conditions in the .cpp programs additionally validate
the cooperative semantics numerically (cross-workgroup data dependency for
the single-device case, cross-device sum aggregation for the multi-device
case), so any regression in GWS behavior under the debugger turns into a
test failure rather than a silent miscompare.

Skip / unsupported handling

The tests degrade cleanly on systems that cannot run them:

  • Single-device test: queries cooperativeLaunch; if unsupported the
    program prints a recognizable message and exits, and the .exp marks
    the test UNSUPPORTED.
  • Multi-device test: requires >= 2 GPUs and
    cooperativeMultiDeviceLaunch on every device. It is also gated by
    the existing hip_devices_support_debug_multi_process requirement.
    Any of those missing → UNSUPPORTED.

No new dejagnu helpers are required; both .exp files use existing
infrastructure in lib/rocm.exp.

Out of scope / follow-ups

Intentionally left out of this PR; happy to extend if reviewers ask:

  • Stepping (next / step / stepi) across grid.sync() /
    mgrid.sync() boundaries.
  • Conditional breakpoints inside cooperative kernels.
  • lane apply / per-lane register inspection while waves are parked at
    the GWS barrier.
  • Watchpoints on cooperative shared buffers.

@spatrang spatrang requested review from Copilot and lumachad May 7, 2026 13:08
@spatrang spatrang marked this pull request as ready for review May 7, 2026 13:14
@spatrang spatrang requested a review from a team as a code owner May 7, 2026 13:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new ROCm dejagnu coverage to exercise ROCgdb debugging of cooperative-group HIP kernels that synchronize via GWS, covering both single-device this_grid().sync() and multi-device this_multi_grid().sync() scenarios.

Changes:

  • Introduces a single-device cooperative-kernel test that breaks before/after grid.sync() and validates waves/dispatch visibility.
  • Introduces a multi-device cooperative-kernel non-stop test that breaks inside a multi-grid kernel and runs through grid + multi-grid barriers to completion.
  • Adds two HIP C++ test programs that implement the cooperative-group synchronization patterns and validate results on the host side.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
gdb/testsuite/gdb.rocm/coop-group-grid-sync.exp DejaGnu test for single-device cooperative kernel debugging around this_grid().sync().
gdb/testsuite/gdb.rocm/coop-group-grid-sync.cpp HIP program implementing single-device cooperative grid.sync() and host-side validation.
gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp DejaGnu non-stop test for multi-device cooperative kernel debugging through this_grid().sync() + this_multi_grid().sync().
gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.cpp HIP program implementing multi-device cooperative launch with cross-device aggregation and validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp Outdated
Comment thread gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp Outdated
@spatrang spatrang force-pushed the users/spatrang/coop-group-gws-tests branch from af78fac to 33f1926 Compare May 7, 2026 13:35
Copy link
Copy Markdown
Collaborator

@lancesix lancesix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks a lot for this, this is a great starting point.

My main concern for now is gfx110x. We do not support debugging cooperative group on those (documented limitation), the testcase should look for them to not FAIL. This is known that the test will not pass even if the arch do support GWS.

I have added a couple of small comments, I'll get back to a more detailed review after the gfx11 concern has been addressed.


constexpr unsigned int group_size = 64;
constexpr unsigned int num_groups = 2;
constexpr int total_threads = group_size * num_groups;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are group_size and num_groups unsigned, but not total_threads?

}
-re "\\\[Inferior 1 \[^\r\n\]* exited normally\\\]\[^\r\n\]*\r\n$::gdb_prompt " {
# Program skipped without our diagnostic line, but still
# exited cleanly. Treat as unsupported.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which case is this covering? Shouldn't there be an XFAIL if the environment is broken to a point where we do not see the message?


# The kernel must hit the pre-sync breakpoint.
if {![continue_to_kernel_or_skip "stop before grid.sync"]} {
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should report UNSUPPORTED here.

# architectures).
set waves [info_thread_get_wave_list]
gdb_assert {[llength $waves] >= 2} \
"at least two AMDGPU waves are present pre-sync"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a race where, when running on wave32, we just have the 2 waves of a single workgroup, but not the waves of the other?

[gdb_get_line_number "before-sync line"] allow-pending

if {![continue_to_kernel_or_skip "stop in cooperative kernel"]} {
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Report UNSUPPORTED here.

hipLaunchCooperativeKernelMultiDevice. The kernel uses both
cooperative_groups::this_grid ().sync () (intra-device GWS) and
cooperative_groups::this_multi_grid ().sync () (cross-device sync),
matching the historical CQE GWS test exercised on multiple ASICs.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably say "historical out of tree" or something, CQE does not really mean much outside of AMD.

constexpr int MAX_GPUS = 8;
constexpr int N_PER_DEVICE = 256;
constexpr unsigned int group_size = 64;
constexpr unsigned int num_groups = 2;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mix signed and unsigned?

/* Use small but non-trivial launch dimensions so the test runs quickly
while still creating multiple waves participating in the GWS
barrier. GROUP_SIZE = 64 means 1 wave on wave64 architectures and
2 waves on wave32 architectures, for a total of 2 (or 4) waves.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can,t we check at runtime if we have wave32 or wave64? Could also build the testsuite with -mwavefrontsize64 to force wave64.

if {[build_executable "failed to prepare" $testfile $srcfile {debug hip}]} {
return
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not support debugging cooperative groups on gfx110x (see documented limitations). Testing this on gfx1100, I see:

FAIL: gdb.rocm/coop-group-grid-sync.exp: test_break_around_grid_sync: stop before grid.sync (the program exited)
FAIL: gdb.rocm/coop-group-grid-sync.exp: test_threads_in_coop_kernel: stop in cooperative kernel (the program exited)

The testcase should probably detect for those devices and issue a XFAIL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants