I am trying to understand the current state of source-level debugging for OpenCL kernels running on GPUs.
For CPU OpenCL implementations, source-level debugging is generally possible using standard debuggers. However, I am specifically interested in debugging OpenCL kernels executing on actual GPU hardware.
My questions are:
- Is there currently a full-fledged debugger that supports source-level debugging of OpenCL kernels on GPUs?
- Can such a debugger:
  - Set breakpoints inside kernels?
  - Single-step kernel instructions?
  - Inspect kernel variables (private, local, and global memory)?
  - Inspect work-item and work-group state?
  - Switch between individual work-items and inspect each work-item's execution state independently?
  - View call stacks and source line mappings?
- Are there vendor-specific solutions (AMD, Intel, NVIDIA) that provide this functionality?
- How is this typically implemented under the hood, given that GPU execution is based on wavefronts/warps rather than independently scheduled threads?
- If true source-level debugging is not generally available, what are the main technical limitations that make it difficult?
I would also appreciate references to any open-source or commercial tools that support OpenCL kernel debugging, as well as any papers, documentation, or presentations describing the current state of GPU debugging.
My goal is to understand whether OpenCL kernel debugging on GPUs has reached a level comparable to CPU debugging with GDB/LLDB, or whether developers still primarily rely on printf-style debugging, simulators, and profiling tools.
For context, I am particularly interested in AMD ROCm, Intel GPU runtimes, PoCL, and other modern OpenCL implementations.