gdb/testsuite: support optimized code in gdb.rocm testsuite#103
gdb/testsuite: support optimized code in gdb.rocm testsuite#103spatrang wants to merge 8 commits intoamd-stagingfrom
Conversation
Introduce the necessary helper functions into the gdb.rocm framework so that support for optimized code can be implemented. Co-Authored-By: Andrey Kasaurov <andrey.kasaurov@amd.com> Co-Authored-By: Luis Machado <luis.machado@amd.com> Change-Id: I750d64e842413429e281bb8f7d85c4f46319a627 Signed-off-by: Sarang Patrange <spatrang@amd.com>
Co-Authored-By: Luis Machado <luis.machado@amd.com> Change-Id: I750d64e842413429e281bb8f7d85c4f46319a627 Signed-off-by: Sarang Patrange <spatrang@amd.com>
Co-Authored-By: Andrey Kasaurov <andrey.kasaurov@amd.com> Co-Authored-By: Luis Machado <luis.machado@amd.com> Change-Id: I750d64e842413429e281bb8f7d85c4f46319a627 Signed-off-by: Sarang Patrange <spatrang@amd.com>
|
Hi @lancesix This PR is a re-submission of the optimized-code series that was originally
The corresponding fix-up commit is on top of this branch. Please take cc: @lumachad |
|
pre-commit seems to be grumpy. |
* lib/rocm.exp: Add missing periods and fix indentation in
no_optimized_code and no_lto.
* gdb.rocm/device-attach.cpp: Restore the "__global__ void"
qualifier on bit_extract_kernel.
* gdb.rocm/hip-builtin-variables.cpp: Use "volatile int" instead
of the implicit-int "const volatile" declarators, and apply
__attribute__((optnone)) to kern, inner_func1 and inner_func2
so the locals stay observable and the inner functions remain
distinguishable as frames under -O1/-O2/-O3.
* gdb.rocm/hip-builtin-variables.exp: Relax the "next" regex
after stepping into inner_func2 to accept either "return;" or
the closing brace; Clang merges the line entry of the
explicit "return;" with the closing brace under
__attribute__((optnone)).
e05abcd to
12346d2
Compare
* unaligned-memory-access.cpp: noinline on done(), optnone on
kernel() to keep the breakpoint target and the local variable
observable under optimization.
* watch-gpu-global-from-host.cpp: optnone on main() and volatile
int *devGlobal so the triggering store is not tail-optimised.
+82 PASS / -82 FAIL at -O1/-O2/-O3; no -O0 regression.
At -O2/-O3 Clang eliminates bit_extract_kernel's "while (loop_forever)" spin loop (constant condition + side-effect-free body). The kernel returns, the host exits, and "attach <pid>" fails. Make loop_forever volatile so the loop survives optimisation.
|
As stated elsewhere, the series only handles GDB_TESTCASE_OPTIONS. If someone passes CFLAGS directly to the compiler the optimization level checks won't work properly. Do we want to fix that as well? Maybe as a follow up if not in this one? |
|
There's a pre-commit failure. Other than that, I'm still seeing FAIL's for gdb.rocm/watchpoint-basic.exp when running the testsuite with -O1 or -O3. It passes when running with -O0. Could you please check that you're running the testsuite correctly? |
ed63897 to
7c21341
Compare
* static-global, lane-info, deref-scoped-pointer,
scheduler-locking: apply noinline / optnone (and volatile
where the test relies on lvalue-ness) to anchor functions
and kernels so the breakpoint targets, lvalue addresses and
per-statement line entries survive optimization. Drop the
"require no_optimized_code" gate from each .exp.
Validated at -O0, -O2, and -O3 on gfx942: 73 PASS, 0 FAIL.
7c21341 to
a912e0e
Compare
|
Did the latest force-pushes provide anything that would fix gdb.rocm/watchpoint-basic.exp? I can't tell. |
- lib/rocm.exp: check_for_any_flags now also inspects
CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET, so existing
require no_optimized_code gates fire under
test_rocgdb.py.
- watchpoint-basic.exp: skip rwatch/awatch
test_non_write_watchpoint_before_runtime_load sub-cases
as UNSUPPORTED under optimisation; HIP runtime loads
too early for the test's premise to hold.
| if { [check_for_any_flags {-O1 -O2 -O3}] == -1 } { | ||
| # Under -O1/-O2/-O3, the HIP runtime is initialised earlier | ||
| # (during static initialisers/dynamic loading) than at -O0, so | ||
| # by the time main() reaches "Break before runtime load" the | ||
| # amd-dbgapi target is already attached and rejects the | ||
| # host-side read/access watchpoint at insert time. The test's | ||
| # "before runtime load" premise does not hold for optimised | ||
| # builds; skip the sub-test in that case. | ||
| unsupported "$wp_cmd before runtime load (optimised code)" | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
Doesn't "require no_optimized_code" cover this? If so, we don't need to call check_for_any_flags here, right?
| global GDB_TESTCASE_OPTIONS | ||
| global GDB_TESTCASE_OPTIONS CFLAGS_FOR_TARGET CXXFLAGS_FOR_TARGET | ||
|
|
||
| if {[llength $flags] && [info exists GDB_TESTCASE_OPTIONS]} { | ||
| if {![llength $flags]} { | ||
| return 0 | ||
| } | ||
|
|
||
| # Sources of compile flags to check, in order: | ||
| # - GDB_TESTCASE_OPTIONS (the standard GDB testsuite override). | ||
| # - CFLAGS_FOR_TARGET / CXXFLAGS_FOR_TARGET (used by some CI | ||
| # drivers, e.g. TheRock's test_rocgdb.py, to inject | ||
| # optimisation levels without going through | ||
| # GDB_TESTCASE_OPTIONS). DejaGnu populates these as Tcl | ||
| # globals from the runtest command line. | ||
| set sources [list] | ||
| foreach var {GDB_TESTCASE_OPTIONS CFLAGS_FOR_TARGET CXXFLAGS_FOR_TARGET} { | ||
| if {[info exists $var]} { | ||
| set val [set $var] | ||
| if {$val ne ""} { | ||
| lappend sources $val | ||
| } | ||
| } | ||
| } | ||
|
|
||
| foreach src $sources { | ||
| foreach flag $flags { | ||
| if {[lsearch -exact $GDB_TESTCASE_OPTIONS $flag] != -1} { | ||
| if {[lsearch -exact $src $flag] != -1} { |
There was a problem hiding this comment.
I think this change needs to be its own commit. Please remember this is an upstream change, so we have to format things by their rules, and upstream gdb's community favors single-purpose commits.
Also, please don't mention test_rocgdb.py as that doesn't exist upstream. It's AMD-specific.
Summary
This series makes the
gdb.rocmtestsuite pass when the test programs arecompiled with optimization (
-O1/-O2/-O3,-flto) passed viaGDB_TESTCASE_OPTIONS, rather than the implicit-O0the suite assumes today.Three commits, split by concern:
Update
gdb.rocmframework to support optimized code — adds helpers ingdb/testsuite/lib/rocm.exp:check_for_any_flags {flags}— detects whetherGDB_TESTCASE_OPTIONScontains any of the listed flags.
gdb_caching_proc no_optimized_code— predicate for-O1/-O2/-O3.gdb_caching_proc no_lto— predicate for-flto.Update some
gdb.rocmtests to support optimized code (22 files) —two patterns applied surgically:
.cpp/.c: addvolatileon locals the test inspects, and__attribute__((optnone))on helper functions the test breakpoints into.Add a trivial
volatile intstatement to empty "break here" functionsso their body is not elided.
.exp: relax line-number arithmetic, backtrace PC formatting,expected source strings after
step, and add a bounded retry loop forextra ticks under optimization.
Skip unsupported tests in case of optimized code (19 files) — for tests
whose semantics cannot be reconciled with optimized code, add
require no_optimized_codeat the top;mi-aspace.expalso getsrequire no_lto.register-watchpoint.expsoft-skips the tail viacheck_for_any_flagsafter the earlier portion has run.Diffstat: 42 files changed, +146 / -46.
Context
This continues the work started in internal PR
AMD-ROCm-Internal/ROCgdb#6,
rebased onto the current
amd-stagingof the public repo. The commits arecontent-identical to the internal PR, preserving the original authors
(Aleksandar Rikalo, Andrey Kasaurov) with their
Co-Authored-By:trailersand adding my
Signed-off-by.