New test gdb.rocm/write-all-registers.exp#91
Conversation
f6a5379 to
1a456cc
Compare
lancesix
left a comment
There was a problem hiding this comment.
Some comments, let me know what you think.
| set reg_size [register_size $reg] | ||
|
|
||
| # Set register to all ones. Read-only bits will stay | ||
| # at zero. |
There was a problem hiding this comment.
Just a nit.
We have registers with read-only bits which do not have to be 0s. We can have 1s as well which are read-only.
| set reg_size [register_size $reg] | ||
|
|
||
| # Set register to all ones. Read-only bits will stay | ||
| # at zero. |
There was a problem hiding this comment.
Just noting, the read-only bits in "special" registers do not have to be 0s. Those can be ones as well.
For example, gfx12 has "STATUS.W64" to tell if a wave is in wave32 or wave64 mode. When using wave64, this bit is 1 and readonly.
That being said, I don't think we currently expose any register where this could be an issue, but we could in the future add some completely read-only (no bit can be written) registers, in which case we'll need to adjust this testcase accordingly.
edit: scratch_base / flat_scratch might be good candidates of registers to become "fully read-only".
There was a problem hiding this comment.
Just noting, the read-only bits in "special" registers do not have to be 0s. Those can be ones as well.
For example, gfx12 has "STATUS.W64" to tell if a wave is in wave32 or wave64 mode. When using wave64, this bit is 1 and readonly.
Did we just find a bug then? With this:
--- c/gdb/testsuite/gdb.rocm/write-all-registers.cpp
+++ w/gdb/testsuite/gdb.rocm/write-all-registers.cpp
@@ -26,7 +26,7 @@ kern ()
int
main ()
{
- kern<<<1, 1>>> ();
+ kern<<<1, 64>>> ();
CHECK (hipDeviceSynchronize ());
return 0;
}
diff --git c/gdb/testsuite/gdb.rocm/write-all-registers.exp w/gdb/testsuite/gdb.rocm/write-all-registers.exp
index 9e2fe7bb00c..aa4167615b9 100644
--- c/gdb/testsuite/gdb.rocm/write-all-registers.exp
+++ w/gdb/testsuite/gdb.rocm/write-all-registers.exp
@@ -22,7 +22,10 @@ require allow_hipcc_tests
standard_testfile .cpp
-if {[build_executable "failed to prepare" $testfile $srcfile {debug hip}]} {
+set opts {debug hip}
+lappend opts additional_flags=-mwavefrontsize64
+
+if {[build_executable "failed to prepare" $testfile $srcfile $opts]} {
return
}
I see 64 lanes in one wave, and the WAVE64 bit is set, but I can clear it from gdb and it sticks:
(gdb) p $status
$1 = [ TRAP_EN VCCZ VALID WAVE64 ]
(gdb) p $status = 0
$2 = [ ]
(gdb) p $status
$3 = [ ]
So in wave 64 mode the testcase as is still passes (except for the buggy state_priv).
There was a problem hiding this comment.
I think so, yes.
The definite test should be to do:
(gdb) p $status = 0
...
(gdb) si
....
(gdb) p $status
But re-looking at the doc (sect 3.4.1), the $status register is fully read-only. We should not be able to update it directly.
There was a problem hiding this comment.
If I issue a "si" after clearing the bit, the step succeeds normally, I still have 64 lanes, and the wave64 is back set to 1 again.
There was a problem hiding this comment.
Did anything change in the $status value? I expect it should be the same value (unless the single step changed something which impact status, like changing $vcc could change $status.vccz).
There was a problem hiding this comment.
AMD_DBGAPI_REGISTER_PROPERTY_READONLY_BITS is set for that register.
However, in dbgapi, each architecture returns the read-only bits for each register with the register_read_only_mask virtual method. For the status register though, that is currently returning null. Sounds like that should return an actual mask for this register.
I'm a bit puzzled looking for where does the register end up with AMD_DBGAPI_REGISTER_PROPERTY_READONLY_BITS set. I don't see a breakpoint hit in gfx12_architecture_t::register_properties for that register. I'll look a little deeper.
There was a problem hiding this comment.
Setting the READONLY_BITS property for registers with non-nulltpr read_only_mask is done here:
amd_dbgapi_register_properties_t
amdgcn_architecture_t::register_properties (amdgpu_regnum_t regnum) const
{
amd_dbgapi_register_properties_t properties
= register_read_only_mask (regnum) != nullptr
? AMD_DBGAPI_REGISTER_PROPERTY_READONLY_BITS
: AMD_DBGAPI_REGISTER_PROPERTY_NONE;
...
}There was a problem hiding this comment.
What was confusing me earlier is that there's pseudo_status and status, two different register numbers.
I have something that works now, but I'm now wondering whether pseudo_state_priv should be writable too.
From the same docs, state_priv can only be written from the trap handler (priv=1), so how come you wanted to be able to write to it from rocgdb, leading to that dbgapi patch? Can one use rocgdb to debug the trap handler?
There was a problem hiding this comment.
From the same docs, state_priv can only be written from the trap handler (priv=1), so how come you wanted to be able to write to it from rocgdb, leading to that dbgapi patch? Can one use rocgdb to debug the trap handler?
I've convinced myself that GDB+dbgapi should treat state_priv as read-only too.
See dbgapi patch at ROCm/rocm-systems#5211, and the comments/questions in patch 2/2 of that series. Your dbgapi patch is still useful if we have internal dbgapi writes to state_priv, though I didn't find any.
I've uploaded a new version of the testcase that assumes that dbgapi patch is applied. Since I'm ignoring writes to state_priv, the testcase no longer covers the bug you fixed.
There was a problem hiding this comment.
I don't think I agree for state_priv There is the HALT bit in this register that we might want to be able to control.
Shader code can do something s_sethalt 1 which would result in the wave being halted, and then being able to write to the state_priv.halt bit from the debugger would be the only way to resume execution.
I'll go and check the patch you have in dbgapi.
| save_restore_register $reg { | ||
| gdb_test "p/x \$$reg = $all_ones_val" " = $all_ones_val" | ||
| # Read back the value, which should have writable bits | ||
| # set to one, read-only bits set to zero. |
There was a problem hiding this comment.
Same, this comment is probably inaccurate.
To really get the set of read-only registers, you'd need to
- write all 1s, and collect a mask of bits staying at 0.
- write all 0s and collect a mask of bits staying at 1.
- OR those two masks.
This should give a mask of read-only bits, and only a very restricted set of regs (which we do not expose today) can have that mask equal to $all_ones_val.
There was a problem hiding this comment.
This is now improved along those lines. Thanks, I had only originally considered the reserved (always zero) bits.
| # Registers that have bits that are always zero. | ||
| proc is_special {reg} { | ||
| # The single list contains all the special registers from all | ||
| # supported architectures, sorted in alphabetical order. |
There was a problem hiding this comment.
A good dynamic approach to discover such registers is probably to look at info registers system.
For example, on gfx1031, I have:
(gdb) info registers system
mode 0x3f0 [ FP_ROUND.32=NEAREST_EVEN FP_ROUND.64_16=NEAREST_EVEN FP_DENORM.32=FLUSH_NONE FP_DENORM.64_16=FLUSH_NONE DX10_CLAMP IEEE ]
trapsts 0x40000000 [ EXCP_CYCLE=0 EXCP_GROUP_MASK=0 DP_RATE=2 ]
flat_scratch 0x7ffee4a00000 140732734111744
status 0x98041 [ SCC SPI_PRIO=0 USER_PRIO=0 TRAP_EN TTRACE_SIMD_EN VALID PERF_EN ]
Note that we see here flat_scratch (known as `scratch_base for gfx12) which I think should probably be fully read-only. Where the scratch for a wave is is not something we should edit, it is a property of a wave we should not be able to touch.
pc is probably another register with read-only bits which is not listed here, but it is handled in a special way anyway.
There was a problem hiding this comment.
I've now tweaked the dbgapi patch to consider flat_scratch / scratch_base read-only too, and updated the gdb patch. Juggling too many machines at once, missed that one... I haven't yet tested on gfx11.
|
Also, I am surprised that the CI passes. There is a known issue for |
We don't run rocgdb tests on gfx12xx at the moment. So that would need to be exercised manually. |
|
Just a FYI, I observed the number of tests differing between a run with O0 and another with O3. About half as many tests passing on O3 compared to O0. Not sure if that's by design or not. |
We had work internally to enable the testsuite at 01/02/03, some of that implied disabling some tests. Not sure how much of that work landed (but we should eventually resurrect that). |
Just for the record, the test does fail for state_priv on gfx12 currently (both linux and windows the same way), that's the bug that triggered writing the testcase in the first place. |
1a456cc to
9b0e45f
Compare
This adds a test that writes to all GPU registers, confirming that: - For registers with non-read-only bits, written values are really written and can be read back. - Writing to read-only registers is ignored. Tested on Linux, on gfx90a, gfx942, gfx1030, and gfx1201. Tested on Windows, on gfx1201. Change-Id: I76599cb02bd0cdbe504398763d48a8112ad21bab commit-id:23de25e0
9b0e45f to
7c4988a
Compare
This adds a test that writes to all GPU registers, confirming that:
written and can be read back.
Tested on Linux, on gfx90a, gfx942, gfx1030, and gfx1201.
Tested on Windows, on gfx1201.
Change-Id: I76599cb02bd0cdbe504398763d48a8112ad21bab