Add FP8 subtile unit tests and generalize GPU test helpers (ROCm#7318)
## Motivation
FP8 support requires verifying on hardware that the subtile-based kernel's
GR/LR swizzled tile assignment and MFMA computation produce correct results.
Previously, no GPU unit tests covered FP8 data types in the subtile kernel
path.
## Technical Details
**New FP8 unit tests:**
- `test_graTileAssignment_fp8.py` — GPU tests verifying GRA (Global Read
Assignment) tile layout for FP8 (AB_B8,
inst_k=128, bpe=1) with swizzled addressing
- `test_lraTileAssignment_fp8.py` — GPU tests verifying LRA (Local Read
Assignment) tile layout for FP8
- `test_gr_lr_roundtrip_fp8.py` — End-to-end GR→LDS→LR roundtrip tests
confirming data survives the FP8 swizzled
store/load pipeline
- `test_mfma_fp8.py` — GPU tests executing
`v_mfma_f32_16x16x128_fp8_fp8` and verifying results against a Python
reference dot-product
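
The Python reference used for the MFMA check can be sketched roughly as follows. This is a minimal illustration, not the code in `test_mfma_fp8.py`: it assumes the OCP E4M3 encoding (bias 7), while the actual tests may target a different FP8 variant, and the names `decode_e4m3` and `reference_mfma` are hypothetical.

```python
import math

def decode_e4m3(byte: int) -> float:
    """Decode one FP8 byte, assuming OCP E4M3 (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0xF and mant == 0x7:
        return math.nan                          # only NaN pattern; this format has no infinities
    if exp == 0:
        return sign * (mant / 8.0) * 2.0 ** -6   # subnormal values
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

def reference_mfma(a, b, m=16, n=16, k=128):
    """Multiply A (m x k) by B (k x n), both given as nested lists of FP8 bytes."""
    return [[sum(decode_e4m3(a[i][p]) * decode_e4m3(b[p][j]) for p in range(k))
             for j in range(n)]
            for i in range(m)]

# A filled with 1.0 (0x38) and B filled with 2.0 (0x40): each output sums 128 products of 2.0.
a = [[0x38] * 128 for _ in range(16)]
b = [[0x40] * 16 for _ in range(128)]
d = reference_mfma(a, b)
```

Python accumulates in double precision while the instruction accumulates in FP32, so a comparison against hardware output on arbitrary inputs would likely need a small tolerance.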
**Subtile component updates** (to support FP8 layouts):
- `SubtileGREmit.py`, `SubtileLREmit.py` — FP8 swizzled layout emission
- `SubtileGeometry.py`, `Kernel.py` — geometry and kernel-level FP8
support
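
The property checked by the roundtrip tests can be illustrated with a small host-side model. The XOR swizzle below is a hypothetical stand-in, since the actual layout emitted by `SubtileGREmit.py` and `SubtileLREmit.py` is not described here. It only shows why writing and reading through the same address mapping must return the original tile.

```python
def swizzle(row: int, col: int, row_bytes: int = 128, chunk: int = 16) -> int:
    """Hypothetical XOR swizzle: permute 16-byte chunks within a row using the row index."""
    chunks_per_row = row_bytes // chunk          # must be a power of two for XOR to stay in range
    chunk_idx = (col // chunk) ^ (row % chunks_per_row)
    return row * row_bytes + chunk_idx * chunk + col % chunk

def roundtrip(tile):
    """Store a tile into a flat byte buffer via the swizzle, then read it back the same way."""
    rows, row_bytes = len(tile), len(tile[0])
    lds = bytearray(rows * row_bytes)            # stand-in for LDS
    for r in range(rows):
        for c in range(row_bytes):
            lds[swizzle(r, c, row_bytes)] = tile[r][c]
    return [[lds[swizzle(r, c, row_bytes)] for c in range(row_bytes)]
            for r in range(rows)]

# 16 rows of 128 one-byte elements (bpe=1), filled with a non-repeating-per-row pattern.
tile = [[(r * 7 + c) % 256 for c in range(128)] for r in range(16)]
result = roundtrip(tile)
```

The roundtrip holds only if the mapping is a bijection over the tile. A mapping that sent two elements to the same address would silently drop data, which is the kind of failure such a test is meant to catch.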
**Refactoring of shared test infrastructure** (`gpu_test_helpers.py`):
- Generalized `_create_kernel` / `create_writer` to accept `geometry` /
`inst_k` / `bpe` parameters so FP8 and
FP16 tests share one code path
- Added `collect_tile_vgprs`, `compute_expected_subtile`,
`setup_roundtrip_writer`, `build_roundtrip_inner_asm`,
`alloc_export_vgprs`, `generate_srd_setup` as common helpers
- Eliminated ~550 lines of duplicated boilerplate across
`test_graTileAssignment.py`, `test_lraTileAssignment.py`,
`test_gr_lr_roundtrip.py`, `test_mfma_fp8.py`, and
`test_storeD_roundtrip.py`
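
As a rough sketch of how passing `inst_k` and `bpe` as parameters lets one code path serve both data types, consider the following. `TileParams` and `expected_row_offsets` are hypothetical names, not the helpers in `gpu_test_helpers.py`, and the FP16 `inst_k` value is an assumed placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TileParams:
    """Hypothetical bundle of the per-data-type parameters named above."""
    name: str
    inst_k: int   # K depth covered by one MFMA instruction
    bpe: int      # bytes per element

    @property
    def k_bytes(self) -> int:
        """Bytes occupied by one row of inst_k elements."""
        return self.inst_k * self.bpe

FP8 = TileParams("fp8", inst_k=128, bpe=1)    # values stated for the FP8 tests
FP16 = TileParams("fp16", inst_k=32, bpe=2)   # inst_k is an assumption for illustration

def expected_row_offsets(params: TileParams, rows: int) -> list[int]:
    """Byte offset of each row start in a row-major, unswizzled tile (illustrative only)."""
    return [r * params.k_bytes for r in range(rows)]

fp8_offsets = expected_row_offsets(FP8, 3)
fp16_offsets = expected_row_offsets(FP16, 3)
```

With the data-type specifics carried in one object, the same expected-value computation can be reused across FP8 and FP16 test files instead of being copied into each.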
## Test Plan
Run the full unit test suite from the tensilelite root:
```bash
PYTHONPATH=<rocisa_lib_path>:. python3 -m pytest Tensile/Tests/unit/ -v -s
```
Or run individual FP8 test files:
```bash
PYTHONPATH=<rocisa_lib_path>:. python3 -m pytest Tensile/Tests/unit/test_graTileAssignment_fp8.py -v -s
PYTHONPATH=<rocisa_lib_path>:. python3 -m pytest Tensile/Tests/unit/test_lraTileAssignment_fp8.py -v -s
PYTHONPATH=<rocisa_lib_path>:. python3 -m pytest Tensile/Tests/unit/test_gr_lr_roundtrip_fp8.py -v -s
PYTHONPATH=<rocisa_lib_path>:. python3 -m pytest Tensile/Tests/unit/test_mfma_fp8.py -v -s
```
## Test Result
124 unit tests passing, including all new FP8 tests and all pre-existing
FP16 tests. No regressions in existing test files after the shared-helper
refactoring.
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Matthew Emmett <matthew.emmett@amd.com>