Add INT8 support for LDS transpose load #2214
Open
stefankoncarevic wants to merge 2 commits into lds-transpose-load-fp8
Extend LDS transpose load (ds_read_tr8_b64) to cover INT8 (i8) on top
of the refactored FP8/BF8 path. INT8 reuses the 8-bit lane swizzle and
adds two new double-rate geometries: (16, 64) and (32, 32).
Core changes:
* LdsTransposeLoad.{h,cpp}: add isInt8Type / uses8BitTransposeLoad and
isInt8OnlyLdsTransposeGeometry helpers; extend
isValidLdsTransposeMfmaGeometry, getTransposeLoadVectorLength,
getDoubleRateKOffsetBase, getBasePanelOffsets and
emitThreadwiseHWTranspose to compute the right kStride / kOffsetBase /
highHalfOffset for INT8 double-rate loads.
* makeDecision: reject INT8 with FP8-only or F16-only geometries and
vice versa.
* RockOps.td: allow I8 on rock.lds_transpose_load source/result types.
* RockDialect.cpp: ThreadwiseReadIntoOp::verify accepts i8 destinations
and enforces (geometry, type) consistency for INT8.
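
As a rough illustration of the INT8 (geometry, type) consistency rule described above, here is a minimal C++ sketch. It is not the code from LdsTransposeLoad.cpp or RockDialect.cpp: the `Geometry` struct and both function names are hypothetical, and reading a pair such as (16, 64) as (MN extent, K extent) of the 16x16x64 MFMA is an assumption based on the geometries listed in this PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for a geometry pair such as (16, 64).
// Assumption: the first value is the MFMA M/N extent, the second the K extent.
struct Geometry {
  int64_t mnDim;
  int64_t kDim;
};

// True for the four INT8 MFMA shapes named in this PR:
// 16x16x32, 16x16x64, 32x32x16 and 32x32x32.
constexpr bool isInt8GeometrySketch(Geometry g) {
  return (g.mnDim == 16 && (g.kDim == 32 || g.kDim == 64)) ||
         (g.mnDim == 32 && (g.kDim == 16 || g.kDim == 32));
}

// True for the two geometries this PR introduces as INT8 double-rate:
// (16, 64) and (32, 32).
constexpr bool isInt8DoubleRateGeometrySketch(Geometry g) {
  return (g.mnDim == 16 && g.kDim == 64) || (g.mnDim == 32 && g.kDim == 32);
}

// Compile-time sanity checks on the sketch itself.
static_assert(isInt8GeometrySketch({16, 64}), "16x16x64 is an INT8 geometry");
static_assert(!isInt8GeometrySketch({16, 16}), "16x16x16 is not listed for INT8");
```

A verifier in this style would reject an i8 operand whose geometry fails the first predicate; which geometries the real helpers treat as F16-only or FP8-only is not stated here and is deliberately left out of the sketch.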
Cleanup applied during the review:
* Add isF16DoubleRateGeometry helper and reuse it in
getDoubleRateKOffsetBase / getBasePanelOffsets /
emitThreadwiseHWTranspose.
* Fix outdated assert message in buildTransposeAttrFromParams and the
kStride doc comment in getDoubleRateKOffsetBase.
* Reorganize the LdsTransposeLoad.h header doc to list supported MFMA
geometries by element type.
Tests:
* ops.mlir: positive rock.lds_transpose_load + LDSTransposeConfigAttr
cases for the four INT8 geometries.
* lowering_load_transpose_lds.mlir: lowering check for i8 ->
amdgpu.transpose_load.
* lds_transpose_load_panels.mlir: rename from the FP8-only file and
add an INT8 (32, 16) panel-count check.
* lds_transpose_error.mlir: negative tests for INT8 with f16-only and
fp8-only geometries, f16 / fp8 with int8-only geometries, updated
valid-geometry messages, and a kPerBlock divisibility test for the
INT8 (32, 32) double-rate geometry.
* PrLdsTransposeLoadI8.{toml,cfg} and
PrLdsTransposeLoadAttentionI8.{toml,cfg}: new e2e configs for INT8 GEMM
and Attention with LDS transpose on A/B and on K/Q.
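
The kPerBlock divisibility test mentioned above can be pictured with a small C++ sketch. The exact rule enforced by makeDecision is not spelled out in this PR, so the function below is hypothetical and assumes the simplest plausible form: kPerBlock must be a positive multiple of the geometry's K extent (32 for the INT8 (32, 32) geometry). The real check may use a different divisor.

```cpp
#include <cstdint>

// Hypothetical check, NOT the rule from makeDecision.
// Assumption: kPerBlock must be a positive multiple of the geometry's K extent.
constexpr bool isKPerBlockCompatibleSketch(int64_t kPerBlock,
                                           int64_t geometryK) {
  return geometryK > 0 && kPerBlock > 0 && kPerBlock % geometryK == 0;
}
```

Under this assumption a kPerBlock of 64 would be accepted for the (32, 32) geometry, while 48 would be rejected.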
* lds_transpose_load_panels.mlir: add panel-count Lit guards for the two
INT8 double-rate geometries (16, 64) and (32, 32). Each stanza checks
both the number of amdgpu.transpose_load ops and the resulting
amdgpu.mfma instruction.
* LdsTransposeLoad.h: promote isInt8Type next to isFp8Type and use
hwtranspose::isInt8Type from RockDialect.cpp for symmetry.
* LdsTransposeLoad.cpp: refresh stale doc comments that listed only
fp8/bf8 where the path now also handles INT8.
* PrLdsTransposeLoadI8.toml / PrLdsTransposeLoadAttentionI8.toml: replace
inaccurate suite banners with a description of the INT8 MFMA geometries
actually exercised.
No functional change for supported configurations.
Motivation
Extends the LDS transpose load optimization to INT8 data for GEMM and Attention kernels on gfx950. This enables hardware-accelerated transposed loads (ds_read_tr8_b64) for all four INT8 MFMA shapes (16x16x32, 16x16x64, 32x32x16, 32x32x32), with the goal of improving performance for INT8 quantized inference.
Technical Details
Test Plan
Added MLIR unit tests
Added E2E tests
All tests verified on gfx950 hardware with numerical correctness validation
Test Result
Submission Checklist