Add INT8 support for LDS transpose load by stefankoncarevic · Pull Request #2214 · ROCm/rocMLIR

stefankoncarevic · 2026-01-23T14:40:21Z

⚠️ Do not merge until #2210 is merged - this PR depends on LDS transpose load fp8 support

Motivation

Extends LDS transpose load optimization to support INT8 data types for GEMM and Attention kernels on gfx950. This enables hardware-accelerated transposed loads (ds_read_tr8_b64) for all INT8 MFMAs (16x16x32, 16x16x64, 32x32x16, 32x32x32), improving performance for INT8 quantized inference.
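For reference, the four INT8 MFMA shapes named above can be written as a small lookup table. This is only an illustrative sketch: `MfmaShape`, `kInt8MfmaShapes` and `isListedInt8MfmaShape` are hypothetical names and are not taken from rocMLIR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration type; not part of rocMLIR.
struct MfmaShape {
  int64_t m, n, k;
};

// The four INT8 MFMA shapes listed in the Motivation section (M x N x K).
static const MfmaShape kInt8MfmaShapes[] = {
    {16, 16, 32}, {16, 16, 64}, {32, 32, 16}, {32, 32, 32}};

// Returns true if (m, n, k) matches one of the listed INT8 shapes.
inline bool isListedInt8MfmaShape(int64_t m, int64_t n, int64_t k) {
  const std::size_t count = sizeof(kInt8MfmaShapes) / sizeof(kInt8MfmaShapes[0]);
  for (std::size_t i = 0; i < count; ++i) {
    const MfmaShape &s = kInt8MfmaShapes[i];
    if (s.m == m && s.n == n && s.k == k)
      return true;
  }
  return false;
}
```

For example, `isListedInt8MfmaShape(32, 32, 16)` holds, while a shape absent from the list, such as 16x16x16, does not.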

Technical Details

LdsTransposeLoad.cpp: Added INT8 type support, offset formulas for (16,64) and (32,32) geometries, and double-rate K-coverage logic
AccelEmitter.cpp: Added K-dimension transformation for INT8 MFMAs with kBase=16 when kpack=1
RockDialect.cpp/RockOps.td: Updated validation and type support for INT8 LDS transpose

Test Plan

Added MLIR unit tests
Added E2E tests
All tests verified on gfx950 hardware with numerical correctness validation

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Extend LDS transpose load (ds_read_tr8_b64) to cover INT8 (i8) on top of the refactored FP8/BF8 path. INT8 reuses the 8-bit lane swizzle and adds two new double-rate geometries: (16, 64) and (32, 32).

Core changes:
* LdsTransposeLoad.{h,cpp}: add isInt8Type / uses8BitTransposeLoad and isInt8OnlyLdsTransposeGeometry helpers; extend isValidLdsTransposeMfmaGeometry, getTransposeLoadVectorLength, getDoubleRateKOffsetBase, getBasePanelOffsets and emitThreadwiseHWTranspose to compute the right kStride / kOffsetBase / highHalfOffset for INT8 double-rate loads.
* makeDecision: reject INT8 with FP8-only or F16-only geometries and vice versa.
* RockOps.td: allow I8 on rock.lds_transpose_load source/result types.
* RockDialect.cpp: ThreadwiseReadIntoOp::verify accepts i8 destinations and enforces (geometry, type) consistency for INT8.

Cleanup applied during the review:
* Add isF16DoubleRateGeometry helper and reuse it in getDoubleRateKOffsetBase / getBasePanelOffsets / emitThreadwiseHWTranspose.
* Fix outdated assert message in buildTransposeAttrFromParams and the kStride doc comment in getDoubleRateKOffsetBase.
* Reorganize the LdsTransposeLoad.h header doc to list supported MFMA geometries by element type.

Tests:
* ops.mlir: positive rock.lds_transpose_load + LDSTransposeConfigAttr cases for the four INT8 geometries.
* lowering_load_transpose_lds.mlir: lowering check for i8 -> amdgpu.transpose_load.
* lds_transpose_load_panels.mlir: rename from the FP8-only file and add an INT8 (32, 16) panel-count check.
* lds_transpose_error.mlir: negative tests for INT8 with f16-only and fp8-only geometries, f16 / fp8 with int8-only geometries, updated valid-geometry messages, and a kPerBlock divisibility test for the INT8 (32, 32) double-rate geometry.
* PrLdsTransposeLoadI8.{toml,cfg} and PrLdsTransposeLoadAttentionI8.{toml,cfg}: new e2e configs for INT8 GEMM and Attention with LDS transpose on A/B and on K/Q.
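The (geometry, type) consistency rule can be sketched roughly as follows. This is not the code from makeDecision or ThreadwiseReadIntoOp::verify; `ElemKind`, `isInt8OnlyGeometrySketch` and `passesInt8OnlyRule` are hypothetical names. It also assumes that the INT8-only geometries are exactly the two new double-rate ones, (16, 64) and (32, 32), which the text above suggests but does not state outright. The FP8-only and F16-only geometry sets are not listed here, so the opposite direction of the check is left unmodeled.

```cpp
#include <cstdint>

// Hypothetical element-type tag, for illustration only.
enum class ElemKind { F16, FP8, INT8 };

// Assumption: the INT8-only geometries are the two new double-rate ones,
// written as (mnDim, kDim): (16, 64) and (32, 32).
inline bool isInt8OnlyGeometrySketch(int64_t mnDim, int64_t kDim) {
  return (mnDim == 16 && kDim == 64) || (mnDim == 32 && kDim == 32);
}

// One direction of the consistency rule: an INT8-only geometry is accepted
// only when the element type is INT8. Other geometries pass through here
// because their per-type rules are not modeled in this sketch.
inline bool passesInt8OnlyRule(ElemKind kind, int64_t mnDim, int64_t kDim) {
  if (isInt8OnlyGeometrySketch(mnDim, kDim))
    return kind == ElemKind::INT8;
  return true;
}
```

Under these assumptions, FP8 or F16 paired with (32, 32) is rejected, while INT8 paired with (16, 64) is accepted.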

* lds_transpose_load_panels.mlir: add panel-count Lit guards for the two INT8 double-rate geometries (16,64) and (32,32). Each stanza checks both the number of amdgpu.transpose_load ops and the resulting amdgpu.mfma instruction.
* LdsTransposeLoad.h: promote isInt8Type next to isFp8Type and use hwtranspose::isInt8Type from RockDialect.cpp for symmetry.
* LdsTransposeLoad.cpp: refresh stale doc comments that listed only fp8/bf8 where the path now also handles INT8.
* PrLdsTransposeLoadI8.toml / PrLdsTransposeLoadAttentionI8.toml: replace inaccurate suite banners with a description of the INT8 MFMA geometries actually exercised.

No functional change for supported configurations.

stefankoncarevic requested a review from causten as a code owner on January 23, 2026 14:40

stefankoncarevic requested review from dhernandez0, djramic, justinrosner, pabloantoniom and umangyadav on January 23, 2026 14:48

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch 3 times, most recently from f3176a8 to a75ab7a, on January 29, 2026 14:10

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch 2 times, most recently from 24d9bf6 to 076a998, on February 27, 2026 13:37

stefankoncarevic force-pushed the lds-transpose-load-int8 branch from 3ccbf35 to a6318df on March 2, 2026 10:05

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch 3 times, most recently from a6e9ccf to b8674ba, on April 2, 2026 13:56

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch from b8674ba to b54ad4c on April 6, 2026 12:21

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch 2 times, most recently from 43f0c7e to b4f76ac, on April 23, 2026 15:29

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch 4 times, most recently from 80e126e to 7a60515, on May 6, 2026 10:48

stefankoncarevic force-pushed the lds-transpose-load-int8 branch from a6318df to 06376ea on May 6, 2026 14:24

stefankoncarevic force-pushed the lds-transpose-load-fp8 branch from f394fde to 024e85a on May 6, 2026 22:18

stefankoncarevic added 2 commits on May 8, 2026 10:12

stefankoncarevic force-pushed the lds-transpose-load-int8 branch from 06376ea to 4dae620 on May 8, 2026 16:14

stefankoncarevic wants to merge 2 commits into lds-transpose-load-fp8 from lds-transpose-load-int8
