
WIP: gpu support for TopK #42

Open
harz05 wants to merge 2 commits into ML4EP:gpu/alpaka from harz05:topk-gpu-alpaka

harz05 commented Jun 21, 2026


This PR adds an initial GPU alpaka path for the TopK operator.

Approach: one thread per slice along the sorted axis. Each thread keeps a K-sized insertion-sorted buffer and selects its top-K in a single pass. largest/smallest, sorted, k and the strides are baked in at codegen, so the generated kernel has no attribute branches. It handles an arbitrary axis (strided slices) and both largest and smallest; the output is kept ordered to match the CPU op, with the smaller index winning on ties.
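For reviewers, here is a rough host-side C++ sketch of the per-slice selection described above. It is not the generated alpaka kernel: the names `ranksBefore`, `topkSlice` and `topkAlongAxis` are hypothetical, the template parameters only stand in for the values baked in at codegen, the input is assumed to be viewed as [outer, n, inner] with the TopK axis in the middle, and only the sorted-output case is shown. The loop over slices plays the role of the one-thread-per-slice launch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if a must be ordered strictly ahead of b.
// Largest is a compile-time constant, standing in for the codegen-time attribute.
template <bool Largest>
inline bool ranksBefore(float a, float b) {
    return Largest ? (a > b) : (a < b);
}

// Work of a single "thread": select the top-K of one strided slice in one pass.
template <std::size_t K, bool Largest>
void topkSlice(const float* in, std::size_t base, std::size_t stride, std::size_t n,
               float* outVal, std::int64_t* outIdx,
               std::size_t outBase, std::size_t outStride) {
    static_assert(K > 0, "K must be positive");
    float vals[K];          // K-sized buffer, kept ordered best-first
    std::int64_t idxs[K];
    std::size_t filled = 0;

    for (std::size_t i = 0; i < n; ++i) {
        const float v = in[base + i * stride];
        // Buffer full and v does not strictly beat the worst entry: skip.
        // The strict comparison keeps the earlier (smaller) index on ties.
        if (filled == K && !ranksBefore<Largest>(v, vals[K - 1])) continue;

        // Append while filling, otherwise overwrite the worst entry.
        std::size_t pos = (filled < K) ? filled++ : K - 1;
        // Insertion step: shift down every entry that v strictly beats.
        while (pos > 0 && ranksBefore<Largest>(v, vals[pos - 1])) {
            vals[pos] = vals[pos - 1];
            idxs[pos] = idxs[pos - 1];
            --pos;
        }
        vals[pos] = v;
        idxs[pos] = static_cast<std::int64_t>(i);
    }

    // Write the ordered results along the output axis.
    for (std::size_t j = 0; j < filled; ++j) {
        outVal[outBase + j * outStride] = vals[j];
        outIdx[outBase + j * outStride] = idxs[j];
    }
}

// Host-side stand-in for the kernel launch: one iteration per slice.
// Input is viewed as [outer, n, inner]; output as [outer, min(K, n), inner].
template <std::size_t K, bool Largest>
void topkAlongAxis(const std::vector<float>& in,
                   std::size_t outer, std::size_t n, std::size_t inner,
                   std::vector<float>& outVal, std::vector<std::int64_t>& outIdx) {
    const std::size_t kept = (K < n) ? K : n;
    outVal.assign(outer * kept * inner, 0.0f);
    outIdx.assign(outer * kept * inner, -1);
    for (std::size_t s = 0; s < outer * inner; ++s) {
        const std::size_t o = s / inner;  // position before the TopK axis
        const std::size_t r = s % inner;  // position after the TopK axis
        topkSlice<K, Largest>(in.data(), o * n * inner + r, inner, n,
                              outVal.data(), outIdx.data(),
                              o * kept * inner + r, inner);
    }
}
```

Calling `topkAlongAxis<2, true>(data, 1, 3, 3, values, indices)` on a 3x3 row-major tensor selects along axis 0, so each column is a slice read with stride 3. Because the comparison is strict, an element equal to one already in the buffer is placed after it, which is what gives the smaller index priority on ties.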

Test: added a TopK case to the alpaka test suite, reusing the existing TopK.onnx and its reference output.

WIP:

Currently working on implementing R-Topk and Warp Select for our use case.
