Pad GPU kernel support and test #40

Open: harz05 wants to merge 2 commits into ML4EP:gpu/alpaka from harz05:feat/pad-gpu
harz05 commented Jun 17, 2026

This PR implements GPU support for the Pad operator. Only constant mode is supported, matching the CPU op, which also supports only constant mode.

Approach: one thread per output element (gather). Each thread decomposes its flat output index into per-dimension coordinates, tests whether they map back inside the input (interior vs. padding), and either copies the input element or writes the constant. This is a single pass with no pre-fill. Shapes, strides, and pads are baked in as literals at codegen time.

One thread per input element (scatter) would also work, but as far as I can tell it would require two kernels: one to fill the output with the constant and one to scatter each input element into its output position.
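For reviewers, here is a minimal CPU sketch of the gather indexing described above. It is not the generated alpaka kernel. The function names `rowMajorStrides` and `padConstantGather` are hypothetical. The sketch assumes row-major layout and non-negative pads, and it passes shapes and pads at runtime, whereas the PR bakes them in as literals at codegen time. Each loop iteration plays the role of one GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major strides for a given shape.
std::vector<std::size_t> rowMajorStrides(const std::vector<std::size_t>& shape) {
    std::vector<std::size_t> strides(shape.size(), 1);
    for (int d = static_cast<int>(shape.size()) - 2; d >= 0; --d) {
        strides[d] = strides[d + 1] * shape[d + 1];
    }
    return strides;
}

// Hypothetical CPU emulation of the gather scheme: one "thread" per output element.
// Assumes non-negative pads and a row-major input of shape inShape.
std::vector<float> padConstantGather(const std::vector<float>& input,
                                     const std::vector<std::size_t>& inShape,
                                     const std::vector<std::size_t>& padsBegin,
                                     const std::vector<std::size_t>& padsEnd,
                                     float constant) {
    const std::size_t rank = inShape.size();
    assert(padsBegin.size() == rank && padsEnd.size() == rank);

    // Output extent per dimension = leading pad + input extent + trailing pad.
    std::vector<std::size_t> outShape(rank);
    std::size_t inSize = 1;
    std::size_t outSize = 1;
    for (std::size_t d = 0; d < rank; ++d) {
        outShape[d] = padsBegin[d] + inShape[d] + padsEnd[d];
        inSize *= inShape[d];
        outSize *= outShape[d];
    }
    assert(input.size() == inSize);

    const std::vector<std::size_t> inStrides = rowMajorStrides(inShape);
    const std::vector<std::size_t> outStrides = rowMajorStrides(outShape);

    std::vector<float> output(outSize);
    for (std::size_t idx = 0; idx < outSize; ++idx) {  // idx stands in for the thread index
        std::size_t rem = idx;
        std::size_t inIdx = 0;
        bool interior = true;
        for (std::size_t d = 0; d < rank; ++d) {
            // Decompose the flat output index into the coordinate along dimension d.
            const std::size_t coord = rem / outStrides[d];
            rem %= outStrides[d];
            // A coordinate outside [padsBegin, padsBegin + inShape) lies in the padding.
            if (coord < padsBegin[d] || coord >= padsBegin[d] + inShape[d]) {
                interior = false;
                break;
            }
            inIdx += (coord - padsBegin[d]) * inStrides[d];
        }
        // Every output element is written exactly once, so no pre-fill pass is needed.
        output[idx] = interior ? input[inIdx] : constant;
    }
    return output;
}
```

Because each output element is written exactly once by its own thread, there are no write conflicts and no separate fill kernel is needed. That is the trade-off against the scatter variant.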

The test was verified on a Colab T4 GPU. It uses the existing Pad.onnx model, with the expected output generated by numpy.pad.
