Pad gpu kernel support and test by harz05 · Pull Request #40 · ML4EP/SOFIE

harz05 · 2026-06-17T17:39:40Z

This PR implements GPU support for the Pad operator (constant mode only, matching the CPU operator, which also supports only constant mode).

Approach: one thread per output element (gather). Each thread decomposes its flat output index into per-dimension coordinates, checks whether they map back inside the input (interior vs. padding), and either copies the input element or writes the constant. This is a single pass with no pre-fill. Shapes, strides, and pads are baked in as literals at codegen time.

One thread per input element (scatter) would also work, but as far as I can tell it would need two kernels: one to fill the output with the constant and one to scatter each input element into its output position.
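For reviewers, here is a minimal host-side sketch of the gather scheme. It is not the generated alpaka kernel. The names `PadElement`, `PadConstant` and the `k...` constants are hypothetical, and the 2-D shape (2x3 input, pads of 1/1 on the first axis and 2/1 on the second) is an assumed example standing in for the literals the code generator would emit. The loop in `PadConstant` plays the role of the thread grid: each iteration does what one GPU thread would do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical literals for one Pad instance (assumed example values).
constexpr std::size_t kRank = 2;
constexpr std::array<std::size_t, kRank> kInShape   = {2, 3};
constexpr std::array<std::size_t, kRank> kPadBegin  = {1, 2};
constexpr std::array<std::size_t, kRank> kPadEnd    = {1, 1};
constexpr std::array<std::size_t, kRank> kOutShape  = {4, 6};
constexpr std::array<std::size_t, kRank> kInStride  = {3, 1};  // row-major
constexpr std::array<std::size_t, kRank> kOutStride = {6, 1};  // row-major

static_assert(kOutShape[0] == kPadBegin[0] + kInShape[0] + kPadEnd[0],
              "output extent must equal begin pad + input + end pad");
static_assert(kOutShape[1] == kPadBegin[1] + kInShape[1] + kPadEnd[1],
              "output extent must equal begin pad + input + end pad");

// Work done for a single flat output index (one GPU thread in the PR's scheme).
inline float PadElement(const float* input, std::size_t outIdx, float constant) {
    std::size_t rem = outIdx;
    std::size_t inIdx = 0;
    for (std::size_t d = 0; d < kRank; ++d) {
        // Decompose the flat index into the coordinate along dimension d.
        const std::size_t coord = rem / kOutStride[d];
        rem %= kOutStride[d];
        // Outside [padBegin, padBegin + inShape) means this element is padding.
        if (coord < kPadBegin[d] || coord >= kPadBegin[d] + kInShape[d]) {
            return constant;
        }
        // Interior: accumulate the matching flat input offset.
        inIdx += (coord - kPadBegin[d]) * kInStride[d];
    }
    return input[inIdx];
}

// Sequential stand-in for the kernel launch: one iteration per output element.
std::vector<float> PadConstant(const std::vector<float>& input, float constant) {
    assert(input.size() == kInShape[0] * kInShape[1]);
    std::vector<float> output(kOutShape[0] * kOutShape[1]);
    for (std::size_t i = 0; i < output.size(); ++i) {
        output[i] = PadElement(input.data(), i, constant);
    }
    return output;
}
```

Every output element is written exactly once and no two iterations write the same location, which is why the gather form needs neither a pre-fill pass nor synchronization between threads.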

Tested on a Colab T4. The test reuses the existing Pad.onnx model, and the expected output was generated with numpy.pad.

harz05 added 2 commits June 17, 2026 22:17

pad gpu kernel and test added

06f16c0

fix test ref values; skip redundant pad lower-bound check

28eea79

harz05 wants to merge 2 commits into ML4EP:gpu/alpaka from harz05:feat/pad-gpu
