Generate_GPU_ALPAKA in ROperator_Conv.hxx has a batch loop that offsets input/output pointers per sample and runs im2col + GEMM for each one. The code is there, but every Conv ONNX model in test/input_models/ has input shape [1, 1, H, W], so the loop always executes exactly once and the multi-batch path has never actually been tested.
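To make the per-sample pattern concrete, here is a minimal CPU-only sketch of a batch loop that offsets the input and output pointers and runs im2col + GEMM once per sample. It is not the generated ALPAKA code: the names Im2Col, Gemm, and ConvBatched are hypothetical, and it assumes stride 1, no padding, and no bias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll one [C, H, W] sample into a [C*kH*kW, oH*oW] column matrix.
// Assumes stride 1 and no padding.
void Im2Col(const float* x, std::size_t C, std::size_t H, std::size_t W,
            std::size_t kH, std::size_t kW, float* col) {
    const std::size_t oH = H - kH + 1, oW = W - kW + 1;
    std::size_t row = 0;
    for (std::size_t c = 0; c < C; ++c)
        for (std::size_t i = 0; i < kH; ++i)
            for (std::size_t j = 0; j < kW; ++j, ++row)
                for (std::size_t y = 0; y < oH; ++y)
                    for (std::size_t xo = 0; xo < oW; ++xo)
                        col[row * oH * oW + y * oW + xo] =
                            x[(c * H + y + i) * W + xo + j];
}

// Naive GEMM: out[M, N] = w[M, K] * col[K, N].
void Gemm(const float* w, const float* col, std::size_t M, std::size_t K,
          std::size_t N, float* out) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += w[m * K + k] * col[k * N + n];
            out[m * N + n] = acc;
        }
}

// Input [B, C, H, W], weights [M, C, kH, kW], output [B, M, oH, oW].
std::vector<float> ConvBatched(const std::vector<float>& input,
                               const std::vector<float>& weights,
                               std::size_t B, std::size_t C, std::size_t H,
                               std::size_t W, std::size_t M, std::size_t kH,
                               std::size_t kW) {
    const std::size_t oH = H - kH + 1, oW = W - kW + 1;
    const std::size_t K = C * kH * kW;
    const std::size_t inStride = C * H * W;     // elements per input sample
    const std::size_t outStride = M * oH * oW;  // elements per output sample
    std::vector<float> col(K * oH * oW), out(B * outStride);
    for (std::size_t n = 0; n < B; ++n) {
        // The offsets below are only exercised for n > 0, i.e. when B > 1.
        Im2Col(input.data() + n * inStride, C, H, W, kH, kW, col.data());
        Gemm(weights.data(), col.data(), M, K, oH * oW,
             out.data() + n * outStride);
    }
    return out;
}
```

With B = 1 the offset terms n * inStride and n * outStride are always zero, so a wrong stride would go unnoticed; that is the gap described above.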
The fix is to add ONNX models with batch sizes 2, 4, and 8, along with their reference outputs and matching TEST_F cases in TestCustomModelsFromONNXForAlpakaCuda.cxx, so that the output of every batch item is verified against its reference.
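One way the per-item check could look is sketched below. FirstMismatchedBatchItem is a hypothetical helper, not something that exists in the test file, and the flat [B, ...] output layout and the absolute tolerance are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Compare a flat [B, ...] output against its reference, one batch item at a
// time. Returns the index of the first batch item containing a value that
// differs by more than tol, or -1 if every item matches.
int FirstMismatchedBatchItem(const std::vector<float>& output,
                             const std::vector<float>& reference,
                             std::size_t batch, float tol) {
    if (batch == 0 || output.size() != reference.size() ||
        output.size() % batch != 0)
        throw std::invalid_argument("output/reference shape mismatch");
    const std::size_t perItem = output.size() / batch;
    for (std::size_t n = 0; n < batch; ++n)
        for (std::size_t i = 0; i < perItem; ++i) {
            const std::size_t idx = n * perItem + i;
            if (std::fabs(output[idx] - reference[idx]) > tol)
                return static_cast<int>(n);
        }
    return -1;
}
```

A TEST_F case could then assert that the helper returns -1 for the batch-2, batch-4, and batch-8 models. Reporting the failing batch index rather than a flat element index makes it easier to tell a wrong per-sample offset (item 0 passes, later items fail) from a general numerical problem.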