In the Generate_GPU_ALPAKA function of ROperator_Conv.hxx (around line 849), the WeightVecKernel launch is currently emitted inside the _infer_impl body. The kernel reorders the weights W into the dilated layout _f that the GEMM consumes. Since the weights are constant once they are loaded from the .dat file in the session constructor, this reordering doesn't need to happen on every infer() call.
The base operator already exposes GenerateInitCode_GPU_ALPAKA() (ROperator.hxx:71), which is called once per operator from inside the session constructor body in RModel_ALPAKA.cxx:641-646 (right after the weights get uploaded to the device). This seems like the right hook to move the launch into.
That said, I think it's open to discussion whether this is actually worth doing in practice. The win is one kernel launch and one alpaka::wait(queue) skipped per inference, which is modest on its own but might matter for latency-sensitive trigger inference at batch=1. Curious whether there's a reason it was put inside infer() originally that I might be missing.
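
To make the proposal concrete, here is a minimal host-only sketch of the pattern, not the actual generated SOFIE code. ToyConvSession, reorder_weights and reorder_calls are hypothetical names, and a 1D zero-insertion stands in for what WeightVecKernel does on the device. The only point is that the reordering runs once at construction and infer() just consumes the cached layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated session (assumption, not SOFIE code).
struct ToyConvSession {
    std::vector<float> W;      // weights as loaded (stand-in for the .dat contents)
    std::vector<float> f;      // dilated layout consumed by the "GEMM"
    std::size_t dilation;
    int reorder_calls = 0;     // counts how often the reordering ran

    ToyConvSession(std::vector<float> weights, std::size_t d)
        : W(std::move(weights)), dilation(d) {
        // One-time step, analogous to emitting the launch from the init hook.
        reorder_weights();
    }

    // Stand-in for the weight-reordering kernel: insert (dilation - 1) zeros
    // between consecutive weights.
    void reorder_weights() {
        f.clear();
        if (!W.empty()) {
            f.assign((W.size() - 1) * dilation + 1, 0.0f);
            for (std::size_t i = 0; i < W.size(); ++i) {
                f[i * dilation] = W[i];
            }
        }
        ++reorder_calls;
    }

    // Inference only reads the cached layout; no reordering happens here.
    float infer(const std::vector<float>& x) const {
        float acc = 0.0f;
        const std::size_t n = std::min(x.size(), f.size());
        for (std::size_t i = 0; i < n; ++i) {
            acc += x[i] * f[i];
        }
        return acc;
    }
};
```

Constructing a ToyConvSession and calling infer() repeatedly leaves reorder_calls at 1. In the real generator, the equivalent change would presumably be to emit the launch (and its wait) from GenerateInitCode_GPU_ALPAKA() instead of from the _infer_impl body, assuming nothing modifies the weights after construction.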