Move WeightVecKernel out of infer() into session init #25

In the `Generate_GPU_ALPAKA` function of `ROperator_Conv.hxx` (around line 849), the `WeightVecKernel` launch is currently emitted inside the `_infer_impl` body. The kernel reorders the weights `W` into the dilated layout `_f` that the GEMM consumes. Since the weights are constant once they have been loaded from the `.dat` file in the session constructor, this reordering doesn't need to happen on every `infer()` call.

The base operator already exposes `GenerateInitCode_GPU_ALPAKA()` in `ROperator.hxx:71`, called once per operator from inside the session constructor body in `RModel_ALPAKA.cxx:641-646` (right after weights get uploaded to device). This seems like the right hook to move this launch into.
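To make the proposal concrete, here is a minimal host-only sketch of the intended split. It is not SOFIE code: `MockSession`, `ReorderWeights` and `reorderCount` are hypothetical names, the reversal is just a placeholder for whatever `WeightVecKernel` actually does, and there is no Alpaka queue involved. It only illustrates that the constant-weight transform runs once in the constructor, while `infer()` reuses the cached result.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated session class (not SOFIE's real one).
class MockSession {
public:
    explicit MockSession(std::vector<float> weights)
        : fW(std::move(weights)), fF(fW.size()) {
        // One-time work: the part that would be emitted by the init hook.
        ReorderWeights();
    }

    // Per-call work: only what depends on the input. No reordering here.
    float infer(const std::vector<float>& x) const {
        assert(x.size() == fF.size());
        float acc = 0.f;
        for (std::size_t i = 0; i < fF.size(); ++i) {
            acc += fF[i] * x[i];  // consumes the cached, already reordered weights
        }
        return acc;
    }

    // How many times the reordering has run (for illustration only).
    int reorderCount() const { return fReorderCount; }

private:
    void ReorderWeights() {
        // Placeholder permutation (a reversal) standing in for the real
        // W -> _f layout transform, which is assumed here, not reproduced.
        const std::size_t n = fW.size();
        for (std::size_t i = 0; i < n; ++i) {
            fF[i] = fW[n - 1 - i];
        }
        ++fReorderCount;
    }

    std::vector<float> fW;  // weights as loaded
    std::vector<float> fF;  // reordered weights, computed once
    int fReorderCount = 0;
};
```

With this shape, constructing a `MockSession` and calling `infer()` any number of times leaves `reorderCount()` at 1. In the real generator the equivalent change would presumably be to emit the `WeightVecKernel` launch (and its wait) from the init hook instead of from the `_infer_impl` body.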

That said, I think it's open to discussion whether this is actually worth doing in practice. The win is one kernel launch and one `alpaka::wait(queue)` skipped per inference, which is modest on its own but might matter for latency-sensitive trigger inference at batch=1. I'm curious whether there's a reason it was originally put inside `infer()` that I might be missing.

