Move WeightVecKernel out of infer() into session init #25

Description

@harz05

In the Generate_GPU_ALPAKA function of ROperator_Conv.hxx (around line 849), the WeightVecKernel launch is currently emitted inside the _infer_impl body. The kernel reorders the weights W into the dilated layout _f that the GEMM consumes. Since the weights are constant once they are loaded from the .dat file in the session constructor, this reordering does not need to happen on every infer() call.

The base operator already exposes GenerateInitCode_GPU_ALPAKA() in ROperator.hxx:71, called once per operator from inside the session constructor body in RModel_ALPAKA.cxx:641-646 (right after weights get uploaded to device). This seems like the right hook to move this launch into.

But I think it's open to discussion whether this is actually worth doing in practice. The win is one kernel launch and one alpaka::wait(queue) skipped per inference, which is modest on its own but might matter for latency-sensitive trigger inference at batch=1. Curious if there's a reason it was originally put inside infer() that I might be missing.
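
To make the proposal concrete, here is a minimal host-only sketch of the idea. It is not the generated SOFIE code and uses no alpaka calls. `ReorderWeights` and `SketchSession` are hypothetical names, and the 1D zero-filled dilation is an assumed stand-in for what WeightVecKernel does. The point is only that the reordered buffer is built once at construction and reused by every infer().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for WeightVecKernel (assumption, not the real kernel):
// spreads a 1D kernel of size k into a zero-filled buffer of size
// (k - 1) * dilation + 1, placing w[i] at index i * dilation.
std::vector<float> ReorderWeights(const std::vector<float>& w, std::size_t dilation) {
    assert(!w.empty() && dilation >= 1);
    std::vector<float> f((w.size() - 1) * dilation + 1, 0.0f);
    for (std::size_t i = 0; i < w.size(); ++i) {
        f[i * dilation] = w[i];
    }
    return f;
}

// Hypothetical session: the reordering runs once in the constructor,
// mirroring the move from _infer_impl to the init hook.
struct SketchSession {
    std::vector<float> f;   // dilated weights, built once at init
    int reorderCalls = 0;   // counts how often the reordering ran

    SketchSession(const std::vector<float>& w, std::size_t dilation)
        : f(ReorderWeights(w, dilation)), reorderCalls(1) {}

    // "Valid" 1D correlation against the prebuilt buffer; no reordering here.
    std::vector<float> infer(const std::vector<float>& x) const {
        assert(x.size() >= f.size());
        std::vector<float> y(x.size() - f.size() + 1, 0.0f);
        for (std::size_t o = 0; o < y.size(); ++o) {
            for (std::size_t j = 0; j < f.size(); ++j) {
                y[o] += f[j] * x[o + j];
            }
        }
        return y;
    }
};
```

In this sketch, calling infer() any number of times leaves reorderCalls at 1, which is the behaviour the move to GenerateInitCode_GPU_ALPAKA() would aim for on the device side.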
