[improvement] Batch the non-grouped Conv GEMM with gemmStridedBatched instead of a per sample loop #29

Description

@harz05

In Generate_GPU_ALPAKA of ROperator_Conv.hxx (the batch loop, around line 867), the non-grouped path runs one blas.matmul per batch sample. Each iteration does im2col into the shared _xcol buffer, broadcasts bias, calls matmul, then calls alpaka::wait(queue) before the next sample, since the next im2col would otherwise overwrite _xcol while the GEMM is still reading it. For batch size B that is B separate small GEMMs and roughly 3B sync points.

For every sample the weight _f is identical; only the im2col input and the output slice change. This is exactly the case gemmStridedBatched handles. It already exists in sofieBLAS dev (commit fa108fb) and the Gemm operator uses it for the stacked MatMul case (ROperator_Gemm.hxx, useSBatched path around line 665), so there is a working reference for how to call it.

Proposed change: give each sample its own slice of _xcol so they don't alias, im2col each sample into its slice, then replace the B matmul calls with a single gemmStridedBatched over all samples.
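To make the "own slice per sample" idea concrete, here is a rough CPU-side sketch of the layout, not the actual alpaka kernel. It assumes stride 1, no padding, NCHW input, and a column-major gemm_m x gemm_k im2col matrix per sample; the function names im2col_into_slice and im2col_batched are made up for illustration and do not exist in SOFIE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: im2col for ONE sample (stride 1, no padding),
// written into a caller-provided slice of gemm_m * gemm_k floats.
// Assumed layout: column-major, element (s, q) lives at slice[s + q * gemm_m],
// with s = output pixel index and q = (c * kH + kh) * kW + kw.
inline void im2col_into_slice(const float* x, std::size_t inC,
                              std::size_t H, std::size_t W,
                              std::size_t kH, std::size_t kW, float* slice)
{
    assert(H >= kH && W >= kW);
    const std::size_t outH = H - kH + 1, outW = W - kW + 1;
    const std::size_t gemm_m = outH * outW;
    for (std::size_t c = 0; c < inC; ++c)
        for (std::size_t kh = 0; kh < kH; ++kh)
            for (std::size_t kw = 0; kw < kW; ++kw) {
                const std::size_t q = (c * kH + kh) * kW + kw;
                for (std::size_t oh = 0; oh < outH; ++oh)
                    for (std::size_t ow = 0; ow < outW; ++ow)
                        slice[(oh * outW + ow) + q * gemm_m] =
                            x[(c * H + oh + kh) * W + ow + kw];
            }
}

// Hypothetical driver: every sample gets its own non-aliasing slice,
// so no sample's im2col can overwrite data another GEMM still needs.
inline std::vector<float> im2col_batched(const std::vector<float>& x,
                                         std::size_t batch, std::size_t inC,
                                         std::size_t H, std::size_t W,
                                         std::size_t kH, std::size_t kW)
{
    assert(x.size() == batch * inC * H * W);
    const std::size_t gemm_m = (H - kH + 1) * (W - kW + 1);
    const std::size_t gemm_k = inC * kH * kW;
    const std::size_t colElements = gemm_m * gemm_k;  // slice size per sample
    std::vector<float> xcol(batch * colElements);
    for (std::size_t b = 0; b < batch; ++b)
        im2col_into_slice(x.data() + b * inC * H * W, inC, H, W, kH, kW,
                          xcol.data() + b * colElements);
    return xcol;
}
```

Sample b then starts at offset b * colElements in the buffer, which is the value the batched GEMM would take as its A stride.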

The GEMM per sample is Y = Xcol * W with m = gemm_m (output spatial), n = gemm_n (output channels), k = gemm_k (inC * kH * kW). Strides for the batched call:

  • A = _xcol, strideA = colElements (gemm_m * gemm_k), since each sample has its own im2col
  • B = _f, strideB = 0, since the weight is shared across samples
  • C = _Y, strideC = gemm_n * gemm_m, each sample's output block
  • batchCount = B
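As a sanity check on these strides, here is a small CPU reference of what a strided batched GEMM computes. It is only a sketch: gemm_strided_batched_ref is a hypothetical stand-in, and its argument order and column-major convention are assumptions, not the sofieBLAS signature. It shows that strideB = 0 makes every batch read the same weight, and that one batched call gives the same result as the per-sample loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (column-major, no transposes). For each batch b:
//   C_b (m x n) = alpha * A_b (m x k) * B_b (k x n) + beta * C_b
// where X_b = X + b * strideX. A stride of 0 reuses the same matrix.
inline void gemm_strided_batched_ref(std::size_t m, std::size_t n, std::size_t k,
                                     float alpha,
                                     const float* A, std::size_t strideA,
                                     const float* B, std::size_t strideB,
                                     float beta,
                                     float* C, std::size_t strideC,
                                     std::size_t batchCount)
{
    for (std::size_t b = 0; b < batchCount; ++b) {
        const float* Ab = A + b * strideA;
        const float* Bb = B + b * strideB;
        float* Cb = C + b * strideC;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) {
                float acc = 0.0f;
                for (std::size_t p = 0; p < k; ++p)
                    acc += Ab[i + p * m] * Bb[p + j * k];
                Cb[i + j * m] = alpha * acc + beta * Cb[i + j * m];
            }
    }
}

// Compares one batched call (strides as listed above) against B separate
// single-sample calls on small made-up data.
inline bool batched_matches_per_sample_loop()
{
    const std::size_t batch = 3, gemm_m = 4, gemm_n = 2, gemm_k = 5;
    const std::size_t colElements = gemm_m * gemm_k;  // strideA
    const std::size_t outElements = gemm_n * gemm_m;  // strideC
    std::vector<float> xcol(batch * colElements), f(gemm_k * gemm_n);
    for (std::size_t i = 0; i < xcol.size(); ++i) xcol[i] = float(i % 7) - 3.0f;
    for (std::size_t i = 0; i < f.size(); ++i) f[i] = float(i % 5) * 0.5f;

    // Both outputs start at 1.0f, standing in for a pre-broadcast bias.
    std::vector<float> yBatched(batch * outElements, 1.0f);
    std::vector<float> yLooped(batch * outElements, 1.0f);

    gemm_strided_batched_ref(gemm_m, gemm_n, gemm_k, 1.0f,
                             xcol.data(), colElements,
                             f.data(), 0,  // shared weight
                             1.0f, yBatched.data(), outElements, batch);

    for (std::size_t b = 0; b < batch; ++b)
        gemm_strided_batched_ref(gemm_m, gemm_n, gemm_k, 1.0f,
                                 xcol.data() + b * colElements, 0,
                                 f.data(), 0,
                                 1.0f, yLooped.data() + b * outElements, 0, 1);
    return yBatched == yLooped;
}
```

Whether the real call needs transposes or swapped operands depends on the row/column-major convention used in ROperator_Conv.hxx, so the useSBatched path in ROperator_Gemm.hxx remains the reference for the actual argument order.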

Bias still works: Conv already broadcasts bias into the output with a separate kernel, then the GEMM accumulates with beta = 1. With per-sample slices the inter-sample alpaka::wait calls also go away.
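A tiny CPU illustration of why the bias survives, assuming the same column-major gemm_m x gemm_n output block per sample: the output is pre-filled with the per-channel bias, and a GEMM with beta = 1 adds the product on top instead of overwriting it. The names broadcast_bias and gemm_accumulate are hypothetical, not the existing kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the bias kernel: fill every sample's
// gemm_m x gemm_n block (column-major) with bias[c] for output channel c.
inline void broadcast_bias(std::vector<float>& y, const std::vector<float>& bias,
                           std::size_t batch, std::size_t gemm_m)
{
    const std::size_t gemm_n = bias.size();
    assert(y.size() == batch * gemm_m * gemm_n);
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t c = 0; c < gemm_n; ++c)
            for (std::size_t s = 0; s < gemm_m; ++s)
                y[(b * gemm_n + c) * gemm_m + s] = bias[c];
}

// Y = 1 * Xcol * W + 1 * Y for one sample, i.e. alpha = 1 and beta = 1.
inline void gemm_accumulate(const float* xcol, const float* w, float* y,
                            std::size_t gemm_m, std::size_t gemm_n,
                            std::size_t gemm_k)
{
    for (std::size_t j = 0; j < gemm_n; ++j)
        for (std::size_t i = 0; i < gemm_m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < gemm_k; ++p)
                acc += xcol[i + p * gemm_m] * w[p + j * gemm_k];
            y[i + j * gemm_m] += acc;  // beta = 1 keeps the bias already in y
        }
}

// One sample, 2 output pixels, 2 output channels, gemm_k = 1.
inline std::vector<float> conv_with_bias_demo()
{
    const std::size_t gemm_m = 2, gemm_n = 2, gemm_k = 1;
    const std::vector<float> xcol = {1.0f, 2.0f};
    const std::vector<float> w = {3.0f, 4.0f};
    const std::vector<float> bias = {10.0f, 20.0f};
    std::vector<float> y(gemm_m * gemm_n);
    broadcast_bias(y, bias, 1, gemm_m);
    gemm_accumulate(xcol.data(), w.data(), y.data(), gemm_m, gemm_n, gemm_k);
    return y;
}
```

In the batched version the bias broadcast would presumably cover all B output blocks before the single batched call, which is compatible with beta = 1 as long as it completes first on the same queue.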

The tradeoff is that _xcol (registered around line 322) has to grow by a factor of B so that all samples' im2col buffers can coexist.
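To get a feel for the size of that buffer, a back-of-the-envelope helper like the one below can be used. The struct and function names are hypothetical, and the layer shape in the usage note is an illustrative example, not a measurement from a real model.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for _xcol with one shared slice
// versus one slice per sample.
struct XcolFootprint {
    std::size_t perSampleBytes;  // current scheme: one shared slice
    std::size_t batchedBytes;    // proposed scheme: batch slices
};

inline XcolFootprint xcol_footprint(std::size_t batch, std::size_t gemm_m,
                                    std::size_t gemm_k,
                                    std::size_t elemSize = sizeof(float))
{
    assert(batch > 0);
    const std::size_t colElements = gemm_m * gemm_k;
    return {colElements * elemSize, batch * colElements * elemSize};
}
```

For example, with B = 32, a 56 x 56 output (gemm_m = 3136), 64 input channels and a 3 x 3 kernel (gemm_k = 576), and 4-byte floats, one slice is 7,225,344 bytes (about 6.9 MiB) and the batched buffer is 231,211,008 bytes (about 220 MiB). If that is too much for large layers, processing the batch in chunks with a smaller batchCount would be one possible middle ground.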
