Open
Commits
51 commits
300f21b
Initial implementation for ALPAKA integration to SOFIE
sanjibansg Apr 11, 2025
fc9846c
GPU ALPAKA Support in GEMM
sanjibansg Apr 11, 2025
419b354
fix: errors with the generation function
sanjibansg Apr 11, 2025
6481c05
fix: defining intermediate and initialized tensors
sanjibansg Aug 18, 2025
e31303f
feat: use sofieblas efficiently and add leaky relu, sigmoid support
sanjibansg Oct 14, 2025
afae7c3
feat: add basic binary kernel
sanjibansg Oct 17, 2025
c845fe7
feat: add cast kernel
sanjibansg Oct 17, 2025
3d9f812
feat: add squeeze, unsqueeze, flatten and reshape
sanjibansg Oct 21, 2025
284405e
feat: add support for basic unary
sanjibansg Oct 21, 2025
d64a40f
feat: add support for Constant operator
sanjibansg Oct 21, 2025
ac8d662
feat: add support for shape operator
sanjibansg Oct 22, 2025
d75eac3
feat: add support for Basic Binary operations
sanjibansg Nov 20, 2025
f7e44ad
fix: compilation issues due to faulty rebase
sanjibansg Nov 23, 2025
b4cd917
fix: parameteric inputs for range operator
sanjibansg Nov 23, 2025
2016794
fix: linking issue because of incorrect symbols
sanjibansg Nov 24, 2025
f35d9d9
fix: cmake script for tests
sanjibansg Nov 27, 2025
cdc6a9f
fix: define failures in EmitFromRoot.cxx.in (#6)
Saransh-cpp Dec 2, 2025
3ffbe46
fix: layout inconsistencies in alpaka code generation
sanjibansg Dec 14, 2025
1979a11
feat: turn off emitting from ROOT files and skip tests with multiple …
sanjibansg Dec 15, 2025
59aeac4
feat: support for google tests for inference code with alpaka impleme…
sanjibansg Dec 15, 2025
815a80c
feat: test cases for leaky relu operator
sanjibansg Jan 26, 2026
671b4b0
fix: sigmoid operator gpu implementation and test
sanjibansg Jan 26, 2026
2fe07bb
feat: Support for heterogeneous inference of transpose operator
sanjibansg Mar 3, 2026
ccebeb3
feat: Support for heterogeneous inference on concat operator
sanjibansg Mar 5, 2026
4b179ea
feat: Support for heterogeneous inference on scatter elements operator
sanjibansg Mar 5, 2026
50d478e
fix: split operator implementation and multiple buffer return
sanjibansg Mar 9, 2026
5af24cf
feat: Support for heterogeneous inference on tile operator
sanjibansg Mar 12, 2026
6dac663
feat: tests for heterogeneous inference of Gather operator
sanjibansg Mar 13, 2026
0d205e9
feat: tests for heterogeneous inference of Expand operator
sanjibansg Mar 16, 2026
1f0ebc6
feat: Support for heterogeneous inference on gathernd operator
sanjibansg Mar 16, 2026
19cdc11
fix: tensor management for expand and cast operators
sanjibansg Mar 19, 2026
c3b892a
feat: alpaka wait only before blas calls
sanjibansg Mar 30, 2026
ba9643e
feat: Support for heterogeneous inference on comparison operators
sanjibansg Apr 13, 2026
6ba3f29
feat: Support for heterogeneous inference for slice and unary operators
sanjibansg Apr 13, 2026
7f27cd4
feat: Support for heterogeneous inference for where operator
sanjibansg Apr 13, 2026
860d34f
feat: Support for heterogeneous inference for reduce and softplus ope…
sanjibansg Apr 13, 2026
4433ca2
feat: Make ROOT usage optional
sanjibansg Apr 28, 2026
dae834e
feat: Support for heterogeneous inference for Conv operator
sanjibansg May 7, 2026
2b6ede3
feat: infer methods using alpaka views
sanjibansg May 11, 2026
ff9d6cf
feat: Support for inference on batchnorm operator
sanjibansg May 11, 2026
16d8fab
feat: fusion
sanjibansg May 19, 2026
21740c6
fix: several fixes with operator initialization and cpu generation
sanjibansg May 21, 2026
2a98dca
feat: project restructure and benchmark tool
sanjibansg May 24, 2026
b7e5178
feat: Trilu and Logic operator implementations, several GPU optimizat…
sanjibansg May 27, 2026
dbd19b6
feat: structural change, add github ci workflow
sanjibansg Jun 16, 2026
2242c25
fix: ci container setup for tests and benchmarking
sanjibansg Jun 17, 2026
f9a4a85
feat: profiler support for inference on heterogeneous inference
sanjibansg Jun 17, 2026
7e177be
fix: self hosted runners for github ci tests and benchmark
sanjibansg Jun 19, 2026
c51ab63
test: fix ci
gigabyte132 Jun 19, 2026
7e3351f
test: fix ci
gigabyte132 Jun 19, 2026
2ecf981
fix: temp suspend unbound variables errors in CI
sanjibansg Jun 24, 2026
158 changes: 158 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,158 @@
name: Benchmark

on:
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      warmup:
        description: "Warmup iterations"
        default: "5"
      iterations:
        description: "Timed iterations"
        default: "100"

concurrency:
  group: benchmark-${{ github.ref }}
  cancel-in-progress: true

env:
  LCG_VIEW: /cvmfs/sft.cern.ch/lcg/views/LCG_106a/x86_64-el9-gcc13-opt
  CUDA_CVMFS: /cvmfs/sft.cern.ch/lcg/contrib/cuda/12.4/x86_64-el9
  CUDA_ARCH: "90"
  BUILD_TYPE: Release
  BENCH_WARMUP: ${{ github.event.inputs.warmup || '5' }}
  BENCH_ITERS: ${{ github.event.inputs.iterations || '100' }}
  DEPS_CACHE: /tmp/sofie-cmake-deps

jobs:
  benchmark:
    name: Benchmark Comparison (H100)
    runs-on: ml4ep-h100
    container: registry.cern.ch/ngt/lxplus-like:9
    timeout-minutes: 120

    steps:
      - name: GPU check
        run: nvidia-smi

      - name: Setup build environment
        run: |
          set -euo pipefail

          if [ -f "${{ env.LCG_VIEW }}/setup.sh" ]; then
            set +u; source "${{ env.LCG_VIEW }}/setup.sh"; set -u
          else
            echo "LCG view not found — installing from dnf"
            dnf install -y epel-release
            dnf install -y cmake ninja-build gcc-c++ python3 git \
              protobuf-devel openblas-devel
          fi

          if [ -x "${{ env.CUDA_CVMFS }}/bin/nvcc" ]; then
            echo "${{ env.CUDA_CVMFS }}/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=${{ env.CUDA_CVMFS }}" >> "$GITHUB_ENV"
          elif [ -x /usr/local/cuda/bin/nvcc ]; then
            echo "/usr/local/cuda/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=/usr/local/cuda" >> "$GITHUB_ENV"
          else
            dnf config-manager --add-repo \
              https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
            dnf install -y cuda-compiler-12-4 cuda-cudart-devel-12-4 cuda-libraries-devel-12-4
            echo "/usr/local/cuda-12.4/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=/usr/local/cuda-12.4" >> "$GITHUB_ENV"
          fi

          echo "PATH=$PATH" >> "$GITHUB_ENV"
          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}" >> "$GITHUB_ENV"
          echo "CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH:-}" >> "$GITHUB_ENV"

      - name: Checkout PR branch
        uses: actions/checkout@v4
        with:
          path: sofie-pr

      - name: Checkout main branch
        if: github.event_name == 'pull_request'
        uses: actions/checkout@v4
        with:
          ref: main
          path: sofie-main

      - name: Cache FetchContent dependencies
        uses: actions/cache@v4
        with:
          path: ${{ env.DEPS_CACHE }}
          key: cmake-deps-bench-${{ hashFiles('sofie-pr/benchmark/CMakeLists.txt') }}
          restore-keys: cmake-deps-bench-

      - name: Configure PR build
        run: |
          cmake -B sofie-pr/build -S sofie-pr \
            -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} \
            -DSOFIE_WITH_ROOT=OFF \
            -DSOFIE_BENCHMARK=ON \
            -DSOFIE_BENCHMARK_BACKEND=CUDA \
            "-DSOFIE_BENCHMARK_CUDA_ARCH=${{ env.CUDA_ARCH }}" \
            "-DCMAKE_CUDA_ARCHITECTURES=${{ env.CUDA_ARCH }}" \
            "-DFETCHCONTENT_BASE_DIR=${{ env.DEPS_CACHE }}"

      - name: Build PR benchmark
        run: cmake --build sofie-pr/build --target sofie_benchmark -j$(nproc)

      - name: Run PR benchmark
        working-directory: sofie-pr/build/benchmark
        run: |
          ./sofie_benchmark \
            -w ${{ env.BENCH_WARMUP }} \
            -n ${{ env.BENCH_ITERS }} \
            | tee benchmark_pr.txt

      - name: Configure main build
        if: github.event_name == 'pull_request'
        run: |
          cmake -B sofie-main/build -S sofie-main \
            -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} \
            -DSOFIE_WITH_ROOT=OFF \
            -DSOFIE_BENCHMARK=ON \
            -DSOFIE_BENCHMARK_BACKEND=CUDA \
            "-DSOFIE_BENCHMARK_CUDA_ARCH=${{ env.CUDA_ARCH }}" \
            "-DCMAKE_CUDA_ARCHITECTURES=${{ env.CUDA_ARCH }}" \
            "-DFETCHCONTENT_BASE_DIR=${{ env.DEPS_CACHE }}"

      - name: Build main benchmark
        if: github.event_name == 'pull_request'
        run: cmake --build sofie-main/build --target sofie_benchmark -j$(nproc)

      - name: Run main benchmark
        if: github.event_name == 'pull_request'
        working-directory: sofie-main/build/benchmark
        run: |
          ./sofie_benchmark \
            -w ${{ env.BENCH_WARMUP }} \
            -n ${{ env.BENCH_ITERS }} \
            | tee benchmark_main.txt

      - name: Summarise PR vs main
        if: github.event_name == 'pull_request'
        run: |
          echo "### Benchmark comparison: PR vs main" >> "$GITHUB_STEP_SUMMARY"
          echo "" >> "$GITHUB_STEP_SUMMARY"
          echo '```' >> "$GITHUB_STEP_SUMMARY"
          echo "── PR ──────────────────────────────────────────────────────────────" \
            >> "$GITHUB_STEP_SUMMARY"
          cat sofie-pr/build/benchmark/benchmark_pr.txt >> "$GITHUB_STEP_SUMMARY"
          echo "" >> "$GITHUB_STEP_SUMMARY"
          echo "── main ────────────────────────────────────────────────────────────" \
            >> "$GITHUB_STEP_SUMMARY"
          cat sofie-main/build/benchmark/benchmark_main.txt >> "$GITHUB_STEP_SUMMARY"
          echo '```' >> "$GITHUB_STEP_SUMMARY"

      - name: Upload benchmark results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ github.run_id }}
          path: |
            sofie-pr/build/benchmark/benchmark_pr.txt
            sofie-main/build/benchmark/benchmark_main.txt
113 changes: 113 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,113 @@
name: Unit Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: tests-${{ github.ref }}
  cancel-in-progress: true

env:
  LCG_VIEW: /cvmfs/sft.cern.ch/lcg/views/LCG_106a/x86_64-el9-gcc13-opt
  CUDA_CVMFS: /cvmfs/sft.cern.ch/lcg/contrib/cuda/12.4/x86_64-el9
  CUDA_ARCH: "90"
  BUILD_TYPE: Release
  DEPS_CACHE: /tmp/sofie-cmake-deps

jobs:
  gpu-tests:
    name: GPU Unit Tests (NVIDIA/H100)
    runs-on: ml4ep-h100
    container: registry.cern.ch/ngt/lxplus-like:9
    timeout-minutes: 60

    steps:
      - name: GPU check
        run: nvidia-smi

      - name: Setup build environment
        run: |
          set -euo pipefail

          # LCG view (cmake, gcc-13, protobuf, openblas)
          if [ -f "${{ env.LCG_VIEW }}/setup.sh" ]; then
            set +u; source "${{ env.LCG_VIEW }}/setup.sh"; set -u
          else
            echo "LCG view not found — installing from dnf"
            dnf install -y epel-release
            dnf install -y cmake ninja-build gcc-c++ python3 git \
              protobuf-devel openblas-devel
          fi

          # CUDA toolkit (nvcc + headers)
          if [ -x "${{ env.CUDA_CVMFS }}/bin/nvcc" ]; then
            echo "${{ env.CUDA_CVMFS }}/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=${{ env.CUDA_CVMFS }}" >> "$GITHUB_ENV"
          elif [ -x /usr/local/cuda/bin/nvcc ]; then
            echo "/usr/local/cuda/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=/usr/local/cuda" >> "$GITHUB_ENV"
          else
            echo "nvcc not found — installing CUDA toolkit from NVIDIA repo"
            dnf config-manager --add-repo \
              https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
            dnf install -y cuda-compiler-12-4 cuda-cudart-devel-12-4 cuda-libraries-devel-12-4
            echo "/usr/local/cuda-12.4/bin" >> "$GITHUB_PATH"
            echo "CUDA_HOME=/usr/local/cuda-12.4" >> "$GITHUB_ENV"
          fi

          # GTest
          dnf install -y gtest-devel 2>/dev/null || \
          dnf install -y googletest-devel 2>/dev/null || (
            cd /tmp
            git clone --depth 1 -b v1.14.0 https://github.com/google/googletest.git
            cmake -B gtest-build -S googletest \
              -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_SHARED_LIBS=ON
            cmake --build gtest-build -j$(nproc)
            cmake --install gtest-build
          )

          echo "PATH=$PATH" >> "$GITHUB_ENV"
          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}" >> "$GITHUB_ENV"
          echo "CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH:-}" >> "$GITHUB_ENV"

      - name: Checkout
        uses: actions/checkout@v4

      - name: Cache FetchContent dependencies
        uses: actions/cache@v4
        with:
          path: ${{ env.DEPS_CACHE }}
          key: cmake-deps-tests-${{ hashFiles('test/CMakeLists.txt') }}
          restore-keys: cmake-deps-tests-

      - name: Configure
        run: |
          cmake -B build -S . \
            -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} \
            -DSOFIE_WITH_ROOT=OFF \
            -Dtesting=ON \
            -DENABLE_ALPAKA_TESTS=ON \
            -DALPAKA_BACKEND=cuda \
            "-DCMAKE_CUDA_ARCHITECTURES=${{ env.CUDA_ARCH }}" \
            "-DFETCHCONTENT_BASE_DIR=${{ env.DEPS_CACHE }}"

      - name: Build tests
        run: |
          cmake --build build \
            --target TestCustomModelsFromONNXForAlpakaCuda \
            -j$(nproc)

      - name: Run tests
        working-directory: build
        run: ctest --output-on-failure -j1

      - name: Upload test log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-log-${{ github.run_id }}
          path: build/Testing/Temporary/LastTest.log
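
Review note: both workflows in this diff repeat the same nvcc lookup in their "Setup build environment" step (the CVMFS path first, then /usr/local/cuda, then a dnf install as a last resort). The sketch below shows one way that lookup could be factored into a shared shell helper. The function name `find_cuda_home` is hypothetical and is not part of this PR; the candidate directories are passed in by the caller rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): print the first candidate directory
# that contains an executable bin/nvcc, mirroring the if/elif chain used in
# both workflows. Returns non-zero when no candidate qualifies, so the
# caller can fall back to installing the toolkit.
find_cuda_home() {
    local candidate
    for candidate in "$@"; do
        if [ -x "${candidate}/bin/nvcc" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}
```

Inside a workflow step this might be called as `CUDA_HOME="$(find_cuda_home "$CUDA_CVMFS" /usr/local/cuda)"`, with the existing dnf installation kept as the branch taken when the helper returns non-zero. Keeping the lookup in one place would avoid the two copies drifting apart, at the cost of an extra script to check out before the setup step runs.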