Implement AVX2 SIMD optimization #10

5000user5000 · 2025-11-09T09:19:53Z

Implement AVX2 SIMD acceleration

Add l2_simd with AVX2 intrinsics
Enable -march=native for vectorization

5000user5000

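Implement AVX SIMD acceleration

Add -march=native to enable the CPU's supported instruction sets
Add an AVX SIMD implementation alongside the existing L2 computation
Replace calls to the old L2 function with the new one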

5000user5000 · 2025-11-09T09:22:05Z

Makefile

@@ -1,5 +1,5 @@
 CXX := g++
-CXXFLAGS := -std=c++17 -O3 -fPIC -fopenmp
+CXXFLAGS := -std=c++17 -O3 -fPIC -march=native -fopenmp


Add -march=native to enable the instruction sets supported by the build machine's CPU.
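
With `-march=native`, the compiler predefines feature macros such as `__AVX2__` for whatever the build machine supports, and those macros decide which branch of the `#if` in `l2_simd` gets compiled. A minimal sketch for checking this at build time follows; the helper name `simd_build_features` is hypothetical and not part of this PR.

```cpp
#include <string>

// Hypothetical helper (not part of this PR): lists the SIMD feature macros
// that the compiler predefined for the current build flags.
inline std::string simd_build_features() {
    std::string out;
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
#if defined(__FMA__)
    out += "FMA ";
#endif
#if defined(__AVX512F__)
    out += "AVX512F ";
#endif
    if (out.empty()) out = "none";
    return out;
}
```

Printing the result once at startup would show which instruction sets the binary was built for. A binary built this way can fail with an illegal-instruction error on a CPU that lacks the build machine's extensions, so it is best run on the same class of hardware it was compiled on.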

5000user5000 · 2025-11-09T09:23:03Z

include/zenann/SimdUtils.h

+inline float l2_simd(const float* __restrict a,
+                     const float* __restrict b,
+                     size_t dim) {
+#if defined(__AVX2__)
+    const size_t step = 8;            // 8 × 32-bit floats
+    __m256 acc       = _mm256_setzero_ps();
+    size_t i         = 0;
+    for (; i + step - 1 < dim; i += step) {
+        __m256 va   = _mm256_loadu_ps(a + i);
+        __m256 vb   = _mm256_loadu_ps(b + i);
+        __m256 diff = _mm256_sub_ps(va, vb);
+        acc         = _mm256_fmadd_ps(diff, diff, acc);   // acc += diff²
+    }
+    float buf[step];
+    _mm256_storeu_ps(buf, acc);
+    float d = 0.f;
+    for (int j = 0; j < step; ++j) d += buf[j];
+
+    for (; i < dim; ++i) {
+        float diff = a[i] - b[i];
+        d += diff * diff;
+    }
+    return d;
+#else
    float d = 0.f;
    for (size_t i = 0; i < dim; ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
+#endif


Adds an AVX SIMD version of the L2 computation. If AVX2 is not supported, it falls back to the original scalar version.
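
One possible variant of the kernel above, offered as a sketch rather than as the merged code: `_mm256_fmadd_ps` is an FMA intrinsic, so this version checks `__FMA__` together with `__AVX2__`, and it reduces the accumulator inside registers instead of storing it to a temporary buffer. The names `l2_simd_sketch` and `l2_scalar_ref` are hypothetical.

```cpp
#include <cstddef>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Scalar reference: squared L2 distance (no square root), as in the fallback path.
inline float l2_scalar_ref(const float* a, const float* b, std::size_t dim) {
    float d = 0.f;
    for (std::size_t i = 0; i < dim; ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Hypothetical variant: vector path only when both AVX2 and FMA are enabled.
inline float l2_simd_sketch(const float* a, const float* b, std::size_t dim) {
    std::size_t i = 0;
    float d = 0.f;
#if defined(__AVX2__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= dim; i += 8) {               // 8 floats per iteration
        __m256 diff = _mm256_sub_ps(_mm256_loadu_ps(a + i),
                                    _mm256_loadu_ps(b + i));
        acc = _mm256_fmadd_ps(diff, diff, acc);  // acc += diff * diff
    }
    // Horizontal sum: 8 lanes -> 4 lanes -> 2 lanes -> 1 lane.
    __m128 lo  = _mm256_castps256_ps128(acc);
    __m128 hi  = _mm256_extractf128_ps(acc, 1);
    __m128 sum = _mm_add_ps(lo, hi);
    sum = _mm_add_ps(sum, _mm_movehl_ps(sum, sum));
    sum = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x1));
    d = _mm_cvtss_f32(sum);
#endif
    for (; i < dim; ++i) {                       // scalar tail (or full fallback)
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}
```

The vector path adds terms in a different order than the scalar loop, so for general floating-point inputs the two results can differ in the last bits. A comparison against the scalar reference should use a small tolerance unless the inputs are exactly representable.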

5000user5000 · 2025-11-09T09:23:48Z

src/IVFFlatIndex.cpp

    #pragma omp parallel for schedule(static)
    for (size_t c = 0; c < nlist_; ++c) {
-        float d = l2_naive(query.data(), centroids_[c].data(), dimension_);
+        float d = l2_simd(query.data(), centroids_[c].data(), dimension_);


Replace the existing l2_naive call with the new l2_simd.
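
The hunk shows only the call site, so the rest of the loop body is not visible here. As a rough illustration of the pattern, the sketch below computes one distance per centroid in an OpenMP loop, with each iteration writing to its own slot so that threads do not race. The function `centroid_distances`, the stand-in `l2_ref`, and the container types are assumptions for the example, not the actual `IVFFlatIndex` code.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for l2_simd / l2_naive: squared L2 distance.
inline float l2_ref(const float* a, const float* b, std::size_t dim) {
    float d = 0.f;
    for (std::size_t i = 0; i < dim; ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Hypothetical helper: distance from the query to every centroid.
// Each iteration writes only dist[c], so no synchronization is needed.
inline std::vector<float> centroid_distances(
        const std::vector<float>& query,
        const std::vector<std::vector<float>>& centroids) {
    std::vector<float> dist(centroids.size());
    #pragma omp parallel for schedule(static)
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        dist[c] = l2_ref(query.data(), centroids[c].data(), query.size());
    }
    return dist;
}
```

Without `-fopenmp` the pragma is ignored and the loop runs serially with the same result.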

Implement AVX2 SIMD optimization

61c421e

- Add l2_simd with AVX2 intrinsics - Enable -march=native for vectorization

5000user5000 added the enhancement New feature or request label Nov 9, 2025

5000user5000 commented Nov 9, 2025

5000user5000 mentioned this pull request Nov 9, 2025

OpenMP and SIMD acceleration #2

Closed

5000user5000 changed the title ~~Implement AVX2 SIMD optimization~~ 實作 AVX2 SIMD 加速以及條件編譯 (Implement AVX2 SIMD acceleration and conditional compilation) Nov 9, 2025

5000user5000 changed the title ~~實作 AVX2 SIMD 加速以及條件編譯~~ Implement AVX2 SIMD optimization Nov 9, 2025

5000user5000 merged commit d558de0 into main Nov 9, 2025
1 check passed

5000user5000 added acceleration and removed enhancement New feature or request labels Nov 9, 2025
