CUDA implementation #14

Seco1024 · 2025-11-24T22:23:47Z

Main changes

CUDA implementation

Removed OpenMP + SIMD
Implemented a CUDA version of batch_search(), made up of four kernel functions
- Coarse Distance Computation
- Top-nprobe Selection
- Inverted List Scanning -> most of the optimization work is here
- Final Merge & Reranking
GPU Memory Management
- Implemented GpuCentroidsManager, GpuDataManager, GpuInvertedListsManager
- These manage memory for the centroids, vectors, and posting lists, respectively
Integrated into the existing IVF interface (src/IVFFlatIndex.cpp)
For the detailed implementation logic, see cuda.md under /doc
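
For readers who have not opened cuda.md, the four stages listed above can be sketched as a single-query CPU reference. This is only an illustration of the general IVF-Flat flow, not the PR's kernels: `ToyIvfIndex`, `toy_search`, and `l2_sq` are hypothetical names, and squared L2 distance is an assumed metric.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy index; names and layout are illustrative, not from the PR.
struct ToyIvfIndex {
    std::size_t dim = 0;
    std::vector<float> centroids;         // nlist * dim, row-major
    std::vector<float> vectors;           // n * dim, row-major
    std::vector<std::vector<int>> lists;  // one posting list of vector ids per centroid
};

using Candidate = std::pair<float, int>;  // (squared L2 distance, id)

// Squared L2 distance between two dim-length vectors (assumed metric).
inline float l2_sq(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Returns the ids of the (up to) k nearest vectors found in the nprobe closest lists.
std::vector<int> toy_search(const ToyIvfIndex& index, const float* query,
                            std::size_t nprobe, std::size_t k) {
    assert(index.dim > 0);
    const std::size_t nlist = index.lists.size();

    // Stage 1: coarse distance computation (query vs. every centroid).
    std::vector<Candidate> coarse(nlist);
    for (std::size_t c = 0; c < nlist; ++c) {
        coarse[c] = {l2_sq(query, &index.centroids[c * index.dim], index.dim),
                     static_cast<int>(c)};
    }

    // Stage 2: top-nprobe selection.
    nprobe = std::min(nprobe, nlist);
    std::partial_sort(coarse.begin(), coarse.begin() + nprobe, coarse.end());

    // Stage 3: inverted list scanning over the selected posting lists.
    std::vector<Candidate> found;
    for (std::size_t p = 0; p < nprobe; ++p) {
        for (int id : index.lists[coarse[p].second]) {
            const float* v = &index.vectors[static_cast<std::size_t>(id) * index.dim];
            found.push_back({l2_sq(query, v, index.dim), id});
        }
    }

    // Stage 4: final merge, keeping the k best candidates in ascending distance.
    k = std::min(k, found.size());
    std::partial_sort(found.begin(), found.begin() + k, found.end());
    std::vector<int> ids;
    for (std::size_t i = 0; i < k; ++i) ids.push_back(found[i].second);
    return ids;
}
```

A batched version would run this per query; on the GPU, each stage would presumably become one kernel launched over the whole batch.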

Benchmark tools

Switched from search() to batch_search(), which is needed to get the most parallelism out of ANNS
The CPU and GPU QPS measurements need to be aligned so the numbers are comparable

Profiling tools

Measures the share of time spent in each of the four search stages (see the discussion)

To build, run make cuda; this produces the GPU version of IVF search

Seco1024 · 2025-11-24T22:31:45Z

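Optimization highlights

Memory Coalescing
- Row-major layout for contiguous memory access
- Threads scan the data with a stride
Shared Memory reuse
- Query vectors are loaded into shared memory to avoid repeated reads
- Block-level merge uses shared memory to collect candidates
Register-level Top-k
- Thread-local top-k runs entirely in registers (up to 128 entries)
- Uses insertion sort (very efficient for small k)
Zero CPU-GPU Transfer Overhead
- Index data stays resident on the GPU; only queries and results are transferred
- Batch processing amortizes the transfer cost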
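
As a rough illustration of the strided scan and the insertion-sort top-k described above, here is a host-side C++ sketch of what one thread might do. `LocalTopK` and `scan_strided` are hypothetical names, and `tid` / `nthreads` stand in for the thread index and block size. On a real GPU, whether such arrays actually stay in registers depends on the compiler being able to resolve the indexing at compile time, so treat this as a model of the logic only.

```cpp
#include <cassert>

// Hypothetical fixed-capacity top-k buffer, kept sorted by ascending distance.
template <int K>
struct LocalTopK {
    float dist[K];
    int id[K];
    int size = 0;

    // Insertion-sort step: place the candidate, dropping the worst entry if full.
    void push(float d, int i) {
        if (size == K && d >= dist[K - 1]) return;  // not better than the current worst
        int pos = (size < K) ? size : K - 1;        // start at the free slot or the worst slot
        while (pos > 0 && dist[pos - 1] > d) {      // shift larger entries one slot right
            dist[pos] = dist[pos - 1];
            id[pos] = id[pos - 1];
            --pos;
        }
        dist[pos] = d;
        id[pos] = i;
        if (size < K) ++size;
    }
};

// Thread `tid` of `nthreads` visits elements tid, tid + nthreads, tid + 2 * nthreads, ...
// so that neighbouring threads touch neighbouring elements on each step.
template <int K>
LocalTopK<K> scan_strided(const float* dists, int n, int tid, int nthreads) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    LocalTopK<K> top;
    for (int i = tid; i < n; i += nthreads) top.push(dists[i], i);
    return top;
}
```

Each thread ends up with its own sorted top-k, which is what a block-level merge would then have to combine.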

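Performance issue
With k=100 there is not enough shared memory!!
To optimize this, consider whether any of the optimizations listed above can be dropped, and weigh that against how shared memory is used.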
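
A back-of-the-envelope estimate shows why k=100 could be a problem. The sketch below assumes, purely for illustration, that every thread in a block stages its k best (distance, id) pairs in shared memory for the block-level merge alongside one query vector, with 256 threads per block and 128-dimensional float vectors. None of these numbers are taken from the PR, and `staging_bytes` / `fits_in_shared` are hypothetical helpers. The 48 KB figure is CUDA's default per-block shared memory limit; larger dynamic allocations require an explicit opt-in on architectures that support it.

```cpp
#include <cassert>
#include <cstddef>

// Default per-block shared memory limit in CUDA without opt-in.
constexpr std::size_t kDefaultSharedLimit = 48 * 1024;

// Bytes needed if every thread stages k (float distance, int id) pairs,
// plus one dim-length float query vector. Assumed layout, not the PR's.
constexpr std::size_t staging_bytes(std::size_t threads_per_block,
                                    std::size_t k, std::size_t dim) {
    return threads_per_block * k * (sizeof(float) + sizeof(int)) +
           dim * sizeof(float);
}

// True if the assumed staging layout fits under the given limit.
inline bool fits_in_shared(std::size_t threads_per_block, std::size_t k,
                           std::size_t dim,
                           std::size_t limit = kDefaultSharedLimit) {
    assert(threads_per_block > 0 && k > 0);
    return staging_bytes(threads_per_block, k, dim) <= limit;
}
```

Under these assumptions, k=10 needs 256 * 10 * 8 + 512 = 20,992 bytes and fits, while k=100 needs 256 * 100 * 8 + 512 = 205,312 bytes, roughly four times the 48 KB default. With this layout, the levers would be fewer threads per block, fewer staged candidates per thread, or merging through global memory instead.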

5000user5000

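LGTM, I'll merge this PR directly.

This looks great, and cuda.md helps explain the implementation, which should be useful when we prepare the report.
It also looks like OpenMP+SIMD should still run; the make cuda build simply uses CUDA only, to avoid interference.
In the numbers, CUDA is faster than pure CPU, but k=100 exceeds the available shared memory.
I'll try to work out how to handle that this week and open a new PR with the update.
If that doesn't work out, given the time constraints, we'll go straight to preparing the slides for the report,
and the report can describe this problem and how it could be solved later.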

Seco1024 added 5 commits November 18, 2025 00:29

Implement profiling script

bfc67d8

feat: cuda parallelization

4275f36

chore: Makefile

758c0df

fix: replace search() with batch_search() as benchmark tool

e9691b8

doc: gpt's advice

c421fcd

5000user5000 assigned Seco1024 Nov 25, 2025

5000user5000 added the cuda label Nov 25, 2025

5000user5000 self-requested a review November 25, 2025 01:03

5000user5000 approved these changes Nov 25, 2025

5000user5000 merged commit 8ca8e1c into 5000user5000:main Nov 25, 2025
1 check passed

5000user5000 mentioned this pull request Nov 25, 2025

CUDA 加速 #1

Closed
