Fortunately, plans are cached by accelerate-fft. However, there is only one plan per shape and data type, and no mutual exclusion for concurrent use of that plan. This poses a thread-safety issue when multiple threads use the same plan: each assigns it a CUDA stream and then executes the plan. Execution could feasibly interleave as:
p = ... cached plan #1 ...
thread 1: FFT.setStream p s1
thread 2: FFT.setStream p s2
thread 1: FFT.execC2C p (fftMode mode) d_in d_out
thread 2: FFT.execC2C p (fftMode mode) d_in d_out
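The hazard in the trace above is that setting the stream mutates shared state inside the plan, so an execute call can pick up a stream set by another thread. A minimal sketch of that interleaving, using a toy `Plan` modelled as an `IORef` holding its current stream (this is not the real accelerate-fft or cuFFT API, just an illustration of the race):

```haskell
import Data.IORef

-- Toy model: the plan's only mutable state is its currently assigned stream.
type Plan = IORef String

main :: IO ()
main = do
  p <- newIORef "default"     -- the single cached plan
  writeIORef p "s1"           -- thread 1: FFT.setStream p s1
  writeIORef p "s2"           -- thread 2 interleaves: FFT.setStream p s2
  s <- readIORef p            -- thread 1: FFT.execC2C now sees thread 2's stream
  putStrLn ("thread 1 executes on stream: " ++ s)
```

Run sequentially like this, the interleaving is deterministic: thread 1's execute launches on `s2`, not the `s1` it asked for.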
Solutions include:
- Make redundant copies of cuFFT plans as concurrent demand is observed.
- Enforce mutual exclusion on plans during each cuFFT call.
The former is vastly preferred to the latter, since a lock would serialise independent transforms behind a single plan, while redundant copies let them proceed concurrently.
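The preferred approach can be sketched as a small pool per (shape, data-type) key: threads take a plan from the pool, and when the pool is empty under concurrent demand, a redundant copy is created. The `Plan` type and `mkPlan` action below are stand-ins for the real cuFFT handles and plan creation, and the pool is a minimal sketch, not exception-safe production code:

```haskell
import Control.Concurrent.MVar

-- Stand-in for a cuFFT plan handle.
newtype Plan = Plan Int deriving (Show, Eq)

-- A pool of interchangeable plans for one (shape, data-type) key.
type Pool = MVar [Plan]

-- Take a plan from the pool, or make a redundant copy on demand.
acquire :: IO Plan -> Pool -> IO Plan
acquire mkPlan pool = do
  ps <- takeMVar pool
  case ps of
    []       -> do putMVar pool []; mkPlan      -- concurrent demand: new copy
    (p:rest) -> do putMVar pool rest; pure p    -- reuse an idle plan

-- Return a plan to the pool once its FFT has been enqueued.
release :: Pool -> Plan -> IO ()
release pool p = modifyMVar_ pool (pure . (p :))

main :: IO ()
main = do
  pool <- newMVar [Plan 1]
  a <- acquire (pure (Plan 2)) pool   -- reuses the cached plan
  b <- acquire (pure (Plan 2)) pool   -- pool empty: a redundant copy
  print (a, b)
```

Each thread then owns its plan exclusively between `acquire` and `release`, so `setStream` followed by `exec` can never interleave with another thread on the same handle.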
Additionally, the cuFFT documentation is explicit that using the same plan concurrently from multiple threads is unsafe.