A CUDA and C++ program that performs histogram equalization on images in parallel.
Update: Added a Sobel edge detection CUDA program.
- Understand how Histogram Equalization is applied to images.
- Write optimized CUDA code that provides the same functionality as OpenCV's histogram equalization but runs faster.
The basic algorithm for histogram equalization can be divided into four steps:
-
Calculate the histogram of the image.
-
Consider splitting one large image into multiple smaller images and calculating their histograms in parallel. (Not used.)
-
The image could be compressed on the CPU first, so that less GPU memory has to be allocated with cudaMalloc(), which saves time. (Not used. This technique is called RLC, Run Length Coding.)
-
atomicAdd was the first method considered, although it reduces performance. There is no real alternative, though: it has to be used for the histogram.
-
A better approach is to build per-thread histograms in parallel, sort the gray values and reduce by key, then reduce all the histograms into one. This is a per-block or per-thread histogram generation algorithm.
This method works well on a GTX 1060 (Pascal architecture) but not on the NEU Discovery cluster, so it is not used.
-
Down-sampling is used to reduce the total work of calculating the histogram.
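
As a rough illustration of the histogram step, here is a minimal sketch that combines atomicAdd with down-sampling. It assumes an 8-bit grayscale image stored as a flat array and assumes that down-sampling means visiting every `step`-th pixel; the names `histogramKernel`, `sampledHistogram`, and `step` are hypothetical and not taken from this repository. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread walks the sampled pixels in a grid-stride loop and
// increments the matching bin with atomicAdd (256 bins, 8-bit image).
__global__ void histogramKernel(const unsigned char* img, long long n,
                                int step, unsigned int* hist)
{
    long long stride = (long long)blockDim.x * gridDim.x;
    for (long long s = blockIdx.x * blockDim.x + threadIdx.x;
         s * step < n; s += stride) {
        atomicAdd(&hist[img[s * step]], 1u);
    }
}

// Hypothetical host wrapper: copies the image to the device, counts every
// `step`-th pixel, and returns the 256-bin histogram.
std::vector<unsigned int> sampledHistogram(
    const std::vector<unsigned char>& pixels, int step)
{
    std::vector<unsigned int> hist(256, 0);
    if (pixels.empty() || step < 1) return hist;

    long long n = (long long)pixels.size();
    unsigned char* dImg = nullptr;
    unsigned int* dHist = nullptr;
    cudaMalloc(&dImg, n);
    cudaMalloc(&dHist, 256 * sizeof(unsigned int));
    cudaMemcpy(dImg, pixels.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, 256 * sizeof(unsigned int));

    int threads = 256;
    long long samples = (n + step - 1) / step;
    int blocks = (int)((samples + threads - 1) / threads);
    if (blocks > 1024) blocks = 1024;  // grid-stride loop covers the rest
    histogramKernel<<<blocks, threads>>>(dImg, n, step, dHist);

    cudaMemcpy(hist.data(), dHist, 256 * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dImg);
    cudaFree(dHist);
    return hist;
}
```

With `step = 1` every pixel is counted; a larger `step` trades histogram accuracy for fewer atomic operations.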
-
Calculate the cumulative distribution function (CDF) with a parallel prefix sum. The algorithm used is the Hillis and Steele scan.
-
Calculate cdfmin, possibly using a reduction tree.
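
The CDF and cdfmin steps could be sketched as a single kernel, shown below. It assumes a 256-bin histogram processed by one block of 256 threads, and it takes cdfmin to be the smallest non-zero CDF value; the names `cdfKernel` and `cdfOnDevice` are hypothetical and not taken from this repository. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BINS = 256;

// Launch with exactly one block of BINS threads.
__global__ void cdfKernel(const unsigned int* hist, unsigned int* cdf,
                          unsigned int* cdfMin)
{
    __shared__ unsigned int buf[2][BINS];
    int t = threadIdx.x;
    int in = 0;
    buf[0][t] = hist[t];
    __syncthreads();

    // Hillis and Steele inclusive scan: at each pass every element adds the
    // value `offset` positions to its left; double buffering avoids races.
    for (int offset = 1; offset < BINS; offset <<= 1) {
        int out = 1 - in;
        buf[out][t] = (t >= offset) ? buf[in][t] + buf[in][t - offset]
                                    : buf[in][t];
        __syncthreads();
        in = out;
    }
    unsigned int c = buf[in][t];
    cdf[t] = c;

    // Reduction tree for the smallest non-zero CDF value.
    // Empty leading bins are replaced by the maximum unsigned value.
    unsigned int* red = buf[1 - in];
    red[t] = (c != 0) ? c : 0xFFFFFFFFu;
    __syncthreads();
    for (int s = BINS / 2; s > 0; s >>= 1) {
        if (t < s) red[t] = min(red[t], red[t + s]);
        __syncthreads();
    }
    if (t == 0) *cdfMin = red[0];
}

// Hypothetical host wrapper: `hist` must hold BINS entries.
// Fills `cdf` and returns cdfmin (0xFFFFFFFF if the histogram is empty).
unsigned int cdfOnDevice(const std::vector<unsigned int>& hist,
                         std::vector<unsigned int>& cdf)
{
    unsigned int *dHist = nullptr, *dCdf = nullptr, *dMin = nullptr;
    cudaMalloc(&dHist, BINS * sizeof(unsigned int));
    cudaMalloc(&dCdf, BINS * sizeof(unsigned int));
    cudaMalloc(&dMin, sizeof(unsigned int));
    cudaMemcpy(dHist, hist.data(), BINS * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    cdfKernel<<<1, BINS>>>(dHist, dCdf, dMin);

    unsigned int cdfMin = 0;
    cdf.assign(BINS, 0);
    cudaMemcpy(cdf.data(), dCdf, BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(&cdfMin, dMin, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dHist);
    cudaFree(dCdf);
    cudaFree(dMin);
    return cdfMin;
}
```

Because the CDF never decreases, cdfmin is also simply the first non-zero CDF entry; the reduction tree is shown here only because it is the method mentioned above.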
-
Calculate the equalized value for each gray level with the given formula.
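
The formula itself is not reproduced in this README. The commonly used form of the histogram equalization mapping, which is assumed here, is:

$$
h(v) = \operatorname{round}\left( \frac{\mathrm{cdf}(v) - \mathrm{cdf}_{\min}}{N - \mathrm{cdf}_{\min}} \cdot (L - 1) \right)
$$

where $v$ is the input gray level, $N$ is the number of pixels counted in the histogram, $\mathrm{cdf}_{\min}$ is the smallest non-zero CDF value, and $L$ is the number of gray levels (256 for 8-bit images). For example, with $N = 4$, $\mathrm{cdf}_{\min} = 1$ and $\mathrm{cdf}(v) = 3$, the result is $\operatorname{round}(2/3 \cdot 255) = 170$.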
-
Write the calculated values back to generate the new image data, and transfer it back to host memory.
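
The last two steps might look roughly like the sketch below. It assumes an 8-bit grayscale image, a 256-entry CDF that is already available, blocks of 256 threads, and the common equalization mapping round((cdf(v) - cdfmin) / (N - cdfmin) * 255); the names `equalizeKernel` and `equalizeOnDevice` are hypothetical and not taken from this repository. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Launch with 256 threads per block: each block first builds the 256-entry
// lookup table in shared memory, then remaps pixels in a grid-stride loop.
__global__ void equalizeKernel(const unsigned char* in, unsigned char* out,
                               long long n, const unsigned int* cdf,
                               unsigned int cdfMin, unsigned int total)
{
    __shared__ unsigned char lut[256];
    int t = threadIdx.x;
    unsigned int c = cdf[t];
    // Levels at or below cdfmin map to 0; this also avoids dividing by zero
    // when the image has a single gray level (total == cdfMin).
    lut[t] = (c <= cdfMin)
        ? 0
        : (unsigned char)((float)(c - cdfMin) / (float)(total - cdfMin)
                          * 255.0f + 0.5f);
    __syncthreads();

    long long stride = (long long)blockDim.x * gridDim.x;
    for (long long i = blockIdx.x * blockDim.x + t; i < n; i += stride)
        out[i] = lut[in[i]];
}

// Hypothetical host wrapper: `cdf` must hold 256 entries and cdf[255] is
// taken as the number of pixels counted in the histogram.
std::vector<unsigned char> equalizeOnDevice(
    const std::vector<unsigned char>& pixels,
    const std::vector<unsigned int>& cdf, unsigned int cdfMin)
{
    std::vector<unsigned char> result(pixels.size());
    if (pixels.empty()) return result;

    long long n = (long long)pixels.size();
    unsigned char *dIn = nullptr, *dOut = nullptr;
    unsigned int* dCdf = nullptr;
    cudaMalloc(&dIn, n);
    cudaMalloc(&dOut, n);
    cudaMalloc(&dCdf, 256 * sizeof(unsigned int));
    cudaMemcpy(dIn, pixels.data(), n, cudaMemcpyHostToDevice);
    cudaMemcpy(dCdf, cdf.data(), 256 * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (int)((n + threads - 1) / threads);
    if (blocks > 1024) blocks = 1024;  // grid-stride loop covers the rest
    equalizeKernel<<<blocks, threads>>>(dIn, dOut, n, dCdf, cdfMin, cdf[255]);

    // Transfer the equalized image back to host memory.
    cudaMemcpy(result.data(), dOut, n, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCdf);
    return result;
}
```

Rebuilding the lookup table in every block costs 256 extra reads per block but avoids a separate kernel launch; precomputing the table once is an equally reasonable choice.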