(This is a guest post by Gustavo Stahl from São Paulo State University in Brazil.)
Summary
Corners extracted from images are widely used across computer science, in areas such as augmented reality, autonomous vehicles, service robots, 3D reconstruction, object tracking, and many more. To work properly, applications in these areas usually rely on corner detectors that are both fast and high quality.
FOAGDD (First-Order Anisotropic Gaussian Directional Derivative) is a technique for extracting corners from an image, originally proposed by Weichuan Zhang and Changming Sun in 2019. The method surpassed the majority of extractors in corner detection quality but lacked speed, making it unsuitable for real-time applications. Hence, this paper proposes transferring the workload of the original implementation to the GPU, leveraging its many parallel threads to accelerate the heavy computations in the code.
FOAGDD’s online implementation was written in Python and is composed of multiple steps, the two major ones being:
- convolution between multiple 2D filters and the input image;
- computation of a “corner measure” for each image pixel, considering its neighboring pixels.
Both operations have high computational costs that grow with the image resolution. Therefore, this paper targeted moving these two steps to the GPU, using custom CUDA kernels and ArrayFire functions.
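As a reference point, the CPU baseline essentially loops over the filter bank and convolves each filter with the image through scipy.signal.convolve2d. Below is a minimal sketch of that baseline step, assuming the anisotropic directional-derivative filters are already precomputed; the function and variable names are illustrative, not the original code.

```python
import numpy as np
from scipy.signal import convolve2d

def convolve_filters_cpu(image, filters):
    """Convolve one grayscale image with each 2D filter, one filter at a time."""
    responses = np.empty((len(filters),) + image.shape, dtype=np.float64)
    for i, filt in enumerate(filters):
        # mode="same" keeps each response at the input resolution
        responses[i] = convolve2d(image, filt, mode="same", boundary="symm")
    return responses
```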
ArrayFire’s role
This paper added the ArrayFire library to replace the entire convolution step, which was originally performed in a simple for-loop with calls to the convolve2d function from scipy.signal. The replacement uses the handy convolve2 function from ArrayFire’s signal processing module to perform a batched convolution between multiple filters and one image. In benchmarks on an NVIDIA Tesla T4, ArrayFire brought a speed-up of 1594× (15.94 seconds → 10 milliseconds) for a standardized image with a resolution of 512 × 512 pixels. The graph below shows the speed-up obtained as the image resolution increases.
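For illustration, such a batched call can be expressed with ArrayFire’s Python bindings roughly as follows. This is a minimal sketch, not the post’s actual code: it assumes the filters are stacked along a third dimension and transferred to the device with the np_to_af_array interop helper, and the names and dtypes are placeholders.

```python
import numpy as np
import arrayfire as af

def convolve_filters_gpu(image, filters):
    """Convolve one image with a stack of 2D filters in a single batched GPU call."""
    # Transfer the image to the GPU (float32 keeps the transfer and compute cheap).
    img_af = af.np_to_af_array(image.astype(np.float32))
    # Stack the filters along the third dimension so ArrayFire batches the convolution over them.
    filt_af = af.np_to_af_array(np.stack(filters, axis=-1).astype(np.float32))
    # One batched convolve2 call replaces the Python-level loop over scipy.signal.convolve2d.
    return af.convolve2(img_af, filt_af)
```

Batching every filter into one convolve2 call removes the per-filter Python overhead and lets ArrayFire dispatch the whole workload as a single GPU operation.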
The results
In the end, this paper brought an overall speed-up of 3190× (66.03 seconds → 20.70 milliseconds) over the original CPU implementation, again using a standardized 512 × 512 image as the baseline. Below you can see FOAGDD’s corner detection results for two samples from the University of South Florida dataset.