Feature detection and tracking using ArrayFire

Brian Kloppenborg · ArrayFire, C/C++, Image Processing

A few weeks ago we added some computer vision functionality to our open source ArrayFire GPU computing library. Specifically, we implemented the FAST feature extractor, BRIEF feature point descriptor, ORB multi-resolution scale invariant feature extractor, and a Hamming distance function. When combined, these functions enable you to find features in videos (or images) and track them between successive frames.

FAST and ORB in ArrayFire: a demonstration

Here is a quick video showing these functions executing on a few frames from the Blender Foundation's Big Buck Bunny video:

Using FAST and ORB

Now that you have seen the functionality in action, how do you use these functions in your own code? Assuming that you have a suitable function that can load your video frames into ArrayFire "array" objects, it's pretty simple. To use FAST, simply call the fast function:
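A minimal sketch of such a call might look like the following (the image path and the parameter values here are illustrative, not required):

```cpp
#include <arrayfire.h>

int main() {
    // Load one frame as grayscale ("frame.png" is a placeholder path)
    // and convert it to single precision
    af::array frame = af::loadImage("frame.png", false).as(f32);

    // Detect FAST features: threshold 20, 9-pixel arc,
    // non-maximal suppression enabled
    af::features feat = af::fast(frame, 20.0f, 9, true);

    return 0;
}
```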

The af::features object is a special structure that contains the (x,y) locations of the features, their scores, and the number of features. To access this information, simply use the getX(), getY(), getScore(), and getNumFeatures() functions:
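For example (here a random image stands in for a real video frame, so the block is self-contained):

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    // Illustrative input: a random image stands in for a real frame
    af::array frame = af::randu(480, 640);
    af::features feat = af::fast(frame);

    size_t n = feat.getNumFeatures();   // number of features found
    af::array x      = feat.getX();     // x coordinates, n elements
    af::array y      = feat.getY();     // y coordinates, n elements
    af::array scores = feat.getScore(); // per-feature FAST scores

    printf("found %zu features\n", n);
    return 0;
}
```

Note that getX(), getY(), and getScore() return af::array objects that live on the device, while getNumFeatures() returns a host-side count.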

ORB has a slightly different syntax:
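A sketch of the call (the parameter values mirror ArrayFire's defaults and are shown only for illustration; a random image again stands in for a real frame):

```cpp
#include <arrayfire.h>

int main() {
    // Illustrative grayscale frame; a real application would load one
    af::array frame = af::randu(480, 640);

    af::features orb_features;
    af::array    orb_descriptors;

    // FAST threshold 20, at most 400 features,
    // pyramid scale factor 1.5, 4 pyramid levels
    af::orb(orb_features, orb_descriptors, frame, 20.0f, 400, 1.5f, 4);
    return 0;
}
```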

The features from ORB may be accessed in the same fashion as FAST described above. The orb_descriptors variable is a Nx8 array containing extracted descriptors (read the ORB paper for further details on this structure).

The Hamming matcher I mention in the video is almost ready to be committed to our upstream development branch, so it is also worth mentioning how it can be used. Assuming you have a list of features from two sources called a_features and b_features respectively, you can compute their Hamming distance using ArrayFire as follows:
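Something like the following sketch, using the hammingMatcher call that later shipped in ArrayFire (exact parameters may differ from what is described here; the matcher operates on the descriptor arrays produced by ORB, so a_descriptors and b_descriptors below are placeholder names for those, with random data standing in):

```cpp
#include <arrayfire.h>

int main() {
    // Placeholder descriptors: two sets of unsigned 8-column arrays
    // standing in for the Nx8 outputs of af::orb on two sources
    af::array a_descriptors = (af::randu(100, 8) * 255).as(u32);
    af::array b_descriptors = (af::randu(120, 8) * 255).as(u32);

    af::array idx;   // index of the nearest b descriptor per a descriptor
    af::array dist;  // the corresponding Hamming distance
    af::hammingMatcher(idx, dist, a_descriptors, b_descriptors);
    return 0;
}
```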

Then you may get the indices of features whose distance is less than, say, 50 using ArrayFire's where operator. After you have these indices, you can pick out the near-matching features pretty easily:
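For instance (a sketch, assuming dist holds the matcher's Hamming distances and feat the query-side features; the function name is hypothetical):

```cpp
#include <arrayfire.h>

// Return the x coordinates of near-matching features, given the
// Hamming distances (dist) and the query-side features (feat).
af::array good_match_x(const af::features &feat, const af::array &dist) {
    af::array good = af::where(dist < 50);  // indices with distance < 50
    return feat.getX()(good);               // x coords of those features
}
```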

Because these are ArrayFire array objects, they exist in your GPU's (or other accelerator device's) memory. To get access to them within your host code, you need to get a copy of the data and place it somewhere in RAM:
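One way to do this is with the array::host() member function (matched_x below is a placeholder for an array of matched feature coordinates):

```cpp
#include <arrayfire.h>
#include <vector>

int main() {
    // matched_x stands in for an array of matched feature coordinates
    af::array matched_x = af::randu(10);

    // Copy the device data into host memory
    std::vector<float> host_x(matched_x.elements());
    matched_x.host(host_x.data());
    return 0;
}
```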

For further information about ArrayFire's FAST and ORB implementations, please see ArrayFire's computer vision function documentation.


Given how easy it is to use ArrayFire's computer vision code, the next question is "how well does it perform?" To answer this question we created a small benchmarking program that ran FAST and ORB on single frames from the Big Buck Bunny video on one of our machines with an NVIDIA K20 GPU, comparing ArrayFire's and OpenCV's CUDA implementations. To benchmark only the core algorithms, we loaded the data into RAM (or onto the GPU) before timing began. The video frames were at 320p, 480p, 1080p, and 4K resolution. The results of our benchmarks are below:


Above you see the throughput (in images per second) as a function of image height (for 16:9 aspect video frames) for the FAST corner detection algorithm. ArrayFire's implementation of FAST is about 10% faster than OpenCV's for all but the 4K video frames.


Similarly, ArrayFire's implementation of ORB is about 50% faster than OpenCV's for all but the 4K video, where ArrayFire falls slightly behind. We are presently exploring methods to improve ArrayFire's performance on 4K video.