A few weeks ago we added some computer vision functionality to our open source ArrayFire GPU computing library. Specifically, we implemented the FAST feature extractor, the BRIEF feature point descriptor, the ORB multi-resolution, scale-invariant feature extractor, and a Hamming distance function. When combined, these functions enable you to find features in videos (or images) and track them between successive frames.
FAST and ORB in ArrayFire: a demonstration
Using FAST and ORB
Now that you have seen the functionality in action, how do you use these functions in your own code? Assuming you have a suitable function that can load your video frames into ArrayFire `array` objects, it's pretty simple. To use FAST, simply call the `fast` function:
```cpp
// Run FAST on a frame and get the features:
af::features fast_features = af::fast(frame, 20, 9, 1, 0.05f);
```
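If you don't already have a frame loader handy, a single image can be pulled into an `af::array` directly with ArrayFire's image I/O. Here is a minimal sketch; the file name `frame.png` is just a placeholder, and the loader is spelled `af::loadImage` in recent releases (older releases spell it `af::loadimage`):

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    // Load one frame as a single-channel (grayscale) float array;
    // pass true as the second argument to keep the color channels.
    af::array frame = af::loadImage("frame.png", false);

    // FAST expects a single-channel image:
    af::features fast_features = af::fast(frame, 20, 9, 1, 0.05f);

    printf("found %d features\n", (int)fast_features.getNumFeatures());
    return 0;
}
```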
The `af::features` object is a structure that contains the `(x,y)` locations of the features, their scores, and the number of features. To access this information, use the `getX()`, `getY()`, `getScore()`, and `getNumFeatures()` functions:
```cpp
af::array x_pos  = fast_features.getX();
af::array y_pos  = fast_features.getY();
af::array scores = fast_features.getScore();
int N = fast_features.getNumFeatures();
```
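These are ordinary device arrays, so you can manipulate them with the rest of ArrayFire before anything touches the host. As an illustration (this is not part of the feature API, just one way to use it), here is a sketch that keeps only the hundred strongest features by sorting on score:

```cpp
// Sort the scores in descending order, keeping the permutation:
af::array sorted_scores, perm;
af::sort(sorted_scores, perm, scores, 0, false);

// Reorder the coordinates to match the sorted scores:
af::array x_sorted = x_pos(perm);
af::array y_sorted = y_pos(perm);

// Keep the top 100 features (or all of them, if there are fewer):
int keep = (N < 100) ? N : 100;
af::array best_x = x_sorted(af::seq(keep));
af::array best_y = y_sorted(af::seq(keep));
```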
ORB has a slightly different syntax:
```cpp
// Run ORB on a frame and get the features and descriptors:
af::features orb_features;
af::array orb_descriptors;
af::orb(orb_features, orb_descriptors, frame, 20, 1000, 1.2, 1, false);
```
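For reference, the trailing arguments here are, in order: the FAST threshold used internally (20), the maximum number of features to retain (1000), the scale factor between image pyramid levels (1.2), the number of pyramid levels (1), and whether to blur the image before computing descriptors (false). Check the documentation for your ArrayFire version, as the defaults may differ.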
The features from ORB may be accessed in the same fashion as described above for FAST. The `orb_descriptors` variable is an N×8 array containing the extracted descriptors (read the ORB paper for further details on this structure).
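To make the layout concrete: each ORB descriptor is 256 bits, stored here as eight 32-bit words per row. A quick sanity check might look like this (a sketch, assuming the N×8 layout just described):

```cpp
// One row per feature, eight 32-bit words (256 bits) per descriptor:
printf("%d descriptors, %d words each\n",
       (int)orb_descriptors.dims(0), (int)orb_descriptors.dims(1));
```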
The Hamming matcher I mention in the video is almost ready to be committed to our upstream development branch, so it is also worth describing how it can be used. Assuming you have features and descriptors from two sources, named `a_features`/`a_descriptors` and `b_features`/`b_descriptors` (obtained from `af::orb` as above), you can compute the Hamming distances between the two sets of descriptors as follows:
```cpp
af::array idx;
af::array distances;
// a_descriptors and b_descriptors are the descriptor arrays
// returned by af::orb for the two sources:
af::hamming_matcher(idx, distances, a_descriptors, b_descriptors, 0, 1);
// idx now holds, for each descriptor in a_descriptors, the index of
// its nearest neighbor in b_descriptors; distances holds the
// corresponding Hamming distances.
```
Then you may get the indices of the features whose distance is less than, say, 50 using ArrayFire's `where` operator. Once you have these indices, you can pick out the near-matching features pretty easily:
```cpp
// Select features with distances less than 50:
af::array near_matches = af::where(distances < 50);

// And now extract information about those features:
af::array a_feat_x = a_features.getX()(near_matches);
af::array a_feat_y = a_features.getY()(near_matches);
// ...
```
Because these are ArrayFire `array` objects, they live in your GPU's (or other accelerator device's) memory. To access them from your host code, you need to copy the data back into RAM:
```cpp
// host() copies the device data into newly allocated host memory;
// the caller owns the returned buffers:
float* features_x = a_feat_x.host<float>();
float* features_y = a_feat_y.host<float>();
```
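A short sketch of consuming those host copies (the printout is illustrative; in current ArrayFire releases the buffers returned by `host()` are released with `af::freeHost`):

```cpp
int n_near = (int)a_feat_x.elements();
for (int i = 0; i < n_near; ++i)
    printf("match %d at (%f, %f)\n", i, features_x[i], features_y[i]);

// Release the host buffers when done:
af::freeHost(features_x);
af::freeHost(features_y);
```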
Given how easy it is to use ArrayFire's computer vision code, the next question is: how well does it perform? To answer this, we created a small benchmarking program that ran FAST and ORB on single frames from the Big Buck Bunny video on one of our machines with an NVIDIA K20 GPU, using ArrayFire's and OpenCV's CUDA implementations. To benchmark only the core algorithms, we loaded the data into RAM (or onto the GPU) before executing them. The video frames were at 320p, 480p, 1080p, and 4K resolution. The results of our benchmarks are below:
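We haven't published the benchmark harness itself, but a minimal sketch of timing the detector with ArrayFire's built-in `af::timeit` (which runs the function several times and reports the mean time) might look like this; the file name is a placeholder:

```cpp
#include <arrayfire.h>
#include <cstdio>

static af::array g_frame; // the frame under test

static void run_fast() {
    af::features f = af::fast(g_frame, 20, 9, 1, 0.05f);
    af::sync(); // make sure the device has finished before returning
}

int main() {
    g_frame = af::loadImage("frame_1080p.png", false);
    double seconds = af::timeit(run_fast);
    printf("FAST: %0.1f images/second\n", 1.0 / seconds);
    return 0;
}
```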
Above you see the throughput (in images per second) as a function of image height (for 16:9 video frames) for the FAST corner detection algorithm. ArrayFire's implementation of FAST is about 10% faster than OpenCV's for all but the 4K video frames.
Similarly, ArrayFire's implementation of ORB is about 50% faster than OpenCV's for all but the 4K video, where ArrayFire falls slightly behind. We are presently exploring ways to improve ArrayFire's performance on 4K frames.