Researchers at SAIL (the Stanford Artificial Intelligence Laboratory) have done it again: they have used Jacket to speed up the training phase of Deep Learning algorithms. In their paper titled “On Optimization Methods for Deep Learning”, they experiment with some of the well-known training algorithms and demonstrate their scalability across parallel architectures (GPUs as well as multi-machine networks). The algorithms include SGD (Stochastic Gradient Descent), L-BFGS (limited-memory BFGS, used for solving non-linear problems), and CG (Conjugate Gradient). While SGDs are easy to implement, they require manual tuning. Add to that their sequential nature, and they become hard to tune, scale, and parallelize, which makes them difficult to use with Deep Learning algorithms. L-BFGS and CG algorithms can be harder to implement and …
Fast Computer Vision with OpenCV and ArrayFire
Update: While the post below discusses LibJacket (no longer a product), you can do the same thing in the newer, but different, ArrayFire library. Moving from LibJacket to ArrayFire brings improved performance benchmarks and a simpler API.

Mcclanahoochie just posted some code and instructions for pairing OpenCV with LibJacket to get accelerated computer vision. You can do really fast image processing on video cam feeds too; see the picture below. Really cool stuff. Computer vision is really hot, with applications emerging in defense, radiology, games, automotive, and other consumer applications. Computer vision algorithms like these are also going mobile. For instance, we have started to build LibJacket for Mobile applications, which runs on Tegra, PowerVR, and other mobile …
Action Recognition with Independent Subspace Analysis
Researchers at the Stanford Artificial Intelligence Laboratory (SAIL) have had more success (building on previous work) using Jacket to speed up their algorithm. In a paper at CVPR 2011, entitled “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis”, they explain how their unsupervised feature learning algorithm competes with other algorithms that are hand-crafted or use learned features.

Results                | KTH   | Hollywood2 | UCF   | YouTube
Best published results | 92.1% | 50.9%      | 85.6% | 71.2%
Stanford group results | 93.9% | 53.3%      | 86.5% | 75.8%

Testing their algorithm on four well-known benchmark datasets, they achieved better performance than the results published so far. For their training purposes, they used a multi-layered stacked convolutional ISA (Independent subspace analysis) …
Filtered Back-Projection and Non-Uniform FFTs
In order to investigate changes of forest biomass, scientists use microwave tomography to image the vegetation. At the smallest scale, individual plants can be imaged to investigate branching and growth, but even synthetic aperture radar can reveal large-scale changes in regional ecology. To the right, you can see the experimental setup to image an individual plant. Filtered back-projection is at the core of all of these techniques: using the inverse Radon transform to reconstruct regular images from Fourier samples. Below you can see the final reconstructed image. Since these samples are often not on a uniform Cartesian grid, the non-uniform version of the FFT comes into play (NUFFT), and all of this requires some serious number crunching: bring in the …
Music Beat Analysis
Did you ever wonder how the music visualizer in your media player works? Watching it pulsate in synchrony with the beats of the song is almost as entertaining as listening to the song itself! Researchers have been attempting to detect beats in audio signals for many years, and there are many techniques available, from the simplest (and least accurate) to more complicated algorithms that are highly accurate. All algorithms, though, perform some form of signal processing and frequency analysis, applications highly suited to GPU Computing. The beat visualizer described here was developed by researchers at Rice University, and is simple and fast. An incoming signal is broken down into six frequency bands for analysis. After smoothing out these bands and …
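To make the band-splitting step concrete, here is a minimal sketch in plain Python. It takes a short signal, computes a naive DFT, and sums the spectral energy falling into six frequency bands. The band edges, the sample rate, and the function names are assumptions chosen for illustration; the excerpt does not give the values the Rice researchers used, and a real implementation would use an FFT rather than this slow DFT.

```python
import cmath
import math

def dft_bin(signal, k):
    """Return the k-th DFT coefficient of a real signal (naive O(n) sum)."""
    n = len(signal)
    return sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

def band_energies(signal, sample_rate, edges):
    """Sum spectral energy into the bands [edges[b], edges[b+1])."""
    n = len(signal)
    energies = [0.0] * (len(edges) - 1)
    # Only the bins up to the Nyquist frequency are needed for a real signal.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        for b in range(len(edges) - 1):
            if edges[b] <= freq < edges[b + 1]:
                energies[b] += abs(dft_bin(signal, k)) ** 2
                break
    return energies

# Hypothetical setup: six bands (edges in Hz are an assumption), 8 kHz sampling.
SAMPLE_RATE = 8000
BAND_EDGES = [0, 200, 400, 800, 1600, 3200, float("inf")]

# A pure 300 Hz tone should land almost entirely in the second band (200-400 Hz).
tone = [math.sin(2 * math.pi * 300 * t / SAMPLE_RATE) for t in range(400)]
energies = band_energies(tone, SAMPLE_RATE, BAND_EDGES)
strongest_band = max(range(len(energies)), key=lambda b: energies[b])
```

Each band's energy can then be tracked over time, which is the kind of per-band signal a beat detector goes on to smooth and analyze.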
Accelerating LTE Simulation
Simulation in MATLAB is a driving force in several research projects. However, the accompanying long simulation times can be a drag in many of these projects. In this article, we take the example of the work on 3GPP LTE System Simulation by Yuan Gao et al. (from Tsinghua University, Beijing) and demonstrate how the use of Jacket can significantly improve simulator performance and lead to faster validation times in simulation projects. 3GPP’s LTE (Long Term Evolution) and LTE-Advanced are important telecommunication standards pertaining to 3G and 4G communication networks. With networks worldwide beginning to adopt them for consumer usage, a great need has come up for several novel link and system-level communication techniques developed by …
High Performance Compressive Sensing
A few weeks ago, we published a blog entry that demonstrated the ability of Jacket to speed up “compressive sensing”, a technology with wide applications in areas such as image processing, reconstruction, and spectroscopy. Here, we discuss the work of Nabor Reyna Jr. and Wotao Yin from Rice University, who use Jacket to speed up “compressive sensing” algorithms in reconstruction. This work deals with reconstruction of signals using partial Fourier matrices (RecPF). The major computational components of the algorithm involve shrinkage and FFTs. Jacket is employed to accelerate this compute-heavy code, and the resultant version (gRecPF) was about 5x faster! To reduce the cost involved in generating the random matrices involved in the above method, a second method (RecPC) that …
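For readers unfamiliar with the term, "shrinkage" in this context usually means soft-thresholding: each value is pulled toward zero by a fixed amount, and anything smaller than that amount becomes zero. The sketch below shows the common scalar form in plain Python. It is an illustration under that assumption, not the RecPF code, and the excerpt does not say which shrinkage variant RecPF applies; the function name and sample values are made up for the example.

```python
def soft_threshold(values, threshold):
    """Scalar soft-thresholding: sign(v) * max(|v| - threshold, 0)."""
    shrunk = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            # Values within the threshold of zero are set to exactly zero.
            shrunk.append(0.0)
        else:
            # Larger values keep their sign but lose `threshold` in magnitude.
            shrunk.append(magnitude if v > 0 else -magnitude)
    return shrunk

# Illustrative input only: small entries vanish, large ones move toward zero.
shrunk = soft_threshold([3.0, -0.5, 0.2, -2.0], 1.0)
```

Because every element is processed independently, this operation maps naturally onto a GPU, which is consistent with the speedup reported above.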
Using Jacket to design and simulate echo generators
Antenna array design involves repeated simulation to tune the many parameters involved, and waiting around for simulations to finish is no fun. Offloading the optimization problem onto the GPU cuts that time down significantly. In their recent paper, Capozzoli, Curcio, and Liseno (pdf, citation) of University of Naples Federico II demonstrated how a simple modification to their echo generator array simulation took advantage of the GPU to bring immediate speedups. Check out this figure from their paper, which shows CPU simulation time growing prohibitively as more data is fed in, while GPU time grows only a little. Their simulation is designed around optimizing an energy functional. Using fminunc to drive the optimization problem on the CPU, they simply modified their functional evaluation to take …
Chan-Vese Active Contours on the GPU
Active Contours are mathematical models that enable detection of objects within images, and are extensively used in Computer Vision as self-adapting frameworks for the delineation and tracking of objects. To demonstrate Jacket’s cross-platform versatility, we implemented the Chan-Vese contour tracking app on Android. The video can be viewed here. Today, however, we’d like to use a MATLAB implementation of active contours as an example of how to take a large project and, with minimal changes, achieve speedups with Jacket. We’ll dangle the proverbial carrot first: the GPU Chan-Vese implementation contains only three kinds of changes overall, and the computational code is exactly the same for both CPU and GPU versions. Plus, take a look at the speed-ups below! How …
Laplace Transform Inversion on the GPU
The numerical inversion of the Laplace transform is a long-standing problem due to its implicit ill-posedness. Patrick Kano and Moysey Brio of Acunum Algorithms and Simulations, with their experience in computational methods and algorithm development, found a solution that not only works, but is very fast. Their code implements the Weeks’ method for Numerical Laplace Inversion. Apart from casting CPU variables to GPU, etc., the major step involved in Jacketizing the code was as simple as converting a for loop to GFOR! Something like what’s given below:

    for nidx=1:Nprod
        Errorvec(nidx) = wfncpuErrorEst( … );
    end

    gfor nidx=1:Nprod
        Errorvec(nidx) = wfnjacErrorEst( … );
    gend

The loop in question calls a global minimization function that computes an absolute error estimate for each …
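The for-to-GFOR conversion works because each loop iteration writes its own entry of the result and does not depend on any other iteration. The sketch below illustrates that same property in plain Python with a thread pool. It is only an analogy, not Jacket's GFOR and not the authors' code: `error_estimate` is a hypothetical stand-in for the real error-estimate function, whose arguments are elided in the excerpt.

```python
from concurrent.futures import ThreadPoolExecutor

def error_estimate(nidx):
    """Hypothetical stand-in for the per-index error estimate."""
    return 1.0 / (nidx * nidx)

def estimate_sequential(n_prod):
    """Plain loop: one index after another, like the original for loop."""
    return [error_estimate(nidx) for nidx in range(1, n_prod + 1)]

def estimate_parallel(n_prod):
    """Same computation, with the independent iterations handed to a pool."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the output vector
        # matches the sequential version entry for entry.
        return list(pool.map(error_estimate, range(1, n_prod + 1)))

error_vec = estimate_parallel(8)
```

If an iteration needed the result of a previous one, this rewrite would no longer be valid; independence between iterations is what makes the loop body reusable unchanged.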