Optimization methods for deep learning

ArrayFire | Case Studies

Researchers at SAIL (Stanford Artificial Intelligence Laboratory) have done it again: they have successfully used Jacket to speed up the training phase of deep learning algorithms. In their paper, “On Optimization Methods for Deep Learning”, they experiment with several well-known training algorithms and demonstrate their scalability across parallel architectures (GPUs as well as multi-machine networks). The algorithms include SGD (stochastic gradient descent), L-BFGS (limited-memory BFGS, used for solving nonlinear problems), and CG (conjugate gradient). While SGD methods are easy to implement, they require manual tuning. Combined with their sequential nature, this makes them hard to tune, scale, and parallelize, and therefore difficult to use with deep learning algorithms. L-BFGS and CG algorithms can be harder to implement and …
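To make the tuning point concrete, here is a minimal Python sketch of plain SGD fitting a line to noise-free toy data. It is not taken from the Stanford paper: the function name `sgd_linear_fit`, the toy data, and the `learning_rate` value are all assumptions chosen for illustration.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.5, epochs=200, seed=0):
    """Fit y = w * x + b with plain SGD, one sample per update (illustrative only)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a fresh random order each epoch
        for i in indices:
            err = (w * xs[i] + b) - ys[i]      # prediction error on one sample
            w -= learning_rate * err * xs[i]   # gradient of 0.5 * err**2 w.r.t. w
            b -= learning_rate * err           # gradient of 0.5 * err**2 w.r.t. b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

Note that each update depends on the result of the previous one, which is the sequential nature mentioned above. The learning rate was picked by hand for this toy problem; a noticeably larger value may make the iterates diverge.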

Discrete GPUs are here to stay

John Melonakos | CUDA | 2 Comments

Ever since AccelerEyes began over 4 years ago, naysayers have flippantly suggested that computing on discrete GPUs will soon go away. Some thought AMD’s Fusion would spell the demise of discrete GPU computing. Others thought that Intel’s integrated graphics would squeeze high-end GPUs out of the market. Neither is anywhere close to disrupting the utility of discrete GPUs (especially those currently available from NVIDIA) for solving the computational challenges that domain professionals face. Today, Jon Peddie Research introduced a free whitepaper describing the market forces and sales projections for GPUs. From the article: “The facts speak for themselves. Those who are concerned about graphics performance will buy discrete GPU systems. As good as they are, embedded …

Jacket Demo – CPU vs GPU runtimes on MATLAB® code

John Melonakos | Benchmarks, CUDA | 1 Comment

The new Jacket Demo is a convenient way to explore the differences between CPU-only computing and GPU-accelerated computing. The Jacket Demo automatically launches two MATLAB® sessions, one running on the CPU only and the other running on the GPU with Jacket. This side-by-side demo shows the computational speed of each processor as well as a visual depiction of the algorithm’s progression. A variety of demos are provided. The Jacket Demo is included in every Jacket installation (found in the examples directory and launchable from the Start Menu in Windows). Check out this video of the Jacket Demo in action on an i7 CPU with a Tesla C2050 GPU. Enjoy!

Fast Computer Vision with OpenCV and ArrayFire

John Melonakos | ArrayFire, Benchmarks, Case Studies, CUDA

Update: While the post below discusses LibJacket (no longer a product), you can do the same thing in the newer, but different, ArrayFire library. Moving from LibJacket to ArrayFire brought improved performance benchmarks and a simpler API.

Mcclanahoochie just posted some code and instructions for pairing OpenCV with LibJacket to get accelerated computer vision. You can do really fast image processing on video camera feeds too (see picture below). Really cool stuff. Computer vision is really hot, with applications emerging in defense, radiology, games, automotive, and other consumer markets. Computer vision algorithms like these are also going mobile. For instance, we have started to build LibJacket for mobile applications, which runs on Tegra, PowerVR, and other mobile …

Action Recognition with Independent Subspace Analysis

ArrayFire | Case Studies

Researchers at the Stanford Artificial Intelligence Laboratory (SAIL) have had more success (building on previous work) using Jacket to speed up their algorithm. In a paper at CVPR 2011, entitled “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis”, they explain how their unsupervised feature learning algorithm competes with other algorithms that are hand-crafted or use learned features.

                          KTH     Hollywood2   UCF     Youtube
Best published results    92.1%   50.9%        85.6%   71.2%
Stanford group results    93.9%   53.3%        86.5%   75.8%

Testing their algorithm on four well-known benchmark datasets, they achieved better performance than previously published results. For their training purposes, they used a multi-layered stacked convolutional ISA (Independent subspace analysis) …

Filtered Back-Projection and Non-Uniform FFTs

ArrayFire | Case Studies

In order to investigate changes of forest biomass, scientists use microwave tomography to image the vegetation. At the smallest scale, individual plants can be imaged to investigate branching and growth, but even synthetic aperture radar can reveal large-scale changes in regional ecology. To the right, you can see the experimental setup to image an individual plant. Filtered back-projection is at the core of all of these techniques: using the inverse Radon transform to reconstruct regular images from Fourier samples. Below you can see the final reconstructed image. Since these samples are often not on a uniform Cartesian grid, the non-uniform version of the FFT comes into play (NUFFT), and all of this requires some serious number crunching: bring in the …

Music Beat Analysis

ArrayFire | Case Studies

Did you ever wonder how the music visualizer in your media player works? Watching it pulsate in synchrony with the beats of the song is almost as entertaining as listening to the song itself! Researchers have been attempting to detect beats in audio signals for many years, and there are many techniques available, from the simplest (and least accurate) to more complicated algorithms that are highly accurate. All algorithms, though, perform some form of signal processing and frequency analysis, applications highly suited to GPU Computing. The beat visualizer described here was developed by researchers at Rice University, and is simple and fast. An incoming signal is broken down into six frequency bands for analysis. After smoothing out these bands and …
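The first step described above, breaking a signal into six frequency bands, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Rice University implementation: the function names `dft` and `band_energies`, the naive DFT, and the choice of six equal-width bands are all hypothetical (a real beat detector would likely use an FFT and perceptually spaced band edges).

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform; fine for a short illustration."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_energies(signal, num_bands=6):
    """Sum spectral energy in equal-width bands (band edges are an assumption)."""
    spectrum = dft(signal)
    half = len(signal) // 2  # a real signal only needs the positive-frequency bins
    energies = [0.0] * num_bands
    for k in range(half):
        band = min(k * num_bands // half, num_bands - 1)
        energies[band] += abs(spectrum[k]) ** 2
    return energies


# A pure tone at frequency bin 20 of a 96-sample frame (hypothetical test signal).
n = 96
tone = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
energies = band_energies(tone)
```

Each band's energy can be computed independently of the others, which is one reason this kind of frequency analysis maps well onto a GPU.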

Accelerating LTE Simulation

ArrayFire | Case Studies

Simulation in MATLAB is a driving force in several research projects. However, the accompanying long simulation times can be a drag on many of these projects. In this article, we look at the work on 3GPP LTE System Simulation by Yuan Gao et al. (from Tsinghua University, Beijing) and demonstrate how Jacket can significantly improve simulator performance and lead to faster validation times in simulation projects. 3GPP’s LTE (Long Term Evolution) and LTE-Advanced are important telecommunication standards pertaining to 3G and 4G communication networks. With networks worldwide beginning to adopt them for consumer usage, a great need has come up for several novel link and system-level communication techniques developed by …

Tree cats see your code!

John Melonakos | ArrayFire

From time to time we stumble across funny quirks while using MATLAB®. The latest came when one of our developers accidentally mis-keyed a few characters. With 5 characters on the command line, you too can get a message about tree cats seeing your bad code (followed by a nasty seg fault, so beware). Try this:

>> a()@a
tree_cat sees bad code
* Subsref [4]
* M_ID 0(5) which
* M_LRB 5(1)
* ExprList [1]
* M_ID 6(1) e
* M_RRB 7(1)
tree_cat sees bad code
* Subsref [4]
* M_ID 0(5) which
* M_LRB 5(1)
* ExprList [1]
* M_ID 6(1) e
* M_RRB 7(1)

Top Secret: Part of Jacket’s GPU runtime involves monkeys obtaining bananas for optimal performance. While we can’t …

High Performance Compressive Sensing

ArrayFire | Benchmarks, Case Studies

A few weeks ago, we published a blog entry that demonstrated the ability of Jacket to speed up “compressive sensing”, a technology that has wide applications in areas such as image processing, reconstruction, and spectroscopy. Here, we discuss the work of Nabor Reyna Jr. and Wotao Yin from Rice University, who used Jacket to speed up “compressive sensing” algorithms in reconstruction. This work deals with reconstruction of signals using partial Fourier matrices (RecPF). The major computational components of the algorithm are shrinkage and FFTs. Jacket is employed to accelerate this compute-heavy code, and the resultant version (gRecPF) was about 5x faster! To reduce the cost of generating the random matrices used in the above method, a second method (RecPC) that …
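For readers unfamiliar with the term, “shrinkage” usually refers to soft-thresholding, sketched below in Python. This is the standard element-wise form, shown as an assumption: whether RecPF uses exactly this variant is not stated here, and the function name `shrink` and the sample values are hypothetical.

```python
def shrink(values, threshold):
    """Element-wise soft-thresholding: pull each value toward zero by `threshold`."""
    result = []
    for v in values:
        if v > threshold:
            result.append(v - threshold)   # large positive values move down
        elif v < -threshold:
            result.append(v + threshold)   # large negative values move up
        else:
            result.append(0.0)             # small values are set to zero
    return result


# Hypothetical coefficients: small entries vanish, large ones shrink.
shrunk = shrink([3.0, -0.5, -2.0], 1.0)
```

Because every element is processed independently, an operation like this parallelizes naturally across GPU threads.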