Jacket on Lenovo Systems

Scott | Announcements, Benchmarks | 1 Comment

Lenovo and AccelerEyes have a joint solution for optimizing M code on Lenovo workstations.  The combined HPC solution pairs high Intel Xeon CPU performance for daily productivity with NVIDIA graphics (GPU) performance for parallel computing with Jacket. Jacket’s comprehensive benchmark suite, when run on Lenovo ThinkStation systems, shows substantial speedups for a wide variety of computationally intensive applications. Jacket is the world’s fastest and broadest GPU software for accelerating the M-language commonly found in MATLAB®.  Thousands of customers around the world have used Jacket to accelerate their MATLAB code. Lenovo ThinkStation systems are ideally suited for running real-world high-performance applications with Jacket. While the high-end CPUs are ideal for daily productivity tasks, Jacket and the Quadro GPUs perform HPC …

AccelerEyes Releases ArrayFire GPU Software

Scott | Announcements, ArrayFire, C/C++, CUDA, Fortran, OpenCL | 1 Comment

A free, fast, and simple GPU library for CUDA and OpenCL devices. AccelerEyes announces the launch of ArrayFire, a freely-available GPU software library supporting CUDA and OpenCL devices. ArrayFire supports the C, C++, Fortran, and Python languages on AMD, Intel, and NVIDIA hardware.  Learn more by visiting the ArrayFire product page. “ArrayFire is our best software yet and anyone considering GPU computing can benefit,” says James Malcolm, VP Engineering at AccelerEyes.  “It is fast, simple, GPU-vendor neutral, full of functions, and free for most users.” Thousands of paying customers currently enjoy AccelerEyes’ GPU software products.  With ArrayFire, everyone developing software for GPUs has an opportunity to enjoy these benefits without the upfront expense of a developer license. Reasons to use ArrayFire: …
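For a sense of the programming model, here is a minimal, illustrative C++ sketch using the ArrayFire API; the array sizes and the particular operations are arbitrary choices for demonstration, not taken from the announcement:

```cpp
// Minimal ArrayFire C++ sketch (illustrative; sizes and operations are arbitrary).
#include <arrayfire.h>
#include <cstdio>

int main() {
    af::info();                           // print the detected CUDA/OpenCL device
    af::array A = af::randu(1024, 1024);  // random matrix generated on the device
    af::array B = af::matmul(A, A);       // dense matrix multiply on the GPU
    af::array C = af::fft2(B);            // 2-D FFT, also on the device
    printf("sum of B: %g\n", af::sum<float>(B));
    return 0;
}
```

The same array-based style carries over to the C, Fortran, and Python bindings mentioned above.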

AccelerEyes Webinar Series

Scott | Announcements, CUDA, Events, OpenCL | Leave a Comment

AccelerEyes invites you to participate in a series of webinars designed to help you learn more about Jacket for MATLAB® and LibJacket for C/C++/Fortran/Python, a comprehensive library of GPU-accelerated functions. Joint Webinar With NVIDIA: LibJacket CUDA Library On October 20th we co-hosted a joint webinar with NVIDIA.  During this well-attended event, our GPU computing experts provided a general product overview and demonstrated usage of the LibJacket CUDA library, along with several impressive demos of LibJacket in action.  LibJacket supports hundreds of GPU computing functions, and programmers in numerous industries have been able to speed up their applications.  Be sure to check out the Q&A session included in the recorded webinar posted on NVIDIA’s Developer Zone. Thanks again to NVIDIA for co-hosting this informative webinar! GPU Programming for …

Filtering Benchmarks – OpenCV GPU vs LibJacket

ArrayFire | Benchmarks, CUDA | Leave a Comment

OpenCV is one of the most popular computer vision toolkits, and over the last year its developers have been integrating more GPU processing into the core. One of the most common image processing tasks is convolution. Since LibJacket and OpenCV both support it, one of my coworkers rolled up his sleeves and benchmarked the latest versions of both libraries in three configurations: OpenCV/CPU, OpenCV/GPU, and LibJacket. Jump over to his personal website for the full benchmark results and source code.  From the graphs, the GPU implementations in OpenCV and LibJacket both easily outperform OpenCV’s default CPU version, but notice that LibJacket pushes performance even further and dominates OpenCV’s GPU implementation, especially when using separable filters. We’ve worked really hard the last few years to …
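To make concrete what the benchmark is timing, here is a hedged sketch of 2-D image convolution written against the current ArrayFire API (the successor to LibJacket); the frame size, kernel width, and choice of a Gaussian kernel are illustrative assumptions, not the benchmark's actual parameters:

```cpp
// Sketch of 2-D convolution, full and separable, using the ArrayFire API.
#include <arrayfire.h>

int main() {
    af::array img    = af::randu(1920, 1080);      // stand-in for a video frame
    af::array kern2d = af::gaussianKernel(9, 9);   // dense 9x9 Gaussian kernel
    af::array kcol   = af::gaussianKernel(9, 1);   // column part of a separable filter
    af::array krow   = af::gaussianKernel(1, 9);   // row part of a separable filter

    af::array full = af::convolve2(img, kern2d);    // full 2-D convolution on the GPU
    af::array sep  = af::convolve(kcol, krow, img); // separable convolution on the GPU

    full.eval();
    sep.eval();
    af::sync();   // block until the GPU work finishes so any timing is honest
    return 0;
}
```

The separable form applies a column filter and then a row filter, which is where the largest gaps over OpenCV's GPU path showed up in the graphs.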

Optimization methods for deep learning

ArrayFire | Case Studies | Leave a Comment

Researchers at SAIL (Stanford Artificial Intelligence Laboratory) have done it again. They have successfully used Jacket to speed up the training stage of Deep Learning algorithms. In their paper titled “On Optimization Methods for Deep Learning”, they experiment with several well-known training algorithms and demonstrate their scalability across parallel architectures (GPUs as well as multi-machine networks). The algorithms include SGD (Stochastic Gradient Descent), L-BFGS (limited-memory BFGS, used for solving non-linear problems), and CG (Conjugate Gradient). While SGD is easy to implement, it requires manual tuning, and its sequential nature makes it hard to scale and parallelize, which limits its use with Deep Learning algorithms.  L-BFGS and CG algorithms can be harder to implement and …
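For readers unfamiliar with why plain SGD needs manual tuning, the textbook update rule below makes the dependence on a hand-chosen learning-rate schedule explicit; this is the standard formulation, not an equation taken from the paper:

```latex
% Textbook SGD update: \theta are the model parameters, (x_i, y_i) the example
% (or mini-batch) drawn at step t, and \eta_t the learning-rate schedule that
% typically has to be tuned by hand.
\theta_{t+1} = \theta_t - \eta_t \,\nabla_{\theta}\, \ell(\theta_t; x_i, y_i)
```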

Discrete GPUs are here to stay

John Melonakos | CUDA | 2 Comments

Ever since AccelerEyes began over 4 years ago, naysayers have flippantly tossed out the idea that computing on discrete GPUs would soon go away. Some thought AMD’s Fusion would spell the demise of discrete GPU computing. Others thought that Intel’s integrated graphics would squeeze high-end GPUs out of the market. Neither is anywhere close to disrupting the utility of discrete GPUs (especially those currently available from NVIDIA) for solving the computational challenges that face domain professionals. Today, Jon Peddie Research introduced a free whitepaper describing the market forces and sales projections for GPUs.  From the article: “The facts speak for themselves. Those who are concerned about graphics performance will buy discrete GPU systems. As good as they are, embedded …

Jacket Demo – CPU vs GPU runtimes on MATLAB® code

John Melonakos | Benchmarks, CUDA | 1 Comment

To explore the differences between CPU-only computing and GPU-accelerated computing, the new Jacket Demo is really convenient.  The Jacket Demo automatically launches two MATLAB® sessions, one running on the CPU only and the other running on the GPU with Jacket. This side-by-side demo shows the computational speed of each processor as well as a visual depiction of the algorithm’s progression.  A variety of different demos are provided. The Jacket Demo is included in every Jacket installation (found in the examples directory and launchable from the Start Menu on Windows). Check out this video of the Jacket Demo in action on an i7 CPU with a Tesla C2050 GPU.  Enjoy!

Fast Computer Vision with OpenCV and ArrayFire

John Melonakos | ArrayFire, Benchmarks, Case Studies, CUDA | Leave a Comment

Update:  While the post below discusses LibJacket (no longer a product), you can do the same thing in the newer, but different, ArrayFire library.  Improved performance benchmarks and a simpler API are the results of moving from LibJacket to ArrayFire. Mcclanahoochie just posted some code and instructions for pairing OpenCV with LibJacket to get accelerated computer vision.  You can do really fast image processing on webcam feeds too; see the picture below. Really cool stuff.  Computer vision is really hot, with applications emerging in defense, radiology, games, automotive, and other consumer areas. Computer vision algorithms like these are also going mobile.  For instance, we have started to build LibJacket for mobile applications, which runs on Tegra, PowerVR, and other mobile …
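As a rough illustration of what pairing the two libraries involves today, here is a hedged C++ sketch that moves an OpenCV image into an ArrayFire array and back; the file names and the Gaussian blur are hypothetical placeholders, and this is not Mcclanahoochie's code. OpenCV stores images row-major while ArrayFire is column-major, hence the transposes:

```cpp
// Hedged sketch of OpenCV <-> ArrayFire interop (illustrative only).
#include <arrayfire.h>
#include <opencv2/opencv.hpp>

int main() {
    // Hypothetical input file; any grayscale image will do.
    cv::Mat frame = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    cv::Mat frame32f;
    frame.convertTo(frame32f, CV_32F, 1.0 / 255.0);

    // Copy host data to the GPU; swap the dimensions and transpose to convert
    // OpenCV's row-major layout into ArrayFire's column-major layout.
    af::array img(frame32f.cols, frame32f.rows, frame32f.ptr<float>());
    img = img.T();

    // Any GPU image processing goes here; a small Gaussian blur as a placeholder.
    img = af::convolve2(img, af::gaussianKernel(5, 5));

    // Copy the result back into an OpenCV matrix for display or further processing.
    cv::Mat result(frame.rows, frame.cols, CV_32F);
    img.T().host(result.ptr<float>());

    cv::Mat result8u;
    result.convertTo(result8u, CV_8U, 255.0);
    cv::imwrite("output.png", result8u);
    return 0;
}
```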

Action Recognition with Independent Subspace Analysis

ArrayFire | Case Studies | Leave a Comment

Researchers at the Stanford Artificial Intelligence Laboratory (SAIL) have had more success (building on previous work) using Jacket to speed up their algorithm. In a paper at CVPR 2011, entitled “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis”, they explain how their unsupervised feature learning algorithm competes with algorithms that use hand-crafted or learned features. Testing their algorithm on four well-known benchmark datasets, they were able to achieve better performance than the best previously published results:

Dataset                   KTH      Hollywood2   UCF      YouTube
Best published results    92.1%    50.9%        85.6%    71.2%
Stanford group results    93.9%    53.3%        86.5%    75.8%

For their training purposes, they used a multi-layered stacked convolutional ISA (Independent Subspace Analysis) …
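For context on what an ISA layer computes, the activation below is the standard independent subspace analysis formulation (the notation follows the common presentation and may differ from the paper's): each unit pools the squared responses of a learned filter bank over a fixed subspace grouping and takes the square root.

```latex
% Standard ISA activation: x^t is an input patch (a small video block here),
% W holds the learned first-layer filters, and V is the fixed binary pooling
% matrix that groups first-layer units into subspaces.
p_i(x^t; W, V) = \sqrt{\sum_{k=1}^{m} V_{ik} \Big( \sum_{j=1}^{n} W_{kj}\, x_j^t \Big)^{2}}
```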

Filtered Back-Projection and Non-Uniform FFTs

ArrayFire | Case Studies | Leave a Comment

In order to investigate changes in forest biomass, scientists use microwave tomography to image the vegetation. At the smallest scale, individual plants can be imaged to investigate branching and growth, while synthetic aperture radar can reveal large-scale changes in regional ecology. To the right, you can see the experimental setup used to image an individual plant. Filtered back-projection is at the core of all of these techniques: using the inverse Radon transform to reconstruct regular images from Fourier samples. Below you can see the final reconstructed image. Since these samples are often not on a uniform Cartesian grid, the non-uniform version of the FFT (NUFFT) comes into play, and all of this requires some serious number crunching: bring in the …
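For reference, the textbook form of filtered back-projection is shown below; p(s, θ) denotes the projection data, h the ramp filter, and the projection-slice theorem is what connects these projections to Fourier samples, which is where the NUFFT enters for non-Cartesian sampling:

```latex
% Textbook filtered back-projection: convolve each projection with the ramp
% filter h, then integrate (back-project) the filtered projections over all angles.
f(x, y) = \int_{0}^{\pi} \big( p(\cdot, \theta) * h \big)\!\left( x\cos\theta + y\sin\theta \right) \mathrm{d}\theta
```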