LIBJACKET on Amazon EC2 GPU Cloud Instances

Pavan Yalamanchili Benchmarks, CUDA 1 Comment

Amazon recently added GPUs to their Elastic Compute Cloud. We decided to throw LIBJACKET into this GPU cloud to see how it would fare. The $2/hr pay-on-demand pricing is a great option for many Jacket programmers. This post is full of screenshots detailing the steps we took to get going with GPU computing in Amazon’s cloud: Sign up with Amazon EC2 Launch a GPU instance Login to the instance using ssh Setup the environment Download, build, and test LIBJACKET! Everything in this post applies equally well to running Jacket for MATLAB® on EC2. Simply install MATLAB + Jacket in your Amazon GPU instance and start working over ssh.

GPU accelerated lattice Boltzmann model for shallow water flow and mass transport

John Melonakos Benchmarks, Case Studies, CUDA 3 Comments

Dr. Kevin Tubbs and Professor Tsai at Louisiana State University recently published an interesting paper using GPUs and Jacket to accelerate lattice Boltzmann models for shallow water flow and mass transport.  More details about this work are provided in the full success story page on the website. Jacket makes GPU programming easy.  “Very little recoding was needed to promote the LBM code to run on the GPU,” say the authors at one point in their paper. In this blog post, we share the highlights of this work.  Using these methods, the authors are able to simulate shallow water flow and mass transport.  For instance, checkout these videos of a dam break: The authors completed this work with a relatively older …

Beam Propagation Methods – Jacket is 3.5X faster than the CPU and 2X faster than PCT

John Melonakos Benchmarks, Case Studies, CUDA 2 Comments

A couple weeks ago, a GPU-enabled code appeared on MATLAB Central entitled, “A CUDA accelerated Beam Propagation Method [BPM] Solver using the Parallel Computing Toolbox.”  In this post, we share a video which showcases how Jacket is much better than PCT at GPU computing, by analyzing performance on this Beam Propagation Method code. To reproduce these results, download the source code here:  CUDA_BPM_NOV_04_2010_AccelerEyes These benchmarks were run on an NVIDIA Tesla C2070 GPU versus a quad-core Intel CPU.  MATLAB + PCT R2010B were used for the PCT-GPU experiments.  MATLAB + Jacket 1.6 (prerelease) were used for the Jacket-GPU experiments. Take Home Message Due to Jacket’s extensive library of GPU functions and its optimized GPU runtime, it performs 3.5X faster than …

Speeding up critical code

ArrayFire CUDA Leave a Comment

With Jacket 1.5, we released a big new feature:  GCOMPILE. This allows you to convert critical sections of your MATLAB code directly into GPU kernels to further increase speed.  In an earlier post we introduced the prototype and have been working with several beta users over the past month to get it ready.  In this post, we’ll give some more details and start to look at the speedups you can quickly and easily achieve.  You can find more information about it on the wiki. Some of the best features of GCOMPILE are the ability to use IF statements, WHILE loops, and FOR loops in your code now.  Make sure to check out the wiki pages about these and the other …

A Jacket built for Speed

ArrayFire Benchmarks, CUDA 1 Comment

Just a few months ago, Jacket 1.4 was released sporting an improved MTIMES routine that brought about radical improvements to Jacket’s matrix multiplication. The quest for performance never ends though. Now, in the release of Jacket 1.5, MTIMES is even faster than before for SGEMM routines. Checkout the MTIMES Benchmarks wiki for more information. I you are attending GTC, you may want to attend this session also!

GPU Giddy – Excitement Building for GTC

John Melonakos CUDA, Events Leave a Comment

GTC is coming up… The GPU Technology Conference (GTC) starts later this month and is sure to generate a new level of excitement and energy around GPU computing.  The conference includes over 250 technology sessions presented by industry, government, and academic technology leaders.  AccelerEyes is pleased to be well represented at this year’s conference by our technical leadership and a number of our customers.  If you plan to attend the conference be sure to include the sessions outlined below on your agenda. In addition to being well represented, we are also flattered to see that others in the market have recognized that GPU Computing with MATLAB delivers clear productivity gains and that the performance improvements made possible by GPUs is …

SGEMM, MTIMES & CUBLAS performance on the GPU

ArrayFire Benchmarks, CUDA 5 Comments

AccelerEyes is focused on not only providing the most easy to use GPU programming platform for CUDA capable GPUs by leveraging the MATLAB® language, our engineering organization is always looking for ways to improve the performance of all areas in the Jacket platform. A case in point is some recent work with matrix multiplication, specifically (Single General Matrix Multiply) SGEMM, or MTIMES.  The Jacket 1.3 release was based on CUBLAS for matrix multiplication and given the importance of matrix multiplication to so many of our customers, we decided to find out if we could improve performance of the function. Update: The new MTIMES routine in Jacket 1.4 has improved sigificantly since these benchmarks of the Release Candidate were taken. Have …

NVIDIA Fermi with CUDA and OpenCL

ArrayFire Benchmarks, CUDA, OpenCL 1 Comment

In December of 2008, we did a blog post answering questions from customers and prospects about the use of OpenCL for Jacket.  If you have not reviewed that blog post to gain some insight into our progress you can access it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post.  For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples.  Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla c2050 to compare OpenCL performance to CUDA performance.  The Tesla C2050 is an amazing beast of a card, providing upto 512 Gigaflops of double precision arithmetic (at peak). Before we present the benchmarks, we should comment on …

Power Flow with Jacket & MATLAB on the GPU!

John Melonakos Case Studies, CUDA Leave a Comment

Learn how Jacket, GPUs, and MATLAB can deliver magnitudes of performance improvement over CPU-based solutions for Power flow studies. AccelerEyes, in collaboration with the Indian Institute of Technology in Roorkee, has developed this case study to illustrate the ability to study power flow models on graphics processing units using Jacket and MATLAB. Implementation on the GPU is 35 times faster than CPU alternatives. http://www.accelereyes.com/resources/powerflow

Jacket and GPUs show promise in Neuroscience with fMRI and SPM

John Melonakos Case Studies, CUDA Leave a Comment

For those of you interested in neuroscience and neuroimaging, you have probably heard of a software capability called SPM or Statistical Parametric Mapping developed by a group at University College London.  Well, a group at Georgia Tech has been doing some work with Jacket and CUDA on SPM and have produced some initial results that show some promise.  Being able to speed up the image analysis of functional MRI can benefit the medical community in a big way.  AccelerEyes has been supporting these efforts at Georgia Tech and with the permission of the authors we have produced an initial look at their work.  Enjoy. http://www.accelereyes.com/resources/spm-fmri