Jacket Lectures – Learn and Teach GPU computing

John Melonakos | Announcements, CUDA

We are pleased to share six in-depth Jacket lectures, useful for both learning and teaching Jacket. Download the lectures (PDF format) here: http://www.accelereyes.com/support/lectures

Jacket is used in course instruction at many universities around the world. Professors and course instructors use Jacket to provide engineering students with GPU acceleration of MATLAB® algorithms and to bring HPC to MATLAB courses. The six lectures are entitled “Parallel High Performance Computing with Emphasis on Jacket Based GPU Computing” and cover the following topics:

- Parallel computing introduction
- Jacket introduction
- Basic programming with Jacket
- Advanced programming with Jacket
- Multiple GPU programming
- Benchmarking

If you are looking at accelerating MATLAB code or parallel computing with MATLAB, you definitely will want to add these lectures to your arsenal of …

Speeding Up Compressed Sensing Algorithms

Scott | Case Studies, CUDA | 1 Comment

Are you looking for ways to speed up compressed sensing? If you work in the areas of medical image reconstruction, image acquisition, or sensor networks, you probably are. This paper, Parallel Implementation of Compressed Sensing Algorithm on CUDA-GPU, compares CPUs running MATLAB and GPUs running Jacket using a Basis Pursuit Algorithm for compressed sensing. The authors compared an Intel Core 2 Duo T8100 (2.1 GHz and 3.0 GB memory) running MATLAB with an NVIDIA GeForce 8400M GS (256 MB DDR2 video memory, 64-bit bus width) using an older version of Jacket, Version 1.3. The CPU and GPU setups were used to run their Basis Pursuit Algorithm on six MRI images. These are some samples: [sample images]

The implementation using Jacket …

Our Point of View & Twitter Comedy

John Melonakos | CUDA

“Great businesses have a point of view, not just a product or service.” ~37 Signals

At AccelerEyes, our point of view is that GPU software can and should deliver great results on real applications. With this point of view, we’ve kept our heads down, solely focused on delivering a great runtime system for GPUs. All our energy has been devoted to the task of emitting optimized low-level code from high-level matrix notation. These efforts are now paying off in a big way! Jacket is consistently delivering awesome results in real applications; read examples here and here.

Alternative choices apparently have a different point of view. Yesterday’s Twitter stream contained a comical, but all-too-common indication of frustration with the recent GPU …

A better way to time Jacket code

ArrayFire | Benchmarks | 1 Comment

Whether you are a new Jacket programmer or a GPU maestro, you are bound to speed-test Jacket at some point. There are many factors to keep in mind while benchmarking Jacket code; a simple tic-func()-toc won’t do. For example, this is some typical benchmarking code:

    % warm up
    x = rand(n,'single');
    x = grand(n,'single');
    geval(x);

    % CPU timing
    tic
    for r = 1:reps
        x = rand(n,'single');
    end
    cpu_time = toc;

    % GPU timing
    gsync, tic
    for r = 1:reps
        x = grand(n,'single');
        geval(x);
    end
    gsync, gpu_time = toc

With Jacket 1.7, this entire code chunk is now replaced by two lines:

    cpu_time = timeit(@() rand(n,'single'));
    gpu_time = timeit(@() grand(n,'single'));
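The manual pattern above (warm up, synchronize, time a loop of repetitions, synchronize again) is not specific to Jacket. As a rough, language-neutral illustration, here is a minimal Python sketch of the same idea. The helper name `time_call` and its `sync` parameter are hypothetical, invented for this example; this is not Jacket's `timeit` and says nothing about how that function is implemented.

```python
import time


def time_call(fn, reps=10, warmup=1, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    Hypothetical helper for illustration only. `sync` stands in for a
    device-synchronization call (the role gsync plays in the Jacket
    snippet); pass None for purely synchronous code.
    """
    if reps < 1:
        raise ValueError("reps must be at least 1")

    # Warm-up: run untimed first so one-time costs are excluded.
    for _ in range(warmup):
        fn()

    # Drain any pending asynchronous work before starting the clock.
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(reps):
        fn()

    # Drain again so work still queued is included in the measurement.
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return elapsed / reps


if __name__ == "__main__":
    per_call = time_call(lambda: sum(range(100_000)), reps=20, warmup=2)
    print(f"mean time per call: {per_call:.6f} s")
```

Averaging over several repetitions after a warm-up run is one reasonable choice; taking the minimum or median of the repetitions is a common alternative when the machine is noisy.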

Improved Fat/Water Reconstruction Algorithm with Jacket

Scott | Benchmarks, Case Studies, CUDA | 1 Comment

Case Western Reserve University researchers turned to GPUs running Jacket to develop a fast and robust Iterative Decomposition of water and fat with an Echo Asymmetry and Least-squares (IDEAL) reconstruction algorithm. The complete article can be found here. The authors report that “GPU usage is critical for the future of high resolution, small animal and human imaging” and Jacket “enables GPU computations in MATLAB.” Their research was performed on a desktop system with 32 GB RAM, dual Intel Xeon X5450 3.0 GHz processors, an NVIDIA Quadro FX5800 (4 GB RAM, 240 cores, 400 MHz clock), and 64-bit MATLAB R2009a. Jacket v1.1, an older version, was used to produce these results. Reconstruction tests with different sized images were performed to evaluate computation times …

Hybrid GPU & Multicore Processing for LU Decomposition

Scott | Benchmarks, Case Studies, CUDA

One of the hot areas in supercomputing is hybrid compute: balancing the computational load between one or more CPUs and GPUs. Along these lines, Nolan Davis and Daniel Redig at SAIC recently presented work on Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems, where they developed a novel algorithm for LU decomposition, one of the most important routines in linear algebra. Here’s a snapshot view of their setup:

System Specs:

GPU
- Nvidia® Tesla™ 2050
- 448 processing cores
- 3 GB dedicated memory

Multicore Host
- 24 cores
- 64 GB system memory
- Red Hat® Enterprise Linux 5
- Two AMD Opteron™ 6172 12-core processors

Host-to-GPU Communications
- PCIE 2.0
- 16 channels at 500 MB/sec/lane
- Theoretical peak bandwidth of 8 GB/sec

Their initial results are very promising. For …
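As a quick check on the setup listed for this post, the quoted theoretical peak host-to-GPU bandwidth follows from the lane count and per-lane rate given in the specs (assuming decimal units, 1 GB = 1000 MB):

```latex
\[
16\ \text{lanes} \times 500\ \frac{\text{MB/sec}}{\text{lane}}
  = 8000\ \text{MB/sec}
  = 8\ \text{GB/sec}
\]
```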

Unraveling Speedups: Two Important Questions

John Melonakos | Benchmarks, CUDA | 1 Comment

One Jacket programmer recently emailed the following to us:

“Our chief scientists asked me a question that I’d like to pass on to you. I think I know the answer, but you guys can be much more definitive than I can. He recently read about people achieving ~10x speedups by converting parts of their code to MEX files. He was wondering how much of the observed speedup is due to that MEX and how much is due to CUDA and the GPU.”

Two Questions You Should Ask Yourself

When contemplating an effort to optimize a piece of code, it is important to unravel the effort into two separate questions. Both need to be addressed to improve performance:

- How well-written is …

Stanford GPU Benchmarks: Jacket vs PCT/GPU

John Melonakos | Benchmarks, Case Studies, CUDA

Researchers in the Pervasive Parallelism Laboratory at Stanford University recently published work describing a novel framework for parallel computing in a paper entitled “A Domain-Specific Approach to Heterogeneous Parallelism.” As part of their research, they compared Jacket to the GPU support in the Parallel Computing Toolbox™. The results clearly show that Jacket’s optimizations make a big difference in performance. In this blog post, we highlight four algorithms included in their research:

NAME | DESCRIPTION | INPUT
Gaussian Discriminant Analysis (GDA) | Generative learning algorithm for modeling the probability distribution of a set of data as a multivariate Gaussian | 1,200×1,024 Matrix
Restricted Boltzmann Machine (RBM) | Stochastic recurrent neural network, without connections between hidden units | 2,000 Hidden Units, 2,000 Dimensions
Support Vector Machine (SVM) | Optimal …

LIBJACKET on Amazon EC2 GPU Cloud Instances

Pavan Yalamanchili | Benchmarks, CUDA | 1 Comment

Amazon recently added GPUs to their Elastic Compute Cloud. We decided to throw LIBJACKET into this GPU cloud to see how it would fare. The $2/hr pay-on-demand pricing is a great option for many Jacket programmers. This post is full of screenshots detailing the steps we took to get going with GPU computing in Amazon’s cloud:

- Sign up with Amazon EC2
- Launch a GPU instance
- Log in to the instance using ssh
- Set up the environment
- Download, build, and test LIBJACKET!

Everything in this post applies equally well to running Jacket for MATLAB® on EC2. Simply install MATLAB + Jacket in your Amazon GPU instance and start working over ssh.

GPU accelerated lattice Boltzmann model for shallow water flow and mass transport

John Melonakos | Benchmarks, Case Studies, CUDA | 3 Comments

Dr. Kevin Tubbs and Professor Tsai at Louisiana State University recently published an interesting paper using GPUs and Jacket to accelerate lattice Boltzmann models for shallow water flow and mass transport. More details about this work are provided in the full success story page on the website.

Jacket makes GPU programming easy. “Very little recoding was needed to promote the LBM code to run on the GPU,” say the authors at one point in their paper. In this blog post, we share the highlights of this work. Using these methods, the authors are able to simulate shallow water flow and mass transport. For instance, check out these videos of a dam break:

The authors completed this work with a relatively older …