Crushing MATLAB Loop Runtimes with BSXFUN

Gallagher Pryor Benchmarks 1 Comment

One of the slowest blocks of code that inflate runtimes in MATLAB are for/while loops. In this blog post, I’m going to talk about a little known way of crushing MATLAB loop runtimes for many commonplace use cases by utilizing one of the most amazingly underrated and unknown functions in MATLAB’s repertoire: bsxfun. Using this function, one can break seemingly iterative code into clean, vectorized, snippets that beat the socks off even MATLAB’s JIT engine. Better still, Jacket fully supports bsxfun meaning that if you thought a vectorized loop was fast, you haven’t seen anything, yet. Also, in the end, a loop represented using bsxfun is just good programming practice. As we’ll see, the technique I’m going to describe is …

Jacket with MATLAB for Optics and DSP

John Melonakos Case Studies Leave a Comment

Over the last month I have heard many Jacket customers talk about their use of the Jacket platform for MATLAB to solve optics problems.   NASA and the University of Rochester are two that come to mind immediately.  We found some work that has been done recently to show an example of how Jacket can be used to solve an Optical Flow problem using the Horn and Schunk method and thought it might be useful to share. In addition, last week Seth Benton, a blogger for shares his experience in working with Jacket.  After about a week of getting up to speed and running some examples his experience is worth sharing if you have not already seen it.

GPUs in quantitative analytics and finance

John Melonakos Case Studies Leave a Comment

I have had a number of exchanges with the head of quantitative tools at the trading desk of one of the largest banks in Spain whose private banking subsidiary is considered one of the best boutique private banks.  He is an enthusiast for getting indistinguishably close to the right answer very fast, so enjoys thinking about all sorts of optimization that could be done with his codes. He is confident that the area of greatest potential these days is figuring out how to squeeze out all the flops that come with GPUs. This is why he has shown interest in AccelerEyes and Jacket. Since he joined the bank, they have modernized all the pricing and marketing tools that were hard …

Jacket and GPUs show promise in Neuroscience with fMRI and SPM

John Melonakos Case Studies, CUDA Leave a Comment

For those of you interested in neuroscience and neuroimaging, you have probably heard of a software capability called SPM or Statistical Parametric Mapping developed by a group at University College London.  Well, a group at Georgia Tech has been doing some work with Jacket and CUDA on SPM and have produced some initial results that show some promise.  Being able to speed up the image analysis of functional MRI can benefit the medical community in a big way.  AccelerEyes has been supporting these efforts at Georgia Tech and with the permission of the authors we have produced an initial look at their work.  Enjoy.

Median Filtering: CUDA tips and tricks

ArrayFire CUDA, Events 4 Comments

Last week we posted a video recording from NVIDIA’s GTC09 conference. In the video, I walked through median filtering, presenting the vanilla implementation and then walking through progressive CUDA optimizations. A comment on that post suggested trying some other compiler flags, and it sparked a new series of experiments. In the original video, we started with a vanilla CPU implementation of 3×3 median filtering. We then ported this to the GPU to realize some immediate gains, but then we started a string of optimizations to see how far we could drive up performance: switching to textured memory, switching to shared memory, switching the internal sorting of pixels, etc. The conclusion: pay attention to the resource usage reported by nvcc (registers, …

Accelerate Computer Vision Data Access Patterns with Jacket & GPUs

Gallagher Pryor ArrayFire Leave a Comment

For computer vision, we’ve found that efficient implementations require a new data access pattern that MATLAB does not currently support.  MATLAB and the M language is great for linear algebra where blocks of matrices are the typical access pattern, but not for Computer Vision where algorithms typically operate on patches of imagery. For instance, to pull out patches of imagery in M, one must do a double nested for loop, A = rand(100,100) for xs = -W:W for ys = -W:W patch(xs+W+1, ys+W+1) = A(xs+1+x, ys+1+y); end end …with guards for boundary conditions, etc. It gets even more complicated with non-square patches. On top of that, these implementations don’t translate to the GPUs memory hierarchy at all and are thus …

A case study in CUDA optimization

ArrayFire CUDA 4 Comments

Jimi Malcolm, VP of Engineering and Co-founder of AccelerEyes takes about 15 minutes to share CUDA optimization strategies to maximize performance of CUDA code.  Watch the video below to find out what needs to go into strategizing CUDA development to maximize performance.  Jimi uses Median Filtering for this case study. Get the Flash Player to see this player.

Using Parallel For Loops (parfor) with MATLAB® and Jacket

ArrayFire Benchmarks 3 Comments

MATLAB® parallel for loops (parfor) allow the body of a for loop to be executed across multiple workers simultaneously, but with some pretty large restrictions.  With Jacket MGL, Jacket can be used within parfor loops, with the same restrictions.  However, it is important to note that Jacket MGL does not currently support co-distributed arrays. Problem Size Problem size might be the single most important consideration in parallelization using the Parallel Computing Toolbox (PCT) and Jacket MGL.  When data is used by a worker in the MATLAB pool it must be copied from MATLAB to the worker, and must be copied back when the computation is complete.  Additionally, when GPU data is used, it must then be copied by the worker …

How long does it take to get 98X performance improvement with GPUs?

John Melonakos Case Studies 2 Comments

Well, here is a recent story with one of our customers that accomplished 98X performance speed up with Jacket in 16 days.  Of the 16 days, 15 days were spent sending emails back and forth about the problem and less than a day was spent getting the customer code in Jacket and running some performance tests!  Who would have imagined GPU programming with performance in 1 day.  Happy Reading. Day 1: Customer uses inverse radon (or iradon in MATLAB terms) extensively for their back projection algorithms.  They would like to know when the iradon function will be available/supported in Jacket. AccelerEyes product management informs the customer that the inverse radon algorithm used in MATLAB is based on the filtered back …