Unraveling Speedups: Two Important Questions

Posted by John in Benchmarks, CUDA

One Jacket programmer recently emailed us the following:

"Our chief scientist asked me a question that I'd like to pass on to you. I think I know the answer, but you guys can be much more definitive than I can. He recently read about people achieving ~10x speedups by converting parts of their code to MEX files. He was wondering how much of the observed speedup is due to the MEX conversion and how much is due to CUDA and the GPU."

Two Questions You Should Ask Yourself

When contemplating an effort to optimize a piece of code, it is important to break the effort down into two separate questions. Both need to be addressed to improve performance: How well-written is ...
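One way to make the chief scientist's question concrete is to time three versions of the same code: the original MATLAB, a CPU-only C MEX file, and a CUDA MEX file. The end-to-end speedup then factors exactly into a part due to rewriting the code and a part due to the GPU. The Python sketch below assumes those three timings are available; the function name `decompose_speedup` and the timings in the usage lines are hypothetical, not measurements.

```python
def decompose_speedup(t_original, t_compiled, t_gpu):
    """Split an end-to-end speedup into two multiplicative factors.

    t_original: runtime of the original MATLAB code (seconds)
    t_compiled: runtime of the rewritten code on the CPU (e.g. a C MEX file)
    t_gpu:      runtime of the rewritten code on the GPU (e.g. a CUDA MEX file)
    """
    if min(t_original, t_compiled, t_gpu) <= 0:
        raise ValueError("timings must be positive")
    code_factor = t_original / t_compiled   # gain from better-written code alone
    hardware_factor = t_compiled / t_gpu    # gain from moving that code to the GPU
    total = t_original / t_gpu              # end-to-end speedup = code * hardware
    return code_factor, hardware_factor, total


if __name__ == "__main__":
    # Hypothetical timings, not measurements: 10 s in plain MATLAB,
    # 4 s as a CPU-only C MEX file, 1 s as a CUDA MEX file.
    code, hw, total = decompose_speedup(10.0, 4.0, 1.0)
    print(f"code: {code:.1f}x, GPU: {hw:.1f}x, total: {total:.1f}x")
```

With these made-up timings, a 10x total splits into 2.5x from the rewrite and 4x from the GPU; the real split depends entirely on the code being optimized.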

Beam Propagation Methods - Jacket is 3.5X faster than the CPU and 2X faster than PCT

Posted by John in Benchmarks, Case Studies, CUDA

A couple of weeks ago, a GPU-enabled code titled "A CUDA accelerated Beam Propagation Method [BPM] Solver using the Parallel Computing Toolbox" appeared on MATLAB Central. In this post, we share a video that analyzes the performance of this Beam Propagation Method code and shows how much faster Jacket is than PCT for GPU computing.

To reproduce these results, download the source code: CUDA_BPM_NOV_04_2010_AccelerEyes

These benchmarks were run on an NVIDIA Tesla C2070 GPU versus a quad-core Intel CPU. MATLAB + PCT R2010b was used for the PCT-GPU experiments, and MATLAB + Jacket 1.6 (prerelease) for the Jacket-GPU experiments.

Take Home Message

Due to Jacket's extensive library of GPU functions and its optimized GPU runtime, it performs 3.5X faster than ...
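Taking the headline figures at face value, and assuming all three runs solve the same problem, the two ratios also imply how PCT compares with the CPU on this code. Writing $T_{\mathrm{CPU}}$, $T_{\mathrm{PCT}}$ and $T_{\mathrm{Jacket}}$ for the three runtimes (our notation, not the post's):

```latex
\frac{T_{\mathrm{CPU}}}{T_{\mathrm{PCT}}}
  = \frac{T_{\mathrm{CPU}} / T_{\mathrm{Jacket}}}{T_{\mathrm{PCT}} / T_{\mathrm{Jacket}}}
  = \frac{3.5}{2}
  = 1.75
```

On these figures, the PCT-GPU version would be about 1.75X faster than the CPU.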

Speeding up critical code

Posted by ArrayFire in CUDA

With Jacket 1.5, we released a big new feature: GCOMPILE. It lets you convert critical sections of your MATLAB code directly into GPU kernels for further speed gains. We introduced the prototype in an earlier post and have spent the past month working with several beta users to get it ready. In this post, we'll give more details and start to look at the speedups you can quickly and easily achieve. You can find more information about it on the wiki.

Among the best features of GCOMPILE is that you can now use IF statements, WHILE loops, and FOR loops in your code. Make sure to check out the wiki pages about these and the other ...
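Conceptually, a compiled element-wise kernel runs one scalar body per element, and that body may branch and loop independently for each element. In vectorized MATLAB, a data-dependent WHILE loop like the one below usually has to be emulated with masks over the whole array. The following Python is only a serial illustration of that execution model; `kernel_body` and `launch` are hypothetical names, and this is neither GCOMPILE syntax nor code that GCOMPILE generates.

```python
def kernel_body(x):
    """Hypothetical per-element body: what one GPU thread would compute.

    Uses an IF, a WHILE loop and a FOR loop. Each element is handled
    independently, so the number of WHILE iterations can differ per element.
    """
    # IF: branch on this element's value
    if x < 0:
        x = -x
    # WHILE: halve until the value drops below 1 (terminates for finite x)
    steps = 0
    while x >= 1.0:
        x /= 2.0
        steps += 1
    # FOR: fixed-length accumulation of x + x^2 + x^3
    acc = 0.0
    for k in range(1, 4):
        acc += x ** k
    return acc + steps


def launch(body, data):
    """Serial stand-in for a kernel launch: apply body to every element."""
    return [body(v) for v in data]


if __name__ == "__main__":
    print(launch(kernel_body, [0.5, -3.0, 0.0]))
```

On a GPU, each call to `kernel_body` would map to one thread; `launch` simply loops over the elements instead.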

Data-parallelism vs Task-parallelism

Posted by John in CUDA, OpenCL

In order to understand how Jacket works, it is important to understand the difference between data parallelism and task parallelism. There are many ways to define these terms, but put simply, and in our context:

Task parallelism is the simultaneous execution, on multiple cores, of many different functions across the same or different datasets.

Data parallelism (aka SIMD) is the simultaneous execution, on multiple cores, of the same function across the elements of a dataset.

Jacket focuses on exploiting data parallelism, or SIMD computations. The vectorized MATLAB language is especially well suited to SIMD operations (more so than a non-vectorized language such as C/C++). And if you're going to need a vectorized notation to achieve SIMD computation, why not choose the ...
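The two definitions can be contrasted in a few lines of Python using the standard `concurrent.futures` module. This is a minimal sketch: `square`, `data_parallel` and `task_parallel` are illustrative names, and the threads only show the structure; because of CPython's global interpreter lock they will not speed up this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def data_parallel(data, workers=4):
    """Data parallelism: the SAME function applied to every element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order
        return list(pool.map(square, data))


def task_parallel(data, workers=3):
    """Task parallelism: DIFFERENT functions run concurrently on the same data."""
    tasks = [sum, min, max]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, data) for task in tasks]
        return [fut.result() for fut in futures]


if __name__ == "__main__":
    print(data_parallel([1, 2, 3, 4]))
    print(task_parallel([3, 1, 2]))
```

The data-parallel case scales with the size of the dataset, since every element is independent work for the same function; the task-parallel case is limited by how many distinct functions there are to run, which is why a data-parallel focus maps well onto the thousands of cores on a GPU.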