We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s release of the Fermi architecture brings with it 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights for Jacket 1.4 are as follows:
- Added support for the NVIDIA Fermi architecture (GTX400 and Tesla C2000 series)
- Jacket DLA support for Fermi
- Dramatically improved the performance of Jacket’s JIT (Just-In-Time) compilation technology
- Operations involving random scalar constants do not incur a recompile
- Removed dependencies on MINGW and NVCC
- Logical indexing now supported for SUBSREF and SUBSASGN, e.g. B = A(A > x)
- MTIMES supports …
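For readers unfamiliar with logical indexing, here is a minimal sketch in plain MATLAB of the read (SUBSREF) and write (SUBSASGN) forms mentioned in the highlights. The matrix values and the names A, x, mask, and B are illustrative assumptions, and the sketch runs on the CPU; it does not show any Jacket-specific calls.

```matlab
% Illustrative data only (not taken from the release notes).
A = [3 8 1; 6 2 9];
x = 4;

mask = A > x;     % logical array, true where the element exceeds x
B = A(mask);      % read form (SUBSREF): selected elements as a column vector,
                  % gathered in column-major order
A(mask) = 0;      % write form (SUBSASGN): overwrite only the selected elements
```

Because MATLAB stores arrays column by column, B comes back in column-major order rather than row by row.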
SGEMM, MTIMES & CUBLAS performance on the GPU
AccelerEyes is focused not only on providing the easiest-to-use GPU programming platform for CUDA-capable GPUs by leveraging the MATLAB® language; our engineering organization is also always looking for ways to improve performance in all areas of the Jacket platform. A case in point is some recent work on matrix multiplication, specifically SGEMM (single-precision general matrix multiply), or MTIMES. The Jacket 1.3 release was based on CUBLAS for matrix multiplication, and given the importance of matrix multiplication to so many of our customers, we decided to find out whether we could improve the performance of the function. Update: The new MTIMES routine in Jacket 1.4 has improved significantly since these benchmarks of the Release Candidate were taken. Have …
Jacket accelerating life science and defense applications
With IBM’s decision this week to integrate Tesla technology into its high-performance computing line, there should be no doubt that GP-GPU computing is more than a fad: organizations solving technical problems can now do so more productively and efficiently than ever before with GPUs. AccelerEyes’ customers are experiencing this firsthand with the Jacket product family, as they are able to quickly and easily implement new or existing algorithms for GPUs and meet their technical needs with substantial speed improvements. Case in point: this week AccelerEyes released two case studies from customers that have used Jacket to move their applications to GPU computing with compelling results. System Planning Corporation has implemented two different radar processing …
NVIDIA Fermi with CUDA and OpenCL
In December of 2008, we wrote a blog post answering questions from customers and prospects about the use of OpenCL for Jacket. If you have not read that post, you can find it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post. For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples. Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla C2050 to compare OpenCL performance to CUDA performance. The Tesla C2050 is an amazing beast of a card, providing up to 512 gigaflops of double-precision arithmetic (at peak). Before we present the benchmarks, we should comment on …
Power Flow with Jacket & MATLAB on the GPU!
Learn how Jacket, GPUs, and MATLAB can deliver order-of-magnitude performance improvements over CPU-based solutions for power flow studies. AccelerEyes, in collaboration with the Indian Institute of Technology in Roorkee, has developed this case study to illustrate the ability to study power flow models on graphics processing units using Jacket and MATLAB. Implementation on the GPU is 35 times faster than CPU alternatives. http://www.accelereyes.com/resources/powerflow
Crushing MATLAB Loop Runtimes with BSXFUN
Among the slowest blocks of code that inflate runtimes in MATLAB are for/while loops. In this blog post, I’m going to talk about a little-known way of crushing MATLAB loop runtimes for many commonplace use cases by utilizing one of the most underrated and unknown functions in MATLAB’s repertoire: bsxfun. Using this function, one can break seemingly iterative code into clean, vectorized snippets that beat the socks off even MATLAB’s JIT engine. Better still, Jacket fully supports bsxfun, meaning that if you thought a vectorized loop was fast, you haven’t seen anything yet. Also, in the end, a loop represented using bsxfun is just good programming practice. As we’ll see, the technique I’m going to describe is …
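To make the idea concrete, here is a minimal sketch of replacing a for loop with a single bsxfun call. This is not taken from the full post: the data and the names A, mu, C_loop, and C_vec are illustrative assumptions, and centering the columns of a matrix is simply a common use case chosen for the example.

```matlab
% Illustrative data: 4 observations (rows) of 3 variables (columns).
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
mu = mean(A, 1);                 % 1-by-3 row vector of column means

% Loop version: subtract each column's mean one column at a time.
C_loop = zeros(size(A));
for j = 1:size(A, 2)
    C_loop(:, j) = A(:, j) - mu(j);
end

% bsxfun version: mu is virtually expanded along its singleton (row)
% dimension to match A, so the subtraction happens in one call.
C_vec = bsxfun(@minus, A, mu);

disp(isequal(C_loop, C_vec));    % both versions give the same result
```

The same pattern works with other binary function handles such as @plus, @times, or @rdivide, as long as each dimension of the two inputs either matches or has size 1.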
Jacket with MATLAB for Optics and DSP
Over the last month I have heard many Jacket customers talk about their use of the Jacket platform for MATLAB to solve optics problems. NASA and the University of Rochester are two that come to mind immediately. We found some recent work showing how Jacket can be used to solve an optical flow problem using the Horn and Schunck method and thought it might be useful to share. In addition, last week Seth Benton, a blogger for dsprelated.com, shared his experience of working with Jacket. After about a week of getting up to speed and running some examples, his experience is worth reading about if you have not already seen it.
GPUs in quantitative analytics and finance
I have had a number of exchanges with the head of quantitative tools at the trading desk of one of the largest banks in Spain, whose private banking subsidiary is considered one of the best boutique private banks. He is an enthusiast for getting indistinguishably close to the right answer very fast, so he enjoys thinking about all sorts of optimizations that could be made to his codes. He is confident that the area of greatest potential these days is figuring out how to squeeze out all the flops that come with GPUs. This is why he has shown interest in AccelerEyes and Jacket. Since he joined the bank, they have modernized all the pricing and marketing tools that were hard …
Jacket and GPUs show promise in Neuroscience with fMRI and SPM
If you are interested in neuroscience and neuroimaging, you have probably heard of a software package called SPM (Statistical Parametric Mapping), developed by a group at University College London. A group at Georgia Tech has been doing some work with Jacket and CUDA on SPM and has produced initial results that show some promise. Speeding up the image analysis of functional MRI could benefit the medical community in a big way. AccelerEyes has been supporting these efforts at Georgia Tech, and with the permission of the authors we have produced an initial look at their work. Enjoy. http://www.accelereyes.com/resources/spm-fmri
Median Filtering: CUDA tips and tricks
Last week we posted a video recording from NVIDIA’s GTC09 conference. In the video, I walked through median filtering, presenting the vanilla implementation and then a series of progressive CUDA optimizations. A comment on that post suggested trying some other compiler flags, and it sparked a new series of experiments. In the original video, we started with a vanilla CPU implementation of 3×3 median filtering. We then ported this to the GPU to realize some immediate gains, and followed that with a string of optimizations to see how far we could drive up performance: switching to texture memory, switching to shared memory, switching the internal sorting of pixels, etc. The conclusion: pay attention to the resource usage reported by nvcc (registers, …
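For reference, a vanilla CPU implementation of 3×3 median filtering might look roughly like the sketch below. It is written in plain MATLAB for illustration and is not the code from the video: the test image, the variable names, and the choice to copy border pixels unchanged are all assumptions.

```matlab
% Illustrative 5-by-5 test image with one bright outlier in the center.
img = ones(5);
img(3, 3) = 100;

out = img;                        % assumption: border pixels are copied unchanged
[rows, cols] = size(img);
for r = 2:rows-1
    for c = 2:cols-1
        window = img(r-1:r+1, c-1:c+1);   % 3x3 neighborhood around (r, c)
        sorted = sort(window(:));         % sort the 9 pixel values
        out(r, c) = sorted(5);            % the middle value is the median
    end
end
```

Sorting all nine values per pixel is the simplest approach, and the per-pixel sort is the step the post refers to when it mentions switching the internal sorting of pixels.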