In January, we introduced you to Torben’s Corner – a resource wiki created and maintained by Jacket programming guru Torben Larsen at Aalborg University in Denmark. Many Jacket programmers have gained valuable insights from Torben’s Corner, including GPU performance charts, coding guidelines, and special tricks. Since January, many wonderful additions have been made to Torben’s Corner. We think you will find value not only in this new information but in the entire resource. Here is a quick summary of the most recent additions with links to the information. Benchmarking Update: Torben’s Corner maintains a long list of benchmarks specifically detailing speedups of Jacket relative to standard MATLAB. This became an enormous task due to the sheer number of functions supported by Jacket …
GPU Giddy – Excitement Building for GTC
GTC is coming up… The GPU Technology Conference (GTC) starts later this month and is sure to generate a new level of excitement and energy around GPU computing. The conference includes over 250 technology sessions presented by industry, government, and academic technology leaders. AccelerEyes is pleased to be well represented at this year’s conference by our technical leadership and a number of our customers. If you plan to attend the conference, be sure to include the sessions outlined below on your agenda. In addition to being well represented, we are also flattered to see that others in the market have recognized that GPU Computing with MATLAB delivers clear productivity gains and that the performance improvements made possible by GPUs is …
Tesla C2050 versus C1060 on Real MATLAB Applications
Following our recent Jacket v1.4 Fermi architecture release, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 versus the older Tesla C1060. Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared performance of the C2050 vs C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers. All tests were run on the same standard workstation with Jacket 1.4. The only thing that changed was the actual GPU board. In every case the C2050 beat the C1060. Double-precision examples on the Fermi-based board outperformed the older board by 50% in every …
Jacket for MATLAB now available for NVIDIA Fermi!
We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s release of the Fermi architecture brings with it 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights for Jacket 1.4 are as follows: added support for the NVIDIA Fermi architecture (GTX400 and Tesla C2000 series); Jacket DLA support for Fermi; dramatically improved performance of Jacket’s JIT (Just-In-Time) compilation technology; operations involving random scalar constants do not incur a recompile; removed dependencies on MINGW and NVCC; logical indexing now supported for SUBSREF and SUBSASGN, e.g. B = A(A > x); MTIMES supports …
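For readers unfamiliar with the logical indexing form B = A(A > x) mentioned in the highlights, here is a minimal Python sketch of what the two operations do. It is an illustration only, not Jacket or MATLAB code: the helper names `logical_index` and `logical_assign` are hypothetical, and matrices are plain nested lists. The read form (SUBSREF) returns the selected elements in column-major order, as MATLAB does; the write form (SUBSASGN) corresponds to A(A > x) = value.

```python
def logical_index(matrix, predicate):
    """Emulate B = A(mask): collect matching elements in column-major order."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    return [matrix[r][c]
            for c in range(cols)      # walk columns first, as MATLAB does
            for r in range(rows)
            if predicate(matrix[r][c])]


def logical_assign(matrix, predicate, value):
    """Emulate A(mask) = value: return a copy with matching elements replaced."""
    return [[value if predicate(v) else v for v in row] for row in matrix]


A = [[1, 8, 3],
     [7, 2, 9]]
x = 4

B = logical_index(A, lambda v: v > x)         # like B = A(A > x)
A_zeroed = logical_assign(A, lambda v: v > x, 0)  # like A(A > x) = 0

print(B)         # [7, 8, 9]
print(A_zeroed)  # [[1, 0, 3], [0, 2, 0]]
```

Both forms avoid an explicit loop in user code, which is what makes them attractive on a GPU: the comparison and the selection can each run as a single data-parallel operation.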
SGEMM, MTIMES & CUBLAS performance on the GPU
AccelerEyes is focused not only on providing the easiest-to-use GPU programming platform for CUDA-capable GPUs by leveraging the MATLAB® language; our engineering organization is also always looking for ways to improve performance in all areas of the Jacket platform. A case in point is some recent work with matrix multiplication, specifically SGEMM (Single-precision General Matrix Multiply), or MTIMES. The Jacket 1.3 release was based on CUBLAS for matrix multiplication, and given the importance of matrix multiplication to so many of our customers, we decided to find out if we could improve the performance of the function. Update: The new MTIMES routine in Jacket 1.4 has improved significantly since these benchmarks of the Release Candidate were taken. Have …
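As background on what is being benchmarked: the BLAS GEMM operation computes C ← αAB + βC, and SGEMM is its single-precision variant. The sketch below is a plain Python reference of that contract, assuming row-major nested lists; it is not the CUBLAS or Jacket implementation, and Python floats are double precision, so it illustrates only the arithmetic. The helper `gemm_flops` uses the conventional 2·m·n·k operation count that matrix-multiply benchmarks typically divide by runtime to report GFLOPS.

```python
def gemm(alpha, A, B, beta, C):
    """Reference GEMM: return alpha * (A * B) + beta * C as a new matrix."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B must match")
    if len(C) != m or any(len(row) != n for row in C):
        raise ValueError("C must be m-by-n")
    out = []
    for i in range(m):
        out_row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):          # dot product of row i of A and column j of B
                acc += A[i][p] * B[p][j]
            out_row.append(alpha * acc + beta * C[i][j])
        out.append(out_row)
    return out


def gemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per inner step."""
    return 2 * m * n * k


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]

product = gemm(1.0, A, B, 0.0, C)   # plain A * B, as MTIMES would compute
print(product)                      # [[19.0, 22.0], [43.0, 50.0]]
print(gemm_flops(2, 2, 2))          # 16
```

The triple loop is the reason tuned GPU kernels matter here: the work grows with m·n·k, while every output element can be computed independently.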
Jacket accelerating life science and defense applications
With IBM’s decision this week to integrate Tesla technology into its high performance computing line, there should be no doubt that GP-GPU computing is more than a fad. Organizations solving technical problems are able to solve them more productively and efficiently than ever before with GPUs. AccelerEyes’ customers are experiencing this firsthand with the Jacket product family, as they are able to quickly and easily implement new or existing algorithms for GPUs and meet their technical needs with substantial speed improvements. Case in point: this week, AccelerEyes has released two case studies from customers that have used Jacket to transform their applications to GPU Computing with compelling results. System Planning Corporation has implemented two different radar processing …
NVIDIA Fermi with CUDA and OpenCL
In December of 2008, we published a blog post answering questions from customers and prospects about the use of OpenCL for Jacket. If you have not reviewed that blog post to gain some insight into our progress, you can access it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post. For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples. Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla C2050 to compare OpenCL performance to CUDA performance. The Tesla C2050 is an amazing beast of a card, providing up to 512 Gigaflops of double precision arithmetic (at peak). Before we present the benchmarks, we should comment on …
Power Flow with Jacket & MATLAB on the GPU!
Learn how Jacket, GPUs, and MATLAB can deliver more than an order of magnitude of performance improvement over CPU-based solutions for power flow studies. AccelerEyes, in collaboration with the Indian Institute of Technology in Roorkee, has developed this case study to illustrate the ability to study power flow models on graphics processing units using Jacket and MATLAB. The GPU implementation is 35 times faster than CPU alternatives. http://www.accelereyes.com/resources/powerflow
Crushing MATLAB Loop Runtimes with BSXFUN
Among the slowest blocks of code that inflate runtimes in MATLAB are for/while loops. In this blog post, I’m going to talk about a little-known way of crushing MATLAB loop runtimes for many commonplace use cases by utilizing one of the most underrated and unknown functions in MATLAB’s repertoire: bsxfun. Using this function, one can break seemingly iterative code into clean, vectorized snippets that beat the socks off even MATLAB’s JIT engine. Better still, Jacket fully supports bsxfun, meaning that if you thought a vectorized loop was fast, you haven’t seen anything yet. Also, in the end, a loop represented using bsxfun is just good programming practice. As we’ll see, the technique I’m going to describe is …
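To make the idea concrete, here is a minimal Python sketch of the singleton expansion that bsxfun performs: an operation is applied element-wise to two arrays, and any dimension of size 1 is stretched to match the other operand. This is an emulation on nested lists for illustration, not MATLAB's or Jacket's implementation; the Python function named `bsxfun` below is a stand-in written for this example. In MATLAB, the column-centering case shown here would be written as bsxfun(@minus, data, mean(data)).

```python
import operator


def shape(m):
    """Return (rows, cols) of a non-empty nested-list matrix."""
    return len(m), len(m[0])


def bsxfun(op, a, b):
    """Apply op element-wise, expanding any dimension of size 1 to match."""
    (ra, ca), (rb, cb) = shape(a), shape(b)
    if not (ra == rb or 1 in (ra, rb)) or not (ca == cb or 1 in (ca, cb)):
        raise ValueError("non-singleton dimensions must match")
    rows, cols = max(ra, rb), max(ca, cb)
    return [[op(a[i if ra > 1 else 0][j if ca > 1 else 0],
                b[i if rb > 1 else 0][j if cb > 1 else 0])
             for j in range(cols)]
            for i in range(rows)]


data = [[1.0, 10.0],
        [3.0, 30.0],
        [5.0, 50.0]]
n_rows, n_cols = shape(data)

# 1-by-2 row of column means, like mean(data) in MATLAB.
col_means = [[sum(row[j] for row in data) / n_rows for j in range(n_cols)]]

# Loop version: subtract each column's mean one element at a time.
centered_loop = [row[:] for row in data]
for j in range(n_cols):
    for i in range(n_rows):
        centered_loop[i][j] = data[i][j] - col_means[0][j]

# Expansion version: one call, no explicit loop in user code.
centered = bsxfun(operator.sub, data, col_means)

# A column times a row expands in both directions (an outer product).
outer = bsxfun(operator.mul, [[1], [2], [3]], [[10, 20]])

print(centered)  # [[-2.0, -20.0], [0.0, 0.0], [2.0, 20.0]]
print(outer)     # [[10, 20], [20, 40], [30, 60]]
```

The point of the single-call form is that the whole operation is expressed at once, which gives a vectorizing runtime or a GPU the chance to execute it as one data-parallel step instead of many small ones.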
Jacket with MATLAB for Optics and DSP
Over the last month, I have heard many Jacket customers talk about their use of the Jacket platform for MATLAB to solve optics problems. NASA and the University of Rochester are two that come to mind immediately. We found some recent work showing how Jacket can be used to solve an Optical Flow problem using the Horn and Schunck method and thought it might be useful to share. In addition, last week Seth Benton, a blogger for dsprelated.com, shared his experience working with Jacket. He spent about a week getting up to speed and running some examples, and his write-up is worth reading if you have not already seen it.