We just returned from SC’10, the biggest supercomputing show of the year. At the show, we demoed Jacket driving computer vision demos on an 8-GPU Colfax CXT8000 system… pure eye candy! We had CPU and GPU versions of the demos running on 8 different monitors, each attached to the 8 Tesla C2050 GPUs in the system. Input data for the various demos was sourced from 3 webcams and 2 Blu-ray video inputs. Checkout the demo details, below: Demo 1 Sobel edge detection with image dilation and interpolation overlaid on Blu-ray video in realtime. Demo 2 Feature detection on a 4-level pyramid of 640×480 realtime webcam video. Demo 3 Gradient descent feature tracking , a stripped down version of KLT, tracking …
Beam Propagation Methods – Jacket is 3.5X faster than the CPU and 2X faster than PCT
A couple weeks ago, a GPU-enabled code appeared on MATLAB Central entitled, “A CUDA accelerated Beam Propagation Method [BPM] Solver using the Parallel Computing Toolbox.” In this post, we share a video which showcases how Jacket is much better than PCT at GPU computing, by analyzing performance on this Beam Propagation Method code. To reproduce these results, download the source code here: CUDA_BPM_NOV_04_2010_AccelerEyes These benchmarks were run on an NVIDIA Tesla C2070 GPU versus a quad-core Intel CPU. MATLAB + PCT R2010B were used for the PCT-GPU experiments. MATLAB + Jacket 1.6 (prerelease) were used for the Jacket-GPU experiments. Take Home Message Due to Jacket’s extensive library of GPU functions and its optimized GPU runtime, it performs 3.5X faster than …
Speeding up critical code
With Jacket 1.5, we released a big new feature: GCOMPILE. This allows you to convert critical sections of your MATLAB code directly into GPU kernels to further increase speed. In an earlier post we introduced the prototype and have been working with several beta users over the past month to get it ready. In this post, we’ll give some more details and start to look at the speedups you can quickly and easily achieve. You can find more information about it on the wiki. Some of the best features of GCOMPILE are the ability to use IF statements, WHILE loops, and FOR loops in your code now. Make sure to check out the wiki pages about these and the other …
A Jacket built for Speed
Just a few months ago, Jacket 1.4 was released sporting an improved MTIMES routine that brought about radical improvements to Jacket’s matrix multiplication. The quest for performance never ends though. Now, in the release of Jacket 1.5, MTIMES is even faster than before for SGEMM routines. Checkout the MTIMES Benchmarks wiki for more information. I you are attending GTC, you may want to attend this session also!
Jacket for MATLAB on HP Z Workstation series
AccelerEyes has had access to a couple of Z Workstation series from Hewlett Packard for Jacket testing. Our goal is to make GPU computing more economical and accessible to technical computing users by working with leading computer OEMs. According to most analysts that follow the workstation market, HP is a leader for workstations. Support for HP’s Z Workstations enables users, who previously didn’t have the budgets and programming knowledge, to tap the power of GPU computing to solve their growing scientific and engineering problems. AccelerEyes has certified and completed performance testing on both the entry level HP Z200 Workstation and the high performance HP Z800 Workstation. The results from these tests can be reviewed at http://www.accelereyes.com/partners/hp. The HP Z Series …
Torben’s Corner – A GPU Computing Gem for Jacket Programmers!
In January, we introduced you to Torben’s Corner – a resource wiki created and maintained by Jacket programming guru, Torben Larsen at Aalborg University in Denmark. Many Jacket programmers have gained valuable insights from Torben’s Corner, including GPU performance charts, coding guidelines, special tricks. Since January, many wonderful additions have been added to Torben’s Corner. We think you will find value in not only this new information but the entire resource. Here is a quick summary of the most recent additions with links to the information: Benchmarking Update Torben’s Corner maintains a long list of benchmarks specifically detailing speedups of Jacket relative to standard MATLAB. This became an enormous task due to the sheer number of functions supported by Jacket …
GPU Giddy – Excitement Building for GTC
GTC is coming up… The GPU Technology Conference (GTC) starts later this month and is sure to generate a new level of excitement and energy around GPU computing. The conference includes over 250 technology sessions presented by industry, government, and academic technology leaders. AccelerEyes is pleased to be well represented at this year’s conference by our technical leadership and a number of our customers. If you plan to attend the conference be sure to include the sessions outlined below on your agenda. In addition to being well represented, we are also flattered to see that others in the market have recognized that GPU Computing with MATLAB delivers clear productivity gains and that the performance improvements made possible by GPUs is …
Tesla C2050 versus C1060 on Real MATLAB Applications
Following our recent Jacket v1.4 Fermi architecture release, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 versus the older Tesla C1060. Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared performance of the C2050 vs C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers. All tests were run on the same standard workstation with Jacket 1.4. The only thing that changed was the actual GPU board. In every case the C2050 beat the C1060. Double-precision examples on the Fermi-based board outperformed the older board by 50% in every …
Jacket for MATLAB now available for NVIDIA Fermi!
We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s release of the Fermi architecture brings with it 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights for Jacket 1.4 are as follows: Added support for the NVIDIA Fermi architecture (GTX400 and Tesla C2000 series) Jacket DLA support for Fermi Dramatically improved the performance of Jacket’s JIT (Just-In-Time) compilation technology Operations involving random scalar constants do not incur a recompile Removed dependencies on MINGW and NVCC Logical indexing now supported for SUBSREF and SUBSASGN, e.g. B = A(A > x) MTIMES supports …
SGEMM, MTIMES & CUBLAS performance on the GPU
AccelerEyes is focused on not only providing the most easy to use GPU programming platform for CUDA capable GPUs by leveraging the MATLAB® language, our engineering organization is always looking for ways to improve the performance of all areas in the Jacket platform. A case in point is some recent work with matrix multiplication, specifically (Single General Matrix Multiply) SGEMM, or MTIMES. The Jacket 1.3 release was based on CUBLAS for matrix multiplication and given the importance of matrix multiplication to so many of our customers, we decided to find out if we could improve performance of the function. Update: The new MTIMES routine in Jacket 1.4 has improved sigificantly since these benchmarks of the Release Candidate were taken. Have …