Computer Vision Demos at SC’10 with 8-GPU Colfax CXT8000

Gallagher Pryor · Case Studies, Events · 2 Comments

We just returned from SC’10, the biggest supercomputing show of the year. At the show, we demoed Jacket driving computer vision demos on an 8-GPU Colfax CXT8000 system… pure eye candy! We had CPU and GPU versions of the demos running on 8 different monitors, each attached to one of the 8 Tesla C2050 GPUs in the system. Input data for the various demos was sourced from 3 webcams and 2 Blu-ray video inputs. Check out the demo details below:
Demo 1: Sobel edge detection with image dilation and interpolation overlaid on Blu-ray video in real time.
Demo 2: Feature detection on a 4-level pyramid of 640×480 real-time webcam video.
Demo 3: Gradient descent feature tracking, a stripped-down version of KLT, tracking …
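For a rough idea of what Demo 1 computes, here is a minimal Sobel gradient-magnitude sketch. The demos themselves ran in MATLAB with Jacket on the GPUs; this NumPy translation is ours, for illustration only:

```python
import numpy as np

def sobel_edges(img):
    """Approximate gradient magnitude with the 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Plain loops for clarity; the GPU version applies the same
    # stencil to every pixel in parallel.
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    return np.hypot(gx, gy)

frame = np.zeros((8, 8))
frame[:, 4:] = 1.0                 # synthetic vertical step edge
edges = sobel_edges(frame)
print(edges[4, 3], edges[4, 4])    # strong response (4.0) on the edge columns
```

In the real demo this runs per video frame, followed by dilation of the edge map and interpolation for the overlay.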

Speeding up critical code

ArrayFire · CUDA

With Jacket 1.5, we released a big new feature: GCOMPILE. It lets you convert critical sections of your MATLAB code directly into GPU kernels to further increase speed. In an earlier post we introduced the prototype, and we have been working with several beta users over the past month to get it ready. In this post, we’ll give some more details and start to look at the speedups you can quickly and easily achieve. You can find more information about it on the wiki. Among the best features of GCOMPILE is the ability to use IF statements, WHILE loops, and FOR loops in your code. Make sure to check out the wiki pages about these and the other …
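GCOMPILE targets exactly the kind of per-element code with branches and loops that is hard to vectorize. As a language-neutral illustration of the pattern (plain Python, not Jacket’s actual MATLAB syntax), consider an escape-time iteration, a classic per-element GPU kernel built from a WHILE loop and an IF test:

```python
def escape_time(cx, cy, max_iter=50):
    """Per-element kernel: a WHILE loop with an IF-style early exit,
    the control flow GCOMPILE lets you express inside a GPU kernel."""
    x = y = 0.0
    n = 0
    while n < max_iter:
        if x * x + y * y > 4.0:     # diverged -- stop early
            break
        x, y = x * x - y * y + cx, 2 * x * y + cy
        n += 1
    return n

# The kernel runs independently on every element of a grid -- the
# part a GPU parallelizes across its cores.
grid = [[escape_time(-2 + 3 * j / 7, -1.5 + 3 * i / 7) for j in range(8)]
        for i in range(8)]
```

The point is that each element’s loop may run a different number of iterations, which is why such code resists ordinary vectorization and benefits from being compiled as a kernel.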

Jacket for MATLAB on HP Z Workstation series

John Melonakos · Benchmarks

AccelerEyes has had access to a pair of workstations from Hewlett-Packard’s Z series for Jacket testing. Our goal is to make GPU computing more economical and accessible to technical computing users by working with leading computer OEMs. According to most analysts who follow the workstation market, HP is a leader in workstations. Support for HP’s Z Workstations enables users who previously lacked the budget or programming knowledge to tap the power of GPU computing to solve their growing scientific and engineering problems. AccelerEyes has certified and completed performance testing on both the entry-level HP Z200 Workstation and the high-performance HP Z800 Workstation. The results from these tests can be reviewed at http://www.accelereyes.com/partners/hp. The HP Z Series …

Tesla C2050 versus C1060 on Real MATLAB Applications

John Melonakos · Benchmarks · 7 Comments

Following our recent Jacket v1.4 release with Fermi architecture support, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 with the older Tesla C1060. Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite, we compared the performance of the C2050 and the C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers. All tests were run on the same standard workstation with Jacket 1.4; the only thing that changed was the GPU board itself. In every case the C2050 beat the C1060. Double-precision examples on the Fermi-based board outperformed the older board by 50% in every …

Jacket for MATLAB now available for NVIDIA Fermi!

ArrayFire · Announcements · 2 Comments

We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s Fermi architecture brings 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights of Jacket 1.4 are as follows:
- Added support for the NVIDIA Fermi architecture (GTX400 and Tesla C2000 series)
- Jacket DLA support for Fermi
- Dramatically improved performance of Jacket’s JIT (Just-In-Time) compilation technology
- Operations involving random scalar constants no longer incur a recompile
- Removed dependencies on MINGW and NVCC
- Logical indexing now supported for SUBSREF and SUBSASGN, e.g. B = A(A > x)
- MTIMES supports …
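The logical indexing noted above, B = A(A > x), has a direct NumPy analog; this small example (ours, not from the release notes) shows both the SUBSREF (read) and SUBSASGN (write) forms:

```python
import numpy as np

A = np.array([3.0, -1.0, 7.0, 0.5, 9.0])
x = 2.0

# MATLAB/Jacket SUBSREF form: B = A(A > x)
# keeps only the elements greater than x.
B = A[A > x]
print(B)          # [3. 7. 9.]

# MATLAB/Jacket SUBSASGN form: A(A > x) = 0
# assigns through the same boolean mask in place.
A[A > x] = 0
print(A)          # [ 0.  -1.   0.   0.5  0. ]
```

In both languages the boolean mask A > x is itself an array, which is what lets the selection run element-wise on the GPU.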

SGEMM, MTIMES & CUBLAS performance on the GPU

ArrayFire · Benchmarks, CUDA · 5 Comments

AccelerEyes is focused not only on providing the easiest-to-use GPU programming platform for CUDA-capable GPUs by leveraging the MATLAB® language; our engineering organization is also always looking for ways to improve performance across the Jacket platform. A case in point is some recent work with matrix multiplication, specifically SGEMM (Single-precision General Matrix Multiply), or MTIMES. The Jacket 1.3 release relied on CUBLAS for matrix multiplication, and given the importance of matrix multiplication to so many of our customers, we decided to find out whether we could improve the performance of the function. Update: The new MTIMES routine in Jacket 1.4 has improved significantly since these benchmarks of the Release Candidate were taken. Have …
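A minimal sketch of the kind of timing harness behind such comparisons (ours, not AccelerEyes’ benchmark suite), pitting a textbook triple-loop multiply against an optimized GEMM from a BLAS library:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Textbook triple-loop SGEMM: C = A @ B, one dot product per entry."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

n = 64
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)).astype(np.float32)
B = rng.standard_normal((n, n)).astype(np.float32)

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C2 = A @ B;              t_blas = time.perf_counter() - t0

print(f"naive {t_naive:.4f}s  optimized {t_blas:.6f}s")
```

The same methodology applies to MTIMES benchmarks: time both implementations on identical inputs and verify the results agree within floating-point tolerance before comparing speeds.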

Power Flow with Jacket & MATLAB on the GPU!

John Melonakos · Case Studies, CUDA

Learn how Jacket, GPUs, and MATLAB can deliver order-of-magnitude performance improvements over CPU-based solutions for power flow studies. AccelerEyes, in collaboration with the Indian Institute of Technology Roorkee, has developed this case study to illustrate the ability to study power flow models on graphics processing units using Jacket and MATLAB. The GPU implementation is 35 times faster than the CPU alternatives. http://www.accelereyes.com/resources/powerflow

Crushing MATLAB Loop Runtimes with BSXFUN

Gallagher Pryor · Benchmarks · 1 Comment

One of the slowest constructs inflating runtimes in MATLAB is the for/while loop. In this blog post, I’m going to talk about a little-known way of crushing MATLAB loop runtimes for many commonplace use cases by utilizing one of the most underrated functions in MATLAB’s repertoire: bsxfun. Using this function, one can break seemingly iterative code into clean, vectorized snippets that beat the socks off even MATLAB’s JIT engine. Better still, Jacket fully supports bsxfun, meaning that if you thought a vectorized loop was fast, you haven’t seen anything yet. And in the end, a loop rewritten with bsxfun is just good programming practice. As we’ll see, the technique I’m going to describe is …
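bsxfun virtually expands singleton dimensions so a binary operation applies across mismatched shapes without an explicit loop; NumPy broadcasting is the same idea. A small sketch of the loop-to-bsxfun rewrite (our example, translated to NumPy; the post itself uses MATLAB):

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)   # 4 observations x 3 features
col_means = data.mean(axis=0)          # shape (3,)

# Loop version: subtract the column mean from every row.
centered_loop = data.copy()
for i in range(data.shape[0]):
    centered_loop[i, :] -= col_means

# bsxfun version: MATLAB's bsxfun(@minus, data, mean(data)) is
# NumPy broadcasting -- the (3,) vector is virtually expanded to (4, 3),
# with no loop and no temporary replicated matrix.
centered_bsx = data - col_means

assert np.array_equal(centered_loop, centered_bsx)
print(centered_bsx[0])                 # first row with its column means removed
```

The broadcast form does one fused pass over the data, which is exactly the shape of computation that maps cleanly onto a GPU.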

Jacket with MATLAB for Optics and DSP

John Melonakos · Case Studies

Over the last month I have heard many Jacket customers talk about their use of the Jacket platform for MATLAB to solve optics problems; NASA and the University of Rochester are two that come to mind immediately. We found some recent work showing how Jacket can be used to solve an optical flow problem using the Horn and Schunck method, and we thought it might be useful to share. In addition, last week Seth Benton, a blogger for dsprelated.com, shared his experience working with Jacket. After about a week of getting up to speed and running some examples, his write-up is worth reading if you have not already seen it.

How long does it take to get 98X performance improvement with GPUs?

John Melonakos · Case Studies · 2 Comments

Well, here is a recent story about one of our customers who achieved a 98X speedup with Jacket in 16 days. Of those 16 days, 15 were spent sending emails back and forth about the problem, and less than a day was spent getting the customer’s code running in Jacket and performing some performance tests! Who would have imagined GPU programming with this kind of performance in a single day? Happy reading. Day 1: The customer uses the inverse Radon transform (iradon in MATLAB terms) extensively for their back-projection algorithms. They would like to know when the iradon function will be available/supported in Jacket. AccelerEyes product management informs the customer that the inverse Radon algorithm used in MATLAB is based on the filtered back …