Powering Mars Research

John Melonakos · Case Studies, CUDA

The Curiosity Mars rover landing reminded us of a recent talk by Brendan Babb of NASA and the University of Alaska Anchorage about Jacket-accelerated Mars research. The talk was given at GTC 2012 in May. The main thrust of this research is improving Mars rover image compression via GPUs and genetic algorithms. With Jacket and GPUs, the researchers were able to achieve 5X speedups on the larger data sizes. The algorithm works by pairing neighboring pixels with a random one and then adjusting the random pixel based on whether the change incrementally improves the match to the original image. Babb described the algorithm as an “embarrassingly” parallel process, ideally suited to GPU acceleration. He estimates he has been able to achieve a 20 to 30 percent error …
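Neither the talk's slides nor its code are reproduced here, but the accept-if-better update Babb describes can be sketched roughly as follows; the function name, perturbation, and error metric are placeholders of ours, not details from the research. Because each pixel is evaluated independently, every iteration of this loop could run as its own GPU thread, which is what makes the process embarrassingly parallel.

    #include <cstdlib>
    #include <cmath>
    #include <vector>

    // Hypothetical sketch of an accept-if-better pixel update:
    // pair each pixel with a randomly perturbed candidate and keep
    // the candidate only if it reduces the reconstruction error.
    void mutate_pixels(std::vector<float>& recon,
                       const std::vector<float>& original) {
        for (size_t i = 0; i < recon.size(); ++i) {
            float candidate = recon[i] + (std::rand() % 11 - 5); // small random tweak
            float old_err = std::fabs(original[i] - recon[i]);
            float new_err = std::fabs(original[i] - candidate);
            if (new_err < old_err)      // accept only improvements
                recon[i] = candidate;
        }
    }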

Parallelized Gene Predictors with Jacket

John Melonakos · Case Studies, CUDA

Researchers at the University of Quebec have developed high-performance gene predictors using Jacket to accelerate their MATLAB® code. This work has been published in BMC Research Notes and is freely available here. Computerized approaches to studying the human genome are challenged by the exploding amount of data, which doubles roughly every 6 months. To deal with these burgeoning datasets, demands for faster processing continue to arise. This work focuses on predicting genes using frequency analysis with FFTs and with an equivalent technique known as Goertzel’s algorithm. The emphasis of the paper is to offer geneticists and molecular biologists tools for the prediction or identification of new genes using existing complementary strategies. The criteria for these …
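For context, Goertzel’s algorithm evaluates a single DFT bin with a simple recurrence, which is why it can substitute for a full FFT when only a few frequencies matter. A minimal sketch of the standard algorithm (not code from the paper):

    #include <cmath>
    #include <vector>

    // Goertzel's algorithm: compute the power |X[k]|^2 of a single
    // DFT bin k from N samples, using one recurrence instead of an FFT.
    double goertzel_power(const std::vector<double>& x, int k) {
        const int N = static_cast<int>(x.size());
        const double w = 2.0 * std::acos(-1.0) * k / N;  // 2*pi*k/N
        const double coeff = 2.0 * std::cos(w);
        double s1 = 0.0, s2 = 0.0;                       // s[n-1], s[n-2]
        for (int n = 0; n < N; ++n) {
            const double s = x[n] + coeff * s1 - s2;
            s2 = s1;
            s1 = s;
        }
        // Recover the bin power from the final two recurrence states.
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

In gene prediction, the input is typically a binary indicator sequence for one nucleotide and k is chosen near N/3, targeting the well-known period-3 signal of protein-coding regions.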

Benchmarking the new Kepler (GTX 680)

Pavan Yalamanchili · Benchmarks, CUDA

NVIDIA has launched their next-generation GPU based on the Kepler architecture, and followed it up with a rather quick update to their CUDA toolkit. Since we have access to three generations of their GTX cards (480, 580, and 680), we thought we would showcase how the performance has changed over the generations. Matrix multiplication: the GTX 680 comfortably breaches the 1 teraflop mark for single precision, while the GTX 580 barely scratches it. However, the GTX 680’s performance appears to peak around 2048 x 2048 and then falls off to match the GTX 580 at larger sizes. The high-end Tesla C2070 finishes last for single precision, behind the third-placed …
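The exact benchmark harness is not shown in the post, but a matrix-multiplication timing run of this kind can be sketched with the ArrayFire C++ API along the following lines; the size and warm-up policy here are our own choices for illustration:

    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        const int n = 2048;                 // one of the benchmarked sizes
        af::array A = af::randu(n, n);      // single precision by default
        af::array B = af::randu(n, n);

        af::matmul(A, B).eval();            // warm-up run (JIT, allocation)
        af::sync();

        af::timer t = af::timer::start();
        af::array C = af::matmul(A, B);
        C.eval();
        af::sync();                         // wait for the GPU to finish
        double secs = af::timer::stop(t);

        // An n x n matrix multiply performs about 2*n^3 floating-point ops.
        printf("%d x %d SGEMM: %.1f GFLOPS\n", n, n,
               2.0 * n * n * n / secs / 1e9);
        return 0;
    }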

ArrayFire for Defense and Intelligence Applications

ArrayFire · C/C++, Case Studies, CUDA, Events, Fortran

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library and its applications to defense and intelligence work. Defense projects often have hard deadlines and definite speed targets, and ArrayFire is a fast and easy-to-use choice for these applications. This webinar is part of an ongoing series that helps you learn more about the many applications of Jacket and ArrayFire while interacting with AccelerEyes GPU computing experts. John Melonakos, our CEO, introduced ArrayFire and talked about some exciting recent customer successes in the field of defense. He then ran through the mechanics of compiling and running code on a machine with 2 Quadro 6000 GPUs. …

No Free Lunch for GPU Compiler Directives

John Melonakos · ArrayFire, C/C++, CUDA, Fortran

Last week, Steve Scott of NVIDIA put up a viral post entitled “No Free Lunch for Intel MIC (or GPU’s).” It was a great read and a big hit in technical computing circles. The centerpiece of Scott’s argument is that there are no magic compilers. GPUs don’t have them, and neither will MIC. No compiler will be able to automatically recompile existing code and get great performance from MIC or GPUs. Rather, it takes a good amount of elbow grease to write high-performance code. We totally agree. The problem Scott addresses is real. Despite marketing spin to the contrary, developing code for GPUs requires work. However, we don’t agree with Scott’s conclusion that compiler directives are a good solution. You can’t fight …

Jacket v2.1 Now Available

Scott · Announcements, CUDA

Optimization Library, Sparse Functionality, Graphics Library Improvements, CUDA 4.1 Enhancements, and much more… AccelerEyes announces the release of Jacket v2.1, adding GPU computing capabilities for use with MATLAB®. Jacket v2.1 delivers even more speed through a host of new improvements, maximizing GPU device performance and utilization. Notable new features include an Optimization Library and additional functions in our Graphics Library. With Jacket v2.1, we have also extended support for sparse matrix subscripting and improved host-to-device and device-to-host data transfer speeds for complex data. In addition, we have included various GFOR enhancements. Jacket v2.1 also incorporates NVIDIA CUDA 4.1 enhancements for improved functionality and performance (requires the latest drivers). Jacket is the premier GPU software plugin for MATLAB®, better than alternative …
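GFOR is Jacket’s construct for running every iteration of a data-parallel FOR loop simultaneously on the GPU; the same idea exists as gfor in our ArrayFire C++ library. As a rough illustration of the concept only (ArrayFire C++ syntax, not Jacket’s MATLAB syntax, and not tied to the v2.1 enhancements):

    #include <arrayfire.h>
    using namespace af;

    int main() {
        const int n = 100;
        array A = randu(3, 3, n);         // a batch of n 3x3 matrices
        array B = constant(0, 3, 3, n);

        // gfor launches all n iterations at once on the GPU,
        // rather than executing them one after another.
        gfor (seq i, n) {
            B(span, span, i) = 2 * A(span, span, i) + 1;
        }

        af_print(B(span, span, 0));       // inspect the first slice
        return 0;
    }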

12,288 CUDA Cores in One Computer

John Melonakos · Announcements, CUDA

Kepler is here. And it’s fantastic! The news came out today that the first Kepler GPU, the GeForce GTX 680, has been launched. A single GPU has 1,536 CUDA cores. This means that high-end workstations with 8 PCIe slots will be able to pack 12,288 CUDA cores (8 x 1,536) into a single computer. That’s some serious computational power. Current high-end Fermi cards have 512 cores, so the new Kepler architecture boasts 3X the number of compute cores. Normally we focus on the higher-end Tesla products because they more aptly fit the needs of our science, engineering, and financial computing readers. But we are excited nonetheless by this GeForce GPU. It is a major step forward in GPU technology. And this GeForce card portends …

ArrayFire Pro: Features and Scalability

ArrayFire · ArrayFire, C/C++, CUDA, Fortran

ArrayFire is a fast GPU library that offloads compute-intensive tasks onto many-core GPUs, reducing application runtime many times over. ArrayFire is built on top of the NVIDIA CUDA software stack, currently the best and most stable software development kit available for GPU-based computing. ArrayFire comes with a huge set of functions spanning domains such as image processing, signal processing, and financial modeling, as well as applications requiring graphics support. ArrayFire has an array-based notation (supporting N-dimensional arrays) and allows sub-referencing and assignment into these multi-dimensional arrays. The following code snippet shows how you can index into array objects:

    // Generate a 3×3 array of random numbers on the GPU
    array A = randu(3,3);
    array a1 …
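The snippet above is truncated; a fuller sketch of the sub-referencing and assignment styles ArrayFire supports might look like the following (the variable names are ours, added for illustration):

    #include <arrayfire.h>
    using namespace af;

    int main() {
        array A = randu(3, 3);    // 3x3 random numbers on the GPU
        array a1 = A(0);          // first element (linear indexing)
        array c0 = A(span, 0);    // entire first column
        array r2 = A.row(2);      // last row
        A(1, 1) = 0;              // assignment into a sub-reference
        af_print(A);
        return 0;
    }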

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John Melonakos · Benchmarks, CUDA, Events, OpenCL

Today was Day 1 of the Keeneland Workshop. Many great talks were given across a broad range of GPU computing topics. With last week’s ArrayFire webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory. Kyle independently ran a number of benchmarks over a period of time that show how quickly OpenCL has matured and where it still has room for improvement. The slide below comes from Kyle’s presentation. For numbers greater than 1, CUDA is faster; for numbers less than 1, OpenCL is faster. Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance. The Achilles heel …