Upcoming CUDA & OpenCL Training Courses

John Melonakos | Announcements, CUDA, OpenCL

We’re pleased to announce upcoming CUDA and OpenCL training courses. Over the past couple of years, we’ve received numerous requests from around the world to be trained by AccelerEyes engineers. We finally got our act together and now have a nice schedule of CUDA and OpenCL training courses for 2013 within the United States:

CUDA
- Feb 25-26, Houston, TX
- Mar 4-5, Baltimore/Washington, D.C.
- Mar 25-26, Los Angeles, CA
- Apr 9-10, Seattle, WA
- Apr 15-16, San Francisco, CA
- May 6-7, Austin, TX
- May 27-28, Atlanta, GA
- Jun 10-11, Baltimore/Washington, D.C.
- Jul 8-9, San Jose, CA
- Sep 2-3, Boston, MA
- Sep 23-24, Baltimore/Washington, D.C.
- Oct 7-8, Houston, TX
- Oct 21-22, Atlanta, GA
- Nov 4-5, Baltimore/Washington, D.C.
- Dec 2-3, New York, NY

OpenCL …

How much speedup can you get with CUDA or OpenCL?

Scott | ArrayFire, Benchmarks, CUDA, OpenCL

Every day, developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore the pertinent acceleration factors. Note that we’ll use the term accelerator to include GPUs, Xeon Phi coprocessors, APUs, FPGAs, and any other CUDA or OpenCL device; the principles discussed below apply equally to all of these accelerators. The following are some of the important factors to consider when estimating the potential for accelerated speedups:

- Hardware: The more advanced the accelerator hardware, the greater the speedup (e.g., the NVIDIA Kepler K20 outperforms the previous-generation NVIDIA Fermi C2090).
- Data Sizes: In general, accelerators will outperform CPUs to …
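As a back-of-the-envelope illustration of how these factors interact, here is a minimal sketch of an Amdahl's-law-style estimate. The function and the example numbers are hypothetical illustrations of our own, not an AccelerEyes tool:

```cpp
#include <iostream>

// Hypothetical first-order model: overall speedup is limited by the
// fraction of runtime the accelerator can actually speed up, plus a
// fixed host<->device transfer overhead.
double estimated_speedup(double parallel_fraction,  // 0..1, time share that ports to the accelerator
                         double kernel_speedup,     // raw accelerator speedup on that share
                         double transfer_overhead)  // transfer time as a fraction of original runtime
{
    double serial = 1.0 - parallel_fraction;
    double accelerated = parallel_fraction / kernel_speedup;
    return 1.0 / (serial + accelerated + transfer_overhead);
}

int main() {
    // Example: 90% of the work ports, 20x kernel speedup, 5% transfer cost.
    std::cout << estimated_speedup(0.90, 20.0, 0.05) << "x\n";  // ~5.1x overall
    return 0;
}
```

Even a 20x kernel speedup yields only about 5x overall once serial work and transfers are counted, which is exactly why we walk developers through these questions up front.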

Getting Started with ArrayFire – a 30-minute Jump Start

ArrayFire | ArrayFire, C/C++, CUDA, OpenCL

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series of webinars that will help you learn more about the many applications of ArrayFire while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax of ArrayFire. He also provided an overview of existing efforts in GPU software and compared them to the extensive capabilities of ArrayFire. For example, the same application that takes 26 lines to write in Thrust can be coded in just 3 lines with ArrayFire! ArrayFire has supported …
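To give a flavor of that terseness, here is a minimal sketch of ArrayFire's C++ syntax. It is a generic example of our own, not the specific Thrust comparison shown in the webinar:

```cpp
#include <arrayfire.h>

int main() {
    af::array x = af::randu(1000);   // 1,000 random values, resident on the device
    af::array y = 2.0f * x + 1.0f;   // elementwise arithmetic runs as GPU kernels
    af_print(af::sum(y));            // reduce on the device, then print the result
    return 0;
}
```

The same source compiles against ArrayFire's CUDA or OpenCL backends without modification, which is a large part of the library's appeal.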

Jacket v2.3 Now Available

John Melonakos | Announcements, CUDA

We are pleased to announce the new release of Jacket v2.3.  This new version of Jacket brings even greater performance improvements through GPU computing for MATLAB® codes.  (Click here to download v2.3) With v2.3, new support has been added for CUDA 5.0.  This newer version of CUDA enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line. This morning we received an email from a Jacket user who said, “V2.3 + CUDA 5 = wow. Just upgraded and re-ran one of the routines that previously took just under 4 minutes – now less than 2 minutes!” This is a must-have release for all Jacket users.  The performance improvements are generally felt across the board.  Existing Jacket …

Top 10 List at GTC 2012

John Melonakos | Announcements, Events

It’s going to be hard to sleep tonight. So much GPU goodness awaits in the coming 3 days of the GPU Technology Conference. Here are my top 10 things to do at GTC 2012:

Sessions to Attend

#1: S0287 – Jacket for Multidimensional Scaling in Genomics – This is a great opportunity to learn about accelerating MATLAB® on the GPU. Come learn why thousands of scientists, engineers, and analysts are using Jacket to do more with less coding hassle. (Day: Tuesday, 05/15; Time: 5:30 pm – 5:55 pm; Location: Room K)

#2: S0415 – An Accelerated Weeks Method for Numerical Laplace Transform Inversion – Learn how researchers have been able to use Jacket in MATLAB® to implement the Weeks method more efficiently and robustly. (Day: Wednesday, 05/16; Time: 9:30 …

Benchmarking the new Kepler (GTX 680)

Pavan Yalamanchili | Benchmarks, CUDA

NVIDIA has launched its next-generation GPU based on the Kepler architecture, and followed it up with a rather quick update to the CUDA toolkit. Since we have access to 3 generations of GTX cards (480, 580, and 680), we thought we would showcase how performance has changed across the generations. Matrix multiplication: The GTX 680 comfortably breaches the 1 teraflop mark for single precision, while the GTX 580 barely scratches it. However, the GTX 680's performance appears to peak around 2048 x 2048 and then falls off to match the performance of the GTX 580 at larger sizes. The high-end Tesla C2070 finishes last for single precision behind the third placed …
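For readers who want to run this kind of measurement themselves, here is a minimal sketch of a single-precision matrix-multiply benchmark using cuBLAS. It is our own illustrative harness, not the exact code behind the numbers above:

```cpp
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 2048;                       // square matrix dimension
    std::vector<float> h(n * n, 1.0f);        // host data (values don't affect timing)

    float *A, *B, *C;
    cudaMalloc(&A, n * n * sizeof(float));
    cudaMalloc(&B, n * n * sizeof(float));
    cudaMalloc(&C, n * n * sizeof(float));
    cudaMemcpy(A, h.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(B, h.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm up once so the timed run excludes one-time initialization costs.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // SGEMM performs roughly 2*n^3 floating-point operations.
    double gflops = 2.0 * n * n * n / (ms * 1e6);
    printf("%d x %d SGEMM: %.1f GFLOPS\n", n, n, gflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Sweeping n and plotting GFLOPS against matrix size reproduces the shape of the curves discussed here.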

No Free Lunch for GPU Compiler Directives

John Melonakos | ArrayFire, C/C++, CUDA, Fortran

Last week, Steve Scott at NVIDIA put up a viral post entitled “No Free Lunch for Intel MIC (or GPU’s).” It was a great read and a big hit in technical computing circles. The centerpiece of Scott’s post was that there are no magic compilers. GPUs don’t have them, and neither will MIC. No compiler will be able to automatically recompile existing code and get great performance from MIC or GPUs. Rather, it takes a good amount of elbow grease to write high-performance code. We totally agree. The problem Scott addresses is real: despite marketing spin to the contrary, developing code for GPUs requires work. However, we don’t agree with Scott’s conclusion that compiler directives are a good solution. You can’t fight …
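For context, a compiler directive annotates an ordinary loop and asks the compiler to generate accelerator code for it. Here is a minimal, hypothetical OpenACC example; note that the directive says nothing about the data movement and memory-access patterns that usually dominate performance, which is the crux of the disagreement:

```cpp
#include <cstdio>

int main() {
    const int n = 1 << 20;
    static float x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // A directive-based SAXPY: the pragma asks the compiler to offload
    // this loop to an accelerator. Whether the generated code performs
    // well depends on factors the annotation does not express.
    #pragma acc parallel loop copyin(x) copy(y)
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    return 0;
}
```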

12,288 CUDA Cores in One Computer

John Melonakos | Announcements, CUDA

Kepler is here.  And it’s fantastic! The news came out today that the first Kepler GPU, the GeForce GTX 680, has been launched.  A single GPU has 1,536 CUDA Cores.  This means that those high-end workstations with 8 PCIe slots will be able to pack 12,288 CUDA cores into a single computer.  That’s some serious computational power. Current high-end Fermi cards have 512 cores, so this new Kepler architecture boasts 3X the number of computation cores. Normally we focus on the higher-end Tesla products because those more aptly fit the needs of our science, engineering, and financial computing readers.  But we are excited nonetheless by this GeForce GPU.  It is a major step forward in GPU technology.  And this GeForce card portends …
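As a rough way to tally the CUDA cores in a machine, here is a hedged sketch that enumerates the GPUs and multiplies each one's multiprocessor count by an assumed cores-per-SM figure (32 or 48 for Fermi, 192 for Kepler's SMX):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Assumed cores per multiprocessor by compute capability
// (Fermi sm_2.0: 32, Fermi sm_2.1: 48, Kepler sm_3.x: 192).
static int coresPerSM(int major, int minor) {
    if (major == 2) return (minor == 1) ? 48 : 32;
    if (major == 3) return 192;
    return 0;  // unknown architecture; extend as needed
}

int main() {
    int count = 0, total = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        int cores = p.multiProcessorCount * coresPerSM(p.major, p.minor);
        printf("%s: %d SMs, ~%d CUDA cores\n", p.name, p.multiProcessorCount, cores);
        total += cores;
    }
    printf("Total: ~%d CUDA cores\n", total);
    return 0;
}
```

On a GTX 680, 8 SMX units at 192 cores each give the 1,536 cores cited above.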

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John Melonakos | Benchmarks, CUDA, Events, OpenCL

Today was Day 1 of the Keeneland Workshop. Many great talks were given across a broad range of GPU computing topics. With last week’s ArrayFire webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory. Kyle independently ran a number of benchmarks over a period of time, which show how quickly OpenCL has matured and where it still has room for improvement. The slide below comes from Kyle’s presentation: for numbers >1, CUDA is faster; for numbers <1, OpenCL is faster. Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable to CUDA performance. The Achilles heel …

OpenCL vs CUDA Comparisons

ArrayFire | CUDA, Events, OpenCL

In case you missed it, we recently held an ArrayFire webinar focused on exploring the tradeoffs of OpenCL vs CUDA. This webinar is part of an ongoing series of webinars held each month to present new GPU software topics as well as programming techniques with Jacket and ArrayFire. For those of you who missed it, we provide a recap here. Lots of questions were fielded by our team, so it’s a must-watch. We hope to see you at the next one!

Recap

Download the slides. Here is a transcript of the content portion of the webinar: AccelerEyes is pleased to present today’s ArrayFire webinar looking at OpenCL and CUDA Trade-offs and Comparisons. Every day, we interact with many programmers in various stages of GPU …