AccelerEyes is Hiring at GTC 2012

John Melonakos | Announcements, Events | Leave a Comment

Do you want to code GPUs daily? Do you want to build software that actually gets used by real people, solving real problems? Do you want to join the whirlwind of a startup where you own projects and determine success or failure? Then come work at AccelerEyes. AccelerEyes is hiring for 3 positions: Inside Salespeople, Full-time Engineers, and Remote Contract Developers. Check out our Careers page or swing by our booth at GTC for more info.

Top 10 List at GTC 2012

John Melonakos | Announcements, Events | Leave a Comment

It’s going to be hard to sleep tonight. So much GPU goodness awaits in the coming 3 days of the GPU Technology Conference. Here are my top 10 things to do at GTC 2012:

Sessions to Attend

#1: S0287 – Jacket for Multidimensional Scaling in Genomics – This is a great opportunity to learn about accelerating MATLAB® on the GPU. Come learn why thousands of scientists, engineers, and analysts are using Jacket to do more with less coding hassle. (Day: Tuesday, 05/15; Time: 5:30 pm – 5:55 pm; Location: Room K)

#2: S0415 – An Accelerated Weeks Method for Numerical Laplace Transform Inversion – Learn how researchers have used Jacket in MATLAB® to implement the Weeks method more efficiently and robustly. (Day: Wednesday, 05/16; Time: 9:30 …

No Free Lunch for GPU Compiler Directives

John Melonakos | ArrayFire, C/C++, CUDA, Fortran | 3 Comments

Last week, Steve Scott at NVIDIA put up a viral post entitled “No Free Lunch for Intel MIC (or GPU’s).” It was a great read and a big hit in technical computing circles. The centerpiece of Scott’s post is that there are no magic compilers. GPUs don’t have them, and neither will MIC. No compiler will be able to automatically recompile existing code and get great performance from MIC or GPUs. Rather, it takes a good amount of elbow grease to write high-performance code. We totally agree. The problem Scott addresses is real. Despite marketing spin to the contrary, developing code for GPUs requires work. However, we don’t agree with Scott’s conclusion that compiler directives are a good solution. You can’t fight …
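
To make the effort gap concrete, below is a minimal sketch of our own (not from Scott’s post, and not ArrayFire code): the same saxpy loop, first as an OpenACC-style directive annotation, then as explicit CUDA where the programmer owns allocation, transfers, and launch configuration.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Explicit CUDA: the programmer writes the kernel and owns the launch shape.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // Directive style annotates the original loop and leaves data movement
    // to the compiler, e.g.:
    //     #pragma acc kernels
    //     for (int i = 0; i < n; ++i) y[i] = 2.0f * x[i] + y[i];
    //
    // The explicit CUDA equivalent below is the "elbow grease" version:
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy_kernel<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);  // expect 4.0

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

The directive version is seductively short, but the decisions that dominate performance (data residency, transfer frequency, launch shape) are exactly what it hides, which is the crux of the disagreement.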

12,288 CUDA Cores in One Computer

John Melonakos | Announcements, CUDA | 3 Comments

Kepler is here. And it’s fantastic! The news came out today that the first Kepler GPU, the GeForce GTX 680, has been launched. A single GPU has 1,536 CUDA cores. This means that those high-end workstations with 8 PCIe slots will be able to pack 12,288 CUDA cores into a single computer. That’s some serious computational power. Current high-end Fermi cards have 512 cores, so this new Kepler architecture boasts 3X the number of compute cores. Normally we focus on the higher-end Tesla products because those more aptly fit the needs of our science, engineering, and financial computing readers. But we are excited nonetheless by this GeForce GPU. It is a major step forward in GPU technology. And this GeForce card portends …
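
As a back-of-the-envelope check, 8 slots × 1,536 cores = 12,288. Here is a small sketch of ours that tallies cores across every GPU in a machine with the standard CUDA device-query API; the cores-per-SM factor is an assumption keyed to architecture (192 on Kepler, 32 or 48 on Fermi).

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    int total = 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);

        // Cores per multiprocessor depend on architecture (assumed mapping):
        int coresPerSM = 32;                                      // Fermi, CC 2.0
        if (prop.major == 2 && prop.minor == 1) coresPerSM = 48;  // Fermi, CC 2.1
        if (prop.major >= 3)                    coresPerSM = 192; // Kepler

        int cores = prop.multiProcessorCount * coresPerSM;
        printf("GPU %d: %s, %d SMs, ~%d CUDA cores\n",
               d, prop.name, prop.multiProcessorCount, cores);
        total += cores;
    }
    printf("Total: ~%d CUDA cores\n", total);
    return 0;
}
```

On a GTX 680 this reports 8 SMs at 192 cores each, i.e. the 1,536 figure above.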

GPU Computing with Jacket in Automated Trader

John Melonakos | Benchmarks, Case Studies | Leave a Comment

The Q1 2012 issue of Automated Trader contains an excellent “Mashup!” piece reviewing software for algorithmic trading. The article provides a wonderful glimpse into the 1-2 month adventure of Andy Webb, Automated Trader’s Founder, and his Wrecking Crew as they built a fast trading platform from several technologies. We heartily recommend that those of you in financial computing go subscribe to get the full story and access to ongoing developments from these Automated Trader thought leaders! The full trading platform they built was quite extensive. The part that caught our eye was the core computational component of the pipeline: running cointegration tests on 1,000 potential pairs across 350 time windows per pair. The single-core MATLAB® version took 70 minutes …
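
To give a feel for the shape of that workload, here is a hypothetical sketch; cointegrationStat is a dummy stand-in for whatever cointegration test the Wrecking Crew actually ran, which the excerpt does not specify.

```cpp
#include <cstdio>
#include <vector>

// Stand-in for a real cointegration test (e.g., an Engle-Granger style
// statistic); the actual math is beside the point here -- what matters
// is the shape of the workload.
double cointegrationStat(int pair, int window) {
    return static_cast<double>(pair % 7) + 0.01 * window;  // dummy value
}

int main() {
    const int numPairs   = 1000;  // candidate pairs from the article
    const int numWindows = 350;   // time windows per pair

    // 1,000 x 350 = 350,000 independent tests: each (pair, window)
    // combination can run without reference to any other.
    std::vector<double> stats(numPairs * numWindows);
    for (int p = 0; p < numPairs; ++p)
        for (int w = 0; w < numWindows; ++w)
            stats[p * numWindows + w] = cointegrationStat(p, w);

    printf("ran %zu tests\n", stats.size());
    return 0;
}
```

The key property is that all 350,000 (pair, window) tests are independent, which is exactly the kind of embarrassingly parallel structure a GPU exploits.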

Jacket Continues to Crush the Clone

John Melonakos | ArrayFire | 6 Comments

This morning, I woke up to find a notable comment in the MATLAB® Newsgroup. Over two years ago, MathWorks® started to build a clone of Jacket, which you now know as the GPU computing support in the Parallel Computing Toolbox™. At the time, there were many naysayers suggesting that Jacket would somehow be eclipsed by the clone. Made sense, right? Wrong! Here we are 2 years later, and the clone is still a poor imitation. There are several technical reasons for this, but if you are serious about getting great performance from your GPU, Jacket is the better option. Look at all the real customers that are getting big benefits. Here are some other recent benchmarks from the Walking …

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John Melonakos | Benchmarks, CUDA, Events, OpenCL | 3 Comments

Today was Day 1 of the Keeneland Workshop. Many great talks were given, across a broad range of GPU computing topics. With last week’s ArrayFire Webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory. Kyle independently ran a number of benchmarks over a period of time that show how quickly OpenCL has matured and where it still has room for improvement. The slide below comes from Kyle’s presentation. For numbers >1, CUDA is faster; for numbers <1, OpenCL is faster. Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance. The Achilles heel …

ArrayFire Support for CUDA 4.1

John Melonakos | Announcements, ArrayFire, C/C++, CUDA, Fortran | Leave a Comment

The question above comes from María (@turbonegra), who follows us at @accelereyes. Many of you are wondering when ArrayFire support for the new CUDA version 4.1 will be released. The answer: work is currently under way. CUDA 4.1 includes a new Fermi compiler, and many people in the GPU ecosystem have reported slowdowns after upgrading to the new CUDA version. So we’ve delayed releasing ArrayFire and Jacket support for CUDA 4.1 because we want to verify performance and reliability across all our unit tests, performance regressions, and customer code samples. Our tests sweep across various driver versions and everything from mobile GeForce cards through server-grade Tesla and Fermi chips. We are still working through the testing and verification at the moment. While …
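
One reason such sweeps matter: the installed driver and an application’s runtime can disagree on the CUDA version they support. A minimal sketch of ours of the standard check, using the stock CUDA runtime calls:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);    // highest CUDA version the driver supports
    cudaRuntimeGetVersion(&runtimeVer);  // CUDA version the app was built against

    // CUDA encodes versions as 1000*major + 10*minor, so 4010 means 4.1.
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVer / 1000, (driverVer % 100) / 10,
           runtimeVer / 1000, (runtimeVer % 100) / 10);

    if (driverVer < runtimeVer)
        printf("warning: driver is older than the runtime; upgrade the driver\n");
    return 0;
}
```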

Discrete GPUs are here to stay

John Melonakos | CUDA | 2 Comments

Ever since AccelerEyes began over 4 years ago, naysayers have flippantly tossed out the idea that computing on discrete GPUs will soon go away. Some thought AMD’s Fusion would spell the demise of discrete GPU computing. Others thought that Intel’s integrated graphics would squeeze high-end GPUs out of the market. Neither is anywhere close to disrupting the utility of discrete GPUs (especially those currently available from NVIDIA) for solving the computational challenges that face domain professionals. Today, Jon Peddie Research released a free whitepaper describing the market forces and sales projections for GPUs. From the article: “The facts speak for themselves. Those who are concerned about graphics performance will buy discrete GPU systems. As good as they are, embedded …

Jacket Demo – CPU vs GPU runtimes on MATLAB® code

John Melonakos | Benchmarks, CUDA | 1 Comment

To explore the differences between CPU-only computing and GPU-accelerated computing, the new Jacket Demo is really convenient. The Jacket Demo automatically launches two MATLAB® sessions, one running on the CPU only and the other running on the GPU with Jacket. This side-by-side demo shows the computational speed of each processor as well as a visual depiction of the algorithm’s progression. A variety of different demos are provided. The Jacket Demo is included in every Jacket installation (found in the examples directory and launchable from the Start Menu in Windows). Check out this video of the Jacket Demo in action on an i7 CPU with a Tesla C2050 GPU. Enjoy!
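
The demo itself is MATLAB®-based, but as a rough CUDA analogue of the measurement it performs, here is a sketch of ours that times the same element-wise operation on the CPU and the GPU, keeping data resident on the device between iterations much as Jacket keeps arrays on the GPU:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <chrono>
#include <vector>

__global__ void scale(int n, float a, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * y[i];
}

int main() {
    const int n = 1 << 24;
    const int reps = 100;
    std::vector<float> h(n, 1.0f);

    // CPU timing: plain loop over the same data, repeated reps times.
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < reps; ++r)
        for (int i = 0; i < n; ++i) h[i] *= 1.0001f;
    auto t1 = std::chrono::high_resolution_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // GPU timing: one upload, then the data stays on the device.
    float* d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        scale<<<(n + 255) / 256, 256>>>(n, 1.0001f, d);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpuMs = 0;
    cudaEventElapsedTime(&gpuMs, start, stop);

    printf("CPU: %.1f ms   GPU: %.1f ms   speedup: %.1fx\n",
           cpuMs, gpuMs, cpuMs / gpuMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

The printed ratio is the same kind of side-by-side number the demo displays as both sessions run.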