No Free Lunch for GPU Compiler Directives

Discover why relying on compiler directives for GPU performance may lead to disappointment, and learn about a more effective approach that combines ease of use with expertly tuned CUDA code so you get the most out of your coding effort.

John Melonakos

Apr 11, 2012

Last week, Steve Scott at NVIDIA put up a viral post entitled "No Free Lunch for Intel MIC (or GPU's)." It was a great read and a big hit in technical computing circles.

Scott's central point was that there are no magic compilers. GPUs don't have them, and neither will MIC. No compiler will be able to automatically recompile existing code and get great performance from MIC or GPUs. Rather, it takes a good amount of elbow grease to write high-performance code.

We totally agree. The problem Scott addresses is real. Despite marketing spin to the contrary, developing code for GPUs requires work.

However, we don't agree with Scott's conclusion that compiler directives are a good solution. You can't fight magic compilers with more magic compilers. Directives are simply not a good option for most problems. A Google+ post by Derek Gerstmann sums this up well, saying that the fine-tuning process with compiler directives quickly erodes the programmability advantages that were the point of using them in the first place.

Fortunately, ArrayFire offers a better way forward. It is easier and more intuitive to use than directives, and it is full of hand-tuned CUDA code. Not just regular hand-tuned CUDA code, but many man-years' worth of elbow-greased CUDA code. Download it and you will smile.

There is no free lunch for any technology, including compiler directives. If you have experience with compiler directives for GPUs, let us know what you think.