Last week, Steve Scott at NVIDIA put up a viral post entitled, “No Free Lunch for Intel MIC (or GPU’s).” It was a great read and a big hit in technical computing circles.
Scott’s central point is that there are no magic compilers. GPUs don’t have them, and neither will MIC. No compiler can automatically recompile existing code and get great performance out of MIC or GPUs. Rather, it takes a good amount of elbow grease to write high-performance code.
We totally agree. The problem Scott addresses is real. Despite marketing spin to the contrary, developing code for GPUs requires work.
However, we don’t agree with Scott’s conclusion that compiler directives are a good solution. You can’t fight magic compilers with more magic compilers. Directives are simply not a good option for most problems. A Google+ post by Derek Gerstmann sums this up well: the fine-tuning process with compiler directives quickly erodes the programmability advantages that were the point of using them in the first place.
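To make Gerstmann’s point concrete, here is a rough sketch of how a directive-annotated loop tends to grow once tuning starts. It uses OpenACC-style pragmas on a SAXPY loop; the function names are our own hypothetical ones, the vector length of 128 is an arbitrary illustrative value rather than a recommendation, and a compiler without OpenACC support will simply ignore the pragmas and run the loops on the CPU.

```c
/* Hypothetical example: the same SAXPY loop, before and after tuning. */

/* First pass: one directive, plus the data clauses that tell the
 * compiler what to copy to and from the device. */
void saxpy_basic(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* After tuning: the launch shape is now spelled out by hand.
 * vector_length(128) is an assumed value picked for illustration;
 * a real value would come from profiling on the target device. */
void saxpy_tuned(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop gang vector vector_length(128) \
        copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. The difference is that the tuned version already encodes hardware-specific choices in the source, which is the erosion of programmability that Gerstmann describes.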
Fortunately, ArrayFire offers a better way forward. It is easier and more intuitive to use than directives, and it is full of hand-tuned CUDA code. Not just regular hand-tuned CUDA code, but many man-years’ worth of elbow-greased CUDA code. Download it and you will smile.
There is no free lunch for any technology, including compiler directives. If you have experience with compiler directives for GPUs, let us know what you think.
Agreed. I optimize GPU code by hand myself, so I can really see the extent of what is claimed above. Even so, to back up such conclusions we need extensive benchmarking of ArrayFire vs. OpenACC, as simple as that.
Thanks for the comment and good points. We are working on benchmarks right now, but OpenACC is so new that it is not really broadly available yet.
But even in the absence of rigorous benchmarks, those of us who program GPUs professionally know from experience that it will be a long time (if ever) before compilers can perform the level of optimization that we had to do by hand to achieve fast code.
In your future benchmarks, can you please include LibraSDK (http://www.gpusystems.com/libra.aspx) as well as MAGMA (