No Free Lunch for GPU Compiler Directives

John Melonakos | ArrayFire, C/C++, CUDA, Fortran | 3 Comments

Last week, Steve Scott at NVIDIA put up a viral post entitled, “No Free Lunch for Intel MIC (or GPU’s).”  It was a great read and a big hit in technical computing circles. The centerpiece of Scott’s post was that there are no magic compilers.  GPUs don’t have them, and neither will MIC.  No compiler will be able to automatically recompile existing code and get great performance from MIC or GPUs.  Rather, it takes a good amount of elbow grease to write high-performance code. We totally agree.  The problem Scott addresses is real.  Despite marketing spin to the contrary, developing code for GPUs requires work. However, we don’t agree with Scott’s conclusion that compiler directives are a good solution. You can’t fight …

ArrayFire Pro: Features and Scalability

ArrayFire | ArrayFire, C/C++, CUDA, Fortran | Leave a Comment

ArrayFire is a fast GPU library that off-loads compute-intensive tasks onto many-core GPUs, often reducing application runtime many times over. ArrayFire is built on top of the NVIDIA CUDA software stack, which is currently the best and most stable software development kit available for GPU computing. ArrayFire comes with a huge set of functions spanning domains such as image processing, signal processing, financial modeling, and applications requiring graphics support. ArrayFire uses an array-based notation (supporting N-dimensional arrays) and allows sub-referencing and assignment into these multi-dimensional arrays. The following code snippet shows how you can index into array objects.

// Generate a 3×3 array of random numbers on the GPU
array A = randu(3,3);
array a1 …
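For readers who have not seen the API, here is a minimal, self-contained sketch of this kind of indexing written against the ArrayFire C++ interface; the particular slices chosen (a column, a row, a sub-block) are illustrative and are not the truncated snippet from the original post.

#include <arrayfire.h>

int main() {
    // Generate a 3x3 array of random numbers on the GPU
    af::array A = af::randu(3, 3);

    af::array c0  = A.col(0);               // first column
    af::array r2  = A.row(2);               // last row
    af::array blk = A(af::seq(0, 1),        // rows 0..1
                      af::seq(1, 2));       // cols 1..2

    // Assignment into a sub-range of the array
    A(af::span, 0) = 1.0;                   // set the first column to 1

    af_print(A);
    return 0;
}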

Jacket Continues to Crush the Clone

John Melonakos | ArrayFire | 6 Comments

This morning, I woke up to find the following comment in the MATLAB® Newsgroup: Over two years ago, MathWorks® started to build a clone of Jacket, which you now know as the GPU computing support in the Parallel Computing Toolbox (TM).  At the time, there were many naysayers suggesting that Jacket would somehow be eclipsed by the clone.  Made sense, right? Wrong!  Here we are two years later, and the clone is still a poor imitation. There are several technical reasons for this, but if you are serious about getting great performance from your GPU, Jacket is the better option.  Look at all the real customers that are getting big benefits. Here are some other recent benchmarks from the Walking …

ArrayFire Support for CUDA 4.1

John Melonakos | Announcements, ArrayFire, C/C++, CUDA, Fortran | Leave a Comment

The question above comes from María (@turbonegra).  She follows us at @accelereyes.  Many of you are wondering when ArrayFire support for the new CUDA version 4.1 will be released.  The answer: work is currently under way. CUDA 4.1 includes a new Fermi compiler, and many people in the GPU ecosystem have reported slowdowns from upgrading to the new CUDA version. So we’ve delayed releasing ArrayFire and Jacket support for CUDA 4.1 because we want to verify performance and reliability across all our unit tests, performance regression tests, and customer code samples.  Our tests sweep across various driver versions and everything from mobile GeForce cards through server-grade Tesla and Fermi chips. We are still working through the testing and verification at the moment. While …

AccelerEyes Releases ArrayFire GPU Software

Scott | Announcements, ArrayFire, C/C++, CUDA, Fortran, OpenCL | 1 Comment

A free, fast, and simple GPU library for CUDA and OpenCL devices. AccelerEyes announces the launch of ArrayFire, a freely available GPU software library supporting CUDA and OpenCL devices. ArrayFire supports the C, C++, Fortran, and Python languages on AMD, Intel, and NVIDIA hardware.  Learn more by visiting the ArrayFire product page. “ArrayFire is our best software yet and anyone considering GPU computing can benefit,” says James Malcolm, VP Engineering at AccelerEyes.  “It is fast, simple, GPU-vendor neutral, full of functions, and free for most users.” Thousands of paying customers currently enjoy AccelerEyes’ GPU software products.  With ArrayFire, everyone developing software for GPUs has an opportunity to enjoy these benefits without the upfront expense of a developer license. Reasons to use ArrayFire: …

Fast Computer Vision with OpenCV and ArrayFire

John Melonakos | ArrayFire, Benchmarks, Case Studies, CUDA | Leave a Comment

Update:  While the post below discusses LibJacket (no longer a product), you can do the same thing in the newer, but different, ArrayFire library.  Improved performance benchmarks and a simpler API are the results of moving from LibJacket to ArrayFire. Mcclanahoochie just posted some code and instructions for pairing OpenCV with LibJacket to get accelerated computer vision.  You can do really fast image processing on video cam feeds too; see the picture below: Really cool stuff.  Computer vision is hot right now, with applications emerging in defense, radiology, gaming, automotive, and other consumer areas. Computer vision algorithms like these are also going mobile.  For instance, we have started to build LibJacket for Mobile applications, which runs on Tegra, PowerVR, and other mobile …
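As a rough illustration of the kind of pairing described above (not Mcclanahoochie’s original LibJacket code), here is a minimal sketch that moves one OpenCV camera frame onto the GPU as an ArrayFire array and runs a placeholder filter; it assumes current OpenCV and ArrayFire installations, and the 5×5 box blur simply stands in for real vision work.

#include <arrayfire.h>
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);                        // default webcam
    if (!cap.isOpened()) return -1;

    cv::Mat frame, gray, grayf;
    cap >> frame;                                   // grab one frame
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    gray.convertTo(grayf, CV_32F, 1.0 / 255.0);     // float pixels in [0,1]

    // OpenCV is row-major, ArrayFire is column-major: build a (cols x rows)
    // array from the row-major buffer, then transpose to get (rows x cols).
    af::array img(grayf.cols, grayf.rows, grayf.ptr<float>(0));
    img = af::transpose(img);

    // Placeholder processing step: a 5x5 box blur on the GPU
    af::array kernel  = af::constant(1.0f / 25.0f, 5, 5);
    af::array blurred = af::convolve2(img, kernel);

    // Copy the result back into a row-major OpenCV Mat for display
    cv::Mat out(grayf.rows, grayf.cols, CV_32F);
    af::transpose(blurred).host(out.ptr<float>(0));
    cv::imshow("blurred", out);
    cv::waitKey(0);
    return 0;
}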

Tree cats see your code!

John Melonakos | ArrayFire | Leave a Comment

From time to time we stumble across funny quirks while using MATLAB®.  The latest came as one of our developers accidentally mis-keyed a few characters.  With 5 characters on the command line, you too can get a message about tree cats seeing your bad code (followed by a nasty seg fault, so beware).  Try this:

>> a()@a
tree_cat sees bad code
* Subsref [4]
* M_ID 0(5) which
* M_LRB 5(1)
* ExprList [1]
* M_ID 6(1) e
* M_RRB 7(1)
tree_cat sees bad code
* Subsref [4]
* M_ID 0(5) which
* M_LRB 5(1)
* ExprList [1]
* M_ID 6(1) e
* M_RRB 7(1)

Top Secret:  Part of Jacket’s GPU runtime involves monkeys obtaining bananas for optimal performance. While we can’t …

Accelerate Computer Vision Data Access Patterns with Jacket & GPUs

Gallagher Pryor | ArrayFire | Leave a Comment

For computer vision, we’ve found that efficient implementations require a new data access pattern that MATLAB does not currently support.  MATLAB and the M language are great for linear algebra, where blocks of matrices are the typical access pattern, but not for computer vision, where algorithms typically operate on patches of imagery. For instance, to pull out a patch of imagery in M, one must write a doubly nested for loop (where x, y is the patch center and W the patch half-width):

A = rand(100,100);
for xs = -W:W
    for ys = -W:W
        patch(xs+W+1, ys+W+1) = A(xs+1+x, ys+1+y);
    end
end

…with guards for boundary conditions, etc. It gets even more complicated with non-square patches. On top of that, these implementations don’t translate to the GPU’s memory hierarchy at all and are thus …
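The original post predates ArrayFire, but as a rough sketch of how this access pattern looks on the GPU today, ArrayFire’s unwrap function pulls sliding image patches out into columns in a single call; the window size, stride, and padding values below are illustrative choices, not figures from the post.

#include <arrayfire.h>

int main() {
    // A 100x100 image of random values on the GPU
    af::array A = af::randu(100, 100);

    const int W   = 3;          // patch half-width -> 7x7 patches
    const int win = 2 * W + 1;

    // Extract every 7x7 patch (stride 1) with W pixels of zero padding,
    // so border patches need no explicit guard code.
    // Result: one patch per column, size (win*win) x (number of patches).
    af::array patches = af::unwrap(A, win, win, 1, 1, W, W);

    af_print(patches.col(0));   // the first 7x7 patch, flattened into a column
    return 0;
}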