ArrayFire v2.0 Release Candidate Now Available for Download

Aaron TaylorAnnouncements, ArrayFire, CUDA, OpenCL Leave a Comment

ArrayFire v2.0 is now available for download. The second iteration of our free, fast, and simple GPU library now supports both CUDA and OpenCL devices. Major Updates ArrayFire now works on OpenCL enabled devices New and improved documentation Optimized for new GPUs–NVIDIA Kepler (K20) and AMD Tahiti (7970) New in ArrayFire OpenCL Same APIs as ArrayFire CUDA version Supports both Linux and Windows Just In Time Compilation (JIT) of kernels Parallel for: gfor Accelerated algorithms in the following domains Image Processing Signal Processing Data Analysis and Statistics Visualization And more New in ArrayFire CUDA New Signal and Image processing functions Faster transpose and matrix multiplication Better debugging support for GDB and Visual Studio Bug fixes to make overall experience better For a more complete list of  the …

clMath: An Open Source BLAS and FFT Library for OpenCL

ScottAnnouncements, OpenCL Leave a Comment

If you’re reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire’s higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a significant boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute to the OpenCL community by open-sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have previously used AMD’s OpenCL libraries within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to join AMD and the open-source community in maintaining and improving these libraries for the benefit of all. …

Parallel Software Development Trends for Dummies

John MelonakosComputing Trends Leave a Comment

Last month, I posted two articles describing computing trends and why heterogeneous computing will be a significant force in computing for the next decade. Today, I continue that series with an article describing the biggest challenge to continued increases in computing performance – parallel software development. Biggest Challenge As I described previously, in order to use an accelerator, software changes must be made. Regular x86-based compilers cannot compile code to run on accelerators without these needed changes. The amount of software change required varies depending upon the availability of and reliance upon software tools that increase performance and productivity. There are four possible approaches to take advantage of accelerators in heterogeneous computing environments:  do-it-yourself, use compilers, use libraries, or use …

7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps

ArrayFireArrayFire, CUDA, OpenCL Leave a Comment

In order to get the best performance from your CUDA or OpenCL code, it is helpful to keep in mind some useful tips for optimizing performance. Note: By “accelerator” we refer to GPUs, APUs, co-processors, FPGAs, and any devices capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …

Upcoming CUDA & OpenCL Training Courses

John MelonakosAnnouncements, CUDA, OpenCL Leave a Comment

We’re pleased to announce upcoming CUDA and OpenCL training courses. Over the past couple of years, we’ve received numerous requests from around the world to be trained by AccelerEyes engineers. We finally got our act together and now have a nice schedule of CUDA and OpenCL training courses for 2013 within the United States: CUDA Feb 25-26, Houston, TX Mar 4-5, Baltimore/Washington D.C. Mar 25-26, Los Angeles, CA Apr 9-10, Seattle, WA Apr 15-16, San Francisco, CA May 6-7, Austin, TX May 27-28, Atlanta, GA Jun 10-11, Baltimore/Washington D.C. Jul 8-9, San Jose, CA Sep 2-3, Boston, MA Sep 23-24, Baltimore/Washington, D.C. Oct 7-8, Houston, TX Oct 21-22, Atlanta, GA Nov 4-5, Baltimore/Washington, D.C. Dec 2-3, New York, NY OpenCL …

How much speedup can you get with CUDA or OpenCL?

ScottArrayFire, Benchmarks, CUDA, OpenCL Leave a Comment

Everyday developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore pertinent acceleration factors. Note, we’ll use the term accelerator to include GPUs, Xeon Phi coprocessor, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups: Hardware:  The more advanced the accelerator hardware, the more the speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes:  In general, accelerators will outperform CPUs to …

ArrayFire Reception in France

John MelonakosArrayFire, Case Studies, CUDA, OpenCL Leave a Comment

As an engineers company, we spend a lot of time wrestling in the weeds of low-level GPU and accelerator codes. This is our battleground, and it can often be dizzying in its complexity. Our whole purpose is to hide that mess and tame those low-level beasts so that ArrayFire users get better performance than anyone else. The joy of ArrayFire comes when we get feedback from ArrayFire users, often from different parts of the world. For instance, the week I share excerpts from two recent emails we received in France: 1) From Barep, a French manufacturing company:  “I think ArrayFire is a ‘must have’ library. It’s very easy to use and can be used under Linux and Windows. Personally, I’m happy …

Getting Started with ArrayFire – a 30-minute Jump Start

ArrayFireArrayFire, C/C++, CUDA, OpenCL 1 Comment

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series of webinars that will help you learn more about the many applications of ArrayFire, while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax for ArrayFire. He also provided an overview of existing efforts in GPU software, and compared them to the extensive capabilities of ArrayFire. For example, the same application that takes 26 lines to write in Thrust, can be coded up in just 3 lines in ArrayFire! ArrayFire has supported …

Machine Learning with ArrayFire

ArrayFireBenchmarks, C/C++, Case Studies, CUDA, Events Leave a Comment

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library and its applications to Machine Learning on June 15. This webinar was part of a free series of webinars that help you learn about ArrayFire and Jacket (our MATLAB® product). Anyone can attend these webinars, for they are absolutely free and open for anyone to attend and interact with AccelerEyes engineers. Learn more at http://www.accelereyes.com/webinars. Chris, a Software Engineer at AccelerEyes, explained ArrayFire’s position in the GPU computing world, and presented benchmarks where ArrayFire beats GPU libraries such as Thrust in many critical applications. He also mentioned that ArrayFire could be used either standalone, or in combination with other options for GPU computing such …

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John MelonakosBenchmarks, CUDA, Events, OpenCL 3 Comments

Today was Day 1 of the Keeneland Workshop.  Many great talks were given, across a broad range of GPU computing topics. With last week’s ArrayFire Webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory.  Kyle independently ran a number of benchmarks over a period of time which show how quickly OpenCL has matured and where it yet has room for improvement.  The slide below comes from Kyle’s presentation.  For numbers >1, CUDA is faster.  For numbers <1, OpenCL is faster.  Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance.  The Achilles heel …