7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps

ArrayFireArrayFire, CUDA, OpenCL Leave a Comment

In order to get the best performance from your CUDA or OpenCL code, it is helpful to keep in mind some useful tips for optimizing performance. Note: By “accelerator” we refer to GPUs, APUs, co-processors, FPGAs, and any devices capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …

How much speedup can you get with CUDA or OpenCL?

ScottArrayFire, Benchmarks, CUDA, OpenCL Leave a Comment

Everyday developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore pertinent acceleration factors. Note, we’ll use the term accelerator to include GPUs, Xeon Phi coprocessor, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups: Hardware:  The more advanced the accelerator hardware, the more the speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes:  In general, accelerators will outperform CPUs to …

ArrayFire Reception in France

John MelonakosArrayFire, Case Studies, CUDA, OpenCL Leave a Comment

As an engineers company, we spend a lot of time wrestling in the weeds of low-level GPU and accelerator codes. This is our battleground, and it can often be dizzying in its complexity. Our whole purpose is to hide that mess and tame those low-level beasts so that ArrayFire users get better performance than anyone else. The joy of ArrayFire comes when we get feedback from ArrayFire users, often from different parts of the world. For instance, the week I share excerpts from two recent emails we received in France: 1) From Barep, a French manufacturing company:  “I think ArrayFire is a ‘must have’ library. It’s very easy to use and can be used under Linux and Windows. Personally, I’m happy …

Getting Started with ArrayFire – a 30-minute Jump Start

ArrayFireArrayFire, C/C++, CUDA, OpenCL 1 Comment

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series of webinars that will help you learn more about the many applications of ArrayFire, while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax for ArrayFire. He also provided an overview of existing efforts in GPU software, and compared them to the extensive capabilities of ArrayFire. For example, the same application that takes 26 lines to write in Thrust, can be coded up in just 3 lines in ArrayFire! ArrayFire has supported …

Exciting Updates from AccelerEyes

John MelonakosAnnouncements 4 Comments

We are pleased to announce today that MathWorks and AccelerEyes have started working together to provide the best overall solution for GPU computing in MATLAB® through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™ from MathWorks. This new relationship will result in great product updates for end users of the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. Since 2007, AccelerEyes has been a leader in developing GPU software, including Jacket.  AccelerEyes has sold Jacket as a 3rd-party add-on to the MathWorks MATLAB® product.  Effective today, AccelerEyes will discontinue new Jacket product sales.   All existing Jacket license holders will continue to receive support and maintenance from AccelerEyes for 1 year. All existing Jacket licenses are perpetual and will not expire.  Future GPU computing updates …

Jacket v2.3 Now Available

John MelonakosAnnouncements, CUDA 1 Comment

We are pleased to announce the new release of Jacket v2.3.  This new version of Jacket brings even greater performance improvements through GPU computing for MATLAB® codes.  (Click here to download v2.3) With v2.3, new support has been added for CUDA 5.0.  This newer version of CUDA enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line. This morning we received an email from a Jacket user who said, “V2.3 + CUDA 5 = wow. Just upgraded and re-ran one of the routines that previously took just under 4 minutes – now less than 2 minutes!” This is a must-have release for all Jacket users.  The performance improvements are generally felt across the board.  Existing Jacket …