In order to get the best performance from your CUDA or OpenCL code, it is helpful to keep a few optimization tips in mind. Note: By “accelerator” we mean GPUs, APUs, co-processors, FPGAs, and any other device capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto the hardware’s arithmetic cores. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Every cast to or from the accelerator moves data between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …
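The memory-transfer tip can be illustrated with a toy model. This is a hypothetical sketch, not ArrayFire or CUDA code: `HypotheticalDevice` only counts host-to-accelerator and accelerator-to-host copies, and Python lists stand in for device buffers. It compares casting data back and forth on every loop iteration with copying once, keeping intermediate results on the device, and copying the final result back once.

```python
# Toy model of host <-> accelerator traffic. HypotheticalDevice is NOT a real
# ArrayFire or CUDA API; it just counts copies so two loop styles can be compared.

class HypotheticalDevice:
    def __init__(self):
        self.transfers = 0  # number of host<->device copies performed

    def to_device(self, data):
        self.transfers += 1
        return list(data)  # stand-in for a device buffer

    def to_host(self, buf):
        self.transfers += 1
        return list(buf)


def scale_with_round_trips(dev, host_data, factor, steps):
    # Anti-pattern: copy to the device and back on every iteration.
    for _ in range(steps):
        buf = dev.to_device(host_data)
        buf = [x * factor for x in buf]  # stand-in for device computation
        host_data = dev.to_host(buf)
    return host_data


def scale_on_device(dev, host_data, factor, steps):
    # Preferred: copy in once, keep intermediates on the device, copy out once.
    buf = dev.to_device(host_data)
    for _ in range(steps):
        buf = [x * factor for x in buf]
    return dev.to_host(buf)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    a, b = HypotheticalDevice(), HypotheticalDevice()
    r1 = scale_with_round_trips(a, data, 2.0, 10)
    r2 = scale_on_device(b, data, 2.0, 10)
    print(r1 == r2, a.transfers, b.transfers)  # True 20 2
```

Both versions compute the same values, but the first performs two copies per iteration while the second performs two in total. On real hardware each copy also pays bus latency and bandwidth costs, which is why keeping data resident on the accelerator matters.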
Upcoming CUDA & OpenCL Training Courses
We’re pleased to announce upcoming CUDA and OpenCL training courses. Over the past couple of years, we’ve received numerous requests from around the world to be trained by AccelerEyes engineers. We finally got our act together and now have a nice schedule of CUDA and OpenCL training courses for 2013 within the United States: CUDA: Feb 25-26, Houston, TX; Mar 4-5, Baltimore/Washington, D.C.; Mar 25-26, Los Angeles, CA; Apr 9-10, Seattle, WA; Apr 15-16, San Francisco, CA; May 6-7, Austin, TX; May 27-28, Atlanta, GA; Jun 10-11, Baltimore/Washington, D.C.; Jul 8-9, San Jose, CA; Sep 2-3, Boston, MA; Sep 23-24, Baltimore/Washington, D.C.; Oct 7-8, Houston, TX; Oct 21-22, Atlanta, GA; Nov 4-5, Baltimore/Washington, D.C.; Dec 2-3, New York, NY. OpenCL …
How much speedup can you get with CUDA or OpenCL?
Every day, developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask them questions to explore the pertinent acceleration factors. Note: we use the term accelerator to include GPUs, Xeon Phi coprocessors, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below apply equally to all of these accelerators. The following are some of the important factors to consider when estimating the potential speedup from acceleration: Hardware: The more advanced the accelerator hardware, the greater the speedup (e.g., the NVIDIA Kepler K20 outperforms the previous-generation NVIDIA Fermi C2090). Data Sizes: In general, accelerators will outperform CPUs to …
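One common back-of-envelope bound on overall speedup, not taken from the post, is Amdahl's law: if a fraction p of the original runtime is accelerated by a factor s and the rest runs unchanged, the overall speedup is 1 / ((1 − p) + p / s). A minimal sketch, where `amdahl_speedup` is an illustrative helper and p and s are estimates you supply yourself:

```python
# Back-of-envelope speedup bound (Amdahl's law). An illustration added here,
# not the method from the post; both inputs are assumptions you estimate.

def amdahl_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Overall speedup when `parallel_fraction` of the original runtime is
    accelerated by `accel_factor` and the remainder runs unchanged."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / accel_factor)


if __name__ == "__main__":
    # 90% of the runtime accelerated 20x gives about 6.9x overall, not 20x.
    print(round(amdahl_speedup(0.9, 20.0), 2))
```

However large s becomes, the speedup cannot exceed 1 / (1 − p). The model also ignores host-accelerator transfer time, which only lowers the real figure.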
ArrayFire Reception in France
As a company of engineers, we spend a lot of time wrestling in the weeds of low-level GPU and accelerator code. This is our battleground, and it can often be dizzying in its complexity. Our whole purpose is to hide that mess and tame those low-level beasts so that ArrayFire users get better performance than anyone else. The joy of ArrayFire comes when we get feedback from ArrayFire users, often from different parts of the world. For instance, this week I share excerpts from two recent emails we received from France: 1) From Barep, a French manufacturing company: “I think ArrayFire is a ‘must have’ library. It’s very easy to use and can be used under Linux and Windows. Personally, I’m happy …
Getting Started with ArrayFire – a 30-minute Jump Start
In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series that will help you learn more about the many applications of ArrayFire while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax of ArrayFire. He also provided an overview of existing efforts in GPU software and compared them to the extensive capabilities of ArrayFire. For example, the same application that takes 26 lines to write in Thrust can be coded in just 3 lines in ArrayFire! ArrayFire has supported …
Exciting Updates from AccelerEyes
We are pleased to announce today that MathWorks and AccelerEyes have started working together to provide the best overall solution for GPU computing in MATLAB® through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™ from MathWorks. This new relationship will result in great product updates for end users of the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. Since 2007, AccelerEyes has been a leader in developing GPU software, including Jacket. AccelerEyes has sold Jacket as a 3rd-party add-on to the MathWorks MATLAB® product. Effective today, AccelerEyes will discontinue new Jacket product sales. All existing Jacket license holders will continue to receive support and maintenance from AccelerEyes for 1 year. All existing Jacket licenses are perpetual and will not expire. Future GPU computing updates …
CUDA GPUs Boost Mars Research
With the recent news release from NASA about the Mars Curiosity rover, and as a continuation of our previous post “Powering Mars Research”, Brendan Babb is here again to give us an exciting look at Jacket’s role in Mars research with the Curiosity rover. Brendan Babb and his colleague Frank Moore, at the University of Alaska Anchorage, work with NASA’s Jet Propulsion Lab to improve the image quality and compression of Mars rover images. Here is what Brendan had to tell us about the use of Jacket in his GPU computing challenges… Brendan Babb: I was thrilled to watch the successful landing of the new Mars rover, Curiosity, with my visiting nieces and nephews. The new rover will take pictures, …
Jacket v2.3 Now Available
We are pleased to announce the release of Jacket v2.3. This new version brings even greater performance improvements through GPU computing for MATLAB® code. Version 2.3 adds support for CUDA 5.0, which enables computation on the latest Kepler K20 GPUs in the NVIDIA Tesla product line. This morning we received an email from a Jacket user who said, “V2.3 + CUDA 5 = wow. Just upgraded and re-ran one of the routines that previously took just under 4 minutes – now less than 2 minutes!” This is a must-have release for all Jacket users. The performance improvements are generally felt across the board. Existing Jacket …
Fast Computation of Isotropic Gradients with Jacket’s Convolutions
Researchers from the École Polytechnique de Montréal showed that Jacket is very efficient at rapidly calculating 2D or 3D isotropic gradients in MATLAB® code. Mathematically, isotropic gradients are characterized by a much more precise orientation than the standard 1D finite-difference discretizations. Using convolution functions developed by AccelerEyes, the method becomes very simple to apply and provides very fast evaluation of isotropic gradients of functions or images. This type of isotropic discretization is currently used in computational fluid dynamics: it is useful for simulating immiscible multiphase flows with the Lattice Boltzmann Method (LBM), where the orientation of the various fluid interfaces must be computed very frequently and precisely. In multiphase flow …
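To make the convolution approach concrete, the sketch below computes a 2D isotropic gradient with two 3×3 stencils. Assumption: it uses the D2Q9 lattice weights common in LBM (1/9 on the axes, 1/36 on the diagonals, c_s² = 1/3), which give ∂f/∂x ≈ (1/3)(f_E − f_W) + (1/12)(f_NE + f_SE − f_NW − f_SW). The excerpt does not say which stencils the researchers used, this is plain Python rather than Jacket, and the periodic boundaries are likewise an assumption.

```python
# Isotropic 2D gradient via 3x3 stencils. Assumption: D2Q9 lattice weights
# (1/9 axis, 1/36 diagonal, c_s^2 = 1/3); not necessarily the researchers' stencils.

# KX[dy + 1][dx + 1] is the weight applied to f[y + dy][x + dx] for d/dx.
KX = [[-1.0 / 12, 0.0, 1.0 / 12],
      [-4.0 / 12, 0.0, 4.0 / 12],
      [-1.0 / 12, 0.0, 1.0 / 12]]
# d/dy uses the transposed stencil: KY[r][c] = KX[c][r].
KY = [[KX[c][r] for c in range(3)] for r in range(3)]


def apply_stencil(f, k):
    """Apply 3x3 stencil k to grid f (list of rows) with periodic boundaries.
    This is a correlation (no kernel flip), i.e. convolution with the flipped kernel."""
    ny, nx = len(f), len(f[0])
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    w = k[dy + 1][dx + 1]
                    if w:  # skip zero weights
                        acc += w * f[(y + dy) % ny][(x + dx) % nx]
            out[y][x] = acc
    return out


def isotropic_gradient(f):
    """Return (df/dx, df/dy) on the grid, assuming unit grid spacing."""
    return apply_stencil(f, KX), apply_stencil(f, KY)


if __name__ == "__main__":
    # f(x, y) = x^2: away from the periodic seam in x, the exact values are
    # df/dx = 2x and df/dy = 0, which the stencil reproduces up to rounding.
    f = [[float(x * x) for x in range(6)] for _ in range(6)]
    gx, gy = isotropic_gradient(f)
    print(gx[2][3], gy[2][3])
```

Unlike a two-point central difference, the stencil also draws on the diagonal neighbors, which is where its improved directional accuracy comes from. The same stencils could be applied with any 2D convolution routine, remembering that true convolution flips the kernel.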
Genomics Applications on the GPU
Recently, AccelerEyes held a free webinar on accelerating genomics MATLAB applications on the GPU. We had recently added new genomics examples to Jacket and wanted to use this webinar to showcase these examples and run through some code. This was part of the free series of AccelerEyes webinars that provide a great opportunity to interact with AccelerEyes engineers, see demos executing live on GPUs, and learn about AccelerEyes products and services. Over the last decade, GPUs have advanced at a rapid pace and are leaving CPUs behind in some respects, specifically in their ability to perform massively parallel computations. Jacket has proven to be very efficient at harnessing this ability …