To get the best performance from your CUDA or OpenCL code, keep the following optimization tips in mind. Note: by “accelerator” we mean GPUs, APUs, co-processors, FPGAs, and any other device capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto the hardware’s arithmetic cores. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …
Getting Started with ArrayFire – a 30-minute Jump Start
In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series that helps you learn more about the many applications of ArrayFire while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax of ArrayFire. He also provided an overview of existing efforts in GPU software and compared them to the extensive capabilities of ArrayFire. For example, an application that takes 26 lines to write in Thrust can be coded in just 3 lines in ArrayFire! ArrayFire has supported …
CUDA GPUs Boost Mars Research
With the recent news release from NASA about the Mars Curiosity rover, and as a continuation of our previous post “Powering Mars Research”, Brendan Babb is here again to give us an exciting look at Jacket’s role in Mars research with the Curiosity rover. Brendan Babb and his colleague Frank Moore, at the University of Alaska in Anchorage, work with NASA’s Jet Propulsion Lab to improve the quality and compression of Mars rover images. Here is what Brendan told us about using Jacket for his GPU computing challenges… Brendan Babb: I was thrilled to watch the successful landing of the new Mars rover Curiosity with my visiting nieces and nephews. The new rover will take pictures, …
Fast Computation of Isotropic Gradients with Jacket’s Convolutions
Researchers from the École Polytechnique de Montréal showed that Jacket can rapidly compute 2D or 3D isotropic gradients in MATLAB® code. Mathematically, isotropic gradients are characterized by a much more accurate orientation than standard 1D finite-difference discretizations. With convolution functions developed by AccelerEyes, the method is very simple to apply and evaluates isotropic gradients of functions or images very quickly. This type of isotropic discretization is currently used in computational fluid dynamics, for example to simulate immiscible multiphase flows with the Lattice Boltzmann Method (LBM), where the orientation of the various fluid interfaces must be computed frequently and precisely. In multiphase flow …
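As a rough illustration of how an isotropic gradient reduces to a small convolution, here is a minimal Python sketch. It assumes a 2D grid with unit spacing, periodic boundaries, and a 3x3 stencil derived from D2Q9 lattice weights (1/3 for axis neighbors, 1/12 for diagonal neighbors). The stencil, grid, and function names are illustrative assumptions, not the stencil or code used by the researchers.

```python
# Minimal sketch (assumption): 2D isotropic gradient from a D2Q9-style stencil.
# This is NOT the stencil or code from the original study.

# Stencil for d/dx: rows are y-offsets (-1, 0, +1), columns are x-offsets (-1, 0, +1).
KX = [[-1 / 12, 0.0, 1 / 12],
      [-1 / 3,  0.0, 1 / 3],
      [-1 / 12, 0.0, 1 / 12]]
# Stencil for d/dy is the transpose of KX.
KY = [list(col) for col in zip(*KX)]


def apply_stencil(field, kernel):
    """Apply a 3x3 stencil with periodic boundaries (correlation form,
    i.e. a convolution with the kernel flipped)."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    w = kernel[di + 1][dj + 1]
                    if w:  # skip the zero column/row of the stencil
                        acc += w * field[(i + di) % ny][(j + dj) % nx]
            out[i][j] = acc
    return out


def isotropic_gradient(field):
    """Return (df/dx, df/dy) on a unit-spaced grid."""
    return apply_stencil(field, KX), apply_stencil(field, KY)


if __name__ == "__main__":
    # f(x, y) = 2x + 3y on an 8 x 8 grid (row index = y, column index = x).
    f = [[2.0 * x + 3.0 * y for x in range(8)] for y in range(8)]
    gx, gy = isotropic_gradient(f)
    print(gx[4][4], gy[4][4])  # interior point: approximately 2.0 and 3.0
```

For a linear field the stencil reproduces the exact slope at interior points (the periodic wrap distorts the edges). Because a true convolution flips the kernel, an antisymmetric stencil such as KX must be flipped (which negates it) if it is applied with a convolution routine instead.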
Genomics Applications on the GPU
Recently, AccelerEyes held a free webinar on accelerating MATLAB genomics applications on the GPU. We have added new genomics examples to Jacket and wanted to use this webinar to showcase them and walk through some code. This was part of the free series of AccelerEyes webinars, which are a great opportunity to interact with AccelerEyes engineers, see demos executing live on GPUs, and learn about AccelerEyes products and services. Over the last decade, GPUs have advanced at a rapid pace and are pulling ahead of CPUs in some respects, particularly in their ability to perform massively parallel computations. Jacket has proven to be very efficient at harnessing this ability …
Time Delay Estimation Algorithms with Jacket
Time delay estimation (TDE) techniques have many diverse signal processing applications, for instance in radar, sonar, seismology, geophysics, and ultrasonics, where they are used to identify and localize radiating sources. In this case study, we evaluate the performance of two algorithms developed by Markus Nentwig to find the delay and scaling factor between two cyclic signals. The first algorithm uses linear least-squares fitting to estimate the delay. The second algorithm is iterative and relies on FFT-based cross-correlation. MATLAB® implementations of the two approaches are given in Algorithm 1 and Algorithm 2, respectively. As the author pointed out, the algorithms are not suited for real-time applications, since the whole signal must be known in advance. However, they can be very useful …
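The FFT-based cross-correlation idea can be illustrated with a minimal Python sketch. It is not Nentwig’s algorithm: it only finds the integer-sample delay at the peak of the circular cross-correlation and reads the scaling factor off the peak value, with no iterative sub-sample refinement. The helper names and the test signal are hypothetical, and the hand-written radix-2 FFT assumes a power-of-two signal length.

```python
import cmath
import random


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(a)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in a])]


def estimate_delay(x, y):
    """Estimate integer cyclic delay d and scale s with y[n] ~= s * x[n - d]."""
    X, Y = fft(x), fft(y)
    # Circular cross-correlation r[m] = sum_n x[n] * y[n + m], via the frequency domain.
    r = [v.real for v in ifft([xk.conjugate() * yk for xk, yk in zip(X, Y)])]
    d = max(range(len(r)), key=lambda m: abs(r[m]))
    # At the peak r[d] = s * sum(x^2), so dividing by the energy of x recovers s.
    scale = r[d] / sum(v * v for v in x)
    return d, scale


if __name__ == "__main__":
    rng = random.Random(1)
    n = 64
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [0.8 * x[(k - 5) % n] for k in range(n)]  # x delayed by 5 samples, scaled by 0.8
    print(estimate_delay(x, y))  # expect delay 5 and a scale close to 0.8
```

Taking the peak of the absolute correlation lets the sketch also recover a negative scaling factor (a sign-inverted copy of the signal).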
SAR Image Formation Algorithms on the GPU
Since the 1950s, synthetic aperture radar (SAR) systems have become extremely popular in both civilian and military domains due to their all-weather, day-or-night capabilities and their ability to render different views of a “target”. However, the raw SAR data (phase-history data) must be processed first, because the returns from all point targets in each pulse are superimposed, creating a complex interference pattern that is of little use for locating targets. SAR image formation algorithms compress this target information in the range (frequency) and along-track (azimuth) directions to obtain interpretable images. In the paper titled “SAR image formation toolbox for MATLAB®”, L.A. Gorham and L.J. Moore of the Air Force Research Lab discuss the implementation of the matched filter and backprojection image formation …
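To show what backprojection does, here is a toy Python sketch with a single point target. It assumes idealized range-compressed data (a sinc envelope at the target range carrying the two-way phase), a straight-line platform path, linear interpolation, and made-up parameters (1 GHz carrier, 0.5 m range resolution). None of it is taken from the AFRL toolbox, and the matched-filter step is replaced by synthetic range profiles.

```python
import cmath
import math

C = 3.0e8                         # speed of light (m/s)
FC = 1.0e9                        # assumed carrier frequency (Hz); wavelength 0.3 m
RES = 0.5                         # assumed range resolution after matched filtering (m)
R0, DR, NBINS = 40.0, 0.05, 400   # assumed range window: 40 m to 60 m in 5 cm bins


def simulate_profiles(platform_xs, target):
    """Toy range-compressed data for one point target: a sinc envelope centered
    on the target range R, carrying the two-way phase exp(-j*4*pi*FC*R/C)."""
    data = []
    for px in platform_xs:
        R = math.hypot(target[0] - px, target[1])  # platform at (px, 0)
        phase = cmath.exp(-4j * math.pi * FC * R / C)
        profile = []
        for k in range(NBINS):
            u = (R0 + k * DR - R) / RES
            env = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
            profile.append(env * phase)
        data.append(profile)
    return data


def sample(profile, r):
    """Linearly interpolate a range profile at range r (zero outside the window)."""
    pos = (r - R0) / DR
    k = math.floor(pos)
    if k < 0 or k + 1 >= len(profile):
        return 0j
    frac = pos - k
    return (1 - frac) * profile[k] + frac * profile[k + 1]


def backproject(data, platform_xs, xs, ys):
    """For every pixel, sum each pulse's sample at the pixel's range after
    removing the two-way phase; return the image magnitude."""
    image = []
    for y in ys:
        row = []
        for x in xs:
            acc = 0j
            for profile, px in zip(data, platform_xs):
                R = math.hypot(x - px, y)
                acc += sample(profile, R) * cmath.exp(4j * math.pi * FC * R / C)
            row.append(abs(acc))
        image.append(row)
    return image


if __name__ == "__main__":
    platform_xs = [-8.0 + 0.5 * i for i in range(33)]  # 16 m synthetic aperture
    xs = [-2.0 + 0.25 * i for i in range(17)]
    ys = [48.0 + 0.25 * i for i in range(17)]
    data = simulate_profiles(platform_xs, (1.0, 50.0))
    img = backproject(data, platform_xs, xs, ys)
    peak = max((v, i, j) for i, row in enumerate(img) for j, v in enumerate(row))
    print("peak at x =", xs[peak[2]], "y =", ys[peak[1]])  # expected (1.0, 50.0)
```

Every pixel is computed independently of every other pixel, which is why backprojection parallelizes well even though its cost grows with the number of pixels times the number of pulses.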
Optics Applications with ArrayFire
In case you missed it, on Aug 1 we held a webinar on the Jacket GPU Computing Engine for MATLAB® and its applications to Optics and Photonics. From beam propagation methods to lens design, optics engineers are enjoying the benefits of GPU computing with Jacket to accelerate MATLAB® code. This was part of a free series of webinars that help you learn about ArrayFire (for C/C++/Fortran/Python) and Jacket (for use with MATLAB®). These webinars are free and open to anyone, and attendees can interact with AccelerEyes engineers. Learn more at http://www.accelereyes.com/webinars. Jacket makes very fast GPU computing applications possible, and the team at AccelerEyes recently helped Northrop Grumman Corporation achieve …
Machine Learning with ArrayFire
In case you missed it, on June 15 we held a webinar on the ArrayFire GPU Computing Library and its applications to Machine Learning. This webinar was part of a free series of webinars that help you learn about ArrayFire and Jacket (our MATLAB® product). These webinars are free and open to anyone, and attendees can interact with AccelerEyes engineers. Learn more at http://www.accelereyes.com/webinars. Chris, a Software Engineer at AccelerEyes, explained ArrayFire’s position in the GPU computing world and presented benchmarks in which ArrayFire outperforms GPU libraries such as Thrust in many critical applications. He also mentioned that ArrayFire can be used either standalone or in combination with other options for GPU computing such …
Option Pricing
Andrew Shin, Market Risk Manager at Koch Supply & Trading, achieves significant performance increases on option pricing algorithms by using Jacket to accelerate his MATLAB® code with GPUs. Andrew says, “My buddy and I are, at best, novice programmers and we couldn’t imagine having to figure out how to code all this in CUDA.” But he found Jacket to be straightforward. With these results, he says he can see Jacket and GPUs populating Koch’s mark-to-futures cube, which contains its assets, simulations, and simulated asset prices. Modern option pricing techniques are often considered among the most mathematically complex of all applied areas of finance. Andrew shared some example code to demonstrate how much leverage you can get out of Jacket and GPUs for financial computing in MATLAB® and C/C++. …
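Andrew’s code is not reproduced here. As a generic illustration of the kind of computation involved, the following Python sketch prices a European call option two ways: with the closed-form Black-Scholes formula and with a Monte Carlo simulation of geometric Brownian motion. The model, the parameters, and the function names are assumptions chosen for illustration.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def mc_call(S, K, r, sigma, T, n_paths, seed=0):
    """Monte Carlo price of the same call under geometric Brownian motion.
    Every path is independent, which is what makes this kind of simulation
    a natural fit for data-parallel hardware."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        ST = S * math.exp(drift + vol * rng.gauss(0.0, 1.0))  # simulated terminal price
        total += max(ST - K, 0.0)                            # call payoff at expiry
    return math.exp(-r * T) * total / n_paths                # discounted average


if __name__ == "__main__":
    exact = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
    approx = mc_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=200_000)
    print(f"Black-Scholes: {exact:.4f}  Monte Carlo: {approx:.4f}")
```

With S = K = 100, r = 5%, σ = 20%, and T = 1 year, the closed form gives about 10.45, and the Monte Carlo estimate converges toward it as the number of paths grows.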