GTC 2013 Tutorial – CUDA Accelerated Image Processing Libraries

John Melonakos ArrayFire, CUDA, Events Leave a Comment

The 2013 GPU Technology Conference is just two weeks away. We’re super excited. We’re spending a lot of time preparing for our tutorial on CUDA Accelerated Image Processing Libraries. We think it will be well worth your while to attend. This is an 80-minute session all about CUDA image processing from James Malcolm, an AccelerEyes co-founder and lead engineer. You will walk away from the tutorial much better prepared to build fast computer vision and image processing codes. The session abstract is as follows: Image processing has consistently proven to benefit greatly from GPU acceleration. A number of libraries available from NVIDIA and AccelerEyes make image processing development efficient and lead to big speedups. Using these libraries can often significantly shorten …

ArrayFire Examples (Part 1 of 8) – Getting Started

ArrayFire ArrayFire, CUDA Leave a Comment

This is the first in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the getting_started/ directory.

Hello World

Of course we start with the classic “Hello World” example, which walks you through the basics of using the ArrayFire library. Running this example will print out system and device information, as well as perform some basic matrix operations. This is a good place to get familiar with the basic data container for ArrayFire – the array.

ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Linux)
License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
CUDA toolkit 5.0, driver 304.54
GPU0 Quadro 6000, 6144 …

Benchmarking Tesla K20

Pavan Yalamanchili ArrayFire, Benchmarks, CUDA 1 Comment

In this blog post, we compare NVIDIA’s latest high-end offering, the Tesla K series, with their previous offering. In particular, we compare the Tesla K20c with the Tesla C2070/2075. This post follows a similar post about benchmarking the GTX680 that we did last year. We take a look at a similar set of functions (and a little bit more) to see what benefits the newer line brings. All of the benchmarks were done using double precision. In all of the graphs, higher trendlines are better.

Matrix Multiplication

In house at AccelerEyes, we use matrix multiplication as the gold standard for testing the maximum performance of all new GPUs we end up with. The K20c reaches a peak at …
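For readers who want to reproduce this kind of comparison, matrix-multiply throughput is commonly reported in GFLOPS using the conventional count of 2n³ floating-point operations for an n×n product. The sketch below only illustrates that arithmetic. The function name `matmul_gflops` and the sample timing are hypothetical and are not figures from the benchmark in the post.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """Throughput of one (n x n) by (n x n) matrix multiply, in GFLOPS.

    Assumes the conventional operation count of 2 * n**3
    (about n**3 multiplies plus n**3 adds).
    """
    if n <= 0 or seconds <= 0:
        raise ValueError("n and seconds must be positive")
    return 2.0 * n**3 / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing, for illustration only (not a measured result).
    print(f"{matmul_gflops(1000, 0.004):.1f} GFLOPS")
```

Timing several matrix sizes and plotting the resulting GFLOPS against n gives the kind of "higher is better" trendline the post describes.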

7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps

ArrayFire ArrayFire, CUDA, OpenCL Leave a Comment

To get the best performance from your CUDA or OpenCL code, it helps to keep a few optimization tips in mind. Note: By “accelerator” we refer to GPUs, APUs, co-processors, FPGAs, and any devices capable of running CUDA or OpenCL.

Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto the arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code.

Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …

How much speedup can you get with CUDA or OpenCL?

Scott ArrayFire, Benchmarks, CUDA, OpenCL Leave a Comment

Every day, developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore the pertinent acceleration factors. Note that we’ll use the term accelerator to include GPUs, Xeon Phi coprocessors, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups:

Hardware: The more advanced the accelerator hardware, the greater the speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation).

Data Sizes: In general, accelerators will outperform CPUs to …
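As a rough way to reason about factors like these, here is a back-of-the-envelope estimate that charges the accelerator for moving data as well as for computing. It is a deliberately simplified model of our own for illustration, not the method described in the post. The function name `estimated_speedup`, its parameters, and the sample numbers are all hypothetical.

```python
def estimated_speedup(cpu_seconds: float,
                      accel_compute_seconds: float,
                      bytes_moved: float,
                      bandwidth_bytes_per_s: float) -> float:
    """First-order speedup estimate for offloading work to an accelerator.

    Simplifying assumptions (illustrative only):
      * transfer time = bytes_moved / bandwidth_bytes_per_s
      * transfers and compute do not overlap
    """
    if min(cpu_seconds, accel_compute_seconds, bandwidth_bytes_per_s) <= 0:
        raise ValueError("times and bandwidth must be positive")
    if bytes_moved < 0:
        raise ValueError("bytes_moved must be non-negative")
    transfer_seconds = bytes_moved / bandwidth_bytes_per_s
    return cpu_seconds / (accel_compute_seconds + transfer_seconds)


if __name__ == "__main__":
    # Made-up inputs: 10 s on the CPU, 1 s of accelerator compute,
    # and 4 GB moved over a 4 GB/s link.
    print(estimated_speedup(10.0, 1.0, 4e9, 4e9))
```

With these made-up inputs, the transfer takes as long as the computation itself and halves the achievable speedup, which is why small problems often see little benefit from offloading.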

ArrayFire Reception in France

John Melonakos ArrayFire, Case Studies, CUDA, OpenCL Leave a Comment

As an engineering company, we spend a lot of time wrestling in the weeds of low-level GPU and accelerator codes. This is our battleground, and it can often be dizzying in its complexity. Our whole purpose is to hide that mess and tame those low-level beasts so that ArrayFire users get better performance than anyone else. The joy of ArrayFire comes when we get feedback from ArrayFire users, often from different parts of the world. For instance, this week I share excerpts from two recent emails we received from France: 1) From Barep, a French manufacturing company: “I think ArrayFire is a ‘must have’ library. It’s very easy to use and can be used under Linux and Windows. Personally, I’m happy …

Getting Started with ArrayFire – a 30-minute Jump Start

ArrayFire ArrayFire, C/C++, CUDA, OpenCL 1 Comment

In case you missed it, we recently held a webinar on the ArrayFire GPU Computing Library. This webinar was part of an ongoing series of webinars that will help you learn more about the many applications of ArrayFire, while interacting with AccelerEyes GPU computing experts. ArrayFire is the world’s most comprehensive GPU software library. In this webinar, James Malcolm, who has built many of ArrayFire’s core components, walked us through the basic principles and syntax for ArrayFire. He also provided an overview of existing efforts in GPU software, and compared them to the extensive capabilities of ArrayFire. For example, the same application that takes 26 lines to write in Thrust can be coded up in just 3 lines in ArrayFire! ArrayFire has supported …

Image Processing with ArrayFire and OpenCV on the GPU

John Melonakos ArrayFire, C/C++, Case Studies, CUDA Leave a Comment

ArrayFire is a great way to supplement OpenCV for faster processing on the GPU. Mcclanahoochie recently posted an interactive demo showing the use of OpenCV with ArrayFire for computing Local Contrast Enhancement on the GPU from webcam video. Mcclanahoochie also shows how easy it is to convert OpenCV Mat images into ArrayFire GPU array images, as seen in the code snippet below: All the source code is available on Google Code, linked to from his website. Simply download ArrayFire and OpenCV and try it out for yourself!
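The snippet from the demo is not reproduced in this excerpt. One detail that any such conversion has to deal with is memory layout: OpenCV Mat stores pixels row by row (row-major), while ArrayFire arrays are column-major. The library-free sketch below shows only that reordering for a single-channel image held in a flat list. It is not the code from the demo, and the function name `row_major_to_column_major` is hypothetical.

```python
from typing import List, Sequence


def row_major_to_column_major(pixels: Sequence[float],
                              rows: int, cols: int) -> List[float]:
    """Reorder a flat single-channel image from row-major to column-major.

    Row-major input:     pixels[r * cols + c] is the pixel at (r, c).
    Column-major output: result[c * rows + r] is the pixel at (r, c).
    """
    if len(pixels) != rows * cols:
        raise ValueError("pixel count does not match rows * cols")
    # Walk the columns in the outer loop so each column ends up contiguous.
    return [pixels[r * cols + c] for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    # A 2 x 3 image with rows [1, 2, 3] and [4, 5, 6].
    print(row_major_to_column_major([1, 2, 3, 4, 5, 6], rows=2, cols=3))
```

In practice, the reordering (or an equivalent transpose on the device) would sit between reading the Mat data and handing the buffer to the GPU array.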

SAR Image Formation Algorithms on the GPU

ArrayFire ArrayFire, Case Studies, CUDA 1 Comment

Since the 1950s, synthetic aperture radar (SAR) systems have become widely used in both civilian and military domains thanks to their all-weather, day-or-night capabilities and their ability to render different views of a “target”. However, the raw SAR data (phase-history data) must be preprocessed, since all point targets at each pulse instance are superimposed and create a complex interference pattern that is not very useful for target location. SAR image formation algorithms compress this target information in the range (frequency) and along-track (azimuth) directions to obtain interpretable images. In the paper titled “SAR image formation toolbox for MATLAB®”, Gorham L.A. and Moore L.J. of the Air Force Research Lab discuss the implementation of the matched filter and backprojection image formation …

Option Pricing

ArrayFire ArrayFire, Benchmarks, C/C++, Case Studies, CUDA 2 Comments

Andrew Shin, Market Risk Manager of Koch Supply & Trading, achieves significant performance increases on option pricing algorithms using Jacket to accelerate his MATLAB® code with GPUs. Andrew says, “My buddy and I are, at best, novice programmers and we couldn’t imagine having to figure out how to code all this in CUDA.” But he found Jacket to be straightforward. With these results, he says he can see Jacket and GPUs populating Koch’s mark-to-futures cube, which contains its assets, simulations, and simulated asset prices. Modern option pricing techniques are often considered among the most mathematically complex of all applied areas of finance. Andrew shared some example code to demonstrate how much leverage you can get out of Jacket and GPUs for financial computing in MATLAB® and C/C++. …