ArrayFire Examples (Part 1 of 8) – Getting Started


This is the first in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the getting_started/ directory.

Hello World

Of course we start with the classic “Hello World” example, which walks you through the basics of using the ArrayFire library. Running this example prints system and device information and performs some basic matrix operations. This is a good place to get familiar with the basic data container for ArrayFire – the array.

ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Linux)
CUDA toolkit 5.0, driver 304.54
GPU0 Quadro 6000, 6144 MB, Compute 2.0 (single,double)
Display Device: GPU0 Quadro 6000
Memory Usage: 5549 MB free (6144 MB total)...


Convolution

In this example we show you how to perform a basic image convolution and how to use our timing functions. The output of running this code is simply the time each convolution took. Here we do it three different ways:

// Wrapper functions for timeit()
static void full() { full_out = convolve(img, kernel); }
static void dsep() { dsep_out = convolve(dx, spread, img); }
static void hsep() { hsep_out = convolve(5, h_dx, 5, h_spread, img); }

The first method simply convolves the signal (img) with the filter (kernel). The second method lets you supply the filter's two dimensions independently, as separate one-dimensional filters (dx and spread), and the third method lets you pass native data pointers (what we call host pointers) instead of the device data type.
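To see why the separable form gives the same answer as the full 2D form, here is a minimal sketch in plain C++. It does not use ArrayFire at all: `convolveFull`, `convolveSeparable`, and `outerProduct` are hypothetical helper names, zero padding at the borders is an assumption, and which of the two 1D filters runs along columns versus rows is also assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Full 2D convolution with zero padding; the output has the same size as img.
// The kernel is indexed so that it is flipped, as convolution requires.
Matrix convolveFull(const Matrix& img, const Matrix& kernel) {
    assert(!img.empty() && !kernel.empty());
    const int rows = static_cast<int>(img.size());
    const int cols = static_cast<int>(img[0].size());
    const int kr = static_cast<int>(kernel.size());
    const int kc = static_cast<int>(kernel[0].size());
    Matrix out(rows, std::vector<double>(cols, 0.0));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            for (int i = 0; i < kr; ++i)
                for (int j = 0; j < kc; ++j) {
                    const int rr = r - (i - kr / 2);
                    const int cc = c - (j - kc / 2);
                    if (rr >= 0 && rr < rows && cc >= 0 && cc < cols)
                        out[r][c] += kernel[i][j] * img[rr][cc];
                }
    return out;
}

// Build the 2D kernel that a pair of 1D filters is equivalent to.
Matrix outerProduct(const std::vector<double>& colFilter,
                    const std::vector<double>& rowFilter) {
    Matrix k(colFilter.size(), std::vector<double>(rowFilter.size()));
    for (std::size_t i = 0; i < colFilter.size(); ++i)
        for (std::size_t j = 0; j < rowFilter.size(); ++j)
            k[i][j] = colFilter[i] * rowFilter[j];
    return k;
}

// Separable convolution: one pass down the columns, then one pass along the rows.
Matrix convolveSeparable(const std::vector<double>& colFilter,
                         const std::vector<double>& rowFilter,
                         const Matrix& img) {
    Matrix colKernel(colFilter.size(), std::vector<double>(1));
    for (std::size_t i = 0; i < colFilter.size(); ++i) colKernel[i][0] = colFilter[i];
    const Matrix rowKernel(1, rowFilter);
    return convolveFull(convolveFull(img, colKernel), rowKernel);
}
```

For an n-by-m kernel the full form does n*m multiply-adds per pixel while the separable form does n+m, which is consistent with the separable variants being faster in the timings.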

Convolving a 640 by 480 matrix
full 2D convolution: 0.000258857 seconds
separable, device pointers: 9.71524e-05 seconds
separable, host pointers: 0.000117301 seconds

And, trying it on a bigger matrix…

Convolving a 6400 by 4800 matrix
full 2D convolution: 0.0242132 seconds
separable, device pointers: 0.008284 seconds
separable, host pointers: 0.0083929 seconds


FFT

ArrayFire provides a method for performing the Fast Fourier Transform on an array. In this example we show you how to compute and print the result, as well as how to extract data from device memory. The following code snippet shows the extraction:

cuComplex *B =;
float real = cuCrealf(*B), imag = cuCimagf(*B);
printf("B[0] = %g %c %gi\n", real, ((imag < 0) ? '-' : '+'), imag);

Generally, moving data back and forth between the CPU and devices is relatively slow, and eliminating as many of these transfers as possible will give you the best results. This example also shows you the other way to time a function using ArrayFire. Here are some of the results that I get when running the FFT on my machine (“Quadro 6000, 6144 MB, Compute 2.0 (single,double)”):

FFT on a 30x30 matrix: 0.000431 seconds
FFT on a 9000x9000 matrix: 0.001324 seconds


GFOR

This is a great example because you get to see the power of a special ArrayFire feature, the gfor-loop, which optimizes certain cases of for-loop programming. The basic idea of gfor is that you can launch all iterations of a loop in parallel on the GPU. A lot can be done with the gfor-loop; see the documentation for more details. In this example we see how much faster gfor is than a regular for-loop (even when the regular for-loop uses ArrayFire arrays), and we see a few different ways of using a gfor-loop. On my machine (“Quadro 6000, 6144 MB, Compute 2.0 (single,double)”) I got the following speed-up on the matrix multiply portion of the example:

Timing matrix multiply...
 for-loop 0.000210663 seconds
 gfor-loop 6.75187e-05 seconds
 speedup 3.1x
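Launching all iterations at once only works when no iteration depends on another. The sketch below is plain C++, not gfor and not ArrayFire: `matmul` and `batchedMatmul` are hypothetical helpers that multiply each slice of a batch by the same matrix. Iteration k reads only slice k and writes only result k, so walking the loop backwards gives the same answer, which is the property a parallel launch relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Ordinary dense matrix product c = a * b.
Matrix matmul(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    Matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b[0].size(); ++j)
            for (std::size_t k = 0; k < b.size(); ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Multiply every slice of the batch by rhs.
// `reversed` visits the slices in the opposite order to show that
// the iteration order does not affect the result.
std::vector<Matrix> batchedMatmul(const std::vector<Matrix>& batch,
                                  const Matrix& rhs,
                                  bool reversed = false) {
    const std::size_t n = batch.size();
    std::vector<Matrix> out(n);
    for (std::size_t step = 0; step < n; ++step) {
        const std::size_t k = reversed ? n - 1 - step : step;
        out[k] = matmul(batch[k], rhs);  // touches only slice k
    }
    return out;
}
```

A loop whose body accumulated into a shared result, or read the output of the previous iteration, would not pass this reordering check and would not be a candidate for this kind of parallel launch.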

One of the main advantages of using ArrayFire is speed. The library was built to take advantage of your system’s resources.


This example shows off a range of the arithmetic that can be done on the elements of a matrix. One of the key things to learn here is that ArrayFire supports a number of basic data types for the elements of the array container. A description of these types can be found here. Many of the operations that reorder or pick elements (like sort or min) can also return an index array containing the original positions of the chosen elements. This example shows another great advantage of using ArrayFire – simplicity. Many of the snippets here perform operations in a way that is easy to understand, without the typical hassle of worrying about the details of data, memory, and GPU interfacing.

Here is a list of functions found in this example: col, row, span, array bit-wise ops (&, |, ^) and logical ops (&&, ||), sort, transpose, flip, sum, multiply, min, and max.
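The idea of an operation returning both values and an index array can be sketched in plain C++ with the standard library. This is not the ArrayFire API; `sortWithIndices` and `minWithIndex` are hypothetical names used only to illustrate what "original positions" means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sort ascending and also return, for each sorted element,
// the position it occupied in the input.
std::pair<std::vector<int>, std::vector<std::size_t>>
sortWithIndices(const std::vector<int>& values) {
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // stable_sort keeps equal values in their original relative order
    std::stable_sort(idx.begin(), idx.end(),
                     [&values](std::size_t a, std::size_t b) {
                         return values[a] < values[b];
                     });
    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (std::size_t i : idx) sorted.push_back(values[i]);
    return {sorted, idx};
}

// Smallest value together with the index of its first occurrence.
std::pair<int, std::size_t> minWithIndex(const std::vector<int>& values) {
    assert(!values.empty());
    const auto it = std::min_element(values.begin(), values.end());
    return {*it, static_cast<std::size_t>(it - values.begin())};
}
```

The index array is what lets you apply the same reordering to a second, related array, for example sorting timestamps and then picking the matching measurements.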

Linear Algebra

This example walks you through three of the functions in the linear algebra package: LU decomposition, solving the equation AX = B, and eigenvalue decomposition. The setup and function calls are simple:

out = lu(in);
array X = solve(A, B);
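For readers who want to see how an LU factorisation leads to a solution of AX = B, here is a minimal sketch in plain C++. It is not how ArrayFire implements lu or solve; `luSolve` is a hypothetical helper that handles a single right-hand-side column, uses partial pivoting, and assumes a square, non-singular matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b by Gaussian elimination with partial pivoting.
// The multipliers stored below the diagonal form L and the remaining
// upper triangle forms U; b is updated alongside (forward substitution).
std::vector<double> luSolve(Matrix a, std::vector<double> b) {
    const std::size_t n = a.size();
    assert(n > 0 && b.size() == n);
    for (std::size_t col = 0; col < n; ++col) {
        // Pick the row with the largest entry in this column as the pivot.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        assert(std::fabs(a[pivot][col]) > 1e-12);  // matrix must not be singular
        std::swap(a[col], a[pivot]);
        std::swap(b[col], b[pivot]);
        for (std::size_t r = col + 1; r < n; ++r) {
            const double factor = a[r][col] / a[col][col];
            a[r][col] = factor;  // entry of L
            for (std::size_t c = col + 1; c < n; ++c) a[r][c] -= factor * a[col][c];
            b[r] -= factor * b[col];
        }
    }
    // Back substitution with the upper triangle U.
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {
        double sum = b[i];
        for (std::size_t c = i + 1; c < n; ++c) sum -= a[i][c] * x[c];
        x[i] = sum / a[i][i];
    }
    return x;
}
```

For the system 2x + y = 3, x + 3y = 5 this returns x = 0.8 and y = 1.4.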

Just as a note, the last function, “eigen”, is part of the Pro License package, so you won’t be able to run that part of the example without the license. Here is what the output looks like on my machine (“Quadro 6000, 6144 MB, Compute 2.0 (single,double)”):

-- ArrayFire Eigen value decomposition

eigen(in) =

in =
   0.3022  0.7513  0.9504
   0.8077  0.1578  0.4499
   0.4071  0.5678  0.9829

val =
   1.8201  0.0000  0.0000
   0.0000  -0.5023  0.0000
   0.0000  0.0000  0.1251

vec =
   -0.6237  -0.5741  -0.3185
   -0.4717  0.8049  -0.7139
   -0.6232  -0.1503  0.6237


To wrap it all up, there is a sample problem of measuring data across many sites. The ArrayFire library makes it simple to aggregate and analyze this data. Of course, here we just use a small number of data points, but you can see how easy it would be to run this kind of analysis on a large data set.

Download the ArrayFire library and give it a try!

Posts in this series:

  1. ArrayFire Examples (Part 1 of 8) – Getting Started
  2. ArrayFire Examples (Part 2 of 8) – Benchmarks
  3. ArrayFire Examples (Part 3 of 8) – Financial
  4. ArrayFire Examples (Part 4 of 8) – Image Processing
  5. ArrayFire Examples (Part 5 of 8) – Machine Learning
  6. ArrayFire Examples (Part 6 of 8) – Multiple GPU
  7. ArrayFire Examples (Part 7 of 8) – PDE

