ArrayFire Pro: Features and Scalability


ArrayFire is a fast GPU library that offloads compute-intensive tasks onto many-core GPUs, reducing application runtime, often by a large factor. ArrayFire is built on top of the NVIDIA CUDA software stack, which is currently the best and most stable software development kit available for GPU-based computing.

ArrayFire comes with a large set of functions spanning domains such as image processing, signal processing, financial modeling, and applications that require graphics support. ArrayFire uses an array-based notation (N-dimensional arrays are supported) and allows sub-referencing and assignment into these multi-dimensional arrays. The following code snippet shows how you can index into array objects.

// Generate a 3x3 array of random numbers on the GPU
array A = randu(3,3);
array a1 = A(0);   // first element
array a2 = A(0,1); // first row, second column

A(end);   // last element
A(-1);    // also last element
A(end-1); // second-to-last element

A(1,span);       // second row
A.row(end);      // last row
A.cols(1,end);   // all but first column

// setting entries in an array to a constant
A(span) = 4;        // fill entire array
A.row(0) = -1;      // first row
A(seq(3)) = 3.1415; // first three elements

// copy in another matrix
array B = ones(4,4,f64);
B.row(0) = randu(1,4,f32); // set a row to random values (also upcast)
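
To relate the single-subscript and two-subscript forms used in the snippet above, here is a minimal sketch in plain C++ that does not use ArrayFire at all. It assumes column-major storage, which is how ArrayFire lays out array data. The helper names `linear_index` and `last_index` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of ArrayFire): map a (row, col) subscript
// to a linear offset, assuming column-major storage.
std::size_t linear_index(std::size_t row, std::size_t col, std::size_t rows) {
    assert(row < rows);
    return col * rows + row;
}

// Hypothetical helper: linear offset of the last element of a rows x cols
// matrix, i.e. the position that `end` names in single-subscript indexing.
std::size_t last_index(std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0);
    return rows * cols - 1;
}
```

Under this assumption, for a 3x3 matrix the subscript (0,1) maps to linear offset 3, and the last element sits at offset 8.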

ArrayFire also has a premium version, ArrayFire Pro, that adds capabilities to the basic version: scaling to multiple GPUs, dense and sparse linear algebra on GPUs, and sparse support in general. These add-ons play a significant role in porting critical sections of code onto GPUs to improve performance, because most code in scientific applications involves algorithms that perform some kind of linear algebra, e.g. solving a system of linear equations. The add-ons therefore give the overall application a big performance boost.

On top of this, the multi-GPU capability offers a better cost-to-performance ratio than a multi-core CPU approach and is also “greener”, since GPUs deliver more floating-point operations per second per watt (FLOPS/Watt). Multi-GPU capability is also important for scaling performance beyond a single GPU, and is the most effective way of doing distributed computing in a cluster with multiple GPUs. In some cases this lets users obtain results in real time, for example estimating the price of a stock with a multi-GPU Monte Carlo simulation. ArrayFire Pro enables this by providing a simple set of functions to switch between GPUs and launch CUDA functions on each GPU so that they compute in parallel. The following code snippet shows how you can do a multi-GPU FFT using ArrayFire.

// Number of GPUs to use, and number of samples processed on each GPU
const int NGPU = 4;
const int N = 1000;

// This is the input buffer on the host (CPU): N samples per GPU
float data_h[N * NGPU];
array data[NGPU];
array result[NGPU];

for (int i = 0; i < NGPU; i++) {
   // Switch to GPU 'i'
   device(i);

   // Create array object using the host (CPU) pointer
   // and move data to GPU memory
   data[i] = array(N, 1, data_h + i * N);
}

// Compute FFTs on all GPUs in parallel
for (int i = 0; i < NGPU; i++) {
   device(i);
   result[i] = fft(data[i]);
}

// Global synchronization barrier between the GPUs and the host (CPU) thread.
// Blocks the CPU thread until all GPUs finish computing their FFTs.
af::sync();
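
The Monte Carlo use case mentioned earlier follows the same pattern as the FFT loop: split the work into independent chunks, process each chunk on its own device, then combine the results. The sketch below illustrates only that decomposition, in plain C++ on the CPU and without any ArrayFire calls. The price model (geometric Brownian motion), the seeds, and the function names `simulate_chunk` and `estimate_price` are assumptions made for this example, not something the library prescribes.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Sum of simulated terminal prices for one independent chunk of paths.
// Each chunk has its own seed, so chunks do not depend on one another and
// could in principle be evaluated on separate devices.
double simulate_chunk(unsigned seed, int paths,
                      double s0, double r, double sigma, double t) {
    assert(paths > 0);
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double drift = (r - 0.5 * sigma * sigma) * t;
    const double vol = sigma * std::sqrt(t);
    double sum = 0.0;
    for (int p = 0; p < paths; ++p) {
        // Terminal price of one path under geometric Brownian motion (assumed model)
        sum += s0 * std::exp(drift + vol * normal(gen));
    }
    return sum;
}

// Estimate the expected terminal price by splitting the paths into chunks.
// The loop runs sequentially here; in a multi-GPU setting each iteration
// would be dispatched to a different device before the sums are combined.
double estimate_price(int chunks, int paths_per_chunk,
                      double s0, double r, double sigma, double t) {
    assert(chunks > 0);
    double total = 0.0;
    for (int c = 0; c < chunks; ++c) {
        total += simulate_chunk(1000u + static_cast<unsigned>(c),
                                paths_per_chunk, s0, r, sigma, t);
    }
    return total / (static_cast<double>(chunks) * paths_per_chunk);
}
```

Because the chunks share nothing but the final sum, adding devices only changes how many chunks run at the same time, not the structure of the computation.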

ArrayFire Pro also offers double-precision linear algebra (DLA) on the GPU, which forms the basis of most calculations in scientific computing. The DLA add-on provides functions common to matrix analysis, solving linear systems, eigenvalue problems, and singular value decompositions.
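
To make "solving a linear system" concrete, the following is a minimal CPU sketch in plain C++ of a dense double-precision solve using Gaussian elimination with partial pivoting. It is not ArrayFire code and says nothing about how the DLA add-on is implemented. The name `solve_dense` and the `Matrix` alias are hypothetical, and singular matrices are not handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value because elimination overwrites them.
std::vector<double> solve_dense(Matrix a, std::vector<double> b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);  // A must be square
    }

    for (std::size_t k = 0; k < n; ++k) {
        // Pick the row with the largest magnitude in column k as the pivot
        std::size_t pivot = k;
        for (std::size_t i = k + 1; i < n; ++i) {
            if (std::fabs(a[i][k]) > std::fabs(a[pivot][k])) pivot = i;
        }
        assert(a[pivot][k] != 0.0);  // singular systems are out of scope here
        std::swap(a[k], a[pivot]);
        std::swap(b[k], b[pivot]);

        // Eliminate column k from all rows below the pivot row
        for (std::size_t i = k + 1; i < n; ++i) {
            const double factor = a[i][k] / a[k][k];
            for (std::size_t j = k; j < n; ++j) a[i][j] -= factor * a[k][j];
            b[i] -= factor * b[k];
        }
    }

    // Back substitution on the resulting upper-triangular system
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }
    return x;
}
```

For example, the system 2x + y = 3, x + 3y = 5 has the solution x = 0.8, y = 1.4.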

Most of these algorithms operate on either dense or sparse matrices. ArrayFire Pro addresses both classes and can handle dense as well as sparse inputs. Typical use cases for ArrayFire's sparse functionality include power systems simulation, chemical process simulation, and optimization problems. ArrayFire Pro provides both dense and sparse solvers, constructs sparse arrays in standard sparse storage formats, and works well with other libraries and user applications that follow standard storage formats such as Compressed Row (CSR) and Compressed Column (CSC).
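
For readers unfamiliar with the CSR format named above, here is a small sketch in plain C++ of what it stores and how a sparse matrix-vector product walks it. This is a generic illustration, not ArrayFire's data structure or API. The names `CsrMatrix`, `dense_to_csr`, and `csr_multiply` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Row (CSR) storage: only nonzero values are kept.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> values;        // nonzero values, row by row
    std::vector<std::size_t> col_idx;  // column of each stored value
    std::vector<std::size_t> row_ptr;  // row i occupies [row_ptr[i], row_ptr[i+1])
};

// Build a CSR matrix from a dense row-major matrix, skipping zeros.
CsrMatrix dense_to_csr(const std::vector<std::vector<double>>& dense) {
    CsrMatrix m;
    m.rows = dense.size();
    m.cols = dense.empty() ? 0 : dense[0].size();
    m.row_ptr.push_back(0);
    for (const auto& row : dense) {
        assert(row.size() == m.cols);
        for (std::size_t c = 0; c < row.size(); ++c) {
            if (row[c] != 0.0) {
                m.values.push_back(row[c]);
                m.col_idx.push_back(c);
            }
        }
        m.row_ptr.push_back(m.values.size());
    }
    return m;
}

// Compute y = M * x, touching only the stored nonzeros.
std::vector<double> csr_multiply(const CsrMatrix& m, const std::vector<double>& x) {
    assert(x.size() == m.cols);
    std::vector<double> y(m.rows, 0.0);
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t k = m.row_ptr[r]; k < m.row_ptr[r + 1]; ++k) {
            y[r] += m.values[k] * x[m.col_idx[k]];
        }
    }
    return y;
}
```

Because the three arrays are the standard CSR layout, data in this form can be passed between libraries that agree on the format without conversion. CSC is the same idea with the roles of rows and columns exchanged.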

ArrayFire Pro adds powerful, advanced capabilities to the free version of ArrayFire and is worth the cost. With GPU architectures and software stacks constantly evolving, GPU-based computing has proved challenging, laborious, and costly in terms of development and maintenance. ArrayFire helps customers save much of that development time and maintenance cost. It offloads computations onto GPUs while hiding low-level CUDA constructs, letting users concentrate on their problem rather than on writing CUDA code and exploring optimization strategies for peak GPU performance. ArrayFire engineers work day and night to attain the best performance across different GPU architectures, giving customers value for every dollar spent.
