ArrayFire v3.6 Release

Umar Arshad | Announcements, ArrayFire | 3 Comments

Today we are pleased to announce the release of ArrayFire v3.6.  It can be downloaded from these locations:

This latest version of ArrayFire is better than ever! We added several new features that improve the performance and usability of the ArrayFire library. The main features are:

  • Support for batched matrix multiplication
  • A new topk function
  • A new anisotropic diffusion filter

We have also spent a significant amount of effort improving the internals of the library. The build system has been significantly improved and reorganized.

Batched Matrix Multiplication

The new batched matmul lets you perform several matrix multiplications in a single call to matmul. This is useful when you need to perform many small matrix multiplications. Here is an example:

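#include <arrayfire.h>
using namespace af;

int main() {
    // Two batches of 3x3 matrices, each stored as a 3x3x2 array
    array a = randu(3, 3, 2);
    array b = randu(3, 3, 2);
    array c = matmul(a, b);  // c(:,:,i) = a(:,:,i) * b(:,:,i); c is 3x3x2
    af_print(c);
}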

The example above creates a and b as 3x3x2 volumes, each holding two 3x3 matrices. The matmul function then multiplies each slice of a by the corresponding slice of b. Batching lets you reach higher efficiency with small matrices. Here is a benchmark of batched matmul on an NVIDIA Tesla V100.

[Figure: Batched Matrix Multiplication Performance]

Batched matrix multiplication lets you reach the peak performance of your GPU even with small matrices. Larger matrices do not necessarily need to be batched to fully utilize the GPU.
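To make the slice-by-slice semantics concrete, here is a minimal CPU reference in plain C++ of what a batched matmul computes. It is a sketch, not ArrayFire's GPU implementation: it assumes column-major storage (ArrayFire's layout) with the batch index in the third dimension, and batchedMatmul is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a batched matrix multiply: for every batch s,
// C(:,:,s) = A(:,:,s) * B(:,:,s), where A is m x kdim x batches and
// B is kdim x n x batches. All arrays are column-major, so element
// (i, j, s) of an r x c x batches array is stored at i + j*r + s*r*c.
// batchedMatmul is a hypothetical helper, not an ArrayFire function.
std::vector<float> batchedMatmul(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 std::size_t m, std::size_t kdim,
                                 std::size_t n, std::size_t batches) {
    assert(A.size() == m * kdim * batches && B.size() == kdim * n * batches);
    std::vector<float> C(m * n * batches, 0.0f);
    for (std::size_t s = 0; s < batches; ++s) {
        const float* a = A.data() + s * m * kdim;  // s-th slice of A
        const float* b = B.data() + s * kdim * n;  // s-th slice of B
        float* c = C.data() + s * m * n;           // s-th slice of C
        // Ordinary matrix product of the two slices.
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t p = 0; p < kdim; ++p)
                for (std::size_t i = 0; i < m; ++i)
                    c[i + j * m] += a[i + p * m] * b[p + j * kdim];
    }
    return C;
}
```

With m = kdim = n = 3 and batches = 2, this performs the same per-slice products as the matmul(a, b) call in the example, one 3x3 product per slice. Each slice is independent, which is what lets a GPU work on all of them at once.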


Topk

Topk is another exciting function added in this release. It returns the top k maximum or minimum values in a vector or matrix, along with their indices. Here is an example:

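#include <arrayfire.h>
using namespace af;
int main() {
    array in = randu(10, 8);  // Create a 10x8 matrix
    array vals, idx;
    topk(vals, idx, in, 3);   // Top 3 values of each column of in, plus their row indices
    af_print(in); af_print(vals); af_print(idx);
}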

In this example we create a 10×8 matrix and pass it to the topk function, which returns two 3×8 matrices. The first contains the 3 largest values of each column, in descending order. The second contains the row index of each of those values within its column of the original matrix (indices are 0-based).

[10 8 1 1]
 0.6010 0.1583 0.6755 0.5143 0.7917 0.9092 0.5970 0.9048 
 0.0278 0.3712 0.6105 0.3670 0.1654 0.8865 0.9594 0.0198 
 0.9806 0.3543 0.5232 0.3336 0.8657 0.9676 0.2323 0.4436 
 0.2126 0.6450 0.5567 0.0363 0.3766 0.1425 0.9623 0.6808 
 0.0655 0.9675 0.7896 0.5349 0.7331 0.5137 0.8578 0.6636 
 0.5497 0.3636 0.8966 0.0123 0.2522 0.6484 0.0192 0.8738 
 0.2864 0.4165 0.0536 0.3988 0.9644 0.6353 0.7191 0.3954 
 0.3410 0.5814 0.5775 0.9787 0.4711 0.7449 0.4035 0.5277 
 0.7509 0.8962 0.2908 0.2308 0.3637 0.4391 0.4692 0.3592 
 0.4105 0.3712 0.9941 0.6244 0.9643 0.6982 0.3353 0.8567 

[3 8 1 1]
 0.9806 0.9675 0.9941 0.9787 0.9644 0.9676 0.9623 0.9048 
 0.7509 0.8962 0.8966 0.6244 0.9643 0.9092 0.9594 0.8738 
 0.6010 0.6450 0.7896 0.5349 0.8657 0.8865 0.8578 0.8567 
[3 8 1 1]
 2 4 9 7 6 2 3 0 
 8 8 5 9 9 0 1 5 
 0 3 4 4 2 1 4 9 
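As a plain C++ cross-check of the shapes and index convention in the output above, here is a CPU reference sketch of column-wise top-k. It is not how ArrayFire implements topk; it assumes column-major storage and 0-based row indices (which is what the sample output shows), and topkColumns is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// CPU reference for column-wise top-k on a rows x cols column-major matrix.
// Produces k x cols outputs: vals holds the k largest values of each column
// in descending order, idx holds their 0-based row indices.
// topkColumns is a hypothetical helper, not part of ArrayFire.
void topkColumns(const std::vector<float>& in, std::size_t rows,
                 std::size_t cols, std::size_t k,
                 std::vector<float>& vals, std::vector<unsigned>& idx) {
    assert(in.size() == rows * cols && k <= rows);
    vals.assign(k * cols, 0.0f);
    idx.assign(k * cols, 0u);
    std::vector<unsigned> order(rows);
    for (std::size_t c = 0; c < cols; ++c) {
        const float* col = in.data() + c * rows;
        std::iota(order.begin(), order.end(), 0u);  // row indices 0..rows-1
        // Only the first k positions are sorted; the rest stay unordered.
        auto mid = order.begin() + static_cast<std::ptrdiff_t>(k);
        std::partial_sort(order.begin(), mid, order.end(),
                          [col](unsigned a, unsigned b) { return col[a] > col[b]; });
        for (std::size_t r = 0; r < k; ++r) {
            idx[r + c * k] = order[r];
            vals[r + c * k] = col[order[r]];
        }
    }
}
```

Fed the first two columns of the 10x8 sample input with k = 3, it returns the values 0.9806, 0.7509, 0.6010 and 0.9675, 0.8962, 0.6450 with row indices 2, 8, 0 and 4, 8, 3, matching the first two columns of the printed output.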

The topk function is optimized for small values of k (k < 32). In older versions of ArrayFire, you could get the same result by combining sort and indexing. Here are some benchmarks comparing the two approaches.

For smaller sizes, topk achieves a 23x speedup over the sort-and-index approach. The benchmark was run on an NVIDIA Quadro GV100 GPU. Note that the run times are shown on a log scale.

Anisotropic Smoothing Filter

The anisotropic smoothing filter is an edge-preserving smoothing filter. It smooths out minor intensity variations and removes noise while enhancing edges. The documentation explains in more detail what this filter does internally. Below is sample output from this filter that illustrates how edges are better preserved than with ordinary isotropic smoothing techniques such as Gaussian blurring.


Image 1: (a) original input, (b) Prewitt edge filter, (c) anisotropically smoothed input, (d) gradient after Gaussian blur, (e) gradient of the original input, (f) gradient after diffusion

Notice the edge details in images 1(b) and 1(e): there are a lot of extraneous details, mostly caused by noise. Even if we apply a Gaussian blur to the original input (more than once), some of the details we want to keep may disappear along with the noise, as illustrated in image 1(d). Anisotropic diffusion comes to the rescue in such cases. Compare image 1(c) with 1(a): subtle variations in texture are smoothed out while the edges are preserved, as the gradient of the diffused image in 1(f) shows. Feel free to experiment with the parameters, and please let us know if you encounter any unexpected behavior.
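The documentation describes ArrayFire's own implementation, which is exposed as af::anisotropicDiffusion. To illustrate the general idea, here is one explicit step of the classic Perona-Malik scheme in plain C++. It is a sketch, not ArrayFire's code: the exponential edge-stopping function, the 4-neighbour differences, the zero-flux borders, and the parameter names dt and K are our assumptions, and peronaMalikStep is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One explicit Perona-Malik diffusion step on a rows x cols image stored
// column-major. Edge-stopping function g(d) = exp(-(d/K)^2): small
// differences (noise, texture) diffuse almost freely, large ones (edges)
// barely move. Borders use zero-flux boundaries. With 0 <= dt <= 0.25 this
// 4-neighbour explicit update stays stable. Hypothetical helper.
std::vector<float> peronaMalikStep(const std::vector<float>& img, std::size_t rows,
                                   std::size_t cols, float dt, float K) {
    assert(img.size() == rows * cols);
    auto at = [&](std::size_t r, std::size_t c) { return img[r + c * rows]; };
    auto g = [K](float d) { return std::exp(-(d / K) * (d / K)); };
    std::vector<float> out(img.size());
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            float u = at(r, c);
            // Differences to the four neighbours (0 at the border = no flux).
            float dn = r > 0 ? at(r - 1, c) - u : 0.0f;
            float ds = r + 1 < rows ? at(r + 1, c) - u : 0.0f;
            float dw = c > 0 ? at(r, c - 1) - u : 0.0f;
            float de = c + 1 < cols ? at(r, c + 1) - u : 0.0f;
            out[r + c * rows] = u + dt * (g(dn) * dn + g(ds) * ds + g(dw) * dw + g(de) * de);
        }
    }
    return out;
}
```

Repeating the step smooths further. When K is small compared to the contrast across an edge, g is close to zero there and the edge stays put, while small noise-like differences, for which g is close to one, are averaged out. A Gaussian blur, by contrast, treats both kinds of differences the same way.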

Other Improvements

We have also improved the performance of our JIT: the shift and select functions are now handled by the JIT. We also reduced the number of mutex locks and improved the thread safety of ArrayFire.

In addition to the features above, we have made several improvements that make the library easier to use. Our CMake config script now provides imported targets instead of plain variables for adding flags. If you use CMake, you can now compile and link your application against ArrayFire like this:


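find_package(ArrayFire REQUIRED)
add_executable(FantasticApp main.cpp)
# Other backends: ArrayFire::afcpu, ArrayFire::afcuda, or ArrayFire::af (unified)
target_link_libraries(FantasticApp PRIVATE ArrayFire::afopencl)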

This will automatically add the correct compiler and linker flags when you compile your application.

We have also improved the experience of building ArrayFire from source. Internally, we adopted Modern CMake techniques and rewrote most of the build scripts to make them easier to read and maintain.

The period and quality of the random number generator have been improved, thanks to Ralf Stubner. He also added several unit tests to track the quality of the random number generator going forward. Adrien Vincent added a few color maps that were available in our Forge library but missing from ArrayFire. Several other contributors helped us fix errors in our documentation and build system.

We are excited to finally release ArrayFire v3.6. We would like to thank our community for supporting us and helping us improve ArrayFire. We look forward to more community participation in future releases!

Dedicated Support and Coding Services

ArrayFire is open source and always will be. For those who want dedicated support or custom function development, we offer a variety of support packages.

ArrayFire also serves many clients through consulting and coding services, algorithm development, code porting, and training courses for developers. Contact us or schedule a free technical consultation to learn more about our consulting and coding services.

Comments 3


It is unfortunate that Apple has chosen to deprecate OpenCL. Even though Apple's OpenCL drivers were known to have several bugs, we wanted to support macOS so that our users would not have to move away from the platform of their choice. We have no immediate plans to drop support for macOS, because you can still use the CUDA and CPU backends.
