Sparse Matrices in ArrayFire v3.4

Shehzan Mohammed, ArrayFire

In ArrayFire v3.4, we have added support for sparse matrices, which greatly reduce the memory footprint on GPUs and accelerated devices for many applications.

Sparse Data Structure

A sparse data structure is one in which only the non-zero elements are stored. Sparse matrices are useful when the number of zero-valued elements is much greater than the number of non-zero elements (i.e. the sparsity of the matrix is high). A sparse matrix is generally stored as 3 arrays:

  • A data or values array containing all the non-zero elements
  • A vector for row indices (based on storage format)
  • A vector for column indices (based on storage format)

There are many ways to store sparse matrices, the most prominent of which are:

  • Coordinate list (COO)
  • Compressed sparse row (CSR)
  • Compressed sparse column (CSC)

ArrayFire supports the CSR and COO formats, with support for CSC coming in future releases.

A more detailed description of sparse matrices can be found on Wikipedia.
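To make the three-array layout concrete, here is a minimal sketch in plain C++ (not ArrayFire code; the struct and function names are illustrative only) that converts a small dense matrix into CSR storage:

```cpp
#include <vector>

// Illustrative CSR layout: the non-zero values, a column index per value,
// and one row pointer per row (plus a final end marker).
struct Csr {
    std::vector<float> values;  // non-zero elements, row by row
    std::vector<int>   rowPtr;  // row i's entries live in [rowPtr[i], rowPtr[i+1])
    std::vector<int>   colIdx;  // column of each stored value
};

Csr denseToCsr(const std::vector<float> &dense, int rows, int cols) {
    Csr csr;
    csr.rowPtr.push_back(0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float v = dense[i * cols + j];
            if (v != 0.0f) {            // only non-zeros are kept
                csr.values.push_back(v);
                csr.colIdx.push_back(j);
            }
        }
        csr.rowPtr.push_back((int)csr.values.size());
    }
    return csr;
}
```

For the 3x3 matrix {{5,0,0},{0,8,0},{3,0,6}}, this yields values {5,8,3,6}, colIdx {0,1,0,2}, and rowPtr {0,1,2,4}.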

Sparse Arrays in ArrayFire

In ArrayFire, sparse arrays use the same user-facing object, af::array, which makes working with them seamless. We have added a function, af::array::issparse(), that allows users to check whether an array is sparse. Externally there is no difference between a sparse and a dense array; internally, a sparse array is stored as 3 independent arrays along with other metadata. These arrays, as well as other properties, can be fetched using the sparse functions.

In v3.4, the following features are supported for sparse arrays:

  • Creation of sparse arrays from dense arrays via af::sparse()
  • Retrieval of the underlying storage via af::sparseGetValues(), af::sparseGetRowIdx(), af::sparseGetColIdx(), and af::sparseGetNNZ()
  • Sparse-dense matrix multiplication

The sparse-dense matrix multiplication is implemented within the same af::matmul() function used for dense arrays.

As with all ArrayFire features, sparse support is available across all backends (CUDA, OpenCL, and CPU) and supported devices and architectures.


In this section, we showcase the difference between running a conjugate gradient solver on a dense array and on a sparse array. The inputs are the same in each case; in the sparse case, the "A" matrix is first converted to sparse.
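The solver in the code below implements the textbook conjugate gradient updates (starting from the zero initial guess, so r_0 = b):

```latex
\alpha_k = \frac{r_k^\top r_k}{p_k^\top A p_k}, \qquad
x_{k+1} = x_k + \alpha_k p_k, \qquad
r_{k+1} = r_k - \alpha_k A p_k,
```
```latex
\beta_k = \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}, \qquad
p_{k+1} = r_{k+1} + \beta_k p_k.
```

In the code, alpha_num and alpha_den are the numerator and denominator of alpha_k, and beta is computed from the updated residual.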


#include <arrayfire.h>
#include <iostream>

using namespace af;

static size_t dimension = 16 * 1024;
static const int maxIter = 10;
static const int sparsityFactor = 7;

static array A;
static array spA;  // Sparse A
static array x0;
static array b;

void setupInputs()
{
    // Generate a random input: A
    array T = randu(dimension, dimension, f32);
    // Create 0s in the input.
    // Anything that is not divisible by sparsityFactor will become 0.
    A = floor(T * 1000);
    A = A * ((A % sparsityFactor) == 0) / 1000;
    // Make it positive definite
    A = transpose(A) + A + A.dims(0)*identity(A.dims(0), A.dims(0), f32);

    // Make a sparse copy of A as spA
    spA = sparse(A);

    // Generate x0: Random guess
    x0 = randu(A.dims(0), f32);

    // Generate b
    b = matmul(A, x0);

    std::cout << "Sparsity of A = "
              << 100.f * (float)sparseGetNNZ(spA) / (float)spA.elements()
              << " %" << std::endl;
    std::cout << "Memory Usage of A = " <<  A.bytes() << " bytes" << std::endl;
    std::cout << "Memory Usage of spA = "
              << sparseGetValues(spA).bytes()
               + sparseGetRowIdx(spA).bytes()
               + sparseGetColIdx(spA).bytes()
              << " bytes" << std::endl;
}

void conjugateGradient(const af::array &in)
{
    array x = constant(0, b.dims(), f32);
    array r = b - matmul(in, x);
    array p = r;

    for (int i = 0; i < maxIter; ++i) {
        array Ap = matmul(in, p);
        array alpha_num = dot(r, r);
        array alpha_den = dot(p, Ap);
        array alpha = alpha_num/alpha_den;
        r = r - tile(alpha, Ap.dims())*Ap;
        x = x + tile(alpha, Ap.dims())*p;
        array beta_num = dot(r, r);
        array beta = beta_num/alpha_num;
        p = r + tile(beta, p.dims())*p;
    }

    array res = x0 - x;

    std::cout << "Final difference in solutions:\n";
    af_print(dot(res, res));
}

int main(int argc, char *argv[])
{
    setupInputs();

    std::cout << "Verifying Dense Conjugate Gradient:" << std::endl;
    conjugateGradient(A);

    std::cout << "Verifying Sparse Conjugate Gradient:" << std::endl;
    conjugateGradient(spA);

    return 0;
}

Performance and Memory Footprint

Using the CUDA backend on a Tesla K40c, the conjugate gradient function took 64.94 ms for the dense matrix and 39.31 ms for the sparse matrix. That is roughly a 1.65x speedup!

With a sparsityFactor of 7, about 26.38% of A's elements are non-zero. This means a tremendous amount of memory is conserved. For a single-precision matrix of size 16K x 16K, the dense representation uses 1024 MB of memory while the sparse array, stored in CSR format, uses only 527 MB. That is nearly a 2x savings in memory usage!
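The CSR footprint can be reproduced with a back-of-the-envelope calculation, assuming 4-byte float values and 4-byte integer column indices per non-zero, plus an (n + 1)-entry row-pointer array (the helper names below are our own):

```cpp
// Dense footprint of an n x n single-precision matrix, in bytes.
long long denseBytes(long long n) { return n * n * 4; }

// Approximate CSR footprint for the same matrix with the given
// non-zero fraction: 8 bytes per stored non-zero plus row pointers.
long long csrBytes(long long n, double density) {
    long long nnz = (long long)(density * n * n);
    return nnz * (4 + 4) + (n + 1) * 4;
}
```

For n = 16 * 1024 and density = 0.2638, denseBytes gives exactly 1024 MB and csrBytes gives roughly 540 MB, in the same ballpark as the 527 MB measured above (the exact non-zero count depends on the random input).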

Matrix Size   Memory (Dense)   Memory (Sparse)   CG Time (Dense)   CG Time (Sparse)
16K x 16K     1024 MB          527 MB            64.94 ms          39.31 ms
4K x 4K       64 MB            33 MB             4.752 ms          3.935 ms

Sparse arrays now let users store more data on the GPU while computing faster, making it possible to solve bigger, more complex problems.


ArrayFire v3.4 can be downloaded from these locations:


ArrayFire is continually improving through the addition of new sparse matrix enhancements. We welcome your feedback:

Finally, as you find success with ArrayFire, we invite you to contribute a post to this blog to share with the broader community. Email us to contribute to this blog.
