In ArrayFire v3.4, we have added support for sparse matrices, which greatly reduce the memory footprint for many applications on GPUs and other accelerator devices.
Sparse Data Structure
A sparse data structure is one in which only the non-zero elements are stored. Sparse matrices are useful when the number of zero-valued elements is much greater than the number of non-zero elements (i.e. the sparsity of the matrix is high). A sparse data structure is generally stored as 3 arrays:
- A data or values array containing all the non-zero elements
- A vector for row indices (based on storage format)
- A vector for column indices (based on storage format)
There are many ways to store sparse matrices, the most prominent of which are:
- Compressed Sparse Row (CSR)
- Compressed Sparse Column (CSC)
- Coordinate Format (COO)
ArrayFire supports the CSR and COO formats, with support for CSC coming in future releases.
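For a concrete illustration (the values are arbitrary, chosen only for this example), consider the 3×4 matrix below and its CSR and COO representations:

```
    [ 5 0 0 2 ]
A = [ 0 8 0 0 ]
    [ 0 0 3 0 ]

CSR:  values = [5 2 8 3], rowPtr = [0 2 3 4]  (length nRows + 1), colIdx = [0 3 1 2]
COO:  values = [5 2 8 3], rowIdx = [0 0 1 2],                     colIdx = [0 3 1 2]
```

CSR compresses the per-element row indices into per-row offsets, which is why it needs only nRows + 1 entries for the rows instead of one entry per non-zero.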
A more detailed description of sparse matrices can be found on Wikipedia.
Sparse Arrays in ArrayFire
In ArrayFire, sparse arrays use the same user-facing object, af::array. This makes working with sparse arrays seamless. We have added a function, af::array::issparse(), that allows users to check whether an array is sparse. Although externally there is no difference between a sparse and a dense array, internally a sparse array is stored as 3 independent arrays along with other metadata. These arrays, as well as other properties, can be fetched using the sparse functions.
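For instance, here is a minimal sketch built around the documented accessors (af::sparseGetValues(), af::sparseGetRowIdx(), af::sparseGetColIdx(), and af::sparseGetNNZ()):

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    af::array A   = af::identity(4, 4);  // a dense 4x4 array
    af::array spA = af::sparse(A);       // its sparse (CSR) counterpart

    printf("A sparse? %d  spA sparse? %d\n", A.issparse(), spA.issparse());

    // The component arrays behind a sparse array:
    af_print(af::sparseGetValues(spA));  // non-zero values
    af_print(af::sparseGetRowIdx(spA));  // row pointers (CSR)
    af_print(af::sparseGetColIdx(spA));  // column indices
    printf("nnz = %lld\n", (long long)af::sparseGetNNZ(spA));
    return 0;
}
```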
In v3.4, the following features are supported for sparse arrays:
- Creation of sparse arrays:
  - From 3 host or device memory vectors (data, row indices, column indices).
  - From 3 af::arrays (data, row indices, column indices).
  - From a dense af::array.
    - To create a sparse array from dense host or device data, first create a dense af::array, then pass it to the sparse creation function (see the sketch below).
- Storage format conversions.
- Sparse-dense matrix multiplication (SPMM/SPMV):
  - Sparse matrix – dense matrix.
  - Sparse matrix – dense vector.
  - Option to transpose the sparse matrix operand.
Sparse-dense matrix multiplication is implemented within the same af::matmul() function used for dense arrays.
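Here is a minimal sketch of these entry points, reusing the small illustrative matrix from above (af::sparse, af::sparseConvertTo, af::dense, and af::matmul are the documented functions):

```cpp
#include <arrayfire.h>

int main() {
    // Dense 3x4 matrix; af::array is column-major, so the host buffer
    // below lists the matrix column by column.
    float hA[] = {5, 0, 0,   0, 8, 0,   0, 0, 3,   2, 0, 0};
    af::array A(3, 4, hA);

    // 1. Sparse from a dense af::array (CSR by default).
    af::array spA = af::sparse(A, AF_STORAGE_CSR);

    // 2. Sparse from the three component af::arrays (values, rowIdx, colIdx).
    float vals[] = {5, 2, 8, 3};
    int   rows[] = {0, 2, 3, 4};  // CSR row pointers, length nRows + 1
    int   cols[] = {0, 3, 1, 2};
    af::array spB = af::sparse(3, 4, af::array(4, vals),
                               af::array(5, rows), af::array(4, cols),
                               AF_STORAGE_CSR);

    // 3. Storage format conversion (CSR -> COO), and back to dense.
    af::array spCoo = af::sparseConvertTo(spA, AF_STORAGE_COO);
    af::array dense = af::dense(spB);

    // 4. Sparse-dense multiplication through the usual matmul().
    af::array x = af::randu(4);
    af::array y = af::matmul(spA, x);  // SPMV: y = A * x
    af_print(y);
    return 0;
}
```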
As with all ArrayFire features, sparse support is available across all backends (CUDA, OpenCL, and CPU) and supported devices and architectures.
Example
In this section, we showcase the difference between running a conjugate gradient solver with a dense array and with a sparse array. The inputs are the same in both cases; in the sparse case, the A matrix is converted to a sparse array.
```cpp
#include <arrayfire.h>
#include <iostream>

using namespace af;

static size_t dimension = 16 * 1024;
static const int maxIter = 10;
static const int sparsityFactor = 7;

static array A;
static array spA; // Sparse A
static array x0;
static array b;

void setupInputs()
{
    // Generate a random input: A
    array T = randu(dimension, dimension, f32);

    // Create 0s in input.
    // Anything that is not divisible by sparsityFactor will become 0.
    A = floor(T * 1000);
    A = A * ((A % sparsityFactor) == 0) / 1000;

    // Make it positive definite
    A = transpose(A) + A + A.dims(0) * identity(A.dims(0), A.dims(0), f32);

    // Make A sparse as spA
    spA = sparse(A);

    // Generate x0: Random guess
    x0 = randu(A.dims(0), f32);

    // Generate b
    b = matmul(A, x0);

    std::cout << "Sparsity of A = "
              << 100.f * (float)sparseGetNNZ(spA) / (float)spA.elements()
              << std::endl;
    std::cout << "Memory Usage of A = " << A.bytes() << " bytes" << std::endl;
    std::cout << "Memory Usage of spA = "
              << sparseGetValues(spA).bytes() + sparseGetRowIdx(spA).bytes()
                 + sparseGetColIdx(spA).bytes()
              << " bytes" << std::endl;
}

void conjugateGradient(const af::array &in)
{
    array x = constant(0, b.dims(), f32);
    array r = b - matmul(in, x);
    array p = r;
    for (int i = 0; i < maxIter; ++i) {
        array Ap = matmul(in, p);
        array alpha_num = dot(r, r);
        array alpha_den = dot(p, Ap);
        array alpha = alpha_num / alpha_den;
        r = r - tile(alpha, Ap.dims()) * Ap;
        x = x + tile(alpha, Ap.dims()) * p;
        array beta_num = dot(r, r);
        array beta = beta_num / alpha_num;
        p = r + tile(beta, p.dims()) * p;
    }
    array res = x0 - x;
    std::cout << "Final difference in solutions:\n";
    af_print(dot(res, res));
}

int main(int argc, char *argv[])
{
    af::info();
    setupInputs();
    std::cout << "Verifying Dense Conjugate Gradient:" << std::endl;
    conjugateGradient(A);
    std::cout << "Verifying Sparse Conjugate Gradient:" << std::endl;
    conjugateGradient(spA);
    return 0;
}
```
Performance and Memory Footprint
Using the CUDA backend on a Tesla K40c, the conjugate gradient function took 64.94 ms for the dense matrix and 39.31 ms for the sparse matrix. That is a speedup of roughly 1.65x!
With a sparsityFactor of 7, about 26.38% of the elements of A are non-zero. This saves a tremendous amount of memory. For a floating point matrix of size 16K x 16K, the dense representation uses 1024 MB of memory, while the sparse array, stored in CSR format, uses only 527 MB. That is nearly a 2x savings on memory usage!
Matrix Size | Memory Size (Dense) | Memory Size (Sparse, CSR) | CG Time (Dense) | CG Time (Sparse) |
---|---|---|---|---|
16K x 16K | 1024 MB | 527 MB | 64.94 ms | 39.31 ms |
4K x 4K | 64 MB | 33 MB | 4.752 ms | 3.935 ms |
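As a back-of-the-envelope check (assuming f32 values and s32 indices, i.e. 4 bytes each, as in the example code), the footprints of an $n \times n$ matrix with $nnz$ non-zeros are roughly

$$\text{dense} = 4n^2 \ \text{bytes}, \qquad \text{CSR} \approx 8\,\mathit{nnz} + 4(n+1) \ \text{bytes},$$

so CSR uses less memory whenever the density $nnz/n^2$ is below roughly 50%; at about 26% density it needs a little over half the dense storage, which is consistent with the 527 MB and 33 MB figures above.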
Sparse arrays now let users store more data on the GPU while computing faster, making it possible to solve bigger, more complex problems.
Download
ArrayFire v3.4 can be downloaded from the ArrayFire download page, or built from source on GitHub.
Community
ArrayFire is continually improving through the addition of new features such as sparse matrix support. We welcome your feedback:
- General discussion forums on the ArrayFire Google Group
- Live discussion chat on the ArrayFire Gitter
- Issue reports on the ArrayFire GitHub
Finally, as you find success with ArrayFire, we invite you to share it with the broader community by contributing a post to this blog; email scott@arrayfire.com to contribute.