Accelerating Java using ArrayFire, CUDA and OpenCL

Pavan Yalamanchili · ArrayFire, Java

We have previously mentioned the ability to use ArrayFire through Java.

In this post, we show how to get the best performance from Java using ArrayFire's CUDA and OpenCL backends.

Code

Here is sample code that performs a Monte Carlo estimation of Pi.

import java.util.Random;
// Native Java Code
public static double hostCalcPi(int size) {

    Random rand = new Random();
    int count = 0;

    // Count random points that fall inside the quarter unit circle
    for (int i = 0; i < size; i++) {
        float x = rand.nextFloat();
        float y = rand.nextFloat();
        if (x * x + y * y < 1) count++;
    }

    return 4.0 * count / size;
}
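The host version can be exercised with a simple timing harness such as the one below. The class name and the sample count of 1,000,000 are our choices for illustration, not part of the original benchmark:

```java
import java.util.Random;

public class PiHost {
    // Native Java Monte Carlo estimation of Pi, as in the post
    public static double hostCalcPi(int size) {
        Random rand = new Random();
        int count = 0;
        for (int i = 0; i < size; i++) {
            float x = rand.nextFloat();
            float y = rand.nextFloat();
            if (x * x + y * y < 1) count++;
        }
        return 4.0 * count / size;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        long start = System.nanoTime();
        double pi = hostCalcPi(size);
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.printf("Results from host: %f (%.1f ms)%n", pi, ms);
    }
}
```

With a million samples the estimate typically lands within a few thousandths of Pi; the statistical error shrinks as 1/sqrt(size).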

The same code can be written using ArrayFire in the following manner.

import com.arrayfire.Array;

// ArrayFire through Java
public static double deviceCalcPi(int size) throws Exception {

    Array x = null, y = null, res = null;

    try {

        int[] dims = new int[] {size, 1};
        x = Array.randu(dims, Array.FloatType);
        y = Array.randu(dims, Array.FloatType);

        x = Array.mul(x, x);
        y = Array.mul(y, y);

        res = Array.add(x, y);
        res = Array.lt(res, 1);
        double count = Array.sumAll(res);
        return 4.0 * count / size;

    } finally {
        if (x != null) x.close();
        if (y != null) y.close();
        if (res != null) res.close();
    }
}
  • Array.randu(dims, Array.FloatType) creates an array of uniform random numbers.
    • Array.FloatType specifies 32-bit floating point elements.
    • Other types include Array.FloatComplexType, Array.DoubleType, and so on.

  • Array.mul, Array.add and Array.lt perform element-wise operations on their two operands to produce an output.

  • Array.sumAll adds up all the elements in the array to produce a scalar output.

  • The calls to x.close(), y.close() and res.close() in the finally block are necessary.
    • They ensure that device memory is released when the function exits.
    • This matters because the Java garbage collector does not manage the memory on the device used by ArrayFire.
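For readers new to the array API, the element-wise semantics of mul, add, lt and sumAll can be mimicked on plain Java arrays. The sketch below mirrors deviceCalcPi step by step; the helper names are ours, not ArrayFire's API, and everything runs on the host, so this only illustrates what each call computes:

```java
import java.util.Random;

// Plain-Java sketch of the element-wise semantics used in deviceCalcPi.
// Helper names (mul, add, lt, sumAll) are illustrative, not ArrayFire's API.
public class ElementWise {
    // Element-wise product of two equal-length arrays
    static float[] mul(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] * b[i];
        return out;
    }

    // Element-wise sum of two equal-length arrays
    static float[] add(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }

    // Element-wise "less than" producing a 0/1 mask, like Array.lt(res, 1)
    static float[] lt(float[] a, float v) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] < v ? 1f : 0f;
        return out;
    }

    // Reduce the whole array to a scalar sum, like Array.sumAll
    static double sumAll(float[] a) {
        double s = 0;
        for (float f : a) s += f;
        return s;
    }

    // Mirrors deviceCalcPi, but on host arrays
    public static double calcPi(int size) {
        Random rand = new Random();
        float[] x = new float[size], y = new float[size];
        for (int i = 0; i < size; i++) {
            x[i] = rand.nextFloat();
            y[i] = rand.nextFloat();
        }
        float[] mask = lt(add(mul(x, x), mul(y, y)), 1f);
        return 4.0 * sumAll(mask) / size;
    }
}
```

The difference on the device is that each of these operations runs as a parallel kernel over the whole array, which is where the speedups below come from.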

Performance

Using ArrayFire's CUDA backend from Java, an NVIDIA Quadro K5000 is 13x faster than the native Java code running on an Intel Core i7-3770K CPU.

ArrayFire v2.1 (CUDA, 64-bit Linux, build acac88d)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: CUDA toolkit 6.0, Driver: 340.24
[0]: Quadro K5000, 4096 MB, CUDA Compute 3.0
 1 : GeForce GTX 750, 1024 MB, CUDA Compute 5.0
 Compute Device: [0], Display Device: [0]
 Memory Usage: 3472 MB free (4096 MB total)

Results from host: 3.142604
Results from device: 3.1425944

Time taken for host (ms): 105.5
Time taken for device (ms): 8.21
Speedup: 13

Using ArrayFire's OpenCL backend on the same CPU, the ArrayFire code is 7x faster than the native Java implementation.

ArrayFire v2.1 (OpenCL, 64-bit Linux, build acac88d)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: Intel(R) OpenCL, Driver: 1.2.0.44
[0]: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz, 7946 MB, OpenCL Version: 1.2

Results from host: 3.1416
Results from device: 3.1417456
Time taken for host (ms): 105.1
Time taken for device (ms): 14.5
Speedup: 7

Using ArrayFire's OpenCL backend, an AMD HD 7970 is 14x faster than the native Java code.

ArrayFire v2.1 (OpenCL, 64-bit Linux, build fd32605)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: AMD Accelerated Parallel Processing, Driver: 1214.3 (VM)
[0]: Tahiti, 2907 MB, OpenCL Version: 1.2

Results from host: 3.1411656
Results from device: 3.1417456
Time taken for host (ms): 102.8
Time taken for device (ms): 7.1
Speedup: 14

Remarks

ArrayFire for Java is a work in progress; you will need Java 7 or higher to use it. We are adding more functionality and documentation in the coming weeks. You can find our Java wrapper for ArrayFire here.

If you need help accelerating your Java code using ArrayFire, please contact us at technical@arrayfire.com.

Comments


  • Why do you think the NVIDIA card is so much slower using OpenCL compared to CUDA? Just NVIDIA's crappy driver?

    Will ArrayFire transparently select the CUDA/OpenCL backend according to the GPU vendor?
