We have previously mentioned the ability to use ArrayFire through Java.
In this post, we are going to show how you can get the best performance inside Java using ArrayFire for CUDA and OpenCL.
Code
Here is sample code that performs a Monte Carlo estimation of Pi.
```java
import java.util.Random;

// Native Java code
public static double hostCalcPi(int size) {
    Random rand = new Random();
    int count = 0;
    for (int i = 0; i < size; i++) {
        float x = rand.nextFloat();
        float y = rand.nextFloat();
        boolean lt1 = (x * x + y * y) < 1;
        if (lt1) count++;
    }
    return 4.0 * ((double)(count)) / size;
}
```
The same code can be written using ArrayFire in the following manner.
```java
import com.arrayfire.Array;

// ArrayFire through Java
public static double deviceCalcPi(int size) throws Exception {
    Array x = null, y = null, res = null;
    try {
        int[] dims = new int[] {size, 1};
        x = Array.randu(dims, Array.FloatType);
        y = Array.randu(dims, Array.FloatType);
        x = Array.mul(x, x);
        y = Array.mul(y, y);
        res = Array.add(x, y);
        res = Array.lt(res, 1);
        double count = Array.sumAll(res);
        return 4.0 * ((double)(count)) / size;
    } finally {
        if (x != null) x.close();
        if (y != null) y.close();
        if (res != null) res.close();
    }
}
```
- `Array.randu(dims, Array.FloatType)` creates a uniform random array. `Array.FloatType` is passed in to create an array of 32-bit floating point numbers. Other types include `Array.FloatComplexType`, `Array.DoubleType`, and so on.
- `Array.mul`, `Array.add`, and `Array.lt` perform element-wise operations on their two operands to produce an output.
- `Array.sumAll` adds up all the elements in the array to produce a scalar output.
- `x.close()`, `y.close()`, and `res.close()` are necessary in the `finally` block. They ensure that device memory is released when the function exits, because the Java garbage collector may not manage the device memory being used by ArrayFire.
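Because `close()` must run on every exit path, Java 7 try-with-resources is a natural fit for this cleanup pattern, assuming `com.arrayfire.Array` implements `AutoCloseable` (not verified against the wrapper here). The `DeviceBuffer` class below is just a stand-in resource to demonstrate the pattern:

```java
// Stand-in for a resource such as com.arrayfire.Array; the counter lets us
// observe that every acquired resource is released.
class DeviceBuffer implements AutoCloseable {
    static int open = 0;
    DeviceBuffer() { open++; }
    @Override public void close() { open--; }
}

public class TryWithResourcesDemo {
    public static void main(String[] args) {
        try (DeviceBuffer x = new DeviceBuffer();
             DeviceBuffer y = new DeviceBuffer()) {
            // ... use x and y ...
        } // close() is called automatically here, even if an exception is thrown
        System.out.println("open buffers: " + DeviceBuffer.open);
    }
}
```

Note that try-with-resources variables are effectively final, so the reassignments in `deviceCalcPi` (e.g. `x = Array.mul(x, x)`) would need separate variables under this style.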
Performance
Using ArrayFire CUDA in Java, the NVIDIA Quadro K5000 is 13x faster than the native Java code on an Intel Core i7-3770K CPU.
```
ArrayFire v2.1 (CUDA, 64-bit Linux, build acac88d)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: CUDA toolkit 6.0, Driver: 340.24
[0]: Quadro K5000, 4096 MB, CUDA Compute 3.0
[1]: GeForce GTX 750, 1024 MB, CUDA Compute 5.0
Compute Device: [0], Display Device: [0]
Memory Usage: 3472 MB free (4096 MB total)

Results from host:   3.142604
Results from device: 3.1425944

Time taken for host (ms):   105.5
Time taken for device (ms): 8.21

Speedup: 13
```
Using ArrayFire OpenCL in Java, the same CPU is 7x faster than the native Java implementation.
```
ArrayFire v2.1 (OpenCL, 64-bit Linux, build acac88d)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: Intel(R) OpenCL, Driver: 1.2.0.44
[0]: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz, 7946 MB, OpenCL Version: 1.2

Results from host:   3.1416
Results from device: 3.1417456

Time taken for host (ms):   105.1
Time taken for device (ms): 14.5

Speedup: 7
```
The AMD HD 7970 is 14x faster than native Java using ArrayFire OpenCL.
```
ArrayFire v2.1 (OpenCL, 64-bit Linux, build fd32605)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: AMD Accelerated Parallel Processing, Driver: 1214.3 (VM)
[0]: Tahiti, 2907 MB, OpenCL Version: 1.2

Results from host:   3.1411656
Results from device: 3.1417456

Time taken for host (ms):   102.8
Time taken for device (ms): 7.1

Speedup: 14
```
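For reference, the host-side numbers above come from timing `hostCalcPi` directly. A minimal harness along these lines reproduces the measurement, though the warm-up call and sample count below are choices made here, not taken from the benchmark code:

```java
import java.util.Random;

public class PiTiming {
    // Same Monte Carlo estimator as the native Java listing above.
    public static double hostCalcPi(int size) {
        Random rand = new Random();
        int count = 0;
        for (int i = 0; i < size; i++) {
            float x = rand.nextFloat();
            float y = rand.nextFloat();
            if (x * x + y * y < 1) count++;
        }
        return 4.0 * ((double) count) / size;
    }

    public static void main(String[] args) {
        int size = 10_000_000;
        hostCalcPi(size); // warm up the JIT so the timed run is representative
        long start = System.nanoTime();
        double pi = hostCalcPi(size);
        double ms = (System.nanoTime() - start) / 1e6;
        System.out.println("Results from host: " + pi);
        System.out.println("Time taken for host (ms): " + ms);
    }
}
```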
Remarks
ArrayFire for Java is a work in progress; you'll need Java 7 or higher to use ArrayFire through Java. We will be adding more functionality and documentation in the coming weeks. You can find our Java wrapper for ArrayFire here.
If you need help accelerating your Java code using ArrayFire, please contact us at technical@arrayfire.com
Comments
Why do you think the NVidia card is so much slower using OpenCL compared to CUDA – just NVidia’s crappy driver?
Will ArrayFire transparently select the CUDA/OpenCL backend according to the GPU vendor?