Mobile devices are carving out their niche in the world of computing, gaining processing power day by day. GPUs have been present on mobile devices for a while, but using them to accelerate computation is still quite new. Until recently, the only way to access the GPU was through OpenGL. Around December 2008, Khronos released OpenCL, a generic API for accelerating non-graphics tasks. OpenCL enables us to take advantage of acceleration hardware and, since it is an open standard, many hardware vendors support it on their devices. With the recent release of the Adreno and Mali SDKs, you can now run OpenCL code on mobile GPUs.
Today’s post is about how to do image processing on the camera feed of Android devices using OpenCL. To complete this tutorial, you need basic knowledge of how to write OpenCL kernels and native code for Android.
OpenCL
To use OpenCL on your mobile device’s GPU, you must obtain the compatible libOpenCL.so library and the corresponding OpenCL headers.
I initially thought I could just use the C++ OpenCL header available from the Khronos website for development, but that didn’t work: I faced compilation errors because of incompatible OpenGL headers. There are a couple of ways to get compatible headers. You can either use the headers provided with the example application or extract them from the Adreno/Mali SDK. For the Adreno SDK, the headers are at <Adreno_SDK>/Development/Inc/CL. For the Mali SDK, they are at <MALI_SDK>/include/CL.
For newer devices such as the Samsung Galaxy Note 3 and Samsung Galaxy S5, it is very likely that the library is already present on the device. My test device is a Note 3, on which I found the library under /system/vendor/lib. You can run adb pull /system/vendor/lib/libOpenCL.so ./ to copy the file to your desktop machine. In cases like this, you might be able to use the library out of the box without any issues. For any other scenario, please follow the directions in the following sections to get the library, if it is available, and make it work.
Adreno GPU
If you do not find the library on your device, check the corresponding vendor’s website to see whether your device has OpenCL support. If your GPU does have OpenCL support, you might have to update your OS to get the recent updates. If you don’t find the library even after an OS update, you might want to email the vendor. However, if your GPU does not have OpenCL support, there isn’t much we can do.
If you do find the library but can’t use it for some reason, try updating your device drivers using the ones provided by Qualcomm. You can download them from here. Make sure that you download the Adreno driver and not the Adreno SDK. Follow the instructions in the package and update the drivers on your device. This should resolve the issue in most cases.
Mali GPU
Download the SDK from here. Follow the instructions in the package to build the shared library, then copy it onto your device.
Kernels
I needed to do two things to achieve my target.
- Convert the frame data provided by the Android camera from NV21 format to RGBA8888 format.
- Run the Laplacian operator on the RGBA8888 image data.
Getting a grayscale image from NV21 data is a cakewalk: just read the first width*height bytes and you are done. In the NV21 format, each 2×2 sub-matrix of pixels shares one (v,u) pair. Since four pixels read the same (v,u) pair, I decided to use shared (local) memory to reduce read latency. The code below is the core of this conversion kernel, which is expected to be faster than a pure C implementation.
#define BLK_SIZE 16
// prefix D for decoding
#define DSHRD_LEN (BLK_SIZE/2)
#define DSHRD_SIZE (2*DSHRD_LEN*DSHRD_LEN)
// ...
// ...
// kernel signature
{
    __local uchar uvShrd[DSHRD_SIZE];
    int gx = get_global_id(0);
    int gy = get_global_id(1);
    int lx = get_local_id(0);
    int ly = get_local_id(1);
    int off = im_width*im_height;
    int inIdx = gy*im_width+gx;
    int uvIdx = off + (gy/2)*im_width + (gx & ~1); // took me a while to figure out this formula
    int shlx = lx/2;
    int shly = ly/2;
    int shIdx = 2*(shlx+shly*DSHRD_LEN);
    // one work-item per 2x2 block loads the shared (v,u) pair
    if( gx%2==0 && gy%2==0 ) {
        uvShrd[shIdx+0] = in[uvIdx+0];
        uvShrd[shIdx+1] = in[uvIdx+1];
    }
    int y = (0xFF & ((int)in[inIdx]));
    if( y < 16 ) y=16;
    barrier(CLK_LOCAL_MEM_FENCE);
    //..... rest of code that does color space conversion
}
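For comparison with the kernel excerpt above, here is a minimal CPU sketch in C++ of the same NV21 indexing, handy for checking the kernel's output on a small frame. The function names (lumaIndex, chromaIndex, clampToByte, nv21ToRgba) are hypothetical, and the integer BT.601 coefficients are an assumption on my part; the original kernel's color conversion is elided, so it may use different constants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the luma (Y) byte for pixel (x, y).
inline int lumaIndex(int x, int y, int width) { return y * width + x; }

// Index of the V byte shared by the 2x2 block that contains (x, y).
// The matching U byte sits at the next index. Same formula as uvIdx above.
inline int chromaIndex(int x, int y, int width, int height) {
    return width * height + (y / 2) * width + (x & ~1);
}

inline std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Converts an NV21 frame to tightly packed RGBA8888.
// Assumption: integer BT.601 coefficients; width and height are even.
std::vector<std::uint8_t> nv21ToRgba(const std::vector<std::uint8_t>& in,
                                     int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(in.size() >= static_cast<std::size_t>(width) * height * 3 / 2);

    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Clamp luma to 16, as the kernel excerpt does.
            int Y = std::max(16, static_cast<int>(in[lumaIndex(x, y, width)]));
            int uv = chromaIndex(x, y, width, height);
            int V = static_cast<int>(in[uv]) - 128;
            int U = static_cast<int>(in[uv + 1]) - 128;

            int c = 298 * (Y - 16);
            std::size_t o = static_cast<std::size_t>(lumaIndex(x, y, width)) * 4;
            out[o + 0] = clampToByte((c + 409 * V + 128) / 256);            // R
            out[o + 1] = clampToByte((c - 100 * U - 208 * V + 128) / 256);  // G
            out[o + 2] = clampToByte((c + 516 * U + 128) / 256);            // B
            out[o + 3] = 255;                                               // A
        }
    }
    return out;
}
```

The grayscale image mentioned earlier is simply the first width*height bytes of the input, which is what lumaIndex addresses.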
The Laplacian kernel is pretty straightforward: read the values of all eight neighboring pixels using shared memory and compute the convolution result. You can find the kernels' source code here.
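As a rough CPU reference for the eight-neighbor Laplacian, a sketch in C++ could look like the following. The function name laplacianRgba is hypothetical, and the filter weights (8 at the center, -1 for each neighbor), the clamping of border coordinates, and the fixed alpha of 255 are my assumptions; the actual kernel may handle these differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Applies an eight-neighbor Laplacian to the R, G and B channels of a
// tightly packed RGBA8888 image. Assumptions: weights are 8 (center) and
// -1 (neighbors), border reads are clamped to the nearest valid pixel,
// results are clamped to [0, 255], and alpha is set to 255.
std::vector<std::uint8_t> laplacianRgba(const std::vector<std::uint8_t>& src,
                                        int width, int height) {
    assert(src.size() == static_cast<std::size_t>(width) * height * 4);

    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t center = (static_cast<std::size_t>(y) * width + x) * 4;
            for (int ch = 0; ch < 3; ++ch) {
                int sum = 8 * src[center + ch];
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        if (dx == 0 && dy == 0) continue;
                        int nx = std::min(width - 1, std::max(0, x + dx));
                        int ny = std::min(height - 1, std::max(0, y + dy));
                        std::size_t n = (static_cast<std::size_t>(ny) * width + nx) * 4;
                        sum -= src[n + ch];
                    }
                }
                dst[center + ch] =
                    static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
            }
            dst[center + 3] = 255;
        }
    }
    return dst;
}
```

A flat region produces zeros, while an isolated bright pixel is amplified, which is the edge-highlighting effect expected from this operator.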
Auxiliary Code
The JNI native function declarations can be seen in the file processor.h. I have included a couple of extra Android-related headers that are required to read bitmap files and print debug messages from JNI routines. The C-string pointed to by the macro app_name can be used to filter the output of the command adb logcat so that you get only the messages associated with our JNI routines. To do that, pass the value of app_name to the adb command as an option, as shown below.
adb logcat -s "JNIProcessor"
This is quite useful when you are debugging problems that are related to accuracy of results.
The full source code for JNI native functions can be seen in the file processor.cpp. Following are brief descriptions of the Android/JNI functions I have used in this project.
AndroidBitmap_getInfo - provides details like width, height, stride and format.
AndroidBitmap_lockPixels - gives us a pointer to the memory location of the image data. Once lockPixels is called, it has to be followed by an unlockPixels at some point later in the code.
GetPrimitiveArrayCritical - as the name suggests, this is a performance-critical call. Hence, we should take care that the code between a GetPrimitiveArrayCritical call and the corresponding ReleasePrimitiveArrayCritical call doesn't stall execution for extended periods of time. As per Oracle's Java documentation, a VM may temporarily disable garbage collection while native code is holding a pointer to an array obtained via GetPrimitiveArrayCritical. So, just be careful when you use this function.
ReleasePrimitiveArrayCritical - used to release the memory space requested by a previous GetPrimitiveArrayCritical call.
AndroidBitmap_unlockPixels - used to release the memory space used by a previous lockPixels call.
Listed below are some of the other auxiliary functions that I needed to keep the code clean.
throwJavaException - convenience function used to throw Java exceptions from JNI routines.
cb - callback function provided when the OpenCL kernels are built. When you look at it, you might wonder why I did this part alone using the C interface of OpenCL. I just wanted to try it out.
Java_com_example_LiveFeatureActivity_compileKernels - as you might have noticed, this is a JNI routine I use to compile the OpenCL kernels only once, during app startup.
Build (on Linux)
Android-NDK provides a very useful tool, ndk-build, to build native code for Android.
- Clone the git repo:
git clone https://github.com/arrayfire/androidcl
cd androidcl
- Ensure android-ndk is installed and the environment variable ANDROID_NDK points to the NDK installation root.
- Ensure the environment variable PATH has the path $ANDROID_NDK appended to it.
- Run ndk-build from the project root.
- If you have eclipse installed, import the project from eclipse and build it as usual.
- If you do not have eclipse or don't want to use it, I assume you already know about the apache-ant tool. Run ant debug.
Android.mk
There are a couple of things you need to do to use OpenCL on Android, apart from the standard JNI makefile instructions.
- Ensure that you set LOCAL_ARM_MODE to arm, which makes the build generate 32-bit ARM instructions instead of the default 16-bit Thumb instructions.
- Ensure that the variable LOCAL_CPPFLAGS has been set to point to the OpenCL headers.
- Add the libOpenCL.so dependency to the LOCAL_LDLIBS variable.
Click here to see full content of Android.mk.
Note: In case you want a static library, replace include $(BUILD_SHARED_LIBRARY) with include $(BUILD_STATIC_LIBRARY).
Application.mk
This file is not mandatory, but you may need it when you want to specify requirements about the system libraries used in your application. In this project I have set APP_STL to gnustl_static to declare that I want to use the static version of gnustl. I have also set the Android API target version to 14 by setting APP_PLATFORM to android-14.
Click here to see full content of Application.mk.
App Installation
ant-users
To install the app on your mobile device, run ant debug install. Should you choose to remove it after you are done playing with it, just run ant debug uninstall.
eclipse-users
Right click the project in workspace and click the menu item "Run As -> Android Application".
You can see below the video of the app running live.
Stay tuned for more exciting developments as we continue to add more examples of OpenCL on Android!
Comments 11
Is it still possible to use OpenCL on Android 4.3 and later? Also, does OpenCL functionality work on the emulator?
OpenCL only works if the hardware manufacturer provides the libOpenCL.so file on the devices. OpenCL functions will likely not work on an emulator.
Hi, I have some questions, please help:
– My device is a Samsung S5 and I got libOpenCL.so, but where can I find the header files?
– I would like to use CMake to import OpenCL into my project; is that possible? (I also need another C++ library, the PCL Point Cloud Library.)
Thanks in advance.
Hello,
As you might already see here(https://github.com/arrayfire/androidcl), we have had a make file based build framework so far. For ArrayFire 3.0 beta , we haven’t explored the options of building ArrayFire with CMake for Android yet but that is definitely in the TO DO list.
You might want to check https://github.com/taka-no-me/android-cmake. I haven’t played with it yet, but seems like a nice one. Hope that helps.
Regards,
Pradeep.
Hi,
Thanks for your reply.
I am a newbie in OpenCL, so can you suggest some documentation or good tutorials about it, please?
You're welcome.
You can start here( http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/introductory-tutorial-to-opencl/ ) for basic understanding on OpenCL terminology.
I understand that specification sheets can be daunting, but if you are interested, definitely look at Sections 2 & 3 of https://www.khronos.org/registry/cl/specs/opencl-1.1.pdf . That has the best explanation of what OpenCL is and what it can do.
I have a repository where I add examples whenever I have time. You can look into them at https://github.com/9prady9/CLGLInterop , but these are specific to OpenCL-OpenGL interoperability. You can always google for example code.
Regards,
Pradeep.
Thank you so much for sharing. Hope to get your support in the future.
Thanks for your post.
I run this project on Mi2S, which is Qualcomm 8064 chipset.
And I have 2 questions:
1. CameraPreview will not call onPreviewFrame() automatically, so I add mCamera.setPreviewCallback(this); on SurfaceChanged just after setPreviewDisplay(), and it works.
2. in jni helper() function, when call
gQueue.enqueueNDRangeKernel(gNV21Kernel,
cl::NullRange,
cl::NDRange( (int)ceil((float)w/16.0f)*16,(int)ceil((float)h/16.0f)*16),
cl::NDRange(16,16),
NULL,
NULL);
it will get such error: @oclDecoder: clEnqueueNDRangeKernel -54
it’s CL_INVALID_WORK_GROUP_SIZE, the specified local workgroup size and number of workitems specified by global workgroup size is not evenly divisible by local workgroup size.
How can I fix this? Thanks.
Hi Xiaoyoung,
Did you try playing around with the local group size itself? For example, try using cl::NDRange(8, 8) instead of (16, 16). Maybe it is a device-specific limit. Try querying the maximum work-group size from the device; you can get it by asking the device in question for the value of ‘CL_DEVICE_MAX_WORK_GROUP_SIZE’.
Hope that helps.
Regards,
Pradeep.
Yes, I have tried it. However, when switch to laplacian option, the preview frame will be blurred.
How can you profile the OpenCL code and find bottlenecks in the GPU?