Templating and Caching OpenCL Kernels

Pradeep, ArrayFire

About a month ago, one of my colleagues wrote a post on how to author the most concise OpenCL program using the C++ API provided by Khronos. In today's post, we shall modify that example further to achieve the following two goals.

  1. Enable the kernel to work with different arithmetic data types (int, float and unsigned int in this post) out of the box
  2. Ensure that the kernels compile only once at run time per data type

Let's dive into the details now.

We can template an OpenCL kernel by writing it in terms of a placeholder type T and passing a build option -D T="typename" to the kernel compilation step, where typename is the name of the element type as the kernel language spells it. To build such options, we need a construct that gives us a string literal naming the type that corresponds to each C++ type. Let us declare a struct with a static method getName and add template specializations for the types we want our kernel to handle. For our example, let's add specializations for int, float and unsigned int. The entire code snippet with the struct declaration and template specializations should look like the following:
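A minimal sketch of such a struct, assuming a hypothetical trait named TypeName, a hypothetical helper buildOptions, and an illustrative kernel named add; none of these names are confirmed to match the original sample. The sketch emits the macro value unquoted, so it maps unsigned int to uint, the OpenCL C alias for the same type, to keep spaces out of the option string.

```cpp
#include <string>

// Hypothetical trait: maps a C++ type to the name the kernel should use for T.
// The primary template has no definition, so an unsupported type fails to compile.
template <typename T>
struct TypeName;

template <>
struct TypeName<int> {
    static const char* getName() { return "int"; }
};

template <>
struct TypeName<float> {
    static const char* getName() { return "float"; }
};

template <>
struct TypeName<unsigned int> {
    // Assumption: use the OpenCL C alias "uint" so the option contains no space.
    static const char* getName() { return "uint"; }
};

// Hypothetical helper: builds the option string that defines T for the kernel.
template <typename T>
std::string buildOptions() {
    return std::string("-D T=") + TypeName<T>::getName();
}

// Illustrative kernel source written against the placeholder type T.
inline const char* kernelSource() {
    return
        "__kernel void add(__global const T* a,\n"
        "                  __global const T* b,\n"
        "                  __global T* out) {\n"
        "    size_t i = get_global_id(0);\n"
        "    out[i] = a[i] + b[i];\n"
        "}\n";
}
```

With this in place, buildOptions<float>() yields the string -D T=float, which can be handed to the program build step together with the kernel source.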

Our next step is to abstract the OpenCL-related code into a function template, addVectors. Given the signature of addVectors, the program's main body will look like the following:
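As a rough sketch, assuming a hypothetical signature that takes an output vector and two input vectors of the same element type, and a hypothetical helper runOnce that holds what the main body does for a single type. The body of addVectors here is a plain host loop standing in for the OpenCL path, so the sketch runs without an OpenCL device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical signature: one function template covers every element type.
template <typename T>
void addVectors(std::vector<T>& out,
                const std::vector<T>& a,
                const std::vector<T>& b);

// Hypothetical helper holding the per-type work of the main body:
// prepare the host data, call addVectors, and hand back the result.
template <typename T>
std::vector<T> runOnce(std::size_t n) {
    std::vector<T> a(n, T(1));
    std::vector<T> b(n, T(2));
    std::vector<T> out(n);
    addVectors(out, a, b);
    return out;
}

// Stand-in definition (assumption): a host loop in place of the device code,
// so that the caller above can be compiled and run on its own.
template <typename T>
void addVectors(std::vector<T>& out,
                const std::vector<T>& a,
                const std::vector<T>& b) {
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
}
```

A main function would then reduce to calls such as runOnce<int>(1024), runOnce<float>(1024) and runOnce<unsigned int>(1024), with no OpenCL types visible at the call site.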

Now, we are ready to work on the code that compiles our kernel source and enqueues the kernel on the device. We shall use the C++11 feature std::call_once to ensure that the kernel is compiled only once at run time per data type. The body of the function addVectors will look like the following:
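A minimal sketch of the std::call_once pattern, not the original implementation. It assumes the hypothetical TypeName trait described earlier, and it replaces the OpenCL calls with stand-ins: compileKernel only counts its invocations where a real program would build the kernel source with the -D T=... option, and a host loop takes the place of enqueueing the kernel.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical trait, repeated here so the sketch is self-contained.
template <typename T> struct TypeName;
template <> struct TypeName<int> { static const char* getName() { return "int"; } };
template <> struct TypeName<float> { static const char* getName() { return "float"; } };
template <> struct TypeName<unsigned int> { static const char* getName() { return "uint"; } };

// Counts how often the stand-in compile step runs (for illustration only).
inline std::atomic<int>& compileCount() {
    static std::atomic<int> count(0);
    return count;
}

// Stand-in for the expensive step. A real implementation would build an
// OpenCL program from the kernel source using these options.
inline std::string compileKernel(const std::string& options) {
    ++compileCount();
    return options;  // pretend the options string is the compiled kernel
}

template <typename T>
void addVectors(std::vector<T>& out,
                const std::vector<T>& a,
                const std::vector<T>& b) {
    // Function-local statics exist once per template instantiation, so every
    // element type T gets its own flag and its own cached kernel.
    static std::once_flag compiled;
    static std::string kernel;

    // Runs the lambda on the first call for this T only; later calls, even
    // from other threads, wait for it to finish and then skip it.
    std::call_once(compiled, [] {
        kernel = compileKernel(std::string("-D T=") + TypeName<T>::getName());
    });

    // Host loop standing in for enqueueing the cached kernel on the device.
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling addVectors twice with int vectors and once with float vectors runs the compile step twice in total, once per type. Some toolchains need the -pthread flag for std::call_once to work.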

That is all that needs to be done to use the same kernel source with different data types while compiling it only once at run time per data type. The complete code sample is available here. This sample is merely an example of how you can parametrize kernel source with respect to types; more complicated algorithms would require further modifications to this sample.