ArrayFire-OpenGL Interop using CUDA

Shehzan ArrayFire, CUDA, OpenGL Leave a Comment

A lot of ArrayFire users have been interested in the usage of ArrayFire in partnership with OpenGL for graphics computation. In the long run, we do plan to expand further on the interoperablilty and make it easier through ArrayFire. For now, we have developed a small example to expand on the usage of the CUDA-OpenGL interop API to assist in the interop operations between ArrayFire and OpenGL.

Some of the advantage of direct ArrayFire-OpenGL interop are:

  • Faster data transfers: Since the OpenGL buffers as well as ArrayFire data reside on the GPU, we can use a direct device to device copy rather than using the CPU as an intermediate and the relatively slow PCIe interface.
  • Offscreen rendering: It is commonly known that rendering to screen is a slow process. Using the API shown in this example, we can render directly of off screen framebuffers and retrieve the data in ArrayFire.
  • Transform feedback: This example also shows the usage of Transform feedback buffers in OpenGL. Transform feedback buffers allow the user to retrieve data from in-between the rendering pipeline, for example getting back the transformed vertices after the operations in vertex shaders.

Since a lot of data transfers to CPU are eliminated, we can do a lot more processing faster.

Let us now discuss the example. This example is relatively simple and renders a single triangle.

Code Walkthrough

The code for this example can be found at our github page.

The code relies upon the following libraries:

  • ArrayFire 2.0
  • CUDA-OpenGL Interop (CUDA 5.5)
  • GLFW 3
  • GLEW (1.x, latest version recommended)

Let us walk through the example step by step. Please note that any steps that do not contribute to the interop and are just common OpenGL code will not be covered in this blog.


This section initializes ArrayFire and create arrays for the basic data structure. Since ArrayFire stores data as column major, we decided that the arrays will have each vertex as one column, ie. if you had an array of 10 vertices with 3 coordinates each, then your array would have dimensions 3x10.

Most of the array names are self explanatory. pkd_image is used to store the packed image data returned by OpenGL through the framebuffer. af_image is used to store the returned image in the ArrayFire format (unpacked). af_project stores the result of the projection returned after the vertex shader stage (using transform feedback). The size of project is same as the vertex buffer.

The onscreen flag is a useful one. When it is set to true, the render happens on the GLFW window. Although not a part of the code, it is possible to retrieve the rendered image back into ArrayFire. However, turning this flag to false results in offscreen rendering. This means rendering is done directly to a software framebuffer rather than to the screen, hence it is faster. It also allows the rendered image to be retrieved back into ArrayFire. If you wish to display this image then the ArrayFire image() function can be used (will be discussed a little later).

init_glfw is used to initialize the GLFW and GLEW context.

Initialize Buffers

These lines are an abstraction to create the OpenGL buffers and bind them to their respective CUDA Graphics Resource object.
This section of code shows the creation and registering of the OpenGL buffer with CUDA. The buffer generation and bindings are done exactly as in regular OpenGL. The cudaGraphicsGLRegisterBuffer registers the buffer with CUDA. If using a texture or renderbuffer or the like, then use cudaGraphicsGLRegisterImage.
In the code above, the framebuffer and depthbuffer are initialized which can now be used for off screen rendering.

Framebuffer, Transform Feedback and Shader Loading

This section of the code is used to initialize depth and framebuffers for off screen rendering.  It also loads and compiles the shaders and initializes the transform feedback buffers.

Render Loop

GLFW allows the user to control to render loop. So we use a simple while loop with a GLFW terminating condition.

Copy Data to OpenGL Buffers

In the render loop, we first copy the vertex and color data to their respective OpenGL buffers. In the copy_from_device_pointer function, we have:
In this function, we first "map" the cuda resource of the buffer using the cudaGraphicsMapResources function, ie. enable the buffer for access by CUDA. This allows CUDA to copy memory to and from the buffer. Once completed, we must unmap the resources.

Other Memory Copying Notes

If using a renderbuffer or texture, the memory needs to be copied using CUDA textures of cudaArrays and then using the cudaMemcpy(To/From)Array function. If using other buffers like GL_ARRAY_BUFFER, we can simply get the pointer using the cudaGraphicsResourceGetMappedPointer function and then copy it using the cudaMemcpyDeviceToDevice flag.

Copying from OpenGL buffers to CUDA is also done in the same way. See the copy_to_device_pointer() function in the code.

In the code that follows the memory copies, we call the render function which renders the data to screen or framebuffer. It uses relatively well know OpenGL calls and hence I will not be covering it in the blog. Any questions can be asked on the ArrayFire forums. Once the rendering is complete, we copy the transform feedback memory and the framebuffer memory back to ArrayFire.

On exit, delete the buffers and exit.

Compile and Run


On Linux, the Makefile is provided. In the Makefile, please update the paths to ArrayFire (AF_PATH), CUDA and GLFW (GLFW_LIB_PATH).

As is, the makefile assumes that CUDA, GLFW and GLEW is installed in the default locations.

To compile and run:


On Windows, you can create a CUDA project in Visual Studio. Then add the locations of the include and library directories of ArrayFire, GLFW and GLEW in the project properties. The code is OS-independent and should run without any modifications.


The output, in case of onscreen rendering should be a rotating triangle. A sample screenshot is shown below.

ArrayFire CUDA OpenGL Interop

Here are some useful resources to look up:

Please feel free to ask any questions about this interop example on the ArrayFire forums. We are working on a larger, more extensible interop code that we will release upon request.

Known issues: The ArrayFire graphics API is known to not work as intended when using external OpenGL context. It is advised that visualizations be limited. If they still do not work, the saveimage function can be a fallback, albeit a last resort fallback.

Github Link