CUDA & OpenCL Training

We provide high-quality 2- or 4-day CUDA and OpenCL training courses, either at your location or at our office in Atlanta. Attendees will receive the latest industry knowledge and techniques for GPU computing in CUDA and OpenCL.

ArrayFire offers up to four days of specialized GPU training in CUDA and OpenCL programming. Our customized on-site courses bring a CUDA or OpenCL training course to your location, tailored to your application-specific needs. We also offer individual training at our Atlanta office.

We recommend that attendees have a working knowledge of C/C++ in order to gain the most from the training courses. Groups can register using the button below. Individuals can register for the next quarterly 2-day training session.

Talk to Us to Register a Group Training Course

(or, for individuals, register for an upcoming training session below)


"Can't ask for better individualized instruction than the environment I was fortunate enough to encounter. The instructor was able to completely focus on my particular needs and concerns."

-Brian Rapp, US Army Research Lab

Included in All Courses

You provide the minds, and we'll take care of the rest. Each training comes with the following:

  • Instruction by an excellent and interesting expert
  • Hands-on exercises
  • Use of a laptop with CUDA- and OpenCL-capable GPUs and CPUs
  • Choice of Linux or Windows operating system
  • Printed manual of lecture material
  • Electronic copy of programming exercises

CUDA and OpenCL Training Syllabus

* Courses are taught in either CUDA or OpenCL. Similar principles apply in each framework.

Day 1, Introduction

Lectures:

  • GPU Computing Overview
  • The Programming Model
  • Basic Dataset Mapping
  • Techniques
  • Libraries, ArrayFire
  • Profiling Tools

Practice:

  • A Simple Kernel
  • Equivalent ArrayFire Example
  • Using Libraries
  • Monte Carlo Pi Estimation
  • Timing and ArrayFire
  • Debugging Code

Day 2, Optimization

Lectures:

  • Architecture: Grids, Blocks, and Threads
  • Memory Model: Global, Shared, and Constant Memory
  • Advanced Mapping Techniques
  • Streams: Asynchronous Launches and Concurrent Execution
  • ArrayFire: Lazy Evaluation and Code Vectorization

Practice:

  • Matrix Transpose
  • Optimization Using Shared Memory
  • Median Filter
  • Optimization Using Constant Memory
  • Stream Example
  • ArrayFire Example: Nearest Neighbor Algorithm

Day 3, Multi-GPU

Lectures (customizable):

  • Multi-GPU Use Cases
  • Multi-GPUs: Contexts
  • Existing Libraries
  • Scaling Across Multiple GPUs

Practice:

  • Out-of-Core Problems: Matrix Multiply
  • Task-Level Parallelism: Optimization
  • ArrayFire Multi-GPU

Day 4, Algorithm Problems

Lectures and Practice (customizable):

  • Reductions
  • Scan Algorithms
  • Sort
  • Convolution
  • Customer-Specific Problem

Xilinx SDAccel Training

In addition to CUDA & OpenCL training, we offer training for Xilinx SDAccel. ArrayFire is the exclusive Xilinx SDAccel Authorized Training Partner (ATP) for North America. Our SDAccel training courses help design teams use Xilinx FPGAs to accelerate OpenCL applications.

Course Name:  "Developing and Optimizing Applications Using the OpenCL Framework for FPGAs"

Talk to Us to Register an SDAccel Training

(or email us with any questions)


Individual 2-Day Course

For individuals, we offer an online 2-day, CUDA-only training course once a quarter, following the Day 1 and Day 2 CUDA syllabus shown above.

Upcoming Dates

  • Q4:  December 14-15, 2021
  • Q1:  March 22-23, 2022
  • Q2:  June 21-22, 2022
  • Q3:  September 27-28, 2022

There are a limited number of spots in each course, so reserve your spot as soon as possible.

Purchase Individual 2-Day CUDA Course