Train your team

Attendees will receive the latest industry knowledge and techniques for GPU computing in CUDA and OpenCL.

ArrayFire offers up to four days of specialized GPU training in CUDA and OpenCL programming. We provide customized on-site courses for organizations sending three or more attendees: we can travel to your location and deliver a CUDA or OpenCL course tailored to your application's needs. We also offer individual training at our Atlanta office.

We recommend that attendees have a working knowledge of C/C++ to get the most out of the training courses.

Contact us about a training


Included in the course

You provide the minds, and we'll take care of the rest. Each training course comes with the following:

  • Instruction by an experienced GPU computing expert
  • Hands-on exercises
  • Use of a laptop with CUDA- and OpenCL-capable GPUs and CPUs
  • Choice of Linux or Windows operating system
  • Printed manual of lecture material
  • Electronic copy of programming exercises

Go ahead, contact us about a training

"One can't ask for better individualized instruction than the environment I was fortunate enough to encounter. The instructor was able to completely focus on my particular needs and concerns."

Brian Rapp, U.S. Army Research Lab

Training Syllabus

Day 1, Introduction
Lectures:
GPU Computing Overview
The Programming Model
Basic Dataset Mapping Techniques
Libraries, ArrayFire
Profiling Tools

Practice:
A Simple Kernel
Equivalent ArrayFire Example
Using Libraries
Monte Carlo Pi Estimation
Timing and ArrayFire
Debugging Code

Day 2, Optimization
Lectures:
Architecture: Grids, Blocks, and Threads
Memory Model: Global, Shared, and Constant Memory
Advanced Mapping Techniques
Streams: Asynchronous Launches and Concurrent Execution
ArrayFire: Lazy Evaluation and Code Vectorization

Practice:
Matrix Transpose
Optimization Using Shared Memory
Median Filter
Optimization Using Constant Memory
Stream Example
ArrayFire Example: Nearest Neighbor Algorithm

Day 3, Multi-GPU
Lectures (customizable):
Multi-GPU Use Cases
Multiple GPUs: Contexts
Existing Libraries
Scaling Across Multiple GPUs

Practice:
Out-of-Core Problems: Matrix Multiply
Task-Level Parallelism: Optimization
ArrayFire Multi-GPU

Day 4, Algorithm Problems
Lectures and Practice (customizable):
Scan Algorithms
Customer-Specific Problem

Okay, now contact us about a training