CUDA and OpenCL Training
We provide high-quality 2- or 4-day CUDA™ and OpenCL training courses.
Since we specialize solely in CUDA and OpenCL work, we can immerse students in GPU and heterogeneous computing in a way few others can. Students leave our courses proficient in CUDA or OpenCL programming, equipped with the latest industry knowledge and techniques for GPU computing, and familiar with the tricks needed to maximize performance on heterogeneous computing devices.
For groups, we travel to your location, host you at our Atlanta office, or train remotely via video conference, tailoring our instruction to your application-specific needs. For individuals, we offer 2-day training quarterly.
We recommend that attendees have a working knowledge of C/C++ for a fruitful learning experience.
"Can't ask for better individualized instruction than the environment I was fortunate enough to encounter. The instructor was able to completely focus on my particular needs and concerns."
Included in All Courses
You provide the minds, and we'll take care of the rest. Each training comes with the following:
- Instruction by an excellent and exciting expert
- Hands-on exercises
- Use of a laptop with CUDA- and OpenCL-capable GPUs and CPUs
- Choice of Linux or Windows operating system
- Printed manual of lecture material
- Electronic copy of programming exercises
CUDA and OpenCL Training Syllabus
* Courses are taught in either CUDA or OpenCL. Similar principles apply in each framework.
Day 1, Introduction
Lectures:
- GPU Computing Overview
- The Programming Model
- Basic Dataset Mapping
- Techniques
- Libraries: ArrayFire
- Profiling Tools
Practice:
- A Simple Kernel (see the sketch after this list)
- Equivalent ArrayFire Example
- Using Libraries
- Monte Carlo Pi Estimation
- Timing and ArrayFire
- Debugging Code
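
To give a flavor of the Day 1 exercises, here is a minimal sketch of a simple CUDA kernel and its launch. The vector-add example, variable names, and use of unified memory are our own illustration, not necessarily the exact code used in class:

    // Illustrative vector add: each thread handles one element.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void vecAdd(const float* a, const float* b, float* c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)                                       // guard against extra threads
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);                    // unified memory keeps the sketch short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int block = 256;
        int grid = (n + block - 1) / block;              // round up so every element is covered
        vecAdd<<<grid, block>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);                     // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The grid size is rounded up so that every element is covered, and the bounds check in the kernel discards the leftover threads in the last block.
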
Day 2, Optimization
Lectures:
- Architecture: Grids, Blocks, and Threads
- Memory Model: Global, Shared, and Constant Memory
- Advanced Mapping Techniques
- Streams: Asynchronous Launches and Concurrent Execution
- ArrayFire: Lazy Evaluation and Code Vectorization
Practice:
- Matrix Transpose
- Optimization Using Shared Memory (see the sketch after this list)
- Median Filter
- Optimization Using Constant Memory
- Stream Example
- ArrayFire Example: Nearest Neighbor Algorithm
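
As a taste of the Day 2 shared-memory work, here is a sketch of the classic tiled matrix transpose. The kernel below is our own illustrative version (names, tile size, and signature are assumptions), not necessarily the course exercise:

    // Illustrative tiled transpose of a width x height matrix into a height x width matrix.
    #define TILE 32

    __global__ void transpose(const float* in, float* out, int width, int height)
    {
        __shared__ float tile[TILE][TILE + 1];           // +1 padding avoids bank conflicts

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < width && y < height)
            tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // coalesced read

        __syncthreads();

        // Swap the block coordinates so the write is also coalesced.
        x = blockIdx.y * TILE + threadIdx.x;
        y = blockIdx.x * TILE + threadIdx.y;
        if (x < height && y < width)
            out[y * height + x] = tile[threadIdx.x][threadIdx.y];
    }

A typical launch uses dim3 block(TILE, TILE) and a grid that covers the matrix; staging each tile in shared memory lets both the global read and the global write be coalesced.
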
Day 3, Multi-GPU
Lectures (customizable):
- Multi-GPU Use Cases
- Multi-GPU Contexts
- Existing Libraries
- Scaling Across Multiple GPUs (see the sketch below)
Practice:
- Out-of-Core Problems: Matrix Multiply
- Task-Level Parallelism: Optimization
- ArrayFire Multi-GPU
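
For a sense of the Day 3 scaling topic, the sketch below splits a simple kernel across every visible GPU with cudaSetDevice. It is deliberately synchronous and illustrative; the data, names, and chunking scheme are our own assumptions:

    // Illustrative multi-GPU split: each device processes one chunk of the array.
    #include <cuda_runtime.h>
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    __global__ void scale(float* data, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= s;
    }

    int main()
    {
        const int n = 1 << 22;
        std::vector<float> host(n, 1.0f);

        int devices = 0;
        cudaGetDeviceCount(&devices);
        if (devices < 1) { printf("no CUDA devices found\n"); return 1; }

        int chunk = (n + devices - 1) / devices;                  // elements per GPU
        for (int d = 0; d < devices; ++d) {
            int offset = d * chunk;
            int count = std::min(chunk, n - offset);
            if (count <= 0) break;

            cudaSetDevice(d);                                     // subsequent calls target GPU d
            float* dev = nullptr;
            cudaMalloc(&dev, count * sizeof(float));
            cudaMemcpy(dev, host.data() + offset, count * sizeof(float), cudaMemcpyHostToDevice);
            scale<<<(count + 255) / 256, 256>>>(dev, count, 2.0f);
            cudaMemcpy(host.data() + offset, dev, count * sizeof(float), cudaMemcpyDeviceToHost);
            cudaFree(dev);
        }

        printf("host[0] = %f\n", host[0]);                        // expect 2.0
        return 0;
    }

Because the copies here are synchronous, the devices end up working one after another; making them run concurrently is where the streams and asynchronous-launch material from Day 2 comes back in.
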
Day 4, Algorithm Problems
Lectures and Practice (customizable):
- Reductions (see the sketch after this list)
- Scan Algorithms
- Sort
- Convolution
- Customer-Specific Problem
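
To illustrate the kind of pattern behind the reduction topic, here is a sketch of a block-level shared-memory sum reduction. The kernel and its names are our own illustrative example, not necessarily what the course presents:

    // Illustrative sum reduction: each block writes one partial sum.
    __global__ void reduceSum(const float* in, float* partial, int n)
    {
        extern __shared__ float sdata[];                 // size passed at launch
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        sdata[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        // Tree-style reduction within the block (blockDim.x must be a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride)
                sdata[tid] += sdata[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            partial[blockIdx.x] = sdata[0];
    }

A launch such as reduceSum<<<grid, block, block * sizeof(float)>>>(d_in, d_partial, n) produces one partial sum per block; the partials can then be reduced again with a second launch or summed on the host.
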