CUDA and OpenCL Training
We provide high-quality 2- or 4-day CUDA™ and OpenCL training courses.
Because we specialize solely in CUDA and OpenCL work, we can immerse students in GPU and heterogeneous computing. Students leave our courses proficient in CUDA or OpenCL programming, up to date on the latest industry knowledge and techniques for GPU computing, and equipped with the tricks for maximizing performance on heterogeneous computing devices.
For groups, we travel to your location, host at our Atlanta office, or train remotely via video conference, tailoring our instruction to your application-specific needs. For individuals, we offer 2-day training quarterly.
We recommend that attendees have a working knowledge of C/C++ for a fruitful learning experience.
"Can't ask for better individualized instruction than the environment I was fortunate enough to encounter. The instructor was able to completely focus on my particular needs and concerns."
Included in All Courses
You provide the minds, and we'll take care of the rest. Each training comes with the following:
- Instruction by an excellent and exciting expert
- Hands-on exercises
- Use of a laptop with CUDA- and OpenCL-capable GPUs and CPUs
- Choice of Linux or Windows operating system
- Printed manual of lecture material
- Electronic copy of programming exercises
CUDA and OpenCL Training Syllabus
* Courses are taught in either CUDA or OpenCL. Similar principles apply in each framework.
Day 1, Introduction
Lectures:
- GPU Computing Overview
- The Programming Model
- Basic Dataset Mapping
- Techniques
- Libraries, ArrayFire
- Profiling Tools
Practice:
- A Simple Kernel
- Equivalent ArrayFire Example
- Using Libraries
- Monte Carlo Pi Estimation
- Timing and ArrayFire
- Debugging Code
Day 2, Optimization
Lectures:
- Architecture: Grids, Blocks, and Threads
- Memory Model: Global, Shared, and Constant Memory
- Advanced Mapping Techniques
- Streams: Asynchronous Launches and Concurrent Execution
- ArrayFire: Lazy Evaluation and Code Vectorization
Practice:
- Matrix Transpose
- Optimization Using Shared Memory
- Median Filter
- Optimization Using Constant Memory
- Stream Example
- ArrayFire Example: Nearest Neighbor Algorithm
Day 3, Multi-GPU
Lectures (customizable):
- Multi-GPU Use Cases
- Multi-GPU: Contexts
- Existing Libraries
- Scaling Across Multiple GPUs
Practice:
- Out-of-Core Problems: Matrix Multiply
- Task-Level Parallelism: Optimization
- ArrayFire Multi-GPU
Day 4, Algorithm Problems
Lectures and Practice (customizable):
- Reductions
- Scan Algorithms
- Sort
- Convolution
- Customer-Specific Problem
Xilinx SDAccel Training
In addition to CUDA & OpenCL training, we offer training for Xilinx SDAccel. ArrayFire is North America's exclusive Xilinx SDAccel Authorized Training Partner (ATP). Our SDAccel training courses help design teams leverage Xilinx FPGAs for OpenCL application acceleration.
Course Name: "Developing and Optimizing Applications Using the OpenCL Framework for FPGAs"
Individual 2-Day Course
For individuals, we offer an online, CUDA-only 2-day training course once a quarter, following the Day 1 and Day 2 syllabi above.
Upcoming Dates
- Q2: June 11-12, 2024
- Q3: September 10-11, 2024
- Q4: December 10-11, 2024
Each course has limited spots, so reserve your place as soon as possible by emailing us at sales@arrayfire.com.