CUDA/OpenCL Training

ArrayFire offers up to four days of specialized GPU training in CUDA and OpenCL programming. Attendees will receive the latest industry knowledge and techniques for GPU computing in CUDA and OpenCL. We provide customized on-site training courses: we can travel to your location and deliver a CUDA or OpenCL training course tailored to meet your application-specific needs. We also offer training at our Atlanta office. We recommend that attendees have a working knowledge of C/C++ in order to gain the most from the training courses. Contact us about a training.

Included in the course
You provide the minds, and we'll take care of the rest. Each training comes with the following: Instruction by an excellent and ...

Feature detection on Xilinx FPGAs using OpenCL

Brian Kloppenborg · ArrayFire

Today at SuperComputing 2015, ArrayFire demonstrated its first FPGA-accelerated application running on a Xilinx FPGA using OpenCL. In May 2015, ArrayFire became a Xilinx Alliance member, which gave us early access to the Xilinx SDAccel Development Environment with initial support for OpenCL. We helped Xilinx test and improve its OpenCL implementation and make it more accessible to first-time users. Over the last few months, we implemented several programs for the FPGA and subsequently learned a lot about optimizing OpenCL kernels to take advantage of pipelining, wide memory buses, and the various forms of configurable local memory offered on Xilinx hardware. Our first fully ported application is from computer vision: the FAST feature extractor. FAST finds corners in images. In the image below, the left side shows a frame from ...
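As a rough illustration of the kind of kernel involved (a simplified sketch, not ArrayFire's actual FPGA implementation), the OpenCL code below performs FAST's "high-speed test", which rejects most non-corner pixels by examining only the four compass points of the 16-pixel Bresenham circle. The image layout, threshold parameter, and output format are illustrative assumptions.

```c
// Simplified sketch of FAST's high-speed rejection test (not the full
// 16-pixel contiguous-arc check, and not ArrayFire's FPGA kernel).
// Assumes a single-channel 8-bit image in row-major order.
kernel void fast_high_speed_test(global const uchar* img,
                                 global uchar* is_candidate,
                                 const int width,
                                 const int height,
                                 const uchar threshold)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);

    // Skip the 3-pixel border required by the radius-3 circle.
    if (x < 3 || y < 3 || x >= width - 3 || y >= height - 3) return;

    const int center = img[y * width + x];

    // The four compass points of the Bresenham circle of radius 3.
    const int p[4] = {
        img[(y - 3) * width + x],       // north
        img[y * width + (x + 3)],       // east
        img[(y + 3) * width + x],       // south
        img[y * width + (x - 3)]        // west
    };

    // Count compass points significantly brighter or darker than center.
    int brighter = 0, darker = 0;
    for (int i = 0; i < 4; ++i) {
        if (p[i] >= center + threshold) ++brighter;
        if (p[i] <= center - threshold) ++darker;
    }

    // A corner needs at least 3 of 4 compass points on one side;
    // anything else is rejected without running the full segment test.
    is_candidate[y * width + x] = (brighter >= 3 || darker >= 3) ? 1 : 0;
}
```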

Intel OpenCL performance: 3rd generation hardware

Brian Kloppenborg · ArrayFire, OpenCL

Introduction
With Intel CPUs making up nearly 80% of the CPU market, and 66% of computers using integrated graphics, one can easily argue that integrated graphics devices represent one of the greatest markets for GPU-accelerated computing. Here at ArrayFire, we have long recognized the potential of these devices and offer built-in support for Intel CPUs, GPUs, and AMD APUs in the OpenCL backend of our ArrayFire GPU computing library. Yet one common theme of debate in the office has been how the hardware performs on different operating systems, with different drivers, across hardware revisions. To answer these questions (and, perhaps, to win some intra-office geek cred) I decided to write a series of blog posts about Intel's GPU OpenCL performance. In this first installment I will compare the performance ...

OpenCL on Intel HD / Iris graphics on Linux

Brian Kloppenborg · OpenCL

Under Windows and Mac the Intel GPU drivers include OpenCL support; on Linux, however, OpenCL on Intel GPUs is implemented through an open source project called Beignet (pronounced like "ben-yay", a type of French pastry akin to what we would call a "fritter" in English). Below I have written a step-by-step guide on how you can get Beignet running on an Ubuntu 14.10 system with a 3rd, 4th, or 5th generation Intel Core processor. Instructions for other variants of Linux will be similar, except for the commands to install the prerequisite packages. There are several little caveats which need to be discussed up front. Foremost, the Beignet project supports the following hardware: 3rd Generation Intel Core Processors, Intel ...
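Once Beignet is installed, a quick way to confirm that the Intel GPU is visible is to enumerate the OpenCL platforms and devices. Here is a minimal sketch using the Khronos C++ wrapper; the exact platform name Beignet reports may vary by version, but it typically contains "Intel Gen OCL Driver".

```cpp
// List every OpenCL platform and device; a correctly installed Beignet
// should show up as an Intel platform with the GPU as its device.
#include <CL/cl.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    for (const cl::Platform& platform : platforms) {
        std::cout << "Platform: "
                  << platform.getInfo<CL_PLATFORM_NAME>() << "\n";

        std::vector<cl::Device> devices;
        platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const cl::Device& device : devices)
            std::cout << "  Device: "
                      << device.getInfo<CL_DEVICE_NAME>() << "\n";
    }
    return 0;
}
```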

Templating and Caching OpenCL Kernels

Pradeep · ArrayFire

About a month ago, one of my colleagues did a post on how to author the most concise OpenCL program using the C++ API provided by Khronos. In today's post, we shall further modify that example to achieve the following two goals:

1. Enable the kernel to work with different integral data types out of the box
2. Ensure that the kernels compile only once at run time per data type

Let's dive into the details now. We can template the OpenCL kernels by passing a build option -D T="typename" to the kernel compilation step. To pass such options, we would need a construct that can give us a string literal that represents the corresponding integral type. Let us declare a struct ...
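The teaser cuts off just before that struct. A sketch of the idea (the names, kernel body, and cached-program scheme here are illustrative, not the post's exact code) pairs a trait mapping each C++ type to its OpenCL spelling with a static so that each type's program is built only once per run:

```cpp
#include <CL/cl.hpp>
#include <string>

// Map a C++ type to the token substituted for T in the kernel source.
template<typename T> struct TypeName;
template<> struct TypeName<int>          { static const char* get() { return "int"; } };
template<> struct TypeName<unsigned int> { static const char* get() { return "unsigned int"; } };
template<> struct TypeName<short>        { static const char* get() { return "short"; } };

// Kernel written against the macro T, which -D T=... defines at build time.
static const std::string kernelSource =
    "kernel void increment(global T* data) {\n"
    "    data[get_global_id(0)] += (T)1;\n"
    "}\n";

// One program per instantiated type: built on the first call with the
// given context, then reused, so each data type compiles only once.
template<typename T>
cl::Program& getProgram(const cl::Context& context) {
    static cl::Program program = [&context] {
        cl::Program p(context, kernelSource);
        p.build(("-D T=" + std::string(TypeName<T>::get())).c_str());
        return p;
    }();
    return program;
}
```

Calling getProgram<int>(ctx) twice triggers a single build; getProgram<short>(ctx) builds a second program with T defined as short.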

Accelerating Java using ArrayFire, CUDA and OpenCL

Pavan · ArrayFire, JAVA

We have previously mentioned the ability to use ArrayFire through Java. In this post, we are going to show how you can get the best performance inside Java using ArrayFire for CUDA and OpenCL.

Code
Here is a sample code to perform a Monte Carlo estimation of Pi.

The same code can be written using ArrayFire in the following manner. Array.randu(dims, Array.FloatType) creates a uniform random Array. Array.FloatType is passed in to create a uniform random array of 32-bit floating point numbers. Other types include Array.FloatComplexType, Array.DoubleType, and so on. Array.mul, Array.add, and Array.lt perform element-wise operations on the two operands to produce an output. Array.sumAll adds up all the elements in the array to produce ...
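The Java listing itself did not survive the excerpt. For reference, the same computation expressed in ArrayFire's C++ API, which the Java wrapper described above mirrors call-for-call, looks roughly like this sketch (the sample count is an illustrative choice):

```cpp
// Monte Carlo estimation of Pi with ArrayFire's C++ API; the Java
// wrapper's Array.randu / Array.mul / Array.lt / Array.sumAll calls
// correspond to the operations below.
#include <arrayfire.h>
#include <cstdio>

int main() {
    const int samples = 20000000;

    // Uniform random points in the unit square as 32-bit floats
    // (the counterpart of Array.randu(dims, Array.FloatType)).
    af::array x = af::randu(samples, f32);
    af::array y = af::randu(samples, f32);

    // Element-wise multiply, add, and less-than test which points
    // fall inside the quarter circle of radius 1.
    af::array inside = (x * x + y * y) < 1.0f;

    // Reduce over all elements (the Array.sumAll counterpart).
    float pi = 4.0f * af::sum<float>(inside) / samples;
    std::printf("Estimated Pi = %f\n", pi);
    return 0;
}
```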

Generating PTX files from OpenCL code

Peter · CUDA, OpenCL

Here at ArrayFire, we develop code that works efficiently on both CUDA and OpenCL platforms. In doing so, we have found that it is not uncommon for CUDA code on NVIDIA GPUs to run faster than the equivalent OpenCL code. A very good way to understand what is going on behind the curtain is to generate the PTX file for both cases and compare them. In this post, we show how to generate PTX for both CUDA and OpenCL kernels. PTX stands for Parallel Thread eXecution; it is a low-level virtual machine and instruction set architecture (ISA). For those familiar with assembly language, the PTX instruction set is not much more complicated than single-threaded assembly code, except that now we are thinking in terms of massively parallel execution. Retrieving the PTX ...
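On the CUDA side, nvcc -ptx kernel.cu emits PTX directly. On NVIDIA's OpenCL implementation, the "binary" the driver returns for a built program is PTX text, so it can be pulled out through the standard program-info queries. Below is a minimal sketch (the function name and single-device assumption are ours, not the post's):

```cpp
// Retrieve the PTX for an OpenCL program built on an NVIDIA device.
// On NVIDIA, CL_PROGRAM_BINARIES returns human-readable PTX text.
// Assumes the program was built for exactly one device.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

void dumpPtx(cl_program program, const char* path) {
    size_t size = 0;
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                     sizeof(size), &size, NULL);

    std::vector<unsigned char> ptx(size);
    unsigned char* ptr = ptx.data();   // BINARIES wants an array of pointers
    clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                     sizeof(ptr), &ptr, NULL);

    FILE* f = std::fopen(path, "wb");
    std::fwrite(ptx.data(), 1, size, f);
    std::fclose(f);
}
```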

OpenCL SPIR 2.0

Pavan · OpenCL

At SIGGRAPH 2014, the Khronos Group announced, among other things, the SPIR 2.0 provisional specification. This release of SPIR (Standard Portable Intermediate Representation) follows the release of the OpenCL 2.0 specification last year. In this post, we would like to offer our take on SPIR 2.0 and what it means to OpenCL developers.

Feature Parity With OpenCL 2.0
SPIR 1.2 had feature parity with OpenCL 1.2. With 2.0, SPIR has feature parity with OpenCL 2.0. Here are a few new features that we find interesting.

Generic Address Space
"Where functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions ...
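To make that concrete, here is a small hypothetical OpenCL C 2.0 example of our own (not from the spec or the post): with the generic address space, one helper accepts pointers to global or private memory alike, where OpenCL 1.x would have required a separate overload per address space.

```c
// OpenCL 2.0 generic address space: sum3 takes an unqualified pointer,
// so the same function works for global and private arguments; under
// OpenCL 1.x a separate copy per named address space was needed.
float sum3(const float* p) {
    return p[0] + p[1] + p[2];
}

kernel void demo(global const float* in, global float* out) {
    size_t i = get_global_id(0);

    // A private copy of the same three values.
    float priv[3] = { in[3*i], in[3*i + 1], in[3*i + 2] };

    // Same helper called with a global pointer and a private pointer.
    out[i] = sum3(in + 3*i) + sum3(priv);
}
```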

Quest for the Smallest OpenCL Program

Umar · C/C++, OpenCL

I have heard many complaints about the verbosity of the OpenCL API. This claim is not unwarranted. The verbosity is due to the low-level nature of OpenCL. It is written in the C programming language, the lingua franca of programming languages. While this allows you to run an OpenCL program on virtually any platform, it has some disadvantages. A typical OpenCL program must:

1. Query for the platform
2. Get the device IDs from the platform
3. Create a context from a set of device IDs
4. Create a command queue from the context
5. Create buffer objects for your data
6. Transfer the data to the buffer
7. Create and build a program from source
8. Extract the kernels
9. Launch the kernels
10. Transfer the data to ...
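The excerpt ends mid-list, but to give a flavor of where the post heads: using the Khronos C++ wrapper's default platform, context, and queue, most of that ceremony collapses. Here is a minimal sketch along those lines (the kernel, sizes, and names are illustrative, not the post's exact listing):

```cpp
// A near-minimal OpenCL program: the C++ wrapper's default objects
// replace the explicit platform/device/context/queue boilerplate.
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Kernel source: square each element in place.
    const std::string src =
        "kernel void square(global int* a) {\n"
        "    a[get_global_id(0)] *= a[get_global_id(0)];\n"
        "}\n";

    std::vector<int> data(1024, 2);

    // Buffer and program both use the default context; the iterator
    // constructor also copies the host data to the device.
    cl::Buffer buf(data.begin(), data.end(), /* readOnly = */ false);
    cl::Program program(src, /* build = */ true);

    // A kernel functor: launch with one work-item per element.
    cl::make_kernel<cl::Buffer&> square(program, "square");
    square(cl::EnqueueArgs(cl::NDRange(data.size())), buf);

    cl::copy(buf, data.begin(), data.end());   // read results back
    std::cout << "2 squared = " << data[0] << std::endl;
    return 0;
}
```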