Generating PTX files from OpenCL code

Peter Entschev | CUDA, OpenCL | 2 Comments

Here at ArrayFire, we develop code that will work efficiently on both CUDA and OpenCL platforms. Even so, it is not uncommon for CUDA code on NVIDIA GPUs to run faster than its OpenCL counterpart. A very good way to understand what happens behind the scenes is to generate the PTX file for both cases and compare them. In this post, we show how to generate PTX for both CUDA and OpenCL kernels. PTX stands for Parallel Thread eXecution, which is a low-level virtual machine and instruction set architecture (ISA). For those familiar with assembly language, the PTX instruction set is not much more complicated than single-threaded assembly code, except that we are now thinking in terms of massively parallel execution. Retrieving the PTX …

OpenCL SPIR 2.0

Pavan Yalamanchili | OpenCL | 1 Comment

At SIGGRAPH 2014, the Khronos Group announced, among other things, the SPIR 2.0 provisional specification. This release of SPIR (Standard Portable Intermediate Representation) follows the release of the OpenCL 2.0 spec last year. In this post, we would like to offer our take on SPIR 2.0 and what it means to OpenCL developers. Feature Parity With OpenCL 2.0: SPIR 1.2 had feature parity with OpenCL 1.2. With 2.0, SPIR has feature parity with OpenCL 2.0. Here are a few new features that we find interesting. Generic Address Space: “Where functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions …

Image editing using ArrayFire: Part 3

Pradeep Garigipati | ArrayFire, C/C++, CUDA, Image Processing, OpenCL | 1 Comment

Today, we present the third post in our series Image editing using ArrayFire. References to old posts are available below. * Part 1 * Part 2 In this post, we will be looking at the following operations: Image Histogram, Simple Binary Threshold, Otsu Threshold, Iterative Threshold, Adaptive Binary Threshold, and Emboss Filter. Today’s post will be mostly dominated by the different types of threshold operations we can achieve using ArrayFire. Image Histogram: We have a built-in function in ArrayFire that creates a histogram. The input image was converted to grayscale before the histogram calculation, as our histogram implementation works on vectors and 2D matrices only. In case you need a histogram for all three channels of a color image, you can …
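
To make the histogram and Otsu threshold operations named in this excerpt concrete, here is a minimal sketch in plain C++. It does not use ArrayFire and is not the code from the post; the function names gray_histogram, otsu_threshold, and binary_threshold are hypothetical, and the image is assumed to be an 8-bit grayscale image stored as a flat vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 256-bin histogram of an 8-bit grayscale image.
std::array<std::size_t, 256> gray_histogram(const std::vector<std::uint8_t>& pixels) {
    std::array<std::size_t, 256> hist{};  // all bins start at zero
    for (std::uint8_t p : pixels) {
        ++hist[p];
    }
    return hist;
}

// Hypothetical helper: Otsu's method. Returns the threshold t that maximizes
// the between-class variance of the groups {value <= t} and {value > t}.
int otsu_threshold(const std::array<std::size_t, 256>& hist) {
    double total = 0.0;
    double weighted_sum = 0.0;
    for (int i = 0; i < 256; ++i) {
        total += static_cast<double>(hist[i]);
        weighted_sum += static_cast<double>(i) * static_cast<double>(hist[i]);
    }
    assert(total > 0.0 && "histogram must contain at least one pixel");

    double count_bg = 0.0;   // pixels at or below the candidate threshold
    double sum_bg = 0.0;     // intensity sum of those pixels
    double best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        count_bg += static_cast<double>(hist[t]);
        sum_bg += static_cast<double>(t) * static_cast<double>(hist[t]);
        const double count_fg = total - count_bg;
        if (count_bg == 0.0 || count_fg == 0.0) {
            continue;  // one class is empty, so the variance is undefined
        }
        const double mean_bg = sum_bg / count_bg;
        const double mean_fg = (weighted_sum - sum_bg) / count_fg;
        const double diff = mean_bg - mean_fg;
        // Proportional to the between-class variance (constant factor dropped).
        const double between_var = count_bg * count_fg * diff * diff;
        if (between_var > best_var) {
            best_var = between_var;
            best_t = t;
        }
    }
    return best_t;
}

// Hypothetical helper: simple binary threshold, 255 above t and 0 otherwise.
std::vector<std::uint8_t> binary_threshold(const std::vector<std::uint8_t>& pixels, int t) {
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = pixels[i] > t ? 255 : 0;
    }
    return out;
}
```

A caller would typically compute gray_histogram(image), pass the result to otsu_threshold, and feed the returned value to binary_threshold. When several thresholds give the same variance, this sketch keeps the lowest one.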

Quest for the Smallest OpenCL Program

Umar Arshad | C/C++, OpenCL | 8 Comments

I have heard many complaints about the verbosity of the OpenCL API. This claim is not unwarranted. The verbosity is due to the low-level nature of OpenCL. It is written in the C programming language, the lingua franca of programming languages. While this allows you to run an OpenCL program on virtually any platform, it has some disadvantages. A typical OpenCL program must: query for the platform; get the device IDs from the platform; create a context from a set of device IDs; create a command queue from the context; create buffer objects for your data; transfer the data to the buffer; create and build a program from source; extract the kernels; launch the kernels; transfer the data to the host …

ArrayFire Capability Update – July 2014

Oded Green | Android, ArrayFire, C/C++, CUDA, Fortran, Java, OpenCL, R | 1 Comment

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library with a CPU fallback for when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable both to devices that have OpenCL set up and to devices without it. This is done through the creation of additional backends, which will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures: NVIDIA GPUs (Tesla, Fermi, and Kepler); AMD’s GPUs, CPUs and APUs; Intel’s CPUs, GPUs and Xeon Phi Co-Processor; Mobile and Embedded devices. As part of this update process we are also looking at extending ArrayFire capabilities to low-power systems such …

Remote Off-Screen Rendering with OpenGL

Shehzan Mohammed | ArrayFire, OpenGL | 18 Comments

At ArrayFire, we constantly encounter projects that require OpenGL and run on a remote server that does not have a display. In this blog, we have compiled a list of steps that users can follow to run full-profile OpenGL applications over SSH on remote systems without a display. A few notes before we get started. This blog is limited to computers running distributions of Linux. The first part of the blog, which shows the configuration of the xorg.conf file, is limited to NVIDIA cards (with display). AMD cards support this capability without modifying the xorg.conf file. However, we have not been able to get a comprehensive list of supported devices. Requirements: You will need access to the remote …

OpenCL on Mobile Devices

Pavan Yalamanchili | Android, OpenCL | 6 Comments

While Google has openly displayed its opposition to OpenCL, many hardware manufacturers seem to be putting their weight behind OpenCL. Qualcomm, ARM, Imagination and Vivante support OpenCL on their GPUs. Android phone manufacturers – Samsung, HTC, Sony and Amazon – ship OpenCL drivers and libraries on their latest generation of devices. Considering the prevalence of OpenCL on shipped devices, it is likely that most Renderscript implementations have an OpenCL backend. To consolidate a list of OpenCL-supported Android devices, we created a publicly accessible Google document, seen below. If you have an Android phone that is not listed, we’d appreciate it if you contributed to the list. To test if OpenCL is supported on your phone, you can use OpenCL Info …

Image Processing Benchmarks on NVIDIA Jetson TK1

Pradeep Garigipati | ArrayFire, Benchmarks, CUDA | 7 Comments

In this post we will be looking at benchmarks of the following ArrayFire image processing functions on an ARM device: Erosion/Dilation, Median filter, Resize, Histogram, Bilateral filter, and Convolution. We pitted the brand new compute 3.2 GPU on the NVIDIA Jetson TK1 against a mobile NVIDIA GPU. The closest match in our mobile card deck to the GPU on the Jetson board (from here on referred to as TK1) is an NVIDIA GT 650M. The GPU device properties that have a critical effect on function performance are listed below.
Property Name | Jetson TK1 GK20A | GT 650M
Compute | 3.2 | 3.0
Number of multiprocessors | 1 | 2
Cores | 192 | 384
Base clock rate | 852 MHz | 950 MHz
Total global memory | 1746 …

Custom Kernels with ArrayFire

Pavan Yalamanchili | ArrayFire, C/C++, CUDA, OpenCL | Leave a Comment

As extensive as ArrayFire is, there are a few cases where you are still working with custom CUDA or OpenCL kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity, or you may want to keep the old implementation around for testing purposes. In this post we are going to talk about how to integrate your custom kernels into ArrayFire in a seamless fashion. In and Out of ArrayFire: First let us look at the following code and then break it down bit by bit. int main() { af::array x = af::randu(num, 1); af::array y = af::array(num, 1); float *d_x = x.device(); float *d_y = y.device(); af::sync(); launch_simple_kernel(d_y, d_x, num); x.unlock(); y.unlock(); float err = …

https://www.youtube.com/watch?v=ZQVzXaOWSZ0

In Case you Missed it: ArrayFire Joint Webinar with AMD

Oded Green | Events, OpenCL | Leave a Comment

ArrayFire recently gave two webinar presentations to OpenCL developers as part of a joint webinar series with AMD. Due to popular demand for the first webinar, we ended up presenting a second! In case you missed it, here’s a recording of the webinar complete with the presentation and an informative Q&A session: http://bit.ly/SkzIJs This webinar focused on enhancing productivity by using existing OpenCL libraries while achieving a high level of performance and maximizing system utilization. We demonstrated how our ArrayFire software library offers simple GPU programming with the benefit of awesome performance. In the webinar we showed how to use several image processing and computer vision building blocks in fewer than 3 lines of code. The immediate takeaway message of …