Triangle Counting in Graphs on the GPU (Part 2)

Oded Benchmarks, CUDA 2 Comments

A while back I wrote a blog on triangle counting in networks using CUDA (see Part 1 for the post). In this post, I will cover in more detail the internals of the algorithm and the CUDA implementation. Before I take a deep dive into the details of the algorithm, I want to remind the reader that there are multiple ways for finding triangles in a graph. Our approach is based off the intersection of two adjacency lists and finding the common elements in both those lists. Two additional approaches would simply be to compare all the possible node-triplets, either in the graph or via matrix multiplication of the incidence array. The latter of these two approaches can be computationally ...

Demystifying PTX Code

Peter C/C++, CUDA, OpenCL 3 Comments

In my recent post, I showed how to generate PTX files from both CUDA and OpenCL kernels. In this post I will address the issue of how a PTX file look, and more importantly, how to understand all those complicated instructions in a PTX files. In this post I will use the same vector addition kernel from the the previous post previous post (the complete code can be found here). For this post, I will focus on OpenCL PTX file. In a future post I will discuss the differences between PTX files of OpenCL and CUDA code. Let's start by looking at the complete PTX code: // // Generated by NVIDIA NVVM Compiler // Compiler built on Sun May 18 ...

Generating PTX files from OpenCL code

Peter CUDA, OpenCL 2 Comments

Here at ArrayFire, we develop code that will work efficiently on both CUDA and OpenCL platforms. Therefore, it is not uncommon that CUDA code on NVIDIA GPUs will run faster than OpenCL. A very good way to understand what is behind the curtains is to generate the PTX file for both cases and compare them. In this post, we show how to generate PTX for both CUDA and OpenCL kernels. PTX stands for Parallel Thread eXecution, which is a low-level virtual machine and instruction set architecture (ISA). For those familiar with assembly language, the PTX instruction set is not really more complicated than a single thread assembly code, except that now we are thinking in massive parallel execution. Retrieving the PTX ...

Image editing using ArrayFire: Part 3

Pradeep Garigipati ArrayFire, C/C++, CUDA, Image Processing, OpenCL 1 Comment

Today, we will be doing the third post in our series Image editing using ArrayFire. References to old posts are available below. * Part 1 * Part 2 In this post, we will be looking at the following operations. Image Histogram Simple Binary Theshold Otsu Threshold Iterative Threshold Adaptive Binary Threshold Emboss Filter Today's post will be mostly dominated by different types of threshold operations we can achieve using ArrayFire. Image Histogram We have a built-in function in ArrayFire that creates a histogram. The input image was converted to gray scale before histogram calculation as our histogram implementation works for vector and 2D matrices only. In case, you need histogram for all three channels of a color image, you can ...

ArrayFire Capability Update - July 2014

Oded Android, ArrayFire, C/C++, CUDA, Fortran, JAVA, OpenCL, R 1 Comment

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library to have CPU fall back when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable to both devices that have OpenCL setup and devices without it. This is done through the creation of additional backends. This will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures: NVIDIA GPUs (Tesla, Fermi, and Kepler) AMD's GPUs, CPUs and APUs Intel's CPUs, GPUs and Xeon Phi Co-Processor Mobile and Embedded devices As part of this update process we are also looking at extending ArrayFire capabilities to low power systems such ...

Image Processing Benchmarks on NVIDIA Jetson TK1

Pradeep Garigipati ArrayFire, Benchmarks, CUDA 7 Comments

In this post we will be looking at benchmarks of the following ArrayFire image processing functions on an ARM device. Erosion/Dilation Median filter Resize Histogram Bilateral filter Convolution We pitted the brand new compute 3.2 GPU on NVIDIA Jetson TK1 against a mobile NVIDIA GPU. The closest match to the GPU (from here on referred as TK1) on the Jetson board we have in our mobile card deck is a NVIDIA GT 650M. The GPU device properties that have critical effect on the function performance are listed below. Property Name / Device Name Jetson TK1 GK20A GT 650M Compute 3.2 3.0 Number of multiprocessors 1 2 Cores 192 384 Base clock rate 852 MHz 950 MHz Total global memory 1746 ...

Custom Kernels with ArrayFire

Pavan ArrayFire, C/C++, CUDA, OpenCL Leave a Comment

As extensive as ArrayFire is, there are a few cases where you are still working with custom CUDA or OpenCL kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity or you may want to keep it around the old implementation for testing purposes. In this post we are going to talk about how to integrate your custom kernels into ArrayFire in a seamless fashion. In and Out of ArrayFire First let us look at the following code and then break it down bit by bit. int main() { af::array x = af::randu(num, 1); af::array y = af::array(num, 1); float *d_x = x.device(); float *d_y = y.device(); af::sync(); launch_simple_kernel(d_y, d_x, num); x.unlock(); y.unlock(); float err = ...

Open Source Initiatives from ArrayFire

Pavan Announcements, ArrayFire, CUDA, Fortran, JAVA, Open Source, OpenCL, OpenGL, R Leave a Comment

At ArrayFire we like to use a lot of Free/Open Source software. We use various Linux distributions, Jenkins, Gitlab, gcc, emacs, vim and numerous other FOSS tools on a daily basis. We also love the idea of developing software collaboratively and openly. Last year we started working with AMD on CL Math Libraries. Internally we've had numerous discussions about contributing to the GPGPU community. However, it's neither simple nor straightforward to take a closed software Open Source. Earlier this year, we decided to take the first step and Open Source all of the ArrayFire library's  tertiary projects. This includes all of our ArrayFire library's language wrappers, examples, and source code used for our blog posts. All of our projects are hosted at our ...

How to Make GPU Hardware Decisions

Scott Computing Trends, CUDA, Hardware, OpenCL Leave a Comment

We get questions all the time about how to make GPU hardware decisions. We've seen just about every scenario you can imagine, and so we always jump at the chance to help others through this decision process. Here's a recent question from a customer. "I've just found your post on Analytic Bridge and have taken a look at your website ... I'm replacing my two Tesla M1060 cards (computing capability too low) and I'm considering used Tesla M2070s or the new GTX 760 cards. Could you offer any insight? I believe the GTX 760 cards may well outperform the older 2070s and are much cheaper." And here's our response. "The GTX 760 will probably outperform the M2070 for single precision ...

ArrayFire v2.1 Official Release

Aaron Announcements, ArrayFire, CUDA, OpenCL Leave a Comment

It's that time again—we're pleased to announce the release of our newest version of ArrayFire: ArrayFire v2.1. ArrayFire v2.1 is now bigger, faster, and stronger, thanks to some key function additions, API changes, feature improvements, and bug fixes. ArrayFire is a CUDA and OpenCL library designed for maximum speed without the hassle of writing time-consuming CUDA and OpenCL device code.  With ArrayFire’s library functions, developers can maximize productivity and performance. Each of ArrayFire’s functions has been hand-tuned by CUDA and OpenCL experts. Major Updates Support for CUDA 6.0 Support for Mac OS X New language support (available on github)       ArrayFire for Java       ArrayFire for R!       ArrayFire for Fortran* ArrayFire Extras on Github       All language wrappers       ...