Category Archive

Below you'll find a list of all posts that have been categorized as “CUDA”

GTC 2015 ArrayFire Recordings

Aaron Taylor | April 30, 2015 | ArrayFire, Computer Vision, CUDA

Missed visiting ArrayFire at GTC this year? We’ve got you covered! You can now check out the recordings of all our GTC 2015 talks and tutorials at your own convenience. Learn about accelerating your code from the best in the business.

Talks

Real-Time and High Resolution Feature Tracking and Object Recognition
Peter Andreas Entschev

This session will cover real-time feature tracking and object recognition in high resolution videos using GPUs and productive software libraries including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches were proposed to solve these problems, some of the most important being SIFT, SURF and ORB. Traditionally, these approaches are so computationally …

Machine Learning with ArrayFire: Linear Classifiers

Pavan Yalamanchili | March 10, 2015 | ArrayFire, CUDA, OpenCL

Linear classifiers perform classification based on a linear combination of the component features. Some examples of linear classifiers include the Naive Bayes Classifier, Linear Discriminant Analysis, Logistic Regression, and Perceptrons. ArrayFire’s easy-to-use API enables users to write such classifiers from scratch fairly easily. In this post, we show how you can map mathematical equations to ArrayFire code and implement them from scratch.

* Naive Bayes Classifier
* Perceptron

Naive Bayes Classifier

The naive Bayes classifier is a probabilistic classifier that assumes all the features in a feature vector are independent of each other. This assumption simplifies Bayes’ rule to a simple multiplication of probabilities, as shown below. First we start with the simple Bayes’ rule. $$ p(C_k | x) = \frac{p(C_k)}{p(x)} …
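To make the "simple multiplication of probabilities" concrete, here is a minimal sketch in plain C++ (not ArrayFire code, and not the post's implementation). It assumes the per-class priors and the per-feature likelihoods for the observed feature vector have already been estimated; the function name `naive_bayes_predict` and the table layout are hypothetical. The product is evaluated as a sum of logarithms, which picks the same class while avoiding underflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: returns the index of the most probable class.
// priors[k]         : p(C_k)
// likelihoods[k][i] : p(x_i | C_k), already evaluated for the observed x_i
// Under the independence assumption the class score is
//   p(C_k) * prod_i p(x_i | C_k),
// computed here in log space as log p(C_k) + sum_i log p(x_i | C_k).
std::size_t naive_bayes_predict(
    const std::vector<double>& priors,
    const std::vector<std::vector<double>>& likelihoods) {
    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < priors.size(); ++k) {
        double score = std::log(priors[k]);
        for (double p : likelihoods[k]) {
            score += std::log(p);
        }
        if (score > best_score) {
            best_score = score;
            best = k;
        }
    }
    return best;
}
```

The shared denominator p(x) is the same for every class, so it can be dropped when only the winning class is needed.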

Conway’s Game of Life using ArrayFire

Shehzan Mohammed | December 8, 2014 | ArrayFire, CUDA, Image Processing, Open Source, OpenGL | 4 Comments

Conway’s Game of Life is a popular zero-player cellular automaton devised by John Horton Conway in 1970. The game makes for a fun evolution as the player sets the initial condition and then observes the evolution of the game. Each cell has 2 states: live or dead. There are 4 simple rules that determine this:

* Any live cell with fewer than two live neighbours dies, as if caused by under-population.
* Any live cell with two or three live neighbours lives on to the next generation.
* Any live cell with more than three live neighbours dies, as if by overcrowding.
* Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

From a programmer’s …
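The four rules translate almost line for line into code. The sketch below is a plain, single-threaded C++ version for illustration only; it does not use ArrayFire and is not the implementation from the post. The names `Grid`, `live_neighbours` and `next_generation` are hypothetical, and cells outside the grid are assumed to be dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1 = live, 0 = dead. Cells outside the grid are treated as dead (assumption).
using Grid = std::vector<std::vector<int>>;

// Count the live cells among the (up to) eight neighbours of cell (r, c).
int live_neighbours(const Grid& g, std::size_t r, std::size_t c) {
    int count = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;  // skip the cell itself
            const long rr = static_cast<long>(r) + dr;
            const long cc = static_cast<long>(c) + dc;
            if (rr < 0 || cc < 0) continue;
            if (rr >= static_cast<long>(g.size())) continue;
            const std::vector<int>& row = g[static_cast<std::size_t>(rr)];
            if (cc >= static_cast<long>(row.size())) continue;
            count += row[static_cast<std::size_t>(cc)];
        }
    }
    return count;
}

// Apply the four rules once to every cell and return the new grid.
Grid next_generation(const Grid& g) {
    Grid out = g;
    for (std::size_t r = 0; r < g.size(); ++r) {
        for (std::size_t c = 0; c < g[r].size(); ++c) {
            const int n = live_neighbours(g, r, c);
            if (g[r][c] == 1) {
                // Survives with two or three neighbours, otherwise dies.
                out[r][c] = (n == 2 || n == 3) ? 1 : 0;
            } else {
                // A dead cell comes alive with exactly three neighbours.
                out[r][c] = (n == 3) ? 1 : 0;
            }
        }
    }
    return out;
}
```

Calling `next_generation` on a horizontal row of three live cells (a "blinker") yields a vertical row of three, and a second call restores the original pattern.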

Triangle Counting in Graphs on the GPU (Part 2)

Oded Green | October 23, 2014 | Benchmarks, CUDA | 2 Comments

A while back I wrote a blog post on triangle counting in networks using CUDA (see Part 1 for the post). In this post, I will cover in more detail the internals of the algorithm and the CUDA implementation. Before I take a deep dive into the details of the algorithm, I want to remind the reader that there are multiple ways of finding triangles in a graph. Our approach is based on intersecting two adjacency lists and finding the elements common to both lists. Two additional approaches would simply be to compare all the possible node-triplets, either in the graph or via matrix multiplication of the incidence array. The latter of these two approaches can be computationally …
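As a rough illustration of the intersection-based approach, here is a sequential C++ sketch. It is not the CUDA implementation discussed in the post; the names `AdjacencyLists`, `intersection_size` and `count_triangles` are hypothetical, and it assumes an undirected graph whose adjacency lists are sorted in ascending order with no duplicates or self-loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// adj[u] lists the neighbours of vertex u, sorted ascending (assumption).
using AdjacencyLists = std::vector<std::vector<int>>;

// Count the elements common to two sorted lists with a merge-style walk.
std::size_t intersection_size(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::size_t i = 0, j = 0, common = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;
        } else if (a[i] > b[j]) {
            ++j;
        } else {
            ++common;
            ++i;
            ++j;
        }
    }
    return common;
}

// For every edge (u, v), each common neighbour w closes a triangle u-v-w.
std::size_t count_triangles(const AdjacencyLists& adj) {
    std::size_t total = 0;
    for (std::size_t u = 0; u < adj.size(); ++u) {
        for (int v : adj[u]) {
            total += intersection_size(adj[u],
                                       adj[static_cast<std::size_t>(v)]);
        }
    }
    // Each triangle is seen once per directed edge: 3 edges x 2 directions.
    return total / 6;
}
```

Each per-edge intersection is independent of the others, which is what makes this formulation a natural fit for parallel hardware.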

Demystifying PTX Code

Peter Entschev | September 17, 2014 | C/C++, CUDA, OpenCL | 3 Comments

In my recent post, I showed how to generate PTX files from both CUDA and OpenCL kernels. In this post I will address what a PTX file looks like and, more importantly, how to understand all those complicated instructions in a PTX file. I will use the same vector addition kernel from the previous post (the complete code can be found here). For this post, I will focus on the OpenCL PTX file. In a future post I will discuss the differences between PTX files of OpenCL and CUDA code. Let’s start by looking at the complete PTX code:

    //
    // Generated by NVIDIA NVVM Compiler
    // Compiler built on Sun May 18 …

Generating PTX files from OpenCL code

Peter Entschev | August 25, 2014 | CUDA, OpenCL | 2 Comments

Here at ArrayFire, we develop code that will work efficiently on both CUDA and OpenCL platforms. However, it is not uncommon that CUDA code on NVIDIA GPUs will run faster than OpenCL. A very good way to understand what is happening behind the curtains is to generate the PTX file for both cases and compare them. In this post, we show how to generate PTX for both CUDA and OpenCL kernels. PTX stands for Parallel Thread eXecution, which is a low-level virtual machine and instruction set architecture (ISA). For those familiar with assembly language, the PTX instruction set is not really more complicated than single-threaded assembly code, except that now we are thinking in terms of massively parallel execution. Retrieving the PTX …

Image editing using ArrayFire: Part 3

Pradeep Garigipati | August 11, 2014 | ArrayFire, C/C++, CUDA, Image Processing, OpenCL | 1 Comment

Today, we will be doing the third post in our series Image editing using ArrayFire. References to old posts are available below.

* Part 1
* Part 2

In this post, we will be looking at the following operations.

* Image Histogram
* Simple Binary Threshold
* Otsu Threshold
* Iterative Threshold
* Adaptive Binary Threshold
* Emboss Filter

Today’s post will be mostly dominated by the different types of threshold operations we can achieve using ArrayFire.

Image Histogram

We have a built-in function in ArrayFire that creates a histogram. The input image was converted to gray scale before the histogram calculation, as our histogram implementation works for vectors and 2D matrices only. In case you need a histogram for all three channels of a color image, you can …
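For readers who want to see what the first two operations compute, here is a minimal CPU sketch in plain C++. It does not call ArrayFire's built-in histogram function and is not the code from the post. The names `GrayImage`, `histogram` and `binary_threshold` are hypothetical, the image is assumed to be 8-bit gray scale stored as a flat array, and the threshold convention (pixels strictly above the threshold become white) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An 8-bit gray scale image stored as a flat list of pixels (assumption).
using GrayImage = std::vector<std::uint8_t>;

// Count how many pixels fall on each of the 256 intensity levels.
std::array<std::size_t, 256> histogram(const GrayImage& img) {
    std::array<std::size_t, 256> bins{};  // zero-initialised
    for (std::uint8_t pixel : img) {
        ++bins[pixel];
    }
    return bins;
}

// Simple binary threshold: pixels strictly above t become 255, others 0.
GrayImage binary_threshold(const GrayImage& img, std::uint8_t t) {
    GrayImage out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        out[i] = (img[i] > t) ? 255 : 0;
    }
    return out;
}
```

Methods such as Otsu's pick the threshold value from the histogram rather than taking it as a fixed input, which is why the histogram comes first in the list.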

ArrayFire Capability Update – July 2014

Oded Green | July 18, 2014 | Android, ArrayFire, C/C++, CUDA, Fortran, Java, OpenCL, R | 1 Comment

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library to have a CPU fallback when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable to both devices that have OpenCL set up and devices without it. This is done through the creation of additional backends. This will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures:

* NVIDIA GPUs (Tesla, Fermi, and Kepler)
* AMD’s GPUs, CPUs and APUs
* Intel’s CPUs, GPUs and Xeon Phi Co-Processor
* Mobile and Embedded devices

As part of this update process we are also looking at extending ArrayFire capabilities to low-power systems such …

Image Processing Benchmarks on NVIDIA Jetson TK1

Pradeep Garigipati | June 12, 2014 | ArrayFire, Benchmarks, CUDA | 6 Comments

In this post we will be looking at benchmarks of the following ArrayFire image processing functions on an ARM device.

* Erosion/Dilation
* Median filter
* Resize
* Histogram
* Bilateral filter
* Convolution

We pitted the brand new compute 3.2 GPU on the NVIDIA Jetson TK1 against a mobile NVIDIA GPU. The closest match in our mobile card deck to the GPU on the Jetson board (from here on referred to as TK1) is an NVIDIA GT 650M. The GPU device properties that have a critical effect on function performance are listed below.

Property Name / Device Name | Jetson TK1 GK20A | GT 650M
Compute | 3.2 | 3.0
Number of multiprocessors | 1 | 2
Cores | 192 | 384
Base clock rate | 852 MHz | 950 MHz
Total global memory | 1746 …

Custom Kernels with ArrayFire

Pavan Yalamanchili | May 27, 2014 | ArrayFire, C/C++, CUDA, OpenCL

As extensive as ArrayFire is, there are a few cases where you are still working with custom CUDA or OpenCL kernels. For example, you may want to integrate ArrayFire into an existing code base for productivity, or you may want to keep the old implementation around for testing purposes. In this post we are going to talk about how to integrate your custom kernels into ArrayFire in a seamless fashion.

In and Out of ArrayFire

First let us look at the following code and then break it down bit by bit.

    int main() {
        af::array x = af::randu(num, 1);
        af::array y = af::array(num, 1);
        float *d_x = x.device<float>();
        float *d_y = y.device<float>();
        af::sync();
        launch_simple_kernel(d_y, d_x, num);
        x.unlock();
        y.unlock();
        float err = …
