We are pleased to announce a new release of the ArrayFire library, v3.9.0. This release makes it easier than ever to target new devices without sacrificing performance. This post describes four of these new features, including:
oneAPI Backend
This is the first release since v3.0 to introduce a new backend. The new backend is built against the oneAPI specification on top of the SYCL language. oneAPI is an open specification that provides a full framework for high-performance computing applications without vendor lock-in. While this has been possible with OpenCL, the oneAPI specification also includes libraries such as BLAS and FFT, significantly reducing the burden on developers to maintain math functions and increasing the performance of these common operations. Here is a …
Cycling through SYCL
We recently gave an overview of recent history in the technical computing hardware market. In it, we mentioned the energy at Intel right now. The weight of Intel is behind the SYCL standard through its new software approach, oneAPI. SYCL is a cross-platform API that targets heterogeneous hardware, similar to OpenCL and CUDA. The SYCL standard was first introduced by Codeplay and is now managed by the Khronos Group. It allows single-source compilation in C++ to target multiple devices on a system, rather than using C++ for the host and domain-specific kernel languages for the device. Furthermore, SYCL is fully C++17 standards compliant. You don’t have any extensions to the language that would prevent any standards compliant …
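To make the single-source idea concrete, here is a minimal, hypothetical SYCL sketch (not taken from the post): the host logic and the device kernel live in the same C++ translation unit, and the runtime chooses a device when the queue is constructed.

```cpp
// Minimal single-source SYCL vector addition (SYCL 2020 style).
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // default selector picks any available device

    {
        // Buffers hand the host data to the SYCL runtime for the scope below.
        sycl::buffer<float> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bufC(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only, sycl::no_init);
            // The kernel is ordinary C++, compiled for whichever device q targets.
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }  // buffer destruction synchronizes results back into the host vectors

    std::cout << "c[0] = " << c[0] << std::endl;  // expect 3
    return 0;
}
```

Compiled with a SYCL implementation such as DPC++, the same source can target CPUs, GPUs, or other accelerators without a separate kernel language.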
ArrayFire v3.5 Official Release
Today we are pleased to announce the release of ArrayFire v3.5, our open source library of parallel computing functions supporting CUDA, OpenCL, and CPU devices. This new version of ArrayFire improves features and performance for applications in machine learning, computer vision, signal processing, statistics, finance, and more. This release focuses on thread safety, support for simple sparse-dense arithmetic operations, a Canny edge detector function, and a genetic algorithm example. A complete list of ArrayFire v3.5 updates and new features can be found in the product Release Notes.
Thread Safety
ArrayFire now supports threaded programming models. This is not intended to improve performance, since most of the parallelism happens on the device, but it does allow you to use multiple devices in …
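As an illustration of what the new thread safety allows, here is a minimal, hypothetical sketch (not taken from the release notes) in which each host thread selects its own device and runs independent ArrayFire work:

```cpp
// Hypothetical example: two host threads, each bound to its own device.
#include <arrayfire.h>
#include <thread>
#include <vector>
#include <cstdio>

void work(int device) {
    af::setDevice(device);                    // bind this thread to a device
    af::array a = af::randu(1000, 1000);      // some representative work
    float s = af::sum<float>(a);
    printf("device %d: sum = %f\n", device, s);
}

int main() {
    int count = af::getDeviceCount();
    std::vector<std::thread> threads;
    for (int d = 0; d < count && d < 2; ++d)
        threads.emplace_back(work, d);
    for (auto& t : threads)
        t.join();
    return 0;
}
```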
ArrayFire v3.0 is here!
Today we are pleased to announce the release of ArrayFire v3.0. This new version features major changes to ArrayFire’s visualization library, a new CPU backend, and dense linear algebra for OpenCL devices. It also includes improvements across the board for ArrayFire’s OpenCL backend. A complete list of ArrayFire v3.0 updates and new features can be found in the product Release Notes. With over 8 years of continuous development, the open source ArrayFire library is the top CUDA and OpenCL software library. ArrayFire supports CUDA-capable GPUs, OpenCL devices, and other accelerators. With its easy-to-use API, this hardware-neutral software library is designed for maximum speed without the hassle of writing time-consuming CUDA and OpenCL device code. With ArrayFire’s library functions, developers can maximize …
Intel OpenCL performance: 3rd generation hardware
Introduction
With Intel CPUs making up nearly 80% of the CPU market and 66% of computers using integrated graphics, one can easily argue that integrated graphics devices represent one of the greatest markets for GPU-accelerated computing. Here at ArrayFire, we have long recognized the potential of these devices and offer built-in support for Intel CPUs, GPUs, and AMD APUs in the OpenCL backend of our ArrayFire GPU computing library. Yet one common theme for debate in the office has been how the hardware performs on different operating systems with different drivers across hardware revisions. To answer these questions (and, perhaps, to win some intra-office geek cred) I decided to write a series of blog posts about Intel’s GPU OpenCL performance. In this first installment I will compare the performance …
Using zero-copy buffers on integrated GPUs
One of the most powerful aspects of parallel programming on integrated GPUs is taking advantage of shared memory and caches. The best example of this is sharing common data between the CPU and GPU via zero-copy buffers. This technique permits your program to avoid the O(N) cost of copying data to/from the GPU. It is particularly useful for applications that deal with real-time data streams, like video processing.
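As a rough sketch of the technique (hypothetical code, not taken from the post), the host asks the OpenCL runtime for a buffer that can live in memory shared with the GPU and then maps it instead of copying into it:

```cpp
// Zero-copy sketch: allocate a host-visible buffer and map it instead of copying.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

    const size_t n = 1 << 20;
    // CL_MEM_ALLOC_HOST_PTR lets the driver place the buffer where both the CPU
    // and an integrated GPU can see it.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                n * sizeof(float), nullptr, nullptr);

    // Mapping typically returns a pointer into the shared allocation on an
    // integrated GPU, so filling it involves no device transfer.
    float* host = (float*)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE, 0,
                                             n * sizeof(float), 0, nullptr, nullptr, nullptr);
    for (size_t i = 0; i < n; ++i)
        host[i] = (float)i;

    // Unmap before enqueueing kernels that read the buffer.
    clEnqueueUnmapMemObject(queue, buf, host, 0, nullptr, nullptr);
    clFinish(queue);

    // ... enqueue kernels that use buf here ...

    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

Whether the map is truly zero-copy depends on the driver; on discrete GPUs the same calls may still stage data through a copy.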
Machine Learning with ArrayFire: Linear Classifiers
Linear classifiers perform classification based on a linear combination of the component features. Some examples of linear classifiers include the Naive Bayes Classifier, Linear Discriminant Analysis, Logistic Regression, and Perceptrons. ArrayFire’s easy-to-use API enables users to write such classifiers from scratch fairly easily. In this post, we show how you can map mathematical equations to ArrayFire code and implement them from scratch.
Naive Bayes Classifier
Perceptron
Naive Bayes Classifier
The naive Bayes classifier is a probabilistic classifier that assumes all the features in a feature vector are independent of each other. This assumption simplifies Bayes’ rule to a simple multiplication of probabilities, as shown below. First we start with the simple Bayes’ rule. $$ p(C_k | x) = \frac{p(C_k)}{p(x)} …
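For reference, under that independence assumption Bayes’ rule factorizes into the product of per-feature likelihoods (a standard statement of the naive Bayes model, not quoted from the post):

$$ p(C_k \mid x_1, \ldots, x_n) \propto p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) $$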
OpenCL on Intel HD / Iris graphics on Linux
Under Windows and Mac the Intel GPU drivers include OpenCL support; however, on Linux, OpenCL on Intel GPUs is implemented through an open source project called Beignet (pronounced like “ben-yay”, a type of French pastry akin to what we would call a “fritter” in English). Below I have written a step-by-step guide on how you can get Beignet running on an Ubuntu 14.10 system which has a 3rd, 4th, or 5th generation Intel processor. Instructions for other variants of Linux will be similar, except for the commands to install the prerequisite packages. There are several little caveats which need to be discussed up front. Foremost, the Beignet project supports the following hardware: There are also a few noteworthy …
Getting Started with the Intel Xeon Phi on Ubuntu 14.04/Linux Kernel 3.13.0
You may already know that the Intel MPSS (Manycore Platform Software Stack) officially supports only the Red Hat and SUSE Linux distros. Using an enterprise distro might be very interesting if your company is running a large server environment or is short on specialized people and relies on the distro’s official support for more complicated tasks. But not all companies use enterprise Linux distributions. Ubuntu has a large share of the Linux distro market (if not the largest). A while back, I needed to set up a machine running Ubuntu 14.04 and MPSS 3.4.x and could not find any documentation on running the newest versions of Ubuntu/Linux kernel/MPSS. In this blog, I will try to document how to get the Intel Xeon Phi running …
Demystifying PTX Code
In my recent post, I showed how to generate PTX files from both CUDA and OpenCL kernels. In this post I will address how a PTX file looks and, more importantly, how to understand all those complicated instructions in a PTX file. I will use the same vector addition kernel from the previous post (the complete code can be found here). For this post, I will focus on the OpenCL PTX file. In a future post I will discuss the differences between the PTX files of OpenCL and CUDA code. Let’s start by looking at the complete PTX code:
//
// Generated by NVIDIA NVVM Compiler
// Compiler built on Sun May 18 …