Making the decision to open source your software is not an easy process. Indeed, here at ArrayFire our choice to release ArrayFire under the open source, commercially friendly, BSD 3-Clause License came only after many hours of consideration and philosophical discussion (e.g. see our CEO’s blog on these topics). Thus far this decision has proven to be strictly beneficial to our company. The impact of third-party contributions Although ArrayFire is primarily developed by our engineers, there are several contributions from other developers. Therefore we feel particularly compelled to elucidate how these contributions have improved the ArrayFire ecosystem. Packaging for Linux and OSX One of the best parts of open source distribution is that your code can be packaged and distributed for …
Explaining FP64 performance on GPUs
Introduction GPUs are really good at doing math. Their Achilles heel is 64-bit double-precision math. GPUs, at least consumer-grade ones, are not built for high-performance FP64. This is because they are targeted at gamers and game developers, who do not really care about high-precision compute, so vendors like NVIDIA and AMD do not cram FP64 compute cores into their GPUs. For example, on a GTX 780 Ti, FP64 performance is 1/24 of FP32. This means that in an ideal case, running the same code with only the float types changed to double types, the single-precision run time would be about 1/24th of the double-precision run time (time(FP32) = time(FP64)/24). Keep in mind, …
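To make the ratio concrete, here is a minimal Python sketch of the ideal-case arithmetic, time(FP32) = time(FP64)/24. The function name, the parameter name, and the 12-second run time are made up for illustration; only the 1/24 figure for the GTX 780 Ti comes from the text, and real code will rarely hit this ideal.

```python
def ideal_fp32_time(fp64_time_s, fp64_rate_fraction=1 / 24):
    """Ideal FP32 run time for a kernel whose FP64 run time is known.

    fp64_rate_fraction is FP64 throughput as a fraction of FP32 throughput
    (1/24 for the GTX 780 Ti quoted above). This assumes a purely
    compute-bound kernel, which is the "ideal case" and nothing more.
    """
    return fp64_time_s * fp64_rate_fraction


# Hypothetical measurement: the double-precision version takes 12 seconds.
fp64_time_s = 12.0
fp32_time_s = ideal_fp32_time(fp64_time_s)
print(f"FP64: {fp64_time_s:.2f} s, ideal FP32: {fp32_time_s:.2f} s")
```

Passing a different `fp64_rate_fraction` models a card with a different FP64:FP32 ratio.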
ArrayFire release cycle
Coming off of our latest release cycle, we think it is a good time to talk about our release policy more openly. In this post we are going to talk about our versioning scheme and the expected release cycle. The versioning scheme of the library will follow the Apache Portable Runtime guidelines. Each release version number will have the format arrayfire-x.y.z. Releases fall broadly into the following categories, based on frequency and scope. Bug fix releases A bug fix release will bump the third digit of the release version. These releases will neither add new functions nor break API and ABI compatibility with the previous feature release. Bug fix releases will be infrequent, …
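As a rough illustration of the arrayfire-x.y.z format, the Python sketch below parses a release name and applies a bug fix bump to the third digit. The helper names `parse_version` and `bump_bugfix` are hypothetical and not part of any ArrayFire tooling; only the bug fix rule described above is modelled.

```python
import re

# Matches names of the form arrayfire-x.y.z, where x, y and z are integers.
VERSION_RE = re.compile(r"^arrayfire-(\d+)\.(\d+)\.(\d+)$")


def parse_version(name):
    """Return (x, y, z) as integers for a name like 'arrayfire-3.0.0'."""
    match = VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"not an arrayfire-x.y.z version: {name!r}")
    return tuple(int(part) for part in match.groups())


def bump_bugfix(name):
    """Bump only the third digit, as a bug fix release does."""
    x, y, z = parse_version(name)
    return f"arrayfire-{x}.{y}.{z + 1}"


print(bump_bugfix("arrayfire-3.0.0"))
```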
Benchmarking parallel vector libraries
There are many open source libraries that implement parallel versions of the algorithms in the C++ Standard Template Library. Inevitably we get asked how ArrayFire compares to the other libraries out in the open. In this post we are going to compare the performance of ArrayFire to that of BoostCompute, HSA-Bolt, Intel TBB and Thrust. The benchmarks cover the following commonly used vector algorithms across three different architectures: reductions, scan, and transform. The following setup was used for benchmarking. The code to reproduce the benchmarks is linked at the bottom of the post. The hardware used for the benchmarks is listed below: NVIDIA Tesla K20, AMD FirePro S10000, Intel Xeon E5-2560v2. Background ArrayFire ArrayFire provides high …
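For readers unfamiliar with the three algorithm families, here is a serial Python reference for what each one computes. It is only a sketch of the semantics using the standard library. It is not the benchmark code and uses none of the libraries being compared, and the input values are arbitrary.

```python
from functools import reduce
from itertools import accumulate
import operator

data = [1, 2, 3, 4]

# Reduction: combine all elements into a single value (here, a sum).
total = reduce(operator.add, data, 0)

# Scan (inclusive prefix sum): element i holds the sum of data[0..i].
prefix_sums = list(accumulate(data, operator.add))

# Transform: apply the same function to every element independently.
squares = [x * x for x in data]

print(total, prefix_sums, squares)
```

The parallel libraries produce the same results but split the work across many threads, which is what the benchmarks measure.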
ArrayFire v3.0 is here!
Today we are pleased to announce the release of ArrayFire v3.0. This new version features major changes to ArrayFire’s visualization library, a new CPU backend, and dense linear algebra for OpenCL devices. It also includes improvements across the board for ArrayFire’s OpenCL backend. A complete list of ArrayFire v3.0 updates and new features can be found in the product Release Notes. With over 8 years of continuous development, the open source ArrayFire library is the top CUDA and OpenCL software library. ArrayFire supports CUDA-capable GPUs, OpenCL devices, and other accelerators. With its easy-to-use API, this hardware-neutral software library is designed for maximum speed without the hassle of writing time-consuming CUDA and OpenCL device code. With ArrayFire’s library functions, developers can maximize …
Feature detection and tracking using ArrayFire
A few weeks ago we added some computer vision functionality to our open source ArrayFire GPU computing library. Specifically, we implemented the FAST feature extractor, BRIEF feature point descriptor, ORB multi-resolution scale invariant feature extractor, and a Hamming distance function. When combined, these functions enable you to find features in videos (or images) and track them between successive frames.
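To show what the Hamming distance step does when matching binary descriptors such as BRIEF or ORB, here is a small pure-Python sketch. It does not use ArrayFire. The function names and the one-byte descriptors are invented for illustration, and real descriptors are much longer.

```python
def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; count those 1s per byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)),
               key=lambda i: hamming_distance(query, candidates[i]))


# Toy 8-bit descriptors from a "previous frame" and one from the "next frame".
previous = [b"\xf0", b"\x0e", b"\xff"]
current = b"\x0f"
print(match_nearest(current, previous))
```

Tracking between frames then amounts to running this nearest-descriptor search for every feature, which is the part a GPU library can parallelize.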
Intel OpenCL performance: 3rd generation hardware
Introduction With Intel CPUs making up nearly 80% of the CPU market and 66% of computers using integrated graphics, one can easily argue that integrated graphics devices represent one of the greatest markets for GPU-accelerated computing. Here at ArrayFire, we have long recognized the potential of these devices and offer built-in support for Intel CPUs, GPUs, and AMD APUs in the OpenCL backend of our ArrayFire GPU computing library. Yet one common theme for debate in the office has been how the hardware performs on different operating systems, with different drivers, across hardware revisions. To answer these questions (and, perhaps, to win some intra-office geek cred) I decided to write a series of blog posts about Intel’s GPU OpenCL performance. In this first installment I will compare the performance …
GTC 2015 ArrayFire Recordings
Missed visiting ArrayFire at GTC this year? We’ve got you covered! You can now check out the recordings of all our GTC 2015 talks and tutorials at your own convenience. Learn about accelerating your code from the best in the business. Talks Real-Time and High Resolution Feature Tracking and Object Recognition Peter Andreas Entschev This session will cover real-time feature tracking and object recognition in high resolution videos using GPUs and productive software libraries including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches were proposed to solve these problems, some of the most important being SIFT, SURF and ORB. Traditionally, these approaches are so computationally …
Using zero-copy buffers on integrated GPUs
One of the most powerful aspects of parallel programming on integrated GPUs is taking advantage of shared memory and caches. The best example of this is sharing common data between the CPU and GPU via zero-copy buffers. This technique lets your program avoid the O(N) cost of copying data to and from the GPU. This feature is particularly useful for applications that deal with real-time data streams, like video processing.
Machine Learning with ArrayFire: Linear Classifiers
Linear classifiers perform classification based on a linear combination of the component features. Some examples of linear classifiers include the Naive Bayes classifier, Linear Discriminant Analysis, Logistic Regression, and Perceptrons. ArrayFire’s easy-to-use API enables users to write such classifiers from scratch fairly easily. In this post, we show how you can map mathematical equations to ArrayFire code and implement them from scratch. Naive Bayes Classifier Perceptron Naive Bayes Classifier The naive Bayes classifier is a probabilistic classifier that assumes all the features in a feature vector are independent of each other. This assumption simplifies Bayes’ rule to a simple multiplication of probabilities, as shown below. First we start with the simple Bayes’ rule. $$ p(C_k | x) = \frac{p(C_k)}{p(x)} …
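The independence assumption can be sketched in a few lines of plain Python. This is not the ArrayFire code from the post. It uses the textbook form of naive Bayes, where the posterior for each class is proportional to the class prior times the product of the per-feature likelihoods. The dictionary layout, the class names, and all probabilities below are made up for illustration.

```python
def naive_bayes_posterior(priors, likelihoods, x):
    """Posterior p(C | x) for each class under the naive Bayes assumption.

    priors:      {class: p(C)}
    likelihoods: {class: [ {value: p(x_i = value | C)} for each feature i ]}
    x:           observed feature values, one per feature
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        # Independence: multiply the per-feature likelihoods together.
        for i, value in enumerate(x):
            p *= likelihoods[cls][i][value]
        joint[cls] = p
    evidence = sum(joint.values())  # p(x), the normalizing constant
    return {cls: p / evidence for cls, p in joint.items()}


# Hypothetical two-class problem with two binary features.
priors = {"spam": 0.5, "ham": 0.5}
likelihoods = {
    "spam": [{1: 0.8, 0: 0.2}, {1: 0.5, 0: 0.5}],
    "ham": [{1: 0.2, 0: 0.8}, {1: 0.5, 0: 0.5}],
}
posterior = naive_bayes_posterior(priors, likelihoods, [1, 0])
print(posterior)
```

With many features, a real implementation would sum log-probabilities instead of multiplying raw probabilities to avoid underflow.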