Benchmarking parallel vector libraries

Pavan ArrayFire, Benchmarks, C/C++, CUDA

There are many open source libraries that implement parallel versions of the algorithms in the C++ Standard Template Library. Inevitably, we get asked how ArrayFire compares to the other libraries out in the open. In this post we compare the performance of ArrayFire to that of BoostCompute, HSA-Bolt, Intel TBB and Thrust. The benchmarks cover the following commonly used vector algorithms across 3 different architectures:

Reductions
Scan
Transform

The following setup was used for benchmarking. The code to reproduce the benchmarks is linked at the bottom of the post. The hardware used for the benchmarks is listed below:

NVIDIA Tesla K20
AMD FirePro S10000
Intel Xeon E5-2560v2

Background

ArrayFire

ArrayFire provides high ...
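For readers unfamiliar with the three algorithm families named in this excerpt, here is a minimal sequential sketch using only the C++ standard library. It is not the benchmark code from the post, and the function names (`reduce_sum`, `inclusive_scan_sum`, `transform_square`) are made up for illustration. The parallel libraries being compared implement the same operations on GPUs and multicore CPUs.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Reduction: collapse a vector to a single value (here, a sum).
inline float reduce_sum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

// Scan: inclusive prefix sum, out[i] = v[0] + ... + v[i].
inline std::vector<float> inclusive_scan_sum(const std::vector<float>& v) {
    std::vector<float> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}

// Transform: apply a function to every element (here, squaring).
inline std::vector<float> transform_square(const std::vector<float>& v) {
    std::vector<float> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](float x) { return x * x; });
    return out;
}
```

For the input {1, 2, 3, 4}, the reduction yields 10, the scan yields {1, 3, 6, 10}, and the transform yields {1, 4, 9, 16}.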

Intel OpenCL performance: 3rd generation hardware

Brian Kloppenborg ArrayFire, OpenCL

Introduction

With Intel CPUs making up nearly 80% of the CPU market and 66% of computers using integrated graphics, one can easily argue that integrated graphics devices represent one of the greatest markets for GPU-accelerated computing. Here at ArrayFire, we have long recognized the potential of these devices and offer built-in support for Intel CPUs, GPUs, and AMD APUs in the OpenCL backend of our ArrayFire GPU computing library. Yet one common theme for debate in the office has been how the hardware performs on different operating systems with different drivers across hardware revisions. To answer these questions (and, perhaps, to win some intra-office geek cred), I decided to write a series of blog posts about Intel's GPU OpenCL performance. In this first installment I will compare the performance ...

Using zero-copy buffers on integrated GPUs

Brian Kloppenborg C/C++, OpenCL

One of the most powerful aspects of parallel programming on integrated GPUs is taking advantage of shared memory and caches. The best example of this is sharing common data between the CPU and GPU via zero-copy buffers. This technique permits your program to avoid the O(N) cost of copying data to and from the GPU. This feature is particularly useful for applications that deal with real-time data streams, like video processing.
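To get a feel for the O(N) copy cost in a video setting, the following back-of-the-envelope sketch computes how many bytes per second would be copied if every frame were sent to the GPU and the result copied back. The frame format (1920x1080, 4 bytes per pixel, 60 frames per second) and the helper names are assumptions chosen for illustration, not figures from the post.

```cpp
#include <cstddef>

// Bytes in one uncompressed frame.
constexpr std::size_t frame_bytes(std::size_t width,
                                  std::size_t height,
                                  std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Bytes moved per second when each frame is copied to the GPU
// and the processed frame is copied back (two copies per frame).
constexpr std::size_t copy_bytes_per_second(std::size_t bytes_per_frame,
                                            std::size_t frames_per_second) {
    return 2 * bytes_per_frame * frames_per_second;
}
```

Under these assumptions, `frame_bytes(1920, 1080, 4)` is 8,294,400 bytes, and `copy_bytes_per_second(8294400, 60)` is 995,328,000 bytes, so roughly 1 GB of copy traffic per second. That is the traffic a zero-copy buffer is meant to avoid.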

ArrayFire: Write once, Run anywhere

Shehzan ArrayFire

One of ArrayFire's biggest features is that code can be written just once and run on a plethora of devices. In this post, we show the output of af::info() from various devices available to us.

Desktop Processors

AMD GPU/CPU (OpenCL)

AMD APU (OpenCL)

Intel CPU (OpenCL)

Intel HD Graphics (OpenCL)

Intel Xeon Phi Coprocessor (OpenCL)

NVIDIA GPUs (CUDA)

NVIDIA GPUs (OpenCL)

Embedded Processors

ARM Mali GPU (OpenCL) #

NVIDIA Tegra K1 (CUDA)

Qualcomm Snapdragon SoC (OpenCL) #

#: Experimental versions. Email technical@arrayfire.com for access. The devices shown above are ones we have in-house for demonstration purposes. This is not an exhaustive list. If you have OpenCL working on ...

ArrayFire Capability Update - July 2014

Oded Android, ArrayFire, C/C++, CUDA, Fortran, JAVA, OpenCL, R

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library with a CPU fallback for when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable both to devices that have OpenCL set up and to devices without it. This is done through the creation of additional backends, and it will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures: NVIDIA GPUs (Tesla, Fermi, and Kepler); AMD's GPUs, CPUs and APUs; Intel's CPUs, GPUs and Xeon Phi Co-Processor; and mobile and embedded devices. As part of this update process we are also looking at extending ArrayFire capabilities to low power systems such ...
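The fallback idea can be sketched generically: keep an ordered list of backends and use the first one that reports itself as available. This is only an illustration of the pattern, not ArrayFire's implementation or API. The `Backend` struct, the `pick_backend` function, and the backend labels are all hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of a backend: a label plus a probe that
// reports whether the backend can be used on this machine.
struct Backend {
    std::string name;
    std::function<bool()> available;
};

// Return the name of the first available backend in preference order,
// or an empty string if none is available.
inline std::string pick_backend(const std::vector<Backend>& preference) {
    for (const Backend& b : preference) {
        if (b.available && b.available()) {
            return b.name;
        }
    }
    return std::string();
}
```

With a preference list of an OpenCL backend whose probe fails followed by a plain CPU backend whose probe succeeds, `pick_backend` returns the CPU backend's name, which is the behaviour the excerpt describes.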

Partners Magnify the SC13 Experience

John ArrayFire, Events

Yesterday, we posted photos from our exhibit. Today was the last day of SC13, and we want to tip our hat to the wonderful partners that magnified our SC13 experience.

Creative Consultants, Mellanox, and Allinea

Creative Consultants ran an ArrayFire demo across several nodes using Mellanox interconnect. The demo was a multi-node, multi-GPU lattice Boltzmann simulation. Allinea also showcased their debugging and profiling tools on the same ArrayFire-based code.

AMD

ArrayFire OpenCL demos were showcased in the AMD exhibit. It was great to see momentum from AMD at SC13 carried over from the previous week's APU13 conference.

Microway

In the photo below, you can see ArrayFire running on Microway's WhisperStation. Microway had prime real estate at the conference and surely every ...

ISC 2013 Keynote by Stephen Pawlowski of Intel

John Computing Trends, Events

Stephen Pawlowski of Intel gave an interesting keynote today at ISC 2013. He continued the theme of yesterday's keynote to address challenges our market faces in getting to exascale computing. Here is a summary of the points he made during his talk:

Getting to exascale by 2020 requires performance improvement of 2x every year
Innovations anticipated include stacked chips and optical layers
DRAM is not scaling with Moore's Law
More power goes into transferring data than into computing
Need to operate transistors near threshold
New materials for DRAM needed. Resistive memory could replace DRAM.
Need to explore both the big die and the small die paths as we approach 2020
Big die path leads to 10 billion transistors on a ...

Are You Getting Left Behind?

John Computing Trends

HPCwire posted a nice article today with trends from IDC on computer processing. These trends fall in line with and corroborate things we've been saying here on this blog. Accelerators (including GPUs and co-processors) are taking off. Are you getting left behind? If you're reading this blog, you're probably at the bleeding edge, but nonetheless here are some interesting excerpts from HPCwire's market report (go read the whole thing): "While they expected to see a jump in coprocessor and accelerator uptake, they were wholly unprepared for the overwhelming positive response to GPUs and new entrants into the market, most notably Intel’s shiny new Phi." "Conway said that while accelerator and coprocessor adoption growth was anticipated, they had no idea that it would ...

Parallel Software Development Trends for Dummies

John Computing Trends

Last month, I posted two articles describing computing trends and why heterogeneous computing will be a significant force in computing for the next decade. Today, I continue that series with an article describing the biggest challenge to continued increases in computing performance: parallel software development.

Biggest Challenge

As I described previously, in order to use an accelerator, software changes must be made. Regular x86-based compilers cannot compile code to run on accelerators without these changes. The amount of software change required varies depending upon the availability of and reliance upon software tools that increase performance and productivity. There are four possible approaches to take advantage of accelerators in heterogeneous computing environments: do-it-yourself, use compilers, use libraries, or use ...

Heterogeneous Computing Trends for Dummies

John Computing Trends

Ten days ago, I posted an article on CPU Processing Trends for Dummies. Today, I continue that series with an article describing the latest major trend in computing, namely Heterogeneous Computing.

The Point

The point of these articles is to paint the high-level picture for trends in computer processing. I hope this bigger picture will help summarize things for those who do not breathe computer processors and technical software on a daily basis. Over the last 20 years, big gains in computer processing have been defined by increases in CPU clock speeds, then by increases in the number of CPU cores. The next 10+ years will be defined by heterogeneous computing.

Heterogeneous Computing

So let's start with a definition: Heterogeneous ...