Using GPUs in KVM Virtual Machines

Pavan Hardware, Infrastructure, Open Source 2 Comments

Introduction

A couple of months ago, I began investigating GPU passthrough on my workstation to test ArrayFire on different operating systems. Around the same time, we at ArrayFire found ourselves with a few surplus GPUs. After the success of my virtualization efforts, we decided to build a virtualized GPU server to put these GPUs to use. Building a virtualized GPU server alleviated one of the pain points at our company: we no longer need to swap GPUs or hard disks to test a new environment. To maximize the number of GPUs we can put in a machine, we ended up getting a Quantum TXR430-0768R from Exxact Computing, which comes in a 4U form factor and supports up to 8x double-width GPUs. ...

Benchmarking parallel vector libraries

Pavan ArrayFire, Benchmarks, C/C++, CUDA Leave a Comment

There are many open source libraries that implement parallel versions of the algorithms in the C++ Standard Template Library. Inevitably, we get asked how ArrayFire compares to the other libraries out in the open. In this post we compare the performance of ArrayFire to that of BoostCompute, HSA-Bolt, Intel TBB and Thrust. The benchmarks include the following commonly used vector algorithms across 3 different architectures:

Reductions
Scan
Transform

The following setup was used for benchmarking. The code to reproduce the benchmarks is linked at the bottom of the post. The hardware used for the benchmarks is listed below:

NVIDIA Tesla K20
AMD FirePro S10000
Intel Xeon E5-2560v2

Background

ArrayFire

ArrayFire provides high ...
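For readers unfamiliar with the three vector algorithms named in this excerpt, here is a minimal serial sketch of what each one computes. It is plain Python written for illustration only: it is not the benchmark code from the post and uses none of the libraries being compared. The function names `reduction`, `scan`, and `transform` are hypothetical, and sum and square are chosen here simply as example operations.

```python
from functools import reduce
from itertools import accumulate
import operator


def reduction(values):
    """Combine all elements into a single value (here: their sum)."""
    return reduce(operator.add, values, 0)


def scan(values):
    """Inclusive prefix sum: output[i] is the sum of values[0..i]."""
    return list(accumulate(values, operator.add))


def transform(values, fn):
    """Apply fn to every element independently of the others."""
    return [fn(v) for v in values]


data = [3, 1, 4, 1, 5]
total = reduction(data)                      # one value for the whole vector
prefix = scan(data)                          # running totals, same length as data
squares = transform(data, lambda v: v * v)   # element-wise result

print(total, prefix, squares)
```

Transform is the easiest of the three to parallelize because each output depends on a single input, whereas reduction and scan must combine partial results across elements, which is why the three are commonly benchmarked separately.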

ArrayFire: Write once, Run anywhere

Shehzan ArrayFire 2 Comments

One of ArrayFire's biggest features is the ability for code to be written just once and run on a plethora of devices. In this post, we show the output of af::info() on various devices available to us.

Desktop Processors

AMD GPU/CPU (OpenCL)


Intel CPU (OpenCL)

Intel HD Graphics (OpenCL)

Intel Xeon Phi Coprocessor (OpenCL)



Embedded Processors

ARM Mali GPU (OpenCL) #


Qualcomm Snapdragon SoC (OpenCL) #

#: Experimental versions. Email for access.

The devices shown above are the ones we have in-house for demonstration purposes. This is not an exhaustive list. If you have OpenCL working on ...

ArrayFire Capability Update - July 2014

Oded Android, ArrayFire, C/C++, CUDA, Fortran, JAVA, OpenCL, R 1 Comment

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library with a CPU fallback for when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable both to devices that have OpenCL set up and to devices without it. This is done through the creation of additional backends, which will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures:

NVIDIA GPUs (Tesla, Fermi, and Kepler)
AMD's GPUs, CPUs and APUs
Intel's CPUs, GPUs and Xeon Phi Co-Processor
Mobile and Embedded devices

As part of this update process we are also looking at extending ArrayFire capabilities to low power systems such ...
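The fallback idea described in this excerpt can be sketched as a simple preference-ordered selection. This is an illustrative sketch only, not ArrayFire's API or its actual selection logic: the function `pick_backend`, the backend names, and the preference order are all assumptions made for the example.

```python
def pick_backend(preferred, available):
    """Return the first backend in `preferred` that is present in `available`.

    Both arguments are hypothetical: `preferred` is an ordered list of
    backend names, `available` is the set detected on the current machine.
    """
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("no usable backend found")


# Assumed preference order: accelerators first, plain CPU last.
PREFERRED = ["cuda", "opencl", "cpu"]

# A machine with OpenCL drivers installed uses OpenCL.
with_opencl = pick_backend(PREFERRED, {"opencl", "cpu"})

# A machine without OpenCL drivers falls back to the CPU backend.
without_opencl = pick_backend(PREFERRED, {"cpu"})

print(with_opencl, without_opencl)
```

Keeping the CPU backend last in the list means user code never has to change: the same call succeeds on every machine, and only the chosen backend differs.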

Remote Off-Screen Rendering with OpenGL

Shehzan ArrayFire, OpenGL 18 Comments

At ArrayFire, we constantly encounter projects that require OpenGL and run on a remote server that does not have a display. In this blog, we have compiled a list of steps that users can follow to run full profile OpenGL applications over SSH on remote systems without a display.

A few notes before we get started:

This blog is limited to computers running distributions of Linux.
The first part of the blog, which shows the configuration of the xorg.conf file, is limited to NVIDIA cards (with display).
AMD cards support this capability without modification of the xorg.conf file. However, we have not been able to get a comprehensive list of supported devices.

Requirements

You will need access to the remote ...

APU 2013 – Day 3 Recap

John Computing Trends, Events, OpenCL Leave a Comment

"Big announcement here at #APU13! AMD CTO, Mark Papermaster, just announced 2 additions to the 2014 Mobile APU roadmap" (AMD (@AMD), November 13, 2013)

Today was the final day of AMD's APU 2013 conference. Today's sessions focused mostly on gaming topics, so they were not as relevant to technical computing as yesterday's. However, the mobile product announcement from AMD in the tweet above was interesting. OpenCL is just as important in mobile computing as it is in HPC. Both ends of the spectrum have a need for speed and can achieve it through great data parallelism. AMD is looking to make better inroads into mobile computing with these APU announcements. Overall, APU 2013 was a fantastic ...

APU 2013 - Day 1 Recap

John Events, OpenCL Leave a Comment

AMD's APU 2013 kicked off today with keynotes and a welcome reception. The developer summit is themed as the epicenter of heterogeneous computing. AMD has a world class CPU and a world class GPU and is pushing the industry forward by combining both of those devices into the same chip, the APU. AMD's APUs are programmable via OpenCL, the industry standard for heterogeneous development. AMD is also leading the way with standards for Heterogeneous System Architecture (HSA). APU13 will have many technical sessions, keynotes, and demos around OpenCL and HSA. We are at the APU conference demoing ArrayFire acceleration on two of AMD's newest hardware offerings:

A machine with the latest AMD Radeon R9 290X discrete GPU
A machine with the ...

clMath: An Open Source BLAS and FFT Library for OpenCL

Scott Announcements, OpenCL Leave a Comment

If you're reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire's higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a big boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute back to the OpenCL community by open sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have used AMD's OpenCL libraries in the past within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to now join AMD and the open source community in maintaining and improving these ...

Parallel Software Development Trends for Dummies

John Computing Trends Leave a Comment

Last month, I posted two articles describing computing trends and why heterogeneous computing will be a significant force in computing for the next decade. Today, I continue that series with an article describing the biggest challenge to continued increases in computing performance: parallel software development.

Biggest Challenge

As I described previously, in order to use an accelerator, software changes must be made. Regular x86-based compilers cannot compile code to run on accelerators without these needed changes. The amount of software change required varies depending upon the availability of and reliance upon software tools that increase performance and productivity. There are four possible approaches to take advantage of accelerators in heterogeneous computing environments: do-it-yourself, use compilers, use libraries, or use ...

Heterogeneous Computing Trends for Dummies

John Computing Trends Leave a Comment

Ten days ago, I posted an article on CPU Processing Trends for Dummies. Today, I continue that series with an article describing the latest major trend in computing, namely Heterogeneous Computing.

The Point

The point of these articles is to paint the high-level picture for trends in computer processing. I hope this bigger picture will help summarize things for those who do not breathe computer processors and technical software on a daily basis. Over the last 20 years, big gains in computer processing have been defined by increases in CPU clock speeds, then by increases in the number of CPU cores. The next 10+ years will be defined by heterogeneous computing.

Heterogeneous Computing

So let's start with a definition: Heterogeneous ...