Using GPUs in KVM Virtual Machines

Pavan Yalamanchili | Hardware & Infrastructure, Open Source

Introduction

A couple of months ago, I began investigating GPU passthrough on my workstation to test ArrayFire on different operating systems. Around the same time, we at ArrayFire found ourselves with a few surplus GPUs. Since my virtualization efforts had gone well, we decided to build a virtualized GPU server to put these GPUs to use. Building it alleviated one of the pain points at our company: we no longer need to swap GPUs or hard disks to test a new environment. To maximize the number of GPUs we can put in a machine, we ended up getting a Quantum TXR430-0768R from Exxact Computing, which comes in a 4U form factor and supports up to 8x double-width GPUs. …

Benchmarking parallel vector libraries

Pavan Yalamanchili | ArrayFire, Benchmarks, C/C++, CUDA

There are many open source libraries that implement parallel versions of the algorithms in the C++ Standard Template Library. Inevitably, we get asked how ArrayFire compares to the other libraries out in the open. In this post we compare the performance of ArrayFire to that of BoostCompute, HSA-Bolt, Intel TBB and Thrust. The benchmarks cover the following commonly used vector algorithms across 3 different architectures: reductions, scan, and transform. The setup used for the benchmarks is described below, and the code to reproduce them is linked at the bottom of the post. The hardware used for the benchmarks: NVIDIA Tesla K20, AMD FirePro S10000, Intel Xeon E5-2560v2.

Background

ArrayFire

ArrayFire provides high …
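For reference, the three algorithm classes named in this excerpt (reduction, scan, transform) can be sketched serially with nothing but the C++ standard library. This is a minimal illustration of what each operation computes, not the benchmark code from the post, and it uses none of the compared libraries. The function names reduce_sum, inclusive_scan_sum, and transform_square are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Reduction: collapse all elements into a single value (here, their sum).
// Hypothetical helper name, for illustration only.
int reduce_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Scan: inclusive prefix sum, where out[i] = v[0] + ... + v[i].
// Hypothetical helper name, for illustration only.
std::vector<int> inclusive_scan_sum(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}

// Transform: apply a unary function to every element independently
// (here, squaring). Hypothetical helper name, for illustration only.
std::vector<int> transform_square(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}
```

A parallel library would typically expose the same three operations with a similar shape; a serial version like this can serve as a correctness baseline when comparing results across libraries.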

ArrayFire: Write once, Run anywhere

Shehzan Mohammed | ArrayFire

One of ArrayFire’s biggest features is the ability for code to be written just once and run on a plethora of devices. In this post, we show the outputs of af::info() from various devices available to us.

Desktop Processors

AMD GPU/CPU (OpenCL)

ArrayFire v2.1 (OpenCL, 64-bit Linux, build 4b9115c)
License: Standalone (/home/pavan/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: AMD Accelerated Parallel Processing, Driver: 1526.3 (VM)
[0]: Tahiti, 2864 MB, OpenCL Version: 1.2
 1 : AMD FX(tm)-8350 Eight-Core Processor, 7953 MB, OpenCL Version: 1.2
Compute Device: [0]

AMD APU (OpenCL)

ArrayFire v2.1 (OpenCL, 64-bit Linux, build 586ef59)
License: Standalone (/home/arrayfire/.arrayfire.lic)
Addons: MGL4, DLA, SLA
Platform: AMD Accelerated Parallel Processing, Driver: 1445.5 (VM)
[0]: Spectre, 624 MB, OpenCL Version: 1.2
 1 : AMD …

ArrayFire Capability Update – July 2014

Oded Green | Android, ArrayFire, C/C++, CUDA, Fortran, Java, OpenCL, R

In response to user requests for additional ArrayFire capabilities, we have decided to extend the library with a CPU fallback for when OpenCL drivers for CPUs are not available. This means that ArrayFire code will be portable both to devices that have OpenCL set up and to devices without it. This is done through the creation of additional backends, and it will allow ArrayFire users to write their code once and have it run on multiple systems. We currently support the following systems and architectures: NVIDIA GPUs (Tesla, Fermi, and Kepler); AMD’s GPUs, CPUs and APUs; Intel’s CPUs, GPUs and Xeon Phi Co-Processor; and mobile and embedded devices. As part of this update process we are also looking at extending ArrayFire capabilities to low power systems such …

Remote Off-Screen Rendering with OpenGL

Shehzan Mohammed | ArrayFire, OpenGL

At ArrayFire, we constantly encounter projects that require OpenGL and run on a remote server that does not have a display. In this post, we have compiled a list of steps for running full profile OpenGL applications over SSH on remote systems without a display. A few notes before we get started. This post is limited to computers running distributions of Linux. The first part, which shows the configuration of the xorg.conf file, is limited to NVIDIA cards (with display). AMD cards support this capability without modifying the xorg.conf file. However, we have not been able to get a comprehensive list of supported devices.

Requirements

You will need access to the remote …

APU 2013 – Day 3 Recap

John Melonakos | Computing Trends, Events, OpenCL

"Big announcement here at #APU13! AMD CTO, Mark Papermaster, just announced 2 additions to the 2014 Mobile APU roadmap http://t.co/sWHMhb9AAe" (AMD (@AMD), November 13, 2013)

Today was the final day of AMD’s APU 2013 conference. Today's sessions focused mostly on gaming topics, so they were not as relevant to technical computing as yesterday's. However, the mobile product announcement from AMD in the tweet above was interesting. OpenCL is just as important in mobile computing as it is in HPC. Both ends of the spectrum have a need for speed and can achieve it through data parallelism. AMD is looking to make better inroads into mobile computing with these APU announcements. Overall, APU 2013 was a fantastic …

APU 2013 – Day 1 Recap

John Melonakos | Events, OpenCL

AMD’s APU 2013 kicked off today with keynotes and a welcome reception. The developer summit is themed as the epicenter of heterogeneous computing. AMD has a world class CPU and a world class GPU and is pushing the industry forward by combining both of those devices into the same chip, the APU. AMD’s APUs are programmable via OpenCL, the industry standard for heterogeneous development. AMD is also leading the way with standards for Heterogeneous System Architecture (HSA). APU13 will have many technical sessions, keynotes, and demos around OpenCL and HSA. We are at the APU conference demoing ArrayFire acceleration on two of AMD’s newest hardware offerings: a machine with the latest AMD Radeon R9 290X discrete GPU, and a machine with the …

clMath: An Open Source BLAS and FFT Library for OpenCL

Scott | Announcements, OpenCL

If you’re reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire’s higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a significant boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute to the OpenCL community by open-sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have previously used AMD’s OpenCL libraries within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to join AMD and the open-source community in maintaining and improving these libraries for the benefit of all. …

Parallel Software Development Trends for Dummies

John Melonakos | Computing Trends

Last month, I posted two articles describing computing trends and why heterogeneous computing will be a significant force in computing for the next decade. Today, I continue that series with an article describing the biggest challenge to continued increases in computing performance: parallel software development.

Biggest Challenge

As I described previously, in order to use an accelerator, software changes must be made. Regular x86-based compilers cannot compile code to run on accelerators without these changes. The amount of software change required varies depending upon the availability of and reliance upon software tools that increase performance and productivity. There are four possible approaches to take advantage of accelerators in heterogeneous computing environments: do-it-yourself, use compilers, use libraries, or use …

Heterogeneous Computing Trends for Dummies

John Melonakos | Computing Trends

Ten days ago, I posted an article on CPU Processing Trends for Dummies. Today, I continue that series with an article describing the latest major trend in computing, namely Heterogeneous Computing.

The Point

The point of these articles is to paint the high-level picture for trends in computer processing. I hope this bigger picture will help summarize things for those who do not breathe computer processors and technical software on a daily basis. Over the last 20 years, big gains in computer processing have been defined by increases in CPU clock speeds, then by increases in the number of CPU cores. The next 10+ years will be defined by heterogeneous computing.

Heterogeneous Computing

So let’s start with a definition: Heterogeneous …