APU 2013 – Day 2 Recap

John Melonakos | Computing Trends, Events, OpenCL

Today was the first full day of AMD’s APU 2013 conference. It was a whirlwind of heterogeneous computing. Three particularly salient points stood out to us from the morning keynotes. Mike Muller, CTO at ARM, talked about heterogeneous computing. He put it nicely: “Heterogeneous computing is the future. It has also been our past, but we didn’t notice because a few shiny companies overshadowed everything else.” That is a great way to describe it. The future of heterogeneous computing involves the rise in importance of non-x86 processors. Throwing a few more MHz at a CPU is no longer enough to satisfy computational demands. Nandini Ramani, VP at Oracle, talked about the importance of Java for heterogeneous computing. She pointed …

APU 2013 – Day 1 Recap

John Melonakos | Events, OpenCL

AMD’s APU 2013 kicked off today with keynotes and a welcome reception. The developer summit is billed as the epicenter of heterogeneous computing. AMD has a world-class CPU and a world-class GPU, and it is pushing the industry forward by combining both of those devices on the same chip, the APU. AMD’s APUs are programmable via OpenCL, the industry standard for heterogeneous development. AMD is also leading the way with standards for Heterogeneous System Architecture (HSA). APU13 will have many technical sessions, keynotes, and demos around OpenCL and HSA. We are at the APU conference demoing ArrayFire acceleration on two of AMD’s newest hardware offerings: a machine with the latest AMD Radeon R9 290X discrete GPU, and a machine with the …

ArrayFire v2.0 Release Candidate Now Available for Download

Aaron Taylor | Announcements, ArrayFire, CUDA, OpenCL

ArrayFire v2.0 is now available for download. The second iteration of our free, fast, and simple GPU library now supports both CUDA and OpenCL devices.

Major Updates
- ArrayFire now works on OpenCL enabled devices
- New and improved documentation
- Optimized for new GPUs: NVIDIA Kepler (K20) and AMD Tahiti (7970)

New in ArrayFire OpenCL
- Same APIs as ArrayFire CUDA version
- Supports both Linux and Windows
- Just In Time Compilation (JIT) of kernels
- Parallel for: gfor
- Accelerated algorithms for image processing, signal processing, data analysis and statistics, visualization, and more

New in ArrayFire CUDA
- New signal and image processing functions
- Faster transpose and matrix multiplication
- Better debugging support for GDB and Visual Studio
- Bug fixes to make the overall experience better

For a more complete list of the …
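To give a feel for the unified API, here is a minimal sketch of our own (not taken from the release notes) that builds against either backend unchanged. The calls below follow the ArrayFire C++ API; exact spellings in the v2.0 release may differ slightly from current documentation.

    // Minimal sketch: the same source builds against the CUDA or the OpenCL
    // flavor of ArrayFire v2.0, since the two share the same API.
    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        af::info();                          // print device and driver details

        af::array A = af::randu(512, 512);   // random single-precision matrices
        af::array B = af::randu(512, 512);

        af::array C = af::matmul(A, B);      // accelerated matrix multiply
        af::array F = af::fft(A);            // FFT along the first dimension

        printf("sum(C) = %g\n", af::sum<float>(C));
        return 0;
    }

Which backend does the work is decided when you link; the application code itself does not change.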

clMath: An Open Source BLAS and FFT Library for OpenCL

Scott | Announcements, OpenCL

If you’re reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire’s higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a significant boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute to the OpenCL community by open-sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have previously used AMD’s OpenCL libraries within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to join AMD and the open-source community in maintaining and improving these libraries for the benefit of all. …

ArrayFire Examples (Part 7 of 8) – PDE

ArrayFire | ArrayFire, CUDA

This is the seventh in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the pde/ directory. In these examples, my machine has the following configuration:

    ArrayFire v1.9.1 (build XXXXXXX) by AccelerEyes (64-bit Linux)
    License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    CUDA toolkit 5.0, driver 319.17
    GPU0 Tesla K20c, 5120 MB, Compute 3.5 (current)
    GPU1 Tesla C2075, 6144 MB, Compute 2.0
    GPU2 Tesla C1060, 4096 MB, Compute 1.3
    Display Device: GPU0 Tesla K20c
    Memory Usage: 5044 MB free (5120 MB total)

The following are examples of formulating partial differential equations (PDEs), which are commonly used to build computer models involving several variables. In these examples, …
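To give a flavor of how such examples are written (an illustration of the style, not one of the shipped pde/ examples), an explicit finite-difference step for the 2D heat equation can be expressed with whole-array operations, so every grid point updates on the GPU at once:

    // Illustrative sketch: explicit finite-difference update for the 2D heat
    // equation using ArrayFire array operations (not a shipped example).
    #include <arrayfire.h>

    int main() {
        const int n = 256;
        const float k = 0.2f;                  // dt * alpha / dx^2 (assumed stable value)
        af::array u = af::randu(n, n);         // initial temperature field

        for (int iter = 0; iter < 1000; ++iter) {
            // 5-point Laplacian built from circular shifts of the grid
            af::array lap = af::shift(u,  1, 0) + af::shift(u, -1, 0)
                          + af::shift(u,  0, 1) + af::shift(u,  0, -1)
                          - 4.0f * u;
            u = u + k * lap;                   // explicit Euler step
        }
        u.eval();
        return 0;
    }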

ArrayFire Examples (Part 6 of 8) – Multiple GPUs

ArrayFire | ArrayFire, CUDA

This is the sixth in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the multi_gpu/ directory. In these examples, my machine has the following configuration:

    ArrayFire v1.9.1 (build XXXXXXX) by AccelerEyes (64-bit Linux)
    License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    CUDA toolkit 5.0, driver 319.17
    GPU0 Tesla K20c, 5120 MB, Compute 3.5 (current)
    GPU1 Tesla C2075, 6144 MB, Compute 2.0
    GPU2 Tesla C1060, 4096 MB, Compute 1.3
    Memory Usage: 4935 MB free (5120 MB total)

Note that the GPUs in my machine, ordered from fastest to slowest, are the K20c, C2075, and C1060. ArrayFire is capable of multi-GPU management. This capability becomes useful for benchmarking …
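The typical pattern is to select a device and then let every subsequent array live on it. Below is a rough sketch of cycling independent work across all GPUs; it assumes the device-selection names from the v1.x/v2.x API, devicecount() and deviceset(), which later ArrayFire releases renamed to getDeviceCount() and setDevice():

    // Sketch of spreading independent work across every GPU in the machine.
    // Assumes the older ArrayFire names devicecount()/deviceset(); newer
    // releases call these getDeviceCount()/setDevice().
    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        int ngpus = af::devicecount();
        for (int dev = 0; dev < ngpus; ++dev) {
            af::deviceset(dev);                // subsequent arrays live on this GPU
            af::array A = af::randu(2048, 2048);
            af::array B = af::matmul(A, A);
            B.eval();                          // force the computation on the selected device
            printf("device %d finished\n", dev);
        }
        return 0;
    }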

ArrayFire Examples (Part 5 of 8) – Machine Learning

ArrayFire | ArrayFire, CUDA

This is the fifth in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the machine_learning/ directory. In these examples, my machine has the following configuration:

    ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Mac OSX)
    License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    CUDA toolkit 5.0, driver 304.54
    GPU0 GeForce GT 650M, 1024 MB, Compute 3.0 (single,double)
    Display Device: GPU0 GeForce GT 650M
    Memory Usage: 245 MB free (1024 MB total)…

1. K-Means Clustering – kmeans.cpp

Figure 1

This is an example of the k-means clustering algorithm, a data mining technique that partitions the given data into groups by their similarities. All you need to …
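For readers unfamiliar with the algorithm itself, the loop below is a compact, CPU-only illustration of the k-means idea (ours, not the shipped kmeans.cpp): alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, shown on 1-D data for brevity.

    // Compact k-means illustration on 1-D data (not the shipped kmeans.cpp).
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    int main() {
        const int n = 1000, k = 3, iters = 20;
        std::vector<float> x(n), mu(k);
        std::vector<int> label(n);
        for (int i = 0; i < n; ++i) x[i] = rand() / (float)RAND_MAX;  // sample data
        for (int j = 0; j < k; ++j) mu[j] = x[j];                     // seed centroids

        for (int it = 0; it < iters; ++it) {
            // assignment step: nearest centroid for each point
            for (int i = 0; i < n; ++i) {
                int best = 0;
                for (int j = 1; j < k; ++j)
                    if (std::fabs(x[i] - mu[j]) < std::fabs(x[i] - mu[best])) best = j;
                label[i] = best;
            }
            // update step: move each centroid to the mean of its cluster
            std::vector<float> sum(k, 0.0f);
            std::vector<int>   cnt(k, 0);
            for (int i = 0; i < n; ++i) { sum[label[i]] += x[i]; ++cnt[label[i]]; }
            for (int j = 0; j < k; ++j) if (cnt[j]) mu[j] = sum[j] / cnt[j];
        }
        for (int j = 0; j < k; ++j) printf("centroid %d: %f\n", j, mu[j]);
        return 0;
    }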

Solution to NVIDIA Toolkit Installation Error for Ubuntu 12.10 [Driver: Installation Failed]

ArrayFire | ArrayFire, CUDA

Driver: Installation Failed

You may find this error message while trying to set up the NVIDIA CUDA Toolkit in Ubuntu. I found it when I was installing the toolkit for ArrayFire.

[1] CUDA Toolkit Installation

1. Download the CUDA Toolkit from the link.
2. Extract the .run file to a location:
   sudo sh cuda_5.0.35_linux_64_ubuntu11.10-1.run --extract <location>
3. Exit the X session (press Ctrl+Alt+F1) and stop the display manager with the following command:
   sudo stop lightdm
4. cd to that location; there are now run files named samples*, devdriver*, and cudatoolkit*.
5. Install devdriver (only if the NVIDIA driver is not already installed):
   sudo sh devdriver_5.0_linux_64_304.54.run
6. Install cudatoolkit:
   sudo sh cudatoolkit-5.0.35_linux_64_ubuntu11.10.run

In the end, when it asks …

Beamforming with ArrayFire

Scott | ArrayFire, Case Studies, CUDA

Alessandro Savoia and researchers at Università degli Studi Roma Tre have achieved an order of magnitude improvement in the performance of a beamforming application using ArrayFire for GPU acceleration with CUDA-capable NVIDIA GPUs. This application involves conventional beamforming. The steps include applying a time delay to each signal vector, summing across all vectors, and processing the result. Processing includes demodulation, envelope extraction, and logarithmic compression. ArrayFire’s functions for shifting, interpolation, and filtering made it practical to accelerate this application on GPUs and significantly reduced development time. Alessandro’s benchmarks show that a CPU-only version ran at only 1 frame/sec, while the ArrayFire-accelerated version ran at 10-20 frames/sec, depending on the dataset. Alessandro and his team are looking forward to …
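To make the pipeline concrete, here is a rough delay-and-sum sketch in the ArrayFire style. It is our illustration of the steps above, not the researchers' code; the rf matrix, the delays vector, and the simplified envelope step are all assumptions made for the example.

    // Rough delay-and-sum beamforming sketch (illustration only).
    // Assumes rf is a samples-x-channels matrix and delays[c] is the integer
    // sample delay for channel c; envelope extraction is simplified to a
    // magnitude followed by dB compression.
    #include <arrayfire.h>
    #include <vector>

    af::array beamform(const af::array& rf, const std::vector<int>& delays) {
        const int channels = (int)rf.dims(1);
        af::array summed = af::constant(0, rf.dims(0));      // accumulator trace

        for (int c = 0; c < channels; ++c) {
            // apply the per-channel time delay, then sum across channels
            summed += af::shift(rf(af::span, c), delays[c]);
        }

        af::array env = af::abs(summed);                     // crude envelope
        return 20.0f * af::log10(env + 1e-6f);               // log compression (dB)
    }

    int main() {
        af::array rf = af::randu(2048, 64);                  // stand-in RF data
        std::vector<int> delays(64, 0);                      // placeholder delays
        af::array img = beamform(rf, delays);
        img.eval();
        return 0;
    }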

ArrayFire Examples (Part 4 of 8) – Image Processing

ArrayFire | ArrayFire, CUDA

This is the fourth in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the image_processing/ directory. In these examples, my machine has the following configuration:

    ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Windows)
    License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    CUDA toolkit 5.0, driver 306.94
    GPU0 GeForce GT 650M, 2048 MB, Compute 3.0 (single,double)
    Display Device: GPU0 GeForce GT 650M
    Memory Usage: 1981 MB free (2048 MB total)…

Image Demo

The purpose of this example is to show how to do some common image manipulations. The method channel_split shows how easily multi-dimensional arrays can be subdivided:

    // Split a MxNx3 image into 3 separate channel …
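The snippet above is cut off in this excerpt; as a stand-in, the following minimal sketch of ours (with a random array in place of a loaded image) shows the same idea of slicing an MxNx3 array into its three channel planes:

    // Minimal channel-split sketch (illustration, not the original example).
    #include <arrayfire.h>

    int main() {
        af::array img = af::randu(480, 640, 3);      // stand-in for an RGB image
        af::array r = img(af::span, af::span, 0);    // first channel plane
        af::array g = img(af::span, af::span, 1);    // second channel plane
        af::array b = img(af::span, af::span, 2);    // third channel plane
        r.eval(); g.eval(); b.eval();
        return 0;
    }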