This past week marked the passing of Leonard Nimoy. Here at ArrayFire, we are deeply saddened and touched by his death. While Mr. Nimoy was not himself a scientist, mathematician, or programmer, he portrayed a character who embodied the best of all of those professions. Major news outlets have given Mr. Nimoy a fitting send-off celebrating his remarkable lifelong career; we would rather share how he specifically inspired us to pursue our careers and why we toil under the ArrayFire banner. As everyone (hopefully) knows, Leonard Nimoy portrayed the character of Spock on Star Trek. Logical, brilliant, and supposedly emotionless, Spock served as friend and counterbalance to Kirk – the brash, head-strong, …
ArrayFire Benchmarks: AMD Kaveri vs Intel Haswell Part 1
We have had queries in the past requesting benchmarks of the integrated GPUs on Intel and AMD processors. This post is a modest attempt to answer those questions. Here, we focus on GPU benchmarks of the AMD A10-7850K APU and the Intel i7-4790K (HD Graphics 4600) for the following ArrayFire functions: bilateral filter, erosion/dilation, 2D convolution, 2D fast Fourier transform, median filter, resize, rotate, scan of a 1D array, reduction of a 1D array, sort, and matrix transpose. Remarks: For most of the benchmarks, the AMD APU outperformed the Intel system. We believe we can get more performance from the Intel system by modifying the kernels to use vector operations, which should increase resource utilization. Keep an eye …
The ArrayFire Blog’s Best of 2014
The year 2014 was a big one for us! Before we get too far into 2015, we thought we’d share the most popular posts of 2014. So, without further ado, we give you the TOP TEN ARRAYFIRE BLOG POSTS OF 2014: 1. Getting Started with OpenCL on Android: In which we review how to do image processing on a camera feed on Android devices using OpenCL. 2. Image Processing Benchmarks on NVIDIA Jetson TK1: In which we look at benchmarks of the following ArrayFire image processing functions on an ARM device: erosion/dilation, median filter, resize, histogram, bilateral filter, and convolution. 3. OpenCL on Mobile Devices: In which we share a consolidated list of OpenCL supported Android devices. 4. Quest for the Smallest OpenCL Program: In …
ArrayFire Open Source Buzz
Over the weekend we celebrated the month-iversary of ArrayFire going open source. A month later, we’re still pumped about this move, and the response from the parallel computing community has been tremendous. We thought we’d share some of our favorite ArrayFire buzz from the last month. On the day of the release, we watched as the ArrayFire open source release steadily climbed Hacker News, eventually landing in the number three spot! Admittedly, it’s hard to compete with a comet landing. With eager eyes, we followed the rise of our GitHub repository’s star count to an incredible 860 stars. We received shout-outs from several major blogs, including Phoronix, insideHPC, and HPCWire. In the AMD Developer Blog, Brent Hollingsworth wrote “On the AMD side, we have been very impressed …
Conway’s Game of Life using ArrayFire
Conway’s Game of Life is a popular zero-player cellular automaton devised by John Horton Conway in 1970. The game is fun to watch: the player sets the initial configuration and then observes how it evolves. Each cell has two states, live or dead, and four simple rules determine each cell’s state in the next generation: 1. Any live cell with fewer than two live neighbours dies, as if caused by under-population. 2. Any live cell with two or three live neighbours lives on to the next generation. 3. Any live cell with more than three live neighbours dies, as if by overcrowding. 4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction. From a programmer’s …
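The four rules translate directly into code. The following is a minimal sketch in plain C++ (not ArrayFire code, and not necessarily how the original post implements it) of one generation update on a fixed-size grid. It assumes cells outside the grid are dead; a wrap-around (toroidal) boundary is an equally common choice. The names `Grid`, `live_neighbours`, and `step` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A grid of cells, indexed [row][col]: 1 = live, 0 = dead.
using Grid = std::vector<std::vector<int>>;

// Counts the live neighbours of cell (r, c). Cells outside the grid are
// treated as dead (no wrap-around); this boundary choice is an assumption.
int live_neighbours(const Grid& g, int r, int c) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    int count = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;  // skip the cell itself
            const int nr = r + dr;
            const int nc = c + dc;
            if (nr >= 0 && nr < rows && nc >= 0 && nc < cols) count += g[nr][nc];
        }
    }
    return count;
}

// Computes the next generation by applying the four rules to every cell.
Grid step(const Grid& g) {
    assert(!g.empty() && !g[0].empty());
    Grid next(g.size(), std::vector<int>(g[0].size(), 0));
    for (std::size_t r = 0; r < g.size(); ++r) {
        for (std::size_t c = 0; c < g[r].size(); ++c) {
            const int n = live_neighbours(g, static_cast<int>(r), static_cast<int>(c));
            if (g[r][c] == 1) {
                // Rules 1-3: a live cell survives only with 2 or 3 live neighbours.
                next[r][c] = (n == 2 || n == 3) ? 1 : 0;
            } else {
                // Rule 4: a dead cell with exactly 3 live neighbours becomes live.
                next[r][c] = (n == 3) ? 1 : 0;
            }
        }
    }
    return next;
}
```

A horizontal "blinker" of three live cells turns vertical after one call to `step` and returns to horizontal after a second. Counting the eight neighbours is equivalent to convolving the grid with a 3×3 kernel of ones whose centre is zero, which is one reason the update maps well onto array libraries and GPUs.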
Triangle Counting in Graphs on the GPU (Part 3)
In this post, I will wrap up our work on triangle counting in graphs on the GPU using CUDA. The two previous posts are Part 1 and Part 2: the first introduces the significance of the problem, and the second explains the algorithms that we used in our solution. This work was presented in finer detail at the Workshop on Irregular Applications: Architectures and Algorithms, held as part of Supercomputing 2014. The full report can be found here. In the previous posts, we discussed how the performance of triangle counting depends on both the algorithm and the CUDA kernel. Our implementation gives the data scientist control over several important parameters: Number of …
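Part 2 covers the algorithms we actually used. As a simpler point of reference, here is a serial C++ sketch of one common approach, intersecting sorted adjacency lists; it is not our CUDA kernel. It assumes an undirected edge list over vertex ids 0..n-1 and orients each edge from the lower to the higher id so that each triangle is counted exactly once. The function name `count_triangles` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts triangles in an undirected graph with vertices 0..n-1.
// Each edge is stored only in the adjacency list of its lower-id endpoint,
// so a triangle a < b < c is found exactly once: while processing edge
// (a, b), as the common higher neighbour c of a and b.
long long count_triangles(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> adj(static_cast<std::size_t>(n));
    for (const auto& e : edges) {
        const int a = e.first;
        const int b = e.second;
        assert(a != b && a >= 0 && b >= 0 && a < n && b < n);  // no self-loops
        adj[std::min(a, b)].push_back(std::max(a, b));
    }
    for (auto& list : adj) {
        std::sort(list.begin(), list.end());
        // Drop duplicate edges so they are not double-counted.
        list.erase(std::unique(list.begin(), list.end()), list.end());
    }
    long long triangles = 0;
    for (int u = 0; u < n; ++u) {
        for (int v : adj[u]) {
            // Merge-style intersection of two sorted neighbour lists.
            const auto& A = adj[u];
            const auto& B = adj[v];
            std::size_t i = 0;
            std::size_t j = 0;
            while (i < A.size() && j < B.size()) {
                if (A[i] < B[j]) {
                    ++i;
                } else if (A[i] > B[j]) {
                    ++j;
                } else {
                    ++triangles;  // A[i] closes a triangle with u and v
                    ++i;
                    ++j;
                }
            }
        }
    }
    return triangles;
}
```

Each per-edge intersection is independent of the others, so the outer loops are a natural target for parallelization.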
ArrayFire is Now Open Source
Yes, you read that right! ArrayFire is open source: it’s all there and it’s all free. This is big, and you and the rest of the parallel computing community are going to love it! You can download our pre-compiled binary installers, which are optimized for a wide variety of systems, or get a copy of the ArrayFire source code from our GitHub page. ArrayFire is being released under the BSD 3-Clause License, which enables unencumbered deployment and portability of ArrayFire for commercial use. So go check it out! We welcome your feedback and look forward to your future contributions to ArrayFire. The move to open source isn’t our only news; we’ve also made ArrayFire better than ever. Check out our recent …
New Features in ArrayFire
We have previously talked about upcoming computer vision algorithms in the next version of ArrayFire. Today we are going to discuss some of the bigger changes and additions to ArrayFire, starting with the new CPU backend. In addition to the CUDA and OpenCL backends, you can now run ArrayFire natively on any CPU. This is another step we’ve taken in our efforts to make ArrayFire truly portable. The biggest benefits of the new CPU backend include: Hardware and software neutrality: You can now build and ship applications without worrying about the hardware and drivers present on end users’ machines. You can also easily port your applications to embedded and mobile platforms where CUDA and OpenCL may not be available. Heterogeneous Computing: It is now easier …
ArrayFire at SC14
HPC matters. That’s the tagline for SC14, and here at ArrayFire we’re in complete agreement. We’ve exhibited at SC for the past few years, and we’re excited to once again be a part of this excellent conference! It’s a great place for soaking up HPC knowledge, getting inspired, and connecting with the brightest minds in the industry. Here’s a quick run-down of where we’ll be. Visit our booth. We’re at booth #2725. We’ll have beautiful demos running, and our engineers will be available for questions. Ask your questions, meet the team, or just bounce some ideas off us. Maybe, just maybe, you’ll get a sneak peek at our most ambitious project yet… Try our in-booth tutorials. Want to learn how to use ArrayFire to accelerate …
CUDA Optimization Tips for Matrix Transpose in Real-World Applications
Many computer algorithms are especially friendly toward data sizes that are powers of two, and GPU compute algorithms work particularly well with data sizes that are multiples of 32 (the warp size on NVIDIA GPUs). In real-world situations, however, data is rarely so conveniently sized. In today’s post, we look at one such scenario in GPU compute and provide some tips on how to optimize the matrix transpose algorithm for a GPU. Let’s start with the transpose kernel available from NVIDIA’s Parallel Forall blog. It is also optimized to avoid shared memory bank conflicts, but it only works on matrices whose dimensions are multiples of 32.
template<typename T>
__global__ void transpose32(T * out, const T * in, unsigned dim0, unsigned dim1)
{
    __shared__ T shrdMem[TILE_DIM][TILE_DIM+1];
    …
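The basic requirement for arbitrary sizes is to guard every read and write so that partial tiles at the matrix edges stay in bounds. Below is a CPU sketch in C++ (not a CUDA kernel, and not the Parallel Forall code) of a tiled transpose whose edge tiles may be partial, plus a small helper showing why the shared-memory tile is declared as `[TILE_DIM][TILE_DIM+1]`. The helper assumes 32 shared-memory banks, each one 4-byte word wide, and 4-byte elements; the function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge length, mirroring TILE_DIM in the CUDA kernel.
constexpr std::size_t TILE_DIM = 32;

// Transposes a row-major `rows` x `cols` matrix into a row-major
// `cols` x `rows` matrix, one TILE_DIM x TILE_DIM tile at a time.
// std::min clamps each tile to the matrix edge, so rows and cols need
// not be multiples of TILE_DIM; a GPU kernel needs the same guard
// (a bounds check on each thread's indices) for arbitrary sizes.
std::vector<float> transpose_tiled(const std::vector<float>& in,
                                   std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t tr = 0; tr < rows; tr += TILE_DIM) {
        for (std::size_t tc = 0; tc < cols; tc += TILE_DIM) {
            const std::size_t rEnd = std::min(tr + TILE_DIM, rows);
            const std::size_t cEnd = std::min(tc + TILE_DIM, cols);
            for (std::size_t r = tr; r < rEnd; ++r) {
                for (std::size_t c = tc; c < cEnd; ++c) {
                    out[c * rows + r] = in[r * cols + c];  // (r, c) -> (c, r)
                }
            }
        }
    }
    return out;
}

// Assumes 32 banks, each one 4-byte word wide, and 4-byte elements.
// Returns how many distinct banks 32 threads touch when thread t reads
// element [t][col] of a shared tile whose rows are `pitch` elements long.
std::size_t banks_touched_by_column_read(std::size_t pitch, std::size_t col) {
    bool used[32] = {};
    std::size_t distinct = 0;
    for (std::size_t t = 0; t < 32; ++t) {
        const std::size_t bank = (t * pitch + col) % 32;
        if (!used[bank]) {
            used[bank] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With a row pitch of 32, all 32 threads reading one column of the tile hit the same bank (a 32-way conflict); padding each row by one element, as `TILE_DIM+1` does, spreads that column across all 32 banks.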