ArrayFire + Scorpii Demo by CreativeC

ScottArrayFire, Benchmarks, Case Studies, CUDA, Events Leave a Comment

CreativeC makes awesome compute + visualization systems. We got to see the demo in live action at the GPU Technology Conference last month. Tim Thomas was kind enough to let us film the demo showing how ArrayFire can be used to drive a multi-node, 9 GPU system in a physics application. Checkout the video below. If you are interested in high-throughput compute coupled with high-pixel visualizations, we recommend you talk with the folks at CreativeC. They are always pushing the envelope on what can be done with GPU computing and GPU visualizations. Also, if you have cool demos showing ArrayFire in action, let us know. We’d love to film your work and make it available on this blog! Related articles …

Parallel Software Development Trends for Dummies

John MelonakosComputing Trends Leave a Comment

Last month, I posted two articles describing computing trends and why heterogeneous computing will be a significant force in computing for the next decade. Today, I continue that series with an article describing the biggest challenge to continued increases in computing performance – parallel software development. Biggest Challenge As I described previously, in order to use an accelerator, software changes must be made. Regular x86-based compilers cannot compile code to run on accelerators without these needed changes. The amount of software change required varies depending upon the availability of and reliance upon software tools that increase performance and productivity. There are four possible approaches to take advantage of accelerators in heterogeneous computing environments:  do-it-yourself, use compilers, use libraries, or use …

7 Highlights of GTC 2013 – Day 3 of 4

John MelonakosEvents Leave a Comment

Day 3 at GTC was awesome. It was super hard to narrow down our list to just 7 highlights. For instance, the stress ball pyramid in our booth does not count. Neither does the massive ArrayFire poster in front of the keynote hall. Here are 7 of the highlights we’ve collected from our team on the third day of GTC 2013: Professor Erez Lieberman Aiden of Baylor and Rice Universities gave a great keynote on “Parallel Processing of the Genomes, by the Genomes and for the Genomes.” He discussed how folding of genes and interactions between multiple folded genes can impact genetic expressions. It’s not just about the composition of the gene, but also how the gene folds. It turns …

7 Highlights of GTC 2013 – Day 1 of 4

John MelonakosEvents Leave a Comment

AccelerEyes is out in force at GTC. We ended up with 10 of our engineers and sales staff here onsite. I collected feedback from the team to learn what people enjoyed the most from today’s activities. Here are 7 of the highlights we’ve collected from our team on the first day of GTC 2013: Will Ramey of NVIDIA kicked off GTC with a tutorial on the CUDA ecosystem. He talked about the three different approaches to getting GPU acceleration:  1) Libraries, 2) Compiler Directives, and 3) Programming Languages. He talked about how libraries, if you can find one for your application (hint, hint),  are the best of the 3 options, because you get great performance and you don’t have to …

ArrayFire Examples (Part 2 of 8) – Benchmarks

ArrayFireArrayFire, Benchmarks, CUDA Leave a Comment

This is the second in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the benchmarks/ directory. In these examples, my machine has the following configuration: ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Linux) License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX CUDA toolkit 5.0, driver 304.54 GPU0 Quadro 6000, 6144 MB, Compute 2.0 (single,double) Display Device: GPU0 Quadro 6000 Memory Usage: 5549 MB free (6144 MB total)… Blas This example shows a simple bench-marking process using ArrayFire’s matrix multiply routine. For more information on Blas, click here. The data measured in this example is the Giga-Flop (GFLOP Floating Point Operations Per Second). I got the following results using …

GTC 2013 Tutorial – CUDA Accelerated Image Processing Libraries

John MelonakosArrayFire, CUDA, Events Leave a Comment

The 2013 GPU Technology Conference is just two weeks away. We’re super excited. We’re spending a lot of time preparing for our tutorial on CUDA Accelerated Image Processing Libraries. We think it will be well worth your while to attend. This is an 80-minute share all about CUDA image processing from James Malcolm, an AccelerEyes co-founder and lead engineer. You will walk away from the tutorial much better prepared to build fast computer vision and image processing codes. The session abstract is as follows: Image processing has consistently proven to benefit greatly from GPU acceleration. A number of libraries available from NVIDIA and AccelerEyes make image processing development efficient and lead to big speedups. Using these libraries can often significantly shorten …

Benchmarking Tesla K20

Pavan YalamanchiliArrayFire, Benchmarks, CUDA 1 Comment

In this blog post, we are going to compare NVIDIA’s latest high end offering, the Tesla K series (PDF) with their previous offering. In particular we are comparing the Tesla K20C with Tesla C2070/2075. This blog post follows a similar post about benchmarking the GTX680 we did last year. We take a look at similar set of functions (and a little bit more) to see what benefits the newer line brings. All of the benchmarks were done using double precision. In all of the graphs, higher trendlines are better. Matrix Multiplication In house at AccelerEyes, we use matrix multiplication as the gold standard for testing the maximum performance of all new GPUs we end up with. The K20c reaches a peak at …

7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps

ArrayFireArrayFire, CUDA, OpenCL Leave a Comment

In order to get the best performance from your CUDA or OpenCL code, it is helpful to keep in mind some useful tips for optimizing performance. Note: By “accelerator” we refer to GPUs, APUs, co-processors, FPGAs, and any devices capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …

How much speedup can you get with CUDA or OpenCL?

ScottArrayFire, Benchmarks, CUDA, OpenCL Leave a Comment

Everyday developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore pertinent acceleration factors. Note, we’ll use the term accelerator to include GPUs, Xeon Phi coprocessor, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups: Hardware:  The more advanced the accelerator hardware, the more the speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes:  In general, accelerators will outperform CPUs to …

ArrayFire Reception in France

John MelonakosArrayFire, Case Studies, CUDA, OpenCL Leave a Comment

As an engineers company, we spend a lot of time wrestling in the weeds of low-level GPU and accelerator codes. This is our battleground, and it can often be dizzying in its complexity. Our whole purpose is to hide that mess and tame those low-level beasts so that ArrayFire users get better performance than anyone else. The joy of ArrayFire comes when we get feedback from ArrayFire users, often from different parts of the world. For instance, the week I share excerpts from two recent emails we received in France: 1) From Barep, a French manufacturing company:  “I think ArrayFire is a ‘must have’ library. It’s very easy to use and can be used under Linux and Windows. Personally, I’m happy …