Big announcement here at #APU13! AMD CTO, Mark Papermaster, just announced 2 additions to the 2014 Mobile APU roadmap http://t.co/sWHMhb9AAe — AMD (@AMD) November 13, 2013

Today was the final day of AMD's APU 2013 conference. Today's theme focused mostly on gaming topics, so it was less relevant to technical computing than yesterday. However, the mobile product announcement from AMD in the tweet above was interesting. OpenCL is just as important in mobile computing as it is in HPC. Both ends of the spectrum have a need for speed and can achieve it through massive data parallelism. AMD is looking to make better inroads into mobile computing with these APU announcements. Overall, APU 2013 was a fantastic …
APU 2013 – Day 2 Recap
Today was the first full day of AMD's APU 2013 conference. It was a whirlwind of heterogeneous computing. From the morning keynotes, three particularly salient points stood out to us: Mike Muller, CTO at ARM, talked about heterogeneous computing. He put it nicely: "Heterogeneous computing is the future. It has also been our past, but we didn't notice because a few shiny companies overshadowed everything else." That is a great way to describe it. The future of heterogeneous computing involves the rise in importance of non-x86 processors. Throwing a few more MHz onto a CPU is no longer enough to satisfy computational demands. Nandini Ramani, VP at Oracle, talked about the importance of Java for heterogeneous computing. She pointed …
APU 2013 – Day 1 Recap
AMD's APU 2013 kicked off today with keynotes and a welcome reception. The developer summit is themed as the epicenter of heterogeneous computing. AMD has a world-class CPU and a world-class GPU and is pushing the industry forward by combining both of those devices into the same chip, the APU. AMD's APUs are programmable via OpenCL, the industry standard for heterogeneous development. AMD is also leading the way with standards for Heterogeneous System Architecture (HSA). APU13 will have many technical sessions, keynotes, and demos around OpenCL and HSA. We are at the APU conference demoing ArrayFire acceleration on two of AMD's newest hardware offerings: A machine with the latest AMD Radeon R9 290X discrete GPU A machine with the …
Beamforming with ArrayFire
Alessandro Savoia and researchers at Università degli Studi Roma Tre have achieved an order-of-magnitude performance improvement in a beamforming application by using ArrayFire for GPU acceleration on CUDA-capable NVIDIA GPUs. The application performs conventional beamforming: a time delay is applied to each signal vector, the vectors are summed across channels, and the result is processed. Processing includes demodulation, envelope extraction, and logarithmic compression. ArrayFire's functions for shifting, interpolation, and filtering made it practical to accelerate this application on GPUs and significantly reduced development time. Alessandro's benchmarks show that a CPU-only version ran at only 1 frame/sec, while the ArrayFire-accelerated version ran at 10-20 frames/sec, depending on the dataset. Alessandro and his team are looking forward to …
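For readers curious what such a pipeline looks like in code, here is a minimal delay-and-sum sketch written against ArrayFire's C++ API, roughly following the steps described above. The data layout, integer-sample delays, and normalization are illustrative assumptions on our part, not the Roma Tre group's actual code; sub-sample delays would use ArrayFire's interpolation (af::approx1) instead of integer shifts.

#include <arrayfire.h>

// signals: samples x channels matrix of RF data (assumed layout)
// delays:  per-channel integer sample delays (assumed s32 values)
af::array beamform(const af::array &signals, const af::array &delays) {
    const int channels = (int)signals.dims(1);

    // 1. Apply a per-channel time delay by shifting each column.
    af::array aligned = signals;
    for (int c = 0; c < channels; ++c) {
        int d = delays(c).scalar<int>();
        aligned(af::span, c) = af::shift(signals(af::span, c), d);
    }

    // 2. Sum across channels to form the beamformed line.
    af::array rf = af::sum(aligned, 1);

    // 3. Envelope extraction (magnitude here; demodulation and a
    //    Hilbert-based envelope are common refinements).
    af::array env = af::abs(rf);

    // 4. Logarithmic compression for display, normalized to a 0 dB peak.
    return 20.0 * af::log10(env / af::max<float>(env));
}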
Heterogeneous Computing Trends for Dummies
Ten days ago, I posted an article on CPU Processing Trends for Dummies. Today, I continue that series with an article describing the latest major trend in computing, namely heterogeneous computing.
The Point
The point of these articles is to paint the high-level picture for trends in computer processing. I hope this bigger picture will help summarize things for those who do not breathe computer processors and technical software on a daily basis. Over the last 20 years, big gains in computer processing have been defined by increases in CPU clock speeds, then by increases in the number of CPU cores. The next 10+ years will be defined by heterogeneous computing.
Heterogeneous Computing
So let's start with a definition: Heterogeneous …
CPU Processing Trends for Dummies
Over the years at AccelerEyes, it has been surprising to me how many people miss a big-picture understanding of the trends affecting the computing industry. To help, I'm going to post a few articles with high-level explanations. I'm going to do so in a hand-wavy manner. I look forward to the lively comments on my mistakes. But, in general, I think these posts will be a fairly accurate view of the important trends. Today, I'll start by talking about CPU processing trends. Let's start with something we all know: CPUs are central processing units and are the main processor in the computer. You probably had to label the CPU on a diagram at some point in grade school, …
7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps
In order to get the best performance from your CUDA or OpenCL code, it helps to keep a few optimization tips in mind. Note: By "accelerator" we refer to GPUs, APUs, co-processors, FPGAs, and any device capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto the arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …
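As a hedged illustration of the first two tips, here is a small ArrayFire C++ sketch (the computation itself is ours, not from the post): one vectorized expression replaces a host-side loop over elements, and data stays in accelerator memory until a single final transfer back to the host.

#include <arrayfire.h>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    af::array x = af::randu(n);            // data allocated in accelerator memory

    // Vectorized: one expression operates on all n elements in parallel,
    // instead of a host-side loop over individual values.
    af::array y = af::sin(x) * af::cos(x) + 0.5f;

    // Minimize transfers: intermediates stay on the device; only the final
    // scalar result crosses back to CPU memory.
    float total = af::sum<float>(y);
    printf("sum = %f\n", total);
    return 0;
}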
How much speedup can you get with CUDA or OpenCL?
Everyday developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore pertinent acceleration factors. Note, we’ll use the term accelerator to include GPUs, Xeon Phi coprocessor, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups: Hardware: The more advanced the accelerator hardware, the more the speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes: In general, accelerators will outperform CPUs to …
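Data size is something you can measure directly rather than guess at. The sketch below (our illustration, not part of the original post) times one ArrayFire operation at increasing sizes; plotting these timings against a CPU baseline shows where the accelerator starts to win.

#include <arrayfire.h>
#include <cstdio>

int main() {
    for (int n = 1 << 10; n <= (1 << 24); n <<= 2) {
        af::array x = af::randu(n);

        af::fft(x).eval();            // warm-up: trigger kernel/plan setup once
        af::sync();

        af::timer::start();
        af::array y = af::fft(x);     // stand-in for the workload of interest
        y.eval();                     // force the lazy expression to execute
        af::sync();                   // wait for the device to finish
        printf("n = %8d : %.4f ms\n", n, af::timer::stop() * 1e3);
    }
    return 0;
}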