Getting Started with OpenCL on Android

Pradeep Garigipati | Android, Java, OpenCL | 11 Comments

Mobile devices are carving out a niche in the world of computing, gaining processing power day by day. GPUs on mobile devices have been around for a while, but using them to accelerate computation is still quite new. Until recently, the only way to access the GPU was through OpenGL. In December 2008, Khronos released OpenCL, a generic API for accelerating non-graphics tasks. OpenCL enables us to take advantage of acceleration hardware. Since it is an open standard, many hardware vendors provide support on their devices. With the recent release of the Adreno and Mali SDKs, you can now run OpenCL code on mobile GPUs. Today’s post is about how to do image processing on a camera feed on …

Joint Webinar with AMD: An Introduction to OpenCL Libraries

Scott | Announcements, ArrayFire, Events, OpenCL

Back by popular demand! You’re invited to join us for a second webinar held jointly with AMD to discuss productive OpenCL Programming – An Introduction to OpenCL Libraries. So many people attended the first one that we decided to hold a second! The webinar will be held on Monday, May 19 at 9 am PT / 11 am CT / 12 pm ET. Join ArrayFire COO Oded Green as he demonstrates best practices to help you quickly get started with OpenCL programming. Learn how to get the best performance from AMD hardware in various programming languages using the ArrayFire library. Oded will discuss the latest advancements in the OpenCL ecosystem, including cutting-edge OpenCL libraries such as clBLAS, clFFT, clMAGMA and ArrayFire. …

ARM Showcases ArrayFire OpenCL Support for Mali GPU at Supercomputing ’13

Scott | ArrayFire, Events, OpenCL

ARM showcased ArrayFire support for the Mali GPU at the Supercomputing ’13 conference recently held in Denver. This exciting development caught the attention of many attendees as they viewed the ArrayFire demos running in the ARM and AccelerEyes exhibits. Energy budgets are always constrained and form an expensive component of any HPC system. ARM Mali GPUs provide the best performance and throughput for a given energy envelope. Partnering with ARM, AccelerEyes further reduces the cost of HPC by minimizing development time and costs. AccelerEyes offers the most productive software solutions for accelerating code using GPUs, coprocessors, and OpenCL devices. AccelerEyes delivers ArrayFire to accelerate C, C++, and Fortran codes on CUDA and OpenCL devices. ArrayFire customers come from a wide range …

clMath: An Open Source BLAS and FFT Library for OpenCL

Scott | Announcements, OpenCL

If you’re reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire’s higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a significant boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute to the OpenCL community by open-sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have previously used AMD’s OpenCL libraries within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to join AMD and the open-source community in maintaining and improving these libraries for the benefit of all. …

7 Tips for CUDA & OpenCL Programming and How ArrayFire Helps

ArrayFire | ArrayFire, CUDA, OpenCL

To get the best performance from your CUDA or OpenCL code, it helps to keep a few optimization tips in mind. Note: By “accelerator” we refer to GPUs, APUs, co-processors, FPGAs, and any devices capable of running CUDA or OpenCL. Vectorized Code: Accelerators perform best with vectorized code because the computations map naturally onto the arithmetic cores of the hardware. ArrayFire functions are inherently vectorized, so if you are using ArrayFire, you are writing vectorized code. Memory Transfers: Avoid excessive memory transfers. Each casting operation to and from the accelerator moves data back and forth between CPU memory and accelerator memory. ArrayFire makes many automatic optimizations to minimize these memory transfers by only transferring data when …
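To make the memory-transfer tip concrete, here is a minimal sketch in plain Python. It does not use ArrayFire, CUDA, or OpenCL: `ToyAccelerator` is a hypothetical stand-in that only counts host-to-device and device-to-host copies, and the two pipeline functions are invented for illustration. The point is simply that keeping intermediate results on the device avoids a round trip per step.

```python
class ToyAccelerator:
    """Hypothetical stand-in for an accelerator; it only counts copies."""

    def __init__(self):
        self.transfers = 0

    def to_device(self, host_data):
        self.transfers += 1
        return list(host_data)  # pretend this copy lives in accelerator memory

    def to_host(self, device_data):
        self.transfers += 1
        return list(device_data)

    def scale(self, device_data, factor):
        # Whole-array operation on data that is already "on the device".
        return [x * factor for x in device_data]


def chatty_pipeline(acc, data, steps):
    """Copies the data to the device and back around every single step."""
    for _ in range(steps):
        dev = acc.to_device(data)
        dev = acc.scale(dev, 2.0)
        data = acc.to_host(dev)
    return data


def resident_pipeline(acc, data, steps):
    """Copies once, keeps intermediates on the device, copies back once."""
    dev = acc.to_device(data)
    for _ in range(steps):
        dev = acc.scale(dev, 2.0)
    return acc.to_host(dev)


chatty, resident = ToyAccelerator(), ToyAccelerator()
out_chatty = chatty_pipeline(chatty, [1.0, 2.0, 3.0], steps=4)
out_resident = resident_pipeline(resident, [1.0, 2.0, 3.0], steps=4)
print("chatty transfers:", chatty.transfers)
print("resident transfers:", resident.transfers)
```

Both pipelines compute the same result; only the number of simulated copies differs. On real hardware the cost of each copy depends on the bus and the data size, which this toy model does not try to capture.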

Upcoming CUDA & OpenCL Training Courses

John Melonakos | Announcements, CUDA, OpenCL

We’re pleased to announce upcoming CUDA and OpenCL training courses. Over the past couple of years, we’ve received numerous requests from around the world to be trained by AccelerEyes engineers. We finally got our act together and now have a nice schedule of CUDA and OpenCL training courses for 2013 within the United States:

CUDA
Feb 25-26, Houston, TX
Mar 4-5, Baltimore/Washington, D.C.
Mar 25-26, Los Angeles, CA
Apr 9-10, Seattle, WA
Apr 15-16, San Francisco, CA
May 6-7, Austin, TX
May 27-28, Atlanta, GA
Jun 10-11, Baltimore/Washington, D.C.
Jul 8-9, San Jose, CA
Sep 2-3, Boston, MA
Sep 23-24, Baltimore/Washington, D.C.
Oct 7-8, Houston, TX
Oct 21-22, Atlanta, GA
Nov 4-5, Baltimore/Washington, D.C.
Dec 2-3, New York, NY

OpenCL …

How much speedup can you get with CUDA or OpenCL?

Scott | ArrayFire, Benchmarks, CUDA, OpenCL

Every day, developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore pertinent acceleration factors. Note, we’ll use the term accelerator to include GPUs, Xeon Phi coprocessors, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below are equally applicable to all of these accelerators. The following are some of the important factors that must be considered when estimating the potential for accelerated speedups: Hardware: The more advanced the accelerator hardware, the more speedup you get (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes: In general, accelerators will outperform CPUs to …
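One rough way to reason about the hardware and data-size factors is a back-of-the-envelope timing model. The sketch below is an assumption for illustration, not the method used in the post: the function name `estimated_speedup`, its parameters, and every rate passed to it are hypothetical. It treats accelerator time as a fixed launch overhead plus transfer time plus compute time, and compares that with CPU compute time.

```python
def estimated_speedup(n_elements, cpu_rate, accel_rate, transfer_rate,
                      launch_overhead_s):
    """Return estimated CPU time divided by estimated accelerator time.

    All rates are in elements per second. The model is deliberately crude:
    it ignores caching, overlap of transfer and compute, and algorithm shape.
    """
    cpu_time = n_elements / cpu_rate
    accel_time = (launch_overhead_s
                  + n_elements / transfer_rate   # moving data to the device
                  + n_elements / accel_rate)     # computing on the device
    return cpu_time / accel_time


# Hypothetical rates, chosen only to show the trend with data size.
rates = dict(cpu_rate=1e8, accel_rate=2e9, transfer_rate=1e9,
             launch_overhead_s=1e-3)

small = estimated_speedup(1_000, **rates)
large = estimated_speedup(100_000_000, **rates)
print(f"small problem: {small:.3f}x, large problem: {large:.2f}x")
```

Under these made-up numbers the fixed overhead dominates the small problem, so the estimate falls below 1x, while the large problem amortizes it. Real measurements on your own hardware should replace every constant here.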

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John Melonakos | Benchmarks, CUDA, Events, OpenCL | 3 Comments

Today was Day 1 of the Keeneland Workshop.  Many great talks were given, across a broad range of GPU computing topics. With last week’s ArrayFire Webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory.  Kyle independently ran a number of benchmarks over a period of time which show how quickly OpenCL has matured and where it yet has room for improvement.  The slide below comes from Kyle’s presentation.  For numbers >1, CUDA is faster.  For numbers <1, OpenCL is faster.  Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance.  The Achilles heel …
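For readers who want to build the same kind of comparison from their own timings, here is a small sketch of how such a ratio could be computed and read. It assumes the ratio is OpenCL run time divided by CUDA run time, which matches the "greater than 1 means CUDA is faster" reading but is not confirmed by the excerpt. The function names, the 5% tolerance, and the timings are all invented for illustration.

```python
def opencl_over_cuda(opencl_seconds, cuda_seconds):
    """Ratio of run times; assumed definition, > 1 means CUDA finished sooner."""
    return opencl_seconds / cuda_seconds


def faster_api(ratio, tolerance=0.05):
    """Classify a ratio, treating values within the tolerance of 1 as a tie."""
    if ratio > 1 + tolerance:
        return "CUDA"
    if ratio < 1 - tolerance:
        return "OpenCL"
    return "roughly equal"


# Invented timings in seconds: (OpenCL, CUDA) per hypothetical benchmark.
timings = {
    "benchmark_a": (1.20, 1.00),
    "benchmark_b": (0.80, 1.00),
    "benchmark_c": (1.01, 1.00),
}

verdicts = {name: faster_api(opencl_over_cuda(ocl, cuda))
            for name, (ocl, cuda) in timings.items()}
for name, verdict in verdicts.items():
    print(name, "->", verdict)
```

The tolerance band is a design choice: run-to-run noise makes tiny differences from 1 hard to interpret, so it is usually safer to call them a tie.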

OpenCL vs CUDA Comparisons

ArrayFire | CUDA, Events, OpenCL | 4 Comments

In case you missed it, we recently held an ArrayFire Webinar, focused on exploring the tradeoffs of OpenCL vs CUDA. This webinar is part of an ongoing series of webinars held each month to present new GPU software topics as well as programming techniques with Jacket and ArrayFire. For those of you who missed it, we provide a recap here. Lots of questions were fielded by our team, so it’s a must-watch. We hope to see you at the next one! Recap Download the slides. Here is a transcript of the content portion of the webinar: AccelerEyes is pleased to present today’s ArrayFire webinar looking at OpenCL and CUDA Trade-offs and Comparisons. Every day, we interact with many programmers in various stages of GPU …

NVIDIA Fermi with CUDA and OpenCL

ArrayFire | Benchmarks, CUDA, OpenCL | 1 Comment

In December of 2008, we did a blog post answering questions from customers and prospects about the use of OpenCL for Jacket. If you have not reviewed that blog post to gain some insight into our progress, you can access it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post. For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples. Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla C2050 to compare OpenCL performance to CUDA performance. The Tesla C2050 is an amazing beast of a card, providing up to 512 Gigaflops of double precision arithmetic (at peak). Before we present the benchmarks, we should comment on …