Webinar - OpenCL vs CUDA Comparisons

By ArrayFire | Categories: ArrayFire, CUDA, Events, OpenCL, Webinar

In case you missed it, we recently held an ArrayFire webinar exploring the trade-offs of OpenCL vs CUDA. This webinar is part of an ongoing series held each month to present new GPU software topics as well as programming techniques with Jacket and ArrayFire. We provide a recap here. Our team fielded lots of questions, so it's a must-watch. We hope to see you at the next one! Recap: Download the slides. Here is a transcript of the content portion of the webinar: AccelerEyes is pleased to present today's ArrayFire webinar looking at OpenCL and CUDA trade-offs and comparisons. Every day, we interact with many programmers in various stages of GPU ...

NVIDIA Fermi with CUDA and OpenCL

By ArrayFire | Categories: Benchmarks, CUDA, OpenCL

In December of 2008, we did a blog post answering questions from customers and prospects about the use of OpenCL for Jacket. If you have not reviewed that blog post to gain some insight into our progress, you can access it here: http://blog.accelereyes.com/blog/2008/12/30/opencl/. Some things have changed since that original post. For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples. Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla C2050 to compare OpenCL performance to CUDA performance. The Tesla C2050 is an amazing beast of a card, providing up to 512 Gigaflops of double precision arithmetic (at peak). Before we present the benchmarks, we should comment on ...

OpenCL

By John | Categories: CUDA, OpenCL

We often get questions such as the one we just received via email: 1) Any idea if you will be supporting AMD/ATI cards in the future? 2) Have you considered OpenCL as a potential pathway for the future? I can see an advantage there for you (if it takes off) in that you're not tied to a single vendor any more, and potentially you'd be able to take advantage of other accelerators that may support it. It's very early days yet, but certainly from our point of view the current paradigm of coding to a single vendor's card doesn't seem sustainable. OpenCL is a community effort to create a standard for parallel computing, with early emphasis on GPGPU computing, ...

ArrayFire v3.6 Release

By Umar | Categories: Announcements, ArrayFire

Today we are pleased to announce the release of ArrayFire v3.6. It can be downloaded from these locations: the official installers page and the GitHub repository. This latest version of ArrayFire is better than ever! We added several new features that improve the performance and usability of the ArrayFire library. The main features are:

- Support for batched matrix multiply
- The new topk function
- A new anisotropic diffusion filter

We have also spent a significant amount of effort improving the internals of the library; the build system is significantly better organized. Batched Matrix Multiplication: The new batched matmul allows you to perform several matrix multiplication operations in one call to matmul. You might want to call this function if you are performing multiple smaller matrix multiplications. Here ...
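To make the batched behavior concrete, here is a minimal sketch (the dimensions and inputs are illustrative, not taken from the release): each slice along the third dimension of A is multiplied by the corresponding slice of B in a single matmul call.

```cpp
// Sketch of batched matrix multiplication with af::matmul (ArrayFire v3.6).
// Sizes below are made up for illustration.
#include <arrayfire.h>

int main() {
    const int M = 32, K = 64, N = 32, batch = 100;

    // A holds 100 MxK matrices and B holds 100 KxN matrices,
    // stacked along the third dimension.
    af::array A = af::randu(M, K, batch);
    af::array B = af::randu(K, N, batch);

    // One call multiplies each A(:,:,i) by the matching B(:,:,i),
    // producing an M x N x batch result.
    af::array C = af::matmul(A, B);

    C.eval();  // force execution
    return 0;
}
```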

ArrayFire v3.5.1 Release

By miguel@arrayfire.com | Categories: Announcements, ArrayFire

We are excited to announce ArrayFire v3.5.1! This release focuses on fixing bugs and improving performance. Here are the improvements we think are most important. Performance improvements:

- We've improved element-wise operation performance for the CPU backend.
- The af::regions() function has been modified to leverage texture memory, improving its performance.
- Our JIT engine has been further optimized to boost performance.

Bug fixes:

- We've squashed a long-standing bug in the CUDA backend that caused failures whenever the second, third, or fourth dimension was large enough to exceed limits imposed by the CUDA runtime.
- The previous implementation of af::mean() suffered from overflow when the sum of the values lay outside the range of the backing data type. New kernels for each of ...

ArrayFire v3.5 Official Release

By Umar | Categories: Announcements, ArrayFire, CUDA, Open Source, OpenCL

Today we are pleased to announce the release of ArrayFire v3.5, our open source library of parallel computing functions supporting CUDA, OpenCL, and CPU devices. This new version of ArrayFire improves features and performance for applications in machine learning, computer vision, signal processing, statistics, finance, and more. This release focuses on thread safety, support for simple sparse-dense arithmetic operations, a Canny edge detector function, and a genetic algorithm example. A complete list of ArrayFire v3.5 updates and new features can be found in the product Release Notes. Thread Safety: ArrayFire now supports threaded programming models. This is not intended to improve performance, since most of the parallelism happens on the device, but it does allow you to use multiple devices in ...
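As a rough sketch of what that enables, the following hypothetical snippet drives each available device from its own host thread; the per-thread workload here is a placeholder, not code from the release.

```cpp
// Minimal sketch of multi-device use from multiple host threads,
// relying on the thread safety introduced in ArrayFire v3.5.
#include <arrayfire.h>
#include <thread>
#include <vector>

void work(int device) {
    af::setDevice(device);                    // bind this thread to a device
    af::array a = af::randu(1000, 1000);      // placeholder workload
    af::array b = af::matmul(a, a.T());
    b.eval();                                 // run on this thread's device
}

int main() {
    int n = af::getDeviceCount();
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(work, i);
    for (auto& t : threads) t.join();
    return 0;
}
```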

Xilinx SDAccel Training

ArrayFire is the exclusive Xilinx SDAccel™ Authorized Training Partner (ATP) for North America. Our SDAccel training courses help design teams leverage Xilinx FPGAs for OpenCL application acceleration. Course Name: Developing and Optimizing Applications Using the OpenCL Framework for FPGAs. Contact us to register for training. Course Description: Learn how to develop new applications written in OpenCL, C/C++, and RTL in the SDAccel development environment for use on Xilinx FPGAs. Porting existing applications is also covered. Training lectures and labs for this course will enable attendees to gain the necessary skills to: identify parallel computing applications suitable for acceleration on FPGAs, understand how the FPGA architecture lends itself to parallel computing, write OpenCL programs for FPGAs, and utilize ...

GSoC 17 Ideas Page

Here are some suggestions for this year's Google Summer of Code. This list is not definitive, and students may propose their own projects by creating a new topic on the ArrayFire-User Google Group with "[GSOC]" in the subject line. Improvements to the ArrayFire Library: ArrayFire aims to be a portable, high-performance scientific computing library. Key areas in which ArrayFire can be improved include:

- Improving the performance of existing functions
- Adding support for more hardware and backends
- Adding a new domain of functions

With these in mind, we suggest the following ideas for prospective GSoC 17 students. Implement a Parallel Version of ArrayFire's CPU Backend: This project focuses on performance improvements to ArrayFire's CPU backend. This is done by adding both vectorization and ...

ArrayFire v3.4 Official Release

By John | Categories: ArrayFire

Today we are pleased to announce the release of ArrayFire v3.4, our open source library of parallel computing functions supporting CUDA, OpenCL, and CPU devices. This new version of ArrayFire improves features and performance for applications in machine learning, computer vision, signal processing, statistics, finance, and more. This release focuses on 5 major components of the library that are common to many areas of mathematical, scientific, and financial computing: sparse matrix operations, random number generation, image processing, just-in-time (JIT) compilation, and visualizations. Sparse Matrix and BLAS (see blog post):

- Support for CSR and COO storage types
- Sparse-dense matrix multiplication and matrix-vector multiplication
- Conversion between dense matrices and the CSR and COO storage types

Support for Random Number Generator Engines (see blog post):

- Philox
- Threefry
- Mersenne Twister

Image Processing (see blog post) ...
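For illustration, a minimal sketch of the new sparse and random-engine APIs might look like the following; the matrix size, sparsity threshold, and seed are made up for the example.

```cpp
// Sketch of v3.4's sparse storage and random number generator engines.
#include <arrayfire.h>

int main() {
    // A mostly-zero dense matrix; in practice you would start from
    // data that is actually sparse.
    af::array dense = af::randu(1000, 1000);
    dense = dense * (dense > 0.99f);   // zero out roughly 99% of entries

    // Convert the dense matrix to CSR storage.
    af::array csr = af::sparse(dense, AF_STORAGE_CSR);

    // Sparse-dense matrix-vector multiplication.
    af::array x = af::randu(1000);
    af::array y = af::matmul(csr, x);
    y.eval();

    // Select one of the new random number generator engines
    // and draw uniform samples from it.
    af::randomEngine eng(AF_RANDOM_ENGINE_PHILOX, 1234);
    af::array r = af::randu(af::dim4(10), f32, eng);
    r.eval();
    return 0;
}
```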

Performance Improvements to JIT in ArrayFire v3.4

By Pavan | Categories: Announcements, ArrayFire, Benchmarks

ArrayFire uses just-in-time (JIT) compilation to combine many lightweight functions into a single kernel launch. This, along with our easy-to-use API, allows users not only to quickly prototype their algorithms but also to get the best out of the underlying hardware. This feature has been a favorite among our users in the domains of finance and scientific simulation. That said, ArrayFire v3.3 and earlier had a few limitations. Namely: multiple outputs with interdependent variables generated multiple kernels, and the number of operations per kernel was fairly limited by default. In the latest release of ArrayFire, we addressed these issues to get some pretty impressive numbers. In the rest of the post, we demonstrate the performance improvements using our BlackScholes ...
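To illustrate the kind of code the JIT fuses, here is a stand-in element-wise expression (not the actual Black-Scholes benchmark from the post): every arithmetic step builds a node in the JIT expression tree, and eval() compiles and launches one fused kernel.

```cpp
// Illustrative sketch of JIT kernel fusion in ArrayFire.
#include <arrayfire.h>

int main() {
    const int n = 1000000;
    af::array S = af::randu(n);   // placeholder inputs
    af::array K = af::randu(n);

    // Each element-wise operation below is recorded lazily;
    // nothing runs on the device yet.
    af::array d = af::log(S / K) * 0.5f + af::sqrt(S) - K;

    // eval() compiles the whole expression into a single fused
    // kernel and launches it once, instead of one kernel per op.
    d.eval();
    return 0;
}
```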