ArrayFire v2.0 is now available for download. The second iteration of our free, fast, and simple GPU library now supports both CUDA and OpenCL devices. Major Updates ArrayFire now works on OpenCL enabled devices New and improved documentation Optimized for new GPUs–NVIDIA Kepler (K20) and AMD Tahiti (7970) New in ArrayFire OpenCL Same APIs as ArrayFire CUDA version Supports both Linux and Windows Just In Time Compilation (JIT) of kernels Parallel for: gfor Accelerated algorithms in the following domains Image Processing Signal Processing Data Analysis and Statistics Visualization And more New in ArrayFire CUDA New Signal and Image processing functions Faster transpose and matrix multiplication Better debugging support for GDB and Visual Studio Bug fixes to make overall experience better For a more complete list of the …
ArrayFire for Defense and Intelligence Applications – Joint Webinar Recap
In case you missed it, hundreds of attendees recently joined us in a special joint webinar with NVIDIA. The webinar was led by Kyle Spafford, a Senior Developer at AccelerEyes. Kyle detailed how GPU computing can be implemented in the defense and intelligence fields. Kyle specifically addressed enabling unique solutions for applications related to video analysis, recognition, and tracking using the ArrayFire software library for C, C++, and Fortran. At the conclusion of the presentation Kyle fields questions from those in attendance, including “How does ArrayFire Fortran Lib compare to CUDA Fortran?” (see 59:36 mark), “Can you target a specific GPU if you have multiple on the machine?” (56:14), and “How can I combine several kernels to one fat kernel by using …
ArrayFire for Defense and Intelligence Applications
AccelerEyes and NVIDIA invite you to participate in a joint webinar designed to help you learn about ArrayFire, a productive, easy-to-use GPU software library for C, C++, and Fortran. Major defense and intelligence institutions are discovering just how effective GPU computing can be in enabling unique solutions for applications related to video analysis, recognition, and tracking. During this informative webinar, Kyle Spafford, a Senior Software Developer at AccelerEyes, will explain how to accelerate common defense and intelligence algorithms using ArrayFire. The webinar will take place on Tuesday, September 17, 2013 at 10:00 AM PDT. Register for this webinar by clicking here. We hope you will join us as we discuss exciting developments in GPU computing software!
Application Time vs Solver Time
Last week, HPCwire ran an interesting article entitled, “Where has HPC’s math gone?” The article analyzes the increasing importance of math solvers to successful HPC outcomes. As the number of cores grows, the percentage of time HPC codes spend in solvers increases significantly. The following chart illustrates this trend nicely: ArrayFire is ideally suited for HPC applications that need to accelerate the toughest math problems. ArrayFire contains hundreds of math functions across numerous domains. In general, if the HPC community really wants to solve this problem, it will begin to invest more in libraries than in compilers that have no chance at optimizing these tough math problems automatically. Rather, it is only through expertly-tuned codes, such as those developed …
clMath: An Open Source BLAS and FFT Library for OpenCL
If you’re reading our blog, BLAS and FFT libraries likely form an important basis for your work. For instance, BLAS and FFT libraries are used in some of ArrayFire’s higher-level functions for linear algebra, signal processing, and image processing. Today, OpenCL is getting a significant boost in BLAS and FFT library availability. AMD has announced a bold and generous move to contribute to the OpenCL community by open-sourcing its APPML BLAS and FFT OpenCL libraries. At AccelerEyes, we have previously used AMD’s OpenCL libraries within our higher-level ArrayFire library. These libraries are the best BLAS and FFT OpenCL libraries available anywhere. We are thrilled to join AMD and the open-source community in maintaining and improving these libraries for the benefit of all. …
ArrayFire Examples (Part 7 of 8) – PDE
This is the seventh in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the pde/ directory. In these examples, my machine has the following configuration: ArrayFire v1.9.1 (build XXXXXXX) by AccelerEyes (64-bit Linux) License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX CUDA toolkit 5.0, driver 319.17 GPU0 Tesla K20c, 5120 MB, Compute 3.5 (current) GPU1 Tesla C2075, 6144 MB, Compute 2.0 GPU2 Tesla C1060, 4096 MB, Compute 1.3 Display Device: GPU0 Tesla K20c Memory Usage: 5044 MB free (5120 MB total) The followings are the examples of formulating Partial Differential Equations, generally used to create a relevant computer model with several variables. In these examples, …
ISC 2013 Keynote by Stephen Pawlowski of Intel
Stephen Pawlowski of Intel gave an interesting keynote today at ISC 2013. He continued the theme of yesterday’s keynote to address challenges our market faces in getting to exascale computing. Here is a summary of the points he made during his talk: Getting to exascale by 2020 requires performance improvement of 2x every year Innovations anticipated include stacked chips and optical layers DRAM is not scaling with Moore’s Law More power goes into transferring data than in computing Need to operate transistors near threshold New materials for DRAM needed. Resistive memory could replace DRAM. Need to explore both the big die and the small die paths as we approach 2020 Big die path leads to 10 billion transistors on a …
ISC 2013 Keynote by Bill Dally of NVIDIA
Bill Dally of NVIDIA gave a wonderful keynote today at ISC 2013. He focused on addressing the challenges facing our market in getting to exascale computing. He talked about how Moore’s law is alive and well because transistors continue to double at an astonishing rate. However, the additional transistors are not translating into the same big performance gains as they did in the 1990’s. Whereas performance used to grow 50% per year, performance today is growing at a much slower pace. The biggest bottleneck to more performance is energy efficiency. Bill showed slides of chips and talked about the picojoules required to compute versus those required to move data and operands around the chip. The take home message was that …
ArrayFire Examples (Part 6 of 8) – Multiple GPUs
This is the sixth in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the multi_gpu/ directory. In these examples, my machine has the following configuration: ArrayFire v1.9.1 (build XXXXXXX) by AccelerEyes (64-bit Linux) License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX CUDA toolkit 5.0, driver 319.17 GPU0 Tesla K20c, 5120 MB, Compute 3.5 (current) GPU1 Tesla C2075, 6144 MB, Compute 2.0 GPU2 Tesla C1060, 4096 MB, Compute 1.3 Memory Usage: 4935 MB free (5120 MB total) *The following order represents the speed of GPUs in my machine from fastest to slowest: K20c, C2070, C1060. ArrayFire is capable of multi-GPU management. This capability becomes useful for benchmarking …
ArrayFire Examples (Part 5 of 8) – Machine Learning
This is the fifth in a series of posts looking at our current ArrayFire examples. The code can be compiled and run from arrayfire/examples/ when you download and install the ArrayFire library. Today we will discuss the examples found in the machine_learning/ directory. In these examples, my machine has the following configuration: ArrayFire v1.9 (build XXXXXXX) by AccelerEyes (64-bit Mac OSX) License: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX CUDA toolkit 5.0, driver 304.54 GPU0 GeForce GT 560M, 1024 MB, Compute 3.0 (single,double) Display Device: GPU0 GeForce GT 650M Memory Usage: 245 MB free (1024 MB total)… 1. K-Means Clustering – kmeans.cpp Figure 1 This is an example of K-Means Clustering Algorithm. K-Means Clustering Algorithm is a data mining technique that partitions the given data into groups by their similarities. All you need to …