Benchmarking Tesla K20

Pavan Yalamanchili | ArrayFire, Benchmarks, CUDA | 1 Comment

In this blog post, we compare NVIDIA’s latest high-end offering, the Tesla K series (PDF), with their previous offering. In particular, we compare the Tesla K20c with the Tesla C2070/2075. This post follows a similar post from last year about benchmarking the GTX 680. We look at a similar set of functions (and a few more) to see what benefits the newer line brings. All of the benchmarks were done using double precision. In all of the graphs, higher trendlines are better. Matrix Multiplication: In-house at AccelerEyes, we use matrix multiplication as the gold standard for testing the maximum performance of every new GPU we get. The K20c reaches a peak at …

How much speedup can you get with CUDA or OpenCL?

Scott | ArrayFire, Benchmarks, CUDA, OpenCL

Every day, developers ask us to predict how much speedup they can get with CUDA or OpenCL. Rather than gaze mysteriously into a crystal ball, we ask the developers questions to explore the pertinent acceleration factors. Note that we use the term accelerator to include GPUs, Xeon Phi coprocessors, APUs, FPGAs, and any other CUDA or OpenCL device. The principles we discuss below apply equally to all of these accelerators. The following are some of the important factors to consider when estimating the potential speedup: Hardware: The more advanced the accelerator hardware, the greater the speedup (e.g. the NVIDIA Kepler K20 outperforms the previous NVIDIA Fermi C2090 generation). Data Sizes: In general, accelerators will outperform CPUs to …

Fast Computation of Isotropic Gradients with Jacket’s Convolutions

ArrayFire | Benchmarks, Case Studies, CUDA

Researchers from the École Polytechnique de Montréal showed that Jacket can calculate 2D or 3D isotropic gradients very efficiently in MATLAB® code. From a mathematical point of view, isotropic gradients are characterized by a much more precise orientation than the standard 1D finite difference discretizations provide. Using convolution functions developed by AccelerEyes, the method becomes very simple to apply and provides a very fast evaluation of isotropic gradients of functions or images. This type of isotropic discretization currently has an application in computational fluid dynamics. It is useful for simulating immiscible multiphase flows using the Lattice Boltzmann Method (LBM), where the orientation of the various fluid interfaces has to be computed very frequently and precisely. In multiphase flow …
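To make the idea of computing an isotropic gradient with a convolution concrete, here is a minimal pure-Python sketch. The excerpt does not give the researchers' stencil, so the 3x3 weights below are an assumption: they are one commonly used choice built from the standard D2Q9 lattice weights (1/9 for axis neighbors, 1/36 for diagonal neighbors, scaled by 3). The names `KX`, `KY`, and `correlate_interior` are hypothetical and are not Jacket functions.

```python
# Assumed 3x3 isotropic stencil for d/dx (not taken from the original post).
# Columns run along x, rows run along y.
KX = [
    [-1 / 12, 0.0, 1 / 12],
    [-1 / 3,  0.0, 1 / 3],
    [-1 / 12, 0.0, 1 / 12],
]
# The d/dy stencil is the transpose of the d/dx stencil.
KY = [list(row) for row in zip(*KX)]


def correlate_interior(img, kernel):
    """Apply a 3x3 stencil to every interior pixel of a 2D list.

    This is a correlation (the kernel is not flipped). A true convolution
    routine flips the kernel, which would negate these antisymmetric stencils.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * img[r + dr][c + dc]
            out[r - 1][c - 1] = acc
    return out


# Linear test field f(x, y) = 2x + 3y, sampled on a 5x6 grid.
field = [[2.0 * x + 3.0 * y for x in range(6)] for y in range(5)]
grad_x = correlate_interior(field, KX)
grad_y = correlate_interior(field, KY)
```

On a linear field the stencil recovers the exact slopes (2 in x and 3 in y, up to floating-point rounding), which is a quick sanity check before applying it to real images or flow fields.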

Genomics Applications on the GPU

ArrayFire | Benchmarks, Case Studies, CUDA

AccelerEyes recently held a free webinar on accelerating genomics MATLAB applications on the GPU. We had just added new genomics examples to Jacket, and wanted to use this webinar to showcase these examples and run through some code. This was part of the free series of AccelerEyes webinars that provide a great opportunity for you to interact with AccelerEyes engineers, see demos executing live on GPUs, and learn about AccelerEyes products and services. Over the course of the last decade, GPUs have continued to advance at a rapid pace, and are leaving CPUs behind in some ways, specifically in terms of their ability to perform massively parallel computations. Jacket is proven to be very efficient at harnessing this ability …

Optics Applications with ArrayFire

ArrayFire | Benchmarks, Case Studies, CUDA, Events

In case you missed it, on Aug 1 we held a webinar on the Jacket GPU Computing Engine for MATLAB® and its applications to Optics and Photonics. From beam propagation methods to lens design, optics engineers are enjoying the benefits of GPU computing with Jacket to accelerate MATLAB® codes. This was part of a free series of webinars that help you learn about ArrayFire (for C/C++/Fortran/Python) and Jacket (for use with MATLAB®). The webinars are open to anyone who wants to attend and interact with AccelerEyes engineers. Learn more at http://www.accelereyes.com/webinars. Jacket allows you to envision really fast applications for GPU computing, and the team at AccelerEyes recently helped Northrop Grumman Corporation achieve …

Machine Learning with ArrayFire

ArrayFire | Benchmarks, C/C++, Case Studies, CUDA, Events

In case you missed it, on June 15 we held a webinar on the ArrayFire GPU Computing Library and its applications to Machine Learning. This webinar was part of a free series of webinars that help you learn about ArrayFire and Jacket (our MATLAB® product). The webinars are open to anyone who wants to attend and interact with AccelerEyes engineers. Learn more at http://www.accelereyes.com/webinars. Chris, a Software Engineer at AccelerEyes, explained ArrayFire’s position in the GPU computing world, and presented benchmarks where ArrayFire beats GPU libraries such as Thrust in many critical applications. He also mentioned that ArrayFire could be used either standalone, or in combination with other options for GPU computing such …

Option Pricing

ArrayFire | ArrayFire, Benchmarks, C/C++, Case Studies, CUDA | 2 Comments

Andrew Shin, Market Risk Manager of Koch Supply & Trading, achieves significant performance increases on option pricing algorithms using Jacket to accelerate his MATLAB® code with GPUs. Andrew says, “My buddy and I are, at best, novice programmers and we couldn’t imagine having to figure out how to code all this in CUDA.” But he found Jacket to be straightforward. With these results, he says he can see Jacket and GPUs populating Koch’s mark-to-futures cube, which contains its assets, simulations, and simulated asset prices. Modern option pricing techniques are often considered among the most mathematically complex of all applied areas of finance. Andrew shared some example code to demonstrate how much leverage you can get out of Jacket and GPUs for financial computing in MATLAB® and C/C++. …

Benchmarking the new Kepler (GTX 680)

Pavan Yalamanchili | Benchmarks, CUDA | 13 Comments

NVIDIA has launched their next-generation GPU based on their Kepler architecture. They followed it up with a rather quick update to their CUDA toolkit. Considering that we have access to 3 generations of their GTX cards (480, 580 and 680), we thought we would showcase how the performance has changed over the generations. Matrix multiplication: It can be seen that the GTX 680 breaches the 1 teraflop mark comfortably for single precision, while the GTX 580 barely scratches it. However, the performance seems to peak around 2048 x 2048 and then falls off to match the performance of the GTX 580 at larger sizes. The high-end Tesla C2070 finishes last for single precision behind the third-placed …
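For readers who want to relate a timing to the "1 teraflop mark" mentioned above, here is a small sketch of the usual arithmetic. It assumes the conventional count of 2n³ floating-point operations for multiplying two n x n matrices; the excerpt does not say how the original benchmark counted operations, and the function names are ours.

```python
def matmul_flops(n):
    """Conventional operation count for an n x n by n x n matrix multiply:
    n**3 multiplications plus roughly n**3 additions."""
    return 2 * n ** 3


def gflops(n, seconds):
    """Achieved rate in GFLOPS given a measured wall-clock time."""
    return matmul_flops(n) / seconds / 1e9


def seconds_for_rate(n, target_gflops):
    """Time one multiply may take in order to hit a target rate."""
    return matmul_flops(n) / (target_gflops * 1e9)


# How fast must a 2048 x 2048 multiply run to reach 1 teraflop (1000 GFLOPS)?
budget = seconds_for_rate(2048, 1000.0)
```

Under this counting, a 2048 x 2048 multiply has to finish in about 17.2 milliseconds to reach 1 teraflop.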

GPU Computing with Jacket in Automated Trader

John Melonakos | Benchmarks, Case Studies

The Q1 2012 issue of Automated Trader contains an excellent “Mashup!” piece reviewing software for algorithmic trading. The article provides a wonderful glimpse into the 1-2 month adventure of Andy Webb, Automated Trader’s founder, and the Wrecking Crew as they built a fast trading platform from several technologies. We heartily recommend that those of you in financial computing subscribe to get the full story and access to ongoing developments from these Automated Trader thought leaders! The full trading platform they built was quite extensive. The part that caught our eye was the core computational component of the pipeline. That component involved permuting 1,000 potential pairs with cointegration tests for 350 time windows on each potential pair. The single-core MATLAB® version took 70 minutes …

CUDA and OpenCL Benchmarks – Keeneland Workshop Day 1

John Melonakos | Benchmarks, CUDA, Events, OpenCL | 3 Comments

Today was Day 1 of the Keeneland Workshop.  Many great talks were given, across a broad range of GPU computing topics. With last week’s ArrayFire Webinar fresh in mind, it was interesting to see similar conclusions drawn in a presentation by Kyle Spafford of Oak Ridge National Laboratory.  Kyle independently ran a number of benchmarks over a period of time which show how quickly OpenCL has matured and where it yet has room for improvement.  The slide below comes from Kyle’s presentation.  For numbers >1, CUDA is faster.  For numbers <1, OpenCL is faster.  Performance in most cases is close to equivalent. Just as we showed in the ArrayFire webinar, OpenCL performance is quite comparable with CUDA performance.  The Achilles heel …
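The ">1 means CUDA is faster" convention can be sketched in a few lines. We assume here that the ratio is OpenCL runtime divided by CUDA runtime for the same workload; the excerpt does not state how the slide's numbers were formed, and the function names and the 5% tolerance are illustrative choices of ours.

```python
def cuda_vs_opencl_ratio(cuda_seconds, opencl_seconds):
    """Assumed definition: OpenCL runtime divided by CUDA runtime.
    Values above 1 mean CUDA finished sooner; values below 1 mean OpenCL did."""
    return opencl_seconds / cuda_seconds


def verdict(ratio, tolerance=0.05):
    """Label a ratio, treating anything within the tolerance of 1 as a tie."""
    if ratio > 1 + tolerance:
        return "CUDA faster"
    if ratio < 1 - tolerance:
        return "OpenCL faster"
    return "roughly equivalent"


# Made-up timings, purely to exercise the functions.
examples = [
    verdict(cuda_vs_opencl_ratio(1.0, 1.5)),
    verdict(cuda_vs_opencl_ratio(2.0, 1.0)),
    verdict(cuda_vs_opencl_ratio(1.0, 1.02)),
]
```

The three made-up timings land in the three categories in order: CUDA faster, OpenCL faster, and roughly equivalent.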