NVIDIA Fermi with CUDA and OpenCL


In December 2008, we published a blog post answering questions from customers and prospects about the use of OpenCL in Jacket.  If you have not read that post for some insight into our progress, you can find it here – http://blog.accelereyes.com/blog/2008/12/30/opencl/.

Some things have changed since that original post.  For example, NVIDIA now provides an OpenCL driver, toolkit, programming guide, and SDK examples.  Given the new tools available and the new Fermi hardware, we ran some tests on the Tesla C2050 to compare OpenCL performance to CUDA performance.  The Tesla C2050 is an amazing beast of a card, providing up to 512 gigaflops of double-precision arithmetic at peak.

Before we present the benchmarks, we should comment on the programmability of OpenCL versus CUDA.  OpenCL is notably more difficult to program and debug than CUDA, since OpenCL documentation, tools, and scientific computing libraries are still very limited.  Given these handicaps, only a few matrix/vector operations were considered for this benchmark.  All of the vector operations are modified versions of the SDK examples provided by NVIDIA, and all tests used single-precision numbers.  A minimal sketch of the two programming models appears below.
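To give a feel for the programmability difference, here is a minimal sketch of the same single-precision vector-add kernel in both models.  This is illustrative only, not the exact SDK code we benchmarked; kernel names and sizes here are arbitrary.  Note that the OpenCL kernel (shown in the comment) must be shipped as a source string and compiled at run time via clCreateProgramWithSource, and each argument set with a separate clSetKernelArg call, where CUDA gets by with a one-line launch.

    #include <cstdio>
    #include <cuda_runtime.h>

    // CUDA version of the vector-add kernel (single precision).
    __global__ void vecAdd(const float* a, const float* b, float* c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    // The equivalent OpenCL kernel, which the host passes to
    // clCreateProgramWithSource() and compiles at run time:
    //
    //   __kernel void vecAdd(__global const float* a,
    //                        __global const float* b,
    //                        __global float* c,
    //                        int n)
    //   {
    //       int i = get_global_id(0);
    //       if (i < n)
    //           c[i] = a[i] + b[i];
    //   }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes),
              *hc = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // One line to launch in CUDA; OpenCL needs clSetKernelArg() per
        // argument plus clEnqueueNDRangeKernel().
        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f (expected 3.0)\n", hc[0]);
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }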

Here are the results:

[Figure: Matrix multiply on Fermi, CUDA vs. OpenCL]

[Figure: Vector add on Fermi, CUDA vs. OpenCL]

[Figure: Reductions on Fermi, CUDA vs. OpenCL]

The results indicate that OpenCL carries an overhead at smaller data sizes, which disappears at larger data sizes.  It is currently unknown whether the overhead comes from the time taken to launch a kernel in OpenCL or from something else within the API.
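One way to chase down such an overhead is to time an empty kernel launch under each API.  Below is a minimal sketch on the CUDA side using event timing; the OpenCL counterpart would enqueue an empty kernel on a queue created with CL_QUEUE_PROFILING_ENABLE and read timestamps back with clGetEventProfilingInfo.  The iteration count is arbitrary, and this measures amortized per-launch time rather than isolating any single cause.

    #include <cstdio>
    #include <cuda_runtime.h>

    // An empty kernel: any measured time is pure launch overhead.
    __global__ void noop() {}

    int main()
    {
        const int iters = 1000;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        noop<<<1, 1>>>();            // warm-up launch
        cudaDeviceSynchronize();

        cudaEventRecord(start);
        for (int i = 0; i < iters; ++i)
            noop<<<1, 1>>>();
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("average time per launch: %f us\n", 1000.0f * ms / iters);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return 0;
    }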

This is our report on the current status of OpenCL relative to CUDA on the new NVIDIA hardware.  We continue to watch the progress of OpenCL, ATI, and other GPU computing initiatives.  Our focus is to deliver the best GPU computing platform on the planet for engineers, scientists, analysts, and students, and we guarantee that Jacket customers will always have the very best GPU hardware choices for their applications.  As the GPU landscape continues to evolve, your Jacket code will simply get faster without you having to do anything new.  So sit back, relax, and enjoy watching your Jacket code scale with each new release!

Useful links:

1) http://www.anandtech.com/show/2977/nvidia-s-geforce-gtx-480-and-gtx-470-6-months-late-was-it-worth-the-wait-/6

2) http://www.pcgameshardware.com/aid,743498/Geforce-GTX-480-and-GTX-470-reviewed-Fermi-performance-benchmarks/Reviews/

3) http://www.appleinsider.com/articles/08/12/10/nvidia_pioneering_opencl_support_on_top_of_cuda.html

4) http://www.sisoftware.co.uk/?d=qa&f=gpu_opencl&a=AMD

5) http://unigine.blogspot.com/2010/02/cuda-vs-opencl-vs-directcompute.html
