7 Highlights of GTC 2013 – Day 4 of 4

John Melonakos | Events

Day 4 at GTC is always a little less hyped than the first 3 days, but it is when some of the best sessions are found. Here are 7 of the highlights we’ve collected from our team on the last day of GTC 2013: Paulius Micikevicius of NVIDIA gave a great talk titled “Performance Optimization: Programming Guidelines and GPU Architecture Details Behind Them.” It was so great that we have 2 highlights from this talk. The first Paulius highlight is his explanation of why instruction-level parallelism is essential to taking full advantage of Kepler GPUs. Paulius presented these difficult concepts clearly. The second Paulius highlight is his thorough treatment of the memory hierarchy on Kepler. It is very detailed and …

GTC 2013 Tutorial – CUDA Accelerated Image Processing Libraries

John Melonakos | ArrayFire, CUDA, Events

The 2013 GPU Technology Conference is just two weeks away. We’re super excited. We’re spending a lot of time preparing for our tutorial on CUDA Accelerated Image Processing Libraries. We think it will be well worth your while to attend. This is an 80-minute session all about CUDA image processing, presented by James Malcolm, an AccelerEyes co-founder and lead engineer. You will walk away from the tutorial much better prepared to build fast computer vision and image processing codes. The session abstract is as follows: Image processing has consistently proven to benefit greatly from GPU acceleration. A number of libraries available from NVIDIA and AccelerEyes make image processing development efficient and lead to big speedups. Using these libraries can often significantly shorten …

Benchmarking the new Kepler (GTX 680)

Pavan Yalamanchili | Benchmarks, CUDA

NVIDIA has launched their next-generation GPU based on their Kepler architecture. They followed it up with a rather quick update to their CUDA toolkit. Considering that we have access to 3 generations of their GTX cards (480, 580 and 680), we thought we would showcase how the performance has changed over the generations. Matrix multiplication: It can be seen that the GTX 680 comfortably breaches the 1 teraflop mark for single precision, while the GTX 580 barely scratches it. However, the GTX 680’s performance seems to peak around 2048 x 2048 and then falls off to match the performance of the GTX 580 at larger sizes. The high-end Tesla C2070 finishes last for single precision behind the third-placed …
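For readers wondering how a throughput figure like "1 teraflop" is usually derived from a matrix multiplication timing, here is a minimal sketch. It assumes the conventional count of 2N³ floating-point operations for a dense N x N multiply; the function name `matmul_gflops` and the timing value are hypothetical and are not taken from the benchmark in the post.

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """Estimate throughput in GFLOPS for one dense (n x n) @ (n x n) multiply.

    Assumes the conventional operation count of 2 * n^3
    (roughly n^3 multiplications plus n^3 additions).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9


if __name__ == "__main__":
    n = 2048
    elapsed = 0.0172  # hypothetical wall-clock time in seconds, not a measured result
    print(f"{n} x {n}: {matmul_gflops(n, elapsed):.0f} GFLOPS")
```

Under this convention, a card crosses the 1 teraflop mark (1000 GFLOPS) at 2048 x 2048 when a single multiply finishes in roughly 17 milliseconds or less.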