Following our recent Jacket v1.4 release with Fermi architecture support, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 with the older Tesla C1060.
Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared performance of the C2050 vs C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers.
All tests were run on the same standard workstation with Jacket 1.4; the only thing that changed was the GPU board. In every case the C2050 beat the C1060. In double precision, the Fermi-based board outperformed the older board by at least 50% in every example and by more than 2x in many.
Note: ECC was enabled on the Fermi boards.
In addition to the standard Jacket examples, we benchmarked single- and double-precision matrix multiplication (SGEMM and DGEMM); the results are plotted in the following charts. This matrix multiply implementation was developed in-house at AccelerEyes and considerably outperforms both CUBLAS and MAGMA (see the MTIMES benchmarks). Special thanks to Torben Larsen for the benchmarking results.
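For readers who want to turn their own timings into figures comparable to those in the charts, here is a minimal Python sketch of how matrix-multiply throughput and board-to-board speedup are commonly computed. It assumes the usual convention of counting 2*m*n*k floating-point operations for a GEMM. The function names are hypothetical and are not part of Jacket, and this is not necessarily how the charts in this post were produced.

```python
def gemm_flops(m, n, k):
    """Conventional operation count for C = A*B, with A (m x k) and B (k x n).

    Each of the m*n outputs needs k multiplies and about k adds,
    so the usual convention is 2*m*n*k floating-point operations.
    """
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Throughput in GFLOPS given a measured wall-clock time in seconds."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


def speedup(t_baseline, t_new):
    """How many times faster the new board is (2.0 means '2x')."""
    return t_baseline / t_new


def percent_faster(t_baseline, t_new):
    """The same comparison expressed as a percentage (1.5x is '50% faster')."""
    return (speedup(t_baseline, t_new) - 1.0) * 100.0
```

With these definitions, "outperformed by 50%" corresponds to a speedup of 1.5x and "better than 2x" to more than 100% faster. When timing GPU code, make sure the computation has actually finished before stopping the clock, since GPU work is often launched asynchronously.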