Tesla C2050 versus C1060 on Real MATLAB Applications

Following our recent Jacket v1.4 Fermi architecture release, many of you asked for data comparing the new NVIDIA Fermi-based Tesla C2050 with the older Tesla C1060.

Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared performance of the C2050 vs C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers.

All tests were run on the same standard workstation with Jacket 1.4. The only thing that changed was the GPU board itself. In every case the C2050 beat the C1060. In the double-precision examples, the Fermi-based board outperformed the older board by at least 50% in every case and by more than 2x in many cases.

Note: ECC was enabled on the Fermi boards.

In addition to the standard Jacket examples, we ran single- and double-precision matrix multiplication (SGEMM and DGEMM) and plotted the results in the following charts. This matrix multiply implementation was developed in-house at AccelerEyes and considerably outperforms both CUBLAS and MAGMA (see the MTIMES benchmarks). Special thanks to Torben Larsen for the benchmarking results.
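For readers who want to turn their own timings into comparable numbers, here is a minimal Python sketch of the arithmetic usually used for GEMM benchmarks. It assumes the conventional operation count of 2·m·n·k floating-point operations for multiplying an m×k matrix by a k×n matrix; the post does not say how its charts were computed, so treat this as an illustration rather than a description of our method. The function names `gemm_gflops` and `speedup` are made up for this example.

```python
def gemm_gflops(m, n, k, seconds):
    """Approximate GFLOP/s for C = A * B, with A of size m x k and B of size k x n.

    Assumes the conventional count of 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def speedup(old_seconds, new_seconds):
    """Ratio of run times: 1.5 means "50% faster", 2.0 means "2x"."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("timings must be positive")
    return old_seconds / new_seconds
```

For example, `gemm_gflops(1000, 1000, 1000, 1.0)` returns 2.0, because 2·10^9 operations completed in one second. Likewise, a test that takes 3.0 s on one board and 1.5 s on another gives `speedup(3.0, 1.5) == 2.0`.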

As we generate or receive more comparison data, we will share the results.

Comments 7

Drazick

August 3, 2010 at 12:24 pm

Would anyone evaluate the GTX460?

Anonymous
September 3, 2010 at 12:30 am

We don’t have a GTX460 in house (we do have GTX480s). The GTX480s are giving great performance, but they don’t have as much memory as the Teslas do.

1. Deburgess
  September 12, 2010 at 1:43 pm
  
  For HPC, would it be better to purchase one C1060 or two GTX480s?
  
  1. Anonymous
    September 12, 2010 at 5:25 pm
    
    It is not easy to say one way or the other because we don’t know the constraints that you face. One easy metric is memory size. If your problem won’t fit in the smaller memory of the 480, then you should go with the C1060. If it does fit, the 480s will probably give you more speed (since they are Fermi-based and the C1060 is not).
    
    Also, the C1060 typically comes with better support and warranty than the 480s, so if those parameters are important, they may influence your decision.
    
    Also, the C1060 will be available “new” on the market for much longer than the 480s. The GeForce lines typically ship for no more than 2 years or so before they are replaced by newer lines. But since commercial entities are building products with the C1060s in them, NVIDIA has promised to keep the Tesla lines going for a longer period of time.
    
    Hope this helps and good luck!
    
    1. Deburgess
      September 12, 2010 at 6:31 pm
      
      Thank you very much for your reply.
      
      We are planning to do tissue-level simulations using Continuity:
      http://www.continuity.ucsd.edu/
      
      In the long run I am concerned about double-precision support for general computational problems. So the recent C2050 might be a natural choice, but it is outside our budget ($1200) and has less memory (3GB) than the C1060 (4GB).
    2. Anonymous
      September 12, 2010 at 7:17 pm
      
      You’re right on all of the above. Good luck – Continuity looks like a cool project. Let us know if we can do anything to help.

fermion

October 24, 2010 at 9:32 pm

I’m going to build a big cluster based on gtx480s and not c2050s – your own benchmarks show why! little gain over c1060…
(everybody knows that 480s are almost 5 times cheaper. So even the compute-bound applications will
benefit, and bandwidth-bound by a huge factor. )

the fact that 480s have half the mem of 2050s is not so crucial.
