Remote Off-Screen Rendering with OpenGL

Shehzan Mohammed | ArrayFire, OpenGL

At ArrayFire, we constantly encounter projects that require OpenGL but run on a remote server with no display attached. In this post, we have compiled the steps needed to run full-profile OpenGL applications over SSH on remote systems without a display. A few notes before we get started. This post is limited to computers running Linux distributions. The first part, which covers configuring the xorg.conf file, applies only to NVIDIA cards with display capability. AMD cards support this capability without any modification of the xorg.conf file; however, we have not been able to compile a comprehensive list of supported devices.

Requirements

You will need access to the remote …

How to Make GPU Hardware Decisions

Scott | Computing Trends, CUDA, Hardware, OpenCL

We get questions all the time about how to make GPU hardware decisions. We’ve seen just about every scenario you can imagine, and so we always jump at the chance to help others through this decision process. Here’s a recent question from a customer. “I’ve just found your post on Analytic Bridge and have taken a look at your website … I’m replacing my two Tesla M1060 cards (computing capability too low) and I’m considering used Tesla M2070s or the new GTX 760 cards. Could you offer any insight? I believe the GTX 760 cards may well outperform the older 2070s and are much cheaper.” And here’s our response. “The GTX 760 will probably outperform the M2070 for single precision …
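A rough way to reason about the single- versus double-precision question is to compare theoretical peak throughput: CUDA cores × clock × 2 (a fused multiply-add counts as two floating-point operations), scaled by each chip's double-precision rate. The sketch below assumes nominal spec-sheet figures that are not taken from this post: 448 cores at 1.15 GHz with half-rate double precision for the M2070, and 1152 cores at 980 MHz with 1/24-rate double precision for the GTX 760. Check them against the vendor datasheets, and keep in mind that peak numbers ignore memory bandwidth and real workload behavior.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    cores: int        # CUDA cores (assumed nominal value)
    clock_ghz: float  # shader clock in GHz (assumed nominal value)
    dp_ratio: float   # double-precision rate relative to single precision

    def peak_sp_gflops(self) -> float:
        # Each core is assumed to retire one fused multiply-add (2 FLOPs) per cycle.
        return self.cores * self.clock_ghz * 2

    def peak_dp_gflops(self) -> float:
        # Double-precision peak is the single-precision peak scaled by the DP rate.
        return self.peak_sp_gflops() * self.dp_ratio


# Nominal figures assumed from public spec sheets; verify before relying on them.
m2070 = Gpu("Tesla M2070", cores=448, clock_ghz=1.15, dp_ratio=1 / 2)
gtx760 = Gpu("GTX 760", cores=1152, clock_ghz=0.980, dp_ratio=1 / 24)

for gpu in (m2070, gtx760):
    print(f"{gpu.name}: {gpu.peak_sp_gflops():.0f} SP GFLOPS, "
          f"{gpu.peak_dp_gflops():.0f} DP GFLOPS")
```

Under these assumed figures, the GTX 760 has roughly twice the single-precision peak of the M2070 but less than a fifth of its double-precision peak, which is why the precision a workload needs matters so much in this decision.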

Benchmarking Tesla K20

Pavan Yalamanchili | ArrayFire, Benchmarks, CUDA

In this blog post, we compare NVIDIA’s latest high-end offering, the Tesla K series, with its previous offering. In particular, we compare the Tesla K20c with the Tesla C2070/C2075. This post follows a similar post from last year in which we benchmarked the GTX 680. We look at a similar set of functions (and a few more) to see what benefits the newer line brings. All of the benchmarks were run in double precision. In all of the graphs, higher trendlines are better.

Matrix Multiplication

In-house at AccelerEyes, we use matrix multiplication as the gold standard for testing the peak performance of every new GPU we get. The K20c reaches a peak at …
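Matrix multiplication works well as a yardstick because its operation count is known exactly: multiplying two n×n matrices takes n³ multiply-add pairs, or about 2n³ floating-point operations, so dividing 2n³ by the elapsed time gives a GFLOPS figure that can be compared across devices. The sketch below uses a naive pure-Python multiply (Python floats are double precision) purely to show that bookkeeping. It is not the harness used for the K20c results; a real GPU benchmark would call an optimized GEMM routine instead, and the function names here are hypothetical.

```python
import random
import time


def matmul(a, b):
    """Naive n x n matrix product; stands in for an optimized GEMM."""
    bt = list(zip(*b))  # transpose b so each column is a contiguous tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matmul_gflops(n, repeats=3):
    """Time an n x n multiply and convert the best run to GFLOPS."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    flops = 2.0 * n ** 3  # n^3 multiply-add pairs
    return flops / best / 1e9


for n in (64, 128):
    print(f"n={n}: {matmul_gflops(n):.4f} GFLOPS")
```

Taking the best of several runs reduces noise from warm-up and scheduling; sweeping n, as the graphs in this post do, shows where a device reaches its peak.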

LIBJACKET on Amazon EC2 GPU Cloud Instances

Pavan Yalamanchili | Benchmarks, CUDA

Amazon recently added GPUs to its Elastic Compute Cloud (EC2). We decided to throw LIBJACKET into this GPU cloud to see how it would fare. The $2/hr pay-on-demand pricing is a great option for many Jacket programmers. This post is full of screenshots detailing the steps we took to get going with GPU computing in Amazon’s cloud:

1. Sign up for Amazon EC2.
2. Launch a GPU instance.
3. Log in to the instance using ssh.
4. Set up the environment.
5. Download, build, and test LIBJACKET!

Everything in this post applies equally well to running Jacket for MATLAB® on EC2. Simply install MATLAB and Jacket on your Amazon GPU instance and start working over ssh.

Tesla C2050 versus C1060 on Real MATLAB Applications

John Melonakos | Benchmarks

Following our recent Jacket v1.4 Fermi architecture release, many of you requested data comparing the new NVIDIA Fermi-based Tesla C2050 with the older Tesla C1060. Over the years, AccelerEyes has developed an extensive suite of benchmark MATLAB applications, which are included in every Jacket installation. Using this suite of tests, we compared the performance of the C2050 and the C1060 and are pleased to report the results here. We hope this information will be useful to Jacket programmers. All tests were run on the same standard workstation with Jacket 1.4; the only thing that changed was the GPU board. In every case, the C2050 beat the C1060. Double-precision examples on the Fermi-based board outperformed the older board by 50% in every …

Jacket for MATLAB now available for NVIDIA Fermi!

ArrayFire | Announcements

We are pleased to announce Jacket 1.4, with support for the latest NVIDIA graphics processing units based on the Fermi architecture (Tesla 20-series and GeForce GTX 4xx-series). NVIDIA’s release of the Fermi architecture brings with it 448 computational cores, increased IEEE-754 floating-point arithmetic precision, error-correcting memory for reliable computation, and enhanced memory caching mechanisms. Highlights for Jacket 1.4 are as follows:

- Added support for the NVIDIA Fermi architecture (GTX400 and Tesla C2000 series)
- Jacket DLA support for Fermi
- Dramatically improved the performance of Jacket’s JIT (Just-In-Time) compilation technology
- Operations involving random scalar constants do not incur a recompile
- Removed dependencies on MINGW and NVCC
- Logical indexing now supported for SUBSREF and SUBSASGN, e.g. B = A(A > x)
- MTIMES supports …
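To illustrate the semantics of logical indexing: B = A(A > x) builds a boolean mask with the same shape as A and returns the elements where the mask is true, as a column vector in MATLAB's column-major order, while A(A > x) = v assigns v to those same positions. The Python sketch below mimics both operations for a matrix stored as a list of rows. The function names are hypothetical, and the sketch only demonstrates the semantics; it does not reflect how Jacket implements SUBSREF or SUBSASGN on the GPU.

```python
def logical_index(a, x):
    """Emulate MATLAB's B = A(A > x) for a matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    # The mask has the same shape as A: True where the element exceeds x.
    mask = [[a[r][c] > x for c in range(cols)] for r in range(rows)]
    # MATLAB walks elements in column-major order, so iterate columns first.
    return [a[r][c] for c in range(cols) for r in range(rows) if mask[r][c]]


def logical_assign(a, x, value):
    """Emulate MATLAB's A(A > x) = value, modifying a in place."""
    for row in a:
        for c, v in enumerate(row):
            if v > x:
                row[c] = value


A = [[1, 5, 2],
     [7, 3, 9]]
print(logical_index(A, 2))  # MATLAB: A(A > 2) gives [7; 5; 3; 9]

B = [row[:] for row in A]   # work on a copy so A is left unchanged
logical_assign(B, 2, 0)
print(B)                    # [[1, 0, 2], [0, 0, 0]]
```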