At ArrayFire, we constantly encounter projects that require OpenGL and run on a remote server that does not have a display.
In this blog, we have compiled the steps needed to run full-profile OpenGL applications over SSH on remote systems without a display.
A few notes before we get started.
- This blog is limited to computers running distributions of Linux.
- The first part of the blog that shows the configuration of the xorg.conf file is limited to NVIDIA cards (with display).
- AMD cards support this capability without the modification of xorg.conf file. However, we have not been able to get a comprehensive list of supported devices.
Requirements
You will need access to the remote system over SSH.
To run the tool, you will need libGL.so and libX11.so. Another tool I would strongly recommend is glewinfo; most Linux distributions ship it in the glew-utils package. An alternative to glewinfo is glxinfo, which is present on all systems with X. You can substitute glxinfo for glewinfo in any of the commands below if needed.
Configuring X (NVIDIA Only)
To get X running on NVIDIA cards, we need to make changes to the xorg.conf file. Before making any changes, make sure you create a backup of the current version on your system (name it xorg.conf.stable).
You can find a sample xorg.conf file here. The sample file is for an NVIDIA GTX 690. The specific things to notice are the “UseDisplayDevice” option under “Screen” and the “Virtual” option under “SubSection Display”. You can use parts of this file to configure your own config file; make sure the options listed in the sample file appear in your config file as well.
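For reference, the relevant portion of such a config looks roughly like this. This is only a sketch, not the full sample file: identifiers like “Screen0” and “Device0” are placeholders that must match the rest of your config, and a dual-GPU card like the GTX 690 would have one such block per GPU.

```
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "UseDisplayDevice" "None"
    SubSection     "Display"
        Virtual     1280 1024
        Depth       24
    EndSubSection
EndSection
```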
Save and close the file.
Now run this command:

# nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
Restart the system.
Note: To abort trying this, just copy xorg.conf.stable back to xorg.conf and restart.
The following is only for GeForce cards. Quadro and Tesla users can skip to the Initial Diagnosis section.
On restart, run the command # /usr/bin/X :0 &. Ideally, this command should give output similar to:
X.Org X Server 1.13.0
Release Date: 2012-09-05
X Protocol Version 11, Revision 0
Loading extension GLX
Loading extension NV-GLX
Loading extension NV-CONTROL
This means X has started successfully on the virtual display.
If it fails, restart the system and try again. You can also look at /var/log/Xorg.0.log for the log of the failure.
Update: Known issue with Starting X
If the output is not similar to the one shown above, work through the Initial Diagnosis section below. If the output shows an error as follows:
$ env DISPLAY=:0 glewinfo
X Error of failed request: BadWindow (invalid Window parameter)
Major opcode of failed request: 137 (NV-GLX)
Minor opcode of failed request: 4 ()
Resource id in failed request: 0x200004
Serial number of failed request: 39
Current serial number in output stream: 39
run the following commands:
sudo mv /usr/lib/xorg/modules/extensions/libglx.so /usr/lib/xorg/modules/extensions/libglx.so.orig
sudo ln -s /usr/lib/xorg/modules/extensions/libglx.so.XXX.YY /usr/lib/xorg/modules/extensions/libglx.so
Where XXX.YY is the NVIDIA driver version.
Now try starting X again.
Initial Diagnosis
Once X has started successfully, run echo $DISPLAY. It is very likely that the output will be empty.
If you have glewinfo installed, run the following command: env DISPLAY=:0 glewinfo | less
The goal of this command is to run glewinfo with DISPLAY temporarily set to :0 (the virtual display on the remote system). If this command runs successfully, you should see the graphics card on the remote system along with the full OpenGL profile. You are now ready to deploy applications using X.
If you want to set DISPLAY for the entire session, run export DISPLAY=:0. To set DISPLAY permanently, add the same line to your .bashrc file.
Deploying Off-Screen Rendering Applications on Remote System
This is where X becomes crucial. Tools like GLFW may or may not work on remote systems because of their dependence on Xrandr and other software. The trick is to use X to create an OpenGL context, and then run everything off-screen using framebuffers and renderbuffers. I took the source for creating an OpenGL context using X from the OpenGL.org context creation tutorial and modified it slightly. Thanks to the folks at OpenGL.org for providing this. The source code with the changes I made can be found here: glContext.hpp.
Include this in your source code. To create and delete contexts, call createGLContext() and deleteGLContext() respectively.
I have specified a minimum OpenGL version of 4.4 with forward compatibility enabled. You can modify this version by changing the values at lines 26 and 27 of glContext.hpp.
Forward compatibility can be disabled by changing the flag at line 241 to None.
If the application fails to create the specified version of the context, the application will exit.
At line 189, we create the window. The API description can be found here. “0, 0, 10, 10” specifies the top-left corner (0, 0) and the width and height of the window (10, 10). Since the window is only used for off-screen rendering, its size has no effect on the rendering.
Line 202 specifies the window title.
If everything goes successfully, you should see an output like this:
glContext.hpp:297: GL Version = 4.4.0 NVIDIA 331.79
glContext.hpp:298: GL Vendor = NVIDIA Corporation
glContext.hpp:299: GL Renderer = GeForce GTX 690/PCIe/SSE2
glContext.hpp:300: GL Shader = 4.40 NVIDIA via Cg compiler
If you specify a lower minimum version, say 3.0, the version in the output will reflect 3.0, since GL reports the version of the created context.
To test the context creation stand-alone, create a cpp file with the following contents (glContext.cpp):
#include "glContext.hpp"

int main(int argc, char* argv[])
{
    createGLContext();
    deleteGLContext();
    return 0;
}
Compile this with g++ -o gl glContext.cpp -lGL -lX11 and run it with ./gl (make sure DISPLAY is set to :0). If this works successfully, any off-screen rendering code will work with this setup.
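To go one step further, here is a minimal sketch of the framebuffer/renderbuffer setup used for the actual off-screen rendering. It assumes the context from createGLContext() is current and that GLEW (or another loader) has already been initialized; the function name renderOffscreen and the buffer sizes are illustrative only.

```cpp
// Assumes: a current OpenGL context (e.g. from createGLContext())
// and an initialized function loader such as GLEW.
#include <GL/glew.h>
#include <cstdio>

bool renderOffscreen(int width, int height)
{
    GLuint fbo, color, depth;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    // Color attachment backed by a renderbuffer.
    glGenRenderbuffers(1, &color);
    glBindRenderbuffer(GL_RENDERBUFFER, color);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              GL_RENDERBUFFER, color);

    // Depth attachment.
    glGenRenderbuffers(1, &depth);
    glBindRenderbuffer(GL_RENDERBUFFER, depth);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                              GL_RENDERBUFFER, depth);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
        std::fprintf(stderr, "FBO incomplete\n");
        return false;
    }

    // Draw into the FBO as usual...
    glViewport(0, 0, width, height);
    glClearColor(0.f, 0.f, 0.f, 1.f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ...then read the result back, e.g. with glReadPixels().

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteRenderbuffers(1, &depth);
    glDeleteRenderbuffers(1, &color);
    glDeleteFramebuffers(1, &fbo);
    return true;
}
```

Since the rendering target is the FBO rather than the window, the 10x10 window created above never needs to be shown or resized.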
Note: If you use GLEW, make sure you include glew.h before including gl.h.
If you wish to know more about remote OpenGL or work with us on remote rendering, contact us at technical@arrayfire.com.
Links:
OpenGL Context Creation Tutorial
ArrayFire glContext repo on Github
Comments (18)
Hi,
Thanks for the awesome post. I had some issues using Ubuntu 12.04 LTS and a GTX 580. I ran into issues with the initial diagnostics. Namely, “$env DISPLAY=:0 glewinfo | less” reports:
No protocol specified
No protocol specified
Error: glewCreateContext failed
The xorg log does not show any errors and “$/usr/bin/X :0 &” tells me that X is already running.
Any ideas on how to troubleshoot some more?
Thanks in advance.
Try running glxinfo with and without the env DISPLAY=:0. I would assume that the one without the display would give OpenGL version 2.1. With the display, I think it would error out like glewinfo.
Did you make the required changes to the xorg.conf and also run the nvidia-xconfig command?
Lastly, did you see the “Loading extension GLX…” output in the log?
Note: all the commands with # need to be run as root or using sudo.
Try this: restart your system. Run “sudo ls”. This will ask for password and get you a buffer time to run more sudo commands without entering the password each time. After sudo ls, run the “sudo /usr/bin/X :0 &” command.
Hi,
glewinfo (and glxinfo) without DISPLAY=:0 both work, but they give me the info of my local machine (the graphics card info is the local one, the OpenGL version is 2.1.2, and the display is localhost:10.0). glxinfo produces the same error as glewinfo when using DISPLAY=:0.
My xorg.conf has the same format as the sample one, with the exceptions that I have a single monitor/screen/device. I’ll try with the double configuration.
Yeah, I see the “[ 41.772] (II) Loading extension GLX” log entry
X is running when the machine restarts. running “sudo /usr/bin/X :0 &” yields:
“Fatal server error:
Server is already active for display 0
If this server is no longer running, remove /tmp/.X0-lock
and start again.”
I also see /usr/bin/X in the list of current processes.
Any other idea of what to try?
Thanks
You can keep the single configuration. The sample one has 2 of all because a 690 has 2 GPUs.
It looks like X is not starting properly. I’m not completely sure why this is.
Can you revert back to your old xorg.conf file, restart the computer and confirm that X is not running when it restarts?
It may also be an architecture issue. All the GPUs we tested were Kepler. I’ll try testing it on a Fermi and get back to you.
Hi,
I kept the single configuration as you suggested.
I reverted to the old xorg.conf file and indeed X did not start when the machine is restarted.
I also tried on a different machine with a GTX 780 (kepler) and had the same issue.
I checked the log entries, it was fine, no errors and GLX loaded.
Any ideas?
Thanks
Can you email your xorg.conf file to shehzan@arrayfire.com? Can you also send your log file?
Hello,
After some debugging and troubleshooting with Shehzan we have figured out that my system had X running at boot for some reason. The solution is to kill it and then the procedure would work as outlined above.
1. Check if X is running
$ps aux | grep X
my system produced:
“root 1945 0.1 0.1 212380 51960 tty7 Ss+ 14:23 0:04 /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch”
2. If you get a similar output, kill X:
$sudo service lightdm stop
3. Check if X is still running
$ps aux | grep X
4. start local X
$/usr/bin/X :0 &
5. check the configuration
$env DISPLAY=:0 glewinfo | less
Note: the first time step 5 is run from any SSH session it is instant; subsequent calls are much slower (~1.5 minutes for the info to show up). Not sure why…
Are you sure that GL context init code will actually work on remote headless servers? From my previous attempts, I always had to create a dummy PBuffer as well to get the actual context. In your case you rely on a Window object, which was not accessible (at least in my attempts) on remote servers like those of Amazon EC2.
I just got this working on a GPU-enabled Ubuntu AWS instance. I didn’t have to modify the code, just some conf files and drivers. The sample .cpp program he provided works for me. I hope that helps narrow down any issues you might encounter.
Thanks for posting this. I had to change a few things around to get this to work on the system I’m supporting (and I need to integrate it into the whole codebase). That said, this guide was a way better starting point than anything else I found.
If you can let me know what you had to change, I’ll be happy to update the blog.
Pingback: Unix:How to efficiently use 3D via a remote connection? – Unix Questions
Won’t headless openGL work with Xvfb? That sounds much simpler.
http://simonwalton.me/blog/?p=63
Thanks for that. I tried testing this out but was unsuccessful. I got Xvfb to run, but the OpenGL context creation fails. Here are the steps I used:
sudo Xvfb :12 -screen 0 800x600x24 &
[1] 24161
Initializing built-in extension Generic Event Extension
Initializing built-in extension SHAPE
Initializing built-in extension MIT-SHM
Initializing built-in extension XInputExtension
Initializing built-in extension XTEST
Initializing built-in extension BIG-REQUESTS
Initializing built-in extension SYNC
Initializing built-in extension XKEYBOARD
Initializing built-in extension XC-MISC
Initializing built-in extension SECURITY
Initializing built-in extension XINERAMA
Initializing built-in extension XFIXES
Initializing built-in extension RENDER
Initializing built-in extension RANDR
Initializing built-in extension COMPOSITE
Initializing built-in extension DAMAGE
Initializing built-in extension MIT-SCREEN-SAVER
Initializing built-in extension DOUBLE-BUFFER
Initializing built-in extension RECORD
Initializing built-in extension DPMS
Initializing built-in extension Present
Initializing built-in extension DRI3
Initializing built-in extension X-Resource
Initializing built-in extension XVideo
Initializing built-in extension XVideo-MotionCompensation
Initializing built-in extension SELinux
Initializing built-in extension GLX
~$ export DISPLAY=:12
~$ glewinfo
Xlib: extension “GLX” missing on display “:12”.
Error: glewCreateContext failed
Xlib: extension “GLX” missing on display “:12”.
6 XSELINUXs still allocated at reset
SCREEN: 0 objects of 256 bytes = 0 total bytes 0 private allocs
DEVICE: 0 objects of 96 bytes = 0 total bytes 0 private allocs
CLIENT: 0 objects of 144 bytes = 0 total bytes 0 private allocs
WINDOW: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PIXMAP: 1 objects of 16 bytes = 16 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 6 objects, 88 bytes, 0 allocs
1 PIXMAPs still allocated at reset
PIXMAP: 1 objects of 16 bytes = 16 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 6 objects, 88 bytes, 0 allocs
4 GCs still allocated at reset
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 5 objects, 72 bytes, 0 allocs
1 CURSORs still allocated at reset
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 1 objects, 8 bytes, 0 allocs
As you can see, the OpenGL part failed to initialize. My hunch is that, even if it did initialize, it would be software-based rendering, which may limit it to OpenGL 2.1. I don’t think you will be able to get 4.x.
If you have any other information, I will be happy to test it.
Is there any reason to create a hidden window instead of a GLX pbuffer?
I haven’t tested that. You can give it a shot if you want.
I’m using that. I was just wondering whether your solution may have any known advantages.
Hi,
this was great, thanks heaps for the blog post. FYI I was trying to get this to work on a CentOS 7 x64 box with a Tesla K40m (if it makes a difference – probably not), and for the “Known issue with Starting X” section the link I had to make was:
ln -s /usr/lib64/nvidia/xorg/libglx.so /usr/lib64/xorg/modules/extensions/libglx.so
/usr/lib64/nvidia/xorg/libglx.so is itself a symlink, as per:
[zgraphics@hamlmg01 xorg]$ pwd
/usr/lib64/nvidia/xorg
[zgraphics@hamlmg01 xorg]$ ls -l
total 13684
lrwxrwxrwx. 1 root root 16 Apr 17 16:46 libglx.so -> libglx.so.390.30
lrwxrwxrwx. 1 root root 16 Apr 17 16:46 libglx.so.1 -> libglx.so.390.30
-rwxr-xr-x. 1 root root 14008936 Feb 1 18:31 libglx.so.390.30
I do have a problem: although user root seems to be able to start our graphical job, a “normal” user can’t, and e.g. gets:
[zgraphics@hamlmg01 ~]$ glewinfo
No protocol specified
Error: glewCreateContext failed
Any idea what rights the normal user account needs?
Thanks!
Dylan