At ArrayFire, we constantly encounter projects that require OpenGL and run on a remote server that does not have a display.
In this blog, we have compiled the steps needed to run full-profile OpenGL applications over SSH on remote systems without a display.
A few notes before we get started.
- This blog is limited to computers running distributions of Linux.
- The first part of the blog that shows the configuration of the xorg.conf file is limited to NVIDIA cards (with display).
- AMD cards support this capability without the modification of xorg.conf file. However, we have not been able to get a comprehensive list of supported devices.
Requirements
You will need access to the remote system over SSH.
To run the tool, you will need libGL.so and libX11.so. Another tool I would strongly recommend is glewinfo; most Linux distributions ship it in the glew-utils package. An alternative to glewinfo is glxinfo, which is present on all systems with X. You can substitute glxinfo for glewinfo in any of the commands below if needed.
Configuring X (NVIDIA Only)
To get X running on NVIDIA cards, we need to make changes to the xorg.conf file. Before making any changes, make sure you create a backup of the current version on your system (name it xorg.conf.stable).
You can find a sample xorg.conf file here. The sample file is for an NVIDIA GTX 690. The specific things to notice are the “UseDisplayDevice” option under “Screen” and the “Virtual” option under “SubSection Display”. You can use parts of this file to configure your own config file; make sure the options listed in the sample file appear in your config file as well.
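For reference, the relevant portion of such a config looks roughly like this. This is only a sketch, not the full sample file: identifiers like “Screen0” and “Device0” are placeholders that must match the rest of your config, and a dual-GPU card like the GTX 690 would have one such block per GPU.

```
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "UseDisplayDevice" "None"
    SubSection     "Display"
        Virtual     1280 1024
        Depth       24
    EndSubSection
EndSection
```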
Save and close the file.
Now run this command:

# nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
Restart the system.
Note: To abort trying this, just copy xorg.conf.stable back to xorg.conf and restart.
The following is only for GeForce cards. Quadro and Tesla users can skip to the Initial Diagnosis section.
On restart, run the command # /usr/bin/X :0 &. Ideally, this command should give output similar to:
X.Org X Server 1.13.0
Release Date: 2012-09-05
X Protocol Version 11, Revision 0
Loading extension GLX
Loading extension NV-GLX
Loading extension NV-CONTROL
This means X has started successfully on the virtual display.
If it fails, restart the system and try again. You can also look at /var/log/Xorg.0.log for the log of the failure.
Update: Known issue with Starting X
If the output is not similar to the one shown above, work through the Initial Diagnosis section below. If the output shows an error as follows:
$ env DISPLAY=:0 glewinfo
X Error of failed request: BadWindow (invalid Window parameter)
Major opcode of failed request: 137 (NV-GLX)
Minor opcode of failed request: 4 ()
Resource id in failed request: 0x200004
Serial number of failed request: 39
Current serial number in output stream: 39
run the following commands:
sudo mv /usr/lib/xorg/modules/extensions/libglx.so /usr/lib/xorg/modules/extensions/libglx.so.orig
sudo ln -s /usr/lib/xorg/modules/extensions/libglx.so.XXX.YY /usr/lib/xorg/modules/extensions/libglx.so
Where XXX.YY is the NVIDIA driver version.
Now try starting X again.
Initial Diagnosis
Once X has started successfully, run echo $DISPLAY. It is very likely that the output will be empty.
If you have glewinfo installed, run the following command: env DISPLAY=:0 glewinfo | less
The goal of this command is to run glewinfo with DISPLAY temporarily set to :0 (the virtual display on the remote system). If this command runs successfully, you should see the graphics card on the remote system along with the full OpenGL profile. You are now ready to deploy applications using X.
If you want to set DISPLAY for the entire session, run export DISPLAY=:0. To set DISPLAY permanently, add the same line to your .bashrc file.
Deploying Off-Screen Rendering Applications on Remote System
This is where X becomes crucial. Tools like GLFW may or may not work on remote systems because of their dependence on Xrandr and other software. The trick is to use X to create an OpenGL context, and then run everything off-screen using framebuffers and renderbuffers. I took the source for creating an OpenGL context using X from the OpenGL.org context creation tutorial and modified it slightly. Thanks to the folks at OpenGL.org for providing this. The source code with the changes I made can be found here: glContext.hpp.
Include this in your source code. To create and delete contexts, call createGLContext() and deleteGLContext() respectively.
I have specified a minimum OpenGL version of 4.4 with forward compatibility enabled. You can modify this version by changing the values at lines 26 and 27 of glContext.hpp.
Forward compatibility can be disabled by changing the flag at line 241 to None.
If the application fails to create the specified version of the context, the application will exit.
At line 189, we create the window. The API description can be found here. “0, 0, 10, 10” specifies the top-left corner (0, 0) and the width and height of the window (10, 10). Since the window is only used for off-screen rendering, its size has no effect on the rendering.
Line 202 specifies the window title.
If everything goes successfully, you should see an output like this:
glContext.hpp:297: GL Version = 4.4.0 NVIDIA 331.79
glContext.hpp:298: GL Vendor = NVIDIA Corporation
glContext.hpp:299: GL Renderer = GeForce GTX 690/PCIe/SSE2
glContext.hpp:300: GL Shader = 4.40 NVIDIA via Cg compiler
If you specify a lower minimum version, say 3.0, the version in the output will reflect 3.0, since GL reports the version of the created context.
To test the context creation stand-alone, create a cpp file with the following contents (glContext.cpp):
#include "glContext.hpp"

int main(int argc, char* argv[])
{
    createGLContext();
    deleteGLContext();
    return 0;
}
Compile this with g++ -o gl glContext.cpp -lGL -lX11 and run it with ./gl (make sure DISPLAY is set to :0). If this works successfully, any off-screen rendering code will work with this setup.
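To go one step further, here is a minimal sketch of the framebuffer/renderbuffer setup used for the actual off-screen rendering. It assumes the context from createGLContext() is current and that GLEW (or another loader) has already been initialized; the function name renderOffscreen and the buffer sizes are illustrative only.

```cpp
// Assumes: a current OpenGL context (e.g. from createGLContext())
// and an initialized function loader such as GLEW.
#include <GL/glew.h>
#include <cstdio>

bool renderOffscreen(int width, int height)
{
    GLuint fbo, color, depth;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    // Color attachment backed by a renderbuffer.
    glGenRenderbuffers(1, &color);
    glBindRenderbuffer(GL_RENDERBUFFER, color);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              GL_RENDERBUFFER, color);

    // Depth attachment.
    glGenRenderbuffers(1, &depth);
    glBindRenderbuffer(GL_RENDERBUFFER, depth);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                              GL_RENDERBUFFER, depth);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
        std::fprintf(stderr, "FBO incomplete\n");
        return false;
    }

    // Draw into the FBO as usual...
    glViewport(0, 0, width, height);
    glClearColor(0.f, 0.f, 0.f, 1.f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // ...then read the result back, e.g. with glReadPixels().

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteRenderbuffers(1, &depth);
    glDeleteRenderbuffers(1, &color);
    glDeleteFramebuffers(1, &fbo);
    return true;
}
```

Since the rendering target is the FBO rather than the window, the 10x10 window created above never needs to be shown or resized.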
Note: If you use GLEW, make sure you include glew.h before including gl.h.
If you wish to know more about remote OpenGL or work with us on remote rendering, contact us at technical@arrayfire.com.
Links:
OpenGL Context Creation Tutorial
ArrayFire glContext repo on Github
Comments (18)
Hi,
Thanks for the awesome post. I had some issues using Ubuntu 12.04 LTS and a GTX 580. I ran into issues with the initial diagnostics. Namely, “$env DISPLAY=:0 glewinfo | less” reports:
No protocol specified
No protocol specified
Error: glewCreateContext failed
The xorg log does not show any errors and “$/usr/bin/X :0 &” tells me that X is already running.
Any ideas on how to troubleshoot some more?
Thanks in advance.
Try running glxinfo with and without the env DISPLAY=:0. I would assume that the one without the display would give OpenGL version 2.1. With the display, I think it would error out like glewinfo.
Did you make the required changes to the xorg.conf and also run the nvidia-xconfig command?
Lastly, did you see the “Loading extension GLX…” output in the log?
Note: all the commands with # need to be run as root or using sudo.
Try this: restart your system. Run “sudo ls”. This will ask for password and get you a buffer time to run more sudo commands without entering the password each time. After sudo ls, run the “sudo /usr/bin/X :0 &” command.
Hi,
glewinfo (and glxinfo) without DISPLAY=:0 both work, but they give me the info of my local machine (the graphics card info is the local one, the OpenGL version is 2.1.2, and the display is localhost:10.0). glxinfo produces the same error as glewinfo when using DISPLAY=:0.
My xorg.conf has the same format as the sample one, with the exceptions that I have a single monitor/screen/device. I’ll try with the double configuration.
Yeah, I see the “[ 41.772] (II) Loading extension GLX” log entry
X is running when the machine restarts. running “sudo /usr/bin/X :0 &” yields:
“Fatal server error:
Server is already active for display 0
If this server is no longer running, remove /tmp/.X0-lock
and start again.”
I also see /usr/bin/X in the list of current processes.
Any other idea of what to try?
Thanks
You can keep the single configuration. The sample one has 2 of all because a 690 has 2 GPUs.
It looks like X is not starting properly. I’m not completely sure why this is.
Can you revert back to your old xorg.conf file, restart the computer and confirm that X is not running when it restarts?
It may also be an architecture issue. All the GPUs we tested were Kepler. I’ll try testing it on a Fermi and get back to you.
Hi,
I kept the single configuration as you suggested.
I reverted to the old xorg.conf file and indeed X did not start when the machine is restarted.
I also tried on a different machine with a GTX 780 (kepler) and had the same issue.
I checked the log entries, it was fine, no errors and GLX loaded.
Any ideas?
Thanks
Can you email your xorg.conf file to shehzan@arrayfire.com? Can you also send your log file?
Hello,
After some debugging and troubleshooting with Shehzan we have figured out that my system had X running at boot for some reason. The solution is to kill it and then the procedure would work as outlined above.
1. Check if X is running
$ps aux | grep X
my system produced:
“root 1945 0.1 0.1 212380 51960 tty7 Ss+ 14:23 0:04 /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch”
2. If you get a similar output, kill X:
$sudo service lightdm stop
3. Check if X is still running
$ps aux | grep X
4. start local X
$/usr/bin/X :0 &
5. check the configuration
$env DISPLAY=:0 glewinfo | less
Note: the first time step 5 is run from any SSH session it is instant; subsequent calls are much slower (~1.5 minutes for the info to show up). Not sure why…
Are you sure that GL context init code will actually work on remote headless servers? From my previous attempts, I always had to create a dummy PBuffer as well to get the actual context. In your case you rely on a Window object, which was not accessible (at least in my attempts) on remote servers like those of Amazon EC2.
I just got this working on a GPU-enabled Ubuntu AWS instance. I didn’t have to modify the code, just some conf files and drivers. The sample .cpp program he provided works for me. I hope that helps narrow down any issues you might encounter.
Thanks for posting this. I had to change a few things around to get this to work on the system I’m supporting (and I need to integrate it into the whole codebase). That said, this guide was a way better starting point than anything else I found.
If you can let me know what you had to change, I’ll be happy to update the blog.
Pingback: Unix:How to efficiently use 3D via a remote connection? – Unix Questions
Won’t headless openGL work with Xvfb? That sounds much simpler.
http://simonwalton.me/blog/?p=63
Thanks for that. I tried testing this out but was unsuccessful. I got Xvfb to run, but the OpenGL context creation fails. Here are the steps I used:
sudo Xvfb :12 -screen 0 800x600x24 &
[1] 24161
Initializing built-in extension Generic Event Extension
Initializing built-in extension SHAPE
Initializing built-in extension MIT-SHM
Initializing built-in extension XInputExtension
Initializing built-in extension XTEST
Initializing built-in extension BIG-REQUESTS
Initializing built-in extension SYNC
Initializing built-in extension XKEYBOARD
Initializing built-in extension XC-MISC
Initializing built-in extension SECURITY
Initializing built-in extension XINERAMA
Initializing built-in extension XFIXES
Initializing built-in extension RENDER
Initializing built-in extension RANDR
Initializing built-in extension COMPOSITE
Initializing built-in extension DAMAGE
Initializing built-in extension MIT-SCREEN-SAVER
Initializing built-in extension DOUBLE-BUFFER
Initializing built-in extension RECORD
Initializing built-in extension DPMS
Initializing built-in extension Present
Initializing built-in extension DRI3
Initializing built-in extension X-Resource
Initializing built-in extension XVideo
Initializing built-in extension XVideo-MotionCompensation
Initializing built-in extension SELinux
Initializing built-in extension GLX
~$ export DISPLAY=:12
~$ glewinfo
Xlib: extension “GLX” missing on display “:12”.
Error: glewCreateContext failed
Xlib: extension “GLX” missing on display “:12”.
6 XSELINUXs still allocated at reset
SCREEN: 0 objects of 256 bytes = 0 total bytes 0 private allocs
DEVICE: 0 objects of 96 bytes = 0 total bytes 0 private allocs
CLIENT: 0 objects of 144 bytes = 0 total bytes 0 private allocs
WINDOW: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PIXMAP: 1 objects of 16 bytes = 16 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 6 objects, 88 bytes, 0 allocs
1 PIXMAPs still allocated at reset
PIXMAP: 1 objects of 16 bytes = 16 total bytes 0 private allocs
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 6 objects, 88 bytes, 0 allocs
4 GCs still allocated at reset
GC: 4 objects of 16 bytes = 64 total bytes 0 private allocs
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 5 objects, 72 bytes, 0 allocs
1 CURSORs still allocated at reset
CURSOR: 1 objects of 8 bytes = 8 total bytes 0 private allocs
TOTAL: 1 objects, 8 bytes, 0 allocs
As you can see, the OpenGL part failed to initialize. My hunch is that, even if it did initialize, it would be software-based rendering, which may limit it to OpenGL 2.1. I don’t think you will be able to get 4.x.
If you have any other information, I will be happy to test it.
Is there any reason to create a hidden window instead of a GLX pbuffer?
I haven’t tested that. You can give it a shot if you want.
I’m using that. I was just wondering whether your solution may have any known advantages.
Hi,
this was great, thanks heaps for the blog post. FYI I was trying to get this to work on a CentOS 7 x64 box with a Tesla K40m (if it makes a difference – probably not), and for the “Known issue with Starting X” section the link I had to make was:
ln -s /usr/lib64/nvidia/xorg/libglx.so /usr/lib64/xorg/modules/extensions/libglx.so
/usr/lib64/nvidia/xorg/libglx.so is itself a symlink, as per:
[zgraphics@hamlmg01 xorg]$ pwd
/usr/lib64/nvidia/xorg
[zgraphics@hamlmg01 xorg]$ ls -l
total 13684
lrwxrwxrwx. 1 root root 16 Apr 17 16:46 libglx.so -> libglx.so.390.30
lrwxrwxrwx. 1 root root 16 Apr 17 16:46 libglx.so.1 -> libglx.so.390.30
-rwxr-xr-x. 1 root root 14008936 Feb 1 18:31 libglx.so.390.30
I do have a problem: although user root seems to be able to start our graphical job, a “normal” user can’t, and e.g. gets:
[zgraphics@hamlmg01 ~]$ glewinfo
No protocol specified
Error: glewCreateContext failed
Any idea what rights the normal user account needs?
Thanks!
Dylan