I worked around a supremely annoying bug in xdm (X Display Manager) on Ubuntu 11.04. The issue initially puzzled me, and it looked potentially complex to correct. However, the workaround turned out to be ultra simple. I like this turn of events. Here is the story.
Bug Description
I would boot my laptop, log in, and attempt to edit a file with vi, but vi complained that another session was editing this file and, more importantly, that the other vi process was still running. How was this possible? I had just booted the system! I checked the process list and, sure enough, another vi had my file open. Looking more closely, I saw two vi processes. I exited my session, and both processes terminated. Huh? More puzzling still, this behavior was inconsistent. Sometimes after a reboot the problem never occurred at all. Sometimes, after another reboot, it occurred in about 1 out of 5 attempts to edit a file.
Switching consoles (Ctrl-Alt-F1, etc.) would always crash my graphical Xorg session; I was not sure whether this was general instability related to Xorg and the new Plymouth graphical subsystem, or whether it was somehow related to my bug.
The idea crossed my mind that my system was compromised. Perhaps a buggy keylogger or kernel backdoor.
However, I would sometimes ssh to a remote system and attempt to attach to one of its screen sessions, only for screen to exit immediately, as if someone were detaching me at the exact same second. So strange symptoms could be seen on other systems as well.
Root Cause
It was as if the commands I typed were executed twice ("vi file", "screen -dr", even the ssh command used to log on to the remote system), and a race condition made either the process I saw or the process in the background execute first, causing the other vi or screen to lose exclusive access to the requested resource. In fact, I proved this by running "echo x >>file": two lines were sometimes appended to the file instead of one.
I poked around and discovered, when I switched to the first console (tty1) after crashing Xorg, that all keyboard events in Xorg were mirrored to tty1, and vice versa! Everything suddenly made sense. Typing my user name and password into xdm(1) was also unknowingly logging me in via login(1). The subsequent commands I typed were executed in both my xterm shells and the console shell.
The problem was inconsistent across reboots because xdm sometimes initialized a little faster than login (thanks to the very fast Plymouth and kernel-based mode-setting), so I started typing my user name before login was completely initialized (I sometimes saw "login: rb" followed by authentication failures on tty1; the "m" of my username "mrb" was missing).
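The "echo x >>file" experiment can be made a little more telling by having each shell record its own PID, so that two appended lines can be attributed to two distinct shells. The following is only a sketch of that idea: the probe file path and the function names record_shell and count_shells are made up for illustration, and it assumes the X session and the console login run as the same user with the same temporary directory.

```shell
#!/bin/sh
# Sketch: detect whether typed commands are executed by more than one shell.
# The probe file location and the function names are illustrative assumptions.
probe_file="${TMPDIR:-/tmp}/keyboard-mirror-probe.$(id -u)"

record_shell() {
    # $$ expands to the PID of the shell executing this command, so each
    # shell that receives the keystrokes appends a different number.
    echo "$$" >> "$probe_file"
}

count_shells() {
    # Count the distinct shell PIDs recorded so far.
    sort -u "$probe_file" | wc -l | tr -d ' '
}

# Meant to be typed line by line at an interactive prompt.
rm -f "$probe_file"
record_shell
sleep 1   # give a possible second shell time to append its PID
echo "shells that ran the command: $(count_shells)"
```

On an unaffected system this should report a single shell. If keyboard input were mirrored to a second login shell, two distinct PIDs would be expected in the probe file.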
However, Web searches led me nowhere. I had the feeling that this could be complicated to investigate, perhaps related to the new Plymouth/DRM/kernel-based mode-setting subsystems.
As a shot in the dark, I decided to switch graphics drivers from the open source "radeon" to the proprietary "fglrx". Although this workaround made the problem go away, it introduced new problems: (1) fglrx implements a way-too-distracting color dithering mechanism that dynamically varies the dithering depending on the shades of colors on the screen (e.g., scroll a browser page and its gray background becomes darker because some even darker elements scrolled off the page), and (2) fglrx dynamically changes the brightness in a brain-dead way (e.g., switch to a workspace with dark-colored xterms and it reduces the general brightness of the LCD screen). I was spending an inordinate amount of time manually adjusting the brightness (which also affected dithering) when switching workspaces or tasks...
fglrx was so maddening that a few months later I decided I would rather switch back to radeon and attempt to fix the original problem.
My regularly updated Ubuntu system now exposed the bug in a more flagrant way: xdm and getty were completely sharing tty1, as seen in the picture. Before, the tty1 textual output was invisible from the Xorg session. Now it was overlaid on top of it.
More Web searches led me to Ubuntu bug #767168. One key similarity between the reporter and me is that we are both running xdm, as opposed to gdm. One comment, a link to a forum, hints that creating an unspecified /etc/init/xdm.conf upstart script fixes the problem. There is no information about its order with respect to other scripts (which later turned out to be critical).
Fix
Inspired by this vague comment, I found out that the order of the xdm startup script does matter (when in theory it should not). The jobs for runlevel 2 launched in parallel by upstart include:
- /etc/init/tty1.conf through tty6.conf which are hard coded to launch getty on tty1 through tty6, and
- /etc/init/rc.conf which runs "/etc/init.d/rc 2" which runs "/etc/rc2.d/S99xdm start", which attempts to grab the first available terminal (normally tty7)
Xorg/xdm initializes very quickly and, although it does seem to grab tty7 (according to "lsof -nPp `pgrep Xorg`" showing an open file descriptor to /dev/tty7), it appears to share keyboard input with tty1 as soon as getty is launched. An effective fix is to remove the misplaced xdm script:
$ sudo update-rc.d -f xdm remove
And create an upstart script /etc/init/xdm.conf with the following content:
start on (started tty1 and started tty2 and started tty3 and started tty4 and started tty5 and started tty6)
expect fork
exec /usr/bin/xdm
This waits for all 6 getty instances to be launched before launching xdm, and solves the problem completely for me. Something remains to be investigated in the Xorg/Plymouth/DRM stack, as it clearly does not handle tty7 being initialized before tty1 through tty6.
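After applying the two steps above, it may be worth confirming that no SysV start link for xdm is left in the runlevel 2 directory and that the upstart job file is in place. The function below is a minimal sketch of such a check. The name check_xdm_startup and its messages are invented for illustration, and the default paths are simply the ones mentioned in this post.

```shell
#!/bin/sh
# Sketch: check that xdm is started by upstart rather than by a SysV rc link.
# check_xdm_startup is a hypothetical helper, not part of any Ubuntu tool.
check_xdm_startup() {
    rc_dir=${1:-/etc/rc2.d}
    init_dir=${2:-/etc/init}
    status=0

    # A leftover S??xdm link means "/etc/init.d/rc 2" would still start xdm.
    # If the glob matches nothing, the literal pattern fails both tests.
    for link in "$rc_dir"/S[0-9][0-9]xdm; do
        if [ -e "$link" ] || [ -L "$link" ]; then
            echo "leftover SysV link: $link"
            status=1
        fi
    done

    # Without the upstart job file, nothing would start xdm at all.
    if [ ! -f "$init_dir/xdm.conf" ]; then
        echo "missing upstart job: $init_dir/xdm.conf"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "xdm startup looks as intended"
    fi
    return "$status"
}
```

Calling check_xdm_startup with no arguments inspects /etc/rc2.d and /etc/init. Passing two directories instead makes it easy to try the function on a scratch copy first.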