After successfully running 8 GPUs in one machine under Linux, I decided to try 10 GPUs (five dual-GPU AMD Radeon HD 5970 cards). Unfortunately, the fglrx X11 driver crashes when the X server starts, so the limit appears to be 8 GPUs on Linux (whereas Windows is limited to 4).
The machine boots and all 10 PCI graphics devices are detected (1002:689c is the PCI vendor:device ID of the 5970 GPUs):
$ lspci -d 1002:689c
06:00.0 Display controller: ATI Technologies Inc Device 689c
07:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0a:00.0 Display controller: ATI Technologies Inc Device 689c
0b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0e:00.0 Display controller: ATI Technologies Inc Device 689c
0f:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
16:00.0 Display controller: ATI Technologies Inc Device 689c
17:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
1a:00.0 Display controller: ATI Technologies Inc Device 689c
1b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
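As a quick sanity check, the listing above can be parsed to count the GPUs and to turn each address into the decimal "PCI:bus:device:function" form that xorg.conf expects in a BusID line (lspci prints the bus in hexadecimal). This is only a minimal sketch: the function names parse_lspci and xorg_bus_id are made up for illustration, and the input is simply the output pasted from above.

```python
import re

# Output of `lspci -d 1002:689c`, pasted from the listing above.
LSPCI_OUTPUT = """\
06:00.0 Display controller: ATI Technologies Inc Device 689c
07:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0a:00.0 Display controller: ATI Technologies Inc Device 689c
0b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0e:00.0 Display controller: ATI Technologies Inc Device 689c
0f:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
16:00.0 Display controller: ATI Technologies Inc Device 689c
17:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
1a:00.0 Display controller: ATI Technologies Inc Device 689c
1b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
"""

# bus:device.function (hexadecimal), then the device class up to the first colon.
LINE_RE = re.compile(r"^([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\s+([^:]+):")


def parse_lspci(text):
    """Return a list of (bus, device, function, device_class) tuples."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            bus, dev, fn = (int(g, 16) for g in match.groups()[:3])
            devices.append((bus, dev, fn, match.group(4)))
    return devices


def xorg_bus_id(bus, dev, fn):
    """Format a PCI address the way xorg.conf BusID lines expect (decimal)."""
    return f"PCI:{bus}:{dev}:{fn}"


devices = parse_lspci(LSPCI_OUTPUT)
bus_ids = [xorg_bus_id(bus, dev, fn) for bus, dev, fn, _ in devices]

print(f"{len(devices)} GPUs detected")
for (_, _, _, device_class), bus_id in zip(devices, bus_ids):
    print(f"{bus_id:12s} {device_class}")
```

Note the hexadecimal-to-decimal conversion: the device listed as 0a:00.0 becomes PCI:10:0:0, which is easy to get wrong when writing BusID lines by hand.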
The fglrx.ko kernel module can be loaded:
$ modprobe fglrx
$ dmesg
[...]
[fglrx] module loaded - fglrx 8.80.5 [Nov 25 2010] with 10 minors
But after generating an xorg.conf file for the 10 GPUs and launching Xorg, the fglrx driver segfaults:
$ X
[...]
(EE) fglrx(0): DLM initialization failed
Backtrace:
0: X (xorg_backtrace+0x28) [0x4a33e8]
1: X (0x400000+0x6574d) [0x46574d]
2: /lib/libpthread.so.0 (0x7fbcb4005000+0xf8f0) [0x7fbcb40148f0]
3: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxDisplayScreenDestroy+0x4c) [0x7fbcb08bb8cc]
4: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxDisplayPreInit+0x4bf) [0x7fbcb08b89cf]
5: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxPreInit+0x8fd) [0x7fbcb088dc1d]
6: X (InitOutput+0x552) [0x473b52]
7: X (0x400000+0x26005) [0x426005]
8: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fbcb2cfcc4d]
9: X (0x400000+0x25d59) [0x425d59]
Segmentation fault at address (nil)
Caught signal 11 (Segmentation fault). Server aborting
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.
ddxSigGiveUp: Closing log
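For reference, here is a rough sketch of what generating a 10-GPU xorg.conf can look like: one Device section and one Screen section per GPU, tied together by a ServerLayout. It is not the exact file used for the run above. The identifiers (layout, deviceN, screenN), the RightOf placement, and the helper name render_xorg_conf are assumptions chosen for illustration, and the BusID values are the lspci addresses from earlier converted to decimal.

```python
# Decimal BusIDs corresponding to the ten lspci addresses shown earlier
# (06, 07, 0a, 0b, 0e, 0f, 16, 17, 1a, 1b in hexadecimal).
BUS_IDS = [f"PCI:{bus}:0:0" for bus in (6, 7, 10, 11, 14, 15, 22, 23, 26, 27)]


def render_xorg_conf(bus_ids):
    """Build a minimal multi-GPU xorg.conf as a string (illustrative only)."""
    lines = ['Section "ServerLayout"', '    Identifier "layout"']
    for i in range(len(bus_ids)):
        if i == 0:
            # The first screen is anchored at the origin.
            lines.append('    Screen 0 "screen0" 0 0')
        else:
            # Later screens are placed relative to the previous one.
            lines.append(f'    Screen {i} "screen{i}" RightOf "screen{i - 1}"')
    lines.append("EndSection")

    for i, bus_id in enumerate(bus_ids):
        lines += [
            "",
            'Section "Device"',
            f'    Identifier "device{i}"',
            '    Driver "fglrx"',
            f'    BusID "{bus_id}"',
            "EndSection",
            "",
            'Section "Screen"',
            f'    Identifier "screen{i}"',
            f'    Device "device{i}"',
            "EndSection",
        ]
    return "\n".join(lines) + "\n"


conf = render_xorg_conf(BUS_IDS)
print(conf)
```

The same function called with only the first 8 BusIDs gives a file of the same shape for the 8-GPU case, which makes it easy to compare the two setups.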
Details about my system:
- ATI Catalyst driver 10.12
- Ubuntu 10.04 Lucid Lynx amd64 with kernel 2.6.32-27.49
- 5 x AMD Radeon HD 5970
- Mobo: MSI 890FXA-GD70 [Update 2011-01-26: the layout I used was three 5970s in three PCIe x16 slots, plus two 5970s connected to two other slots via flexible PCIe extenders.]
- CPU: AMD Sempron 140
- RAM: 2GB DDR3
Would any AMD engineer like to investigate? So far I can only confirm in GDB that invalid pointers are being manipulated in xdl_x750_atiddxDisplayScreenDestroy()...