mrb's blog

10 AMD/ATI GPUs in 1 Computer: SIGSEGV

Keywords: amd gpu hardware largescale linux

After successfully running 8 GPUs in one machine under Linux, I decided to try 10 GPUs (5 dual-GPU AMD Radeon HD 5970 cards). Unfortunately, the fglrx X11 driver crashes, so the limit appears to be 8 GPUs on Linux (whereas Windows is limited to 4).

The machine boots and the 10 PCI graphics devices are detected (1002:689c is the vendor and device ID for the 5970 GPUs):

$ lspci -d 1002:689c
06:00.0 Display controller: ATI Technologies Inc Device 689c
07:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0a:00.0 Display controller: ATI Technologies Inc Device 689c
0b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
0e:00.0 Display controller: ATI Technologies Inc Device 689c
0f:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
16:00.0 Display controller: ATI Technologies Inc Device 689c
17:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
1a:00.0 Display controller: ATI Technologies Inc Device 689c
1b:00.0 VGA compatible controller: ATI Technologies Inc Device 689c
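
In the listing above, each 5970 card appears to show up as two devices on adjacent bus numbers (for example 06:00.0 and 07:00.0), so five cards give ten lines. A minimal sketch for checking that count in a script follows. It reads lspci-style output on stdin; the function name count_pci_functions is made up for illustration.

```shell
# Count PCI device lines in lspci-style output read from stdin.
# A line counts only if it starts with a "bus:device.function" address,
# so elided lines such as "[...]" are ignored.
count_pci_functions() {
    grep -c -E '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] '
}
```

On a machine like this one it could be fed with `lspci -d 1002:689c | count_pci_functions`, which should print 10 if all five cards are detected.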

The fglrx.ko kernel module can be loaded:

$ modprobe fglrx
$ dmesg
[...]
[fglrx] module loaded - fglrx 8.80.5 [Nov 25 2010] with 10 minors

But after writing (generating) an xorg.conf config file for the 10 GPUs and launching Xorg, the fglrx driver segfaults:

$ X
[...]

(EE) fglrx(0): DLM initialization failed

Backtrace:
0: X (xorg_backtrace+0x28) [0x4a33e8]
1: X (0x400000+0x6574d) [0x46574d]
2: /lib/libpthread.so.0 (0x7fbcb4005000+0xf8f0) [0x7fbcb40148f0]
3: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxDisplayScreenDestroy+0x4c) [0x7fbcb08bb8cc]
4: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxDisplayPreInit+0x4bf) [0x7fbcb08b89cf]
5: /usr/lib/xorg/extra-modules/modules/drivers/fglrx_drv.so (xdl_x750_atiddxPreInit+0x8fd) [0x7fbcb088dc1d]
6: X (InitOutput+0x552) [0x473b52]
7: X (0x400000+0x26005) [0x426005]
8: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7fbcb2cfcc4d]
9: X (0x400000+0x25d59) [0x425d59]
Segmentation fault at address (nil)

Caught signal 11 (Segmentation fault). Server aborting

Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
Please also check the log file at "/var/log/Xorg.0.log" for additional information.

 ddxSigGiveUp: Closing log
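
The xorg.conf used for this test is not shown in the post. As a rough sketch of the generation step, the function below turns lspci addresses into one Device section per GPU. It assumes the usual xorg.conf convention that BusID takes decimal numbers, whereas lspci prints hexadecimal. The function name gen_device_sections and the gpuN identifiers are hypothetical, and a working config would also need matching Screen and ServerLayout sections, which are omitted here.

```shell
# Read lspci-style lines on stdin and print one xorg.conf Device section
# per PCI address. lspci prints hex ("0a:00.0"); BusID is written in decimal.
gen_device_sections() {
    i=0
    while read -r addr _rest; do
        # Skip anything that does not start with a bus:device.function address.
        case "$addr" in
            [0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]) ;;
            *) continue ;;
        esac
        bus=${addr%%:*}      # "0a"
        devfn=${addr#*:}     # "00.0"
        dev=${devfn%%.*}     # "00"
        fn=${devfn#*.}       # "0"
        printf 'Section "Device"\n'
        printf '    Identifier "gpu%d"\n' "$i"
        printf '    Driver     "fglrx"\n'
        printf '    BusID      "PCI:%d:%d:%d"\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
        printf 'EndSection\n\n'
        i=$((i + 1))
    done
}
```

For example, `lspci -d 1002:689c | gen_device_sections` would map the address 0a:00.0 to BusID "PCI:10:0:0".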

Details about my system:

  • ATI Catalyst driver 10.12
  • Ubuntu 10.04 Lucid Lynx amd64 with kernel 2.6.32-27.49
  • 5 x AMD Radeon HD 5970
  • Mobo: MSI 890FXA-GD70 [Update 2011-01-26: the layout I used was three 5970 on three PCIe x16 slots, plus two 5970 connected to two other slots via flexible PCIe extenders (like this).]
  • CPU: AMD Sempron 140
  • RAM: 2GB DDR3

Would any AMD engineer like to investigate? So far I can only confirm in GDB that invalid pointers are being manipulated in xdl_x750_atiddxDisplayScreenDestroy()...
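
For anyone comparing their own crash against this one, the driver frames can be pulled out of the X log with a small filter. This is only a sketch: the function name fglrx_frames is made up, and the pattern assumes backtrace lines formatted like the ones above. In that backtrace, the first fglrx frame after the libpthread signal frame is xdl_x750_atiddxDisplayScreenDestroy, called from xdl_x750_atiddxDisplayPreInit. Together with the fault address of (nil), that would be consistent with a NULL pointer being dereferenced while cleaning up after the failed DLM initialization, though that reading is a guess.

```shell
# Print the symbol names of backtrace frames located in fglrx_drv.so.
# Reads X server log or console output on stdin; frames come out in the
# order they appear in the backtrace (innermost first).
fglrx_frames() {
    sed -n 's/^[0-9][0-9]*: .*fglrx_drv\.so (\([A-Za-z0-9_]*\)+0x[0-9a-f]*).*/\1/p'
}
```

It could be run as `fglrx_frames < /var/log/Xorg.0.log`, assuming the backtrace was written to that log.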

Comments

tatgdi wrote: Hello Marc,

Which Windows application did you get to successfully detect and use 2 x 5970 (= 4 GPU cores)? Have you ported your application to Windows?

I saw that the developer of oclhashcat recently announced that his application supports 16 GPU cores. I am not sure whether that would translate to a total of 16 5970s or just 8. One slight difference I noticed was the version of the Ubuntu kernel that you are running, and you did not mention which ATI Stream SDK you installed.
22 Jan 2011 15:31 UTC

mrb wrote: I once briefly tried 2 x 5970 under Windows 7. It detected them all. I tried running the brute forcer ighashgpu, but it used only 1 of the 4 GPUs. I must have done something wrong because according to multiple sources ighashgpu can use 4 ATI GPUs on Windows.

I have never compiled whitepixel for Windows, but it can be done. See the comment from "mrhg" here: http://blog.zorinaq.com/?e=42

oclhashcat (and many other brute forcers) support up to 16 Nvidia GPUs, but typically only 1-4 ATI GPUs. oclhashcat 0.24, released this week, is the second brute forcer, after whitepixel, to support 8 ATI GPUs. That means 4 x 5970.

I use the Stream SDK (now renamed APP SDK) 2.3.
23 Jan 2011 11:47 UTC

mosie wrote: Hi

I am going to write to you in French because I have not yet mastered English well enough, and since you studied at EPITA ....

Anyway, I think we have quite a lot in common:

My setup (an old post that I no longer really keep up to date, because I am creating a dedicated site for my setup): http://www.overclocking-tv.com/forum/index.php?topic=351.0

I was pleasantly surprised to come across your blog; you ask good questions in interesting areas...

Anyway, I would really like to talk with you and explain my various projects ...

Anyway, I think there is a way we could handle quite a few things together.

I am not a big fan of writing and always prefer to talk out loud.

My Skype: mosie (in France)
24 Jan 2011 21:45 UTC

mrb wrote: Hi! Email me. I am only rarely on Skype. 25 Jan 2011 04:42 UTC

mosie wrote: Find a bit of time to be on it, because writing is too limiting. 25 Jan 2011 17:51 UTC

junky wrote: May I ask what model of motherboard you are using?
Your setup is awesome!!!
02 Feb 2011 06:39 UTC

mrb wrote: The mobo model number is in the post. 02 Feb 2011 08:57 UTC

tatgdi wrote: Hey Marc,

Try the following updated driver released today and let us know if it corrects the issue.

http://support.amd.com/us/gpudownload/linux/Pages/radeon_linux.aspx?type=2.4.1&product=2.4.1.3.42&lang=English
17 Feb 2011 03:20 UTC

jro wrote: Did the updated driver work and allow for 10x gpus? 31 Mar 2011 14:32 UTC

none wrote: WTF? Did the new driver work or what!? 23 May 2011 15:33 UTC

mrb wrote: I haven't had a chance to test 10+ GPUs with it. It is kind of cumbersome as I have to use 2 PSUs and at this moment all my hardware is in use. 25 May 2011 04:31 UTC

psus wrote: Nice blog.

How do you use 2 PSUs?
27 May 2011 08:40 UTC

mrb wrote: Thanks. There are very few good guides documenting how to do this, so it helps to be familiar with the ATX specs. A few techniques are documented here:

http://www.directron.com/2powersupplies.html

The easiest for non-technical users is to buy an adapter like this:

http://www.legitreviews.com/images/reviews/952/nzxt_khaos_019.jpg

Personally I wire the green PS_ON wire of the main power supply to the PS_ON wire of the secondary power supply.
28 May 2011 23:50 UTC

psus wrote: Thanks for your reply.

I had seen a similar guide, and indeed found very little material on this subject. I was afraid to go the way described in this guide, because I feared that my PSUs would compete against each other to set the voltage, or that I'd have unwanted current flows in my hardware to compensate for the difference between the voltages of the two PSUs. No problem here?
30 May 2011 15:03 UTC

tsm wrote: Hey,

Did you find the time to check the new driver?

8 GPUs is a real limit at the moment, given that there are lots of motherboards with 5 or more PCIe slots available.
21 Jun 2011 15:54 UTC

Brian wrote: Installed 10 GPUs in my box today with the same basic results; it seems it still cannot be done:

Ubuntu 11.04
Catalyst 11.6

I used aticonfig to generate an xorg.conf. With 8 GPUs the box works great; up it to 10 and X crashes.
08 Jul 2011 01:13 UTC

Brian wrote: Spoke to AMD support. It seems 4 GPUs is the only officially supported configuration; anything more will not be supported and may or may not work. I have been working on jumping from Nvidia to AMD because of the big push AMD executives have put behind OpenCL (at least in the talks I've seen), but it seems the company hasn't caught up with Nvidia when it comes to multi-GPU work. 09 Jul 2011 14:32 UTC

best graphic card for the money wrote: Hey, I noticed my Ubuntu Lucid giving me the same error; however, for some reason I was stuck at a limit of 4 GPUs. Any ideas? 26 Oct 2011 23:58 UTC

gregg814 wrote: I'm very interested in being able to pass the 8 GPU limit on AMD drivers. There is an article here that may help you. https://community.amd.com/thread/197524 26 Oct 2016 16:15 UTC

mrb wrote: gregg814: thanks for pointing me to it. This was a very interesting read. Glad to see they got 16 GPUs working with cooperation from the Supermicro BIOS engineers. Too bad other vendors do not care or will not be motivated to fix their BIOS. 28 Oct 2016 15:33 UTC