I am sometimes asked why I used a cheap $60 motherboard with PCIe x1 slots and four $30 flexible PCIe x1 extenders to build this 4 x AMD Radeon HD 5970 graphics card machine, when for the same price I could have bought a motherboard with four x16 slots and needed no extenders. As I said in that post, and as I showed in pictures, the extenders leave a ~2cm gap between each card, which dramatically helps with cooling. Without this gap, full loads on the cards raise GPU temperatures to 95-100°C within minutes, at which point the GPU clocks are throttled, degrading performance.
But I found a simpler solution to take care of the cooling problem that does not involve extenders:
- Lay a motherboard with four x16 slots flat (horizontal) in a rackable chassis, or on a rack shelf.
- Plug the four HD 5970 cards in. Do not screw them into the chassis.
- Obtain some plastic elements the size of pen caps, or wire nuts.
- Insert them between the cards. The mechanical imprecisions of PCIe slots allow ~1cm gaps to open between the top edges of adjacent cards.
- Use the undocumented, reverse-engineered Linux "aticonfig --pplib-cmd" PowerPlay interface to manually set the graphics card fan speeds to 70% or higher.
- Make sure to maintain ambient room temperature below 30°C or so.
- Voilà! Enjoy running GPGPU apps on this easy-to-build hardware beast (no watercooling!) with no fear of overheating.
In my experience, wire nuts are convenient for creating the gaps, as they lock securely in place on the card brackets, as shown in the picture.
I experimentally found that a fan speed of 70% is sufficient to achieve almost the same cooling as a fan speed of 100%. For example, to set the fan speed of the first GPU to 70% (DISPLAY=:0.0 for the first GPU, DISPLAY=:0.1 for the second, etc.):
$ env DISPLAY=:0.0 aticonfig --pplib-cmd "set fanspeed 0 70"
(On a dual-GPU HD 5970, where one fan is shared by two GPUs, this command needs to be run only for the "master" GPU; it will fail on the other. Try them both to discover which one is the master.)
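Since the command has to be tried on both GPUs of each card anyway, the loop can be scripted. The following is a minimal sketch, not a tested tool: `set_all_fans` is a hypothetical helper name, and it assumes that aticonfig exits with a non-zero status when the command fails on a non-master GPU, which may not hold for every driver version.

```shell
#!/bin/sh
# Sketch: apply one fan speed to every GPU screen, tolerating the expected
# failure on the non-master GPU of each dual-GPU card.
# ASSUMPTION: aticonfig returns a non-zero exit status on failure.
set_all_fans() {
    speed=$1    # fan speed in percent, e.g. 70
    ngpus=$2    # number of GPUs, e.g. 8 on a 4 x HD 5970 machine
    i=0
    while [ "$i" -lt "$ngpus" ]; do
        # Same invocation as above, with the screen number varied per GPU.
        if env DISPLAY=":0.$i" aticonfig --pplib-cmd "set fanspeed 0 $speed" >/dev/null 2>&1; then
            echo "GPU $i: fan set to $speed%"
        else
            echo "GPU $i: skipped (probably not the master GPU)"
        fi
        i=$((i + 1))
    done
}
```

Calling `set_all_fans 70 8` would then try all eight screens from :0.0 to :0.7. If the exit status turns out to be unreliable, the same loop can simply run the command on every screen and ignore the result.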
To query the current fanspeed:
$ env DISPLAY=:0.0 aticonfig --pplib-cmd "get fanspeed 0"
Fan speed query:
Query Index: 0, Speed in percent
Result: Fan Speed: 70%
I have been stress-testing this cooling hack on a 4 x HD 5970 machine based on the MSI 890FXA-GD70 motherboard in harsh conditions (25°C ambient) for multiple days, and the GPU temperatures have been holding steady in the 65-90°C range. The graph below shows temperatures for each of the 8 GPUs at 100% load over a 12-hour period. GPUs #6 and #7 correspond to the rightmost dual-GPU card (in the picture), whose fan is the only one not obstructed by another card, which is why they run at significantly lower temperatures.
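For anyone wanting to collect similar temperature data, here is a rough sketch of a logger. It is not the method used to produce the graph. It relies on `aticonfig --odgt --adapter=all`, and it assumes the output consists of an "Adapter N - ..." line followed by a "Sensor 0: Temperature - NN.NN C" line for each GPU; `parse_temps` and `log_temps` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: log one "timestamp adapter temperature" line per GPU.
# ASSUMPTION: the output of `aticonfig --odgt --adapter=all` looks like:
#   Adapter 0 - ATI Radeon HD 5900 Series
#               Sensor 0: Temperature - 72.50 C

# Read aticonfig output on stdin, print "adapter temperature" per GPU.
parse_temps() {
    awk '
        /^Adapter/    { adapter = $2 }             # remember the adapter index
        /Temperature/ { print adapter, $(NF - 1) } # field just before the "C"
    '
}

# Take one sample of all adapters, prefixing each line with a Unix timestamp.
log_temps() {
    aticonfig --odgt --adapter=all | parse_temps | while read -r gpu temp; do
        echo "$(date +%s) $gpu $temp"
    done
}
```

Running `while :; do log_temps >> temps.log; sleep 60; done` would take one sample per minute, and the resulting file can be fed to any plotting tool. Keeping the parsing in its own function makes it easy to adjust if the driver prints a different format.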