mrb's blog

Whitepixel breaks 28.6 billion password/sec

Keywords: amd attack bruteforcing gpu hardware performance

I am glad to announce two things: first, the release of whitepixel, an open source GPU-accelerated password hash auditing tool for AMD/ATI graphics cards that qualifies as the world's fastest single-hash MD5 brute forcer; and second, that a Linux computer built with four dual-GPU AMD Radeon HD 5970 graphics cards for the purpose of running whitepixel is the first demonstration of eight AMD GPUs concurrently running this type of cryptographic workload on a single system. This software and hardware combination achieves a rate of 28.6 billion MD5 password hashes tested per second, consumes 1230 Watt at full load, and costs 2700 USD as of December 2010. The capital and operating costs of such a system are only a small fraction of those of running the same workload on Amazon EC2 GPU instances, as I detail in this post.

[Update 2010-12-14: whitepixel v2 achieves a higher rate of 33.1 billion password/sec on 4xHD 5970.]

Software: whitepixel

See the whitepixel project page for more information, source code, and documentation.

Currently, whitepixel supports attacking MD5 password hashes only, but more hash types will come soon. What prompted me to write it was that sometime in 2010, ATI Catalyst drivers started supporting up to 8 GPUs (on Linux at least) where previously they were limited to 4. Being able to play with that much raw computing power was very exciting, especially given that AMD GPUs are roughly 2x-3x faster than Nvidia GPUs on ALU-bound workloads. Also, I had previously worked on MD5 chosen-prefix collisions on AMD/ATI GPUs, so I already had a decent MD5 implementation, wanted to optimize it further, and wanted to put it to other uses.

Overview of whitepixel

  • It is the fastest of all single-hash brute forcing tools, ahead of ighashgpu, BarsWF, oclHashcat, Cryptohaze Multiforcer, InsidePro Extreme GPU Bruteforcer, ElcomSoft Lightning Hash Cracker, and ElcomSoft Distributed Password Recovery.
  • Targets AMD HD 5000 series and above GPUs, which are roughly 2x-3x faster than high-end Nvidia GPUs on ALU-bound workloads.
  • Best AMD multi-GPU support. Works on at least 8 GPUs. Whitepixel is built directly on top of CAL (Compute Abstraction Layer) on Linux (see the short sketch after this list). Other brute forcers support fewer AMD GPUs due to limitations of OpenCL libraries or of the Windows platform and drivers.
  • Hand-optimized AMD IL (Intermediate Language) MD5 implementation.
  • Leverages the bitalign instruction to implement rotate operations in 1 clock cycle.
  • MD5 step reversing. Some of the final steps of the 64-step computation are pre-computed in reverse, so the brute forcing loop only needs to execute 46 of them to evaluate potential password matches, a 64/46 ≈ 1.39x speedup.
  • Linux support only.
  • Last but not least, it is the only performant open source brute forcer for AMD GPUs. The author of BarsWF recently open sourced his code but as shown in the graphs below it is about 4 times slower.
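
For the curious, here is a minimal sketch in plain C of the device-enumeration pattern a CAL program follows. This is not whitepixel's actual code, just an illustration of the API it sits on; the header path may be CAL/cal.h with newer SDKs.

/* Minimal sketch of CAL device enumeration, not whitepixel's actual code.
 * Assumes the ATI Stream SDK headers are on the include path. */
#include <stdio.h>
#include <cal.h>

int main(void)
{
    CALuint count = 0;

    if (calInit() != CAL_RESULT_OK)
        return 1;
    calDeviceGetCount(&count);   /* returns 8 on 4 x HD 5970 */
    printf("Found %u devices\n", count);
    for (CALuint i = 0; i < count; i++) {
        CALdevice dev = 0;
        if (calDeviceOpen(&dev, i) == CAL_RESULT_OK) {
            /* a real brute forcer would create a context here and run its
             * compiled IL kernel (calCtxCreate, calCtxRunProgramGrid, ...) */
            calDeviceClose(dev);
        }
    }
    calShutdown();
    return 0;
}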

That said, speed is not everything. Whitepixel is currently very early-stage software and lacks features such as cracking multiple hashes concurrently, charset selection, and attacking hash algorithms other than MD5.

To compile and test whitepixel, install the ATI Catalyst display drivers (I have heavily tested 10.11), install the latest ATI Stream SDK (2.2 as of December 2010), adjust the include path in the Makefile, build with "make", and start cracking with "./whitepixel $HASH". Performance-wise, whitepixel scales linearly with the number of GPUs and with the number of ALUs times the clock frequency (as documented in this handy reference from the author of ighashgpu).
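
As a back-of-the-envelope check of that scaling rule (using the cards' public specs, not a new measurement): an HD 5870 has 1600 ALUs at 850 MHz and reaches 4200 Mhash/sec, so one HD 5970 (3200 ALUs at 725 MHz) should reach roughly 4200 * (3200 * 725) / (1600 * 850) ≈ 7160 Mhash/sec, and four of them ≈ 28650 Mhash/sec, in line with the 28630 Mhash/sec figure in the charts below.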

Performance

The first chart compares single-hash MD5 brute forcers on the fastest single GPU they support:

  • whitepixel 1: HD 5870: 4200 Mhash/sec
  • ighashgpu 0.90.17.3 with "-t:md5 -c:a -min:8 -max:8": HD 5870: 3690 Mhash/sec, GTX 580: 2150 Mhash/sec (estimated)
  • oclHashCat 0.23 with "-n 160 --gpu-loops 1024 -m 0 '?l?l?l?l' '?l?l?l?l'": HD 5870: 2740 Mhash/sec, GTX 580: 1340 Mhash/sec (estimated)
  • BarsWF CUDA v0.B or AMD Brook 0.9b with "-c 0aA~ -min_len 8:": GTX 580: 1740 Mhash/sec (estimated), HD 5870: 1240 Mhash/sec

The second chart compares single-hash MD5 brute forcers running on as many of the fastest GPUs they each support. Note that 8 x HD 5870 has not been tested with any of the tools because it is unknown if this configuration is supported:

  • whitepixel 1: 4xHD 5970: 28630 Mhash/sec
  • ighashgpu 0.90.17.3 with "-t:md5 -c:a -min:8 -max:8": 8xGTX 580: 17200 Mhash/sec (estimated), 2xHD 5970: 12600 Mhash/sec
  • BarsWF CUDA v0.B or AMD Brook 0.9b with "-c 0aA~ -min_len 8:": 8xGTX 580: 13920 Mhash/sec (estimated)
  • oclHashCat 0.23 with "-n 160 --gpu-loops 1024 -m 0 '?l?l?l?l' '?l?l?l?l'": 8xGTX 580: 10720 Mhash/sec (estimated)

Hardware: 4 x Dual-GPU HD 5970

To demonstrate and use whitepixel, I built a computer supporting four of the currently fastest graphics cards: dual-GPU AMD Radeon HD 5970s. One of my goals was to keep the cost as low as possible without sacrificing system reliability. Some basic electrical and mechanical engineering was necessary to reach this goal. The key design points were:

  • Modified flexible PCIe extenders: allow down-plugging PCIe x16 video cards into x1 connectors, enable running off an inexpensive motherboard, and give freedom to arrange the cards to optimize airflow and cooling (max GPU temps 85-90 C at 25 C ambient temp).
  • Two server-class 560 Watt power supplies instead of one high-end desktop unit: increases reliability, and their 80 PLUS Silver certification improves power efficiency (88% measured).
  • Custom power cables to the video cards: avoids reliance on Y-cable splitters and numerous cable adapters which would ultimately increase voltage drop and decrease system reliability.
  • Rackable server chassis: high density of 8 GPUs in 3 rack units.
  • Entry-level Core 2 Duo processor, 2GB RAM, onboard LAN, diskless: low system cost, and the ability to boot from LAN.
  • Undersized ATX motherboard: allows placing all the components in a rackable chassis.
  • OS: Ubuntu Linux 8.04 64-bit.

Detailed list of parts:

  • Chassis: Norco RPC-170 1U
  • 2 x PSU: Supermicro PWS-562-1H (aka Compuware CPS-5611-3A1LF), 560 Watt, 80 PLUS Silver
  • Motherboard: Gigabyte GA-P31-ES3G (one x16, three x1 PCIe)
  • CPU: Intel Core 2 Duo E8400 3.0GHz 65W
  • RAM: Kingston 2GB DDR2-667
  • 4 x flexible PCIe x1 extender (ARC1-PESX1B-Cx)
  • 4 x AMD Radeon HD 5970 dual-GPU video card

(The total came to about 2700 USD as of December 2010.)

Note: I could have built it with a less expensive, lower-power processor and saved about $100, but I had that spare Core 2 Duo and used it instead.

The key design points are explained in greater detail in the following sections.

Down-plugging x16 Cards in x1 Connectors

It is not very well known, but the PCIe specification electrically supports down-plugging a card into a smaller link connector, such as an x16 card in an x1 connector. This is possible because the link width is dynamically detected during link initialization and training. However, PCIe mechanically prevents down-plugging by closing the end of the connector, for a reason that I do not understand. Fortunately this can be worked around by cutting the end of the PCIe connector open, which I did by tediously carving the plastic with a knife. Another obstacle to down-plugging especially long cards into x1 connectors is that motherboards have tall components, such as heatsinks or small fans, that might come in contact with the card. For these reasons, I bought flexible PCIe x1 extenders, which allow placing the cards anywhere in the vicinity of the motherboard, and let me cut the extenders' connectors instead of the motherboard's (easier and less risky). Here are a few links to manufacturers of x1 extenders:

Shorting Pins for "Presence Detection"

As I have briefly mentioned in my MD5 chosen-prefix collisions slide about hardware implementation details, some motherboards require pins A1 and B17 to be shorted for an x16 card to work in an x1 connector. Let me explain why. The PCI Express Card Electromechanical Specification describes five "presence detect" pins:

  1. A1: PRSNT1# Hot-plug presence detect
  2. B17: PRSNT2# Hot-plug presence detect (for x1 cards)
  3. B31: PRSNT2# Hot-plug presence detect (for x4 cards)
  4. B48: PRSNT2# Hot-plug presence detect (for x8 cards)
  5. B81: PRSNT2# Hot-plug presence detect (for x16 cards)

The motherboard connects PRSNT1# to ground. PCIe cards must have an electrical trace connecting PRSNT1# to the PRSNT2# pin corresponding to their link width. The motherboard considers a card present when it sees ground on one of the PRSNT2# pins. It is unclear to me whether this presence detection mechanism is supposed to be used only in the context of hot-plugging (yes, PCIe supports hot-plugging), or to detect the presence of cards in general (e.g. during POST). One thing I have experimentally verified is that some, not all, motherboards use this mechanism to detect the presence of cards during POST. Down-plugging an x16 card in an x1 connector on these motherboards results in a system that does not boot or does not detect the card. The solution is to simply short pins A1 and B17, which does exactly what a real x1 card does.

I had a few motherboards that did not allow down-plugging; with this solution all of them worked fine. This meant I could build my system using less expensive motherboards with, for example, four x1 connectors, instead of requiring four x16 connectors. A reduced width only means reduced bandwidth, and even an x1 PCIe 1.0 link provides 250 MB/s, which is far above my password cracking needs (the main kernel of whitepixel sends and receives data on the order of hundreds of KB per second).

The motherboard I chose is the Gigabyte GA-P31-ES3G with one x16 connector and three x1 connectors. It has the particularity of being undersized (only 19.3 cm wide) which helped me fit all the hardware in a single rackable chassis.

Designing for 1000+ Watt

Power was of course the other tricky part. Four HD 5970 cross the 1000 Watt mark. I chose to split the load on two (relatively) low-powered, but less expensive power supplies. In order to know how to best spread the load I first had to measure the current drawn by a card under my workload from its three 12 Volt power sources: PCIe connector, 6-pin, and 8-pin power connectors. I used a clamp meter for this purpose. This is where another advantage of the flexible PCIe extenders becomes apparent: the 12V wires can be physically isolated from the others on the ribbon cable to clamp the meter around them. In my experiments with whitepixel, the current drawn by an HD 5970 is (maximum allowed by PCIe spec in parentheses):

  • PCIe connector (idle/load): 1.1 / 3.7 Amp (PCIe max: 6.25)
  • 6-pin connector (idle/load): 0.9 / 6.7 Amp (PCIe max: 6.25, the card is slightly over spec)
  • 8-pin connector (idle/load): 2.2 / 11.4 Amp (PCIe max: 12.5)
  • Total (idle/load): 4.2 / 21.8 Amp (PCIe max: 25.0)

(Power consumption varied by up to +/-3% depending on the card, but it could be due to my clamp meter which is only 5% accurate. The PCIe connector also provides a 3.3V source, but the current draw here is negligible.)

The total wattage for the above numbers, per HD 5970, is 50 / 262 Watt (idle/load), i.e. 12 Volt * 4.2 Amp and 12 Volt * 21.8 Amp, which approximately matches the TDP specified by AMD: 50 / 294 Watt.

As for the rest of the system (motherboard, CPU; the system is diskless), it draws a negligible 30 Watt at all times, about half from the 12V rail (I measured 1.5 Amp) and half from the others. It stays the same at idle and under the load imposed by whitepixel, because the software does not perform any intensive computation on the processor. That said, I planned for 3 Amp, as measured with 1 core busy running "sha1sum /dev/zero".

Standardizing Power Distribution to the Video Cards

I decided early on to use server-class PSUs for their reliability and because few desktop PSUs are certified 80 PLUS Silver or better (plain 80 PLUS was not enough). One inconvenience of server-class PSUs is that few come with enough PCIe 6-pin or 8-pin power connectors for the video cards. I came up with a workaround for this that brings additional advantages...

The various 12V power connectors in a computer (ATX, EPS12V, ATX12V) are commonly rated up to 6, 7, or 8 Amp per wire. On the other hand, the PCIe specification is excessively conservative in rating the 6-pin connector at 75 Watt (2.08 Amp/wire, 3 12V wires, 3 GND) and the 8-pin connector at 150 Watt (4.17 Amp/wire, 3 12V wires, 5 GND). There is no electrical reason for being so conservative. So I bought a few parts (read this great resource about computer power cables):

  • Molex crimper
  • Yellow and black stranded 16AWG wire (large gauge to minimize voltage drop, without being too inconvenient to crimp)
  • Molex Mini-fit Jr. 4-circuit male and female housings (the same kind used for 4-pin ATX12V cable and motherboard connectors)
  • Molex Mini-fit Jr. terminals (metallic pins for the above connector)

And built two types of custom cables designed for 6.25 Amp per 16AWG wire:

  1. PCIe 6-pin connector to custom 4-pin connector (with one 12V pin, one GND pin, two missing pins) for 6.25 Amp total
  2. PCIe 8-pin connector to custom 4-pin connector (with two 12V pins, two GND pins) for 12.5 Amp total

With these cables connected to the video cards, I have at the other end of them a standardized set of 4-pin connectors (with 1 or 2 wire pairs) that remain to be connected to the PSU. I used a Molex pin extractor to extract the 12V and GND wires from all unused PSU connectors (ATX, ATX12V, EPS12V, etc) which I reinserted in Molex Mini-fit Jr. housings to build as many 4-pin connectors as needed (again with either 1 or 2 wire pairs).

Essentially, this method standardizes power distribution in a computer to 4-pin connectors, and at the same time gets rid of the unnecessarily low 2.08 or 4.17 Amp/wire limit imposed by PCIe. Manufacturing the custom cables is a one-time cost, and with the Molex pin extractor I can reconfigure any PSU in a minute or so to build as many 4-pin connectors as it safely allows electrically. It is also easy to switch between powering 6-pin and 8-pin connectors by reconfiguring the number of wire pairs. Finally, all the cable lengths have been calculated to keep the voltage drop under 100 mV.
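
A rough check, assuming about 13 milliohm per meter for 16AWG copper (my assumption, not a value I verified for these exact wires): at 6.25 Amp over a 12V/GND pair, the round-trip drop is roughly 6.25 * 2 * 0.013 ≈ 0.16 Volt per meter of cable, so staying under 100 mV means keeping each run to roughly 60 cm or less.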

Spreading ~90 Amp @ 12 Volt Across 2 PSUs

As per my power consumption numbers above for a single HD 5970 card, four of them plus the rest of the system total 4 * 21.8 + ~3 = ~90 Amp at 12 Volt. To accommodate this, I used two server-class 560 Watt Supermicro PWS-562-1H (aka Compuware CPS-5611-3A1LF) power supplies rated 80 PLUS Silver with a 12V rail capable of 46.5 Amp. Based on the measurements above, I decided to spread the load as such:

  • First PSU to power the four 8-pin connectors:
    4 * 11.4 (8-pin) = 45.6 Amp
  • Second PSU to power everything else (four 6-pin connectors + four cards via PCIe slot + ATX connectors for mobo/CPU):
    4 * 6.7 (6-pin) + 4 * 3.7 (slot) + ~3 (mobo/CPU) = ~45 Amp

With the current spread almost equally between the two PSUs, they operate slightly under 100% of their maximum ratings(!) In a power supply, the electrolytic capacitors are often the components with the shortest life; they are typically rated for 10,000 hours. So, although it is safe to operate the PSUs at their maximum ratings 24/7, I would expect them to simply wear out after about a year.

During one of my tests, I accidentally booted the machine with the video cards wired in a way that one of the power supplies was operating 10% above its maximum rating. I started a brute forcing session. One of the PSUs became noisier than usual. It ran fine for half a minute. Then the machine suddenly shut down. I checked my wiring and realized that it must have been pulling about 51 Amp, so the overcurrent protection had kicked in! This is where the quality of server-class PSUs is appreciable... I corrected the wiring and this PSU is still running fine to this day.

Note that any other way of spreading the load would be less efficient. For example, if both the 6-pin (75W) and 8-pin (150W) connectors of one card are connected to the same PSU, and the remaining cards are wired so that the power is spread equally at full load, then the equilibrium is lost when that one card (and not the others) stops drawing power: one PSU sees a drop of up to 225W while the other sees at best 75W if it was powering that card's slot. A PSU at very low load is less efficient. When using two PSUs, I recommend having one power the slot and the 6-pin connector, and the other the 8-pin connector (150W each).

88% Efficiency at Full Load

The 80 PLUS Verification and Testing report (pdf) of my power supplies indicates they should be at least 85% efficient at full load. As measured with my clamp meter, the combined PSU output power to all components is:

4 cards * 262 Watt + 30 Watt for mobo/CPU = 1078 output Watt

However, I measured 1230 input Watt with a Kill-a-Watt at the wall outlet. So the power efficiency is 1078 / 1230 = 88%, better than the 80 PLUS Gold level even though the PSUs are only certified Silver! This demonstrates another quality of server-class PSUs. This level of efficiency may be possible because my configuration draws most of its power from the highly-optimized 12 Volt rails, whereas the official 80 PLUS tests are conducted with a significant fraction of the load on the other rails (-12V, 3.3V, 5V, 5VSB), which are known to be less efficient. After all, even Google got rid of all rails but 12V.

84% Efficiency when Idle

Similarly, at idle I measure a PSU output power of:

4 cards * 50 Watt + 30 Watt for mobo/CPU = 230 output Watt

The Kill-a-Watt reports 275 input Watt, which suggests an efficiency of 230 / 275 = 84%. At this level each PSU outputs 115 Watt, so they operate at 20% of their rated 560 Watt. According to the 80 PLUS Silver certification, a PSU must be 85% efficient at 20% load. My observation matches this within about 1% (slight inaccuracies of the clamp meter).

Rackable Chassis

A less complex design problem was how to pack all this hardware in a chassis. As shown in the picture below, I simply removed the top cover of a 1U chassis (Norco RPC-170) and placed the video cards vertically (no supports were necessary), each spaced by 2 cm or so for good airflow. Each PCIe extender is long enough to reach the PCIe connectors on the motherboard; I bought four in assorted lengths: 15 cm, 20 cm, 30 cm, and 30 cm. The two 1U server PSUs are stacked on top of each other. The motherboard is not screwed to the chassis; it simply lies on an insulating mat.

The spacing between the video cards really helps: at an ambient temperature of 25 C, "aticonfig" reports maximum internal temperatures of 60-65 C at idle and 85-90 C under full load.

Final Thoughts

Comparison With Amazon EC2 GPU Instances

Amazon EC2 GPU instances are touted as inexpensive and perfect for brute forcing. Let's examine this.

On-demand Amazon EC2 GPU instances cost $2.10/hour and have two Nvidia Tesla M2050 cards, which the author of ighashgpu (the fastest single-hash MD5 brute forcer for Nvidia GPUs) estimates can achieve a total of 2790 Mhash/sec. One would need more than 10 such instances to match the speed of the 4xHD 5970 computer I built for about $2700. Running 10 of these instances for 6 days or more, with hourly costs totalling $3024, would already surpass the cost of my computer. Running them 30 days would cost $15k. Running them 8 months would cost $123k. Compare this to the operating costs for my computer, mainly power, which are a mere $90 per month (1230 Watt at $0.10/kWh), or $130 per month assuming an unremarkable PUE of 1.5 to account for cooling and other overheads:

  • Buying and running 4 x HD 5970 for 8 months: 2700 + 8*130 = $3740
  • Running 10 Amazon GPU instances for 8 months: 10 * 2.10 * 24 * 30.5 * 8 = $123000

You get the idea: financially, brute forcing in Amazon's EC2 GPU cloud makes no sense; in this example it would cost 33x more over 8 months. I recognize this is an extreme example with 4 highly optimized HD 5970s, but the overall conclusion is the same even when comparing EC2 against a more modest computer with slower Nvidia cards.

To be more precise, brute forcing in Amazon's cloud does make financial sense in some cases, for example when doing it so infrequently and at such a small scale (less than a few days of computing time) that purchasing even 1 GPU would be more expensive. At the opposite end of the scale, it may start to make sense again when operating at a scale so large (hundreds of GPUs) that one may not have the expertise or time to deploy and maintain such an infrastructure on-premise. At this scale, one would buy reserved instances for a one-time cost plus an hourly cost lower than on-demand instances: $0.75/hr.

One should also keep in mind that when buying EC2 instances, one is paying for hardware features that are useless and unused when brute forcing: full bisection 10 Gbps bandwidth between instances, terabytes of disk space, many CPU cores (irrelevant given the GPUs), etc. The power of Amazon's GPU instances is better realized when running more traditional HPC workloads that utilize these features, as opposed to "dumb" password cracking.

IL Compiler Optimizer

I spent a lot of time looking closely at how AMD's IL compiler optimizes and compiles whitepixel's IL code to native ISA instructions at run time. Anyone looking at the output ISA instructions will notice that it is a decent optimizer. For example, step 1 of MD5 requires in theory 8 instructions to compute (3 boolean ops, 4 adds, 1 rotate):

A = B + ((A + F(B, C, D) + X[0] + T_0) rotate_left 7)
with F(x,y,z) = (x AND y) OR (NOT(x) AND z)

But because the intermediate hash values A B C D are known constants at the beginning of step 1, the CAL compiler precomputes "A + F(B, C, D)", which leaves only 4 ISA instructions to execute this step. This and other similar optimizations mean the compiler contributes an overall performance improvement of about 3-4%. It might not sound like much, but it was noticeable enough that it prompted me to track down where the unexpected 3-4% of extra performance came from.
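
To make this concrete, here is a rough sketch of step 1 in plain C (not whitepixel's AMD IL; the macro and function names are mine), showing why the constant folding leaves so few operations:

/* Rough C sketch of MD5 step 1, not whitepixel's actual IL. */
#include <stdint.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define F(x, y, z)   (((x) & (y)) | (~(x) & (z)))

/* MD5 initial state words and step-1 sine constant, from the MD5 spec. */
#define A0 0x67452301u
#define B0 0xefcdab89u
#define C0 0x98badcfeu
#define D0 0x10325476u
#define T0 0xd76aa478u

/* Naive step 1: 3 boolean ops, 4 adds, 1 rotate = 8 operations. */
static uint32_t md5_step1_naive(uint32_t x0)
{
    return B0 + ROTL32(A0 + F(B0, C0, D0) + x0 + T0, 7);
}

/* What the compiler effectively emits: A0 + F(B0, C0, D0) is a constant,
 * leaving 2 adds, 1 rotate (bitalign on AMD GPUs) and 1 add = 4 operations. */
static uint32_t md5_step1_folded(uint32_t x0)
{
    const uint32_t k = A0 + F(B0, C0, D0); /* folded at compile time */
    return B0 + ROTL32(k + x0 + T0, 7);
}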

Expected Performance on HD 6900 series

The next-generation dual-GPU HD 6990 to be released in a few months is rumored to have 3840 ALUs at 775 MHz. If this is approximately true, then this card should perform about 9 Bhash/sec, or about 28% faster than the HD 5970.
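
Back-of-the-envelope, assuming the rumored specs and the linear scaling discussed earlier: one HD 5970 does about 7160 Mhash/sec with 3200 ALUs at 725 MHz, so an HD 6990 would do roughly 7160 * (3840 * 775) / (3200 * 725) ≈ 9200 Mhash/sec.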

Hacking is Fun

You may wonder why I spent all this time optimizing cost and power. I like to research, learn, practice these types of electrical and mechanical hacks, and optimize low-level code. I definitely had a lot of fun working on this project. That is all I have to say :-)

Comments

Beavis wrote: What are you doing running that much expensive hardware on your static-prone carpet?? 10 Dec 2010 02:51 UTC

mrb wrote: Putting it on the carpet was just to take pictures in an uncluttered space :-) But it does look strange... I am going to try to crop some pictures. 10 Dec 2010 05:06 UTC

Eddie wrote: Great work as always! Your level of detail is appreciated. 11 Dec 2010 02:04 UTC

nfo wrote: where can i download your rainbow tables
or did you provide md5 cracking as cloud service ?
no serious, good work !!
13 Dec 2010 05:00 UTC

Rogerfd wrote: good work! 14 Dec 2010 03:28 UTC

someone wrote: Great post, this is awesome! 15 Dec 2010 13:49 UTC

mrhg wrote: Hello,
This is great stuff.
I run a 5850@725MHz, and I see 3696 Mhash/sec. This almost seems more than expected compared to the 5970.
I was doing some opencl experiments myself, but these results will prompt me to look into CAL.
Question: The 69xx series has a VLIW4 instruction set compared to
15 Dec 2010 16:40 UTC

mrhg wrote: Seems my previous post got truncated somehow.
Question is:
The 69xx series has a VLIW4 instruction set compared to 68xx and 5xxx. Is it possible to run the current implementation on 69xx, or does it need to be reimplemented?
15 Dec 2010 16:43 UTC

mrb wrote: mrhg: don't forget that this post refers to perf numbers for whitepixel v1. v2 with the -e option is even faster as explained in my subsequent post. A 5970 does 8270 Mhash/sec. A 5850 should do 8270 * 1440 (nr shaders on 5850) / 3200 (nr of shaders on 5970) = 3722 Mhash/sec. You measured 3696 Mhash/sec which confirms it.

Whitepixel already supports the 69xx series and should run very well on it. However possible extra optimizations for this VLIW4 arch could maximize ALU utilization by another +6-7%.
15 Dec 2010 19:15 UTC

mrhg wrote: Hello,
I took upon myself a little task; since there are better overclocking tools available under windows, and some of the development tools from AMD only have GUI under windows platform, I figured I wanted to try running whitepixel under windows.

Being a proponent to open source, I didn't want to use any overpriced MS tools, so this is what I used
- mingw-w64 gcc compiler
- strawberry perl
- MSI Afterburner for GPU clock control

The GPU I've got is a 5850 reference design with core clock and memory speed at 725/1000MHz

I used 8-char lowercase password in the below test (as it completes within a minute on my machine)
- Default clock: 3239 Mhash/sec...
- Default clock + BFI: 3599 Mhash/sec...
- OC 775/1125: 3471 Mhash/sec...
- OC 850/1160: 3887 Mhash/sec...
- OC 850/1160 + BFI: 4225 Mhash/sec...

Based on previous OC, I think this is as much as I can push it before GPU gets unstable.
Looks like I get a 17% speed gain in the BFI case...
23 Dec 2010 03:04 UTC

mrb wrote: I am glad to know whitepixel can be compiled (apparently easily) under Windows. Cool! 23 Dec 2010 05:08 UTC

Hoang wrote: How about single SHA-1 recover/crack ? 30 Dec 2010 16:02 UTC

mrb wrote: 10 billion SHA-1 hash/sec are my expectations of whitepixel's performance on 4 x 5970s... I will work on SHA-1 some time in the near future. 30 Dec 2010 23:45 UTC

Hoang wrote: Also, i dont understand that you can use 4 x HD5970 w/o Crossfire brigde on the mobo which dont support Crossfire.
Can you explain?
31 Dec 2010 06:17 UTC

mrb wrote: Crossfire is *not* needed to support multiple GPUs from IL/OpenCL/DirectCompute programs. In fact I believe on earlier ATI HD 4000 series cards, you even had to *disable* Crossfire to allow GPGPU programs to work correctly with multiple GPUs. 01 Jan 2011 12:38 UTC

Hoang wrote: Hi, i got the problem when hot-plug the card to x8 slot with A1 wired to B48, system seems reset and not boot! Is there any ideas? 06 Jan 2011 16:08 UTC

mrb wrote: Hoang: it's hard to say what the pb could be. You are down-plugging an x16 card in an x8 slot? Make sure you correctly identified the pins A1 and B48. If you are using a flexible extender like me, it could be caused by poor quality of the extender (no shielding, EM interferences, etc). 07 Jan 2011 05:05 UTC

Hoang wrote: mrb: i got it working w/o wired but need some tape to cover the card as if it was x4 card! but have other problem that it needs Crossfire brigde then can use all plugged GPUs! 07 Jan 2011 14:51 UTC

Mannycalavera2 wrote: Hi, I want to do something like what you did. I didn't know I can't work with 4 5970s in Windows, so if you can help me: I want to unlock SL3 phones, and I'm thinking of buying this mobo http://latam.msi.com/spanish/products/detail_spec/890FXA-GD70_spa.htm and 4 HD 5970s. In Linux I'm lost, but I can learn. So do you recommend this motherboard with the 5970s for SL3? 07 Jan 2011 17:08 UTC

fajas colombianas wrote: did you provide md5 cracking as cloud service ? 12 Jan 2011 16:47 UTC

mrb wrote: Thinking about it...

Mannycalavera2: yeah the 890FXA is a great board. Actually I just bought one today.
13 Jan 2011 08:03 UTC

Francesco P. wrote: That is an amazing system. I have a question and I would greatly appreciate advice.

I am planning to implement 3 HD 5970 cards with a motherboard that supports x16/x16/x16. I was reading that the CrossFireX driver only supports 4 GPUs, meaning 2 5970 cards. Do I need to use CrossFireX? My main goal is to use brute force for unlocking SL3, and I want to make sure I am using all the cards and I don't waste my money. How did you implement 4 cards? I am a little confused after reading your explanation regarding this matter. Thanks
13 Jan 2011 23:59 UTC

mrb wrote: The limit is 4 GPUs on Windows only. Linux supports at least up to 8 GPUs. 14 Jan 2011 05:34 UTC

Francesco P. wrote: Thank you for taking the time to answer?

Couple of more questions:

Do I use CrossfireX or disable crossfire:?

What linux flavor do you recommend?

What driver do I install for the cards, like I said I am trying to unlock SL3 phones?

Please give me a hand on this there is a lot of information that contradicts some say yes some no not sure and I dont want to invest in the wrong thing.

Thank you ...
14 Jan 2011 17:24 UTC

Francesco P. wrote: Also, the app I will be running is Fenix Key, which uses ighashgpu. Fenix Key is not compatible with Linux; any ideas what I can do in this situation?

Thanks this is my last post I don't mean to spam your blog. Thank for your great work.
14 Jan 2011 17:39 UTC

ventuz wrote: instead of using it to brute force finding password, use it for Folding@Home? 16 Jan 2011 05:01 UTC

mrb wrote: Francesco: ighashgpu... forget Linux then. You are stuck to using Windows which is limited to 2 x 5970 (4 GPUs). 16 Jan 2011 07:13 UTC

Francesco P. wrote: Would I be able to use whitepixel instead of ighashgpu to unlock SL3 phones, do you support SHA-1/SL3?

Thanks
18 Jan 2011 20:18 UTC

mrb wrote: Whitepixel does not currently support SHA-1 or SL3. 20 Jan 2011 07:49 UTC

Spiros Fraganastasis wrote: Hi! Can someone please help me with the compilation of whitepixel? I use an ATI 5770 with the 10.12 ATI Catalyst driver and ATI Stream SDK 2.2! When I type make on the command line to compile whitepixel I get the following error message:
cc -O1 -std=c99 -pedantic -Wextra -Wall -Werror -Wno-overlength-strings -I/usr/local/ati-stream-sdk-v2.2-lnx32/include -c -o whitepixel.o whitepixel.c
cc1: warnings being treated as errors
whitepixel.c: In function ‘patch_opcodes’:
whitepixel.c:714: error: left shift count >= width of type
whitepixel.c: In function ‘patch_bfi_int_instructions’:
whitepixel.c:754: error: ignoring return value of ‘write’, declared with attribute warn_unused_result
make: *** [whitepixel.o] Error 1

Any kind of help is welcome! Thanks in advance!
21 Jan 2011 19:54 UTC

hisense wrote: I have only one question - what is the real speed of this brute forcer? As far as I can see in the sources, you take the time with gettimeofday before the kernel is invoked on the card and again after execution on the card has finished, but you do not count threads_analyze_and_prepare(), which as far as I can see prepares a very large amount of memory. So in my opinion the total performance figure is wrong until data preparation is counted: it is xxxx M/s for the card execution time only, but of course the computer does not take 0 ms to prepare the data, so some performance is certainly lost there. The counter may show a nice value now, but has anyone tried moving gettimeofday before the data preparation to see the real performance? 21 Jan 2011 22:12 UTC

mrb wrote: Spiros: edit the Makefile and remove "-Werror".

hisense: correct, I do not account for some CPU time but it is negligible (0.5% or less). With default params, only 960kB of RAM must be prepared per GPU. Nonetheless I do care about making perf numbers a tiny bit more accurate, so I will account for this in the next version.
22 Jan 2011 10:56 UTC

hisense wrote: mrb: Ok thx, yes it's true, I also tested it just now and there are no visible changes, so you are right, this can only be 0.x%. 22 Jan 2011 11:40 UTC

Chris wrote: Hi. Great tool you created. Any plans for realeasing a SHA1 implementtion into whitepixel? thanks 05 Feb 2011 08:43 UTC

mrb wrote: I am working on SHA-1... 07 Feb 2011 07:34 UTC

alex wrote: Hello, did you have a look at the FASTRA II project with 13 GPUs?
http://fastra2.ua.ac.be/
When salted SHA-1?

A distributed network version would be very nice to have :)
08 Feb 2011 23:33 UTC

tatgdi wrote: Hello I am receiving the following error when trying to compile Whitepixel. Does anybody have some suggestions
root@user-System-Product-Name:/usr/local/whitepixel-2# make
./kernel-md5.pl
cc -O1 -std=c99 -pedantic -Wextra -Wall -Werror -Wno-overlength-strings -I/usr/local/ati-stream/include -c -o whitepixel.o whitepixel.c
cc1: warnings being treated as errors
whitepixel.c: In function ‘patch_bfi_int_instructions’:
whitepixel.c:754: error: ignoring return value of ‘write’, declared with attribute warn_unused_result
make: *** [whitepixel.o] Error 1
09 Feb 2011 18:58 UTC

mrb wrote: tatgdi: edit the Makefile and remove "-Werror" 10 Feb 2011 09:50 UTC

hisense wrote: "Expected Performance on HD 6900 series
The next-generation dual-GPU HD 6990 to be released in a few months is rumored to have 3840 ALUs at 775 MHz. If this is approximately true, then this card should perform about 9 Bhash/sec, or about 28% faster than the HD 5970. "

About this - in the Barts family the "t" unit doesn't exist anymore, so that takes off 20% of the performance: 4 instructions per cycle instead of 5. 3840 is a magic number because 3200 + 20% = 3840, so they need 3840 SPs in this family just to match the performance of the previous 3200-SP 5970. Of course they will want to overclock it, so the only additional computing power can come from the higher clock, and since we can already overclock a 5970 to 935 MHz, what is the reason to wait for a 6990 at a new highest price?
15 Feb 2011 13:32 UTC


aRcTiC wrote: @mrhg How can i compile it for windows? Any code changes needed? 16 Feb 2011 05:27 UTC

mrhg wrote: Sorry, I haven't read this thread for a while.
I used the following tools on windows.
- Strawberry perl; http://strawberryperl.com/
- mingw-w64; http://mingw-w64.sourceforge.net/
Setup the PATH for the above
Unzip whitepixel into a folder and type 'gmake'

I did have to change some code, as the author made use of
- memmem()
- asprintf()
- nanosleep()
which are not available in glibc for mingw. I think I looked at koders.com and copy-pasted memmem and asprintf into whitepixel.c.
I could upload my port somewhere if interest exists.
28 Feb 2011 05:44 UTC

mhash fan wrote: mrhg, if you could upload it somewhere, I would be greatful 02 Mar 2011 09:58 UTC

FrankEGee88 wrote: Wow!! Really cool mrb, this is exactly what I've been looking for, a brilliant combination of hardware and software. I just recently built a GPU-based budget computer and I'm really looking to get your tool working for 4x 5890s. I know it was asked a ton above, and I apologize in advance, but what exactly do you mean by "edit the Makefile and remove "-Werror""? Do I need to replace "(DEPTH)" with the file path to the Makefile, or perhaps the "." needs to be replaced with the file path? I think I'll try that out and see what happens.

Sorry for the dumb question, really like where you're going and really am looking forward to this tool! :]
06 Mar 2011 07:04 UTC

Alex wrote: Great tool, what about MPI support in future ? 15 Mar 2011 12:54 UTC

ReDoP wrote: hi

just one question... I'm running Win 7 with an ATI 5970, but I can't get constant activity at 99%; it always stays between 55% and 90%.

I have already tried the 10.3, 10.5, and 11.1 Catalyst drivers, but it is always the same thing. Asus board with dual PCIe, 1000W power supply, Intel dual core 2.8 GHz.
15 Mar 2011 17:59 UTC

neutrino wrote: MRB, I have another question about crossfire support in Windows 7. Is it possible to run 4 HD5970 or limitation is maximum 4 GPU? You are using some Linux distro for the rig, right? 16 Mar 2011 21:58 UTC

mrb wrote: @Alex I currently have no plan to support MPI.

@ReDoP Sorry no idea, I don't use Windows.

@neutrino Windows is limited to 4 GPUs. I run 8 GPUs under Ubuntu 8.04 (since then I upgraded to 10.04).
17 Mar 2011 08:28 UTC

Andrew wrote: Hi,
How did you setup the video driver? I didn't find it documented on your site.
On my dual HD5970 I get 4 adapters each with 1 device by switching the DISPLAY environment variable around. I can use them fully by executing 4 times my code but from your screen shots I guess you get all 8 devices from the calDeviceGetCount call.
30 Mar 2011 21:28 UTC

mrb wrote: Andrew: use an xorg.conf file that defines each of the GPUs. With DISPLAY set to the default value, a single program can access all 8 devices (yes calDeviceGetCount returns 8).

Example xorg.conf: http://pastie.org/1738307
31 Mar 2011 07:12 UTC

ReDoP wrote: problem solution:

for each ATI GPU, we need to have a processor core:

for 1 ATI 5970 = 1 dual core
for 2 ATI 5970 = install a quad core, e.g. Core i5 760 quad core

system running 24/24 at 98% ACTIVITY for the 2 ATI 5970
tested with mxkey, fenixkey, and log2cod
system time 20/21 s per salt
31 Mar 2011 13:10 UTC

mrb wrote: ReDoP: my system with 1 CPU core (only) is perfectly capable of exploiting 8 GPUs with no slowdown.

With older versions of the SDK, and when programming in OpenCL (as opposed to using the lower level CAL like whitepixel does), it used to be recommended to have 1 CPU core per GPU because AMD's OpenCL implementation would implement busy loops per GPU to wait for kernel execution.

Nowadays you can set the environment variable GPU_USE_SYNC_OBJECTS=1 and the CPU utilization will drop to nothing. See http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=143851&enterthread=y
31 Mar 2011 15:36 UTC

opencl_fans wrote: Hi, I have 3 HD6990s installed in my system. I am running Ubuntu 10.10 x86_64, AMD driver 11.3 and SDK 2.4 just released. With clinfo, I can see Cayman 0 to 5, the system recognized all GPUs fine. Unfortunately, as I tried to compile whitepixel v2, it fails with error. Could you please check if the source is compatible with 11.3 and SDK 2.4? I will try to post the errors tmr... 08 Apr 2011 14:51 UTC

opencl_fans wrote: Hi, I see that whitepixel-2 compiles now that I removed "-Werror" from the Makefile. So I first ran ./test to generate a hash, d41d8cd98f00b204e9800998ecf8427e, then ran ./whitepixel -ec all d41d8cd98f00b204e9800998ecf8427e, and now here is a strange error:
Whitepixel v2
Enabling experimental BFI_INT instructions (integer bitfield insert)
Attacking search space: charset all, length 5
Initializing CAL
CAL version 1.4.1332
Found 6 devices
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Error: patched 249 instructions, was expecting: 248

Is this due to my 11.3 and SDK2.4 config?
09 Apr 2011 03:54 UTC

mrb wrote: opencl_fans: you used the -e option to enable the experimental BFI_INT instruction, via a hack. However this hack does not seem compatible with 2.4.

So remove -e. The downside is a small reduction in bruteforcing speed.

Eventually AMD will add proper BFI_INT support, so the unstable hack won't be necessary anymore.
09 Apr 2011 04:02 UTC

opencl_fans wrote: Hi mrb, thanks for your reply and I got it working now :)

opencl_fans@X8DTG-QF:~/whitepixel/whitepixel-2$ ./whitepixel -c all 21232f297a57a5a743894a0e4a801fc3
Whitepixel v2
Attacking search space: charset all, length 5
Initializing CAL
CAL version 1.4.1332
Found 6 devices
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 1: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 2: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 3: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 4: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 5: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
...

How long will this take? I can see whitepixel's status is "sleeping" in System Monitor and its waiting channel is "KCLGlobalKernelScheduler". It has been more than 5 minutes now. How can I check whether it is running or hung?

Thanks
09 Apr 2011 04:22 UTC

mrb wrote: Can you add the -v (verbose) option? Sounds like a hardware/driver issue. 09 Apr 2011 06:13 UTC

opencl_fans wrote: Thanks, here it is:

opencl_fans@X8DTG-QF:~/whitepixel/whitepixel-2$ ./whitepixel -v 21232f297a57a5a743894a0e4a801fc3
Whitepixel v2
Attacking search space: charset lower, length 5
Max iterations for specified charset: 17576
Will do 17576 iterations per loop
Initializing CAL
CAL version 1.4.1332
Found 6 devices
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 1: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 2: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 3: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 4: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 5: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Initializing global buffer
Initializing message block
Initializing sine values
Initializing global buffer
Initializing message block
Initializing sine values
Initializing global buffer
Initializing message block
Initializing sine values
Initializing global buffer
Initializing message block
Initializing sine values
Initializing global buffer
Initializing message block
09 Apr 2011 08:53 UTC

mrb wrote: I assume you forgot to paste the last line (which should be "Initializing sine values").

At this point whitepixel effectively launches the kernel via calCtxRunProgramGrid() which seems to hang.

Definitely a driver/hardware issue. Try with only 1 GPU (-g 1). Try on another machine. Make sure you installed the latest drivers. Check for logs in /var/log/kern.log and Xorg.log.
10 Apr 2011 03:02 UTC

opencl_fans wrote: Hi, I have repeated the test with "-g X", the max X I can use is 4, and I have this:
./whitepixel -g 4 21232f297a57a5a743894a0e4a801fc3
Whitepixel v2
Attacking search space: charset lower, length 5
Initializing CAL
CAL version 1.4.1332
Found 6 devices
Manually limited to 4 devices
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 1: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 2: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 3: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Overall rate: 0 Mhash/sec...
Found message.
Length: 5
Bytes: [61 64 6d 69 6e]
String: [admin]

If I increase X to 5, it halts like previous... any idea? Is this a hardware fault? Thanks!
11 Apr 2011 05:13 UTC

mrb wrote: It could be that the 5th GPU (device 4) is defective. Remove all your cards from the computer, and run tests with one card at a time to try to pinpoint which one is causing problems. 11 Apr 2011 07:41 UTC

opencl_fabs wrote: Hi, thanks. Is it possible to assign whitepixel's crunching to specific cards, i.e. using GPUs 0+1, or 2+3, or 4+5 alternatively, to verify the hardware fault? I am not sure if this is due to a driver problem? 11 Apr 2011 16:32 UTC

mrb wrote: It is not possible, but will be in a future version of whitepixel. 11 Apr 2011 17:25 UTC

opencl_fans wrote: Hi, I have solved my problem, it was due to wrong BIOS setting of my X8DTG-QF. I have tested that -e works too, so I have 3x HD6990
(1) with -e 25,768 Mhash/sec
(2) without 22,950 Mhash/sec
I will add the 4th 6990 once solve the power issue...
12 Apr 2011 05:36 UTC

opencl_fans wrote: Sorry, it's me again. I would like to know how to fully load the system with -ec for 24 hours, since the example run is too short. Thanks again :) 12 Apr 2011 05:40 UTC

mrb wrote: Just increase the search space to all chars with, say, length 8 with "-ec all -l 8" 12 Apr 2011 22:36 UTC

opencl_fans wrote: Finally, I would like to update the score with 4x HD6990s:
./whitepixel -ec all -l 5
Whitepixel v2
Enabling experimental BFI_INT instructions (integer bitfield insert)
Attacking search space: charset all, length 5
Initializing CAL
CAL version 1.4.1332
Found 8 devices
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 1: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 2: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 3: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 4: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 5: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 6: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
Device 7: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads
34538 Mhash/sec...
19 Apr 2011 05:44 UTC

mrb wrote: 34.5 Ghash/s -> cool 19 Apr 2011 08:22 UTC

AFRAR wrote: Hi,
I am using Win 7 64-bit with 4x 5850. I am down-plugging from x16 to x1 via PCIe extenders. Device Manager shows all the cards and everything looks fine, but when I run my brute force exe it uses only one card.
How do I rectify this problem?
24 Apr 2011 16:59 UTC

lee wrote: Awesome setup, I would love to see what this is capable of running pyrit under cal++

my lowly 5850 does approx 68,000 keys per sec.

this must be a killer rig.

Im defo going to have a go at your pcie x1 mod, brilliant idea and the first time ive seen anyone attempt it.
25 Apr 2011 20:43 UTC

tatgdi wrote: Hello opencl_fans/Marc

Can you advise us of which BIOS option had to be changed to resolve the issue with the 4th GPU not being recognized properly within Linux?

Also can you provide some details about your system setup? How are you able to power all 4 6990's? What chassis, power supplies and motherboard are you using? Any information would be greatly appreciated.

Hi, I have solved my problem, it was due to wrong BIOS setting of my X8DTG-QF. I have tested that -e works too, so I have 3x HD6990
(1) with -e 25,768 Mhash/sec
(2) without 22,950 Mhash/sec
I will add the 4th 6990 once solve the power issue…

opencl_fans, - 11-04-’11 22:36
29 Apr 2011 04:54 UTC

mrb wrote: The max I have tried is 4 x 5970 or 3 x 6990.

I have not tried 4 x 6990 (yet).

I use this mobo in open-air chassis as pictured here: http://blog.zorinaq.com/?e=47
29 Apr 2011 12:30 UTC

opencl_fans wrote: Hi, sorry being away for a while.

To tatgdi, I just change my BIOS to optimal default in the X8DTG-QF, then all 8 GPUs are correctly recognised by Ubuntu 10.10 with driver 11.3 and 11.5. I followed the instruction on http://wiki.cchtml.com/index.php/Ubuntu_Maverick_Installation_Guide to install and update my driver.
I use corsair ax1200 + thermaltake power express 650 http://www.thermaltakeusa.com/Product.aspx?S=1207&ID=1544 to handle all 3+1 6990s, and it is on 24x7 for almost one month since.

To mrb, have you ever had problem with gdb with your rig? I have this strange error, with Ubuntu 10.10 + driver 11.5 + sdk 2.4, I can compile and run whitepixel without problem. Once I switch to debug mode, it gives segmentation fault as following:
opencl@X8DTG-QF:/media/New_Volume_1/APPZ/Benchmark/whitepixel/whitepixel-2$ gdb ./whitepixel
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /media/New_Volume_1/APPZ/Benchmark/whitepixel/whitepixel-2/whitepixel...(no debugging symbols found)...done.
(gdb) r -ec all -l 5
Starting program: /media/New_Volume_1/APPZ/Benchmark/whitepixel/whitepixel-2/whitepixel -ec all -l 5
[Thread debugging using libthread_db enabled]
Whitepixel v2
Enabling experimental BFI_INT instructions (integer bitfield insert)
Attacking search space: charset all, length 5
Initializing CAL
CAL version 1.4.1385
Found 8 devices
Launching 256 threads per SIMD

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5bb7ad0 in ?? () from /usr/lib/fglrx/libaticaldd.so
(gdb) bt
#0 0x00007ffff5bb7ad0 in ?? () from /usr/lib/fglrx/libaticaldd.so
#1 0x00007ffff5bac01d in ?? () from /usr/lib/fglrx/libaticaldd.so
#2 0x00007ffff5bac0dd in ?? () from /usr/lib/fglrx/libaticaldd.so
#3 0x00007ffff5a16f80 in ?? () from /usr/lib/fglrx/libaticaldd.so
#4 0x00007ffff5bfd4a3 in ?? () from /usr/lib/fglrx/libaticaldd.so
#5 0x00007ffff5bf804d in ?? () from /usr/lib/fglrx/libaticaldd.so
#6 0x00007ffff5c09ddb in ?? () from /usr/lib/fglrx/libaticaldd.so
#7 0x0000000000403157 in compile_and_run ()
#8 0x00000000004037b1 in main ()
(gdb)

I have posted this to AMD forum but so far without any luck...
Could you please help? Thank you!
11 May 2011 09:18 UTC

Michael wrote: I am doing some bitcoin mining. When you say that on some motherboards, when plugging an x16 card into an x1 riser, the presence of the card will not be recognized, do you mean the card will not be recognized by the system, or that the card will not even get a signal that would run its fan? My system refuses to recognize my card, but the fan on it is running.

Should I use the wire like you describe? Will this damage the system?
03 Jun 2011 19:34 UTC

mrb wrote: The card's fan was spinning, but the system either hung during POST, or the card was not detected by the OS (absent from the lspci output).

You should try the wire. It will not damage the system, unless you connect it to the wrong pins (be careful).
04 Jun 2011 00:47 UTC

r00nix wrote: Hello. When i'm trying to use 29 Jun 2011 21:58 UTC

r00nix wrote: ... -c option, i have such error:
./whitepixel -vv -c all 21232f297a57a5a743894a0e4a801fc3
Whitepixel v2
Attacking search space: charset all, length 5
Max iterations for specified charset: 65536
Will do 65536 iterations per loop
Initializing CAL
CAL version 1.4.1417
Found 2 devices
Launching 256 threads per SIMD
Device 0: Cypress GPU (tgt.rev 8.2), 20 SIMDs, launching 5120 threads
Device 1: Cypress GPU (tgt.rev 8.2), 20 SIMDs, launching 5120 threads
Initializing global buffer
Initializing message block
Initializing sine values
Initializing global buffer
Initializing message block
Initializing sine values
Overall rate: 0 Mhash/sec...
Device 0: execution time 1124 ms (3582 Mhash/sec)
Global buffer for first and last threads:
thread 0:
elm 0: 02(000000) 0000 0100 0080 0000
elm 1: 02(000000) 0000 0100 0080 0000
elm 2: 02(000000) 0000 0200 0080 0000
elm 3: 02(000000) 0000 0300 0080 0000
elm 4: 02(000000) 0000 0400 0080 0000
elm 5: 02(000000) 0000 0500 0080 0000
elm 6: 02(000000) 0000 0600 0080 0000
elm 7: 02(000000) 0000 0700 0080 0000
elm 8: 02(000000) 0000 0800 0080 0000
elm 9: 02(000000) 0000 0900 0080 0000
elm 10: 02(000000) 0000 0a00 0080 0000
elm 11: 02(000000) 0000 0b00 0080 0000
Candidate found by GPU 0 thread 4117 elm 4
thread 5119:
elm 0: 02(000000) 0000 f5ef 0080 0000
elm 1: 02(000000) 0000 f5ef 0080 0000
elm 2: 02(000000) 0000 f6ef 0080 0000
elm 3: 02(000000) 0000 f7ef 0080 0000
elm 4: 02(000000) 0000 f8ef 0080 0000
elm 5: 02(000000) 0000 f9ef 0080 0000
elm 6: 02(000000) 0000 faef 0080 0000
elm 7: 02(000000) 0000 fbef 0080 0000
elm 8: 02(000000) 0000 fcef 0080 0000
elm 9: 02(000000) 0000 fdef 0080 0000
elm 10: 02(000000) 0000 feef 0080 0000
elm 11: 02(000000) 0000 ffef 0080 0000
Device 1: execution time 1114 ms (3614 Mhash/sec)
Global buffer for first and last threads:
thread 0:
elm 0: f0(f1f200) f0f1 f200 eff0 f106
*bug*: invalid status for thread 0 elm 0: f0

but without the charset definition it works fine. What's wrong, and how can I use different charsets (not only lowercase)?
29 Jun 2011 22:04 UTC

r00nix wrote: Oh, looks like i had misconfigured X. Just tried to use your xorg.conf, all fine for now =) 29 Jun 2011 22:42 UTC

Thomas wrote: Hey, I've got whitepixel2 working with Ubuntu 11.04 X86_64, Catalyst version 11.7, and AMDAPP SDK version 2.5. The problem I'm having is whitepixel is only seeing 1 GPU (I have a 6990, another on the way). aticonfig sees both GPUs,

aticonfig --list-adapters
0. 03:00.0 AMD Radeon HD 6990
1. 04:00.0 AMD Radeon HD 6990

When I run whitepixel and feed it a hash I get the following output:

Whitepixel v2
Attacking search space: charset lower, length 5
Initializing CAL
CAL version 1.4.1457
Found 1 device
Launching 256 threads per SIMD
Device 0: unknown (tgt.rev 15.1), 24 SIMDs, launching 6144 threads

It does work, and will break hashes, but only using one GPU.

I know it's not a hardware problem because I've used both GPU cores before to mine bitcoin (on windows tho).

I also checked out your link to a sample xorg.conf file, mine is setup the same with 2 GPU references, not 8.

For future reference, to get it functioning I had to change all references to "cal.h" and "calcl.h" in all ".c" files: #include <cal.h> was changed to #include <CAL/cal.h>, etc., to accommodate AMDAPP SDK v2.5.

Any help would be appreciated. ~Thomas
15 Aug 2011 21:48 UTC

Thomas wrote: Sorry for another post, I didn't realize it would treat angle brackets as tags. The "#include" section of the last post should look like this:

#include <cal.h> becomes #include <CAL/cal.h>
#include <calcl.h> becomes #include <CAL/calcl.h>
15 Aug 2011 21:54 UTC

mrb wrote: Yeah I am aware of the CAL header file changes necessary to compile with the latest SDK...

I don't know why whitepixel sees only 1 GPU on your system. This is typically a symptom of xorg.conf not defining all GPUs, but you claim your xorg.conf is correctly configured... You do _not_ specify the -g option, do you?
16 Aug 2011 02:23 UTC

Thomas wrote: Appreciate the quick response.

I don't specify a number of GPUs to use, as the default is all.

Here is the beginning of my xorg.conf:

Section "ServerLayout"
Identifier "aticonfig Layout"
Screen 0 "aticonfig-Screen[0]-0" 0 0
Screen "aticonfig-Screen[1]-0" RightOf "aticonfig-Screen[0]-0"
EndSection
16 Aug 2011 08:06 UTC

mrb wrote: Let aticonfig recreate the xorg.conf. It will automatically define all your GPUs in the file:

$ sudo aticonfig --initial -f --adapter=all
17 Aug 2011 05:26 UTC

Thomas wrote: That's actually the first thing I did after I noticed it wasn't seeing the second GPU. I re-created it anyway, same problem. I don't believe it's my xorg.conf, as hashcat can see and use both GPUs.

Possibly re-compile whitepixel?
17 Aug 2011 06:55 UTC

mrb wrote: I don't think it will help. What command line do you use to invoke whitepixel? 17 Aug 2011 11:38 UTC

Thomas wrote: I use gnome-terminal, and run whitepixel with the standard cd to whitepixel directory, ./whitepixel hash. 17 Aug 2011 17:35 UTC

Thomas wrote: Solved, had to use the command: 'export DISPLAY=:0' before running it for it to see both devices. 21 Aug 2011 18:11 UTC

Servus wrote: I have just one question:

How many calculations per second will a new supercomputer do in 2030?
09 Sep 2011 17:17 UTC

Spiralman wrote: I am having a hard time trying to add another 2 HD5970 cards to the 2 cards already in my Linux Ubuntu 11.04 system. While working with 2 cards all is OK: speed, detection of the GPUs by the system, changing the clock speeds, etc.
The problem is that after resetting the config file with "$ sudo aticonfig --initial -f --adapter=all", when I try to add another two cards to the system via two x16 PCIe extenders (all 4 cards are connected to x16 slots because I'm using an Asus P6T7 WS Supercomputer motherboard with 4 slots working at x16 and 3 slots at x8), with 3 power supplies: one 500W for the motherboard+HDD, one 1200W for the first 2 cards, and one 1200W for the other two cards (so power is more than enough), I get the first Ubuntu load screen and then, right before the desktop appears, I get a white screen, then a black screen, and a freeze.
P.S. I'm not using CrossFireX.

What do you think about this issue?
19 Sep 2011 11:29 UTC

mrb wrote: Spiralman: defective extenders perhaps. Install all 4 cards directly on the motherboard, see if Xorg and apps detect all GPUs. 19 Sep 2011 21:36 UTC

Spiralman wrote: I have 4 new extenders, but only two of them are in use. Also, one of the cards has an Arctic Cooling cooler, so I can't install all 4 cards directly on the motherboard, BUT I can do this with 3 cards, and I get the same result as with 4 cards: my Linux works only with two cards.
How can I know whether Xorg detects all the cards when I can't get to the desktop? What if I uninstall all the drivers and then install Catalyst 11.7 from scratch (not the newest 11.8, because on the hashcat forum I see "on linux, 11.8 creates 100% CPU load per GPU per core and this reduces performance a lot")? Can you point me to known-good commands for uninstalling and installing Catalyst, or should I maybe delete my Ubuntu completely and try things from a clean start?
20 Sep 2011 09:15 UTC

mrb wrote: Boot in single user mode to let you use the pc without loading X. Run lspci, make sure you see all 4 cards (8 VGA/Display Adapters in lspci).

Catalyst can normally be uninstalled with "dpkg --purge fglrx" if you installed it via a package that you built.

Try to update your motherboard's BIOS.

Make sure you are running a 64-bit Ubuntu.
20 Sep 2011 20:05 UTC

mrb wrote: And try with 3 cards only (without the extender). 20 Sep 2011 20:06 UTC

albino wrote: Do you have any plans to rig whitepixel up with a dictionary/mangling rules, aka johntheripper? If not, do you think it's possible to do without significantly slowing the bruteforcing rate? 22 Sep 2011 15:53 UTC

mrb wrote: I have no immediate plan for that. Look at oclhashcat, it supports dictionaries with per-character charset mangling rules. 23 Sep 2011 03:33 UTC

Ger wrote: I have an asus m4n68t-m LE and I just bought a PCI-E 1X to 16X Riser Card Extension Cable, I want to connect a 16x GPU into it but I cant get it working. Any ideas? Thanks 17 Oct 2011 16:48 UTC

Gomar wrote: [whitepixel v2 achieves a higher rate of 33.1 billion password/sec on 4xHD 5970.]

congrats! However, it's still slower than distributed.net's 100billion/sec.
with your rate, it should take 9years to crack a 10 character password using lower & upper & symbols & numbers.
27 Oct 2011 03:45 UTC

David wrote: Hi, I am wondering, does whitepixel support the SHA-1 hash for SL3 unlocking? 03 Nov 2011 20:04 UTC

David wrote: Hi there, does no one have an idea whether whitepixel is faster than hashcat-lite version 0.06? 04 Nov 2011 14:22 UTC

suthernfriend wrote: cool 24 Nov 2011 19:44 UTC

mike wrote: Instead of expensive hardware, I think it may be worth venturing into generating high quality passwords (e.g. some work appears at http://dazzlepod.com/uniqpass/) and run dictionary attack using that list of passwords with some mangling rules THEN brute force. 08 Dec 2011 03:40 UTC

makaveli wrote: Absolutely awesome! Thanks for the guide! 24 Feb 2012 04:45 UTC

Timo Juhani Lindfors wrote: Just curious, is there any way to do GPU acceleration without the closed source catalyst drivers from ATI? 17 Apr 2012 10:39 UTC

mrb wrote: Timo: in the Bitcoin community, there is a project to make a GPGPU kernel run without the closed source drivers: https://bitcointalk.org/index.php?topic=4618.0

You will find somewhat incomplete asm shaders & documentation that indicates it is possible...
21 Apr 2012 08:58 UTC

Simon Zerafa wrote: Hi,

Just a quick note to ask if you have improved your set-up since the last article update?

If you have then what sorts of speeds are you getting now and what does the latest setup look like? :-)

Kind Regards

Simon
30 May 2012 06:46 UTC

Someone wrote: I had WP working a while ago, decided to get it up and running again under Ubuntu 12.04.

Catalyst driver version 12.4, AMDAPP SDK version 2.6, GCC version 4.6.3.

Modified the Makefile to look in /opt/AMDAPP/include/CAL, and removed -Werror as instructed. (could use $AMDAPPSDKROOT/CAL for SDK versions 2.5 and up, but you knew that)

When I tried to build, I'd get 'undefined reference' errors, for everything referenced in whitepixel.o (even pow from math.h).

Solution was to add $(LDFLAGS) to the building of whitepixel and test.

The 2 modified lines in the make file are as follow:

line 17 is now:
whitepixel: whitepixel.o cal-utils.o cpu-md5.o $(LDFLAGS)

line 25 is now:
test: test.o cal-utils.o cpu-md5.o $(LDFLAGS)

I'm quite the amateur, but I'm sharing this here in case someone else runs into similar problems.
10 Jun 2012 09:05 UTC

dogdaynoon wrote: instead of using it to brute force finding password, use it for Folding@Home?

ventuz, - 15-01-’11 21:01

I don't know anything about brute force cracking but that comment right there was hilarious!!!
02 Jul 2012 16:39 UTC

huu wrote: 03eb9252f1caab5071293a8ba1821fcd please help me crack this 17 Jun 2013 04:07 UTC

Fluttering wrote: I understand this is rather old, but I'm trying to put a PCIe x16 GPU into a server which only has two x8 slots, a PCIe x4 port, and an x1.
Won't plugging the GPU into an x1 extender (to make the x16 card fit) be worse, since the bandwidth will be much lower? I'd like to know that before I go any further, because I can't plug the GPU in otherwise.
30 Jul 2013 19:57 UTC

test wrote: Ok... how long would this password take?
%b"'3ah][Xr$ur~YQdW8*4pB%D\%I]pM;S(:tZ=!9.+0@'je&~3@LB8_DB['|Q-o
i am just curious!!!
just post it back on this page!!
16 Sep 2013 14:14 UTC

Mystikan wrote: There is a very simple method developers can use to prevent brute-force password-guessing, and it's a technique that's decades old: Limit the number of login attempts for any given user account.

That can be done on two fronts: 1) only allow one login attempt every 5-10 seconds, ignoring any new attempt that takes place faster than this; and 2) locking the account, ignoring all subsequent login attempts, for a given time (usually between 1-24 hours) after a given number of failed attempts (usually 3.)

I'm absolutely gobsmacked at how many website and application developers have forgotten, or merely neglect to use, this simplest of all preventative measures.

If you implement login timing limits, you easily protect your users' accounts against all brute-force, dictionary and rainbow table type attacks. Unless they're completely naive as to use passwords like "password", "letmein", or their spouse's/kids' names, login limiting is the best protection you, as a developer can provide.
17 Jun 2014 16:06 UTC

man wrote: Passwords are not always tested against live systems; some passwords are tested against files or hash strings :) 31 Aug 2014 14:14 UTC

giro1991 wrote: Hi dude, nice project.

I have a doubt about an x4 card I'd like to get to work in an x1 slot. Judging from this blog, I need to hack a riser and short pin A1 with B17, and this should inform the mobo that only an x1 card is present, is that correct? If it helps, the card in question is available on the market in x1 and x4 configs; they look identical bar the pins, so I'm optimistic about this short hack.
19 Oct 2017 17:04 UTC

Hohn wrote: Hi,
Can you explain this step by step?

- Chassis Norco RPC-170 1U
- 2 x PSU Supermicro PWS-562-1H (aka Compuware CPS-5611-3A1LF) 560 Watt 80 PLUS Silver
- Mobo Gigabyte GA-P31-ES3G, one x16, three x1 PCIe
- CPU Intel Core 2 Duo 3.0GHz E8400 65W
- RAM Kingston 2GB DDR2-667
- 4 x PCIe x1 Flexible Extender ARC1-PESX1B-Cx
- 4 x AMD Radeon HD 5970 dual-GPU video card
09 Jun 2020 11:53 UTC

LOLBaer wrote: The year is 2021.
Today's graphics cards probably do 100 times as much as 10 years ago. Make such a test setup. Thanks.

03 Jan 2021 01:32 UTC