Whitepixel v2: configurable charset, higher performance (33.1 billion passwords/sec!)

Barely a week after I released whitepixel v1, version 2 is already available. It delivers basic configurable charsets plus a solid 15% performance increase, reaching 33.1 billion passwords/sec on 4 x HD 5970 graphics cards (see detailed hardware description)! Single MD5 hash brute forcing speeds like these are, to this day, simply unheard of on one computer. Browse the whitepixel project page to get your hands on this open source (GPLv3) GPU-accelerated password hash auditing tool.

(On a side note, after seeing whitepixel demonstrate that high ALU utilization ratios for MD5 on AMD GPUs were possible, Ivan Golubev, the developer of ighashgpu, reworked his tool to increase its ALU utilization. He released ighashgpu 0.91.17.1, which raised its single MD5 cracking speed by 12%. This is one of the reasons why I like open source: anyone can learn from anyone else's source code, and the knowledge of one developer can then spread and indirectly benefit a much larger user base. Even I, a user of ighashgpu, benefit from it :-) )

Crazy BFI_INT hacking

Whitepixel v2 is 15% faster than v1 thanks to the BFI_INT instruction (integer bitfield insert) present in AMD Radeon HD 5000 series GPUs. This instruction is similar to the Cell B.E. SPU selb or the PowerPC AltiVec vsel instructions: it selects bits from 2 registers depending on a mask. This makes it possible to implement both the F() and G() functions of MD5 in a single clock cycle.
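To illustrate why a bit-select operation covers both functions, here is a minimal Python sketch. The helper name bitselect and its operand order are assumptions made for illustration; they do not describe the actual BFI_INT operand encoding or whitepixel's code. F() and G() are written as defined in RFC 1321.

```python
import random

MASK32 = 0xFFFFFFFF

def bitselect(mask, a, b):
    """Hypothetical helper: take bits from a where mask is 1, else from b."""
    return ((a & mask) | (b & ~mask)) & MASK32

def md5_f(x, y, z):
    # F(x, y, z) = (x AND y) OR (NOT x AND z), per RFC 1321
    return ((x & y) | (~x & z)) & MASK32

def md5_g(x, y, z):
    # G(x, y, z) = (x AND z) OR (y AND NOT z), per RFC 1321
    return ((x & z) | (y & ~z)) & MASK32

def check(trials=1000, seed=0):
    """Compare F and G against a single bit-select on random 32-bit words."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        if md5_f(x, y, z) != bitselect(x, y, z):
            return False
        if md5_g(x, y, z) != bitselect(z, x, y):
            return False
    return True
```

In this sketch F(x, y, z) is bitselect(x, y, z) and G(x, y, z) is bitselect(z, x, y), so each one collapses from several AND/OR/NOT operations into one select.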

However, this native instruction is not exposed at all, not even to low-level CAL IL code. Well, there is one IL instruction, ubit_insert, whose purpose is to implement some DirectX 11 operation, which is compiled to a BFM_INT+BFI_INT native pair. But there is no instruction to expose the powerful functionality of BFI_INT alone. The solution? After compiling the IL code to native instructions, whitepixel dynamically patches the binary CAL object in memory by scanning its opcodes to replace some of them with BFI_INT. This works well, but is ugly and makes a lot of assumptions. I would not be surprised if it caused crashes, so I made this feature optional via the -e option.

Configurable charsets

Currently whitepixel v2 gives a very basic choice: selecting between lower, upper, digit, printable ASCII (0x20-0x7e), or all bytes (0x00-0xff). Combining these charsets is not possible yet, but v3 will allow it. I like to stick to the "release early, release often" strategy.
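As a rough illustration of what these five choices mean for the size of the search space, here is a small Python sketch. The dictionary keys are labels chosen for this example, not whitepixel's actual option names.

```python
import string

# Hypothetical labels for the five charsets described above.
CHARSETS = {
    "lower": string.ascii_lowercase.encode(),
    "upper": string.ascii_uppercase.encode(),
    "digit": string.digits.encode(),
    "printable": bytes(range(0x20, 0x7F)),   # 0x20-0x7e inclusive
    "all": bytes(range(0x00, 0x100)),        # 0x00-0xff inclusive
}

def keyspace(name, length):
    """Number of candidate passwords of exactly `length` characters."""
    return len(CHARSETS[name]) ** length

if __name__ == "__main__":
    for name, chars in CHARSETS.items():
        print(name, len(chars), keyspace(name, 8))
```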

mrb Tuesday 14 December 2010 at 03:46 am | | Default

36 comments

Ouroborus

Pretty nice, but it seems you’re still looking at around 120 years to crack a difficult 10 character password. But an eight character, case sensitive, alpha-numeric password should take less than two hours.

Ouroborus, - 01-01-’11 08:26
mrb

It is 57 years to crack all 10-char printable ASCII passwords with 4×5970 (95**10/33.1e9/3600/24/365.25 = 57.3 years).

Actually a real attack would take less than 9 years, because one must take into account GPU performance doubling every 1.5 years.

mrb, - 01-01-’11 10:19
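
The arithmetic in the reply above can be reproduced with a short Python sketch. The constant-rate function follows the formula quoted in the comment. The doubling model is an assumption made here (speed growing continuously as 2**(t/1.5) from day one); the comment does not say which model was used, only that the result is under 9 years.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365.25

def years_at_constant_rate(charset_size, length, hashes_per_sec):
    """Years to exhaust the keyspace at a fixed hash rate."""
    return charset_size ** length / hashes_per_sec / SECONDS_PER_YEAR

def years_with_doubling(constant_years, doubling_period_years=1.5):
    """Years needed if speed grows as 2**(t/d) (assumed continuous model).

    Work done by time T is the integral of 2**(t/d) dt = (d/ln 2) * (2**(T/d) - 1).
    Setting that equal to constant_years and solving for T gives the line below.
    """
    d = doubling_period_years
    return d * math.log2(1 + constant_years * math.log(2) / d)

if __name__ == "__main__":
    base = years_at_constant_rate(95, 10, 33.1e9)
    print(round(base, 1))                       # about 57.3
    print(round(years_with_doubling(base), 1))  # under 9 with this model
```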
mr b

how very exciting! i’ll be sure and tell everyone who asks that you aren’t the same mr b as I who can be found on twitter @veritaz

mr b, - 05-06-’11 18:59
Dennison Uy

Why does it always have to be MD5? How does this perform on other standards like SHA-256?

Dennison Uy, (URL) - 06-06-’11 02:25
mrb

Many applications with substandard password hashing algorithms (esp. web applications) hash passwords with MD5. So it became the de facto benchmark for password bruteforcers…

SHA-256 requires roughly 7 times more instructions (per call to the compression function). So a 4 x HD 5970 machine would bruteforce straight SHA-256 hashes at about 5 billion/sec.

mrb, - 06-06-’11 03:05
Anon

In real world SHA-256 performance the HD 5970 can do about 530-800MHash/second, depending on clock speed and other variables.
That means 2.1-3.2BHash/s, not 5BHash/s

Anon, - 06-06-’11 07:32
mrb

No, I have (unreleased) SHA-256 code (for Bitcoin) that runs at 1.1 Ghash/s on the 5970.

mrb, - 06-06-’11 11:01
ramon

No comment. Just some questions. So you are using brute force and very fast GPUs to guess passwords. How does this work in the real world? Say I try to use this to guess a website access but the site will shut me out after the first three wrong guesses. What good is all that speed? What are the odds of guessing right in the first three tries?

ramon, - 07-06-’11 05:16
Jon L.

Ramon, the point isn’t to blindly guess at passwords, the point is to take a known MD5 hash and determine what the plaintext password is that created that hash.

Jon L., - 07-06-’11 08:22
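
To make the offline attack described above concrete, here is a minimal CPU-only Python sketch: given a known MD5 hash, it enumerates candidates and compares digests locally, without ever contacting a login form. The function name and parameters are illustrative and are not whitepixel's interface.

```python
import hashlib
from itertools import product

def brute_force_md5(target_hex, charset, max_len):
    """Return the plaintext (bytes) whose MD5 equals target_hex, or None.

    charset is a bytes object; candidates of length 1..max_len are tried.
    """
    target = bytes.fromhex(target_hex)
    for length in range(1, max_len + 1):
        for combo in product(charset, repeat=length):
            candidate = bytes(combo)
            if hashlib.md5(candidate).digest() == target:
                return candidate
    return None

if __name__ == "__main__":
    known_hash = hashlib.md5(b"abc").hexdigest()
    print(brute_force_md5(known_hash, b"abcdefghijklmnopqrstuvwxyz", 3))
```

A GPU tool performs the same comparison, only with billions of candidates per second instead of the far lower rate of a Python loop like this one.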
RayB

So – the attacker must first have a list of hashes from the site – by hacking it or otherwise to know a hash whose password would work. Therein lies the rub.

It would be interesting to know whether this GPU approach is scalable using multiple units, like the Access Data DNA – a manager and multiple workers to split up the task – using 1000 or more devices.

RayB, - 07-06-’11 09:34
klmdb

try running the program on one of these:

http://www.tigerdirect.com/applications/..

XD

klmdb, - 09-06-’11 07:03
mrb

klmdb: I don’t support Nvidia because they are much slower. The Tesla C2070 only does 1.4 Ghash/s with MD5. See http://www.golubev.com/gpuest.htm

mrb, - 09-06-’11 23:15
Greg B

Ramon and Ray B: This isn’t designed for hacking website passwords, such as to bust into yahoo mail accounts, pr0n sites, etc. This would be very effective at busting into password-protected data stores, such as USB keys or hard drives, where one physically had access to the medium.

Greg B, - 14-06-’11 14:18
JodiTheTigger

Mrb, care to share your unreleased bitcoin mining code? From what I can tell it’s 50% faster than what is available.

JodiTheTigger, - 20-06-’11 14:40
mrb

It is not 50% faster. It is just a few percent faster than the best public code. Don’t forget that 1 Bitcoin hash is defined as 2 SHA-256 hashes. My code does 569M Bitcoin hashes/s, or ~1.1G SHA-256 hashes/s on the HD 5970.

mrb, - 20-06-’11 22:53
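
The relationship between the two figures in the reply above can be sketched in Python. The function names are chosen for this example; the only facts used are that one Bitcoin hash is two SHA-256 hashes and the 569M figure quoted in the comment.

```python
import hashlib

def bitcoin_hash(data: bytes) -> bytes:
    """One 'Bitcoin hash': SHA-256 applied twice to the input."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def sha256_rate(bitcoin_hashes_per_sec):
    """Equivalent SHA-256 hashes/sec for a given Bitcoin hash rate."""
    return 2 * bitcoin_hashes_per_sec

if __name__ == "__main__":
    print(bitcoin_hash(b"").hex())
    print(sha256_rate(569e6) / 1e9)  # roughly 1.1 G SHA-256 hashes/s
```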
Aaron

Very, very cool. I’ll definitely add this to my ‘toolbox’ when I build my desktop. Going to crossfire 2 Radeon 6790’s (after flashing the bios to 6850, if I can), so I should be doing decently well with that.

Aaron, - 02-07-’11 17:59
mywebs

Would it make any difference if the MD5 hash also had a salt? When I use MD5 to hash a password I always use a sentence length salt that includes numbers and special chars to make sure a rainbow table can’t be used.

I actually created some code to encrypt cookies, could also be used for passwords, that first uses MD5, then a 128 bit one way hashing function then a 256 bit one in succession. Then I compress it all back down to about 13 characters in such a way it would be impossible for anything but possibly a supercomputer to reverse due to integer length limitations.

mywebs, (URL) - 28-07-’11 12:21
mrb

As you point out, salts are only useful to prevent pre-computed attacks. For the purpose of bruteforcing, a salt does not typically significantly slow down attacks.

Bruteforcing attacks can be slowed down by iterating a hash function. Look at the standard MD5-based or SHA2-based UNIX crypt() function for how it is done.

You talk about “encrypting” cookies with one way hash functions, but it does not sound like encryption (as it would imply you can decrypt them, which is precisely what one-way functions prevent).

mrb, - 28-07-’11 21:59
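
The difference between salting and iterating can be sketched as follows. This is a simplified illustration only: it is not the algorithm used by the MD5-based or SHA2-based crypt() schemes mentioned above, and the function names and the round count are assumptions made for the example.

```python
import hashlib

def salted_md5(password: bytes, salt: bytes) -> bytes:
    """One MD5 call per guess: the salt defeats precomputed tables only."""
    return hashlib.md5(salt + password).digest()

def iterated_md5(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    """`rounds` MD5 calls per guess: every candidate costs `rounds` times more."""
    digest = hashlib.md5(salt + password).digest()
    for _ in range(rounds - 1):
        # Feed the previous digest back in together with the salt and password.
        digest = hashlib.md5(digest + salt + password).digest()
    return digest

if __name__ == "__main__":
    print(salted_md5(b"secret", b"NaCl").hex())
    print(iterated_md5(b"secret", b"NaCl", rounds=1000).hex())
```

With a salt alone, an attacker still pays one hash per guess. With 1000 rounds, a rig that tests 33.1 billion single MD5 hashes per second would be limited to roughly 33 million guesses per second against this construction.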
db

your program is much, much slower than oclHashcat. i have 4× 5970s and single hash raw MD5 performance with oclHashcat-lite is ~ 49.2 G/s. step it up!

db, - 16-08-’11 05:27
mrb

I know. One of my goals when releasing whitepixel was to entice competition between different tools. At first, oclhashcat users in the IRC channels could not believe my results, said I cheated by using a simplistic charset, etc. Now look at how fast oclhashcat is. I succeeded :-)

mrb, - 16-08-’11 22:24
klo

Are you going to release a new faster version??

klo, - 22-08-’11 06:42
Peter

Nice and impressive. But all my login code has a preset time interval for login attempts. In other words, you can try a password only every ‘n’ seconds.
Ah yes, and get a couple wrong in a row and you are locked out for even longer so you can clear your mind and think about what to do next :-)

Peter, - 31-08-’11 17:03
mrb

Peter, my tool runs an offline attack. It is not subject to settings controlling the maximum number of login attempts.

mrb, - 01-09-’11 20:04
m3g9tr0n

Hi Marc!
I have to admit that whitepixel is the fastest raw MD5 cracker!
Could you please suggest me some tutorials related to GPGPU and CAL APIs for ATI GPUs because I am interested in that topic!
Thanks in advance and keep up with the good work!!!

m3g9tr0n, - 16-11-’11 01:04
Ahmad

Can the code of whitepixel be modified to work with SHA-256?

Ahmad, - 17-11-’11 06:50
mrb

The official AMD doc on CAL is what I used to learn.

Yes the code could be modified to support SHA-256. (I wrote a private Bitcoin miner, which uses SHA-256, based on a fork of whitepixel.)

mrb, - 17-11-’11 17:40
m3g9tr0n

Thanks Marc for your reply!
Have you ever tried this library?
http://code.google.com/p/calseum/

m3g9tr0n, - 24-11-’11 09:59
mrb

No. It seems calseum would be rendered obsolete by OpenCL…

mrb, - 24-11-’11 10:11
Mr T, who is much more smart than you

well MY website has rayguns and mutant porno lizards in jackboots that throw monkey poop at anyone who enters two wrong passwords in a row, so your code is crap. useless crap.
And and um On his lunchbreak my older brother like soldered together three minitels he got off that smelly french dude down the street and it is capable of doing like, um, a bazillion KADRILLION of those thingies every lightsecond. or something.
Oh yeah, and I uh, I like figured out a magic three-way encryptocoder function that obtuses not just cookies but SCONES, then recodercrypts them back and can’t be broken except by bashing me on the head until I surrender.
And and and since my TRS-80 doesn’t even HAVE a etcetera password shadow that I can find and doesn’t have a modem anyhow so WHY DO YOU EVEN BOTHER GETTING UP IN THE MORNING, let alone turning on your computer?

Why does every third response on this kind of blog involve people saying
A: You shouldn’t bother because it can’t crack MY website
B: Someone they know has done it faster
C: they’ve created some super code that renders even the NSA helpless.
or
D:they know more about linux so you suck.

I don’t know crap about this stuff, and even I know they’re silly. wow.

Thanks for your work on this stuff. This is how some of us learn a bit.

Mr T, who is much more smart than you, - 22-01-’12 00:57
TV

Hi Marc,

Crazy question, but will your software run on an ARM platform with an AMD/ATI card?

TV, - 08-03-’12 20:06
mrb

Not a chance. ARM platforms do not even support graphics cards made for x86 PCs.

mrb, - 09-03-’12 01:32
Simon Zerafa

Hi,

Have you been able to repeat your benchmarks using more modern ATI GPU’s? Say the 6990 or better?

If so what results did you get with Whitepixel or oclHashcat-plus? :-)

Kind Regards

Simon

Simon Zerafa, (URL) - 07-04-’12 05:17
mrb

Simon, no, however 6990 performance numbers with oclhashcat-plus can be found here: http://hashcat.net/oclhashcat-plus/

mrb, - 21-04-’12 01:49
Dawn White

I’m just trying to get my brain around calculating how long it would take such a rig using 4xHD 5970 units to crack WPA/WPA2 router passwords. For example many Thomson, VirginMedia routers use 8 lowercase characters. SKY routers on the older models use 8 uppercase. So I am talking about grabbing the 4-way handshake and then running say pyrit, aircrack-ng using dictionaries or using crunch to brute force. Any clarifications would be much appreciated as I am trying to either build a rig myself or buy one off the shelf. I know the hashcat team won some benchmark contests about a year ago. Does anyone have an idea of the best price/performance rigs out there which would crack 8 symbol alphanumerics, and ideally 10 symbol too, as many newer routers are using 10 alphanumeric symbols? Any links or advice are much appreciated.

TIA.

Dawn White, - 17-03-’14 16:47
Dawn White

Addendum to earlier post re: WPA/WPA2 pw cracking. Just to be more precise, I mean crack 8 symbol alphanumeric passwords in a reasonable time, let’s say a few hours or a few days.

Dawn White, - 17-03-’14 17:01
mrb

Dawn White: I have no benchmark numbers for the HD5970, but I know that with an 8 x R9 290X rig, oclhashcat bruteforces WPA2 at 1.34M c/s. This means 8-character alphabetic passwords can be attacked in 43 hours, 8-character alphanumeric in 24 days, and 10-character alphabetic in 3.3 years.

mrb, - 21-03-’14 23:12
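
The three estimates in the reply above can be checked with a few lines of Python. The charset sizes are assumptions inferred from the quoted results: 26 symbols for alphabetic (one letter case) and 36 for alphanumeric (one letter case plus digits), since the 24-day figure matches 36 symbols rather than 62.

```python
RATE = 1.34e6  # WPA2 candidates per second, as quoted in the reply

def crack_time_seconds(charset_size, length, candidates_per_sec):
    """Seconds to try every password of exactly `length` characters."""
    return charset_size ** length / candidates_per_sec

# Assumed charset sizes: 26 = single-case letters, 36 = letters plus digits.
hours_8_alpha = crack_time_seconds(26, 8, RATE) / 3600
days_8_alnum = crack_time_seconds(36, 8, RATE) / 86400
years_10_alpha = crack_time_seconds(26, 10, RATE) / (86400 * 365.25)

if __name__ == "__main__":
    print(round(hours_8_alpha), "hours")
    print(round(days_8_alnum), "days")
    print(round(years_10_alpha, 1), "years")
```

Under the same rate, a mixed-case alphanumeric set (62 symbols) at 8 characters would take years rather than days, so the charset a router actually uses matters far more than modest differences in hardware.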