Pictures of Nebulae, the Fastest GPU Supercomputer

I have not seen these pictures disseminated anywhere other than on one chinese technology website, IT168, so I thought I would share. They show Nebulae, the number 2 supercomputer according to the most recent TOP500 list.

Nebulae is composed of 4640 nodes, where each node has two Intel Xeon X5650 processors and one Nvidia C2050 GPU, for a total of 9280 processors and 4640 GPUs. Each node is a blade. There are 10 of them per blade chassis, 4 chassis per rack, or 40 nodes per rack. Therefore Nebulae represents at least 116 racks, not counting networking, storage, and other control equipment. So what can be seen in the above picture is most of it: 8 rows of about 18 racks.

You can find the original pictures in this IT168 article. Dawning, the designer and manufacturer of Nebulae, has a page on it with some technical details that seem interesting. But it is (literally!) Chinese to me. And with so much text in images, Google translate is not much of any help. I need to learn Chinese.

mrb Sunday 27 June 2010 at 11:01 pm | | Default | Three comments
Used tags: , ,

Microsoft Wrongly Accusing Google

The animosity that Microsoft displays towards Google is stupefying.

Check this out. On both the Microsoft Security Response Center blog and the Security Research & Defense blog, Microsoft comments:

  • "This issue was reported [...] by a Google security researcher."
  • "One of Google's security researchers publicly released vulnerability details."
  • "The analysis is incomplete and the actual workaround Google suggested is easily circumvented."
  • "We recommend not counting on the Google hotfix tool."

What is this about? Tavis Ormandy released details of a Windows Help and Support Center vulnerability (CVE-2010-1885). He discovered this on his own, as an independent researcher. In fact his advisory states:

"Finally, a reminder that this documents contains my own opinions, I do not speak for or represent anyone but myself."

This guy just happens to work for Google. Nothing wrong with that, right? Well Microsoft does not think so. They place the blame directly on Google, and make sure to mention the name "Google" 8 times in these 2 blog posts, without ever mentioning Tavis Ormandy's name once, the guy who actually discovered this.

Mike Reavey, Director, MSRC, you should know better, sir. Please apologize. Angry at Tavis that 5 days' notice was not enough to patch the flaw? Blaspheme him, not Google. Do not use this full disclosure event as a vehicle to carry your frustration towards Google.

mrb Friday 11 June 2010 at 01:18 am | | Default | No comments
Used tags: , ,

Improving Ant jar/zip Tasks Performance from O(n) to O(1)

Like many Java-based products, we use Ant to build NeXpose. While helping a coworker investigate performance issues, I noticed that Ant tasks creating jar archives containing many files were taking a lot of CPU time. I want to demonstrate via this blog post how a simple performance problem can be easily investigated and fixed. The resulting bugfix (1-line patch) was contributed to Ant and is included in version 1.8.1 which was released on 2010-05-07.

The Ant task creating the jar archive looked like this:

<jar destfile="foo.jar" basedir="directoryname"/>

The directory contained about 50 thousand files, and running the Ant task seemed to be about three times slower than creating the archive from the command line with the jar(1) command (103 sec vs. ~30 sec).

So I peeked at what Ant was doing with the excellent VisualVM debugger. When looking at a couple of stack traces taken at a random intervals, I noticed many were similar to this one:

"main" prio=10 tid=0x0000000041767000 nid=0x2174 runnable [0x00007fe89d117000..0x00007fe89d118eb0]
 java.lang.Thread.State: RUNNABLE
  at java.util.Hashtable.contains(Hashtable.java:270)
  - locked <0x00007fe81c738ff0> (a java.util.Hashtable)
  at org.apache.tools.ant.taskdefs.Zip.zipFile(Zip.java:1448)
  at org.apache.tools.ant.taskdefs.Jar.zipFile(Jar.java:602)
  at org.apache.tools.ant.taskdefs.Zip.zipFile(Zip.java:1551)
  at org.apache.tools.ant.taskdefs.Zip.addResources(Zip.java:792)
  at org.apache.tools.ant.taskdefs.Zip.addResources(Zip.java:853)
  at org.apache.tools.ant.taskdefs.Zip.executeMain(Zip.java:499)
  at org.apache.tools.ant.taskdefs.Zip.execute(Zip.java:416)
[...]

Why do I see Ant apparently spending so much time in Hashtable.contains() in my handful of stack trace samples, when a hashtable lookup has a time complexity of O(1)? Looking at Ant's code, I saw that it was using this hashtable as a set —storing the same object as a key and value— to represent entries that already exist in the jar archive, so that it could determine if a file to be added to the archive needs to have an entry created or updated. There were 50 thousand files in the archive. Worst case, a hashtable lookup costs a RAM access, or roughly 50 ns on a modern processor if not cached in the CPU caches. So 50k lookups should take a handful of milliseconds. However in my case it was apparent that Ant was spending about a minute doing nothing but lookups. I was more familiar with the Map class rather than the Hashtable class, so I glanced at the Hashtable.contains() documentation to confirm a suspicion...

public boolean contains(Object value)
Tests if some key maps into the specified value in this hashtable.

Yikes! This method checks for values instead of keys! Instead of doing an O(1) lookup with containsKey(), contains() iterates over all values for each file to be added to the archive. This explains the approximate minute of wasted CPU time: 25000 (average iterations) * 50000 (files) * 50 ns = 62.5 sec.

I changed the contains() call to containsKey() and recompiled Ant. It sped up the jar creation by 3x by shaving a minute from the execution time (103 sec down to 32 sec). I submitted a bug report (bug 48755 - Zip task does O(n) lookup in hashtable) and a patch to the Ant developers. The fix is included in Ant 1.8.1 which was released last month.

The whole investigation and fix took less than an hour. This is a great demonstration of the advantages of open source tools. Anyone can fix anything to help everyone.

mrb Tuesday 08 June 2010 at 03:17 am | | Default | One comment
Used tags: , ,

Brute Forcing RAR Archives Encrypted with the "-hp" Option

Today I am finally doing an official release of unrarhp, a Unix command line proof-of-concept brute forcer to recover the passwords of RAR archives encrypted with the RAR 3.x "-hp" option. As far as I know this is the only RAR "-hp" brute forcer that is open source and free. I wrote this cracker back in 2004 for a computer security contest organized at the Epitech french computer science school. I had to study the source code of the Unix version of "unrar", because at the time, the format of encrypted archives was not documented (not sure if it is today). There are 2 different ways to encrypt a RAR archive; the rar CLI tool exposes them through 2 options:

  • -p option, which encrypts only the content of the files in the archive, while file metadata (filenames...) are not encrypted
  • -hp option, which encrypts the internal block headers that contain file metadata, as well as the content of the files

I have never looked at the -p encryption, unrarhp works against archives encrypted with -hp only, but the 2 encryption mechanisms are definitely based on the same concepts. When encrypting a RAR archive with "-hp", a random 64-bit salt is generated by RAR, the UCS-2 encoded password is concatenated to the salt, the salt-password pair is hashed with 262144 rounds of a function based on SHA-1, which eventually outputs a 128-bit IV and 128-bit key used to AES-encrypt data blocks in ECB mode.

As a side remark, note that the fact that AES is used in ECB mode, and the fact that the same salt is reused for each file in the archive are serious cryptographic mistakes. This may open some yet undiscovered attack paths...

Unrarhp verifies passwords by using a trick that I am not sure any other RAR brute forcer uses: encrypted RAR archives seem to always contain an "end-of-archive" block that is the constant 7-byte plaintext blob "c4 3d 7b 00 40 07 00", so unrarhp simply compares the decrypted data with this known plaintext . The code is completely unoptimized and re-uses the SHA-1 and AES implementation of RAR, but it works. I first posted the code on the BarsWF forum to help others, and I believe IvanG (author of rargpu) wrote his brute forcer with the help of my code.

For more information and how to use the brute forcer, see the README file in the unrarhp tarball.

mrb Tuesday 01 June 2010 at 11:09 pm | | Default | One comment