Introducing Tianhe-1A: 4702 TFLOPS of GPU Power. Made in China. (And New World's Fastest Supercomputer).

Today at HPC 2010 China, the GPU-based Tianhe-1A was unveiled as the new world's fastest supercomputer. Many news sites fail to give complete specifications, so I have dug up all the information for you, readers. Read on.

The picture shows only a single row of racks, but Tianhe-1A is composed of 7168 nodes, each featuring two Intel Xeon X5670 2.93 GHz 6-core "Westmere" processors and one Nvidia Tesla M2050 448-ALU 1150 MHz "Fermi" graphics processing unit, for a total of 14336 CPUs and 7168 GPUs. An additional set of 2048 Galaxy "FT-1000" 1 GHz 8-core processors is integrated into the system in an unknown way. Tianhe-1A achieves a measured performance of 2507 double precision TFLOPS against a theoretical peak of 4702 TFLOPS (both numbers exclude the FT-1000 processors):

7168 nodes * (11.733 GFLOPS per core * 6 CPU cores * 2 sockets + 515.2 GFLOPS per GPU) = 4702.21 double precision TFLOPS
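For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation. It uses only the figures quoted above; the constant and function names are my own, chosen for illustration. Because the per-core figure is rounded to 11.733 GFLOPS, the result lands a few hundredths of a TFLOPS below 4702.21. The sketch also derives the ratio of measured to peak performance, which is not a figure from any announcement.

```python
# Sketch: theoretical peak of Tianhe-1A from the per-node figures quoted above.
# All constant names are illustrative; the values come from the text.

NODES = 7168
CPU_GFLOPS_PER_CORE = 11.733   # double precision, per Xeon X5670 core (rounded)
CORES_PER_CPU = 6
SOCKETS_PER_NODE = 2
GPU_GFLOPS = 515.2             # double precision, per Tesla M2050
MEASURED_TFLOPS = 2507         # practical performance quoted above


def node_peak_gflops():
    """Peak double precision GFLOPS of one node (2 CPUs + 1 GPU)."""
    cpu = CPU_GFLOPS_PER_CORE * CORES_PER_CPU * SOCKETS_PER_NODE
    return cpu + GPU_GFLOPS


def system_peak_tflops():
    """Peak double precision TFLOPS of the whole system, FT-1000 excluded."""
    return NODES * node_peak_gflops() / 1000.0


peak = system_peak_tflops()
efficiency = MEASURED_TFLOPS / peak                 # measured / theoretical
gpu_share = NODES * GPU_GFLOPS / 1000.0 / peak      # fraction of peak from GPUs

print(f"peak: {peak:.2f} TFLOPS")
print(f"measured/peak: {efficiency:.1%}")
print(f"GPU share of peak: {gpu_share:.1%}")
```

The GPU share makes the headline point concrete: most of the theoretical peak comes from the Tesla cards, not the Xeons.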

The supercomputer is described as having 202752 heterogeneous cores. This count, unlike the FLOPS figures, includes the FT-1000 processors (I will reiterate my previous complaint that these statistics are inconsistent about whether extra hardware is included in the FLOPS and core counts):

7168 nodes * (12 processor cores + 14 SIMD units) + 2048 FT-1000 processors * 8 cores = 202752 cores
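The core count can be reproduced the same way. This is a small sketch under the counting convention used in the line above (each GPU SIMD unit counted as one "core"); the names are hypothetical, and the breakdown by component is my own addition.

```python
# Sketch: reproduce the 202752 "heterogeneous cores" figure.
# Convention assumed from the text: 1 GPU SIMD unit = 1 core.

NODES = 7168
CPU_CORES_PER_NODE = 12        # 2 sockets * 6 cores
GPU_SIMD_UNITS_PER_NODE = 14   # per Tesla M2050
FT1000_PROCESSORS = 2048
FT1000_CORES_EACH = 8


def core_breakdown():
    """Return the core count contributed by each kind of hardware."""
    return {
        "xeon": NODES * CPU_CORES_PER_NODE,
        "gpu_simd": NODES * GPU_SIMD_UNITS_PER_NODE,
        "ft1000": FT1000_PROCESSORS * FT1000_CORES_EACH,
    }


breakdown = core_breakdown()
total_cores = sum(breakdown.values())

for name, count in breakdown.items():
    print(f"{name}: {count}")
print(f"total: {total_cores}")
```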

Tianhe-1A was designed by the National University of Defense Technology (NUDT), installed at the National SuperComputer Center in Tianjin (NSCC-TJ), consumes 4.04 MW of electricity, and comprises 103 rack cabinets. [Update 2010-11-16: Some sources state 112 racks.]

As its name suggests, Tianhe-1A is actually a major upgrade (I would say a rebuild) of the previous Tianhe-1 supercomputer, which was made of 3072 nodes mixing Xeon E5450 and E5540 CPUs and using AMD HD 4870 X2 GPUs.

Most impressive is Tianhe-1A's theoretical peak of 4702 TFLOPS. This is 58% higher than the previous highest peak value on the TOP500 list (Nebulae, 2984 TFLOPS). I expect to see a system breaking 10 PFLOPS in the next 2 years. Tianhe-1A further solidifies the leading position in GPGPU supercomputing of China, which now operates the 3 fastest GPU-based supercomputers in the world: Tianhe-1A, Nebulae, and Mole-8.5. (Mole-8.5 is a smaller supercomputer with 1920 GPUs that has been eclipsed by Tianhe-1A but still manages to surpass 1000 TFLOPS of theoretical peak performance.) After the TOP500 list is updated in November 2010, Tianhe-1A is expected to be number 1 and Nebulae number 3.
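The 58% figure follows directly from the two peak values quoted above. A minimal sketch, with illustrative names of my own:

```python
# Sketch: relative increase of Tianhe-1A's theoretical peak over Nebulae's.

TIANHE_1A_PEAK_TFLOPS = 4702
NEBULAE_PEAK_TFLOPS = 2984


def percent_higher(new, old):
    """How much larger `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0


increase = percent_higher(TIANHE_1A_PEAK_TFLOPS, NEBULAE_PEAK_TFLOPS)
print(f"Tianhe-1A peak is about {increase:.0f}% higher than Nebulae's")
```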