Here is an insightful technical post by Joerg Moellenkamp on the new SPARC T3 processor (16 cores and 128 threads per socket). Oracle just announced 1-, 2-, and 4-socket systems built on this processor, for up to 512 threads per system. I remember that Oracle/Sun was planning 8-socket T3 systems months ago, so I presume such beasts will be announced later.
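To make the thread arithmetic explicit, here is a quick sketch in Python. The constants come from the figures above (16 cores and 128 threads per socket, hence 8 threads per core); the function name `threads_per_system` is just something I made up for illustration.

```python
# Figures quoted in the post: 16 cores and 128 threads per T3 socket.
CORES_PER_SOCKET = 16
THREADS_PER_SOCKET = 128
THREADS_PER_CORE = THREADS_PER_SOCKET // CORES_PER_SOCKET  # 8 threads per core


def threads_per_system(sockets: int) -> int:
    """Hardware threads exposed to the OS by a system with `sockets` T3 sockets."""
    return sockets * CORES_PER_SOCKET * THREADS_PER_CORE


# The announced configurations are 1, 2, and 4 sockets.
for sockets in (1, 2, 4):
    print(f"{sockets}-socket system: {threads_per_system(sockets)} threads")
```

By the same arithmetic, an 8-socket system, if one is ever announced, would expose 1024 threads.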
I find it interesting that both Oracle, with the T3 processor, and AMD, with the upcoming Bulldozer processor, adopted similar designs.
A Bulldozer module, as AMD calls it, consists of two integer units and one floating point unit (see picture). AMD sometimes labels these integer units as "cores", but this nomenclature is confusing. Instead, a whole Bulldozer module should be seen as a 1-core 2-thread piece of x86-64 technology. When the threads execute integer-only instructions, each one gets its own integer unit, which effectively doubles the performance compared to a classic 1-core 2-thread design like Intel's SMT implementation, Hyper-Threading (a Nehalem core only has one integer unit).
A SPARC T3 core, like the previous generation UltraSPARC T2/T2+, also has two integer units and one floating point unit. However, a T3 core exposes 8 threads to the OS.
So, effectively, a Bulldozer module (2 integer units, 1 floating point unit, 2 threads) is microarchitecturally equivalent to a T3 core (2 integer units, 1 floating point unit, 8 threads). This is the interesting story of Bulldozer: AMD finally adopted SMT, but a beefed-up version of it where not 1, but 2, integer units are present in a "core" to counterbalance the increased number of threads it exposes. Of course, no one in the technology press picked this up and reported it this way, because AMD chooses its words carefully to market a Bulldozer module as "2 cores not supporting SMT", which sounds better than "1 core supporting a better version of SMT".
Now, I think a smart move for AMD would be to expose an even higher number of threads per Bulldozer module, as it could be relatively inexpensive to implement in terms of die area (Oracle showed they could expose 8 threads without too much difficulty). For example, if 4 threads were exposed, the ratio of threads per integer unit would be the same as on Intel Nehalem (2 threads per integer unit). Who knows, perhaps AMD will do it in future revisions of Bulldozer?
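For anyone who wants to check the ratios, here is a small Python sketch that tabulates threads per integer unit for each design. The unit and thread counts are the simplified ones used in this post, not detailed microarchitectural data, and the "Bulldozer module, 4 threads" entry is purely hypothetical (no such part has been announced). The `CoreDesign` class is an illustrative name of my own.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CoreDesign:
    name: str
    integer_units: int  # simplified count, as described in the post
    threads: int        # hardware threads exposed to the OS

    def threads_per_integer_unit(self) -> Fraction:
        return Fraction(self.threads, self.integer_units)


DESIGNS = [
    CoreDesign("Nehalem core", integer_units=1, threads=2),
    CoreDesign("Bulldozer module", integer_units=2, threads=2),
    CoreDesign("SPARC T3 core", integer_units=2, threads=8),
    # Hypothetical: the 4-thread Bulldozer module speculated about above.
    CoreDesign("Bulldozer module, 4 threads", integer_units=2, threads=4),
]

for design in DESIGNS:
    ratio = design.threads_per_integer_unit()
    print(f"{design.name}: {ratio} thread(s) per integer unit")
```

With these numbers, the hypothetical 4-thread module lands on the same ratio as Nehalem, while a T3 core runs twice as many threads per integer unit as either.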