IEEE’s annual HOT CHIPS conference gives microprocessor designers an opportunity to be recognized by their peers for innovative design advances. This year’s conference included details on IBM’s POWER7 as well as other vendors’ server processors. The conference is aimed at those who enjoy exploring implementation details and debating design tradeoffs. Specifics on clock rates and system performance usually wait for actual server announcements.
Servers employing POWER7 are not expected to become available until sometime in 2010. Furthermore, IBM typically staggers the release of its models. The architecturally simpler, higher-volume midrange and entry servers, especially those facing competitive offerings, debut the new chips first, while complex high-end systems undergo further exhaustive testing before a later introduction. Thus, despite the HOT CHIPS buzz, POWER7 is not imminent: models will not be unveiled until 2010, presumably split across the spring and fall announcement seasons, as has been recent IBM practice.
Like other vendors, IBM faces the dilemma that championing the innovations of a next-generation chip may depress sales of servers built on the current chip. So, to reassure customers planning substantial investments in current POWER6/6+ midrange and high-end servers, namely the Power 570 and Power 595, IBM offers an upgrade path that preserves system serial numbers. That is, customers can continue to buy current POWER6/6+ systems and plan to upgrade them to future POWER7 models without losing the financial accounting benefits of keeping the same serial number.
Well over a decade ago, IBM significantly intensified its microprocessor design efforts, producing a series of industry-leading POWER chips. The 45 nm POWER7 should continue that streak. With up to eight cores per chip, compared with the dual-core POWER6, POWER7 could achieve up to four times the performance per chip, depending on the workload. To remain within the same power and thermal envelope, POWER7 clock rates are not expected to be quite as high as POWER6’s. Four-way simultaneous multithreading (SMT) per core (up from two-way on POWER6), additional execution units, and the return of out-of-order execution (dropped from POWER6 in favor of high clock rates) should compensate for POWER7’s somewhat lower maximum clock rates. Indeed, IBM indicates that a POWER7 core will outperform a POWER6 core, even at a lower clock frequency.
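As a rough check on the "up to four times" figure, treat chip throughput as the number of cores per chip, N, times per-core throughput, p. This is a simplification of ours that ignores memory and interconnect limits:

$$\frac{P_{\mathrm{POWER7}}}{P_{\mathrm{POWER6}}} = \frac{N_7\,p_7}{N_6\,p_6} = \frac{8\,p_7}{2\,p_6} = 4\,\frac{p_7}{p_6}$$

With per-core throughput held equal (p_7 = p_6), the ratio is exactly the fourfold figure. Real results depend on how well a workload scales across eight cores and 32 hardware threads, and on per-core differences in clock rate, threading, and execution resources.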
With eight cores per chip, keeping the cores supplied with data becomes a challenge. Dual DDR3 memory controllers per chip promise 100 GB/sec of sustained memory bandwidth per chip. A new “by-eight” ECC scheme lets systems take advantage of increasingly dense memory chips without compromising Chipkill error-correction coverage.
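To put the bandwidth figure in per-core terms, assume for simplicity that it is split evenly across all eight cores. This is an idealization, since real demand varies from core to core:

$$\frac{100\ \mathrm{GB/s}}{8\ \text{cores}} = 12.5\ \mathrm{GB/s\ per\ core}$$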
IBM highlights that POWER7 is the first processor chip with an on-chip cache built from embedded dynamic RAM (eDRAM). On-chip caches have traditionally used static RAM (SRAM), which is faster than DRAM but consumes more power and is less dense. POWER7’s Level 1 and Level 2 caches remain SRAM. What is new is a 32 MB on-chip eDRAM Level 3 cache. Prior POWER chips used a separate chip for the L3 cache. According to IBM, bringing L3 onto the processor chip reduces latency six-fold and doubles bandwidth. Moving L3 on-chip also removes the power-hungry off-chip driver circuits and eliminates the need for the specialized multi-chip module previously used in many POWER models. (The ultra-dense “Blue Waters” supercomputer, being built for the National Center for Supercomputing Applications, will use multi-chip POWER7 modules with water-cooled cold plates in its quest to achieve one petaflop of sustained performance.)
Other designs have had large on-chip caches: the Montvale Itanium implementation contains up to 24 MB of cache memory, and Intel’s upcoming Nehalem-EX Xeon is planned to have a 24 MB on-chip L3. According to IBM, eDRAM requires only one-third the chip area of a conventional SRAM implementation and uses only one-fifth the standby power. (IBM estimated that the 1.2-billion-transistor POWER7 would have swollen to 2.7 billion transistors had it implemented the 32 MB L3 in SRAM instead of eDRAM.) These space and power savings allowed IBM to implement an eight-core design with a large internal L3 while staying within the power envelope of prior servers.
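IBM's transistor estimate can be restated using only the two figures quoted above:

$$2.7\,\mathrm{B} - 1.2\,\mathrm{B} = 1.5\,\mathrm{B}\ \text{transistors}, \qquad \frac{2.7\,\mathrm{B}}{1.2\,\mathrm{B}} = 2.25$$

In other words, by IBM's estimate an SRAM L3 of the same 32 MB capacity would have added about 1.5 billion transistors and more than doubled the chip's transistor count.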
IBM says it has enhanced POWER7’s power-saving modes, refining POWER6’s power-saving states to optimize the tradeoff between energy savings and a fast return to full performance. The benefit is difficult to quantify. The whole industry is still learning in this area, but IBM appears to have explored the tradeoffs carefully, and this design iteration should strike a sound balance between performance and power efficiency.
Until servers ship in 2010, it will be hard to gauge POWER7’s competitive advantages. Of course, competitors will not stand still and can be expected to release new chip implementations of their own. Nonetheless, by all indications, POWER7 promises to keep IBM’s POWER line in its role as the chip to beat.
