Twice a year, timed to the June (Germany) and November (USA) supercomputing conferences, the TOP500 list of the most powerful technical computing systems is released. Naturally, vendors engage in "my dog is bigger/faster/meaner than yours" bragging to showcase what are indisputably herculean efforts to advance the frontiers of high performance computing. This season's significant milestone is the breaking of the 1 PetaFLOPS LINPACK barrier by Roadrunner, the system IBM created for Los Alamos National Laboratory. Capturing the crown of "world's fastest supercomputer," Roadrunner delivers twice the performance of the previous TOP500 #1, IBM's BlueGene/L at Lawrence Livermore. The Roadrunner cluster offers a unique heterogeneous architecture, employing over 12,000 chips derived from those designed for Sony's PlayStation 3 along with more than 6,000 conventional Opteron processors.
For decades, the US government relied on underground testing to assess the readiness of its aging nuclear arsenal, most of which was produced 30-40 years ago. After the 1992 suspension of underground testing, the US turned to demanding simulations executed on massive supercomputers. ASCI (the Accelerated Strategic Computing Initiative, now part of the NNSA, the National Nuclear Security Administration) funded a number of breakthrough high performance parallel systems (ASCI "Red"/"White"/"Blue"/etc.). IBM was (and remains) a pioneer in creating some of these highly specialized ASCI-class systems, recognizing that solving the challenges of the "extreme" would have "trickle down" benefits for the merely "demanding."
It may not be apparent, but consumer gaming consoles place computational demands on hardware that are somewhat similar to those of high performance technical computing (HPC). The Cell Broadband Engine (Cell BE), designed to render realistic gaming graphics for Sony's PlayStation 3, provided the underlying design that became the foundation for Roadrunner's nuclear stockpile simulation mandate. The IBM QS22 blade incorporates the latest Cell processor, which adds hardware-implemented double-precision floating point, an enhancement of the single-precision circuitry needed for PlayStation graphics. Built from nodes that combine two QS22 Cell blades with an LS21 AMD Opteron blade, connected by InfiniBand networking, the record-breaking Roadrunner is constructed essentially from generally available hardware (a goal of US Government ASCI/NNSA funding).
What enables generally available hardware to achieve extraordinary performance is the tuning and optimization of the software stack to exploit extremely parallel systems. As IBM's Dr. Don Grice explains, IBM focused on optimizing software to exploit parallelism rather than on creating unique hardware.
IBM can be justifiably proud of being the first to attain PetaFLOPS computing. Roadrunner's $120 million project cost reflects a scale affordable only by governments. What may be more significant is that subsets of the Roadrunner configuration may deliver breakthrough performance not just in government and academic environments but also for compute-intensive applications in petroleum, finance, aeronautics and automotive, as those industries learn to exploit highly parallel heterogeneous clusters.