June 10, 2008

Roadrunner Breaks the PetaFLOPS Barrier

Semi-annually, for the June (Germany) and November (USA) Supercomputing Conferences, the TOP500 list of most powerful technical computing systems is released. Naturally, vendors engage in "my dog is bigger/faster/meaner/etc. than yours" bragging to showcase what are indisputably herculean efforts in advancing the frontiers of ultimate high performance computing. This season's significant milestone is the breaking of the 1 PetaFLOPS LINPACK barrier by the Roadrunner system created by IBM for the Los Alamos National Laboratory. Capturing the crown of "world's fastest supercomputer," Roadrunner delivers twice the performance of the previous TOP500 #1, IBM BlueGene/L at Lawrence Livermore. The Roadrunner cluster offers a unique heterogeneous architecture, employing over 12,000 chips derived from those designed for Sony's PlayStation 3 along with greater than 6,000 more conventional Opteron processors.  
Decades ago, the US government relied on underground testing to understand the readiness of its aging nuclear arsenal, mostly produced 30-40 years ago. After the 1992 suspension of underground testing, the US turned to demanding simulations executed on massive supercomputers. The ASCI program (Accelerated Strategic Computing Initiative, now a part of the National Nuclear Security Administration, NNSA) funded a number of breakthrough high performance parallel systems (ASCI "Red"/"White"/"Blue"/etc.). IBM was (and remains) a pioneer in creating some of these highly specialized ASCI-class systems, recognizing that solving the challenges of the "extreme" would have "trickle down" benefits to the merely "demanding."
Though it may not be apparent, consumer gaming consoles stress computational capabilities somewhat similar to those of high performance technical computing (HPC). The Cell Broadband Engine (Cell BE), designed to portray realistic gaming graphics for Sony's PlayStation 3, incorporated an underlying design that became the foundation for Roadrunner's nuclear stockpile simulation mandate. The IBM QS22 Blade incorporates the latest Cell processor, with hardware-implemented double-precision floating point, an enhanced version of the single-precision circuitry needed for PlayStation graphics. Built from compute nodes that each combine two QS22 Cell blades and an LS21 AMD Opteron Blade, linked by InfiniBand networking, the record-breaking Roadrunner is basically constructed from generally-available hardware (a goal of US Government ASCI/NNSA funding).
What enables generally-available hardware to achieve extraordinary performance results is the tuning/optimization of the software stack to exploit extremely parallel systems. As IBM's Dr. Don Grice explains, IBM focused on optimizing software to exploit parallelism rather than creating unique hardware.
IBM can be justifiably proud of being the first to attain PetaFLOPS computing. Roadrunner's 120 million dollar project cost reflects a scale affordable only by governments. What may be more significant is that subsets of the Roadrunner configuration may deliver breakthrough performance not just for Government and Academic environments but also for Petroleum, Finance, Aeronautics and Automotive compute-intensive applications as they learn to exploit highly parallel heterogeneous clusters.

June 03, 2008

New UltraSPARC T2 Blades for IBM's BladeCenter?

Opening the specification for a product can do wonderful things for the product and its customers. Other companies can use the specification to build products and services that support and complement the original product. The net result is that everyone benefits. Customers get more choice and competition, the original company expands the ecosystem of supporting products without having to invest R&D dollars, and the third-party companies receive another outlet for their products and services. Everyone is happy. But opening the specification has its risks as well. An open specification means the vendor no longer has control over which products get developed, and the vendor’s "master product plan" may take a few detours. That is exactly what happened with the Themis T2BC blade from Themis Computer in Fremont, California.

Themis responded to a DoD requirement for a blade that would run Solaris applications natively on SPARC. The customer, which was already using IBM’s BladeCenter in the program, wanted the blades to run in existing IBM BladeCenter T chassis so it could consolidate a number of older SPARC servers onto blades. Fortunately, BladeCenter was an option because IBM has an open specification that encourages anyone to download the spec and build products for the BladeCenter Ecosystem. The "Niagara 2" UltraSPARC T2 processors were also available, since Sun Microelectronics was actively seeking OEM partners for the new processor. According to Themis, the UltraSPARC T2 processor was selected over the UltraSPARC T1 because it offers more balanced floating-point performance. Another benefit of the T2 is that it has eight SPARC cores that can run the older Solaris applications in a native SPARC environment. When Sun’s LDOMs and Solaris Containers are used, the architecture becomes a compelling consolidation platform for older Solaris applications that cannot be ported to run natively on Solaris on x86. Many of these applications were running on Solaris 8, and the Solaris Migration Assistant provides an environment in which those Solaris 8 applications can run.

While developing the T2BC, Themis began to wonder if there might be a commercial market for such a blade. Sun recently launched the Sun Blade 6000 and the UltraSPARC T2-based Sun Blade T6320, and those products are the perfect solution for the majority of those who need to run on native SPARC blades. IBM BladeCenter already runs Solaris applications on its Xeon and Opteron blades, but the Themis T2BC meets the needs of those applications that must run on native SPARC. However, launching such a product commercially requires commitments from all three vendors. Sun needs to ensure that its Solaris Operating Environment and management tools work properly on the Themis T2BC, and IBM needs to ensure that its Director management software recognizes and manages the blade. Without support from both companies, this product has limited chances of success.

IDEAS had the opportunity to talk with senior executives at Themis, Sun, and IBM. Frankly, we were expecting some serious spin control from Sun and IBM, with each attempting to position the T2BC blade into an ever smaller market niche to protect its own product turf. Even though Sun and IBM will not be selling the blade directly, both vendors surprisingly reinforced their decision to partner with Themis. Sun was overwhelmingly positive about the commercialization of the T2BC, and IBM stated that its Global Services group welcomes the opportunity to support the new blades should customers prefer IBM support over support from Themis. Even though this blade server goes against each vendor’s blade strategy, IBM and Sun are cooperating nicely to help make the Themis T2BC a success. The CEO of Themis summed it up perfectly. "This blade isn't about microprocessor architecture or operating system wars, but rather about enabling Solaris applications to run natively within a BladeCenter Ecosystem. We see this product expanding markets for both IBM and Sun technology." We could not have said it better. IDEAS feels the Themis T2BC is headed for success, and we give this new level of cooperation and partnership a strong ‘thumbs up.’

May 12, 2008

Sun's Two-Tier OS Support Targets Next Wave of IT Infrastructure

Much of the discussion around Sun's OpenSolaris operating system has focused on comparing it with Linux in terms of open source development communities and processes. Indeed, the relationship between OpenSolaris and Solaris somewhat parallels the segmentation of leading Linux distributions into "development" and "production" releases, e.g., Red Hat Fedora and Red Hat Enterprise Linux, and Novell’s OpenSUSE and SUSE Linux Enterprise Server. But the key development of this announcement is Sun’s introduction of robust support offerings for its dynamic, rapidly evolving operating environment. Neither Red Hat nor Novell offers formal support for Fedora or OpenSUSE (although Novell does provide free installation support for OpenSUSE). Thus, Sun has broken new ground in the open source OS business model by arriving at a solution that envisions the long-term needs of web-centric enterprises.

It is clear that much of the growth in the market is increasingly being driven by a new class of customer with a set of needs that diverge considerably from those of traditional commercial server environments. Users in this class consider massive levels of scale-out computing to be the normal way to grow capacity, and they rely heavily on proprietary software that is developed internally, which represents the “secret sauce” of their operations. Currently, the leading-edge proponents of this approach are predominantly in high-performance computing (HPC) environments; entertainment and media companies delivering content over the web; firms with web 2.0 business models; and some financial services organizations. Over time, though, their style of computing could impact a variety of organizations, as the lines between software and services start to blur, and end-users become more comfortable relying on remote computing resources that are accessed over the web.

As this segment of the market matures and gears up for operations, its customers are introducing new requirements for how systems should be designed in order to best meet their needs, some of which could send vendors back to the drawing board (as shown by IBM’s introduction of its iDataPlex architecture). Among other demands, these customers are creating a new set of rules for operating systems that will play a central role in their infrastructure. To be relevant in the rising wave of web-based computing, an OS should have the following characteristics:

  • Support for industry-standard hardware based on x86/x64 processors, which are taking on ever more demanding workloads as a result of their continuously growing performance;
  • Integrated support for virtualization, which is becoming a standard part of IT infrastructures both in hardware and software;
  • Support for cloud/grid/utility computing, i.e. the ability for organizations to achieve optimal utilization of computing resources, regardless of their source or physical location;
  • Real-time computing capabilities, in which services are performed within a guaranteed time span;
  • Open source business models, whereby the OS supplier offers value based on service and support, rather than licensing;
  • Choice of release streams, including a relatively fluid version with frequent updates and rapid functional improvements, as well as a more stable version suitable for critical production workloads.

Sun has addressed this last requirement with the first commercial release of OpenSolaris. While OpenSolaris has been available for some time, Sun announced that it will now provide support for the binary distribution of the OpenSolaris code. As a result, Sun now effectively offers its customers a choice of two supported operating systems: Solaris and OpenSolaris. Sun positions OpenSolaris for “high speed developers and development teams”, as well as users who treat the operating system as a source of competitive advantage for applications such as high performance computing and social networking. By contrast, the traditional version of Solaris is intended for IT departments that value stability over rapid innovation.

At the front line of the largest web-based applications, resiliency is typically built into the user's software, so that few reliability features are needed at the level of individual servers. These applications are usually designed to continue processing in the event that servers fail due to outages in hardware or OS software. Thus, Sun’s offering of support for OpenSolaris may not be relevant for many of the cutting-edge web applications that might be deployed on it today. However, today’s web 2.0 workloads are tomorrow’s enterprise workloads representing the corporate backbone. As individual nodes in the cloud are used to host ever more critical workloads that are sensitive to critical points of failure at the level of individual systems, and thus require higher levels of uptime in the OS, OpenSolaris users may start to take Sun’s support offerings more seriously.

Sun offers two levels of support for OpenSolaris: "Essential" and "Production". While the pricing of these offerings has not yet been announced, based on their description, the Production level of support for OpenSolaris appears to be nearly identical to the "Premium" level of support for Solaris (except OpenSolaris Production promises one hour response time for Priority 1 calls, as opposed to immediate transfer for Solaris Premium). Therefore, the pricing of OpenSolaris Production compared to Solaris Premium will determine whether it makes more sense to deploy Solaris (which is free to deploy as well) or OpenSolaris. Compatibility between OpenSolaris and Solaris will also be a critical factor in determining which path customers take as they upgrade, thus controlling the flow of users from OpenSolaris to classic Solaris.

Sun's ability to drive Solaris into IT infrastructures that are central to customers’ businesses remains a key milestone on its path to long-term success. Although Sun is aggressively optimizing its systems for web-based workloads, software serves as a much "stickier" bond with customers than most hardware, which can be swapped out relatively easily. As Scott McNealy was fond of saying, “users date their hardware, but marry their operating systems”. In the bigger picture, as more and more new businesses are seeded with infrastructures that are based on the web and scaling out, a requirement is emerging for a new class of OS with a specific set of attributes, only one of which is an open source development model. Solaris and the leading Linux distributions are all well positioned to deliver these attributes, but Sun’s innovation in the business of operating systems could allow it to leap ahead of others in defining how this critical software component becomes integrated into customers’ organizations.

May 06, 2008

Not Extreme Storage, but Beyond Storage

HP just announced a new solution offering, the HP StorageWorks 9100 Extreme Data Storage System (ExDS9100), positioned as an ideal platform to deploy streaming media applications. But don’t let the name fool you – the new HP system is much more than another storage array.

The HP ExDS9100 offers rack-mounted, factory-integrated hardware, including an HP ProLiant c7000 BladeSystem with up to 16 blades, as well as HP StorageWorks disk controllers and drive enclosures supporting up to 820 disks (820 TB of raw storage capacity with 1 TB disks). These hardware components utilize the latest industry-standard technologies to provide high-density computing power and storage resources in an environmentally friendly fashion. Although the HP hardware is impressive, it is not what makes the new HP offering distinctive. And a near-petabyte storage platform is quite scalable, but not so extreme today.
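
The "near-petabyte" description follows directly from the disk count quoted above. As a minimal sketch, the arithmetic can be checked in a few lines of Python; the names are our own, and the conversion assumes decimal (vendor-style) units of 1,000 TB per PB, which the announcement does not specify.

```python
# Figures quoted in the post: up to 820 disks, 1 TB each (the post's own example).
MAX_DISKS = 820
TB_PER_DISK = 1
TB_PER_PB = 1000  # assumption: decimal units, as storage vendors typically quote


def raw_capacity_tb(disks: int, tb_per_disk: float) -> float:
    """Raw (unformatted, pre-RAID) capacity in terabytes."""
    return disks * tb_per_disk


raw_tb = raw_capacity_tb(MAX_DISKS, TB_PER_DISK)
raw_pb = raw_tb / TB_PER_PB
print(f"{raw_tb} TB raw = {raw_pb:.2f} PB")
```

Usable capacity would be lower once RAID and file system overhead are taken into account; those factors are not given in the announcement, so they are left out here.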

The secret sauce of the HP Extreme Data Storage System is the PolyServe Matrix Server technology, which HP now owns through its 2007 acquisition of PolyServe. A key component of this technology is the HP Clustered File System (previously known as the PolyServe File System). The Matrix Server architecture uses the clustered file system to provide high-speed data I/O to clustered applications on up to 16 nodes. One well-known implementation of this cluster technology is scalable NAS file serving, using the cluster as a NAS server to transmit data via network file protocols (such as NFS and CIFS) to other application servers or network clients. This implementation – an NFS file serving solution – is available from HP as the StorageWorks Enterprise File Services Clustered Gateway. However, the performance advantage of this architecture is fully unleashed when the applications, such as databases or streaming media, run on the clustered servers themselves and transmit data via direct I/O to the file system (without the overhead of network file protocols) – as implemented in the Extreme Data Storage System.

The product concept of Extreme Data Storage capitalizes on HP’s extensive experience in providing scalable storage solutions for a variety of customers using HP PolyServe coupled with HP storage arrays. The new HP ExDS solution is designed to offer customers optimal scalability, density, ease-of-use, and affordability for their digital media and Web 2.0 deployments. ExDS in fact offers more than just a high-density storage platform; it provides a clustered application platform that is optimized for high-speed data I/O. The HP ExDS is a great example of the value that system vendors can bring to the storage market beyond innovations to the storage systems alone. It provides customers with a complete, integrated, and fully supported system solution that reduces the total cost of storing and accessing business data.

May 05, 2008

HP Adaptable Sustainability?

HP in Australia recently held its second annual Technology@Work end-user conference in Sydney; this year a (regional) analyst conference was run side by side with the end-user event.

Paul Brandling, Vice President and Managing Director for HP South Pacific, gave the welcoming speech (see video) and introduced the event tag line of “Alternative thinking about business, technology and sustainability”, which featured all HP logos in a wonderfully environmental fluorescent lime green. During his keynote, Mr. Brandling made the point that “Globally over 50% of large enterprises will face data center floor space shortages in the next five years, forcing many to relocate, or outsource, some of their applications.”

Dr David Morgan (a director of BHP, ex-CEO of Westpac, and recent co-chair of the “Future of the Australian Economy” stream at the 2020 summit) gave an excellent keynote. A video is to be put on the TAW08 website in the next few days, and is recommended. Major themes were adaptability in future planning for the Australian economy, as well as regular references to the need for more federal legislation to replace up to eight state/regional legislative frameworks.

In the follow-on meetings HP played its sustainability card, highlighting the company's recent reduction from 85 worldwide data centers to three pairs of data centers. The associated power savings were enough to run the US city of Palo Alto (approximately the same size as Darwin, Australia), whilst at the same time the company more than doubled its processing capability and storage capacity.

It seems “IT as a utility” is finally being realized, at least within the top of the Fortune 5000 set (cf. HP, Westpac, and BT recently). A fundamental driver has to be a review of the cost of worldwide IP communications for these multinational companies.

Some in the industry remain skeptical as to the real value of ‘Green IT’ as an initiative, so it was interesting to note some of the examples offered in support of the Green IT move. It was quoted that information and communications technology (ICT) is estimated to be responsible for 4% of worldwide carbon emissions (2% from client products such as PCs, 2% from data centers). This is in contrast to aviation as a whole, which is estimated to be responsible for only 2% of emissions.

The ENERGY STAR label is now on major appliances, office equipment, lighting, home electronics, and more. Computer equipment included in the ENERGY STAR program today includes desktops, notebooks, tablets, and workstations. ENERGY STAR ratings for servers are expected within 18 months, whilst an ENERGY STAR rating for complete data centers is now on the drawing board.

So just how meaningful are these ENERGY STAR ratings? The US Federal Government currently spends approximately US$300M per annum on energy costs. It is expected that a move from ENERGY STAR 3 to ENERGY STAR 4 equipment would result in savings of $82M per annum. Now, that’s meaningful in both an environmental and economic context.
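
To put the quoted figures in proportion, here is a minimal sketch of the arithmetic in Python. It uses only the two numbers cited above (roughly $300M in annual spend and $82M in expected savings); the variable and function names are our own and purely illustrative.

```python
# Figures as quoted at the conference, in millions of US dollars per annum.
ANNUAL_ENERGY_SPEND_MUSD = 300
EXPECTED_SAVINGS_MUSD = 82


def savings_fraction(savings: float, spend: float) -> float:
    """Share of the annual spend that the expected savings represent."""
    return savings / spend


fraction = savings_fraction(EXPECTED_SAVINGS_MUSD, ANNUAL_ENERGY_SPEND_MUSD)
remaining = ANNUAL_ENERGY_SPEND_MUSD - EXPECTED_SAVINGS_MUSD
print(f"Savings: {fraction:.1%} of annual spend; remaining bill: ${remaining}M")
```

On those figures the move would trim a little over a quarter of the quoted energy bill, assuming the $82M applies to the same $300M base.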