July 16, 2008

What's Better, 256 or 64 Processor Cores?

Fujitsu and Sun have celebrated the arrival of their new SPARC64 VII processor with a “world record” benchmark result. The achievement came in the form of a SAP SD 2–tier result, which is the highest ranking result to date when counted by number of users, as featured on the IDEAS Benchmark Gateway.

In doing so, Sun and Fujitsu relegated the previous #1 result to second place, the displaced server being IBM’s Unix flagship, the Power 595. With a result of 39,100 users, the SPARC Enterprise M9000 performed around 10% better than the IBM mark of 35,400 users.


The M9000 achieved this feat with 64 quad-core SPARC64 VII processors running at 2.52GHz, for a total of 256 processor cores. By contrast, the IBM 595 used 32 dual-core 5GHz POWER6 processors, for a total of 64 cores.

This raises an interesting question – what’s better for you, the greater scalability of the M9000 or the higher performance per core of the 595? In other words, for comparable performance, does it matter how many cores you use? Every situation is different, and your answer may depend on the workload you’re running, which vendor can cut you the best deal, and of course the all-important power bill.
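To put rough numbers on that question, here is a minimal sketch that works out cores and users per core from the figures quoted above. "Users per core" is our own illustrative ratio, not an official SAP metric, and the dictionary layout and function name are assumptions made for this example.

```python
# Figures as quoted in this post. "Users per core" is an illustrative
# ratio only; it is not an official SAP SD metric.
results = {
    "SPARC Enterprise M9000": {"users": 39_100, "processors": 64, "cores_per_processor": 4},
    "IBM Power 595": {"users": 35_400, "processors": 32, "cores_per_processor": 2},
}


def summarize(entries):
    """Return total cores and users per core for each benchmarked system."""
    summary = {}
    for name, entry in entries.items():
        cores = entry["processors"] * entry["cores_per_processor"]
        summary[name] = {
            "cores": cores,
            "users_per_core": entry["users"] / cores,
        }
    return summary


summary = summarize(results)
for name, row in summary.items():
    print(f"{name}: {row['cores']} cores, {row['users_per_core']:.1f} users per core")

# Relative lead of the M9000 over the 595 in total users.
lead = results["SPARC Enterprise M9000"]["users"] / results["IBM Power 595"]["users"] - 1
print(f"M9000 lead in total users: {lead:.1%}")
```

On these figures the 595 supports roughly 3.6 times as many users per core, while the M9000 gets its overall lead from having four times as many cores.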

We would love to hear from anyone who has experiences or opinions to share on this topic. Use the comments link below to tell us what you think.

June 10, 2008

Roadrunner Breaks the PetaFLOPS Barrier

Twice a year, timed to the June (Germany) and November (USA) supercomputing conferences, the TOP500 list of the most powerful technical computing systems is released. Naturally, vendors engage in "my dog is bigger/faster/meaner/etc. than yours" bragging to showcase what are indisputably herculean efforts in advancing the frontiers of high performance computing. This season's significant milestone is the breaking of the 1 PetaFLOPS LINPACK barrier by the Roadrunner system created by IBM for the Los Alamos National Laboratory. Capturing the crown of "world's fastest supercomputer," Roadrunner delivers twice the performance of the previous TOP500 #1, IBM BlueGene/L at Lawrence Livermore. The Roadrunner cluster offers a unique heterogeneous architecture, employing over 12,000 chips derived from those designed for Sony's PlayStation 3 along with more than 6,000 conventional Opteron processors.

Decades ago, the US government relied on underground testing to understand the readiness of its aging nuclear arsenal, most of which was produced 30-40 years ago. After the 1992 suspension of underground testing, the US turned to demanding simulations executed on massive supercomputers. ASCI (the Accelerated Strategic Computing Initiative, now a part of NNSA, the National Nuclear Security Administration) funded a number of breakthrough high performance parallel systems (ASCI "Red"/"White"/"Blue"/etc.). IBM was (and remains) a pioneer in creating some of these highly specialized ASCI-class systems, recognizing that solving the challenges of the "extreme" would have "trickle down" benefits to the merely "demanding."

It may not be apparent, but consumer gaming consoles stress computational capabilities somewhat similar to those of high performance technical computing (HPC). The Cell Broadband Engine (Cell BE), designed to portray realistic gaming graphics for Sony's PlayStation 3, incorporated an underlying design that became the foundation for Roadrunner's nuclear stockpile simulation mandate. The IBM QS22 Blade incorporates the latest Cell processor, with hardware-implemented double-precision floating point, an enhanced version of the single-precision circuitry needed for PlayStation graphics. Combining two QS22 Cell blades with an LS21 AMD Opteron Blade in each compute node, along with InfiniBand networking, the record-breaking Roadrunner is basically constructed from generally-available hardware (a goal of US Government ASCI/NNSA funding).

What enables generally-available hardware to achieve extraordinary performance results is the tuning and optimization of the software stack to exploit extremely parallel systems. As IBM's Dr. Don Grice explains, IBM focused on optimizing software to exploit parallelism rather than creating unique hardware.

IBM can be justifiably proud of being the first to attain PetaFLOPS computing. Roadrunner's $120 million project cost reflects a scale only affordable by governments. What may be more significant is that subsets of the Roadrunner configuration may deliver breakthrough performance not just for Government and Academic environments but also for Petroleum, Finance, Aeronautics and Automotive compute-intensive applications as they learn to exploit highly parallel heterogeneous clusters.

June 03, 2008

New UltraSPARC T2 Blades for IBM's BladeCenter?

Opening the specification for a product can do wonderful things for the product and its customers. Other companies can use the specification to build products and services that support and complement the original product. The net result is that everyone benefits. Customers get more choice and competition, the original company expands the ecosystem of supporting products without having to invest R&D dollars, and the third-party companies receive another outlet for their products and services. Everyone is happy. But opening the specification has its risks as well. An open specification means the vendor no longer has control over which products get developed, and the vendor’s "master product plan" may take a few detours. That is exactly what happened with the Themis T2BC blade from Themis Computer in Fremont, California.

Themis responded to a DoD requirement for a blade that would run Solaris applications natively on SPARC. The customer wanted the blades to run in existing IBM BladeCenter T chassis so they could consolidate a number of older SPARC servers onto blades. They were already using IBM’s BladeCenter in the program. Fortunately, BladeCenter was available since IBM has an open specification that encourages anyone to download the spec and build products for the BladeCenter Ecosystem. The "Niagara 2" UltraSPARC T2 processors were also available since Sun Microelectronics was actively seeking OEM partners for the new processor. According to Themis, the UltraSPARC T2 processor was selected over the UltraSPARC T1 because it has a more balanced floating-point performance. Another benefit of the T2 is that it has eight SPARC cores that can run the older Solaris applications in a native SPARC environment. When Sun’s LDOMs and Solaris Containers are used, the architecture becomes a compelling consolidation platform for older Solaris applications that cannot be ported to run natively on Solaris on x86. Many of these applications were running on Solaris 8 and the Solaris Migration Assistant provides an environment in which those Solaris 8 applications can run.

While developing the T2BC, Themis began to wonder if there might be a commercial market for such a blade. Sun recently launched the Sun Blade 6000 and the UltraSPARC T2-based Sun Blade T6320, and those products are the perfect solution for the majority of those who need to run on native SPARC blades. IBM BladeCenter already runs Solaris applications on its Xeon and Opteron blades, but the Themis T2BC meets the needs of those applications that must run on native SPARC. However, launching such a product commercially requires commitments from all three vendors. Sun needs to ensure that its Solaris Operating Environment and management tools work properly on the Themis T2BC, and IBM needs to ensure that its Director management software recognizes and manages the blade. Without support from both companies, this product has limited chances of success.

IDEAS had the opportunity to talk with senior executives at Themis, Sun, and IBM. Frankly, we were expecting some serious spin control from Sun and IBM as they attempted to position the T2BC blade into an ever smaller market niche so that they could protect their own product turf. Even though Sun and IBM will not be selling the blade directly, both vendors surprisingly reinforced their decision to partner with Themis. Sun was overwhelmingly positive about the commercialization of the T2BC, and IBM stated that its Global Services group welcomes the opportunity to support the new blades should customers prefer IBM support over support from Themis. Even though this blade server goes against each vendor’s blade strategy, IBM and Sun are cooperating nicely to help make the Themis T2BC a success. The CEO of Themis summed it up perfectly. "This blade isn't about microprocessor architecture or operating system wars, but rather about enabling Solaris applications to run natively within a BladeCenter Ecosystem. We see this product expanding markets for both IBM and Sun technology." We could not have said it better. IDEAS feels the Themis T2BC is headed for success, and we give this new level of cooperation and partnership a strong ‘thumbs up.’

April 29, 2008

Sun Ray 2 is Worthy of a Closer Look

In the last few years, Sun has thoroughly redesigned the Sun Ray appliance to compete in the rapidly growing market for thin client desktop infrastructure. We recently had the opportunity to try out the new Sun Ray 2 in person at the Sun Executive Briefing Center in Menlo Park, CA, and we were impressed. Sun has taken a product that has been around a long time and re-engineered it with all of the features customers expect today, plus some interesting capabilities that set it apart from the competition. Today’s Sun Ray 2 thin clients running Sun Ray Software 4 feature support for a wide variety of clients, enhanced security, and best of all the ability to run Windows, Linux, Solaris, and Mac OS (or all four at once) equally well. The market has responded to these changes by buying Sun Rays at an ever-increasing clip. According to Sun, sales doubled from fiscal year 2006 to 2007.

One of the more interesting features of Sun Ray is its smart card authentication. Each desktop user in the organization has a credit-card-sized smart card with an embedded Java chip that slides into the attached card reader. Users can show up at any Sun Ray in the company, log into their own account, and up will come their own desktop. When the card is removed without logging out, the desktop session remains suspended on the server until the card is inserted into another Sun Ray. Even days later and halfway around the world, the original desktop reappears, complete with all of the original applications. The Sun Secure Global Desktop Software provides this "Hot Desking" capability and also provides the ability to run multiple windows, each with a different application or operating system. For example, Solaris 10, Windows, and Linux windows can be open along with numerous browsers and desktop applications.

Management of thin clients has become increasingly important as IT departments deal with ever-increasing complexity. Sun’s Desktop Manager 1.0 features an easy-to-use point-and-click web-based interface that makes centrally defining and configuring hundreds or thousands of desktops and their applications relatively quick and easy. The Sun Desktop Manager has three major components: configuration repositories that store configuration policies and organizational structure, management tools that help to enforce configuration policies, and agents that reside on the client, fetch the configuration settings, and apply them. It also features lockdown capabilities that prevent unauthorized people from making changes to configuration values or gaining access to unauthorized applications. Granted, this is a v1.0 product, but it seems to focus on what is truly needed in this space.

Overall, we are impressed with the improvements Sun has made to the Sun Ray. The current version is such a vast improvement over past iterations, it hardly seems fitting to continue calling it a "Sun Ray." Obviously, the entire thin client industry has improved tremendously in the last few years and there are now a number of excellent products from which to choose. But we feel Sun Ray 2 deserves special consideration in light of its smart card authentication and ease of management. If you are considering thin clients this year, and who is not, then you owe it to yourself to go down and try out the Sun Ray in person.

April 23, 2008

IBM Racks Up x86 Servers by the Boatload

The IT industry continues to trend towards hosting ever more massive workloads with scale-out architectures, in which large numbers of industry-standard servers containing x86/x64 processors are joined into clusters or grids. Much of the attention in recent years has been on scaling out with blade servers, which allow large numbers of servers contained in specialized modules to be deployed and managed in an optimal hardware footprint. Now, IBM has introduced a new server design, called the iDataPlex, which introduces a new level of density for cramming large numbers of processors into a small amount of space, using traditional rack-mounted servers rather than blades.

The iDataPlex concept resulted from a number of conversations IBM executives and engineers had with major web-based businesses and leading-edge financial services organizations. All of the major server vendors, including IBM, were pushing blade servers to these customers. But blades were too expensive for these massive scale-out deployments and traditional rack servers were not sufficiently customizable and lacked the required density. As a result, most of these customers ended up building their own server complexes simply because nothing on the market met their needs.

In designing the iDataPlex, IBM took a "clean sheet" approach that was inspired by blades but uses the traditional rack-mounted form factor. Server vendors like HP are pushing blades everywhere, and other companies like Dell and Sun are pushing a combination of rack and blade servers. But no one had seriously looked at rack servers in years. IBM chose to use the same 42U standard rack envelope, but turn it sideways and shorten the servers themselves to 15 inches. That allows up to 42 2U servers or 28 3U servers to be positioned in the same space that held half that many in a traditional rack design. A maximum of 672 processor cores per rack doubles the conventional rack density. Due to cost considerations, the iDataPlex has no backplane and relies on cables to carry all of the I/O. Each iDataPlex is built entirely in China in an IBM facility. Like mainframes, every system is custom built. A sale takes 3-4 months, and IBM expects that 70% of deployments will be in a new or redesigned datacenter.
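The density claim can be checked with simple arithmetic. The sketch below uses only the figures quoted above; treating the 672-core maximum as belonging to the 42 x 2U configuration is our assumption, since IBM's per-chassis breakdown is not given here, and the function and variable names are made up for this example.

```python
RACK_UNITS = 42   # standard rack envelope cited above
MAX_CORES = 672   # maximum cores per iDataPlex rack cited above


def rack_units_used(chassis_count, chassis_height_u):
    """Total rack units of server hardware for a given chassis mix."""
    return chassis_count * chassis_height_u


# Chassis counts and heights as quoted in the post.
configs = {"2U": (42, 2), "3U": (28, 3)}
density = {}
for label, (count, height) in configs.items():
    used = rack_units_used(count, height)
    density[label] = used / RACK_UNITS
    print(f"{count} x {label}: {used}U of servers, {density[label]:.1f}x a {RACK_UNITS}U rack")

# Assumption: the 672-core maximum applies to the 42 x 2U configuration.
cores_per_2u = MAX_CORES / configs["2U"][0]
# The post says 672 cores doubles conventional density, which implies:
implied_conventional_cores = MAX_CORES / 2
print(f"{cores_per_2u:.0f} cores per 2U chassis")
print(f"{implied_conventional_cores:.0f} cores implied for a conventional rack")
```

Both configurations come to 84U of server hardware in a 42U envelope, which matches the stated doubling of density.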

For cooling, there are no fans attached to the processor boards themselves, only in the 2U/3U chassis, each of which contains 4 large fans. Further, IBM offers an optional rear-door heat exchanger for the iDataPlex that uses water cooling. The rack can be cooled sufficiently without the heat exchanger, which costs $75,000-$100,000. However, when the water cooling is used, the demands on Computer Room Air Conditioning (CRAC) devices can be lowered or even eliminated in many cases. The result is a staggering 40% reduction in datacenter cooling costs today, and IBM is working to push that to 60% in the future.

Both rack servers and blade servers were designed at a time when performance and performance density were key requirements. Reliability was also paramount, and numerous redundancies and failover capabilities were added to ensure that applications would remain available even after a major server or component failure. The problem is that all of the added redundancies pushed up the cost. With cloud computing, the old requirements have changed. Component reliability is no longer as critical because the application itself can deal with failures. With tens of thousands of servers, failures occur all the time, and bad servers can be swapped out on an hourly or daily basis. The cost of computing, and specifically the operating cost, is now the defining criterion. Anything that can save on the cost of electricity is a huge plus. IBM realized that this new market is now large enough to support a specialized server design that prioritizes low energy consumption and density over redundancies. Today, there are few customers demanding this type of solution, but the ones that do exist buy tens of thousands of servers at a time for deployments that can reach 100,000 servers or more. We at IDEAS feel that iDataPlex is a strong solution for this new class of customers.