February 26, 2007

Simplifying High Availability With Agent-Less Appliances

Setting up High Availability (HA) clusters, in which a backup server takes over when a production system fails, has traditionally been a notoriously challenging task for system administrators. HA clustering involves automating a multi-step process: detecting when a failure has occurred; restarting the workload correctly on a backup system; and then restoring the workload to its original server once the failure condition has been corrected ("fail-back"). Implementing this mechanism usually requires installing complex, and often expensive, HA clustering software that typically has to be tightly integrated with applications in order to restart them properly on backup systems when a failure occurs.

Now, Fujitsu Siemens Computers (FSC) has announced x10sure, a turnkey solution that dramatically simplifies the deployment of HA clustering for industry-standard servers running Windows. x10sure uses a packaged bundle of production servers plus a dedicated server to continuously monitor the uptime of the production servers or blade servers in an N+1 (and future N+n) environment. In case of failure, it transfers the workload to a backup server, which can be utilized for non-critical functions during normal operating conditions. x10sure maintains images of servers on a SAN, including all operating system and application software needed to run a workload, and remotely boots the protected servers from these images. It then continuously pings the servers that it protects, and if one fails to respond, it simply boots the workload image from that server on a standby server, remapping the Logical Unit Number (LUN) of the failed server to the spare server.
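The ping-and-remap logic described above can be illustrated with a small simulation. This is only a sketch of the general idea, not FSC's implementation: the `Cluster` class, the `check_and_fail_over` function, the server names, and the LUN labels are all hypothetical, and the ping itself is replaced by a caller-supplied function so that the example runs without any network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Cluster:
    """Hypothetical model of an N+1 group: N protected servers plus one spare."""
    # Maps each active server name to the SAN boot image (LUN) it boots from.
    lun_map: Dict[str, str]
    # Name of the idle standby server, or None once it has taken over a workload.
    spare: Optional[str]
    # Servers that stopped responding and lost their workload to the spare.
    failed: List[str] = field(default_factory=list)


def check_and_fail_over(
    cluster: Cluster, responds_to_ping: Callable[[str], bool]
) -> List[str]:
    """Run one monitoring pass and return the servers whose workload was moved."""
    moved: List[str] = []
    # Iterate over a snapshot, because lun_map is modified during failover.
    for server in list(cluster.lun_map):
        if responds_to_ping(server):
            continue  # binary health check: any reply counts as "up"
        if cluster.spare is None:
            break  # N+1 can absorb only one failure at a time
        # Remap the failed server's LUN to the spare, which then boots from it.
        cluster.lun_map[cluster.spare] = cluster.lun_map.pop(server)
        cluster.failed.append(server)
        cluster.spare = None
        moved.append(server)
    return moved


# Example: "db1" stops answering pings, so its image moves to "standby".
cluster = Cluster(lun_map={"web1": "lun-0", "db1": "lun-1"}, spare="standby")
moved = check_and_fail_over(cluster, lambda name: name != "db1")
```

In this model a second failure cannot be handled until the spare is freed again, which is the limitation an N+n configuration would remove.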

x10sure is based on a stripped-down version of Adaptive Services Control Center (ASCC), FSC’s powerful server provisioning mechanism that was originally designed to automate the dynamic provisioning of blade servers in response to changing workloads. x10sure is somewhat limited in the classes of failures it can detect, since uptime is essentially defined as a binary condition: the server responds to pings, or it doesn’t. However, the system is appealing in its simplicity and relatively low cost, since it doesn’t require the installation and configuration of complex clustering software on each node being monitored. FSC believes that x10sure's "agent-less" design is a unique solution to the HA problem, at least among the major systems vendors.

Touting x10sure’s simplicity, FSC targets the solution at mid-size companies with business processes that are intertwined with those of larger partners, and thus require systems that are sufficiently reliable to work in tandem with enterprise-grade IT infrastructures, yet are still simple enough to be managed by moderately sophisticated IT personnel. The overused term "disruptive" is frequently invoked to describe any product that is even a little bit innovative, but in fact it has a specific definition: delivering value that appeals to a specific customer segment through a radically different approach that optimizes for simplicity and low cost rather than superior function. Whether or not FSC is truly alone in driving agent-less HA appliances into the market, it is hard to see how x10sure doesn’t fit that definition.