October 17, 2008

SSD Shines New Light on HSM and ILM

The advent of Solid State Disks (SSDs) is bringing the once fading concepts of Hierarchical Storage Management (HSM) and Information Lifecycle Management (ILM) back into the limelight. HSM and ILM, two parallel concepts that revolve around placing data into tiers of storage, have been outshone by the growing industry and media focus on virtualization and Green IT. Now, the popularity of SSD technology, which promises both affordable performance and green computing, is re-emphasizing the importance of HSM and ILM. Today, the term “tier-0” has become a commonly accepted concept in SSD implementation. However, unlike other hardware advances, such as faster interfaces and disks, the benefits of tier-0 do not automatically kick in. Tier-0 is only effective when the right data is loaded into the SSDs, and that can be accomplished through effective HSM and ILM.

HSM is typically implemented at the storage-controller level, though some enterprise applications, especially databases, offer host-based HSM capability. An HSM solution involves placing the most frequently accessed data into low-latency tiers of storage, such as memory and SSD. (Yes, memory has been considered a tier of storage for decades in mainframe and other enterprise application environments.) Ideally, HSM solutions should be automated while providing options for manual locking. Some high-end storage controllers also provide cache-based volumes that can be synchronized with disk copies; such technology can be leveraged for SSD implementations. HSM approaches commonly use SSD as a caching device.
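
To make the placement idea concrete, here is a minimal Python sketch of frequency-based tiering with manual locking. It is an illustration only, not how any particular storage controller works; the class name `TierPlacer`, its methods, and the block-count capacity model are all assumptions made for this example.

```python
from collections import Counter


class TierPlacer:
    """Toy HSM placement: pinned blocks first, then the hottest blocks."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # blocks the fast tier can hold (assumed unit)
        self.hits = Counter()               # access count per block
        self.pinned = set()                 # blocks manually locked into the fast tier

    def record_access(self, block):
        self.hits[block] += 1

    def pin(self, block):
        self.pinned.add(block)

    def fast_tier(self):
        """Return the set of blocks that should live in the fast tier."""
        # Manually locked blocks take priority over automated placement.
        chosen = sorted(self.pinned)[: self.fast_capacity]
        # Fill the remaining slots with the most frequently accessed blocks.
        for block, _count in self.hits.most_common():
            if len(chosen) >= self.fast_capacity:
                break
            if block not in self.pinned:
                chosen.append(block)
        return set(chosen)


placer = TierPlacer(fast_capacity=2)
for blk in ["a", "a", "a", "b", "c", "c"]:
    placer.record_access(blk)
print(placer.fast_tier())   # the two hottest blocks
placer.pin("b")
print(placer.fast_tier())   # the pinned block plus the hottest remaining block
```

A real implementation would also age the counters so that data that was hot last month does not occupy the fast tier forever.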

ILM is typically implemented with add-on data management software as opposed to being natively enabled by the storage controller or application. ILM is generally understood to concern unstructured data; application-enabled ILM is usually discussed together with the HSM capability of applications. Through classification and policy compliance, an ILM solution reallocates data across tiers of storage according to the data’s current value in terms of user requirements and business priorities. An effective ILM solution would be able to identify suitable data for tier-0 and adjust the data allocation dynamically. Typically, an ILM approach uses SSD as a tier of disks, managed along with other tiers of storage.
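
A policy-driven reallocation of this kind might look like the following Python sketch. The `FileInfo` fields, the tier names other than tier-0, and the age thresholds are hypothetical values chosen for illustration; a real ILM product would take its classification rules from business policy.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int
    business_critical: bool


def assign_tier(f: FileInfo) -> str:
    """Classify a file into a tier. Thresholds are assumed, not standard."""
    if f.business_critical and f.days_since_access <= 1:
        return "tier-0"   # SSD
    if f.days_since_access <= 30:
        return "tier-1"   # fast disk
    if f.days_since_access <= 365:
        return "tier-2"   # capacity disk
    return "archive"      # e.g. tape


def plan_moves(files, current):
    """Return {name: (old_tier, new_tier)} for files whose tier should change."""
    moves = {}
    for f in files:
        new = assign_tier(f)
        old = current.get(f.name)
        if old != new:
            moves[f.name] = (old, new)
    return moves


files = [FileInfo("orders.db", 0, True), FileInfo("report.pdf", 90, False)]
current = {"orders.db": "tier-1", "report.pdf": "tier-2"}
print(plan_moves(files, current))
```

Running the policy periodically is what makes the allocation dynamic: as a file's value changes, the next pass proposes moving it up or down the tiers.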

Hence, before we get excited about adding SSD into our infrastructures, we ought to ask ourselves: do we have robust HSM/ILM capabilities in place to effectively utilize our tier-0 storage? For those who answer no, and have no idea how to employ these capabilities, the new Intel SSD implemented with Sun Solaris ZFS offers a good example of how a host-based file system enables both HSM and ILM for tier-0. The new 32 GB Intel X25-E Extreme SATA SSD is capable of up to 35,000 IOPS of random read performance and 3,300 IOPS of random write performance, and costs about as much as a 500 GB SATA disk does today. Because capacity is small and random writes are roughly ten times slower than random reads, only small-block, read-only, frequently requested data should be loaded into these SSDs. And when write operations have to occur in tier-0, the data changes need to be flushed to disks quickly. The ZFS second-level adaptive replacement cache (L2ARC) technology is designed to manage these operations in an HSM fashion. And click here for an example of how to employ ZFS for ILM.

Comments

HSM was a concept of traditional enterprise application environments that in fact treated memory and disks as a hybrid storage pool; memory was called the “main storage”. So the concept of hybrid storage pool is not new with ZFS; what is new would be the economics of ZFS-based solutions when offering this kind of enterprise capability that used to be cost-prohibitive to most mid-size organizations!

Using a hybrid storage pool is different from employing an HSM or ILM strategy. All file systems use memory to cache frequently accessed data for rapid access and improved performance. Combining SSDs (essentially memory) with traditional spinning disks in a single hybrid pool is cool because ZFS has a pooled storage concept. It can add more SSDs to the pool to address the need for more memory as the file system and the amount of frequently accessed data grow, but only within a single zpool. Another way file systems can increase performance is a concept called metadata separation. File systems like QFS that separate metadata to improve write performance can also take advantage of SSDs by speeding up metadata I/O for applications that perform lots of metadata operations. ZFS doesn't have this concept of metadata separation, so it was enhanced to take advantage of SSDs within a hybrid storage pool. QFS doesn't have the concept of pooled storage, though, so tier-0 would be all SSDs or all disk.

The concept of data movement across tiers (one of the main concepts in an HSM/ILM strategy) is different with ZFS. The ZFS enhancements take advantage of SSDs and hard disks to speed up reads and writes within a single hybrid pool, and data in that pool can't be migrated to other tiers, including tape.

And, this was pulled from a white paper on SSDs and ZFS (distributed at SNW this week) that may provide a better explanation....

"Systems use memory to cache frequently accessed data for rapid access and improved performance. Once data is stored in the cache, future requests can be satisfied quickly by accessing the cached copy rather than fetching it from disk. Policies determine which data is held in the cache in an attempt to anticipate future needs. However, large working sets that cannot fit into memory can cause the cache to be ineffective. Flash storage can be used to enhance caching operations in systems. Solaris ZFS combines main memory and enterprise SSDs into a large read cache and uses an Adaptive Replacement Cache (ARC) for its cache replacement algorithm. The ARC manages and balances the cache content using most frequently used (MFU) and most recently used (MRU) algorithms for storing data to, and retrieving data from, memory. A second-level ARC (L2ARC) with smart caching and pre-fetching techniques lets Solaris ZFS use enterprise SSDs as a second-level cache to further speed read performance. Defective Flash blocks are treated as a cache miss rather than data loss, with information retrieved from hard disk to satisfy the request. The checksums built into Solaris ZFS are used to catch cache inconsistencies."
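
The behavior described in that excerpt can be approximated with a small Python sketch of a two-level read cache. This is a deliberate simplification and not the ZFS implementation: it uses plain LRU at both levels instead of the ARC's MFU/MRU balancing, it fills the second level on eviction from the first (the real L2ARC populates and manages the SSD differently), and the class name `TwoLevelReadCache` and the CRC32 checksum are assumptions for illustration.

```python
import zlib
from collections import OrderedDict


class TwoLevelReadCache:
    """Toy read cache: L1 stands in for memory, L2 for SSD, backing for disk."""

    def __init__(self, backing, l1_size, l2_size):
        self.backing = backing      # dict standing in for hard disk
        self.l1 = OrderedDict()     # key -> data
        self.l2 = OrderedDict()     # key -> (data, checksum)
        self.l1_size = l1_size
        self.l2_size = l2_size

    def _put_l1(self, key, data):
        self.l1[key] = data
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_size:
            # Demote the least recently used L1 entry to L2.
            old_key, old_data = self.l1.popitem(last=False)
            self._put_l2(old_key, old_data)

    def _put_l2(self, key, data):
        self.l2[key] = (data, zlib.crc32(data))
        self.l2.move_to_end(key)
        if len(self.l2) > self.l2_size:
            self.l2.popitem(last=False)  # safe to drop: disk still has the data

    def read(self, key):
        """Return (data, source) where source is 'l1', 'l2', or 'disk'."""
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "l1"
        if key in self.l2:
            data, checksum = self.l2.pop(key)
            if zlib.crc32(data) == checksum:
                self._put_l1(key, data)
                return data, "l2"
            # Checksum mismatch: treat as a cache miss, not data loss.
        data = self.backing[key]
        self._put_l1(key, data)
        return data, "disk"


disk = {"a": b"A", "b": b"B", "c": b"C"}
cache = TwoLevelReadCache(disk, l1_size=1, l2_size=2)
for key in ["a", "b", "a", "a"]:
    print(key, cache.read(key)[1])
```

Because the cache only ever holds copies of data that also lives on disk, a bad or evicted second-level entry costs a slower read and nothing more, which is why read-mostly data suits the SSD tier.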
