Commissioned by Western Digital. ©TechnoQWAN LLC 2016

How modern erasure codes improve storage systems

Executive Summary: Storage systems are made out of unreliable components, yet they must protect data despite many possible failure modes. Erasure codes are key, and like everything else in tech they've improved dramatically in the last 20 years. Here's what you should know.

Introduction

In the 60s and 70s, disks were small and tape backups were a workable way to protect data. In the 80s disk mirroring was popular for overcoming disk failures. Then, in the late 80s, RAID promised protection and performance, thanks to more sophisticated Reed-Solomon erasure codes.

Advances in erasure codes continued, though. Today, modern erasure codes offer capabilities undreamed of in RAID systems, along with significant business benefits.

What is an erasure code?

Simply put, erasure codes mathematically derive recovery information - parity data - so the original data can be recovered from a subset of the encoded data. In Reed-Solomon-based RAID arrays, they enable a drive or two to fail without data loss.
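To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity block, which can rebuild any one lost block. It is an illustration only, not the Reed-Solomon math used in real arrays, and the function names (`encode`, `recover`, `xor_blocks`) are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal blocks (zero-padded) and append one parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks):
    """Rebuild the one missing block (marked None) by XOR-ing the survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one lost block")
    repaired = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


original = b"erasure codes protect data"
stored = encode(original, k=4)      # 4 data blocks + 1 parity block
damaged = list(stored)
damaged[2] = None                   # simulate one failed drive
repaired = recover(damaged)
restored = b"".join(repaired[:4])[:len(original)]
print(restored == original)
```

Any four of the five stored blocks are enough to get the data back, which is the "subset of the encoded data" property described above. Codes such as Reed-Solomon generalize this so that more than one block can be lost.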

While Reed-Solomon RAID was a great advance over mirrored drives, its downsides have become pronounced as data volumes grow. Specifically:

• RAID requires that all drives in a stripe be the same virtual capacity, making RAID arrays inflexible.

• Drive reconstruction times lengthen as drive capacities grow.

• Drive reconstructions fail more often too, because the more data a rebuild must read, the more likely it is to hit an unrecoverable read error.

• Data migrations during RAID array replacements are difficult because of the much higher capacities involved.
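The link between capacity and failed rebuilds can be shown with a back-of-the-envelope calculation. The sketch below assumes independent bit errors and an unrecoverable read error (URE) rate of one per 10^14 bits read; that figure is an assumption chosen for illustration, not a number from this paper, and the function name is hypothetical.

```python
import math


def rebuild_ure_probability(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes independent errors at a fixed per-bit rate (an illustrative
    assumption). Computes 1 - (1 - p)^bits using log1p/expm1 so the result
    stays accurate for very small p.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


for terabytes in (1, 4, 12):
    p = rebuild_ure_probability(terabytes * 1e12)
    print(f"{terabytes:>2} TB read during rebuild: {p:.1%} chance of at least one URE")
```

Under these assumptions the chance of a rebuild hitting at least one unreadable sector climbs steeply with the amount of data that must be read, which is why larger drives make single-parity protection increasingly risky.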

All erasure codes use the same basic principles, but mathematicians have kept exploring how to improve them. In the late 1990s, rateless or fountain codes were developed.


Beyond RAID

These rateless erasure codes removed many of the limitations of RAID-style codes, making it possible to build much more robust and flexible storage systems. Their key advantages:

• Physical drive constraints are gone. Different drive sizes can be fully utilized.

• Protection levels can be varied - unlike RAID - within the same storage system.

• Much higher protection levels than RAID offers, such as 16 nines, can be dialed in.

• During rebuilds the data is still accessible at full performance.

• Scalability is practically infinite, since volumes are not limited to drive stripes.

• Performance and durability can be tuned on the fly, instead of being fixed as in the Reed-Solomon RAID model.

Rateless erasure coding's key advantage over RAID is its flexibility - in durability, in performance, in configuration, and more - making possible storage systems that can serve for decades, not just years.
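What "rateless" means can be sketched in a few lines. The example below is a random linear fountain code over GF(2), about the simplest rateless construction: the encoder can emit an unlimited stream of encoded symbols, and the decoder succeeds once it has collected enough independent ones, no matter which symbols were lost. It is an illustrative sketch only, not the code used in any shipping product, and all names in it (`symbol_stream`, `Decoder`) are hypothetical.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def symbol_stream(blocks, seed=0):
    """Endless stream of (mask, payload) pairs.

    Each payload is the XOR of a random non-empty subset of the source
    blocks; bit i of mask records whether block i was included.
    """
    rng = random.Random(seed)
    k = len(blocks)
    while True:
        mask = rng.randrange(1, 1 << k)
        payload = bytes(len(blocks[0]))
        for i in range(k):
            if mask >> i & 1:
                payload = xor_bytes(payload, blocks[i])
        yield mask, payload


class Decoder:
    """Recovers the source blocks by Gaussian elimination over GF(2)."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # lowest set bit of mask -> (mask, payload)

    def add(self, mask, payload):
        """Reduce a symbol against stored rows; keep it if it adds information."""
        while mask:
            pivot = (mask & -mask).bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, payload)
                return True
            row_mask, row_payload = self.rows[pivot]
            mask ^= row_mask
            payload = xor_bytes(payload, row_payload)
        return False  # linearly dependent on symbols already held

    def done(self):
        return len(self.rows) == self.k

    def solve(self):
        """Back-substitute from the highest pivot down to recover every block."""
        blocks = [None] * self.k
        for pivot in sorted(self.rows, reverse=True):
            mask, payload = self.rows[pivot]
            for j in range(pivot + 1, self.k):
                if mask >> j & 1:
                    payload = xor_bytes(payload, blocks[j])
            blocks[pivot] = payload
        return blocks


source = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
decoder = Decoder(len(source))
loss = random.Random(1)
received = 0
for mask, payload in symbol_stream(source):
    if loss.random() < 0.3:      # symbol lost, e.g. on a failed drive
        continue
    received += 1
    decoder.add(mask, payload)
    if decoder.done():
        break
recovered = decoder.solve()
print(recovered == source, "after", received, "received symbols")
```

Because there is no fixed ratio of data to parity, the same encoder can serve different protection levels simply by storing more or fewer symbols, which is one way to read the flexibility claims above. Practical rateless codes use sparser symbol structures so that decoding is much cheaper than the elimination step shown here.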

Beyond erasure codes

While the underlying technology of rateless erasure codes makes many exciting capabilities possible, it is up to the system architects to embody them in products. For example, the processes that take data, split it up, encode it with parity, distribute it across available storage, reconstruct and read data, and handle failures must be carefully considered, implemented, and tested to ensure robust operation.

The hardware architecture underlying the rateless code must also consider compute loads and network bandwidth requirements, especially under degraded operation or during routine maintenance. While the coding will protect the data, it is the hardware that has to run the coding, as well as the user interface and I/O traffic.

The future

The core problem of a digital civilization is persistent storage. Without it, civilization cannot long endure. Based on my research, I believe that in 10 years RAID arrays will be a niche product, largely replaced by rateless code-based storage systems, whether on-premises or in the cloud. Like drive mirroring, RAID arrays won’t disappear, but will find niches where their features remain attractive.

About The Author

Robin Harris is the president and chief analyst of TechnoQWAN LLC, publisher of StorageMojo.com. He has over 30 years of experience in the IT industry in product management and marketing, business development, and product planning at companies large and small. He earned degrees from Yale University and the Wharton School of the University of Pennsylvania.