Adapting RAID Methods for Use in Object Storage Systems
Related papers
Parity Striping of Disk Arrays: Low-Cost Reliable Storage with Acceptable Throughput
1990
An analysis of mirrored discs and of RAID5 shows that mirrors have considerably better throughput, measured as requests/second, on random requests of arbitrary size (up to 1MB). Mirrors have comparable or better response time for requests of reasonable size (less than 100KB). But mirrors have a 100% storage penalty: storing the data twice. Parity striping is a data layout that stripes the parity across the discs, but does not stripe the data. Parity striping has throughput almost as good as mirrors, and has cost/GB comparable to RAID5 designs, combining the advantages of both for high-traffic disc-resident data. Parity striping has additional fault containment and software benefits as well. Parity striping sacrifices the high data transfer rates of RAID designs for high throughput. It is argued that response time and throughput are preferable performance metrics. Outline: Introduction; Why Striping and RAID Are Inappropriate for OLTP Systems; Parity Striping: Cheap Reliable Storage Plus H...
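As a rough illustration of the layout this abstract describes, here is a minimal Python sketch: data stays unstriped on its own disc while parity rotates round-robin, and small writes use the usual read-modify-write parity update. All names and parameters here are ours, not the paper's.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

N = 5  # discs in the group (illustrative)

def parity_disc(stripe: int) -> int:
    """Parity for a given stripe rotates round-robin across the discs,
    so no single disc becomes a parity bottleneck."""
    return stripe % N

def small_write_parity(old_parity: bytes, old_data: bytes,
                       new_data: bytes) -> bytes:
    """RAID5-style small-write update: read old data and old parity,
    XOR in the change, write both back. Parity striping keeps this
    update path while leaving the data itself unstriped, so a large
    request still touches one data disc rather than all N."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))
```

The point of the sketch is only that parity placement (striped) and data placement (contiguous) are independent decisions.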
Parity Redundancy Strategies in a Large Scale Distributed Storage System
Mss, 2004
With the deployment of larger and larger distributed storage systems, data reliability becomes more and more of a concern. In particular, redundancy techniques that may have been appropriate in small-scale storage systems and disk arrays may not be sufficient when applied to larger scale systems. We propose a new mechanism called delayed parity generation with active data replication (DPGADR) to maintain high reliability in a large scale distributed storage system without sacrificing fault-free performance.
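A minimal sketch of the mechanism as we read the abstract: writes take a fast replicated path, parity is generated later in the background, and the temporary replicas are dropped once parity exists. The class and method names are ours; the paper's protocol is certainly more involved.

```python
class DelayedParityStore:
    """Toy model of delayed parity generation with active data
    replication (DPGADR); structure and names are illustrative."""

    def __init__(self, stripe_width: int = 4):
        self.stripe_width = stripe_width
        self.blocks = {}    # (stripe, idx) -> block value
        self.replicas = {}  # (stripe, idx) -> temporary extra copy
        self.parity = {}    # stripe -> parity, generated lazily

    def write(self, stripe: int, idx: int, value: int) -> None:
        # Fault-free fast path: store the block plus an active replica,
        # without updating parity synchronously.
        self.blocks[(stripe, idx)] = value
        self.replicas[(stripe, idx)] = value

    def generate_parity(self, stripe: int) -> None:
        # Deferred/background step: once parity exists, the replicas
        # covering this stripe are no longer needed and are dropped.
        p = 0
        for idx in range(self.stripe_width):
            p ^= self.blocks.get((stripe, idx), 0)
        self.parity[stripe] = p
        for idx in range(self.stripe_width):
            self.replicas.pop((stripe, idx), None)

    def recover(self, stripe: int, idx: int) -> int:
        # Before parity generation the replica covers the block;
        # afterwards, XOR-reconstruction from parity does.
        if (stripe, idx) in self.replicas:
            return self.replicas[(stripe, idx)]
        p = self.parity[stripe]
        for j in range(self.stripe_width):
            if j != idx:
                p ^= self.blocks.get((stripe, j), 0)
        return p
```

The design trade shown here is the abstract's: full protection at all times, with the parity cost moved off the write's critical path.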
Archival data storage systems contain data that must be preserved over long periods of time but which are often unlikely to be accessed during their lifetime. The best strategy for such systems is to keep their disks powered-off unless they have to be powered up to access their contents, to reconstruct lost data, or to perform other disk maintenance tasks. Of all such tasks, reconstructing data after a disk failure is the one that is likely to have the highest energy footprint and the most impact on the overall power consumption of the array, because it typically involves powering up all the disks belonging to the same reliability stripe as the failed disk and keeping them running for considerable time at each occurrence.

We investigate two two-failure tolerant disk layouts that have lower parity overhead than the number of disks read (and hence powered-on) for recovering data on lost drives would suggest. Our first organization is a flat XOR code that organizes the data disks into a rectangle with fewer rows than columns, and adds a simple parity disk to each row and column. Recovery from a disk failure proceeds by preferring columns when reconstructing lost data, and thereby has fewer reads than the parity overhead would normally suggest. Our second layout is based on the most basic pyramid code. We can view this layout as a RAID Level 6 variant. In this variant, a stripe has a Q-parity calculated from the data disks in the stripe, but the data disks are also organized into smaller groups where each group has a separate P-parity calculated as the exclusive-or of the data disks in the group. We compare the two layouts by measuring their robustness to data loss, their one-year survival rate, and the expected number of disks that must be involved to recover from both single and multiple disk failures.
Our results show that rectangular layouts are significantly more reliable than layouts based on the most basic Pyramid codes, but that they also require more disk accesses to recover from disk failures.
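The rectangular layout is concrete enough to sketch: data cells form a grid with fewer rows than columns, each row and each column gets a simple XOR parity, and single-failure rebuild prefers the shorter column. Sizes and names below are our own choices for illustration.

```python
import functools
import operator
import random

# Illustrative rectangle: fewer rows than columns, plus one parity
# "disk" per row and per column (a flat XOR code).
R, C = 3, 5
data = [[random.randrange(256) for _ in range(C)] for _ in range(R)]

row_parity = [functools.reduce(operator.xor, row) for row in data]
col_parity = [functools.reduce(operator.xor, (data[i][j] for i in range(R)))
              for j in range(C)]

def recover(i: int, j: int) -> int:
    """Rebuild cell (i, j) by preferring its COLUMN: R - 1 data reads
    plus one parity read, instead of the C - 1 reads plus one parity
    read a row rebuild would need."""
    partial = functools.reduce(operator.xor,
                               (data[k][j] for k in range(R) if k != i), 0)
    return partial ^ col_parity[j]
```

Because R < C, every single-disk rebuild powers on fewer drives than the row-based alternative, which is exactly the energy argument the abstract makes.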
Parity striping of disc arrays: low-cost reliable storage with acceptable throughput
Proceedings of the Sixteenth International Conference on Very Large Databases, 1990
An analysis of mirrored discs and of RAID5 shows that mirrors have considerably better throughput, measured as requests/second, on random requests of arbitrary size (up to 1MB). Mirrors have comparable or better response time for requests of reasonable size (less than 100KB). But mirrors have a 100% storage penalty: storing the data twice. Parity striping is a data layout that stripes the parity across the discs, but does not stripe the data. Parity striping has throughput almost as good as mirrors, and has cost/GB comparable to RAID5 designs, combining the advantages of both for high-traffic disc-resident data. Parity striping has additional fault containment and software benefits as well. Parity striping sacrifices the high data transfer rates of RAID designs for high throughput. It is argued that response time and throughput are preferable performance metrics.
Using Shared Parity Disks to Improve the Reliability of RAID Arrays
We propose to increase the reliability of RAID level 5 arrays used for storing archival data. First, we identify groups of two or three identical RAID arrays. Second, we add to each group a shared parity disk containing the diagonal parities of their arrays. We show that the new organization can tolerate all double disk failures and between 75 and 89 percent of triple disk failures without incurring any data loss. As a result, the additional parity disk increases the mean time to data loss of the arrays in the group it protects by at least 14,000 percent.
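The construction, one shared disk of diagonal parities protecting a group of identical RAID 5 arrays, can be caricatured in a few lines. The diagonal numbering below is our guess at the flavor of the scheme, not the paper's definition; sizes are illustrative.

```python
# Toy model: a group of two identical arrays (2 stripes x 4 disks each)
# shares ONE extra disk holding diagonal parities XORed across the
# whole group.
N = 4  # disks per array
group = [
    [[1, 2, 3, 4],
     [5, 6, 7, 8]],
    [[9, 10, 11, 12],
     [13, 14, 15, 16]],
]

shared = [0] * N
for arr in group:
    for i, stripe in enumerate(arr):
        for j, block in enumerate(stripe):
            # Block (i, j) of every array in the group contributes to
            # diagonal (i + j) mod N on the shared parity disk.
            shared[(i + j) % N] ^= block
```

The point is the economics: each array keeps its own RAID 5 row parity, while a single added disk contributes a second, diagonal equation to every array in the group at once.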
2021
In recent years, there has been a demand for using inexpensive general-purpose server-based storage in a wide range of applications. To use server-based storage in legacy applications such as databases and virtual desktop infrastructure where random access is dominant, it is important to enable low data access latency. Because common server-based storage uses erasure coding (EC) to reduce the capacity overhead of redundant data, the degraded write response time of EC data becomes a performance issue. Prior studies improve the write response time of server-based storage by storing frequently accessed data with replication, which enables a smaller response time than EC. However, the difference in response time between replication and EC causes unstable write response time and leads to the usability degradation of target applications. We propose dynamic redundancy control with delayed parity update, which asynchronizes parity updates of EC data to achieve stable write response time f...
Combining replication and parity approaches for fault-tolerant disk arrays
Parallel and Distributed …, 1994
We explore the method of combining the replication and parity approaches to tolerate multiple disk failures in a disk array. In addition to the conventional mirrored and chained declustering methods, a method based on the hybrid of mirrored-and-chained declustering is explored. A performance study that explores the effect of combining replication and parity approaches is conducted. It is experimentally shown that the proposed approach can lead to the most cost-effective solution if the objective is to sustain the same ...
RESAR: Reliable Storage at Exabyte Scale
Stored data needs to be protected against device failure and irrecoverable sector read errors, yet doing so at exabyte scale can be challenging given the large number of failures that must be handled. We have developed RESAR (Robust, Efficient, Scalable, Autonomous, Reliable) storage, an approach to storage system redundancy that only uses XOR-based parity and employs a graph to lay out data and parity. The RESAR layout offers greater robustness and higher flexibility for repair at the same overhead as a declustered version of RAID 6. For instance, a RESAR-based layout with 16 data disklets per stripe has about 50 times lower probability of suffering data loss in the presence of a fixed number of failures than a corresponding RAID 6 organization. RESAR uses a layer of virtual storage elements to achieve better manageability, a broader potential for energy savings, as well as easier adoption of heterogeneous storage devices.
Reliability Mechanisms for Very Large Storage Systems
2003
Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs among other schemes between system availability and storage efficiency. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
Triple failure tolerant storage systems using only exclusive-or parity calculations
We present a disk array organization that can survive three simultaneous disk failures while using only exclusive-or operations to calculate the parities that generate this failure tolerance. The reliability of storage systems using magnetic disks depends on how prone individual disks are to failure. Unfortunately, disk failure rates are impossible to predict, and it is well known that individual batches might be subject to much higher failure rates at some point during their lifetime. It is also known that many disk drive families, but not all, suffer a substantially higher failure rate at the beginning, and some at the end, of their economic lifespan. Our proposed organization can be built on top of a dense two-failure tolerant layout using only exclusive-or operations and with a ratio of parity to data disks of 2/k. If the disk failure rates are higher than expected, the new organization can be superimposed on the existing two-failure tolerant organization by introducing (k+1)/2 new parity disks and (k+1)/2 new reliability stripes to yield a three-failure tolerant layout without moving any data or calculating any other parity but the new one. We derive the organization using a graph visualization and a construction by Lawless of factoring a complete graph into paths.