Sage Weil - Academia.edu
Papers by Sage Weil
Science China Information Sciences, 2015
Proceedings of the ACM/IEEE SC2004 Conference, 2004
In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.
ACM/IEEE SC 2006 Conference (SC'06), 2006
Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data in terms of user-defined policies that enforce separation of replicas across failure domains.
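The idea of directory-free placement can be illustrated with a minimal sketch. This is not the CRUSH algorithm itself: it swaps in plain rendezvous (highest-random-weight) hashing, and the names `score`, `place`, the device names, and the rack labels are all hypothetical. It only shows how a deterministic pseudorandom function can map an object to devices in distinct failure domains, and why adding a device moves only the objects that land on it.

```python
import hashlib
from typing import Dict, List


def score(obj: str, device: str) -> int:
    """Pseudorandom but deterministic weight for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj}|{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, devices: Dict[str, str], replicas: int = 2) -> List[str]:
    """Pick `replicas` devices for `obj`, at most one per failure domain.

    `devices` maps a device name to its failure domain (for example a rack).
    Any client holding the same device map computes the same answer, so no
    central lookup table is needed.
    """
    # Rank all devices by their per-object score, highest first.
    ranked = sorted(devices, key=lambda d: score(obj, d), reverse=True)
    chosen: List[str] = []
    used_domains = set()
    for dev in ranked:
        if devices[dev] in used_domains:
            continue  # keep replicas in separate failure domains
        chosen.append(dev)
        used_domains.add(devices[dev])
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    cluster = {"osd0": "rackA", "osd1": "rackA", "osd2": "rackB",
               "osd3": "rackB", "osd4": "rackC"}
    print(place("object-17", cluster))
```

In this sketch, growing the cluster changes an object's placement only when the new device wins a slot for that object; every other object stays where it was.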
Proceedings of the 8th Parallel Data Storage Workshop on - PDSW '13, 2013
Object-based storage systems are currently being investigated as promising possibilities for future large-scale storage needs. The 'Ceph' storage system has been designed with the goal of scalable petabyte-level storage, but currently uses mirroring as its primary means of reliability. While the mirroring of objects has a certain elegance for simplicity of design and understanding, it is extremely inefficient in terms of hardware and economic overhead required to achieve even a basic degree of protection. Work is therefore progressing on adapting RAID-like methods for use in an object-based environment using Ceph as a testbed.
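The overhead argument can be made concrete with a small worked calculation. The parameters below are illustrative assumptions, not figures from the paper: mirroring keeps `copies` full replicas, while a RAID-like layout splits an object into `k` data fragments plus `m` parity fragments and is assumed to survive the loss of any `m` fragments (as RAID 5 does with `m = 1`). The function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def mirroring(copies: int) -> Tuple[Fraction, int]:
    """Return (raw bytes stored per user byte, device losses tolerated)."""
    return Fraction(copies), copies - 1


def parity_striping(k: int, m: int) -> Tuple[Fraction, int]:
    """Same pair for k data + m parity fragments (assumed to tolerate m losses)."""
    return Fraction(k + m, k), m


if __name__ == "__main__":
    for label, (ratio, losses) in [
        ("2-way mirroring", mirroring(2)),
        ("4 data + 1 parity", parity_striping(4, 1)),
    ]:
        print(f"{label}: {float(ratio):.2f}x raw storage, tolerates {losses} loss(es)")
```

Under these assumptions both layouts survive a single device loss, but mirroring stores twice the user data while the 4+1 layout stores 1.25 times as much.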
The data storage needs of large high-performance and general-purpose computing environments are generally best served by distributed storage systems. Traditional solutions, exemplified by NFS, provide a simple distributed storage system model, but cannot meet the demands of high-performance computing environments where a single server may become a bottleneck, nor do they scale well due to the need to manually partition (or repartition) the data among the servers. Object-based storage promises to address these needs through a simple networked data storage unit, the Object Storage Device (OSD), that manages all local storage issues and exports a simple read/write data interface. Despite this simple concept, many challenges remain, including efficient object storage, centralized metadata management, data and metadata replication, and data and metadata reliability. We describe Ceph, a distributed object-based storage system that meets these challenges, providing high-performance file storage that scales directly with the number of OSDs and metadata servers.
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general-purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance. We examine aspects of the workload that make it difficult to distribute effectively, and present a few potential strategies to demonstrate the issues involved. Finally, we describe the advantages of intelligent metadata management and a simulation environment we have developed to validate design possibilities.