DiSK: A distributed shared disk cache for HPC environments
Related papers
A Distributed and Flexible Platform for Large-Scale Data Storage in HPC Systems
2017
HPC systems are built around sharing computational resources efficiently, and their central challenge is turning massive volumes of data into valuable information and meaningful knowledge. To accomplish this, I/O subsystems must provide scalable bandwidth and capacity to keep up with the growing demand placed on them. Emerging technologies, new programming paradigms, and virtualized environments require novel, optimized solutions to support heavy data flows in storage services. In this paper, we propose a distributed storage layer on compute nodes that can be used as a robust data storage service for intensive I/O operations. Preliminary experiments show that our platform outperforms other distributed data storage solutions.
A Network-Aware Distributed Storage Cache for Data Intensive Environments
1999
Modern scientific computing involves organizing, moving, visualizing, and analyzing massive amounts of data at multiple sites around the world. The technologies, middleware services, and architectures used to build useful high-speed, wide-area distributed systems constitute the field of data-intensive computing. In this paper we describe an architecture for data-intensive applications in which a high-speed distributed data cache serves as a common element for all of the sources and sinks of data. This cache-based approach provides standard interfaces to a large, application-oriented, distributed, on-line, transient storage system. We describe our implementation of this cache, how we have made it "network aware," and how we do dynamic load balancing based on the current network conditions. We also show large increases in application throughput enabled by access to knowledge of the network conditions.
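As a rough illustration of the "network aware" idea in this abstract, the following Python sketch picks the cache server that currently responds fastest before each transfer. The host names, probe port, and the use of TCP connect time as a stand-in for a real bandwidth monitor are all assumptions for illustration, not the paper's mechanism.

```python
# Minimal sketch of "network-aware" replica selection: before fetching a data
# block, probe the candidate cache servers and pick the one with the best
# currently observed responsiveness. Hosts and the probe are illustrative.

import socket
import time

CACHE_SERVERS = ["cache-a.example.org", "cache-b.example.org"]  # hypothetical
PROBE_PORT = 80

def probe_latency(host: str, timeout: float = 1.0) -> float:
    """Round-trip time of a TCP connect as a cheap stand-in for a network probe."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, PROBE_PORT), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")            # unreachable servers are never chosen

def pick_cache_server(servers=CACHE_SERVERS) -> str:
    # Re-probing before each transfer lets the client adapt to changing
    # network conditions, which is the essence of the load balancing above.
    return min(servers, key=probe_latency)

if __name__ == "__main__":
    print("fetching from", pick_cache_server())
```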
An Effective Storage Mechanism for High Performance Computing (HPC)
International Journal of Advanced Computer Science and Applications, 2015
Parallel file systems play a significant role throughout the processing of data on HPC systems. As the number of applications grows, the need for high-performance input/output keeps rising. Several options exist, the most important being the General Parallel File System (GPFS), cluster file systems, and the Parallel Virtual File System (PVFS). However, these parallel file systems rely on less effective access patterns and models, such as POSIX semantics (a family of technical standards that emerged from an effort to standardize programming interfaces for software running on variants of the UNIX operating system), which forces MPI-IO implementations to fall back on inefficient lock-based techniques. To avoid this synchronization, we show that using a versioning-based file system is much more effective.
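To make the contrast with lock-based MPI-IO concrete, here is a minimal Python sketch of a versioning-style write path in which writers never take byte-range locks: each write becomes an immutable versioned record, and reads replay records in version order. The class and method names are hypothetical; this is not the file system proposed in the paper.

```python
# Minimal sketch (not the paper's implementation): a versioning scheme in
# which writers never block each other on byte ranges. Each write appends an
# immutable (version, offset, data) record; readers reconstruct the file by
# replaying records in version order.

import itertools
import threading

class VersionedFile:
    """Hypothetical log-structured, versioned file object."""
    def __init__(self):
        self._records = []                    # append-only (version, offset, bytes)
        self._version = itertools.count(1)    # monotonically increasing versions
        self._append_lock = threading.Lock()  # protects only the append, not data ranges

    def write(self, offset: int, data: bytes) -> int:
        # No byte-range lock is needed: each write becomes a new immutable record.
        with self._append_lock:
            version = next(self._version)
            self._records.append((version, offset, data))
        return version

    def read(self, offset: int, length: int) -> bytes:
        # Replay records in version order so later writes win on overlap.
        buf = bytearray(length)
        for _, rec_offset, data in sorted(self._records):
            start = max(offset, rec_offset)
            end = min(offset + length, rec_offset + len(data))
            if start < end:
                buf[start - offset:end - offset] = data[start - rec_offset:end - rec_offset]
        return bytes(buf)

f = VersionedFile()
f.write(0, b"hello ")
f.write(6, b"world")
print(f.read(0, 11))   # b'hello world'
```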
Object Storage: Scalable Bandwidth for HPC Clusters
2004
This paper describes the Object Storage Architecture solution for cost-effective, high-bandwidth storage in High Performance Computing (HPC) environments. An HPC environment requires a storage system that scales to very large sizes and performance without sacrificing cost-effectiveness or ease of sharing and managing data. Traditional storage solutions, including disk-per-node, Storage Area Network (SAN), and Network-Attached Storage (NAS) implementations, fail to balance performance, ease of use, and cost as the storage system scales up. In contrast, building storage systems as specialized storage clusters from commodity off-the-shelf (COTS) components promises excellent price-performance at scale, provided that binding them into a single system image and linking them to HPC compute clusters can be done without introducing bottlenecks or management complexities. While a file interface (typified by NAS systems) at each storage cluster component is too high-level to provide scalable bandwidth and simple management across large numbers of components, and a block interface (typified by SAN systems) is too low-level to avoid synchronization bottlenecks in a shared storage cluster, an object interface (typified by the inode layer of traditional file system implementations) is at the intermediate level needed for independent, highly parallel operation at each storage cluster component under centralized, but infrequently applied, control. The Object Storage Device (OSD) interface achieves this independence by storing an unordered collection of named, variable-length byte arrays, called objects, and embedding extensible attributes, fine-grained capability-based access control, and encapsulated data layout and allocation into each object. With this higher-level interface, object storage clusters are capable of highly parallel data transfers between storage and compute cluster nodes under the infrequently applied control of out-of-band metadata managers. Object Storage Architectures support single-system-image file systems with the traditional sharing and management features of NAS systems and the resource consolidation and scalable performance of SAN systems.
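The following Python sketch conveys the flavor of an object interface as described above: named, variable-length byte arrays carrying extensible attributes, accessed under a coarse capability check. It is an illustrative toy, not the OSD standard or any real implementation; the token scheme and method names are assumptions.

```python
# Minimal sketch of an object-style storage interface: an unordered collection
# of named, variable-length byte arrays ("objects"), each with extensible
# attributes and a capability check. Illustrative only, not the OSD standard.

import secrets

class ObjectStorageDevice:
    def __init__(self):
        self._objects = {}       # object_id -> {"data": bytes, "attrs": dict}
        self._capabilities = {}  # token -> (object_id, allowed operations)

    def create(self, object_id: str, **attrs) -> str:
        self._objects[object_id] = {"data": b"", "attrs": dict(attrs)}
        # A metadata manager would normally hand out capabilities out-of-band.
        token = secrets.token_hex(8)
        self._capabilities[token] = (object_id, {"read", "write"})
        return token

    def _check(self, token: str, object_id: str, op: str):
        granted = self._capabilities.get(token)
        if granted is None or granted[0] != object_id or op not in granted[1]:
            raise PermissionError(f"capability does not allow {op} on {object_id}")

    def write(self, token: str, object_id: str, offset: int, data: bytes):
        self._check(token, object_id, "write")
        obj = self._objects[object_id]
        buf = bytearray(obj["data"].ljust(offset + len(data), b"\0"))
        buf[offset:offset + len(data)] = data
        obj["data"] = bytes(buf)

    def read(self, token: str, object_id: str, offset: int, length: int) -> bytes:
        self._check(token, object_id, "read")
        return self._objects[object_id]["data"][offset:offset + length]

    def set_attr(self, token: str, object_id: str, key: str, value):
        self._check(token, object_id, "write")
        self._objects[object_id]["attrs"][key] = value

osd = ObjectStorageDevice()
cap = osd.create("checkpoint.0", owner="rank0")
osd.write(cap, "checkpoint.0", 0, b"state")
print(osd.read(cap, "checkpoint.0", 0, 5))  # b'state'
```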
GekkoFS - A Temporary Distributed File System for HPC Applications
2018 IEEE International Conference on Cluster Computing (CLUSTER), 2018
We present GekkoFS, a temporary, highly scalable burst-buffer file system that has been specifically optimized for the new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, offering only those features that are actually required by most (though not all) applications. It is able to provide scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems.
C2P: Co-operative Caching in Distributed Storage Systems
Lecture Notes in Computer Science, 2014
Distributed storage systems (e.g., clustered file systems such as HDFS and GPFS, and object stores such as OpenStack Swift) often partition sequential data across storage devices for performance (data striping) or protection (erasure coding). This partitioning causes logically correlated data to be stored on different physical storage devices, which operate autonomously. Such uncoordinated operation may lead to inefficient caching, where different devices cache segments that belong to different working sets. From an application perspective, caching is effective only if all segments needed at a given point in time are cached; a single missing segment may lead to high application latency. In this work, we present C2P, a middleware for cooperative caching in distributed storage. C2P uses an event-based architecture to coordinate caching across the storage devices and ensures that all devices cache correlated segments. We have implemented C2P as a caching middleware for a hosted OpenStack Swift object store. Our experiments show a 4-6% improvement in cache hits and a 3-5% reduction in disk I/O with minimal resource overheads.
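A minimal Python sketch of the cooperative-caching idea: an event bus lets each storage device announce which stripe it just cached so that peers prefetch their own segments of the same stripe. The segment naming convention (e.g. "A1" meaning segment 1 of stripe A), the event format, and the LRU policy are illustrative assumptions, not C2P's design.

```python
# Illustrative sketch (not the C2P implementation): an event bus lets storage
# devices coordinate so that when one device caches a segment, its peers
# prefetch the logically correlated segments of the same stripe.

from collections import OrderedDict

class EventBus:
    def __init__(self):
        self._subscribers = []
    def subscribe(self, callback):
        self._subscribers.append(callback)
    def publish(self, event):
        for cb in self._subscribers:
            cb(event)

class StorageDevice:
    def __init__(self, name, bus, capacity=4):
        self.name = name
        self.bus = bus
        self.cache = OrderedDict()       # segment_id -> data, LRU order
        self.capacity = capacity
        self.backing = {}                # segment_id -> data on disk
        bus.subscribe(self._on_peer_access)

    def read(self, segment_id):
        if segment_id not in self.cache:
            self._admit(segment_id)
            # Tell peers which stripe this segment belongs to, e.g. "A" of "A1".
            self.bus.publish({"stripe": segment_id[:-1], "source": self.name})
        return self.cache[segment_id]

    def _on_peer_access(self, event):
        if event["source"] == self.name:
            return
        # Prefetch our local segments that belong to the peer's stripe.
        for seg in self.backing:
            if seg.startswith(event["stripe"]) and seg not in self.cache:
                self._admit(seg)

    def _admit(self, segment_id):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        self.cache[segment_id] = self.backing.get(segment_id, b"")

bus = EventBus()
d1, d2 = StorageDevice("dev1", bus), StorageDevice("dev2", bus)
d1.backing["A1"] = b"..."; d2.backing["A2"] = b"..."
d1.read("A1")                  # dev2 prefetches A2 of the same stripe
print("A2" in d2.cache)        # True
```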
Towards cost-effective and high-performance caching middleware for distributed systems
International Journal of Big Data Intelligence, 2016
One performance bottleneck of distributed systems lies in the hard disk drive (HDD), whose single read/write head physically limits its ability to support concurrent I/Os. Although the solid-state drive (SSD) has been available for years, HDDs remain the dominant storage medium due to their large capacity and low cost. This paper proposes a caching middleware that manages the underlying heterogeneous storage devices so that distributed file systems can achieve both high performance and low cost. Specifically, we design and implement a user-level caching system that offers SSD-like performance at a cost similar to an HDD. We demonstrate how such a middleware improves the performance of distributed file systems such as HDFS. Experimental results show that the caching system delivers up to 7X higher throughput and 76X higher IOPS than the Linux Ext4 file system, and accelerates HDFS by 28% on 32 nodes.
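As a sketch of what such a two-tier caching layer can look like at user level, the following Python class keeps hot blocks in a fast directory (standing in for an SSD mount point) and falls through to a slow directory (the HDD) on misses, using write-through and LRU eviction. The paths, block granularity, and policies are assumptions for illustration, not the paper's middleware.

```python
# Minimal sketch of a user-level, two-tier caching layer: hot blocks are kept
# on a fast device (e.g. an SSD mount point) and misses fall through to the
# slow device (HDD). Paths, block size, and eviction policy are illustrative.

import os
from collections import OrderedDict

class TieredBlockCache:
    def __init__(self, ssd_dir, hdd_dir, capacity_blocks=1024):
        self.ssd_dir, self.hdd_dir = ssd_dir, hdd_dir
        self.capacity = capacity_blocks
        self.lru = OrderedDict()            # block name -> None, in LRU order
        os.makedirs(ssd_dir, exist_ok=True)
        os.makedirs(hdd_dir, exist_ok=True)

    def read_block(self, name: str) -> bytes:
        if name in self.lru:                          # cache hit on SSD
            self.lru.move_to_end(name)
            with open(os.path.join(self.ssd_dir, name), "rb") as f:
                return f.read()
        with open(os.path.join(self.hdd_dir, name), "rb") as f:
            data = f.read()                           # miss: read from HDD
        self._admit(name, data)
        return data

    def write_block(self, name: str, data: bytes):
        # Write-through: persist to HDD, then keep a hot copy on SSD.
        with open(os.path.join(self.hdd_dir, name), "wb") as f:
            f.write(data)
        self._admit(name, data)

    def _admit(self, name: str, data: bytes):
        if len(self.lru) >= self.capacity:
            victim, _ = self.lru.popitem(last=False)
            os.remove(os.path.join(self.ssd_dir, victim))
        with open(os.path.join(self.ssd_dir, name), "wb") as f:
            f.write(data)
        self.lru[name] = None

if __name__ == "__main__":
    cache = TieredBlockCache("/tmp/ssd_tier", "/tmp/hdd_tier", capacity_blocks=2)
    cache.write_block("blk0", b"hot data")
    print(cache.read_block("blk0"))     # served from the SSD tier
```

Write-through keeps the HDD copy authoritative, so evicting a block from the fast tier never loses data; this is the simplest policy that still yields SSD-like read performance on the working set.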
2020
Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Including these new storage technologies in scientific workflows is unfortunately still a mostly manual task, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically include node-local SSDs or NVRAM in scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 “Challenges and Opportunities of User-Level File Systems for HPC” and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad ...
Multi-level caching in distributed file systems
1992
We are investigating the potential for intermediate file servers to address scaling problems in increasingly large distributed file systems. To this end, we have run trace-driven simulations based on data from DEC-SRC and our own data collection to determine the potential of caching-only intermediate servers. The degree of sharing among clients is central to the effectiveness of an intermediate server. This turns out to be quite low in the traces available to us. All told, fewer than 10% of block accesses are to files shared by more than one file system client. Trace-driven simulation shows that even with an infinite cache at the intermediate, cache hit rates are disappointingly low. For client caches as small as 20 MB, we observe hit rates less than 19%. As client cache sizes increase, the hit rate at the intermediate approaches the degree of sharing among all clients. On the other hand, the intermediate does appear to be effective in reducing the peak load presented to upstream file servers.
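The experiment described above can be mimicked with a tiny trace-driven simulation: each client runs its own LRU cache, only client misses reach the intermediate server, and the intermediate's hit rate ends up bounded by the degree of inter-client sharing. The synthetic trace, cache sizes, and LRU policy below are illustrative assumptions, not the DEC-SRC traces or the paper's simulator.

```python
# Tiny trace-driven simulation in the spirit described above: each client has
# its own LRU cache, and client misses go to an intermediate caching server.
# The intermediate only gets hits on blocks it has already seen, which for
# large client caches essentially means blocks shared between clients.

from collections import OrderedDict

def simulate(trace, client_cache_blocks, intermediate_cache_blocks):
    """trace: iterable of (client_id, block_id) accesses; None capacity = unlimited."""
    clients = {}
    intermediate = OrderedDict()
    misses_to_intermediate = 0
    intermediate_hits = 0

    def lru_access(cache, key, capacity):
        hit = key in cache
        if hit:
            cache.move_to_end(key)
        else:
            if capacity is not None and len(cache) >= capacity:
                cache.popitem(last=False)
            cache[key] = None
        return hit

    for client_id, block_id in trace:
        cache = clients.setdefault(client_id, OrderedDict())
        if lru_access(cache, block_id, client_cache_blocks):
            continue                              # served by the client cache
        misses_to_intermediate += 1
        if lru_access(intermediate, block_id, intermediate_cache_blocks):
            intermediate_hits += 1                # served by the intermediate
        # otherwise the request goes on to the upstream file server

    return intermediate_hits / max(misses_to_intermediate, 1)

# Two clients sharing one block among many mostly-private blocks:
trace = [("c1", f"p1-{i}") for i in range(100)] + \
        [("c2", f"p2-{i}") for i in range(100)] + \
        [("c1", "shared"), ("c2", "shared")]
print(f"intermediate hit rate: {simulate(trace, 50, None):.2%}")
```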
CHAIO: Enabling HPC Applications on Data-Intensive File Systems
The computing paradigm of "HPC in the Cloud" has gained surging interest in recent years, due to its merits of cost-efficiency, flexibility, and scalability. Clouds are designed on top of distributed file systems such as the Google File System (GFS). The ability to run HPC applications on top of data-intensive file systems is a critical catalyst in promoting clouds for HPC. However, the semantic gap between data-intensive file systems and HPC imposes numerous challenges. For example, N-1 (N to 1) is a widely used data access pattern for HPC applications such as checkpointing, but it does not perform well on data-intensive file systems.
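For readers unfamiliar with the N-1 pattern, the sketch below shows its essence: N worker processes write disjoint, rank-offset regions of a single shared checkpoint file. Real HPC codes would usually do this through MPI-IO; plain os.pwrite from multiple local processes is used here only to make the layout concrete, and the file name and block size are arbitrary.

```python
# Illustrative sketch of the N-1 checkpoint pattern: N worker processes write
# disjoint, rank-offset regions of one shared file. Real HPC applications
# would typically issue these writes through MPI-IO.

import multiprocessing as mp
import os

CHECKPOINT = "checkpoint.dat"   # hypothetical shared checkpoint file
BLOCK_SIZE = 1 << 20            # 1 MiB of state per rank

def write_checkpoint(rank: int):
    state = bytes([rank % 256]) * BLOCK_SIZE          # this rank's state
    fd = os.open(CHECKPOINT, os.O_WRONLY)
    try:
        os.pwrite(fd, state, rank * BLOCK_SIZE)       # N ranks -> 1 file
    finally:
        os.close(fd)

if __name__ == "__main__":
    nranks = 4
    # Pre-size the shared file so every rank writes into its own region.
    with open(CHECKPOINT, "wb") as f:
        f.truncate(nranks * BLOCK_SIZE)
    with mp.Pool(nranks) as pool:
        pool.map(write_checkpoint, range(nranks))
    print(os.path.getsize(CHECKPOINT))                # 4 * 1 MiB
```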