File System Workload Analysis For Large Scale Scientific Computing Applications
Related papers
HPC global file system performance analysis using a scientific-application derived benchmark
Parallel Computing, 2009
With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle its data analysis requirements. However, to use such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficient. To understand the I/O performance of data-intensive applications in realistic computational settings, we develop a lightweight, portable benchmark called MADbench2, which is derived directly from a large-scale Cosmic Microwave Background (CMB) data analysis package. Our study represents one of the most comprehensive I/O analyses of modern parallel file systems, examining a broad range of system architectures and configurations, including Lustre on the Cray XT3, XT4, and Intel Itanium2 clusters; GPFS on IBM Power5 and AMD Opteron platforms; a BlueGene/P installation using the GPFS and PVFS2 file systems; and CXFS on the SGI Altix3700. We present extensive synchronous I/O performance data comparing a number of key parameters, including concurrency, POSIX versus MPI-IO, and unique- versus shared-file accesses, using both the default environment and highly tuned I/O parameters. Finally, we explore the potential of asynchronous I/O and show that only two of the nine evaluated systems benefited from MPI-2's asynchronous MPI-IO. On those systems, experimental results indicate that the computational intensity required to hide I/O effectively is already close to the practical limit of BLAS3 calculations. Overall, our study quantifies vast differences in the performance and functionality of parallel file systems across state-of-the-art platforms, showing I/O rates that vary by up to 75x on the examined architectures, while providing system designers and computational scientists a lightweight tool for conducting further analysis.
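To make the unique- versus shared-file distinction studied above concrete, the sketch below (not taken from MADbench2; file names, buffer size, and layout are illustrative assumptions) writes the same per-rank buffer once as one POSIX file per process and once into disjoint regions of a single shared file via MPI-IO.

```c
/* Hedged sketch: one POSIX file per rank vs. one MPI-IO shared file.
 * File names, buffer size, and paths are illustrative only.
 * Build: mpicc -o iomodes iomodes.c   Run: mpirun -np 4 ./iomodes
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBYTES (1 << 20)   /* 1 MiB per rank, arbitrary */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(NBYTES);
    memset(buf, rank, NBYTES);

    /* (a) unique-file access: each rank writes its own POSIX file */
    char name[64];
    snprintf(name, sizeof(name), "out.rank%04d", rank);
    FILE *fp = fopen(name, "wb");
    fwrite(buf, 1, NBYTES, fp);
    fclose(fp);

    /* (b) shared-file access: all ranks write disjoint regions of one file */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.shared",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * NBYTES;
    MPI_File_write_at_all(fh, offset, buf, NBYTES, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}
```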
Small-file access in parallel file systems
2009 IEEE International Symposium on Parallel & Distributed Processing, 2009
Today's computational science demands have resulted in ever larger parallel computers, and storage systems have grown to match these demands. Parallel file systems used in this environment are increasingly specialized to extract the highest possible performance for large I/O operations, at the expense of other potential workloads. While some applications have adapted to I/O best practices and can obtain good performance on these systems, the natural I/O patterns of many applications result in generation of many small files. These applications are not well served by current parallel file systems at very large scale.
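The "many small files" pattern described here can be reproduced with a few lines of POSIX code; each tiny file costs at least a create, a write, and a close, so the workload is dominated by metadata operations rather than data movement. The file count and record size below are arbitrary assumptions, not values from the paper.

```c
/* Hedged sketch of a metadata-dominated "many small files" workload.
 * File count and record size are illustrative, not taken from the paper.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char record[512];                 /* a few hundred bytes per file */
    memset(record, 'x', sizeof(record));

    for (int i = 0; i < 100000; i++) {
        char name[64];
        snprintf(name, sizeof(name), "cell_%06d.dat", i);
        int fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        write(fd, record, sizeof(record));   /* tiny data payload ...      */
        close(fd);                           /* ... but a full metadata op */
    }
    return 0;
}
```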
GekkoFS — A Temporary Burst Buffer File System for HPC Applications
Journal of Computer Science and Technology, 2020
Many scientific fields increasingly use High-Performance Computing (HPC) to process and analyze massive amounts of experimental data, and storage systems in today's HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, while general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters and offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems typically offer many features that a scientific application, running in isolation for a limited amount of time, does not require. We present GekkoFS, a temporary, highly scalable file system which has been specifically optimized for the aforementioned use cases. GekkoFS provides relaxed POSIX semantics, offering only those features that most (though not all) applications actually require. GekkoFS is therefore able to provide scalable I/O performance and reaches millions of metadata operations even with a small number of nodes, significantly outperforming the capabilities of common parallel file systems.
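From the application's point of view, using a burst buffer of this kind often amounts to placing temporary files under a node-local mount point instead of the backend parallel file system. The environment variable and default path in the sketch below are hypothetical placeholders, not part of GekkoFS's actual interface.

```c
/* Hedged sketch: write scratch data to a node-local burst-buffer mount
 * instead of the global parallel file system. SCRATCH_DIR and the /tmp
 * fallback are hypothetical placeholders, not GekkoFS's interface.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *scratch = getenv("SCRATCH_DIR");   /* e.g. a burst-buffer mount */
    if (!scratch) scratch = "/tmp";                /* fall back to local disk */

    char path[256];
    snprintf(path, sizeof(path), "%s/intermediate.bin", scratch);

    FILE *fp = fopen(path, "wb");
    if (!fp) { perror("fopen"); return 1; }
    double partial[1024] = {0};                    /* temporary results */
    fwrite(partial, sizeof(double), 1024, fp);
    fclose(fp);
    return 0;
}
```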
Parallel File System Analysis Through Application I/O Tracing
The Computer Journal, 2012
Input/Output (I/O) operations can represent a significant proportion of the run-time of parallel scientific computing applications. Although there have been several advances in file format libraries, file system design and I/O hardware, a growing divergence exists between the performance of parallel file systems and the compute clusters that they support. In this paper, we document the design and application of the RIOT I/O toolkit (RIOT) being developed at the University of Warwick with our industrial partners at the Atomic Weapons ...
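The excerpt does not describe RIOT's internals, but a common way to trace an application's I/O without modifying it is to interpose the POSIX calls with an LD_PRELOAD shim, as sketched below; the wrapper and its log format are invented for illustration and are not RIOT's implementation.

```c
/* Hedged sketch of POSIX I/O interposition, a generic tracing technique
 * (not necessarily how RIOT is implemented).
 * Build: gcc -shared -fPIC -o libtrace.so trace.c -ldl
 * Use:   LD_PRELOAD=./libtrace.so ./app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count)
{
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;
    if (!real_write)   /* resolve the next (libc) definition of write() */
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    ssize_t n = real_write(fd, buf, count);
    gettimeofday(&t1, NULL);

    /* log through real_write to avoid re-entering this wrapper */
    char line[128];
    int len = snprintf(line, sizeof(line),
                       "write(fd=%d, %zu bytes) -> %zd in %ld us\n",
                       fd, count, n,
                       (long)((t1.tv_sec - t0.tv_sec) * 1000000L
                              + (t1.tv_usec - t0.tv_usec)));
    real_write(STDERR_FILENO, line, (size_t)len);
    return n;
}
```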
Scalable File Systems for High Performance Computing Final Report
2007
Simulations on high performance computer systems produce very large data sets. Rapid storage and retrieval of these data sets present major challenges for high-performance computing and visualization systems. Although computing speed and disk capacity have both increased at exponential rates over the past decade, disk bandwidth has lagged far behind. Moreover, existing file systems for high-performance computers are generally poorly suited for use with workstations, necessitating the copying of data for use with visualization systems. Our research has successfully addressed a number of the key research issues in the design of a high-performance multi-petabyte storage system targeted for use in post-Purple computing systems planned for
PVFS: A Parallel File System for Linux Clusters
2002
As Linux clusters have matured as platforms for low-cost, high-performance parallel computing, software packages to provide many key services have emerged, especially in areas such as message passing and networking. One area devoid of support, however, has been parallel file systems, which are critical for high-performance I/O on such clusters. We have developed a parallel file system for Linux clusters, called the Parallel Virtual File System (PVFS). PVFS is intended both as a high-performance parallel file system that anyone can download and use and as a tool for pursuing further research in parallel I/O and parallel file systems for Linux clusters.
Quantifying the Effects of Contention on Parallel File Systems
IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
As we move towards the Exascale era of super-computing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a bottleneck to performance. Using current file systems in the most efficient way possible will alleviate some of these issues and will help prepare developers and system designers for Exascale; unfortunately, many domain-scientists simply submit their jobs with the default file system configuration. In this paper, we analyse previous work on finding optimality on Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49x. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may reduce optimal performance considerably. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may perform better than Lustre in particular configurations, but that at large scale PLFS becomes a bottleneck to performance. We extend the metrics proposed in this paper to explain these performance deficiencies that exist in PLFS, demonstrating that the software creates high levels of self-contention at scale.
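One way the OST parallelism discussed above is exposed to applications is through MPI-IO hints: the standard ROMIO hints striping_factor and striping_unit request a Lustre stripe count and stripe size when the file is created. The values in the sketch below are illustrative examples, not the tuned settings from the paper.

```c
/* Hedged sketch: requesting a Lustre stripe layout via ROMIO hints at
 * file-creation time. Stripe count and size are arbitrary examples.
 */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");      /* stripe over 16 OSTs */
    MPI_Info_set(info, "striping_unit", "1048576");   /* 1 MiB stripe size */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    /* ... collective writes would go here ... */
    MPI_File_close(&fh);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```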
On Evaluating Decentralized Parallel I/O Scheduling Strategies for Parallel File Systems
Lecture Notes in Computer Science, 2007
This paper evaluates the impact of the parallel I/O scheduling strategy on the performance of file access in a parallel file system for clusters of commodity computers (Clusterfile). We argue that the parallel I/O scheduling strategy should be seen as a complement to other file access optimizations such as striping over several I/O servers, non-contiguous I/O, and collective I/O. Our study is based on three simple decentralized parallel I/O heuristics implemented inside Clusterfile. The measurements in a real environment show that the performance of parallel file access may vary by as much as 86% for writing and 804% for reading, depending on the employed heuristic and the schedule block granularity.
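The optimizations the scheduling strategy is meant to complement, striping, non-contiguous access, and collective I/O, can be expressed at the MPI-IO level rather than inside Clusterfile itself; the sketch below gives each rank an interleaved (strided) file view and issues a single collective write. Block sizes and the file name are arbitrary assumptions.

```c
/* Hedged sketch of non-contiguous, collective I/O: each rank owns an
 * interleaved block-cyclic slice of one shared file. Sizes are illustrative.
 */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK   4096    /* contiguous block per rank per cycle */
#define NBLOCKS 256     /* blocks written by each rank */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* strided file type: one BLOCK every nprocs*BLOCK bytes */
    MPI_Datatype filetype;
    MPI_Type_vector(NBLOCKS, BLOCK, nprocs * BLOCK, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);

    char *buf = calloc(NBLOCKS * BLOCK, 1);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "interleaved.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * BLOCK, MPI_BYTE, filetype,
                      "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, NBLOCKS * BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}
```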
Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, 2013
The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications need to perform I/O operations on very large volumes of data frequently, which seriously harms performance. This work is motivated by geophysical applications used for oil and gas exploration. These applications process terabyte-sized datasets in HPC facilities [6]. The datasets represent subsurface models and field-recorded data. In general terms, these applications read huge amounts of data as input and write huge amounts as intermediate/final results, where the underlying algorithms implement seismic imaging techniques. Traditional sequential I/O, even when coupled with advanced storage systems, cannot complete all I/O operations on such large volumes of data in an acceptable time range. Parallel I/O is the general strategy to solve such problems. However, because of the dynamic nature of many of these applications, each parallel process does not know the size of the data it needs to write until its computation is done, and it also cannot identify the position in the file at which to write. In order to write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load-balancing framework that is general enough for most of these applications. To reduce the expensive synchronization and communication overhead, we introduce an I/O node that only handles I/O requests and lets compute nodes perform I/O operations in parallel. Using both POSIX I/O and memory-mapping interfaces, the experiments indicate that our approach is scalable. For instance, with 16 processes, the bandwidth of parallel reading can reach the theoretical peak performance (2.5 GB/s) of the storage infrastructure. Parallel writing can also be up to 4.68x (POSIX I/O) and 7.23x (memory-mapping) faster than the serial I/O implementation. Since most geophysical applications are I/O-bound, these results positively impact the overall performance of the application and confirm the chosen strategy as the path to follow.
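The two write paths compared in these experiments, POSIX I/O and memory mapping, differ mainly in how a process applies the file offset it has been assigned. The fragment below sketches both for a process that has just learned its offset and size from the load-balancing step; all names, offsets, and sizes are placeholders, not the framework's actual interface.

```c
/* Hedged sketch: writing one process's result block at a known file offset,
 * once with POSIX pwrite() and once through mmap(). Offsets, sizes, and
 * file names are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const off_t  my_offset = 64 * 1024 * 1024;   /* assigned by the I/O node */
    const size_t my_size   = 16 * 1024 * 1024;   /* known only after compute */
    static char  result[16 * 1024 * 1024];       /* this process's output */

    int fd = open("result.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    ftruncate(fd, my_offset + my_size);          /* make the file large enough */

    /* (a) POSIX path: positioned write, no shared file pointer needed */
    pwrite(fd, result, my_size, my_offset);

    /* (b) memory-mapped path: map the region and copy into it */
    void *map = mmap(NULL, my_size, PROT_WRITE, MAP_SHARED, fd, my_offset);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    memcpy(map, result, my_size);
    msync(map, my_size, MS_SYNC);                /* flush to the file system */
    munmap(map, my_size);

    close(fd);
    return 0;
}
```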