An Analysis of File Migration in a Unix Supercomputing Environment
Related papers
Dynamic file migration in distributed computer systems
Communications of the ACM, 1990
The importance of file migration is increasing because of its potential to improve the performance of distributed office, manufacturing, and hospital information systems. To encourage research on the file migration problem, the authors summarize the accomplishments of researchers who have studied it, provide a detailed comparison of the file migration and dynamic file allocation problems, and identify important areas of research to support the development of effective file migration policies.
Long-term File Activity and Inter-Reference Patterns
We compare and contrast long-term file system activity for different Unix environments over periods of 120 to 280 days. Our focus is on finding common long-term activity trends and reference patterns. Our analysis shows that 90% of all files are not used after initial creation, that those which are used are normally short-lived, and that if a file is not used in some manner the day after it is created, it will probably never be used. Additionally, we find that approximately 1% of all files are used daily. This information allows us to more accurately predict which files will never be used. These files can be compressed or moved to tertiary storage, enabling either more users per disk or larger user disk quotas.
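The inactivity heuristic above lends itself to a simple policy sketch: scan a file tree and flag files that saw no use beyond their first day as candidates for compression or migration to tertiary storage. The code below is illustrative only; the one-day threshold, the use of Unix st_ctime as a stand-in for creation time, and the function name migration_candidates are assumptions for this sketch, not the policy implemented in the study.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # inactivity threshold, in seconds (an assumption for this sketch)

def migration_candidates(root):
    """Return paths that appear unused since shortly after creation.

    Illustrative sketch only: st_ctime is used as an approximation of
    creation time (on Unix it is really the inode change time), and the
    one-day threshold mirrors the observation quoted in the abstract.
    """
    now = time.time()
    candidates = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            used_only_near_creation = (st.st_atime - st.st_ctime) < ONE_DAY
            idle_since_then = (now - st.st_atime) > ONE_DAY
            if used_only_near_creation and idle_since_then:
                candidates.append(path)
    return candidates

if __name__ == "__main__":
    for path in migration_candidates("/home"):
        print(path)  # candidate for compression or tertiary storage
```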
File System Workload Analysis For Large Scale Scientific Computing Applications
Parallel scientific applications require high-performance I/O support from underlying file systems. A comprehensive understanding of the expected workload is therefore essential for the design of high-performance parallel file systems. We reexamine workload characteristics in parallel computing environments in light of recent technology advances and new applications. We analyze application traces from a cluster with hundreds of nodes. On average, each application has only one or two typical request sizes. Large requests, from several hundred kilobytes to several megabytes, are very common. Although in some applications small requests account for more than 90% of all requests, almost all of the I/O data are transferred by large requests. All of these applications show bursty access patterns. More than 65% of write requests have inter-arrival times within one millisecond in most applications. By running the same benchmark on different file models, we also find that the write throughput of using an individual output file for each node exceeds that of using a shared file for all nodes by a factor of 5. This indicates that current file systems are not well optimized for file sharing.
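A minimal sketch of how such trace statistics might be computed is shown below, assuming a trace of (timestamp_seconds, op, size_bytes) records. The record layout, the "small request" cutoff, and the function name summarize_trace are assumptions for illustration, not the paper's tooling.

```python
from collections import Counter

SMALL_REQUEST = 16 * 1024  # illustrative cutoff for a "small" request, in bytes

def summarize_trace(records):
    """Summarize an I/O trace of (timestamp_seconds, op, size_bytes) tuples."""
    sizes = Counter()
    write_times = []
    total_count = total_bytes = small_count = large_bytes = 0
    for ts, op, size in records:
        total_count += 1
        total_bytes += size
        sizes[size] += 1
        if size < SMALL_REQUEST:
            small_count += 1
        else:
            large_bytes += size
        if op == "write":
            write_times.append(ts)
    # Fraction of write requests arriving less than one millisecond apart,
    # in the spirit of the burstiness figure quoted in the abstract.
    write_times.sort()
    gaps = [b - a for a, b in zip(write_times, write_times[1:])]
    bursty = sum(1 for g in gaps if g < 1e-3) / len(gaps) if gaps else 0.0
    return {
        "typical_request_sizes": sizes.most_common(2),
        "small_request_fraction": small_count / total_count if total_count else 0.0,
        "bytes_from_large_requests": large_bytes / total_bytes if total_bytes else 0.0,
        "writes_within_1ms": bursty,
    }
```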
Measurement and Analysis of Large-Scale Network File System Workloads
In this paper we present an analysis of two large-scale network file system workloads. We measured CIFS traffic for two enterprise-class file servers deployed in the NetApp data center over a three-month period. One file server was used by the marketing, sales, and finance departments and the other by the engineering department. Together these systems represent over 22 TB of storage used by over 1500 employees, making this the first large-scale study of the CIFS protocol. We analyzed how our network file system workloads compare to those of previous file system trace studies and took an in-depth look at access, usage, and sharing patterns. We found that our workloads were quite different from those previously studied; for example, our analysis found increased read-write file access patterns, decreased read-write ratios, more random file access, and longer file lifetimes. In addition, we found a number of interesting properties regarding file sharing, file re-use, and the access patterns of file types and users, showing that modern file system workloads have changed over the past 5–10 years. This change in workload characteristics has implications for the future design of network file systems, which we describe in the paper.
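As a rough illustration of two of the metrics discussed above (read-write ratio and file lifetime), the sketch below aggregates simplified trace records of the form (timestamp_seconds, op, path, size_bytes). Real CIFS traces carry far more detail; the record shape and the function name access_summary are assumptions made for the example.

```python
def access_summary(records):
    """Compute a read/write byte ratio and rough per-file lifetimes.

    Illustrative sketch only: records are assumed to be
    (timestamp_seconds, op, path, size_bytes) tuples, where op is
    "read" or "write". Lifetime here is simply the span between the
    first and last access observed in the trace.
    """
    read_bytes = write_bytes = 0
    first_seen, last_seen = {}, {}
    for ts, op, path, size in records:
        if op == "read":
            read_bytes += size
        elif op == "write":
            write_bytes += size
        first_seen.setdefault(path, ts)
        last_seen[path] = ts
    lifetimes = {p: last_seen[p] - first_seen[p] for p in first_seen}
    ratio = read_bytes / write_bytes if write_bytes else float("inf")
    return ratio, lifetimes
```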