A survey of dynamic replication strategies for improving data availability in data grids
Related papers
Future Generation Computer Systems, 2012
A data grid is a distributed collection of storage and computational resources that are not bound to a single geographical location. It is a fast-growing area of research, and providing efficient data access with maximum data availability is a challenging task. To achieve this, data is replicated to different sites. A number of data replication techniques have been presented for data grids. All replication techniques address some attributes, such as fault tolerance, scalability, bandwidth consumption, performance, storage consumption, and data access time. In this paper, different issues involved in data replication are identified, and different replication techniques are studied to find out which attributes are addressed in a given technique and which are ignored. A tabular representation of all these parameters is presented to facilitate future comparison of dynamic replication techniques. The paper also discusses future work in this direction by identifying some open research problems.
The State of the Art and Open Problems in Data Replication in Grid Environments
Handbook of Research on Scalable Computing Technologies, 2010
Data Grids provide services and infrastructure for distributed data-intensive applications that need to access, transfer and modify massive datasets stored at distributed locations around the world. For example, the next generation of scientific applications, such as many in high-energy physics, molecular modeling, and earth sciences, will involve large collections of data created from simulations or experiments. The size of these data collections is expected to be of multi-terabyte or even petabyte scale in many applications. Ensuring efficient, reliable, secure and fast access to such large data is hindered by the high latencies of the Internet. The need to manage and access multiple petabytes of data in Grid environments, as well as to ensure data availability and access optimization, are challenges that must be addressed. To improve data access efficiency, data can be replicated at multiple locations so that a user can access the data from a site near where it will be processed. In addition to the reduction of data access time, replication in Data Grids also uses network and storage resources more efficiently. In this chapter, the state of current research on data replication and arising challenges for the new generation of data-intensive grid environments are reviewed and open problems are identified. First, fundamental data replication strategies are reviewed which offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Then, specific algorithms for selecting appropriate replicas and maintaining replica consistency are discussed. The impact of data replication on job scheduling performance in Data Grids is also analyzed. A set of appropriate metrics, including access latency, bandwidth savings, server load, and storage overhead, for use in making critical …
A four-phase data replication algorithm for data grid
Journal of Advanced Computer Science & Technology, 2014
Nowadays, scientific applications generate huge amounts of data, in terabytes or petabytes. Data grids currently propose solutions to large-scale data management problems, including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve job response time and data availability. Choosing a reasonable number of replicas and the right locations for them has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on temporal and geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when a file's popularity passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing files with the least cost of average access time when encountering insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim, developed by the European Data Grid Project. The simulation results show that the proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage and percentage of storage filled.
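The first, second and fourth phases above can be sketched in code. The class below is a hedged illustration, not the paper's exact method: the dynamic threshold is assumed to be the mean access count of the current interval, the replica count uses the standard availability model 1 − (1 − p)^n with an assumed per-site availability p, and eviction ranks files by raw access count as a stand-in for the paper's average-access-time cost.

```python
import math
from collections import defaultdict

class FourPhaseReplicator:
    """Hedged sketch of a popularity-threshold replication scheme.
    The threshold, availability model, and eviction cost below are
    illustrative assumptions, not the paper's exact formulas."""

    def __init__(self, avail_target=0.99, site_avail=0.7):
        self.access_count = defaultdict(int)  # file -> accesses this interval
        self.avail_target = avail_target      # desired system availability
        self.site_avail = site_avail          # assumed availability of one replica

    # Phase 1: a file is "popular" once it crosses a dynamic threshold
    # (here: the mean access count over the current interval).
    def popular_files(self):
        if not self.access_count:
            return []
        threshold = sum(self.access_count.values()) / len(self.access_count)
        return [f for f, c in self.access_count.items() if c > threshold]

    # Phase 2: smallest n with 1 - (1 - p)^n >= target availability.
    def replica_count(self):
        return math.ceil(math.log(1 - self.avail_target) /
                         math.log(1 - self.site_avail))

    # Phase 4: evict the file whose loss costs least (fewest accesses here,
    # standing in for the paper's average-access-time cost).
    def evict_candidate(self, resident_files):
        return min(resident_files, key=lambda f: self.access_count[f])
```

With p = 0.7 per replica and a 0.99 availability target, the model yields four replicas, which shows how the phase-2 calculation trades storage for availability.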
Dynamic replication strategies in data grid systems: A survey (IRIT research report)
HAL (Le Centre pour la Communication Scientifique Directe), 2014
In data grid systems, data replication aims to increase availability, fault tolerance, load balancing and scalability while reducing bandwidth consumption and job execution time. Several classification schemes for data replication have been proposed in the literature: (i) static vs. dynamic, (ii) centralized vs. decentralized, (iii) push vs. pull, and (iv) objective-function based. Dynamic data replication is a form of data replication that is performed with respect to the changing conditions of the grid environment. In this paper, we present a survey of recent dynamic data replication strategies. We study and classify these strategies by taking the target data grid architecture as the sole classifier. We discuss the key points of the studied strategies and provide a feature comparison of them according to important metrics. Furthermore, the impact of data grid architecture on dynamic replication performance is investigated in a simulation study. Finally, some important issues and open research problems in the area are pointed out.
Dynamic replication strategies in data grid systems: a survey
The Journal of Supercomputing, 2015
In data grid systems, data replication aims to increase availability, fault tolerance, load balancing and scalability while reducing bandwidth consumption and job execution time. Several classification schemes for data replication have been proposed in the literature: (i) static vs. dynamic, (ii) centralized vs. decentralized, (iii) push vs. pull, and (iv) objective-function based. Dynamic data replication is a form of data replication that is performed with respect to the changing conditions of the grid environment. In this paper, we present a survey of recent dynamic data replication strategies. We study and classify these strategies by taking the target data grid architecture as the sole classifier. We discuss the key points of the studied strategies and provide a feature comparison of them according to important metrics. Furthermore, the impact of data grid architecture on dynamic replication performance is investigated in a simulation study. Finally, some important issues and open research problems in the area are pointed out.
Proposing and Evaluating Dynamic Data Replication Strategy in Data Grid Environment
2013
A Data Grid consists of a collection of geographically distributed computer and storage resources located in different places, and it enables users to share data and other resources. Data replication is used in Data Grids to enhance data availability, fault tolerance, load balancing and reliability. Although replication is a key technique, the problem of selecting proper locations for placing replicas, i.e. replica placement in Data Grids, has not yet been widely studied. In this paper, an efficient replica selection strategy is proposed to select the best replica location from among the many replicas. Due to limited storage capacity, a good replica replacement algorithm is also needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica. A Grid simulator, OptorSim, is used to evaluate the performance of this dynamic replication strategy. The simulation results show that the proposed algorithm outperforms comparing t…
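The abstract names a two-step replacement strategy without specifying the criteria, so the sketch below fills them in with plausible assumptions: step one drops replicas with no accesses in the current interval, and step two falls back to least-recently-accessed eviction. Both choices are illustrative, not the paper's actual rules.

```python
def free_space_two_step(files, needed, free):
    """Hedged sketch of a two-step replica replacement policy.
    `files` maps name -> (size, recent_accesses, last_access_time).
    Returns the list of evicted replicas and the resulting free space."""
    victims = []
    # Step 1: remove replicas that had no accesses in the current interval.
    for name, (size, recent, _) in sorted(files.items()):
        if free >= needed:
            break
        if recent == 0:
            victims.append(name)
            free += size
    # Step 2: if space is still short, evict least-recently-accessed replicas.
    remaining = [(t, name, size) for name, (size, recent, t) in files.items()
                 if name not in victims]
    for t, name, size in sorted(remaining):
        if free >= needed:
            break
        victims.append(name)
        free += size
    return victims, free
```

Splitting eviction this way keeps hot replicas resident: cold (unused) files are sacrificed first, and recency only breaks ties when that alone cannot make room.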
Data replication strategies in grid environments
2002
Abstract Data grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, ensuring efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address these problems we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system.
A Dynamic Data Replication in Grid System
Procedia Computer Science, 2016
A data grid is an architecture, or cluster of services, that gives individual users or groups of users the ability to access, modify and transfer very large amounts of geographically distributed data. Storing such massive data files requires massive storage resources. To address this drawback, dynamic data replication is applied to reduce data access time and to use network and storage resources efficiently. Dynamic data replication works by creating several replicas at different sites. Here, by improving the Modified BHR (MBHR) method, we propose a dynamic algorithm for data replication in data grid systems. This algorithm uses a number of parameters to find a suitable replication site where the file is likely to be needed in the future with high probability. The algorithm predicts the future needs of suitable grid sites based on file access history.
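The core idea above, predicting which site will need a file next from its access history, can be sketched as a frequency ranking. This is an assumption-laden simplification: the function name and the ranking rule are mine, and the real BHR-family heuristics additionally weigh region-level bandwidth, which is omitted here.

```python
from collections import Counter

def predict_replica_sites(history, file_id, k=1):
    """Hedged sketch of history-based replica placement: rank sites by how
    often they accessed `file_id` and return the top k as likely future
    requesters. `history` is a list of (site, file) access records."""
    per_site = Counter(site for site, f in history if f == file_id)
    return [site for site, _ in per_site.most_common(k)]
```

A replication manager would call this periodically and push a replica of each popular file to its top-ranked site before the next burst of requests arrives.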