Checkpointing protocols in distributed systems with mobile hosts: A performance analysis
Related papers
Paradigms in Fault Tolerant Checkpointing Protocols in Distributed Mobile Systems
2012 International Conference on Computing Sciences, 2012
Distributed mobile systems are now ubiquitous. These systems are important building blocks and are useful for constructing efficient protocols for client-server systems, transaction processing, web applications, and scientific computing [5,11]. Distributed mobile systems are not inherently fault tolerant, and they introduce new challenges in the area of fault-tolerant computing. The vast computing potential of these systems is often hampered by their susceptibility to failures. Mobile computing raises many issues, such as lower throughput and higher latency, low bandwidth of wireless channels, lack of stable storage on mobile hosts, connection breakdowns, and inadequate battery life, that make classical checkpointing protocols unsuitable. Many techniques, such as group communication, transactions, and rollback recovery, have been developed to add reliability to such systems [12]. This paper surveys algorithms that restore the system to a consistent state after a failure.
Low overhead optimal checkpointing for mobile distributed systems
Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), 2003
A checkpoint algorithm for a mobile distributed computing system (MDCS) must handle several new issues: mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and a high failure rate of mobile nodes. These issues make traditional checkpointing techniques unsuitable for mobile distributed systems.
A Tunable Checkpointing Algorithm for the Distributed Mobile Environment
The aim of a distributed checkpointing algorithm is to efficiently restore the execution state of distributed applications in the face of hardware or software failures. Such algorithms were originally devised for fixed networking systems, in which computing components communicate with each other over wired networks. When used in a wireless computing environment, those algorithms therefore tend to incur heavy networking costs from frequent data transfers over wireless links. In this paper, to reduce the use of wireless communication, our checkpointing algorithm allows the distributed mobile application to tune the level of its checkpointing strictness. The strictness is defined by the maximum rollback distance (MRD), which specifies how many recent local checkpoints can be rolled back in the worst case. Because the MRD gives our algorithm more flexibility in scheduling checkpoints, it can reduce the number of forced local checkpoints. In particu...
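As a rough illustration of the MRD bound, the sketch below assumes each process numbers its local checkpoints sequentially and that a recovery line is given as one checkpoint index per process; rollback distance is counted here as the number of a process's newer checkpoints that are discarded. The function names `rollback_distances` and `within_mrd` are hypothetical, and the sketch only checks the bound; it does not reproduce how the paper's algorithm decides when to take forced checkpoints.

```python
from typing import Dict

def rollback_distances(latest: Dict[str, int],
                       recovery_line: Dict[str, int]) -> Dict[str, int]:
    """For each process, count how many of its newer local checkpoints are
    discarded when it rolls back from `latest` to the recovery line."""
    distances = {}
    for pid, last in latest.items():
        target = recovery_line[pid]
        if target > last:
            raise ValueError(f"{pid}: recovery line is newer than latest checkpoint")
        distances[pid] = last - target
    return distances

def within_mrd(latest: Dict[str, int], recovery_line: Dict[str, int], mrd: int) -> bool:
    """True if no process rolls back more than `mrd` local checkpoints."""
    return all(d <= mrd for d in rollback_distances(latest, recovery_line).values())

latest = {"p1": 7, "p2": 5, "p3": 9}         # index of each process's newest checkpoint
recovery_line = {"p1": 6, "p2": 5, "p3": 7}  # checkpoint each process would restart from
print(rollback_distances(latest, recovery_line))  # {'p1': 1, 'p2': 0, 'p3': 2}
print(within_mrd(latest, recovery_line, mrd=2))   # True
print(within_mrd(latest, recovery_line, mrd=1))   # False
```

A larger MRD tolerates recovery lines further back in time, which is what lets a tunable algorithm skip some forced checkpoints.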
A distributed consistent global checkpoint algorithm for distributed mobile systems
2001
A distributed coordinated checkpointing algorithm for distributed mobile systems is presented. A consistent global checkpoint is a set of local states, one per process, in which no message is recorded as received by one process but not yet sent by another. It is used for rollback when a process failure occurs. A consistent global checkpoint must be obtained for any checkpoint initiation by any process. This paper presents a checkpoint algorithm in which the amount of information piggybacked on program messages does not depend on the number of mobile processes. The number of checkpoints is minimized under two assumptions: (1) one consistent global checkpoint is taken for concurrent checkpoint initiations, and (2) a checkpoint is initiated at each handoff by mobile processes. Under the latter assumption, the algorithm is thus optimal among the generalizations of Chandy and Lamport's distributed snapshot algorithm.
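The consistency condition above can be checked mechanically. This minimal sketch assumes each process numbers its own events, a global checkpoint is represented by how many events of each process it records, and every message is logged with the event numbers of its send and receive; the `Message` type and the function names are hypothetical. It flags orphan messages (recorded as received but not as sent); messages that were sent but not yet received are allowed.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Message:
    sender: str
    send_seq: int   # local event number of the send at the sender
    receiver: str
    recv_seq: int   # local event number of the receive at the receiver

def orphan_messages(cut: Dict[str, int], log: List[Message]) -> List[Message]:
    """cut[p] = number of p's events captured in its checkpoint, so an event
    with seq < cut[p] is recorded and seq >= cut[p] is not."""
    return [m for m in log
            if m.recv_seq < cut[m.receiver]    # receive is recorded...
            and m.send_seq >= cut[m.sender]]   # ...but the send is not

def is_consistent(cut: Dict[str, int], log: List[Message]) -> bool:
    # In-transit messages (sent, not yet received) do not break consistency.
    return not orphan_messages(cut, log)

log = [Message("p1", 2, "p2", 3), Message("p2", 5, "p1", 4)]
print(is_consistent({"p1": 3, "p2": 4}, log))  # True
print(is_consistent({"p1": 2, "p2": 4}, log))  # False: first message is an orphan
```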
iaeme
Fault tolerance techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. Mobile distributed systems raise several issues: mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and a high failure rate of mobile nodes. These issues make traditional checkpointing techniques designed for distributed systems unsuitable for mobile environments. To take a checkpoint, a mobile host (MH) has to transfer a large amount of checkpoint data to its local mobile support station (MSS) over the wireless network. Since the wireless network has low bandwidth and MHs have low computation power, all-process checkpointing wastes the scarce resources of the mobile system on every checkpoint. Minimum-process coordinated checkpointing is therefore the preferred approach for mobile distributed systems. In this paper, we discuss various existing minimum-process checkpointing protocols for mobile distributed systems.
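Which processes belong in a minimum-process checkpoint is commonly determined by dependency tracking. The sketch below assumes a centralized, hypothetical view in which `depends_on[p]` lists the processes from which `p` has received a message since its last checkpoint; the initiator and every process it transitively depends on must checkpoint, and all others can skip the round. The surveyed protocols compute this set in a distributed way (for example, with piggybacked dependency information), so this illustrates only the set being computed, not any particular protocol.

```python
from collections import deque
from typing import Dict, Set

def minimum_process_set(initiator: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the initiator plus every process it transitively depends on.
    depends_on[p] = processes p received a message from since p's last checkpoint."""
    needed = {initiator}
    queue = deque([initiator])
    while queue:  # breadth-first traversal of the dependency graph
        p = queue.popleft()
        for q in depends_on.get(p, set()):
            if q not in needed:
                needed.add(q)
                queue.append(q)
    return needed

deps = {"MH1": {"MH2"}, "MH2": {"MH3"}, "MH3": set(), "MH4": {"MH1"}}
print(sorted(minimum_process_set("MH1", deps)))  # ['MH1', 'MH2', 'MH3']
```

MH4 is left out because MH1 does not depend on it, so an MH4 in doze mode need not be woken for this round.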
Analysis Of Recent Checkpointing Techniques For Mobile Computing Systems
International Journal of Computer Science & Engineering Survey, 2011
Recovery from transient failures is one of the prime issues in distributed systems. Such systems require recovery techniques that are both transparent and efficient. A checkpoint is a designated place in a program where normal processing is interrupted to preserve status information, and checkpointing is the process of saving that status information. Mobile computing systems often suffer from high failure rates, and these failures are typically transient and independent. To add reliability and high availability to such distributed systems, checkpoint-based rollback recovery is widely used for applications such as scientific computing, databases, telecommunications, and mission-critical systems. This paper surveys the algorithms reported in the literature for checkpointing in mobile computing systems.
A New Efficient Checkpointing Algorithm for Distributed Mobile Computing
Journal of Control Engineering and Applied Informatics, 2015
Mobile networks have been quickly adopted by many companies and individuals. However, factors such as mobility and limited resources often constrain availability and thus make the wireless environment unstable. This instability poses a serious challenge for fault-tolerant distributed mobile applications. Consequently, classical checkpointing techniques, which make applications more resistant to failures, are not always compatible with the mobile context, and new techniques are needed, or existing ones must be adapted, to suit the mobile environment. To address this issue, this paper proposes a new checkpointing algorithm for mobile computing systems. The algorithm is designed to keep time and space overhead low both during checkpointing and during normal application execution.
IAEME PUBLICATION, 2020
Minimum-process coordinated checkpointing is a suitable approach for introducing transparent, software-based recovery in mobile distributed systems. To balance the overhead of collecting a global state against the computation lost during recovery, we propose a hybrid protocol in which an all-process coordinated global snapshot is taken after the minimum-process checkpointing protocol has run a fixed number of times. In coordinated checkpointing, if a single process fails to take its tentative snapshot, the entire global snapshot effort is wasted. Therefore, we propose that in the first phase, all concerned mobile hosts take only temporary snapshots. The cost of taking a temporary snapshot is negligible, since it is stored only in the memory of the mobile host. In this way, the protocol can cope with repeated aborts during checkpointing. In the minimum-process coordinated global-state collection, a probabilistic approach is used to reduce the number of useless checkpoints and the blocking of processes.
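The hybrid schedule and the two-phase use of temporary snapshots can be sketched as follows. This is a simplified, centralized model under stated assumptions, not the authors' protocol: `HybridCoordinator`, its parameter `k` (the fixed number of completed minimum-process rounds before an all-process round), and the dictionaries standing in for MH memory and stable storage are all hypothetical, and the probabilistic technique for reducing useless checkpoints and blocking is not modeled.

```python
from typing import Dict, Set

class HybridCoordinator:
    """Hypothetical model: after k completed minimum-process rounds,
    the next round checkpoints all processes."""

    def __init__(self, processes: Set[str], k: int):
        self.processes = set(processes)
        self.k = k
        self.min_rounds_done = 0
        self.stable: Dict[str, str] = {}  # stands in for stable storage (assumed at the MSS)

    def run_round(self, min_set: Set[str], states: Dict[str, str], failed: Set[str]) -> str:
        all_process = self.min_rounds_done >= self.k
        group = self.processes if all_process else set(min_set)
        # Phase 1: each participant that succeeds keeps a temporary snapshot in memory.
        temporary = {p: states[p] for p in group if p not in failed}
        if len(temporary) < len(group):
            # Abort: only cheap in-memory temporary snapshots are discarded.
            return "aborted"
        # Phase 2: every participant succeeded, so the snapshots are committed.
        self.stable.update(temporary)
        if all_process:
            self.min_rounds_done = 0
        else:
            self.min_rounds_done += 1
        return "all-process" if all_process else "minimum-process"

c = HybridCoordinator({"MH1", "MH2", "MH3"}, k=2)
states = {"MH1": "s1", "MH2": "s2", "MH3": "s3"}
print(c.run_round({"MH1", "MH2"}, states, failed=set()))  # minimum-process
print(c.run_round({"MH1"}, states, failed={"MH1"}))       # aborted
print(c.run_round({"MH1"}, states, failed=set()))         # minimum-process
print(c.run_round({"MH2"}, states, failed=set()))         # all-process
```

Counting only completed minimum-process rounds is one reading of "a fixed number of times"; an aborted round then does not advance the schedule.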
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. Checkpointing is a fault-tolerance technique. A checkpoint is a saved state of a process taken during failure-free execution, which enables the process to restart from that state after a failure, reducing the amount of lost work instead of repeating the computation from the beginning. The process of restoring from a previous checkpointed state is known as rollback recovery. A checkpoint can be saved on either stable storage or volatile storage, depending on the failure scenarios to be tolerated. Checkpointing is a major challenge in mobile ad hoc networks. A mobile ad hoc network consists of a set of self-configuring mobile hosts (MHs) that communicate with each other without the assistance of base stations, with some processes running on the mobile hosts. The main issues in this environment are limited power and limited storage capacity. This paper surveys the algorithms reported in the literature for checkpointing in distributed systems as well as mobile distributed systems. Keywords – Checkpointing, Distributed systems, Fault tolerance, Mobile computing system, Rollback recovery.
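To illustrate how a checkpoint bounds the work lost to a failure, here is a minimal single-process sketch. The `Worker` class and its `interval` parameter are hypothetical, and the in-memory copy merely stands in for stable storage.

```python
import copy

class Worker:
    """Hypothetical process that checkpoints its state every `interval` steps."""

    def __init__(self, interval: int):
        self.interval = interval
        self.state = {"step": 0, "total": 0}
        self.checkpoint = copy.deepcopy(self.state)  # stands in for stable storage

    def step(self) -> None:
        self.state["step"] += 1
        self.state["total"] += self.state["step"]
        if self.state["step"] % self.interval == 0:
            self.checkpoint = copy.deepcopy(self.state)  # take a checkpoint

    def recover(self) -> int:
        """Roll back to the last checkpoint and return the number of steps lost."""
        lost = self.state["step"] - self.checkpoint["step"]
        self.state = copy.deepcopy(self.checkpoint)
        return lost

w = Worker(interval=4)
for _ in range(10):
    w.step()
print(w.recover())      # 2: steps 9 and 10 must be redone
print(w.state["step"])  # 8
```

Without any checkpoint, the same failure would force all 10 steps to be recomputed from the beginning.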
A Minimum Process Synchronous Checkpointing Algorithm for Mobile Distributed System
iaeme
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. A mobile computing system is a distributed system in which some of the processes run on mobile hosts (MHs), whose location in the network changes over time. The number of processes that take checkpoints is minimized to (1) avoid waking MHs that are in doze mode, (2) minimize thrashing of MHs with checkpointing activity, and (3) conserve the limited battery life of MHs and the low bandwidth of wireless channels. In minimum-process checkpointing protocols, either some useless checkpoints are taken or processes are blocked. In this paper, we propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems in which no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We also try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with the others.