Mukesh Singhal - Academia.edu
Papers by Mukesh Singhal
Necessary and Sufficient Conditions on Information for Causal Message Ordering and Their Optimal Implementation
Distributed Computing, 1998
This paper formulates necessary and sufficient conditions on the information required for enforcing causal ordering in a distributed system with asynchronous communication. The paper then presents an algorithm for enforcing causal message ordering. The algorithm allows a process to multicast to arbitrary and dynamically changing process groups. We show that the algorithm is optimal in the space complexity of the overhead of control information in both messages and message logs. The algorithm achieves optimality by transmitting the bare minimum causal dependency information specified by the necessity conditions, and using an encoding scheme to represent and transmit this information. We show that, in general, the space complexity of causal message ordering in an asynchronous system is $\Omega(n^{2})$, where $n$ is the number of nodes in the system. Although the upper bound on space complexity of the overhead of control information in the algorithm is $O(n^{2})$, the overhead is likely to be much smaller on the average, and is always the least possible.
Invariant-Based Verification of a Distributed Deadlock Detection Algorithm
IEEE Transactions on Software Engineering, 1991
... A dependency from Tj to Tk through a data manager (where Tj is the requester and Tk is the holder) is represented by an edge from Tj to Tk in the TWFG. ...
IEEE Transactions on Software Engineering, 1994
In this paper, we propose a new algorithm to detect and resolve distributed deadlocks in the generalized model. The initiator of the proposed algorithm diffuses probes along the outgoing edges of the Wait-For Graph (WFG) and collects replies that directly carry the dependency information between processes. The initiator simplifies the unblocking conditions of blocked nodes in response to a reply from an unblocked node and, unlike the earlier algorithms, receives at most two replies from any node. It finally declares all the nodes that have not been reduced as deadlocked. We also prove the correctness of the algorithm. It has a worst-case time complexity of d+1 and a message complexity of less than e+2n, where d is the diameter, e is the number of edges, and n is the number of nodes in the WFG. Since the termination detection of the proposed algorithm is isolated from deadlock detection, it reduces the message length to a constant without using any explicit technique. This is a significant improvement over the existing algorithms. It also minimizes the additional rounds of messages needed to resolve deadlocks.
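The reduction idea in the abstract above (simplify unblocking conditions as unblocked nodes report in, then declare every unreduced node deadlocked) can be illustrated with a small centralized sketch. This is not the distributed probe/reply algorithm of the paper: it assumes the whole WFG is available in one place, and the `WaitFor` representation and the name `deadlocked_nodes` are hypothetical, chosen only for illustration. Each blocked node is assumed to carry a p-out-of-q condition, one common way to express the generalized model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation (an assumption, not from the paper):
# each blocked node maps to (p, successors), meaning the node unblocks
# once any p of the listed successors are unblocked. Nodes that never
# appear as keys are treated as active, i.e. already unblocked.
WaitFor = Dict[str, Tuple[int, List[str]]]


def deadlocked_nodes(wfg: WaitFor) -> Set[str]:
    """Reduce the WFG to a fixed point; unreduced nodes are deadlocked."""
    # Start from the active nodes: successors that are not blocked themselves.
    unblocked: Set[str] = {
        s for _, succs in wfg.values() for s in succs if s not in wfg
    }
    changed = True
    while changed:
        changed = False
        for node, (p, succs) in wfg.items():
            if node in unblocked:
                continue
            # Count distinct successors that are already unblocked.
            satisfied = sum(1 for s in set(succs) if s in unblocked)
            if satisfied >= p:
                unblocked.add(node)
                changed = True
    return {node for node in wfg if node not in unblocked}


if __name__ == "__main__":
    # A waits for 1 of {B, C}; B waits for A; C waits for both A and D.
    # D is active, but that alone cannot satisfy C, so A, B, C stay blocked.
    example = {"A": (1, ["B", "C"]), "B": (1, ["A"]), "C": (2, ["A", "D"])}
    print(sorted(deadlocked_nodes(example)))
```

An AND request over q successors corresponds to p = q and an OR request to p = 1, so the same loop covers both special cases.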
Information Processing Letters, 1992
Singhal, M. and A. Kshemkalyani.
IEEE Transactions on Mobile Computing, 2006
We present a lightweight hierarchical routing model, Way Point Routing (WPR), in which a number of intermediate nodes on a route are selected as waypoints and the route is divided into segments by the waypoints. Waypoints, including the source and the destination, run a high-level intersegment routing protocol, while the nodes on each segment run a low-level intrasegment routing protocol. One distinct advantage of our model is that when a node on the route moves out or fails, instead of discarding the whole original route and discovering a new route from the source to the destination, only the two waypoint nodes of the broken segment have to find a new segment. In addition, our model is lightweight because it maintains a hierarchy only for nodes on active routes. On the other hand, existing hierarchical routing protocols such as CGSR and ZRP maintain hierarchies for the entire network. We present an instantiation of WPR, where we use DSR as the intersegment routing protocol and AODV as the intrasegment routing protocol. This instantiation is termed DSR over AODV (DOA) routing protocol. Thus, DSR and AODV, two well-known on-demand routing protocols for MANETs, are combined into one hierarchical routing protocol and become two special cases of our protocol. Furthermore, we present two novel techniques for DOA: one is an efficient loop detection method and the other is a multitarget route discovery. Simulation results show that DOA scales well for large networks with more than 1,000 nodes, incurring about 60 to 80 percent less overhead than AODV, while other metrics are better than or comparable to AODV and DSR.
Quasi-Synchronous Checkpointing: Models, Characterization, and Classification
IEEE Transactions on Parallel and Distributed Systems, 1999
Checkpointing algorithms are classified as synchronous and asynchronous in the literature. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system. Synchronizing checkpointing activity involves message overhead and process execution may have to be suspended during the checkpointing coordination, resulting in performance degradation. In asynchronous checkpointing, processes take
Using Logging and Asynchronous Checkpointing to Implement Recoverable Distributed Shared Memory
Distributed shared memory provides a useful paradigm for developing distributed applications. As the number of processors in the system and running time of distributed applications increase, the likelihood of processor failure increases. A method of recovering processes running in a distributed shared memory environment which minimizes lost work and the cost of recovery is desirable so that long-running applications are not adversely affected by processor failure. A technique for achieving recoverable distributed shared memory which utilizes asynchronous process checkpoints and logging of pages accessed via read operations on the shared address space is presented. The scheme supports independent process recovery without forcing rollback of operational processes during recovery. The method is particularly useful in environments where taking process checkpoints is expensive
A Low Overhead Recovery Technique Using Quasi-Synchronous Checkpointing
In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing them to take checkpoints asynchronously and uses communication-induced checkpoint coordination for the progression of the recovery line which helps bound rollback propagation during a recovery. Thus, it has the easeness and
Finding Consistent Global Checkpoints in a Distributed Computation
IEEE Transactions on Parallel and Distributed Systems, 1997
Finding consistent global checkpoints of a distributed computation is important for analyzing, testing, or verifying properties of these computations. In this paper we present a theoretical foundation for finding consistent global checkpoints. Given an arbitrary set S of local checkpoints, we prove exactly which sets of other local checkpoints can be combined with S to build consistent global checkpoints, and we present an algorithm
IEEE Computer, 1996
Causality, determining which event happens before which others, is vital in distributed computations. Distributed systems can determine causality using logical clocks.
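As a minimal sketch of how a logical clock can capture causality, the following uses vector clocks, one well-known kind of logical clock. It is illustrative only and is not taken from the article; the class name `VectorClock` and the helper functions are hypothetical, and processes are assumed to be numbered 0 to n-1.

```python
from typing import List


class VectorClock:
    """One process's vector clock in a system of n processes (a sketch)."""

    def __init__(self, n: int, pid: int) -> None:
        self.pid = pid
        self.v: List[int] = [0] * n

    def tick(self) -> List[int]:
        # Local event: advance only this process's own component.
        self.v[self.pid] += 1
        return list(self.v)

    def send(self) -> List[int]:
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, ts: List[int]) -> List[int]:
        # Merge the sender's knowledge (component-wise max), then count
        # the receive itself as a local event.
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.v[self.pid] += 1
        return list(self.v)


def happened_before(a: List[int], b: List[int]) -> bool:
    """True if the event stamped a causally precedes the event stamped b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
    e1 = p0.tick()        # local event on process 0
    m = p0.send()         # process 0 sends a message
    e2 = p1.tick()        # independent local event on process 1
    e3 = p1.receive(m)    # process 1 receives the message
    print(happened_before(m, e3), concurrent(e1, e2))
```

Comparing two timestamps component-wise is what lets a process decide, from the timestamps alone, whether one event could have influenced another.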
A reliable multicast algorithm, called RMA, for mobile ad hoc networks is presented that is based on a new cost criterion, called link lifetime, for determining the optimal path between a pair of nodes. The algorithm has the characteristics of using an undirected graph for its routing operations rather than a fixed structure like a tree or a mesh. Previously proposed routing metrics for mobile ad hoc networks were designed for use in wired environments, where link stability is not a concern. We propose a new metric, called the lifetime, which is more appropriate for mobile ad hoc networks. The lifetime metric is dependent on the predicted future life of the link under consideration. We developed a simulator for the mobile ad hoc networks, which is portable and scalable to a large number of nodes. Using the simulator, we carried out a simulation study to analyze the effectiveness of the routing metrics and the performance of the proposed reliable multicast algorithm. The simulation results show that the lifetime metric helps achieve better performance in mobile ad hoc environments than the hop count metric.
Journal of Parallel and Distributed Computing, 1997
Causal message ordering is required for several distributed applications. In order to preserve causal ordering, only direct dependency information between messages, with respect to the destination process(es), need be sent with each message. By eliminating other kinds of control information from the messages, the communication overheads can be significantly reduced. In this paper we present an algorithm that uses this knowledge to efficiently enforce causal ordering of messages. The proposed algorithm does not require any prior knowledge of the network topology or communication pattern. As computation proceeds, it acquires knowledge of the communication pattern and is capable of handling dynamically changing multicast communication groups, and minimizing the communication overheads. With regard to communication overheads, the algorithm is optimal for the broadcast communication case. Extensive simulation experiments demonstrate that the algorithm imposes lower communication overheads than previous causal ordering algorithms. The algorithm can be employed in a variety of distributed computing environments. Its energy efficiency and low bandwidth requirement make it especially suitable for mobile computing systems. We show how to employ the algorithm for causally ordered multicasting of messages in mobile computing environments.
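To make the notion of causal ordering concrete, here is a small sketch of the classic vector-timestamp delivery rule for causal broadcast. It is a baseline for illustration, not the algorithm of the paper above: it ships a full n-entry vector with every message, which is exactly the kind of overhead the abstract says can be reduced by sending only direct dependency information. The class name `CausalReceiver` and its fields are hypothetical.

```python
from typing import Any, List, Tuple


class CausalReceiver:
    """Buffers broadcast messages until delivering them respects causality."""

    def __init__(self, n: int) -> None:
        # delivered[k] = number of messages from process k delivered so far.
        self.delivered: List[int] = [0] * n
        self.pending: List[Tuple[int, List[int], Any]] = []
        self.log: List[Any] = []  # payloads in delivery order

    def _deliverable(self, sender: int, ts: List[int]) -> bool:
        # The message must be the next one from its sender, and everything
        # the sender had delivered before sending must be delivered here too.
        if ts[sender] != self.delivered[sender] + 1:
            return False
        return all(
            ts[k] <= self.delivered[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, sender: int, ts: List[int], payload: Any) -> None:
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.delivered[s] += 1
                    self.log.append(p)
                    progress = True


if __name__ == "__main__":
    # Process 0 broadcasts m1; process 1 delivers m1 and then broadcasts m2.
    # Process 2 receives m2 first, so m2 waits in the buffer until m1 arrives.
    p2 = CausalReceiver(3)
    p2.receive(1, [1, 1, 0], "m2")
    p2.receive(0, [1, 0, 0], "m1")
    print(p2.log)
```

In this baseline each timestamp entry counts broadcasts by the corresponding process, so the control information grows linearly with the number of processes per message.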
IEEE Transactions on Parallel and Distributed Systems, 1996
A mobile computing system consists of mobile and stationary nodes, connected to each other by a communication network. The presence of mobile nodes in the system places constraints on the permissible energy consumption and available communication bandwidth. To minimize the lost computation during recovery from node failures, periodic collection of a consistent snapshot of the system (checkpoint) is required. Locating mobile nodes contributes to the checkpointing and recovery costs. Synchronous snapshot collection algorithms, designed for static networks, either force every node in the system to take a new local snapshot, or block the underlying computation during snapshot collection. Hence, they are not suitable for mobile computing systems. If nodes take their local checkpoints independently in an uncoordinated manner, each node may have to store multiple local checkpoints in stable storage. This is not suitable for mobile nodes as they have small memory. This paper presents a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot, nor blocks the underlying computation during snapshot collection. If a node initiates snapshot collection, local snapshots of only those nodes that have directly or transitively affected the initiator since their last snapshots need to be taken. We prove that the global snapshot collection terminates within a finite time of its invocation and the collected global snapshot is consistent. We also propose a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of node(s). Both the algorithms have low communication and storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing systems.
IEEE Computer, 1992
The problem of judiciously and transparently redistributing the load of the system among its nodes so that overall performance is maximized is discussed. Several key issues in load distributing for general-purpose systems, including the motivations and design trade-offs for load-distributing algorithms, are reviewed. In addition, several load-distributing algorithms are described and their performances are compared. These algorithms are sender-initiated algorithms, receiver-initiated algorithms, symmetrically initiated algorithms, and adaptive algorithms. Load-distributing policies used in existing systems are examined, and conclusions about which algorithm might help in realizing the most benefits of load distributing are drawn.
Mobile computers use wireless channels to communicate with other computers. Efficient channel allocation is at the heart of the design of an efficient mobile computing system. The finite number of channels should be efficiently allocated to maximize throughput and avoid co-channel interference. Temporal variations in channel demand require channel allocation to adapt dynamically to the changing demand. In this paper we present a probabilistic analysis of the temporal imbalance in channel demand. The results provide guidelines for the design of dynamic distributed channel allocation strategies. The analysis indicates that the presence of load imbalance does not necessarily imply the possibility for channel transfer. Also, the effectiveness of channel transfer depends on the duration for which the imbalance persists. This duration is influenced in part by the velocity of mobile units. The wide variance in velocities of the mobile units can be handled by using a hierarchical cellular layout. Bulk channel transfers can be employed to reduce the overheads of channel transfer and to enable the system to quickly adjust to spatial variations in channel demand.
IEEE Transactions on Parallel and Distributed Systems, 1994
A mobile computing system consists of mobile and stationary nodes, connected to each other by a communication network. The presence of mobile nodes in the system places constraints on the permissible energy consumption and available communication bandwidth. To minimize the lost computation during recovery from node failures, periodic collection of a consistent snapshot of the system (checkpoint) is required. Locating mobile nodes contributes to the checkpointing and recovery costs. Synchronous snapshot collection algorithms, designed for static networks, either force every node in the system to take a new local snapshot, or block the underlying computation during snapshot collection. Hence, they are not suitable for mobile computing systems. If nodes take their local checkpoints independently in an uncoordinated manner, each node may have to store multiple local checkpoints in stable storage. This is not suitable for mobile nodes as they have small memory. This paper presents a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot, nor blocks the underlying computation during snapshot collection. If a node initiates snapshot collection, local snapshots of only those nodes that have directly or transitively affected the initiator since their last snapshots need to be taken. We prove that the global snapshot collection terminates within a finite time of its invocation and the collected global snapshot is consistent. We also propose a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of node(s). Both the algorithms have low communication and storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing systems.