A fault-tolerant sequencer for timed asynchronous systems

Fault Tolerant Sequencer: Specification and an Implementation

2001

Abstract: The synchronization among thin, independent and concurrent processes in an open distributed system is a fundamental issue in current architectures (e.g., middleware, three-tier architectures, etc.). “Independent process” means no message has to be exchanged among the processes to synchronize themselves, and “open” means that the number of processes that need to synchronize changes over time.
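As background, the sequencer abstraction hands out unique, consecutive sequence numbers to clients that never talk to one another directly. The following is a minimal, single-process sketch of that interface; the names (SequencerService, get_seq) are illustrative, and the fault-tolerant replicated protocol the paper actually specifies is not shown here.

import itertools
import threading

class SequencerService:
    # Illustrative stand-in for the sequencer abstraction: hands out unique,
    # consecutive sequence numbers to independent, concurrent clients.
    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def get_seq(self) -> int:
        # The real service is replicated and crash-tolerant; a locked
        # in-process counter stands in for it in this sketch.
        with self._lock:
            return next(self._counter)

if __name__ == "__main__":
    svc = SequencerService()
    print(svc.get_seq(), svc.get_seq(), svc.get_seq())  # -> 1 2 3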

Fault-Tolerant Asynchronous Networks

IEEE Transactions on Computers, 2000

II. NECESSARY AND SUFFICIENT CONDITIONS. In the following we present some definitions for introducing certain notations. Although these notations are presented in reference to asynchronous sequential machines, they are applicable to all sequential machines, synchronous or asynchronous. [...] tuple {I_B, S_B, O_B, δ_B, ω_B}, where I_B, S_B, O_B are the sets of binary k-tuples, n-tuples, and p-tuples called the input, state, and output sets (these are the different values the x_i, y_i, and z_i can take; refer to [...]), and δ_B, ω_B are the next-state and output functions, δ_B : S_B × I_B → S_B.
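Concretely, the quintuple says a machine is determined by its input, state, and output sets plus two functions. The sketch below is one possible Python rendering of that definition; the one-bit toggle machine is an invented example, not from the paper, and the Mealy-style signature of the output function is an assumption by symmetry with the next-state function.

from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = Tuple[int, ...]     # binary n-tuples
Symbol = Tuple[int, ...]    # binary k-tuples (inputs) / p-tuples (outputs)

@dataclass(frozen=True)
class SequentialMachine:
    # M = {I, S, O, delta, omega}: next-state and output functions over binary tuples.
    inputs:  FrozenSet[Symbol]
    states:  FrozenSet[State]
    outputs: FrozenSet[Symbol]
    delta: Callable[[State, Symbol], State]    # delta : S x I -> S
    omega: Callable[[State, Symbol], Symbol]   # omega : S x I -> O (assumed Mealy-style)

# Invented one-bit toggle machine as an instance of the definition.
toggle = SequentialMachine(
    inputs=frozenset({(0,), (1,)}),
    states=frozenset({(0,), (1,)}),
    outputs=frozenset({(0,), (1,)}),
    delta=lambda s, i: ((s[0] ^ i[0]),),   # flip the state bit when the input bit is 1
    omega=lambda s, i: s,                  # output the current state
)

state = (0,)
for bit in [(1,), (1,), (0,)]:
    print(toggle.omega(state, bit))
    state = toggle.delta(state, bit)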

Concurrency in Synchronous Systems

Formal Methods in System Design, 2006

In this paper we introduce the notion of weak endochrony, which extends to a synchronous setting the classical theory of Mazurkiewicz traces. The notion is useful in the synthesis of correct-by-construction communication protocols for globally asynchronous, locally synchronous (GALS) systems. The independence between various computations can be exploited here to provide communication schemes that do not restrict the concurrency while still guaranteeing correctness. Such communication schemes are then lighter and more flexible than their latency-insensitive or endo/isochronous counterparts.

Timestamping messages and events in a distributed system using synchronous communication

Distributed Computing, 2006

Determining order relationship between events of a distributed computation is a fundamental problem in distributed systems which has applications in many areas including debugging, visualization, checkpointing and recovery. Fidge/Mattern's vector-clock mechanism captures the order relationship using a vector of size N in a system consisting of N processes. As a result, it incurs message and space overhead of N integers. Many distributed applications use synchronous messages for communication. It is therefore natural to ask whether it is possible to reduce the timestamping overhead for such applications. In this paper, we present a new approach for timestamping messages and events of a synchronously ordered computation, that is, when processes communicate using synchronous messages.
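The Fidge/Mattern mechanism the abstract refers to attaches a vector of N integers to every message, which is exactly the overhead the paper aims to reduce for synchronously ordered computations. The sketch below shows that baseline mechanism only; it is illustrative and is not the paper's reduced-overhead timestamping scheme.

from typing import List

class VectorClock:
    # Fidge/Mattern vector clock for a system of n processes (baseline sketch).
    def __init__(self, n: int, pid: int) -> None:
        self.pid = pid
        self.clock: List[int] = [0] * n   # one entry per process -> O(N) overhead

    def local_event(self) -> None:
        self.clock[self.pid] += 1

    def send(self) -> List[int]:
        # A copy of the full vector is piggybacked on every outgoing message.
        self.local_event()
        return list(self.clock)

    def receive(self, msg_clock: List[int]) -> None:
        # Component-wise maximum with the sender's vector, then tick locally.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1

def happened_before(u: List[int], v: List[int]) -> bool:
    # u -> v iff u <= v component-wise and u != v.
    return all(a <= b for a, b in zip(u, v)) and u != v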

Synchronization with eventcounts and sequencers

Communications of the ACM, 1979

Synchronization of concurrent processes requires controlling the relative ordering of events in the processes. A new synchronization mechanism is proposed, using abstract objects called eventcounts and sequencers, that allows processes to control the ordering of events directly, rather than using mutual exclusion to protect manipulations of shared variables that control ordering of events. Direct control of ordering seems to simplify correctness arguments and also simplifies implementation in distributed systems. The mechanism is defined formally, and then several examples of its use are given. The relationship of the mechanism to protection mechanisms in the system is explained; in particular, eventcounts are shown to be applicable to situations where confinement of information matters. An implementation of eventcounts and sequencers in a system with shared memory is described.
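The mechanism rests on two abstract objects: an eventcount, with advance/read/await operations, and a sequencer, with a ticket operation. The sketch below is one shared-memory rendering of those primitives in Python; the paper defines the objects abstractly, so the condition-variable and lock machinery here is just one possible realization.

import itertools
import threading

class Eventcount:
    # Counts occurrences of a class of events; processes wait on its value.
    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def read(self) -> int:
        with self._cond:
            return self._value

    def advance(self) -> None:
        # Signal that one more event of this class has occurred.
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def await_(self, v: int) -> None:
        # Block until at least v events have occurred ("await" is a Python keyword).
        with self._cond:
            self._cond.wait_for(lambda: self._value >= v)

class TicketSequencer:
    # Hands out a total order (tickets) to competing processes.
    def __init__(self) -> None:
        self._tickets = itertools.count(0)
        self._lock = threading.Lock()

    def ticket(self) -> int:
        with self._lock:
            return next(self._tickets)

# Typical usage pattern: a producer takes a ticket to fix its turn, waits for
# that turn on an eventcount, performs its action, then advances the eventcount:
#   t = seq.ticket(); done.await_(t); ...produce item t...; done.advance()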

Checkpoint and rollback in asynchronous distributed systems

Proceedings of INFOCOM '97, 1997

This paper proposes a novel algorithm for taking checkpoints and rolling back the processes for recovery in asynchronous distributed systems. The algorithm has the following properties: (1) Multiple processes can simultaneously initiate the checkpointing.

Sequentialization and Synchronization for Distributed Programs

2017

Author(s): Bakst, Alexander Goldberg | Advisor(s): Jhala, Ranjit | Abstract: Distributed systems are essential for building services that can handle the ever increasing number of people and devices connected to the internet, as well as the associated growth in data accumulation. However, building distributed programs is hard, and building confidence in the correctness of an algorithm or implementation is harder still. One fundamental reason is the highly asynchronous nature of distributed execution. Timing differences caused by network delays and variation in compute power can trigger behaviors that were unanticipated by the programmer. Unfortunately, techniques for building confidence are all up against the same problem: the combinatorial explosion in the number of behaviors of a distributed system. Testing and model checking techniques cannot hope to weed out all behaviors when the state space is infinite. At the other end of the spectrum, constructing proofs by hand is a daunting ...

On Composition and Implementation of Sequential Consistency (Extended Version)

ArXiv, 2016

It has been proved that to implement a linearizable shared memory in synchronous message-passing systems it is necessary to wait for a time proportional to the uncertainty in the latency of the network for both read and write operations, while waiting during read or during write operations is sufficient for sequential consistency. This paper extends this result to crash-prone asynchronous systems. We propose a distributed algorithm that builds a sequentially consistent shared memory abstraction with snapshot on top of an asynchronous message-passing system where less than half of the processes may crash. We prove that it is only necessary to wait when a read/snapshot is immediately preceded by a write on the same process. We also show that sequential consistency is composable in some cases commonly encountered: 1) objects that would be linearizable if they were implemented on top of a linearizable memory become sequentially consistent when implemented on top of a sequential memory w...
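The key performance claim is that a process only needs to block when a read or snapshot immediately follows one of its own writes. The sketch below illustrates just that client-side rule; the fake replica, the method names, and the timer-based propagation are assumptions for illustration, not the paper's algorithm.

import threading

class _FakeReplica:
    # Stand-in for a crash-prone message-passing layer (illustrative only).
    def __init__(self) -> None:
        self.store = {}

    def broadcast_write(self, key, value):
        done = threading.Event()
        def apply():                       # pretend the write propagates asynchronously
            self.store[key] = value
            done.set()
        threading.Timer(0.01, apply).start()
        return done

    def local_read(self, key):
        return self.store.get(key)

class SCClient:
    # Client-side rule: wait only when a read immediately follows a local write.
    def __init__(self, replica: _FakeReplica) -> None:
        self.replica = replica
        self.pending = None                # last unacknowledged local write, if any

    def write(self, key, value) -> None:
        self.pending = self.replica.broadcast_write(key, value)   # returns at once

    def read(self, key):
        if self.pending is not None:       # read immediately preceded by our own write
            self.pending.wait()            # the only place a client ever blocks
            self.pending = None
        return self.replica.local_read(key)

if __name__ == "__main__":
    c = SCClient(_FakeReplica())
    c.write("x", 1)
    print(c.read("x"))                     # waits briefly for the write, prints 1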

Total order communications over asynchronous distributed systems: Specifications and implementations

During the last two decades the design and development of total order (TO) communications has been one of the main research topics in dependable distributed computing. The huge amount of research work has produced several TO specifications and a wide variety of TO implementations with different guarantees whose differences are often left hidden or unclear. This paper presents a systematic classification of six distinct TO specifications based on a well-defined formal framework. The classification allows us (i) to define in a formal way the differences among the behaviors of faulty and correct processes admitted by each specification, and (ii) to derive a methodology that enables the classification of TO implementations with respect to their enforced specification. The paper also discusses the impact of TO specifications on the design of application logics. The methodology is then used to formally study the properties of eight variations of TO implementations based on a fixed sequencer given in a well-known context, namely primary component group communication systems.
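One of the implementation families the paper classifies is the fixed-sequencer scheme: senders forward messages to a distinguished sequencer process, which assigns global sequence numbers, and receivers deliver strictly in sequence-number order. The sketch below shows that basic scheme with in-memory objects standing in for the network; it is illustrative only and ignores the sequencer-failure and primary-change cases the paper analyzes.

import itertools
import heapq
from typing import List, Tuple

class FixedSequencer:
    # Assigns consecutive global sequence numbers to broadcast messages.
    def __init__(self) -> None:
        self._next = itertools.count(1)

    def order(self, msg: str) -> Tuple[int, str]:
        return next(self._next), msg

class Receiver:
    # Delivers messages strictly in sequence-number order, buffering gaps.
    def __init__(self) -> None:
        self._expected = 1
        self._pending: List[Tuple[int, str]] = []   # min-heap keyed on sequence number
        self.delivered: List[str] = []

    def receive(self, seq: int, msg: str) -> None:
        heapq.heappush(self._pending, (seq, msg))
        while self._pending and self._pending[0][0] == self._expected:
            _, m = heapq.heappop(self._pending)
            self.delivered.append(m)
            self._expected += 1

if __name__ == "__main__":
    seq = FixedSequencer()
    r1, r2 = Receiver(), Receiver()
    ordered = [seq.order(m) for m in ("a", "b", "c")]
    for s, m in ordered:                  # messages arrive in order at r1
        r1.receive(s, m)
    for s, m in reversed(ordered):        # arrive out of order at r2; still delivered in order
        r2.receive(s, m)
    print(r1.delivered == r2.delivered == ["a", "b", "c"])   # True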