Maintaining group connectivity in dynamic asynchronous distributed systems

Fault-tolerant group communication protocols for asynchronous systems

1994

Contents:
Illustrations
Chapter 1: Introduction
1.1 Group Communication
1.1.1 Process Crashes and Membership Reconfiguration
1.1.2 Message Ordering
1.1.3 Message Delivery in Overlapping Process Groups
1.1.4 Existing Group Communication Protocols
1.2 Contributions of the Thesis
1.3 Thesis outline
Chapter 2: Group Communication Protocols and Related Problems
2.1 Synchrony and Group Communication
2.2 The System Model
2.3 Overlapping Process Groups
2.4 Message Order Delivery
2.4.1 Event Ordering in Distributed Systems
2.4.2 Identical Order Delivery
2.4.3 Causal Order Delivery
2.4.4 Total Order Delivery
2.5 Fault-Tolerance
2.6 Related Work
2.6.1 Chang and Maxemchuk's protocol
2.6.2 V System and Amoeba
2.6.3 ISIS protocols
2.6.4 Psync protocol
2.6.5 Trans and Total protocols
2.6.6 Transis protocols
2.6.7 Garcia-Molina and Spauster's protocol
...

Reliable group communication in distributed systems

[1988] Proceedings. The 8th International Conference on Distributed, 1988

This paper presents the design and implementation of a reliable group communication mechanism. The mechanism guarantees a form of atomicity: messages are received either by all operational members of the group or by none of them. In addition, messages are received in the same order at every recipient. This ordering property can be used to simplify distributed database and distributed processing algorithms. The mechanism continues to operate despite process, host, and communication failures, which is essential for fault-tolerant applications.
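
One common way to obtain the identical-order property described above is a central sequencer that stamps every message with a global sequence number, while each receiver holds back out-of-order arrivals until the gap is filled. The Python sketch below assumes that technique (it is not necessarily the mechanism of this paper) and uses hypothetical names (`Sequencer`, `Receiver`). It illustrates only identical ordering; the all-or-none atomicity and the handling of failures are not modeled.

```python
import itertools
import random
from dataclasses import dataclass, field


class Sequencer:
    """Hypothetical central sequencer: stamps each message with a global sequence number."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, payload):
        return (next(self._counter), payload)


@dataclass
class Receiver:
    delivered: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)  # seq -> payload, held back
    expected: int = 0                            # next sequence number to deliver

    def receive(self, stamped):
        seq, payload = stamped
        self.pending[seq] = payload
        # Deliver only contiguous sequence numbers, so every receiver
        # delivers messages in the same sequencer-assigned order.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


# Usage: the network may reorder messages differently at each receiver.
seq = Sequencer()
messages = [seq.stamp(m) for m in ["a", "b", "c", "d"]]
receivers = [Receiver() for _ in range(3)]
for r in receivers:
    arrival = messages[:]
    random.shuffle(arrival)
    for m in arrival:
        r.receive(m)
print([r.delivered for r in receivers])  # each list is ['a', 'b', 'c', 'd']
```

A single sequencer keeps the ordering logic trivial, at the cost of a bottleneck and a single point of failure that a fault-tolerant design must then address.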

A generic group communication approach for hybrid distributed systems

2009

Group communication is a powerful abstraction that is widely used to manage consistency problems in a variety of distributed system models, ranging from synchronous to time-free asynchronous models. Although the approaches are similar in principle, distinct implementation mechanisms have been employed in the design of group communication for distinct system models. However, the hybrid nature of many modern distributed systems, with dynamic and varied QoS guarantees, has created the need for integrated models. Furthermore, adaptation with degraded service is a common requirement in such scenarios. This paper tackles this new challenge by introducing a generic group communication mechanism. Because of its integrated design, our approach can handle group communication for both synchronous and asynchronous distributed systems, dynamically adapting to the available QoS. For example, it can dynamically switch to the asynchronous version when the run-time system can no longer guarantee timely operation. The paper presents the properties and algorithms of the integrated approach, as well as a simulation-based performance evaluation comparing this mechanism with some classical approaches.
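
A minimal sketch of the mode-switching idea, assuming the run-time system reports per-message delays and that a fixed delay bound stands in for the QoS guarantee. The class `AdaptiveGroupChannel`, the bound, and the hysteresis rule for switching back are hypothetical illustrations, not the paper's algorithm.

```python
from enum import Enum


class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


class AdaptiveGroupChannel:
    """Hypothetical sketch: timed delivery while observed delays respect an
    assumed bound, time-free (asynchronous) delivery otherwise."""

    def __init__(self, delay_bound_ms: float):
        self.delay_bound_ms = delay_bound_ms  # assumed QoS guarantee
        self.mode = Mode.SYNCHRONOUS

    def observe_delay(self, delay_ms: float) -> Mode:
        if delay_ms > self.delay_bound_ms:
            # Timeliness can no longer be guaranteed: degrade to asynchronous mode.
            self.mode = Mode.ASYNCHRONOUS
        elif self.mode is Mode.ASYNCHRONOUS and delay_ms <= self.delay_bound_ms / 2:
            # Assumed hysteresis: resume timed delivery only once delays are
            # comfortably below the bound, to avoid flapping between modes.
            self.mode = Mode.SYNCHRONOUS
        return self.mode


channel = AdaptiveGroupChannel(delay_bound_ms=10)
modes = [channel.observe_delay(d).value for d in [3, 8, 15, 9, 4]]
print(modes)
# ['synchronous', 'synchronous', 'asynchronous', 'asynchronous', 'synchronous']
```

The hysteresis threshold (half the bound) is an arbitrary choice here; any real adaptation policy would depend on how the run-time system measures and guarantees timeliness.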

Reliable communication in the presence of failures

ACM Transactions on Computer Systems, 1985

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency while respecting application-specific delivery ordering constraints, and their cost and performance vary with the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties such as member rankings. A review of several uses of the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher-level algorithms made possible by this approach.
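
Causal delivery is commonly implemented with vector clocks: a message from sender j carrying timestamp ts is delivered at a process with clock vc once ts[j] = vc[j] + 1 and ts[k] ≤ vc[k] for every other k. The sketch below applies that standard rule; it is a generic illustration of causal order delivery, not the protocol of this paper, and it assumes every message eventually arrives while ignoring failures and membership changes. The class name `CausalProcess` is hypothetical.

```python
class CausalProcess:
    """Causal broadcast with vector clocks (standard delivery rule, no failures)."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vc = [0] * n      # vc[k] = messages from process k delivered here
        self.pending = []      # received but not yet causally deliverable
        self.delivered = []

    def broadcast(self, payload):
        self.vc[self.pid] += 1
        self.delivered.append(payload)             # deliver own message locally
        return (self.pid, list(self.vc), payload)  # timestamp is a copy of vc

    def _deliverable(self, sender, ts):
        # Next message from sender, and everything the sender had delivered
        # before broadcasting has already been delivered here.
        return ts[sender] == self.vc[sender] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                sender, ts, payload = m
                if self._deliverable(sender, ts):
                    self.pending.remove(m)
                    self.vc[sender] = ts[sender]
                    self.delivered.append(payload)
                    progress = True


# m2 is sent by p1 after it delivered m1, but reaches p2 first.
p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")
p2.receive(m2)   # held back: m1 not yet delivered at p2
p2.receive(m1)   # delivers m1, then the pending m2
print(p2.delivered)  # ['m1', 'm2']
```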

From Static Distributed Systems to Dynamic Systems

24th IEEE Symposium on Reliable Distributed Systems (SRDS'05)

A noteworthy advance in distributed computing is due to the recent development of peer-to-peer systems. These systems are essentially dynamic in the sense that no process can obtain global knowledge of the system structure. They mainly allow processes to look up data that can be dynamically added or removed in a permanently evolving set of nodes. Although protocols have been developed for such dynamic systems, to our knowledge no computation model for dynamic systems has been proposed to date. Nevertheless, there is a strong demand for such models whenever one wants to develop provably correct protocols suited to dynamic systems. This paper proposes a model for (a class of) dynamic systems. The dynamic model is defined by (1) a parameter (an integer denoted α) and (2) two basic communication abstractions (query-response and persistent reliable broadcast). The new parameter α is a threshold value introduced to capture the liveness part of the system (it is the counterpart of the minimal number of processes that do not crash in a static system). To show the relevance of the model, the paper adapts an eventual leader protocol designed for the static model and proves that the resulting protocol is correct within the proposed dynamic model. In that sense, the paper also has a methodological flavor, as it shows that simple modifications to existing protocols can allow them to work in dynamic systems.
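
To make the query-response abstraction and the threshold α more concrete, here is a hypothetical sketch in which a process queries the peers it currently knows and returns once α of them have answered. Peers are modeled as callables that return None when they have left the system; the function name `query_response` and this peer model are assumptions for illustration, not the paper's definitions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Peer = Callable[[str], Optional[object]]  # returns None if the peer has left


def query_response(peers: Dict[str, Peer], query: str, alpha: int) -> Optional[List[Tuple[str, object]]]:
    """Hypothetical sketch: send `query` to every known peer and return as soon
    as `alpha` responses have been collected. Returns None when fewer than
    `alpha` peers answer (a real asynchronous process would keep waiting)."""
    responses: List[Tuple[str, object]] = []
    for name, peer in peers.items():
        reply = peer(query)
        if reply is not None:
            responses.append((name, reply))
            if len(responses) >= alpha:
                return responses
    return None


peers = {
    "p1": lambda q: 5,
    "p2": lambda q: None,  # p2 has left the system
    "p3": lambda q: 7,
    "p4": lambda q: 2,
}
print(query_response(peers, "value?", alpha=2))  # [('p1', 5), ('p3', 7)]
print(query_response(peers, "value?", alpha=4))  # None
```

Here α plays the role described in the abstract: the operation can make progress only if at least α processes remain present long enough to answer.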

Adaptive and Dependable Group Communication

Group communication is a powerful abstraction that is widely used to manage consistency problems in a variety of distributed system models, ranging from synchronous to time-free asynchronous models. Although the approaches are similar in principle, distinct implementation mechanisms have been employed in the design of group communication for distinct system models. However, the hybrid nature of many modern (real-time) distributed systems, with dynamic and varied QoS guarantees, has created the need for integrated models. Furthermore, adaptation with degraded service is a common requirement in such scenarios. This paper tackles this new challenge by introducing a generic group communication mechanism called the Timed Causal Blocks. Because of its integrated design, the Timed Causal Blocks mechanism can handle group communication for both synchronous and asynchronous distributed systems, dynamically adapting to the available QoS. For example, it can dynamically switch to the asynchronous version when the run-time system can no longer guarantee timely operation. Formal properties of the integrated model and related mechanisms are presented, with proof sketches.

Fault-Tolerant Intra-Group Communication

In distributed applications, a group of processes must cooperate. Intra-group communication supports the atomic and causally ordered delivery of messages among the processes in the group. Each process in the group is replicated into a collection of multiple replicas called a cluster. In this paper, we discuss a fault-tolerant group communication protocol that supports the atomic and ordered delivery of messages among the clusters in the group in the presence of Byzantine faults of the replicas.
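
A standard way to mask Byzantine replicas inside a cluster, assuming at most f of them are faulty, is to accept a message from the cluster only once f + 1 replicas have sent identical copies: at least one of those copies must then come from a correct replica. The helper below sketches that vote; the name `accept_from_cluster` and the f-bound are assumptions, not details given in the abstract.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def accept_from_cluster(copies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Return a message value reported by at least f + 1 replicas, else None.

    Assumes at most f replicas in the cluster are Byzantine, so any value
    reported by f + 1 of them comes from at least one correct replica.
    """
    for value, count in Counter(copies).most_common():
        if count >= f + 1:
            return value
    return None


# A cluster of 3f + 1 = 4 replicas with f = 1 Byzantine replica.
print(accept_from_cluster(["m", "m", "x", "m"], f=1))  # m
print(accept_from_cluster(["m", "x"], f=1))            # None: not enough matching copies yet
```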

Consensus in Asynchronous Distributed Systems: A Concise Guided Tour

Lecture Notes in Computer Science, 2000

The distributed consensus problem arises when several processes need to reach a common decision despite failures. The problem is important because it is omnipresent in distributed computing: consensus is needed to implement reliable communication, atomic commitment, consistency checks, resource allocation, and so on. Whether the problem can be solved depends closely on the nature of the system in which it is posed. For an asynchronous system, the impossibility result of Fischer, Lynch, and Paterson states that consensus cannot be reached deterministically if even a single process may fail. This paper focuses on the models proposed to circumvent this result and on the research they have originated.

Dynamic Byzantine Reliable Broadcast

2020

Reliable broadcast is a powerful primitive guaranteeing that, intuitively, all processes in a distributed system deliver the same set of messages. There is a twofold reason why this primitive is appealing: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives such as consensus and total-order broadcast, and yet (ii) it is powerful enough to implement numerous useful applications such as payment systems. The problem we tackle in this paper is that of dynamic reliable broadcast, i.e., enabling processes to join or leave the system. This is a desirable property for long-lived applications that are supposed to be highly available, yet it has been precluded in previous asynchronous reliable broadcast protocols. We introduce the first specification of a dynamic Byzantine reliable broadcast (dbrb) primitive that is amenable to an asynchronous implementation. Indeed, we present an algorithm that implements this specification in an asynchronous environment. ...
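
For contrast with the dynamic setting, a common presentation of Bracha's classic static asynchronous Byzantine reliable broadcast can be sketched as a per-process state machine for one broadcast instance, assuming n ≥ 3f + 1 processes. This is not the dbrb algorithm of the paper: membership is fixed, and message transport is left to the caller (each handler returns the messages the process should send to all). The class name `BrachaProcess` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BrachaProcess:
    """One process's state for a single instance of Bracha's static Byzantine
    reliable broadcast; assumes n >= 3f + 1. Handlers return the messages
    this process should send to all processes."""

    n: int
    f: int
    echoes: dict = field(default_factory=dict)   # value -> set of ECHO senders
    readies: dict = field(default_factory=dict)  # value -> set of READY senders
    sent_echo: bool = False
    sent_ready: bool = False
    delivered: object = None

    def on_send(self, value):
        # Echo the broadcaster's first SEND only.
        if self.sent_echo:
            return []
        self.sent_echo = True
        return [("ECHO", value)]

    def on_echo(self, sender, value):
        self.echoes.setdefault(value, set()).add(sender)
        return self._maybe_ready(value)

    def on_ready(self, sender, value):
        self.readies.setdefault(value, set()).add(sender)
        out = self._maybe_ready(value)
        # Deliver once 2f + 1 matching READYs have arrived.
        if self.delivered is None and len(self.readies[value]) >= 2 * self.f + 1:
            self.delivered = value
        return out

    def _maybe_ready(self, value):
        echo_quorum = (self.n + self.f + 2) // 2  # ceil((n + f + 1) / 2)
        enough_echoes = len(self.echoes.get(value, ())) >= echo_quorum
        # f + 1 READYs include at least one from a correct process (amplification).
        enough_readies = len(self.readies.get(value, ())) >= self.f + 1
        if not self.sent_ready and (enough_echoes or enough_readies):
            self.sent_ready = True
            return [("READY", value)]
        return []


p = BrachaProcess(n=4, f=1)
outputs = [p.on_send("v")]
outputs += [p.on_echo(s, "v") for s in (0, 1, 2)]
outputs += [p.on_ready(s, "v") for s in (0, 1, 2)]
print(outputs)      # [[('ECHO', 'v')], [], [], [('READY', 'v')], [], [], []]
print(p.delivered)  # v
```

The quorum thresholds depend on knowing n and f in advance, which is exactly what a fixed membership provides and what joins and leaves take away.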

Object-Based Group Communication for Distributed Systems

In distributed applications, group communication among multiple objects is required. Group communication protocols provide a group of objects with a reliable data transmission service, i.e., messages are delivered to all the destination objects in the group in a well-defined order. Distributed applications exchange two kinds of messages: requests and responses. Many group communication protocols discussed so far support the atomic and ordered delivery of messages at the communication network level; however, only the messages that must be ordered at the application level actually need to be ordered. In this paper, we discuss how to support the ordered delivery of request and response messages at the application level. Messages are delivered to application objects in an order that keeps the states of the objects consistent.

Paper area: Distributed object-oriented systems

2 System Model

2.1 System structure

The distributed system is composed of multiple application objects interconnected by a high-speed communication system [?] (Figure 1). An object o is defined to be a pair of a data structure D_o and a collection P_o of abstract operations for manipulating D_o. Users can manipulate o only through the operations in P_o. On receipt of a request message for an operation op in P_o, o computes op and sends back the response of op; op may change the abstract state of o, i.e., D_o. A group G is defined to be a collection of multiple objects o_1, ..., o_n (n ≥ 2), i.e., G = ⟨o_1, ..., o_n⟩.
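
A minimal sketch of the object model in Section 2.1, under assumed names: an object pairs its data D_o with a table of operations P_o and can be manipulated only through `on_request`. The class `GroupObject` and the counter operations `add` and `read` are hypothetical; the usage shows that delivering the same requests in the same order to every member of a group G = ⟨o_1, o_2⟩ keeps their states consistent.

```python
from typing import Any, Callable, Dict


class GroupObject:
    """Object o = (D_o, P_o): state D_o is reachable only through operations in P_o."""

    def __init__(self, name: str, data: Dict[str, Any], operations: Dict[str, Callable[..., Any]]):
        self.name = name
        self._data = data              # D_o
        self._operations = operations  # P_o

    def on_request(self, op: str, *args: Any) -> Any:
        # On receipt of a request for op in P_o, compute op and return its response.
        if op not in self._operations:
            raise ValueError(f"{op!r} is not an operation of {self.name}")
        return self._operations[op](self._data, *args)


# Hypothetical counter operations; `add` changes the abstract state D_o.
def add(data: Dict[str, Any], k: int) -> int:
    data["x"] += k
    return data["x"]


def read(data: Dict[str, Any]) -> int:
    return data["x"]


# Group G = <o_1, o_2>: every member receives the same requests in the same order.
group = [GroupObject(f"o{i}", {"x": 0}, {"add": add, "read": read}) for i in (1, 2)]
requests = [("add", 5), ("add", -2), ("read",)]
responses = [[o.on_request(*r) for r in requests] for o in group]
print(responses)  # [[5, 3, 3], [5, 3, 3]]
```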