Crash-tolerant Consensus in Directed Graph Revisited

Brief Announcement: Crash-Tolerant Consensus in Directed Graph Revisited

2017

We revisit the problem of distributed consensus in directed graphs tolerating crash failures, and we improve the round and communication complexity of the existing protocols. Moreover, we prove that the number of communication rounds our protocol uses is optimal for a specific class of crash-tolerant consensus protocols in directed graphs.

1998 ACM Subject Classification: B.8.1 Reliability, Testing and Fault-Tolerance; E.1 Distributed data structures

The consensus problem in fault-tolerant computing

ACM Computing Surveys, 1993

The consensus problem is concerned with the agreement on a system status by the fault-free segment of a processor population in spite of the possible inadvertent or even malicious spread of disinformation by the faulty segment of that population. The resulting protocols are useful throughout fault-tolerant parallel and distributed systems and will impact the design of decision systems to come. This paper surveys research on the consensus problem, compares approaches, outlines applications, and suggests directions for future work.

Unconditionally reliable message transmission in directed networks

2008

In the unconditionally reliable message transmission (URMT) problem, two non-faulty players, the sender S and the receiver R, are part of a synchronous network modeled as a directed graph. S has a message that he wishes to send to R; the challenge is to design a protocol such that after exchanging messages as per the protocol, the receiver [...]. A protocol for URMT is one of the fundamental primitives used by almost all fault-tolerant distributed algorithms, since without reliable communication little that is truly collaborative is possible. In fact, several popular fault-tolerant distributed algorithms, such as (randomized) Byzantine agreement, assume that the underlying network is a complete graph, thereby implicitly assuming the existence of a URMT protocol that can simulate a complete graph overlaid on the actual underlying network (for the actual connectivity is seldom complete in practice). Its applications in distributed computing aside, the problem of URMT is interesting and challenging in its own right.

Fault tolerance in networks of bounded degree

Achieving processor cooperation in the presence of faults is a major problem in distributed systems. Popular paradigms such as Byzantine agreement have been studied principally in the context of a complete network. Indeed, Dolev [J. Algorithms, 3 (1982), pp. 14-30] and Hadzilacos [Issues of Fault Tolerance in Concurrent Computations, Ph.D. thesis, Harvard University, Cambridge, MA, 1984] have shown that Ω(t) connectivity is necessary if the requirement is that all nonfaulty processors decide unanimously, where t is the number of faults to be tolerated. We believe that in foreseeable technologies the number of faults will grow with the size of the network while the degree will remain practically fixed. We therefore raise the question whether it is possible to avoid the connectivity requirements by slightly lowering our expectations. In many practical situations we may be willing to "lose" some correct processors and settle for cooperation between the vast majority of the processors. Thus motivated, we present a general simulation technique by which vertices (processors) in almost any network of bounded degree can simulate an algorithm designed for the complete network. The simulation has the property that although some correct processors may be cut off from the majority of the network by faulty processors, the vast majority of the correct processors will be able to communicate among themselves undisturbed by the (arbitrary) behavior of the faulty nodes.

Distributed Consensus, revisited

Acta Informatica, 2007

We provide a novel model to formalize a well-known algorithm, by Chandra and Toueg, that solves Consensus among asynchronous distributed processes in the presence of a particular class of failure detectors (◇S or, equivalently, Ω), under the hypothesis that only a minority of processes may crash. The model is defined as a global transition system that is unambiguously generated by local transition rules. The model is syntax-free in that it does not refer to any form of programming language or pseudo code. We use our model to formally prove that the algorithm is correct.

The original publication is available at www.springerlink.com

Footnote 1: Actually, the algorithm may easily reach system configurations in which, at a certain point in time, every process is coordinator in its current round, while all processes are in pairwise different rounds, by having every participant simply always suspect the respective coordinator. Analogously, the algorithm may easily reach moments in which none of the processes is the coordinator of its round. Moreover, in such a moment, it is impossible to predict, from a chronological point of view, which process will next become coordinator.

Initial failures in distributed computations

International Journal of Parallel Programming, 1989

We investigate the possibility of solving problems in completely asynchronous message passing systems where a number of processes may fail prior to execution. By using game-theoretical notions, necessary and sufficient conditions are provided for solving problems in such a model with and without a termination requirement. An upper bound on the message complexity for solving any problem in the model is given, as well as a simple design concept for constructing a solution to any solvable problem.

Byzantine vector consensus in complete graphs

Proceedings of the 2013 ACM symposium on Principles of distributed computing, 2013

Consider a network of n processes, each of which has a d-dimensional vector of reals as its input. Each process can communicate directly with all the processes in the system; thus the communication network is a complete graph. All the communication channels are reliable and FIFO (first-in-first-out).
• We prove that in a synchronous system, n ≥ max(3f + 1, (d + 1)f + 1) is necessary and sufficient for achieving Byzantine vector consensus.
• In an asynchronous system, it is known that exact consensus is impossible in the presence of faulty processes. For an asynchronous system, we prove that n ≥ (d + 2)f + 1 is necessary and sufficient to achieve approximate Byzantine vector consensus.
Our sufficiency proofs are constructive. We prove sufficiency by providing explicit algorithms that solve exact BVC in synchronous systems, and approximate BVC in asynchronous systems.
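As a rough illustration of the two bounds stated in this abstract, the following Python sketch computes the smallest n that satisfies each inequality for a given dimension d and number of faults f. The function names are hypothetical and chosen only for this example; the sketch evaluates the stated inequalities and does not implement any consensus algorithm.

```python
def min_processes_sync(d: int, f: int) -> int:
    """Smallest n with n >= max(3f + 1, (d + 1)f + 1) (synchronous, exact BVC)."""
    return max(3 * f + 1, (d + 1) * f + 1)


def min_processes_async(d: int, f: int) -> int:
    """Smallest n with n >= (d + 2)f + 1 (asynchronous, approximate BVC)."""
    return (d + 2) * f + 1


if __name__ == "__main__":
    # (d, f) pairs picked arbitrarily for illustration.
    for d, f in [(1, 1), (2, 1), (3, 2)]:
        print(f"d={d}, f={f}: sync n >= {min_processes_sync(d, f)}, "
              f"async n >= {min_processes_async(d, f)}")
```

For d = 1 the synchronous bound reduces to the familiar 3f + 1, while for d ≥ 2 the (d + 1)f + 1 term is at least as large, so the dimension of the input vectors determines the requirement.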

The weakest failure detector for solving consensus

1992

We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In Chandra and Toueg [1996], it is shown that ◇W, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as ◇W. Thus, ◇W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.

Distributed Consensus with Finite Message Passing

Computing Research Repository, 2010

Inspired by distributed resource allocation problems in dynamic topology networks, we initiate the study of distributed consensus with finite message passing. We first find a sufficient condition on the network graph for which no distributed protocol can guarantee a conflict-free allocation after R rounds of message passing. Secondly, we fully characterize the conflict-minimizing zero-round protocol for path graphs, namely random allocation, which partitions the graph into small conflict groups. Thirdly, we enumerate all one-round protocols for path graphs and show that the best one further partitions each of the smaller groups. Finally, we show that the number of conflicts decreases to zero as the number of available resources increases. [...] $\frac{1}{c^{n}} \sum_{d=0}^{n-1} d\,N(d)$, since there are $c^{n}$ different possible colorings. Finally, we define two classes of protocols that qualify the worst-case performance.
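To make the averaging over all c^n colorings concrete, here is a minimal Python sketch of the zero-round random-allocation setting on a path graph. It rests on assumptions that the abstract does not state: a "conflict" is taken to be a pair of adjacent nodes holding the same resource (color), N(d) is taken to be the number of colorings with exactly d such pairs, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def conflict_histogram(n: int, c: int) -> list[int]:
    """Return counts[d] = number of colorings of an n-node path with d conflicts.

    Assumption: a conflict is an edge whose two endpoints share a color,
    so d ranges over 0..n-1.
    """
    counts = [0] * n
    for coloring in product(range(c), repeat=n):
        d = sum(coloring[i] == coloring[i + 1] for i in range(n - 1))
        counts[d] += 1
    return counts


def expected_conflicts(n: int, c: int) -> Fraction:
    """Average number of conflicts over all c**n equally likely colorings."""
    counts = conflict_histogram(n, c)
    total = sum(d * count for d, count in enumerate(counts))
    return Fraction(total, c ** n)


if __name__ == "__main__":
    # Small brute-force cases; the enumeration is exponential in n.
    for n, c in [(3, 2), (4, 3), (3, 4)]:
        print(f"n={n}, c={c}: expected conflicts = {expected_conflicts(n, c)}")
```

Under these assumptions the brute-force average equals (n − 1)/c, which shrinks toward zero as the number of resources c grows, in line with the abstract's final claim about available resources.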