Sergio Arévalo - Academia.edu

Papers by Sergio Arévalo

Computing, 2020

Uniform reliable broadcast (URB) is an important abstraction in distributed systems, offering delivery guarantees when disseminating messages among processes. Informally, URB guarantees that if a process (correct or not) delivers a message m, then all correct processes deliver m. This abstraction has been extensively investigated in distributed systems where all processes have unique identifiers. Furthermore, most papers in the literature assume that the communication channels of the system are reliable, which is not always the case in real systems. In this paper, the URB abstraction is investigated in anonymous asynchronous message-passing distributed systems with process crash failures and fair lossy channels. To that end, we assume the availability of a random number generator that generates globally unique values with very high probability. First, a simple URB algorithm is given assuming a majority of correct processes. Then, we prove that URB cannot be solved without a majority of correct processes if no failure detector is used. Subsequently, a new failure detector class AΘ is proposed, which can be used to implement URB with any number of correct processes. However, these two URB algorithms are non-quiescent: to offset the loss of messages caused by fair lossy channels, every correct process has to broadcast all URB_delivered messages forever. Hence, a perfect anonymous failure detector AP* is proposed which, together with AΘ, makes the URB algorithm quiescent. Finally, we discuss an alternative failure detector AΘP*, which combines the properties of AΘ and AP*.
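
As a rough illustration of the majority-based delivery rule mentioned in this abstract, the following Python sketch tracks acknowledgements for a message in an anonymous system. It is not the paper's algorithm. The class name `UrbReceiver`, the use of random 64-bit tags to stand in for the random number generator, the assumption that n is known, and the rule "deliver once acknowledgements carrying more than n/2 distinct tags have been seen" are all assumptions made for illustration. Retransmission over fair lossy channels is left out.

```python
import secrets


class UrbReceiver:
    """Hypothetical per-process state for a majority-ack delivery rule."""

    def __init__(self, n):
        self.n = n            # total number of processes (assumed known)
        self.ack_tag = {}     # message tag -> this process's own ack tag
        self.acks = {}        # message tag -> set of distinct ack tags seen
        self.delivered = []   # message tags delivered, in delivery order

    def on_msg(self, msg_tag):
        """Receive a message and return the ack tag to broadcast for it."""
        # Anonymous processes have no identifiers, so each ack carries a
        # random tag that is unique with very high probability.
        if msg_tag not in self.ack_tag:
            self.ack_tag[msg_tag] = secrets.randbits(64)
        return self.ack_tag[msg_tag]

    def on_ack(self, msg_tag, ack_tag):
        """Record an ack; return True exactly when the message is delivered."""
        seen = self.acks.setdefault(msg_tag, set())
        seen.add(ack_tag)     # repeated acks with the same tag count once
        if len(seen) > self.n // 2 and msg_tag not in self.delivered:
            self.delivered.append(msg_tag)
            return True
        return False
```

Counting distinct tags rather than distinct senders is what lets the rule work without process identifiers: a retransmitted acknowledgement reuses its tag and is therefore not counted twice.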

As is well known, partial failures caused by node crashes or communication faults are an inherent characteristic of distributed systems. Distributed applications therefore have to be fault tolerant. At present, most fault-tolerance services are custom-built for each distributed application, with little or no reuse. To address this lack of reuse, we are working on the definition of a distributed component architecture for fault tolerance. Within this architecture we have defined a framework that provides the failure detection service. We have characterized this framework as a reusable component that can be parameterized to offer the semantics of a failure detector for synchronous or partially synchronous distributed systems. In this work we present the design and some implementation aspects of this component.

Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing

In this paper we study the implementability of different classes of failure detectors in several models of partial synchrony. We show that no failure detector with perpetual accuracy (namely, P, Q, S, and W) can be implemented in any of the models of partial synchrony proposed in [3] and [5] in systems with even a single failure. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of class Θ.

International Journal of Parallel, Emergent and Distributed Systems, 2013

The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve several agreement problems, such as Consensus. In this paper, algorithms that implement failure detectors in partially synchronous systems are presented. First, two simple algorithms that implement the weakest class that solves Consensus, namely the Eventually Strong class (♦S), are presented. While the first algorithm is wait-free, the second is f-resilient, where f is a known upper bound on the number of faulty processes. Both algorithms guarantee that, eventually, all the correct processes agree permanently on a common correct process, i.e., they also implement a failure detector of the class Omega (Ω). They are also shown to be optimal in terms of the number of communication links used forever. Additionally, a wait-free algorithm that implements a failure detector of the Eventually Perfect class (♦P) is presented. This algorithm is shown to be optimal in terms of the number of bidirectional links used forever. Research partially supported by the Spanish Research Council, under grants TIN2005-09198-C02-01, TIN2007-67353-C02-02, and TIN2008-06735-C02-01, and the Comunidad de Madrid, under grant S-0505/TIC/0285. A preliminary version of this article was presented at SRDS'2000 [22].
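
For readers unfamiliar with how an eventually perfect detector can be obtained under partial synchrony, the sketch below shows the classic adaptive-timeout idea: suspect a peer whose heartbeat is overdue, and lengthen that peer's timeout whenever a suspicion turns out to be wrong. This is a generic illustration, not one of the link-optimal algorithms of this paper. The class name `AdaptiveTimeoutDetector`, the doubling policy, and the "trust the smallest non-suspected identifier" leader rule are assumptions chosen for brevity.

```python
class AdaptiveTimeoutDetector:
    """Hypothetical heartbeat-based detector with per-peer adaptive timeouts."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; undo a false suspicion and become more patient."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)
            self.timeout[peer] *= 2  # assumed policy: double after a mistake

    def tick(self, now):
        """Suspect every peer whose heartbeat is overdue; return the suspects."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)

    def leader(self, self_id):
        """Trust the smallest identifier that is not currently suspected."""
        candidates = (set(self.timeout) | {self_id}) - self.suspected
        return min(candidates)
```

Once message delays stabilize, each timeout grows past the actual delay after finitely many mistakes, so correct peers stop being suspected while crashed peers stay suspected. At that point every correct process computes the same `leader`, which is the Ω behaviour described in the abstract.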

Lecture Notes in Computer Science, 1999

Unreliable failure detectors, proposed by Chandra and Toueg [2], are mechanisms that provide information about process failures. In [2], eight classes of failure detectors were defined, depending on how accurate this information is, and an algorithm implementing a failure detector of one of these classes in a partially synchronous system was presented. This algorithm is based on all-to-all communication, and periodically exchanges a number of messages that is quadratic in the number of processes. To our knowledge, no other algorithm implementing these classes of unreliable failure detectors has been proposed. In this paper, we present a family of distributed algorithms that implement four classes of unreliable failure detectors in partially synchronous systems. Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
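
To make the quadratic-versus-linear comparison concrete, the sketch below enumerates who monitors whom under the two patterns. It is only an illustration of the counting argument, assuming one message per monitoring link per period. The function names and the rule that a process skips suspected successors are assumptions, not details taken from the paper.

```python
def all_to_all_links(n):
    """Every process monitors every other process: n * (n - 1) links."""
    return [(p, q) for p in range(n) for q in range(n) if p != q]


def ring_links(n, suspected=frozenset()):
    """Each non-suspected process monitors its nearest non-suspected successor."""
    alive = [p for p in range(n) if p not in suspected]
    if len(alive) < 2:
        return []  # nobody left to monitor
    return [(alive[i], alive[(i + 1) % len(alive)]) for i in range(len(alive))]
```

With five processes, the all-to-all pattern uses 20 monitoring links and the ring uses 5. If process 1 is suspected, the ring closes around it and process 0 monitors process 2 instead.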

Journal of Parallel and Distributed Computing, 2005

The concept of unreliable failure detector was introduced by Chandra and Toueg as a mechanism that provides information about process failures. This mechanism has been used to solve different problems in asynchronous systems, in particular the Consensus problem. In this paper, we present a new class of unreliable failure detectors, which we call Eventually Consistent and denote by ♦C. This class combines the failure detection capabilities of class ♦S with the eventual leader election capability of class Ω. This capability allows all correct processes to eventually choose the same correct process as leader. We study the relationship between ♦C and other classes of failure detectors. We also propose an efficient algorithm to transform ♦C into ♦P in models of partial synchrony. Finally, to show the power of this new class of failure detectors, we present a Consensus algorithm based on ♦C. This algorithm successfully exploits both the leader election and the failure detection capabilities of the failure detector, and performs better in terms of the number of rounds than all previously proposed algorithms for ♦S.

Information Processing Letters, 2011

The failure detector class Omega (Ω) provides an eventual leader election functionality, i.e., eventually all correct processes permanently trust the same correct process. An algorithm is communication-efficient if the number of links that carry messages forever is bounded by n, where n is the number of processes in the system. An algorithm is crash-quiescent if it eventually stops sending messages to crashed processes. In this regard, it has recently been shown that Ω cannot be implemented crash-quiescently without a majority of correct processes. We say that the membership is unknown if each process p_i only knows its own identity and the number of processes in the system (that is, i and n), but p_i does not know the identity of the rest of the processes in the system. There is a type of link (denoted ADD link) in which a bounded (but unknown) number of consecutive messages can be delayed or lost. In this work we present the first implementation (to our knowledge) of Ω in partially synchronous systems with ADD links and with unknown membership. Furthermore, it is the first implementation of Ω that combines two very interesting properties: communication-efficiency and crash-quiescence when the majority of processes are correct. Finally, we also obtain with the same algorithm a failure detector (♦P) such that every correct process eventually and permanently outputs the set of all correct processes.

IEEE Transactions on Computers, 2004

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. In this paper, we study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, P, Q, S, and W) can be implemented in these models in systems with even a single failure. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class Θ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, ♦P, ♦Q, ♦S, and ♦W). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
