From Static Distributed Systems to Dynamic Systems

On the Study of Dynamic and Adaptive Dependable Distributed Systems

International Conference on Software and Data Technologies, 2009

Due to the use of MANETs and some kinds of collaborative applications (P2P), current distributed systems are becoming increasingly dynamic; i.e., it is difficult to manage membership information and to forecast the accessibility of each system node. Moreover, dependable applications for static distributed systems also need to provide good adaptability (to different request arrival rates, usage patterns, classes of requests, ...) and good scalability; the cloud computing paradigm is a case in point. Developing dependable applications in dynamic and adaptive systems is not trivial, since both dynamism and adaptability may compromise algorithm liveness or complicate the design of such algorithms, especially those best suited for static systems. Strategies for building adaptable and scalable dependable services (based on "cloud systems") will be surveyed and improved. Moreover, efficient support for dependable applications in dynamic systems will be provided by combining three different approaches: relaxed consistency models, interconnection protocols (for supporting both consistency and multicasting) and reconciliation strategies. Last but not least, the use of and support for integrity constraints in replicated systems will also be analyzed and improved for dynamic systems.

19th International Conference on Principles of Distributed Systems, OPODIS 2015, December 14-17, 2015, Rennes, France

2016

Communication and agreement are fundamental abstractions in any distributed system. (If the computing entities do not need to communicate or agree in one way or another, the system is not a distributed system!) This tutorial was devoted to the design of such abstractions built on top of signature-free asynchronous distributed systems prone to Byzantine process failures. It is made up of three parts, each devoted to an abstraction and algorithms that implement it. 1998 ACM Subject Classification C.2.4 [Computer-Communication Network] Distributed Systems – distributed applications, network operating systems, D.4.5 [Operating Systems] Reliability – fault-tolerance, F.1.1 [Computation by Abstract Devices] Models of Computation, Computability theory

The Communication Mechanism in a Distributed System

International Journal for Electronic Crime Investigation

This research discusses problems in dynamic distributed systems that relate to sharing data and communicating from one system to another over the network. A distributed system does its work by sending messages to and receiving messages from its related systems over the internet. A dynamic distributed system involves many changeable kinds of networks, different operating systems such as Android, Mac, and Windows, software portability across different processors, WAN breakdowns, and inter-process communication errors. Another problem that occurs in distributed systems is latency. It is therefore very difficult to develop software for these types of environments. The proposed work aims to make message communication in distributed systems easy, reliable, and efficient. Coherence is responsible for the sharing of data. Every problem can be solved, but proper methods and algorithms are required. We c...

Maintaining group connectivity in dynamic asynchronous distributed systems

In the context of asynchronous distributed systems with infinitely many processes, this paper studies the problem of maintaining connectivity among a set of processes forming a group in a dynamic context where (i) processes can arrive to and depart from the group and (ii) processes have a partial knowledge of other processes belonging to the group. In this setting we give the specification of a new problem, namely the Dynamic Group Connectivity (DGC), we provide a few impossibility results and give a deterministic protocol solving the problem. We give, in such a dynamic context, (i) the specification of a service of reliable broadcast showing that it is equivalent to DGC and (ii) the specification of a service of atomic broadcast and a solution based on the protocol presented to solve DGC.

The dynamic tree protocol: avoiding 'graceful degradation' in the tree protocol for distributed mutual exclusion

1992

This paper presents a modification of the tree protocol for distributed mutual exclusion. The existing tree protocol is very efficient as long as all nodes in the system are accessible, but suffers from performance degradation during the time when some nodes are down or partitioned. Although failures may be infrequent, once a node is down, it may remain down for a relatively long time. Also, planned downtime takes on the order of hours. Thus, the above property of the existing tree protocol is undesirable. In our modified protocol, a node requesting mutual exclusion may experience performance degradation at most once after some failures occur but then it adapts to a new system topology and its performance returns to normal. In the case when all nodes in the system are accessible, our protocol exhibits the same performance as the existing tree protocol.

Reliable communication in the presence of failures

ACM Transactions on Computer Systems, 1985

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
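To make the idea of causal delivery ordering concrete, here is a minimal sketch of a receiver that buffers broadcasts until everything that causally precedes them has been delivered. It uses vector timestamps, which is a commonly taught technique swapped in for illustration and not necessarily the mechanism of the protocols in this paper. The names `CausalReceiver`, `receive`, and `demo` are hypothetical.

```python
from typing import List, Tuple

# (sender index, sender's vector timestamp at send time, payload)
Message = Tuple[int, List[int], str]


class CausalReceiver:
    """Buffers incoming broadcasts and delivers them in causal order.

    Assumption: each sender stamps a message with a vector whose own entry
    counts its broadcasts so far and whose other entries count the messages
    it had delivered from each other process before sending.
    """

    def __init__(self, n_processes: int) -> None:
        self.delivered_count = [0] * n_processes  # deliveries per sender
        self.pending: List[Message] = []          # received, not yet deliverable
        self.delivered: List[str] = []            # payloads in delivery order

    def _deliverable(self, msg: Message) -> bool:
        sender, stamp, _ = msg
        # Must be the next broadcast expected from this sender ...
        if stamp[sender] != self.delivered_count[sender] + 1:
            return False
        # ... and everything the sender had delivered must be delivered here.
        return all(
            stamp[k] <= self.delivered_count[k]
            for k in range(len(stamp))
            if k != sender
        )

    def receive(self, msg: Message) -> None:
        self.pending.append(msg)
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for m in list(self.pending):
                if self._deliverable(m):
                    self.pending.remove(m)
                    self.delivered_count[m[0]] += 1
                    self.delivered.append(m[2])
                    progress = True


def demo() -> List[str]:
    receiver = CausalReceiver(3)
    # Process 1 sent "answer" after delivering process 0's "question",
    # but the network hands "answer" to this receiver first.
    receiver.receive((1, [1, 1, 0], "answer"))
    receiver.receive((0, [1, 0, 0], "question"))
    return receiver.delivered
```

In `demo`, the "answer" message is held back until the "question" it depends on arrives, so the receiver delivers the two in causal order even though the network reordered them.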

Broadcast protocols for distributed systems

IEEE Transactions on Parallel and Distributed Systems, 1990

We present an innovative approach to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations such as locking, update and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms.

[...] processors agree on exactly the same sequence of broadcast messages. It is easy to demonstrate that placing a total order on broadcast messages, so that every working processor processes the same messages in the same order, provides an immediate solution to the agreement problem. Once this total order is determined, distributed actions can be carried out using simple sequential fault-tolerant algorithms. The strategy is very efficient: for example, locking records in a distributed database typically requires only a single broadcast message to claim a lock and a single broadcast message to release it.
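The claim that a total order on broadcast messages immediately yields agreement can be illustrated with a small sketch. It assumes the total order is already given as a list and does not implement the two protocols named in the abstract. Each replica applies the same lock messages with a simple sequential rule. `LockReplica`, `run_replicas`, and the message layout are hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

# (operation, record id, processor id); operation is "claim" or "release"
LockMsg = Tuple[str, str, str]


class LockReplica:
    """One processor's view of the lock table, updated sequentially."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}  # record id -> processor holding the lock

    def apply(self, msg: LockMsg) -> None:
        op, record, proc = msg
        if op == "claim":
            # The first claim in the total order wins; later claims on a
            # held record are ignored (a real system might queue them).
            self.owner.setdefault(record, proc)
        elif op == "release" and self.owner.get(record) == proc:
            del self.owner[record]


def run_replicas(ordered: List[LockMsg], n: int) -> List[Dict[str, str]]:
    """Feed the same totally ordered message sequence to n replicas."""
    replicas = [LockReplica() for _ in range(n)]
    for msg in ordered:
        for replica in replicas:
            replica.apply(msg)
    return [replica.owner for replica in replicas]


# Assumed total order, as some ordering protocol would produce it.
ORDERED: List[LockMsg] = [
    ("claim", "r1", "A"),
    ("claim", "r1", "B"),    # loses: r1 is already held by A
    ("claim", "r2", "B"),
    ("release", "r1", "A"),
    ("claim", "r1", "B"),    # succeeds now that r1 is free
]

tables = run_replicas(ORDERED, 3)
```

Because every replica sees the same sequence and the update rule is deterministic, all three lock tables end up identical without any further message exchange: one broadcast claims a lock and one releases it.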

4-dimensional Model for Describing Status of Peers in Peer-to-Peer Distributed Systems

TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2013

One of the important aspects of decision making and management in distributed systems is collecting accurate information about the available resources of the peers. The previously proposed approaches for collecting such information completely depend on the system's architecture. In the server-oriented architecture, servers assume the main role of collecting comprehensive information from the peers and the system. Next, based on the information about the features of the basic activities and the system, an exact description of the peers' status is produced. Accurate decisions are then made using this description. However, the amount of information gathered in this architecture is too large, and it requires massive processing. On the other hand, updating the information takes time, causing delays and undermining the validity of the information. In addition, due to the limitations imposed by the servers, such architecture is not scalable and dynamic enough. The peer-to-peer architecture was introduced to address these concerns. However, due to a lack of complete knowledge of the peers and the system, the decisions are made without a precise description of the peers' status and are only based on the hardware data collected from the peers. Such an abstract and general image of the peers is not adequate for the purpose of decision making. In this paper, a 4-dimensional model is presented for the purpose of information collection and the exact description of the peer's status, including the features of the peer, the basic activity, the time, and the specifications of the system. The proposed model is for a server-oriented architecture, but it also adapts to the peer-to-peer serverless architecture. Based on this model, a new approach is introduced for information collection and an exact description of the peers' status in a peer-to-peer system based on the Latin square concept. We evaluate the model in the server-oriented and serverless situations. 
The workload is considered as the basic activity in our evaluation. Our evaluation demonstrates that in a server-oriented situation, increasing the size of the system has a direct relation with time. However, a serverless situation does not follow this behavior.

Building global and scalable systems with atomic multicast

Proceedings of the 15th International Middleware Conference on - Middleware '14, 2014

The rise of worldwide Internet-scale services demands large distributed systems. Indeed, when handling several millions of users, it is common to operate thousands of servers spread across the globe. Here, replication plays a central role, as it helps improve the user experience by hiding failures and by providing acceptable latency. In this thesis, we claim that atomic multicast, with strong and well-defined properties, is the appropriate abstraction to efficiently design and implement globally scalable distributed systems. Internet-scale services rely on data partitioning and replication to provide scalable performance and high availability. Moreover, to reduce user-perceived response times and tolerate disasters (i.e., the failure of a whole datacenter), services are increasingly becoming geographically distributed. Data partitioning and replication, combined with local and geographical distribution, introduce daunting challenges, including the need to carefully order requests among replicas and partitions. One way to tackle this problem is to use group communication primitives that encapsulate order requirements. While replication is a common technique used to design such reliable distributed systems, to cope with the requirements of modern cloud-based "always-on" applications, replication protocols must additionally allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. We propose a dynamic atomic multicast protocol which fulfills these requirements. It allows resources to be dynamically added to and removed from an online replicated state machine, and crashed processes to be recovered. Major efforts have been spent in recent years to improve the performance, scalability and reliability of distributed systems. In order to hide the complexity of designing distributed applications, many proposals provide efficient high-level communication abstractions. 
Since the implementation of a production-ready system based on this abstraction is still a major task, we further propose to expose our protocol to developers in the form of distributed data structures. B-trees, for example, are commonly used in different kinds of applications, including database indexes or file systems. Providing a distributed, fault-tolerant