On the Impact of Fault-Tolerance Mechanisms in a Peer-to-Peer Middleware

Lightweight Fault-Tolerance for Peer-to-Peer Middleware

2010

We address the problem of providing transparent, lightweight fault-tolerance mechanisms for generic peer-to-peer middleware systems. The main idea is to use the peer-to-peer overlay to provide fault tolerance rather than to support it higher up in the middleware architecture, e.g., in the form of services. To evaluate our approach, we have implemented a fault-tolerant middleware prototype that uses a hierarchical peer-to-peer overlay in which the leaf peers connect to sensors that provide data streams.

A platform for creating efficient, robust, and resilient peer-to-peer systems

The rapid growth of communication environments such as the Internet has spurred the development of a wide range of systems and applications based on peer-to-peer ideologies. As these applications continue to evolve, there is an increasing effort towards improving their overall performance. This effort has led to the incorporation of measurement-based adaptivity mechanisms and network awareness into peer-to-peer applications, which can greatly increase peer-to-peer performance and dependability. Unfortunately, these mechanisms are often vulnerable to attack, making the entire solution less suitable for real-world deployment. In this dissertation, we study how to create robust system components for adaptivity, for network awareness, and for responding to identified threats. These components can form the basis for creating efficient, high-performance, and resilient peer-to-peer systems.

A peer-to-peer middleware platform for fault-tolerant, QoS, real-time computing

2008

In this paper we present the architecture of RTP M, a middleware framework aimed at supporting the development and management of information systems for high-speed public transportation systems. The framework is based on a peer-to-peer overlay infrastructure whose main focus is to provide a scalable, resilient, reconfigurable, highly available platform for real-time and QoS computing.

Peer-To-Peer Middleware

Mahmoud/Middleware for Communications, 2005

Peer-to-Peer networking has great potential to make a vast amount of resources accessible [19]. Several years ago, file sharing applications like Napster [15] and Gnutella [8] impressively demonstrated the possibilities for the first time. Because of their success, Peer-to-Peer mistakenly became synonymous with file sharing. However, the fundamental Peer-to-Peer concept is general and not limited to a specific application type. Thus, a broader field of applications can benefit from using Peer-to-Peer technology. Content delivery [11], media streaming [22], games [14], and collaboration tools [9] are examples of application fields that use Peer-to-Peer networks today. Although Peer-to-Peer networking is still an emerging area, some Peer-to-Peer concepts are already applied successfully in different contexts. Good examples are Internet routers, which deliver IP packets along paths that are considered efficient. These routers form a decentralized, hierarchical network. They consider each other peers, which collaborate in the routing process and in updating each other. Unlike centralized networks, they can compensate for node failures and remain functional as a network. Another example of decentralized systems with analogies to Peer-to-Peer networks is the Usenet [31]. Considering those examples, many Peer-to-Peer concepts are nothing new. However, Peer-to-Peer takes these concepts from the network to the application layer, where software defines the purpose and algorithms of virtual (non-physical) Peer-to-Peer networks. Widely used Web-based services such as Google, Yahoo, Amazon, and eBay can handle a large number of users while maintaining a good degree of failure tolerance. These centralized systems offer a higher level of control, are easier to develop, and perform more predictably than decentralized systems. Thus, a pure Peer-to-Peer system would be an inappropriate choice for applications demanding a certain degree of control, for example over "who may access what".
Although Peer-to-Peer systems cannot replace centralized systems, there are areas where they can complement them. For example, Peer-to-Peer systems encourage direct collaboration between users. If a centralized system in between is not required, this approach can be more efficient because the communication path is shorter. In addition, a Peer-to-Peer (sub-)system does not require additional server logic and is more resistant to server failures. For similar reasons, because they shift workload and traffic from servers to peers, Peer-to-Peer systems could reduce the infrastructure that centralized systems require. In this way, Peer-to-Peer networks could cut acquisition and running costs for server hardware. This becomes increasingly relevant when one considers the growing number of end users with powerful computers connected by high-bandwidth links.

Engineering Realities of Building a Working Peer-to-Peer System

2004

The Herald project at Microsoft Research has built working implementations of several scalable peer-to-peer algorithms as part of our work on a scalable, fault-tolerant event notification system. Our goal has been to construct and validate implementations that will work on real networks at scale – not just to simulate such systems and reason about what might be buildable – but

Performance and Dependability of Structured Peer-to-Peer Overlays

2004

Structured peer-to-peer (p2p) overlay networks provide a useful substrate for building distributed applications. They map object keys to overlay nodes and offer a primitive to send a message to the node responsible for a key. They can implement, for example, distributed hash tables and multicast trees. However, there are concerns about the performance and dependability of these overlays in realistic environments. Several studies have shown that current p2p environments have high churn rates: nodes join and leave the overlay continuously. This paper presents techniques that continuously detect faults and repair the overlay to achieve high dependability and good performance in realistic environments. The techniques are evaluated using large-scale network simulation experiments with fault injection guided by real traces of node arrivals and departures. The results show that previous concerns are unfounded; our techniques can achieve dependable routing in realistic environments with an average delay stretch below two and a maintenance overhead of less than half a message per second per node.
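The key-to-node mapping described in this abstract can be illustrated with a minimal sketch. The abstract does not say which mapping rule the authors use, so the successor-on-a-ring rule below, the 32-bit identifier space, and the names `Overlay`, `to_id`, `responsible_node`, and `remove` are all assumptions made for illustration only.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # assumed identifier width, chosen only to keep numbers small


def to_id(name: str) -> int:
    """Hash a node name or object key into the shared identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


class Overlay:
    """Toy overlay: each key belongs to the first node at or after its id."""

    def __init__(self, node_names):
        # Sorted list of (identifier, name) pairs forms the ring.
        self.ring = sorted((to_id(n), n) for n in node_names)

    def responsible_node(self, key: str) -> str:
        ids = [node_id for node_id, _ in self.ring]
        # Wrap around to the first node when the key id exceeds every node id.
        idx = bisect_left(ids, to_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, name: str) -> None:
        """Model a node departure (churn): its keys fall to the next node."""
        self.ring = [(i, n) for i, n in self.ring if n != name]
```

In this sketch, removing the node returned by `responsible_node` for some key and asking again yields a different, surviving node. A real overlay would route the message hop by hop using per-node routing tables rather than consult a global sorted list, and keeping those tables correct under churn is the repair problem the abstract addresses.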

Node selection for a fault-tolerant streaming service on a peer-to-peer network

2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698), 2003

Peer-to-Peer (P2P) networks are attracting considerable research interest because of their scalability and high performance relative to cost. One of the important services on a P2P network is the streaming service. However, because each node in the P2P network is autonomous, it is difficult to provide a stable streaming service on the network. Therefore, for a stable streaming service on the P2P network, a fault-tolerant scheme must be provided. In this paper, we propose two new node selection schemes, Playback Node First (PNF) and Playback Node First with Prefetching (PNF-P), that can be used for a service migration-based fault-tolerant streaming service. The proposed schemes exploit the fact that the failure probability of a node currently being served is lower than that of a node not being served. Simulation results show that the proposed schemes outperform traditional node selection schemes.

A Repair Mechanism for Fault-Tolerance for Tree-Structured Peer-to-Peer Systems

As traditional resource management tools for computational grids reach their limits (in scale, dynamicity, etc. of the newly considered platforms), new approaches based on peer-to-peer technologies are emerging. Resource discovery, and service discovery in particular, is affected by this evolution. Among the solutions, a promising one is the indexing of resources using trie structures, and more particularly prefix trees. The major advantage of trie-structured approaches is the ability to support search queries on ranges of values with a latency that grows logarithmically in the number of nodes in the trie. Those techniques are easy to extend to multicriteria searches. One drawback of tries is their inherently poor robustness in a dynamic environment, where nodes join and leave the network: the tree can split into a forest, which makes it impossible to route requests. In most recent approaches, fault tolerance is a prevention mechanism, often replication-based. Replication can be costly in terms of the resources required. In this paper, we propose a fault-tolerance protocol that reconnects subtrees a posteriori, after crashes, to restore a connected graph, and then reorders the nodes to rebuild a consistent tree.

Dagstream: Locality aware and failure resilient peer-to-peer streaming

2006

Live peer-to-peer (P2P) media streaming faces many challenges, such as peer unreliability and bandwidth heterogeneity. To address these challenges effectively, general "mesh"-based P2P streaming architectures have recently been adopted. Mesh-based systems allow peers to aggregate bandwidth from multiple neighbors and to adapt dynamically to changing network conditions and neighbor failures.

DKS(N, k, f): A Family of Low Communication, Scalable and Fault-Tolerant Infrastructures for P2P Applications

2003

In this paper, we present DKS(N, k, f), a family of infrastructures for building Peer-To-Peer applications. Each instance of DKS(N, k, f) is a fully decentralized overlay network characterized by three parameters: N, the maximum number of nodes that can be in the network; k, the search arity within the network; and f, the degree of fault-tolerance. Once these parameters are instantiated, the resulting network has several desirable properties. The first property, which is the main contribution of this paper, is that there is no separate procedure for maintaining routing tables; instead, any out-of-date or erroneous routing entry is eventually corrected on the fly, thereby eliminating unnecessary bandwidth consumption. The second property is that each lookup request is resolved in at most $\log_k(N)$ overlay hops under normal operations. Third, each node maintains only $(k - 1)\log_k(N) + 1$ addresses of other nodes for routing purposes. Fourth, new nodes can join and existing nodes can leave at will with a negligible disturbance to the ability to resolve lookups in $\log_k(N)$ hops on average. Fifth, the probability of getting a lookup failure for a key/value pair that was inserted in the system is negligible. Sixth, even if f consecutive nodes fail simultaneously, correct lookup is still guaranteed.
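To make the trade-off between the hop bound and the routing-table size concrete, the short sketch below evaluates both formulas from the abstract for a few arities. The function names `lookup_hops` and `routing_entries` are hypothetical, and the sketch assumes N is treated as a power of k (otherwise the logarithm is rounded up); it only does the arithmetic and is not the DKS protocol itself.

```python
def lookup_hops(n: int, k: int) -> int:
    """Smallest h with k**h >= n, i.e. log_k(n) rounded up, using integers only."""
    hops, reach = 0, 1
    while reach < n:
        reach *= k
        hops += 1
    return hops


def routing_entries(n: int, k: int) -> int:
    """Addresses kept per node according to the abstract: (k - 1) * log_k(N) + 1."""
    return (k - 1) * lookup_hops(n, k) + 1


if __name__ == "__main__":
    n = 2 ** 20  # assumed network capacity, chosen for illustration
    for k in (2, 4, 16):
        print(k, lookup_hops(n, k), routing_entries(n, k))
```

For N = 2^20 this gives 20 hops and 21 entries with k = 2, 10 hops and 31 entries with k = 4, and 5 hops and 76 entries with k = 16: a larger arity shortens lookups at the cost of a larger routing table.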