Building a Reliable Message Delivery System Using the CORBA Event Service

Fault Tolerant Reliable Delivery of Events in Distributed Middleware Systems

2005

Reliable delivery of events (or messages; the two terms are used interchangeably) is an important problem in distributed systems. Increasingly, interactions between entities within a distributed system are encapsulated in events. In this paper we present our strategy for enabling reliable delivery of events in the presence of link and node failures.

Quality of service in event-based systems

2010

Future software systems must be responsive to events and must be able to adapt the software to enhance business processes. Examples are production and logistics processes that must be rescheduled based on relevant traffic information. It is therefore essential that relevant events are detected and transported to the software components responsible for dynamic changes. The trust in such reactive systems depends to a large extent on the Quality of Service (QoS) provided by the underlying event system. This paper introduces QoS aspects in event-based systems and discusses ways of identifying and evaluating QoS needs. This is done by identifying the different system layers, and the quality requirements at each layer based on the layer's functionality.

Resilient and Preserving Dissemination of Events in a Large-Scale Event Notification Service System

An event notification service system is a data dissemination technology that asynchronously notifies consumers whose interests match the events published by producers. Fault tolerance is important for a large-scale event notification service system because link and node failures are common in a wide-area network. In this paper, we first describe an architecture for an event notification service system for large-scale networks that minimizes the size of routing entities and reduces the latency of notification delivery to consumers. We then present a replication algorithm based on primary-backup replication, so that the event notification service system is resilient to failures of event servers and of the links between them and continues to disseminate events. The replication technique used in the proposed architecture can minimize the portion of the system affected by these failures.

Managing Fault Tolerance Transparently Using CORBA Services

Lecture Notes in Computer Science, 1999

Fault tolerance problems arise in large-scale distributed systems because application components may eventually fail due to hardware problems, operator mistakes or design faults. Fault tolerance mechanisms must be employed to reduce the susceptibility of a given system to failure. In this paper, we describe the design of an architecture to overcome potential application component failures, using CORBA, a distributed object middleware specified by the OMG. Of primary importance to this architecture is OMG's CORBA Object Trading Service as the mechanism to advertise and manage service offers for fault tolerant application components. This mechanism enables clients transparently to detect a failed connection to a service object, to discover a similar backup service object and to reconnect to it. This improves overall system stability and enables scalability.
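The detect, discover, and reconnect cycle described in this abstract can be sketched in a few lines. The sketch below is a minimal illustration in plain Python, not the CORBA Object Trading Service API: `Trader`, `EchoService`, `FailoverProxy`, and `ServiceUnavailable` are hypothetical names introduced here, and the "trader" is just an in-memory table of offers.

```python
class ServiceUnavailable(Exception):
    """Raised by a service object whose connection has failed."""


class Trader:
    """Hypothetical stand-in for a trading service: maps a service type to offers."""

    def __init__(self):
        self._offers = {}

    def export(self, service_type, service):
        # Advertise a service offer under a service type.
        self._offers.setdefault(service_type, []).append(service)

    def withdraw(self, service_type, service):
        # Remove an offer, e.g. after it has been observed to fail.
        self._offers[service_type].remove(service)

    def query(self, service_type):
        # Return the offers currently advertised for a service type.
        return list(self._offers.get(service_type, []))


class EchoService:
    """Toy service object that can be switched into a failed state."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def call(self, payload):
        if not self.alive:
            raise ServiceUnavailable(self.name)
        return f"{self.name}:{payload}"


class FailoverProxy:
    """Client-side proxy that hides reconnection to a backup from the caller."""

    def __init__(self, trader, service_type):
        self._trader = trader
        self._type = service_type
        self._current = None

    def call(self, payload):
        while True:
            if self._current is None:
                offers = self._trader.query(self._type)
                if not offers:
                    raise ServiceUnavailable(f"no offers for {self._type}")
                self._current = offers[0]
            try:
                return self._current.call(payload)
            except ServiceUnavailable:
                # Failed connection detected: drop the offer and look for a backup.
                self._trader.withdraw(self._type, self._current)
                self._current = None
```

The caller only ever sees `proxy.call(...)`. The error is surfaced only when no equivalent offer remains, which is the sense in which the failover is transparent in this sketch.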

DREAM: Distributed Reliable Event-Based Application Management

Web Dynamics, 2004

New applications and the convergence of technologies, ranging from sensor networks to ubiquitous computing and from autonomic systems to event-driven supply chain management, require new middleware platforms that support proactive event notification. We present a system overview and discuss the principles of Dream, a reactive middleware platform that integrates event detection and composition mechanisms in a highly distributed environment; fault-tolerant and scalable event notification that exploits a variety of filter placement strategies; content-based notification to formulate powerful filters and concept-based notification to extend content-based filtering to heterogeneous environments; middleware-mediated transactions that integrate notifications and transactions; and scopes, which are administration primitives for both deployment- and runtime configurability, as well as for the management of policies. We discuss four prototypes that were implemented as proof of concept systems and present lessons learned from them.

Reliable communication in the presence of failures

ACM Transactions on Computer Systems, 1985

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
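To make "causal delivery ordering" concrete, the sketch below uses vector timestamps, a common textbook way to enforce it. This is an assumption made for illustration and is not claimed to be the mechanism of the protocol in the abstract; the class `CausalProcess` and its methods are hypothetical names. A message is held back until every message that causally precedes it has been delivered.

```python
class CausalProcess:
    """One group member that delivers multicast messages in causal order."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n   # clock[j] = number of messages from j delivered here
        self.pending = []      # received but not yet deliverable
        self.delivered = []    # payloads in delivery order

    def send(self, payload):
        # Sending counts as a local delivery; stamp the message with the new clock.
        self.clock[self.pid] += 1
        self.delivered.append(payload)
        return (self.pid, list(self.clock), payload)

    def _deliverable(self, sender, stamp):
        # It must be the next message from the sender ...
        if stamp[sender] != self.clock[sender] + 1:
            return False
        # ... and everything the sender had seen must already be delivered here.
        return all(stamp[k] <= self.clock[k]
                   for k in range(len(stamp)) if k != sender)

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                sender, stamp, payload = msg
                if self._deliverable(sender, stamp):
                    self.pending.remove(msg)
                    self.clock[sender] += 1
                    self.delivered.append(payload)
                    progress = True
```

For example, if process 1 sends a reply `m2` after delivering `m1`, a third process that receives `m2` first buffers it and delivers it only after `m1` arrives.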

Designing and Optimizing a Scalable CORBA Notification Service

ACM SIGPLAN Notices, 2001

Many distributed applications require a scalable event-driven communication model that decouples suppliers from consumers and simultaneously supports advanced quality of service (QoS) properties and event filtering mechanisms. The CORBA Notification Service provides a publish/subscribe mechanism that is designed to support scalable event-driven communication by routing events efficiently between many suppliers and consumers, enforcing various QoS properties (such as reliability, priority, ordering, and timeliness), and filtering events at multiple points in a distributed system. This paper provides several contributions to research on scalable notification services. First, we present the CORBA Notification Service architecture and illustrate how it addresses limitations with the earlier CORBA Event Service. Second, we explain how we addressed key design challenges faced when implementing the Notification Service in TAO, which is our high-performance, real-time ORB. We discuss the opt...
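The combination of decoupling, filtering, and a priority QoS property can be illustrated with a small in-process channel. This is a sketch under stated assumptions: it is not the CORBA Notification Service interface or TAO's implementation, and `EventChannel`, `subscribe`, `push`, and `dispatch` are hypothetical names chosen for the example. Events are plain dictionaries and filters are Python predicates.

```python
import heapq
import itertools


class EventChannel:
    """Toy publish/subscribe channel with per-consumer filters and priorities."""

    def __init__(self):
        self._subscribers = []          # (filter predicate, consumer callback)
        self._queue = []                # heap of (-priority, sequence, event)
        self._seq = itertools.count()   # tie-breaker: FIFO within one priority

    def subscribe(self, consumer, event_filter=lambda event: True):
        # Consumers register interest; suppliers never learn who they are.
        self._subscribers.append((event_filter, consumer))

    def push(self, event, priority=0):
        # Suppliers hand events to the channel, not to consumers.
        heapq.heappush(self._queue, (-priority, next(self._seq), event))

    def dispatch(self):
        # Deliver queued events, highest priority first, to matching consumers.
        while self._queue:
            _, _, event = heapq.heappop(self._queue)
            for event_filter, consumer in self._subscribers:
                if event_filter(event):
                    consumer(event)
```

A consumer subscribed with `lambda e: e["type"] == "alarm"` receives only alarm events, while an unfiltered consumer receives everything, with higher-priority events dispatched first.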

Reliability in three-tier systems without application server coordination and persistent message queues

2005

When dealing with fault tolerance in three-tier systems, two major problems need to be addressed: how to prevent duplicate transaction executions when classical timeout-based retransmission logic is employed, and how to ensure agreement among the back-end databases despite failures (a transaction needs to be aborted or committed at all the involved databases independently of the failure scenario). In this paper we address these problems by proposing a fault-tolerant protocol that, unlike previous solutions, (i) avoids the additional phase of storing the client request in a persistent message queue and (ii) avoids explicit coordination of middle-tier application servers (during both normal behavior and fail-over). Our protocol therefore reduces the overhead imposed on the end-to-end interaction, thus improving user-perceived responsiveness, and provides better scalability.
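The duplicate-execution problem is easy to reproduce: a reply is lost, the client times out and retransmits, and the transaction runs twice. The sketch below shows one generic remedy, tagging each request with an identifier that the back end remembers. It only illustrates the problem and is not the protocol proposed in the paper; `Database`, `LossyLink`, and `deposit` are hypothetical names, and the lost reply is simulated in memory.

```python
import uuid


class Database:
    """Back end that applies each request ID at most once."""

    def __init__(self):
        self.balance = 0
        self._applied = {}   # request_id -> result of the committed transaction

    def execute_once(self, request_id, amount):
        # A retransmitted request gets the recorded result instead of re-running.
        if request_id in self._applied:
            return self._applied[request_id]
        self.balance += amount
        self._applied[request_id] = self.balance
        return self.balance


class LossyLink:
    """Executes the request but drops the first `drop_replies` replies."""

    def __init__(self, database, drop_replies):
        self._db = database
        self._drop = drop_replies

    def request(self, request_id, amount):
        result = self._db.execute_once(request_id, amount)
        if self._drop > 0:
            self._drop -= 1
            raise TimeoutError("reply lost")
        return result


def deposit(link, amount, max_attempts=5):
    request_id = str(uuid.uuid4())   # the same ID is reused on every retransmission
    for _ in range(max_attempts):
        try:
            return link.request(request_id, amount)
        except TimeoutError:
            continue
    raise TimeoutError("gave up")
```

With two dropped replies, `deposit(link, 100)` reaches the database three times, yet the balance increases by 100 only once.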

Fault-tolerant reliable delivery of messages in distributed publish/subscribe systems

2007

Reliable delivery of messages is an important problem in distributed systems. In this paper we briefly describe our basic strategy for enabling reliable delivery of messages in the presence of link and node failures, which is facilitated by a specialized repository node. We then present our strategy for making this scheme even more failure resilient by incorporating support for repository redundancy. Each repository functions autonomously.
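The role of a repository, and of redundant repositories, can be sketched as follows. This is an illustrative assumption about how such a scheme might look, not the paper's actual protocol: `Repository`, `Publisher`, and `Subscriber` are hypothetical names, events carry sequence numbers, and a recovering subscriber replays what it missed from the first repository it can reach.

```python
class Repository:
    """Stores published events by sequence number, independently of other repositories."""

    def __init__(self):
        self._log = []
        self.available = True

    def store(self, seq, event):
        if self.available:
            self._log.append((seq, event))

    def replay(self, after_seq):
        if not self.available:
            raise ConnectionError("repository down")
        return [(s, e) for s, e in self._log if s > after_seq]


class Publisher:
    def __init__(self, repositories):
        self._repos = repositories
        self._seq = 0

    def publish(self, event, subscribers):
        self._seq += 1
        for repo in self._repos:     # each repository stores on its own
            repo.store(self._seq, event)
        for sub in subscribers:      # best-effort direct delivery
            sub.on_event(self._seq, event)


class Subscriber:
    def __init__(self):
        self.connected = True
        self.last_seq = 0            # highest sequence number received in order
        self.received = []

    def on_event(self, seq, event):
        # Accept only the next expected event; anything else is recovered later.
        if self.connected and seq == self.last_seq + 1:
            self.received.append(event)
            self.last_seq = seq

    def recover(self, repositories):
        # After a failure, replay missed events from the first reachable repository.
        self.connected = True
        for repo in repositories:
            try:
                missed = repo.replay(self.last_seq)
            except ConnectionError:
                continue
            for seq, event in missed:
                self.on_event(seq, event)
            return
        raise ConnectionError("no repository reachable")
```

The sketch deliberately ignores a repository that was down while events were being stored; handling such gaps would require consulting more than one repository during recovery.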