Medusa: Scalable Distributed Stream Processing (original) (raw)

Medusa is part of the SLAM project.

Overview

There is a large class of emerging applications in which data, generated in a distributed environment, is pushed asynchronously to servers for processing. Some example applications for which this ``push'' model for data processing is appropriate include financial services (e.g., price feeds), asset-tracking services (e.g., reporting the status of objects and equipment in real-time), fabrication line management (e.g., real-time monitoring and control of manufacturing systems), network management (e.g., intrusion detection), medical applications (e.g., monitoring devices and sensors attached to patients), environmental sensor/actuator systems (e.g., climate, traffic, building, bridge monitoring), and military applications (e.g., missile or target detection).

Several research projects currently focus on building novel stream-processing engines that are better suited to support this new class of applications than classic data-management systems. Some of these projects are Aurora, STREAM, TelegraphCQ. and Cougar, Early efforts in stream-oriented processing have focused on designing new operators and new languages, as well as building high-performance engines operating at a single site. More recently, the attention has shifted toward extending these engines to distributed environments. The latter is the focus of Medusa.

Medusa is a distributed stream-processing system built using Aurora as the single-site processing engine. Medusa takes Aurora queries and distributes them across multiple nodes. These nodes can all be under the control of one entity or can be organized as a loosely coupled federation under the control of different autonomous participants.

A distributed stream-processing system such as Medusa offers several benefits:

It allows stream processing to be incrementally scaled over multiple nodes.
It enables high-availability because the processing nodes can monitor and take over for each other when failures occur.
It allows the composition of stream feeds from different participants to produce end-to-end services, and to take advantage from the distribution inherent in many stream processing applications (e.g., climate monitoring, financial analysis, etc.).
It allows participants to cope with load spikes without individually having to maintain and administer the computing, network, and storage resources required for peak operation. When organized as a loosely coupled federated system, load movements between participants based on pre-defined contracts can significantly improve performance. In Medusa we thus focus on distributed stream processing. We investigate in particular load management and high availability issues. We also take into consideration participant autonomy focusing on schemes that apply to loosely coupled federated environments. To promote positive interactions in such environments, Medusa relies on economic principles to regulate participant collaborations and solve the hard problems concerning load management and sharing.

Stream Processing

Figure 1: Example of distributed query.

In stream-processing applications, data streams produced by sensors or other data sources are composed and aggregated by_operators_ to produce some output of interest. A data stream is a continuous sequence of attribute-value tuples that all conform to some pre-defined schema (sequence of typed attributes). Operators are functions that transform one or more input streams into one or more output streams. A loop-free, directed graph of operators is called a_query network_ and all queries are continuous, because they continuously processes tuples pushed on their input streams.

Figure 1 shows an example of Medusa/Aurora query using a subset of the Aurora operators. The query takes a stream of ``car sightings'' as input, and produces streams of ``toll notifications'' and ``tow truck dispatch''. The query first applies two windowed aggregate_operators to compute the average speed (a) and traffic volume (b) on each segment of road, every minute. These values are then used to compute tolls on these segments (c). Toll values are in turn_joined (d) with car locations to produce toll notifications to these cars. Only cars whose speed is greater than zero (e and f) are billed. The query also filters (g) cars identified as tow trucks and joins (h) them, on the location field, with cars that have broken down.

Medusa System Architecture

Figure 2: Medusa node software architecture.

Figure 2 shows the software structure of a Medusa node. There are two components in addition to the Aurora query processor. The Lookup component is a client of an inter-node distributed catalog that holds information on streams, schemas, and queries running in the system. The Brain handles definitions of new schemas or streams and handles query setup operations. Brain components at different nodes communicate with each other to re-allocate queries and improve load distribution. To do so, each Brain monitors local load using information about the queues (IOQueues) feeding Aurora. It also uses statistics on individual box load provided by Aurora. The Brain uses this information to take autonomous and selfish load balancing decisions that converge to good overall load distribution. Brain also handles failure recovery. When a node detects a failure, it informs a pre-assigned secondary, which takes over all queries and tuple forwarding that were previously under the responsibility of the failed node.

To move operators with a relatively low effort and overhead compared to full-blown process migration, Medusa participants use remote definitions. A remote definition maps an operator defined at a node on to an operator defined at another. At runtime, when a path of operators in the boxes-and-arrows diagram needs to be moved to another node, all that's required is for the corresponding operators to be instantiated remotely and for the incoming streams to be diverted to the appropriately named inputs on the new node.

Load Management

Medusa employs an agoric system model to create incentives for autonomous participants to handle each others load. Clients outside the system pay Medusa participants for processing their queries and Medusa participants pay each other to handle load. Payments and load movements are based on pairwise contracts negotiated offline between participants. These contracts set tightly bounded prices for migrating each unit of load and specify the set of tasks that each participant is willing to execute on behalf of its partner. Our mechanism, called the bounded-price mechanism, gives participants tight control over their choice of partners, the acceptable range of unit-prices for load, and the set of tasks that can be shed or accepted. It also achieves a low runtime overhead by bounding prices throu gh offline negotiations.

High Availability

In collaboration with members of the Aurora team, we are exploring the runtime overhead and recovery time tradeoffs between different approaches to achieve high-availability (HA) in distributed stream processing. These approaches range from classical Tandem-style process-pairs to using upstream nodes in the processing flow as backup for their downstream neighbors. Different approaches also provide different recovery semantics where either some tuples are lost, some tuples are re-processed, or operations take-over precisely where the failure happened. We discuss these algorithms in more detail in the technical report below. An important HA goal for the future is handling network partitions in addition to individual node failures.

A more detailed overview of Medusa is available here: Medusa overview. Even more details are in the papers and technical reports below. The figure below illustrates a distributed Medusa system.

Figure 3: Example of federated Medusa deployment.

Papers and Technical Reports

Fault-Tolerance in the Borealis Distributed Stream Processing System
Magdalena Balazinska, Hari Balakrishnan, Samuel Madden, and Michael Stonebraker
To appear in SIGMOD 2005.
High-Availability Algorithms for Distributed Stream Processing
Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin,
Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik
The 21st International Conference on Data Engineering. ICDE 2005
Contract-Based Load Management in Federated Distributed Systems
Magdalena Balazinska, Hari Balakrishnan, and Mike Stonebraker
1st Symposium on Networked Systems Design and Implementation (NSDI)
San Francisco, CA, March 2004.
A Comparison of Stream-Oriented High-Availability Algorithms
Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin,
Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik
Technical Report CS-03-17. Brown University, September 2003.
The Aurora and Medusa Projects
Stan Zdonik, Michael Stonebraker, Mitch Cherniack, Ugur Cetintemel,
Magdalena Balazinska, and Hari Balakrishnan
Bulletin of the Technical Committe on Data Engineering
IEEE Computer Society. March 2003. p.3-10. (Invited Paper).
Scalable Distributed Stream Processing
Mitch Cherniack, Hari Balakrishnan, Magdalena Balazinska,
Don Carney, Ugur Cetintemel, Ying Xing, and Stan Zdonik
CIDR 2003 - First Biennial Conference on Innovative Data Systems Research,
Asilomar, California, January 2003.

Theses

Richard Tibbetts

**Linear Road: Benchmarking Stream-Based Data Management Systems
M. Eng. Thesis, Massachusetts Institute of Technology, August 2003.
[Postscript (1.3 MB)] [.ps.gz (563 KB)] [PDF (631 KB)](61 pages)

Hiroyoshi Iwashima

**Differential Bandwidth Allocation with Multiplexed TCP Connections
M. Eng. Thesis, Massachusetts Institute of Technology, August 2003.
[Postscript (1 MB)] [.ps.gz (250 KB)] [PDF (345 KB)](66 pages)

People

Faculty:

Graduate Students:

Daniel Abadi
Magdalena Balazinska

Alumni/ae:

Hiroyoshi Iwashima
Jon Salz
Kwok Lee Tang
Richard Tibbetts

Collaborations

The Medusa group collaborates with the Aurora project, which is a collaboration between Brown University, Brandeis University and MIT.

NMS Home	Projects	People	Papers	Software

M. I. T. Computer Science and Artificial Intelligence Laboratory · 32 Vassar Street · Cambridge, MA 02139 · USA

Last modified: Fri Oct 17 16:58:34 EDT 2003