D-CAPE: A self-tuning continuous query plan distribution architecture

The study of systems for querying data streams, termed Data Stream Management Systems (DSMS), has gained popularity over the last several years. This new area of research for the database community includes applications such as sensor networks, network intrusion detection, and the monitoring of medical, stock, or weather feeds. With this popularity come higher performance expectations: larger and faster data streams, larger and more complex query plans, and high volumes of possibly small queries. Because a single query processor has finite resources, future Data Stream Management Systems must distribute their workload across multiple query processors working together in a synchronized manner. This thesis discusses a new Distributed Continuous Query System (D-CAPE), developed here at WPI, that can distribute query plans over a large cluster of machines. We describe the architecture of the new system, policies for query plan distribution that improve overall performance, and techniques for self-tuning query plan redistribution. D-CAPE is designed to be as flexible as possible for future research. It includes a multi-tiered architecture that scales to a large number of query processors. D-CAPE has also been designed to minimize communication cost by bundling synchronization messages, thus reducing the number of packets sent between query processors. These messages are also incremental at run-time, which further reduces the communication cost. The architecture allows for the flexible incorporation of different distribution algorithms and operator reallocation policies. D-CAPE provides an operator reallocation algorithm that can seamlessly move one or more operators across any query processors in our computing cluster. We do so by creating "pipes" between query processors to allow the data streams to flow, and then filling these pipes with data streams once ...

First and foremost, I would like to thank Professor Elke A. Rundensteiner for all of her guidance over the last two years. She has committed herself to helping our research group strive for excellence. I would also like to thank my reader, Professor Murali Mani, who gave critical feedback during my thesis discussions. I would also like to thank our CAPE team: Luping Ding, Yali Zhu, Hong Su, Malav Shah, Nishant Mehta, and of course Brad Pielech, who helped develop other CAPE components and worked closely with me to research new ways the CAPE system can improve performance for continuous query systems. In my time at WPI I have had many excellent course instructors, whom I would also like to thank. Professors Heineman, Ciaraldi, Wills, and Claypool, thank you all for helping me throughout my career at WPI. Your dedication and commitment to me have inspired me to work hard, question science, and help others to do the same.