Peter M. Fischer | University of Augsburg

Papers by Peter M. Fischer

Kontextsensitive Informationsfilter (Context-Sensitive Information Filters)

Mobile Datenbanken und Informationssysteme, 2004

The combination of context-sensitive information systems with high-performance information filters (publish/subscribe) promises high-quality, individualized information delivery while also scaling well. The particular challenge lies in the coincidence of numerous context changes and high message rates. The method we developed improves existing indexing methods so that they adapt automatically to high and fluctuating update rates.

Path sharing and predicate evaluation for high-performance XML filtering

ACM Transactions on Database Systems, 2003

XML filtering systems aim to provide fast, on-the-fly matching of XML-encoded data to large numbers of query specifications containing constraints on both structure and content. It is now well accepted that approaches using event-based parsing and Finite State Machines (FSMs) can provide the basis for highly scalable structure-oriented XML filtering systems. The XFilter system [Altinel and Franklin 2000] was the first published FSM-based XML filtering approach. XFilter used a separate FSM per path query and a novel indexing mechanism to allow all of the FSMs to be executed simultaneously during the processing of a document. Building on the insights of the XFilter work, we describe a new method, called "YFilter", that combines all of the path queries into a single Nondeterministic Finite Automaton (NFA). YFilter exploits commonality among queries by merging common prefixes of the query paths such that they are processed at most once. The resulting shared processing provides tremendous improvements in structure matching performance but complicates the handling of value-based predicates.
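
To make the prefix-sharing idea concrete, here is a minimal sketch, not YFilter's actual implementation: all path queries are compiled into one automaton so that a common prefix such as /a/b becomes a single chain of states, and the automaton is run over SAX-style start/end events with a stack of active state sets. The class name `SharedPathNFA`, the internal epsilon label `$`, and the supported syntax (element names, `*`, `//`) are assumptions for illustration; value-based predicates, which the abstract notes are the hard part, are not handled.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


class SharedPathNFA:
    """Hypothetical sketch: all path queries compiled into ONE automaton.

    Supported steps: element names, '*' (any element) and '//' (descendant),
    modelled as an epsilon edge into a state that loops on any element.
    """

    EPS = "$"  # internal label for the '//' epsilon edge (never a valid XML tag)

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state -> {label: next state}
        self.self_loop: List[bool] = [False]           # True for '//' states
        self.accepting: Dict[int, Set[str]] = {}       # state -> ids of matched queries

    def _step(self, state: int, label: str) -> int:
        # Reuse an existing edge if another query already created it:
        # this is where common prefixes are shared.
        if label not in self.transitions[state]:
            self.transitions.append({})
            self.self_loop.append(label == self.EPS)
            self.transitions[state][label] = len(self.transitions) - 1
        return self.transitions[state][label]

    def add_query(self, qid: str, path: str) -> None:
        state = 0
        for token in path.replace("//", "/" + self.EPS + "/").strip("/").split("/"):
            state = self._step(state, token)
        self.accepting.setdefault(state, set()).add(qid)

    def _closure(self, states: Set[int]) -> Set[int]:
        # Follow epsilon edges into '//' states.
        result, stack = set(states), list(states)
        while stack:
            nxt = self.transitions[stack.pop()].get(self.EPS)
            if nxt is not None and nxt not in result:
                result.add(nxt)
                stack.append(nxt)
        return result

    def filter(self, xml_text: str) -> Set[str]:
        """Event-based run: one stack entry of active states per open element."""
        matched: Set[str] = set()
        stack = [self._closure({0})]
        events = ET.iterparse(io.BytesIO(xml_text.encode()), events=("start", "end"))
        for event, elem in events:
            if event == "end":
                stack.pop()  # leaving an element restores the parent's states
                continue
            active: Set[int] = set()
            for s in stack[-1]:
                for label in (elem.tag, "*"):
                    if label in self.transitions[s]:
                        active.add(self.transitions[s][label])
                if self.self_loop[s]:
                    active.add(s)
            active = self._closure(active)
            for s in active:
                matched |= self.accepting.get(s, set())
            stack.append(active)
        return matched
```

For example, registering /a/b/c, /a/b/d and /a/e yields 6 states instead of the 9 that three separate automata would need, because the a and a/b steps are shared; running the automaton on `<a><b><c/></b><e/></a>` reports the first and third query.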

Proceedings BTW 2011 -- Workshops und Studierendenprogramm (Workshops and Student Program)

Editorial

Benchmarking Bitemporal Database Systems: Ready for the Future or Stuck in the Past?

After more than a decade of virtual standstill, the adoption of temporal data management features has recently picked up speed, driven by customer demand and the inclusion of temporal expressions in SQL:2011. Most of the big commercial DBMS now include support for bitemporal data and operators. In this paper, we perform a thorough analysis of these commercial temporal DBMS: we investigate their architecture, determine their performance, and study the impact of performance tuning. This analysis uses our recent (TPCTC 2013) benchmark proposal, which includes a comprehensive temporal workload definition. The results of our analysis show that the support for temporal data is still in its infancy: all systems store their data in regular, statically partitioned tables and rely on standard indexes as well as query rewrites for their operations. As our measurements show, this causes considerable performance variations under slight workload changes and significant overhead even after extensive tuning.

A generic database benchmarking service

2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013

Benchmarks are widely applied for the development and optimization of database systems. Standard benchmarks such as TPC-C and TPC-H provide a way of comparing the performance of different systems. In addition, microbenchmarks can be used to test a specific behavior of a system.

TPC-BiH: A Benchmark for Bitemporal Databases

Lecture Notes in Computer Science, 2014

An increasing number of applications, such as risk evaluation in banking or inventory management, require support for temporal data. After more than a decade of standstill, the recent adoption of some bitemporal features in SQL:2011 has reinvigorated support among commercial database vendors, who incorporate an increasing number of relevant bitemporal features. Naturally, the performance and scalability of temporal data storage and operations are of great concern for potential users. The cost of keeping and querying history with novel operations (such as time travel, temporal joins, or temporal aggregations) is not adequately reflected in any existing benchmark. In this paper, we present a benchmark proposal that provides comprehensive coverage of bitemporal data management. It builds on the solid foundations of TPC-H but extends it with a rich set of queries and update scenarios. This workload stems both from real-life temporal applications from SAP's customer base and from a systematic coverage of temporal operators proposed in the academic literature. We present preliminary results of our benchmark on a number of temporal database systems, also highlighting the need for certain language extensions.

Comprehensive and interactive temporal query processing with SAP HANA

Proceedings of the VLDB Endowment, 2013

In this demo, we present a prototype of a main-memory database system that provides a wide range of temporal operators with predictable and interactive response times. Much real-life data is temporal in nature, and there is increasing application demand for temporal models and operations in databases. Nevertheless, SQL:2011 has only recently overcome a decade-long standstill on standardizing temporal features. As a result, few database systems provide any temporal support, and even those offer only limited expressiveness and poor performance. Our prototype combines an in-memory column store and a novel, generic temporal index structure named Timeline Index. As we will show on a workload based on real customer use cases, it achieves predictable and interactive query performance for a wide range of temporal query types and data sizes.

Timeline index: A unified data structure for processing queries on temporal data in SAP HANA

Managing temporal data is becoming increasingly important for many applications. Several database systems already support the time dimension, but provide only a few temporal operators, which also often exhibit poor performance characteristics. On the academic side, a large number of algorithms and data structures have been proposed, but they often address only a subset of these temporal operators. In this paper, we develop the Timeline Index as a novel, unified data structure that efficiently supports temporal operators such as temporal aggregation, time travel, and temporal joins. As the Timeline Index is independent of the physical order of the data, it provides flexibility in physical design; e.g., it supports any kind of compression scheme, which is crucial for main-memory column stores. Our experiments show that the Timeline Index has predictable performance and beats state-of-the-art approaches significantly, sometimes by orders of magnitude.
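
As a rough illustration of how one ordered structure can serve both time travel and temporal aggregation, the following is a simplified sketch under stated assumptions, not the SAP HANA implementation: the hypothetical `TimelineIndex` keeps a time-ordered list of activation/invalidation events over row ids (so it does not depend on the physical row order) plus periodic checkpoints of the visible row set. Rows are assumed to have half-open validity intervals [valid_from, valid_to), with `None` meaning "still current"; the `checkpoint_every` parameter is an illustrative assumption.

```python
from bisect import bisect_right
from typing import FrozenSet, List, Optional, Set, Tuple


class TimelineIndex:
    """Simplified, hypothetical sketch of a timeline-style temporal index."""

    def __init__(self, rows: List[Tuple[int, int, Optional[int]]],
                 checkpoint_every: int = 2) -> None:
        self.k = checkpoint_every
        events = []
        for row_id, start, end in rows:
            events.append((start, 0, row_id))    # 0 = activation
            if end is not None:
                events.append((end, 1, row_id))  # 1 = invalidation
        # Ordered by time, independent of the physical order of `rows`;
        # at equal times activations sort before invalidations.
        self.events = sorted(events)
        self.times = [t for t, _, _ in self.events]
        # checkpoints[j] = rows visible after applying the first j * k events.
        visible: Set[int] = set()
        self.checkpoints: List[FrozenSet[int]] = [frozenset()]
        for i, event in enumerate(self.events, start=1):
            self._apply(visible, event)
            if i % self.k == 0:
                self.checkpoints.append(frozenset(visible))

    @staticmethod
    def _apply(visible: Set[int], event: Tuple[int, int, int]) -> None:
        _, kind, row_id = event
        if kind == 0:
            visible.add(row_id)
        else:
            visible.discard(row_id)

    def time_travel(self, t: int) -> Set[int]:
        """Row ids visible at time t: nearest checkpoint, then replay the rest."""
        pos = bisect_right(self.times, t)  # number of events with time <= t
        j = pos // self.k
        visible = set(self.checkpoints[j])
        for event in self.events[j * self.k:pos]:
            self._apply(visible, event)
        return visible

    def count_over_time(self) -> List[Tuple[int, int]]:
        """Temporal aggregation (COUNT) as a single sweep over the events."""
        result: List[Tuple[int, int]] = []
        count = 0
        for t, kind, _ in self.events:
            count += 1 if kind == 0 else -1
            if result and result[-1][0] == t:
                result[-1] = (t, count)  # coalesce events at the same time
            else:
                result.append((t, count))
        return result
```

With rows (1, 0, 10), (2, 5, open) and (3, 3, 7), `time_travel(4)` returns {1, 3} and `count_over_time()` returns [(0, 1), (3, 2), (5, 3), (7, 2), (10, 1)]. Denser checkpoints trade memory for shorter replays.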

Ariadne

Proceedings of the 7th ACM international conference on Distributed event-based systems - DEBS '13, 2013

Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also to provide advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce run-time overhead and avoid unnecessary provenance retrieval. This includes computing a concise superset of the provenance, which allows lazily replaying a query network and reconstructing its provenance, as well as lazy retrieval to avoid unnecessary reconstruction of provenance. We develop stream-specific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS, implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state of the art.

Efficient Stream Provenance via Operator Instrumentation

ACM Transactions on Internet Technology, 2014

Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also to provide advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce run-time overhead and avoid unnecessary provenance retrieval. Our proposals include computing a concise superset of the provenance (to allow lazily replaying a query and reconstructing its provenance) as well as lazy retrieval (to avoid unnecessary reconstruction of provenance). We develop stream-specific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS, implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state of the art.
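
The core of eager provenance through operator instrumentation can be sketched in a few lines, assuming provenance is represented as the set of source-tuple ids each output depends on. This is an illustrative Python sketch, not Ariadne's code (Ariadne extends Borealis); the names `Annotated`, `source`, `instrumented_filter` and `instrumented_tumbling_sum` are hypothetical, and the lazy replay and compression techniques from the abstract are not shown.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Iterator, List


@dataclass(frozen=True)
class Annotated:
    value: int
    prov: FrozenSet[int]  # ids of the source tuples this tuple was derived from


def source(values: Iterable[int]) -> Iterator[Annotated]:
    # Each source tuple is its own provenance.
    for i, v in enumerate(values):
        yield Annotated(v, frozenset({i}))


def instrumented_filter(pred: Callable[[int], bool],
                        stream: Iterable[Annotated]) -> Iterator[Annotated]:
    # A filter passes tuples through unchanged, so provenance is preserved as is.
    for t in stream:
        if pred(t.value):
            yield t


def instrumented_tumbling_sum(size: int,
                              stream: Iterable[Annotated]) -> Iterator[Annotated]:
    # An aggregate's output depends on every input in its window, so the
    # instrumented operator emits the union of the window's annotations.
    # An incomplete trailing window is not emitted.
    window: List[Annotated] = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield Annotated(sum(w.value for w in window),
                            frozenset().union(*(w.prov for w in window)))
            window = []
```

Chaining `source([3, 8, 1, 9, 4, 7])`, a filter on `value > 2` and a tumbling sum of size 2 yields 11 with provenance {0, 1} and 13 with provenance {3, 4}: each output can be traced back to exactly the inputs that contributed to it.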

Integration of Reliable Sensor Data Stream Management into Digital Libraries

Lecture Notes in Computer Science, 2007

Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, into stream processes. Due to recent advances in sensor technologies and wireless communication, the amount of information generated by DSM will increase significantly. To deal with this streaming information efficiently, Digital Library (DL) systems have to merge with DSM systems. In healthcare especially, the continuous monitoring of patients at home (telemonitoring) will generate a significant amount of information stored in an e-health digital library (electronic patient record). To stream-enable DL systems, we present in this work an integrated data stream management and Digital Library infrastructure. However, a vital requirement for healthcare applications is that this infrastructure provides a high degree of reliability. In this paper, we present novel approaches to reliable DSM within a DL infrastructure. In particular, we propose information filtering operators, a declarative query engine called MXQuery, and efficient operator checkpointing to maintain high result quality of DSM. Furthermore, we present a demonstrator implementation of the integrated DSM and DL infrastructure, called OSIRIS-SE. OSIRIS-SE supports flexible and efficient failure handling to ensure complete and consistent continuous data stream processing and execution of DL processes, even in the case of multiple failures.

Flexible and scalable storage management for data-intensive stream processing

Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009

Data Stream Management Systems (DSMS) operate under strict performance requirements. Key to meeting such requirements is the efficient handling of time-critical tasks such as managing the internal states of continuous query operators and the traffic on the queues between operators, as well as providing storage support for shared computation and archived data. In this paper, we introduce a general-purpose storage management framework for DSMSs that performs these tasks based on a clean, loosely coupled, and flexible system design that also facilitates performance optimization. An important contribution of the framework is that, in analogy to buffer management techniques in relational database systems, it uses information about the access patterns of streaming applications to tune and customize the performance of the storage manager. In the paper, we first analyze typical application requirements at different granularities in order to identify important tunable parameters and their corresponding values. Based on these parameters, we define a general-purpose storage management interface. Using this interface, a developer can use our SMS (Storage Manager for Streams) to generate a customized storage manager for streaming applications. We explore the performance and potential of SMS through a set of experiments using the Linear Road benchmark.

Extending XQuery with a pattern matching facility

Database and XML Technologies, 2010

Considering the growing use of XML for communication and data representation, the need for more advanced analytical capabilities on top of XQuery is emerging. In this regard, a pattern matching facility can be considered a natural extension to empower XQuery. In this paper, we first provide some use cases for XML pattern matching. After showing that current XQuery falls short of meeting basic requirements, we propose an extension to XQuery that imposes no changes on the current model, while covering a wide range of ...

Stream schema

Proceedings of the 13th International Conference on Extending Database Technology - EDBT '10, 2010

Schemas, and more generally metadata specifying structural and semantic constraints, are invaluable in data management. They facilitate conceptual design and enable checking of data consistency. They also play an important role in permitting semantic query optimization, that is, optimization and processing strategies that are often highly effective, but only correct for data conforming to a given schema. While the use of metadata is well established in relational and XML databases, the same is not true for data streams. Existing work mostly focuses on the specification of dynamic information. In this paper, we consider the specification of static metadata for streams in a model called Stream Schema. We show how Stream Schema can be used to validate the consistency of streams. By explicitly modeling stream constraints, we show that stream queries can be simplified by removing predicates or subqueries that check for consistency. This can greatly enhance the programmability of stream processing systems. We also present a set of semantic query optimization strategies that permit both compile-time checking of queries (for example, to detect empty queries) and new runtime processing options, options that would not have been possible without a Stream Schema specification. Case studies on two stream processing platforms (covering different applications and underlying stream models), along with an experimental evaluation, show the benefits of Stream Schema.

YFilter: Efficient and scalable filtering of XML documents

Proceedings of the …

Recently, there has been growing interest in the filtering and routing of data based on user preferences. In an XML filtering system, continuously arriving XML documents are routed to users according to subscriptions specified as queries. XML allows the encoding of semantic ...

Towards Systematic Achievement of Compliance in Service-Oriented Architectures: The MASTER Approach

WIRTSCHAFTSINFORMATIK, 2008

Service-oriented architectures (SOA) provide the flexible IT support required by agile businesses. To simultaneously meet their compliance requirements, continuous assessment and adaptation of the IT controls embedded in SOA is mandatory. The paper outlines the MASTER methodology and architecture for systematic achievement of compliance in SOA. MASTER features automated support of the full control lifecycle, definition of key indicators that can be interpreted in the business context and scale to outsourcing scenarios, and a model-based, policy-driven approach that allows business and technical context to be captured and metrics and controls to be adapted to it.

Management of and Access to Virtual Electronic Health Records

DELOS Research …, 2005

Robert Penz, Raimund Vogl (Health Information Technologies Tyrol (HITT), Innsbruck, Austria); Wilhelm Hasselbring, Ulrike Steffens (Kuratorium OFFIS, Oldenburg, Germany); Charalampos Dimitropoulos, Yannis ...

The case for fine-grained stream provenance

BTW Workshops, Feb 1, 2011

The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as the sensors in a sensor network from which an aggregated value is derived). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we first demonstrate by means of well-chosen use cases that this is a misconception, i.e., coarse-grained provenance is in fact insufficient ...

Changing flights in mid-air

Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011

Continuous queries can run for unpredictably long periods of time. During their lifetime, these queries may need to be adapted either due to changes in application semantics (e.g., the implementation of a new alert detection policy), or due to changes in the system's behavior (e.g., adapting performance to a changing load). While in previous works query modification has been implicitly utilized to serve specific purposes (e.g., load management), to date no research has been done that defines a general-purpose, reliable, and efficiently implementable model for modifying continuous queries at run-time. In this paper, we introduce a punctuation-based framework that can formally express arbitrary lifecycle operations on the basis of input-output mappings and basic control elements such as start or stop of queries. On top of this foundation, we derive all possible query change methods, each providing different levels of correctness guarantees and performance. We further show how these models can be efficiently realized in a state-of-the-art stream processing engine; we also provide experimental results demonstrating the key performance tradeoffs of the change methods.
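
The basic intuition behind punctuation-based query changes can be sketched for the simplest case, a stateless filter: an in-band control punctuation marks the exact stream position where the old query stops and the new one starts, so every item is handled by exactly one version. This is a hypothetical Python sketch of that single idea, not the framework or engine from the paper; the names `Data`, `Switch` and `run_with_switches` are assumptions, and stateful operators and the other change methods the abstract mentions are not covered.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class Data:
    ts: int
    value: int


@dataclass(frozen=True)
class Switch:
    """Hypothetical control punctuation: from this point on, apply `query`."""
    ts: int
    query: Callable[[int], bool]


Item = Union[Data, Switch]


def run_with_switches(stream: Iterable[Item],
                      query: Callable[[int], bool]) -> Iterator[Data]:
    current = query
    for item in stream:
        if isinstance(item, Switch):
            # Items before the punctuation were handled by the old query and
            # items after it go to the new one: no gap and no overlap.
            current = item.query
        elif current(item.value):
            yield item
```

For instance, starting with the predicate `v > 10` and switching to `v > 100` at ts=3, the stream 5, 50, [switch], 50, 150 produces only the first 50 (ts=2) and 150 (ts=5).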

Research paper thumbnail of Kontextsensitive Informationsfilter

Mobile Datenbanken und Informationssysteme, 2004

Die Kombination von kontextsensitiven Informationssystemen mit leis- tungsfähigen Informationsfil... more Die Kombination von kontextsensitiven Informationssystemen mit leis- tungsfähigen Informationsfiltern (Publish/Subscribe) verspricht qualitativ hoch- wertige individuelle Informationsversorgung bei gleichzeitig guter Skalierbarkeit. Die besondere Herausforderung dabei ist das Zusammentreffen von zahlreichen Kontextänderungen und hohen Nachrichtenraten. Das von uns entwickelte Verfah- ren verbessert bestehende Indexmethoden dahingehend, sich auf hohe und schwan- kende Updateraten automatisch anzupassen.

Research paper thumbnail of Path sharing and predicate evaluation for high-performance XML filtering

ACM Transactions on Database Systems, 2003

XML filtering systems aim to provide fast, on-the-fly matching of XML-encoded data to large numbe... more XML filtering systems aim to provide fast, on-the-fly matching of XML-encoded data to large numbers of query specifications containing constraints on both structure and content. It is now well accepted that approaches using event-based parsing and Finite State Machines (FSMs) can provide the basis for highly scalable structure-oriented XML filtering systems. The XFilter system [Altinel and Franklin 2000] was the first published FSM-based XML filtering approach. XFilter used a separate FSM per path query and a novel indexing mechanism to allow all of the FSMs to be executed simultaneously during the processing of a document. Building on the insights of the XFilter work, we describe a new method, called "YFilter" that combines all of the path queries into a single Nondeterministic Finite Automaton (NFA). YFilter exploits commonality among queries by merging common prefixes of the query paths such that they are processed at most once. The resulting shared processing provides tremendous improvements in structure matching performance but complicates the handling of value-based predicates.

Research paper thumbnail of Proceedings BTW 2011 -- Workshops und Studierendenprogramm

Research paper thumbnail of Editorial

Research paper thumbnail of Benchmarking Bitemporal Database Systems: Ready for the Future or Stuck in the Past?

After more than a decade of a virtual standstill, the adoption of temporal data management featur... more After more than a decade of a virtual standstill, the adoption of temporal data management features has recently picked up speed, driven by customer demand and the inclusion of temporal expressions into SQL:2011. Most of the big commercial DBMS now include support for bitemporal data and operators. In this paper, we perform a thorough analysis of these commercial temporal DBMS: We investigate their architecture, determine their performance and study the impact of performance tuning. This analysis utilizes our recent (TPCTC 2013) benchmark proposal, which includes a comprehensive temporal workload definition. The results of our analysis show that the support for temporal data is still in its infancy: All systems store their data in regular, statically partitioned tables and rely on standard indexes as well as query rewrites for their operations. As shown by our measurements, this causes considerable performance variations on slight workload variations and significant overhead even after extensive tuning.

Research paper thumbnail of A generic database benchmarking service

2013 IEEE 29th International Conference on Data Engineering (ICDE), 2013

Benchmarks are widely applied for the development and optimization of database systems. Standard ... more Benchmarks are widely applied for the development and optimization of database systems. Standard benchmarks such as TPC-C and TPC-H provide a way of comparing the performance of different systems. In addition, micro benchmarks can be exploited to test a specific behavior of a system.

Research paper thumbnail of TPC-BiH: A Benchmark for Bitemporal Databases

Lecture Notes in Computer Science, 2014

An increasing number of applications such as risk evaluation in banking or inventory management r... more An increasing number of applications such as risk evaluation in banking or inventory management require support for temporal data. After more than a decade of standstill, the recent adoption of some bitemporal features in SQL:2011 has reinvigorated the support among commercial database vendors, who incorporate an increasing number of relevant bitemporal features. Naturally, assessing the performance and scalability of temporal data storage and operations is of great concern for potential users. The cost of keeping and querying history with novel operations (such as time travel, temporal joins or temporal aggregations) is not adequately reflected in any existing benchmark. In this paper, we present a benchmark proposal which provides comprehensive coverage of the bitemporal data management. It builds on the solid foundations of TPC-H but extends it with a rich set of queries and update scenarios. This workload stems both from real-life temporal applications from SAP's customer base and a systematic coverage of temporal operators proposed in the academic literature. We present preliminary results of our benchmark on a number of temporal database systems, also highlighting the need for certain language extensions.

Research paper thumbnail of Comprehensive and interactive temporal query processing with SAP HANA

Proceedings of the VLDB Endowment, 2013

ABSTRACT In this demo, we present a prototype of a main memory database system which provides a w... more ABSTRACT In this demo, we present a prototype of a main memory database system which provides a wide range of temporal operators featuring predictable and interactive response times. Much of real-life data is temporal in nature, and there is an increasing application demand for temporal models and operations in databases. Nevertheless, SQL:2011 has only recently overcome a decade-long standstill on standardizing temporal features. As a result, few database systems provide any temporal support, and even those only have limited expressiveness and poor performance. Our prototype combines an in-memory column store and a novel, generic temporal index structure named Timeline Index. As we will show on a workload based on real customer use cases, it achieves predictable and interactive query performance for a wide range of temporal query types and data sizes.

Research paper thumbnail of Timeline index: A unified data structure for processing queries on temporal data in SAP HANA

Managing temporal data is becoming increasingly important for many applications. Several database... more Managing temporal data is becoming increasingly important for many applications. Several database systems already support the time dimension, but provide only few temporal operators, which also often exhibit poor performance characteristics. On the academic side, a large number of algorithms and data structures have been proposed, but they often address a subset of these temporal operators only. In this paper, we develop the Timeline Index as a novel, unified data structure that efficiently supports temporal operators such as temporal aggregation, time travel, and temporal joins. As the Timeline Index is independent of the physical order of the data, it provides flexibility in physical design; e.g., it supports any kind of compression scheme, which is crucial for main memory column stores. Our experiments show that the Timeline Index has predictable performance and beats state-of-the-art approaches significantly, sometimes by orders of magnitude.

Research paper thumbnail of Ariadne

Proceedings of the 7th ACM international conference on Distributed event-based systems - DEBS '13, 2013

Managing fine-grained provenance is a critical requirement for data stream management systems (DS... more Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also for providing advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce run-time overhead and avoid unnecessary provenance retrieval. This includes computing a concise superset of the provenance to allow lazily replaying a query network and reconstruct its provenance as well as lazy retrieval to avoid unnecessary reconstruction of provenance. We develop stream-specific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state-of-the-art.

Research paper thumbnail of Efficient Stream Provenance via Operator Instrumentation

ACM Transactions on Internet Technology, 2014

Managing fine-grained provenance is a critical requirement for data stream management systems (DS... more Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS), not only to address complex applications that require diagnostic capabilities and assurance, but also for providing advanced functionality such as revision processing or query debugging. This paper introduces a novel approach that uses operator instrumentation, i.e., modifying the behavior of operators, to generate and propagate fine-grained provenance through several operators of a query network. In addition to applying this technique to compute provenance eagerly during query execution, we also study how to decouple provenance computation from query processing to reduce run-time overhead and avoid unnecessary provenance retrieval. Our proposals include computing a concise superset of the provenance (to allow lazily replaying a query and reconstruct its provenance) as well as lazy retrieval (to avoid unnecessary reconstruction of provenance). We develop streamspecific compression methods to reduce the computational and storage overhead of provenance generation and retrieval. Ariadne, our provenance-aware extension of the Borealis DSMS implements these techniques. Our experiments confirm that Ariadne manages provenance with minor overhead and clearly outperforms query rewrite, the current state-of-the-art.

Research paper thumbnail of Integration of Reliable Sensor Data Stream Management into Digital Libraries

Lecture Notes in Computer Science, 2007

Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, into stream processes. Due to recent advances in sensor technologies and wireless communication, the amount of information generated by DSM will increase significantly. In order to deal efficiently with this streaming information, Digital Library (DL) systems have to merge with DSM systems. Especially in healthcare, the continuous monitoring of patients at home (telemonitoring) will generate a significant amount of information stored in an e-health digital library (electronic patient record). In order to stream-enable DL systems, we present an integrated data stream management and Digital Library infrastructure in this work. However, a vital requirement for healthcare applications is that this infrastructure provides a high degree of reliability. In this paper, we present novel approaches to reliable DSM within a DL infrastructure. In particular, we propose information filtering operators, a declarative query engine called MXQuery, and efficient operator checkpointing to maintain high result quality of DSM. Furthermore, we present a demonstrator implementation of the integrated DSM and DL infrastructure, called OSIRIS-SE. OSIRIS-SE supports flexible and efficient failure handling to ensure complete and consistent continuous data stream processing and execution of DL processes even in the case of multiple failures.
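The general idea of operator checkpointing can be illustrated with a small sketch. It assumes a hypothetical keyed counting operator and a replayable input log, neither of which is taken from OSIRIS-SE: every `interval` tuples the operator snapshots its state together with the input offset, and after a failure it restores the last snapshot and replays only the tuples that arrived after that offset.

```python
import copy
from typing import Dict, List, Tuple


class CheckpointedCounter:
    """Hypothetical stateful operator: running count per key, checkpointed periodically."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.state: Dict[str, int] = {}
        self.processed = 0
        # (state snapshot, number of input tuples reflected in that snapshot)
        self.checkpoint: Tuple[Dict[str, int], int] = ({}, 0)

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1
        self.processed += 1
        if self.processed % self.interval == 0:
            # Deep copy so later updates do not alter the snapshot.
            self.checkpoint = (copy.deepcopy(self.state), self.processed)

    def recover(self, input_log: List[str]) -> None:
        """Restore the last snapshot, then replay only the tuples after its offset."""
        state, offset = self.checkpoint
        self.state = copy.deepcopy(state)
        self.processed = offset
        for key in input_log[offset:]:
            self.process(key)
```

Storing the offset with the snapshot is what prevents tuples from being counted twice during replay; a larger `interval` lowers checkpointing cost at the price of a longer replay after a failure.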

Flexible and scalable storage management for data-intensive stream processing

Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009

Data Stream Management Systems (DSMS) operate under strict performance requirements. Key to meeting such requirements is to efficiently handle time-critical tasks such as managing internal states of continuous query operators, traffic on the queues between operators, as well as providing storage support for shared computation and archived data. In this paper, we introduce a general-purpose storage management framework for DSMSs that performs these tasks based on a clean, loosely-coupled, and flexible system design that also facilitates performance optimization. An important contribution of the framework is that, in analogy to buffer management techniques in relational database systems, it uses information about the access patterns of streaming applications to tune and customize the performance of the storage manager. In the paper, we first analyze typical application requirements at different granularities in order to identify important tunable parameters and their corresponding values. Based on these parameters, we define a general-purpose storage management interface. Using the interface, a developer can use our SMS (Storage Manager for Streams) to generate a customized storage manager for streaming applications. We explore the performance and potential of SMS through a set of experiments using the Linear Road benchmark.
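One way to picture storage customized by declared access patterns is a factory that chooses a data structure from the pattern. The sketch below is a hypothetical interface of our own, not the SMS API: `AccessPattern`, `FifoQueue`, `WindowBuffer`, and `make_store` are illustrative names, and only two patterns are covered, an inter-operator FIFO queue and a count-based sliding window.

```python
from collections import deque
from enum import Enum
from typing import Any, List, Optional, Union


class AccessPattern(Enum):
    FIFO = "fifo"              # queue between operators: write at tail, read at head
    SLIDING_WINDOW = "window"  # window state: append newest, expire oldest


class FifoQueue:
    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, item: Any) -> None:
        self._items.append(item)

    def get(self) -> Any:
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)


class WindowBuffer:
    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops its oldest element automatically on overflow.
        self._items: deque = deque(maxlen=capacity)

    def put(self, item: Any) -> None:
        self._items.append(item)

    def contents(self) -> List[Any]:
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


def make_store(pattern: AccessPattern,
               capacity: Optional[int] = None) -> Union[FifoQueue, WindowBuffer]:
    """Pick a storage structure from a declared access pattern (hypothetical interface)."""
    if pattern is AccessPattern.FIFO:
        return FifoQueue()
    if pattern is AccessPattern.SLIDING_WINDOW:
        if capacity is None:
            raise ValueError("a sliding window needs a capacity")
        return WindowBuffer(capacity)
    raise ValueError(f"unsupported access pattern: {pattern}")
```

The design point the sketch tries to mirror is that the application declares *how* it accesses data, and the storage layer, rather than each operator, decides on the structure and eviction behavior.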

Extending XQuery with a pattern matching facility

Database and XML Technologies, 2010

Considering the growing usage of XML for communication and data representation, the need for more advanced analytical capabilities on top of XQuery is emerging. In this regard, a pattern matching facility can be considered a natural extension to empower XQuery. In this paper we first provide some use cases for XML pattern matching. After showing that current XQuery falls short in meeting basic requirements, we propose an extension to XQuery which imposes no changes on the current model, while covering a wide range of ...

Stream schema

Proceedings of the 13th International Conference on Extending Database Technology - EDBT '10, 2010

Schemas, and more generally metadata specifying structural and semantic constraints, are invaluable in data management. They facilitate conceptual design and enable checking of data consistency. They also play an important role in permitting semantic query optimization, that is, optimization and processing strategies that are often highly effective, but only correct for data conforming to a given schema. While the use of metadata is well-established in relational and XML databases, the same is not true for data streams. The existing work mostly focuses on the specification of dynamic information. In this paper, we consider the specification of static metadata for streams in a model called Stream Schema. We show how Stream Schema can be used to validate the consistency of streams. By explicitly modeling stream constraints, we show that stream queries can be simplified by removing predicates or subqueries that check for consistency. This can greatly enhance programmability of stream processing systems. We also present a set of semantic query optimization strategies that permit both compile-time checking of queries (for example, to detect empty queries) and new runtime processing options, options that would not have been possible without a Stream Schema specification. Case studies on two stream processing platforms (covering different applications and underlying stream models), along with an experimental evaluation, show the benefits of Stream Schema.
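As a deliberately simplified illustration of schema-based semantic optimization, assume a schema that declares only value bounds on attributes. Stream Schema itself covers richer constraints, and `Bound`, `implied`, `contradicts`, `optimize`, and `conforms` are hypothetical names. A query predicate implied by the schema can be dropped, and a predicate that contradicts the schema proves at compile time that the query is empty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bound:
    attr: str
    op: str       # ">=" or "<="
    value: float


def implied(schema: List[Bound], p: Bound) -> bool:
    """True if every tuple conforming to the schema satisfies p."""
    for s in schema:
        if s.attr == p.attr and s.op == p.op:
            if p.op == ">=" and s.value >= p.value:
                return True
            if p.op == "<=" and s.value <= p.value:
                return True
    return False


def contradicts(schema: List[Bound], p: Bound) -> bool:
    """True if no tuple conforming to the schema can satisfy p."""
    for s in schema:
        if s.attr == p.attr and s.op != p.op:
            if p.op == "<=" and p.value < s.value:   # schema says attr >= s.value
                return True
            if p.op == ">=" and p.value > s.value:   # schema says attr <= s.value
                return True
    return False


def optimize(schema: List[Bound], query: List[Bound]) -> Optional[List[Bound]]:
    """Return None for a provably empty query, else the query without implied predicates."""
    if any(contradicts(schema, p) for p in query):
        return None
    return [p for p in query if not implied(schema, p)]


def conforms(schema: List[Bound], tup: dict) -> bool:
    """Runtime validation of a single tuple against the schema."""
    return all(tup[s.attr] >= s.value if s.op == ">=" else tup[s.attr] <= s.value
               for s in schema)
```

With a schema stating `-50 <= temp <= 150`, the consistency check `temp >= -60` disappears from a query and only `temp >= 30` remains, while a query asking for `temp <= -100` is rejected as empty before it ever runs.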

YFilter: Efficient and scalable filtering of XML documents

Proceedings of the …

Recently, there has been growing interest in the filtering and routing of data based on user preferences. In an XML filtering system, continuously arriving XML documents are routed to users according to subscriptions specified as queries. XML allows the encoding of semantic ...
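The prefix sharing at the heart of YFilter, combining all path queries into one automaton so that common prefixes are processed once, can be sketched in Python for linear paths made of `/` steps and the `*` wildcard. This is a simplification, not YFilter's implementation: descendant steps (`//`) and value predicates are omitted, and `SharedPathNFA` and the `("start" | "end", tag)` event format are hypothetical. A stack of active state sets lets the automaton backtrack when an element closes, in the style of event-based parsing.

```python
from typing import Dict, Iterable, List, Set, Tuple


class SharedPathNFA:
    """All path queries share one automaton; a common prefix maps to the same states."""

    def __init__(self) -> None:
        self.children: List[Dict[str, int]] = [{}]  # state 0 is the start state
        self.accepts: Dict[int, Set[str]] = {}        # accepting state -> query IDs

    def add_query(self, qid: str, path: str) -> None:
        state = 0
        for step in path.strip("/").split("/"):
            nxt = self.children[state].get(step)
            if nxt is None:                 # unseen step: create a new state
                nxt = len(self.children)
                self.children.append({})
                self.children[state][step] = nxt
            state = nxt                     # seen step: reuse the shared state
        self.accepts.setdefault(state, set()).add(qid)

    def run(self, events: Iterable[Tuple[str, str]]) -> Set[str]:
        """Process ("start", tag) / ("end", tag) events; return IDs of matched queries."""
        stack: List[Set[int]] = [{0}]
        matched: Set[str] = set()
        for kind, tag in events:
            if kind == "start":
                active: Set[int] = set()
                for s in stack[-1]:
                    for label in (tag, "*"):
                        if label in self.children[s]:
                            active.add(self.children[s][label])
                for s in active:
                    matched |= self.accepts.get(s, set())
                stack.append(active)
            else:                           # end tag: return to the parent's states
                stack.pop()
        return matched
```

For the queries `/a/b`, `/a/b/c`, `/a/*/d`, and `/a/c`, the shared automaton needs six states besides the start state instead of the ten that four separate automata would use, because `/a` is stored once and `/a/b` is shared by the first two queries.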

Towards Systematic Achievement of Compliance in Service-Oriented Architectures: The MASTER Approach

WIRTSCHAFTSINFORMATIK, 2008

Service-oriented architectures (SOA) provide the flexible IT support required by agile businesses. To meet their compliance requirements at the same time, continuous assessment and adaptation of the IT controls embedded in SOA is mandatory. The paper outlines the MASTER methodology and architecture for systematic achievement of compliance in SOA. MASTER features automated support of the full control lifecycle, the definition of key indicators that can be interpreted in the business context and that scale to outsourcing scenarios, and a model-based and policy-driven approach that makes it possible to capture business and technical context and to adapt metrics and controls to it.

Management of and Access to Virtual Electronic Health Records

DELOS Research …, 2005

11 Management of and Access to Virtual Electronic Health Records. Robert Penz, Raimund Vogl (Health Information Technologies Tyrol (HITT), Innsbruck, Austria); Wilhelm Hasselbring, Ulrike Steffens (Kuratorium OFFIS, Oldenburg, Germany); Charalampos Dimitropoulos, Yannis ...

The case for fine-grained stream provenance

BTW Workshops, Feb 1, 2011

The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we first demonstrate by means of well-chosen use cases that this is a misconception, i.e., coarse-grained provenance is in fact insufficient ...

Changing flights in mid-air

Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011

Continuous queries can run for unpredictably long periods of time. During their lifetime, these queries may need to be adapted either due to changes in application semantics (e.g., the implementation of a new alert detection policy), or due to changes in the system's behavior (e.g., adapting performance to a changing load). While in previous works query modification has been implicitly utilized to serve specific purposes (e.g., load management), to date no research has been done that defines a general-purpose, reliable, and efficiently implementable model for modifying continuous queries at run-time. In this paper, we introduce a punctuation-based framework that can formally express arbitrary lifecycle operations on the basis of input-output mappings and basic control elements such as start or stop of queries. On top of this foundation, we derive all possible query change methods, each providing different levels of correctness guarantees and performance. We further show how these models can be efficiently realized in a state-of-the-art stream processing engine; we also provide experimental results demonstrating the key performance tradeoffs of the change methods.
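A minimal sketch of a punctuation-driven query change, assuming a hypothetical `ChangePunctuation` control element embedded in the stream and a filter whose predicate plays the role of the query: tuples before the punctuation are evaluated with the old predicate and tuples after it with the new one. This shows only a simple cut-over; the paper's framework defines several change methods with different correctness guarantees, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union


@dataclass(frozen=True)
class ChangePunctuation:
    """Hypothetical control element: all later tuples are evaluated with new_query."""
    new_query: Callable[[dict], bool]


def changeable_filter(stream: Iterable[Union[dict, ChangePunctuation]],
                      query: Callable[[dict], bool]) -> Iterator[dict]:
    for item in stream:
        if isinstance(item, ChangePunctuation):
            # Cut over exactly at the punctuation's position in the stream.
            query = item.new_query
            continue
        if query(item):
            yield item
```

Because the switch point travels in-band with the data, every tuple is evaluated by exactly one version of the query, and no tuple is lost or processed twice during the change.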