Oliver Kennedy - Academia.edu
Papers by Oliver Kennedy
tpc technology conference, 2019
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve the uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notions of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximat...
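As a rough illustration of the terms in this abstract, the sketch below computes certain and possible answers by brute force over an explicit list of possible worlds, then labels the answers from one chosen world as certain or uncertain. This is not the UA-DB implementation, which the abstract says approximates certain answers rather than enumerating worlds. The function names, the `best_guess_index` parameter, and the toy data are all assumptions made for this example.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

Row = Tuple[str, ...]
World = FrozenSet[Row]
Query = Callable[[World], Set[Row]]


def certain_answers(worlds: Sequence[World], query: Query) -> Set[Row]:
    """Rows returned by the query in every possible world."""
    results = [set(query(w)) for w in worlds]
    return set.intersection(*results) if results else set()


def possible_answers(worlds: Sequence[World], query: Query) -> Set[Row]:
    """Rows returned by the query in at least one possible world."""
    results = [set(query(w)) for w in worlds]
    return set.union(*results) if results else set()


def annotate(worlds: Sequence[World], query: Query,
             best_guess_index: int = 0) -> Dict[Row, bool]:
    """Answer the query in one chosen world and mark each row True if it
    is certain, False if it is only possible (hypothetical labeling)."""
    certain = certain_answers(worlds, query)
    return {row: row in certain for row in query(worlds[best_guess_index])}


# Toy incomplete database: bob's city is uncertain (LA or SF).
worlds = [
    frozenset({("alice", "NY"), ("bob", "LA")}),
    frozenset({("alice", "NY"), ("bob", "SF")}),
]


def not_in_sf(world: World) -> Set[Row]:
    """Names of people who do not live in SF."""
    return {(name,) for name, city in world if city != "SF"}


labels = annotate(worlds, not_in_sf)
```

Here `labels` marks `("alice",)` as certain and `("bob",)` as uncertain, since bob only qualifies in the first world. Enumerating worlds is exponential in general, which is the cost the abstract refers to.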
Performance Evaluation and Benchmarking for the Era of Cloud(s), 2020
Collaboration between small-scale wireless devices hinges on their ability to infer properties shared across multiple nearby nodes. Wireless-enabled mobile devices in particular create a highly dynamic environment not conducive to distributed reasoning about such global properties. This paper addresses a specific instance of this problem: distributed aggregation. We present extensions to existing unstructured aggregation protocols that enable estimation of count, sum, and average aggregates in highly dynamic environments. With the modified protocols, devices with only limited connectivity can maintain estimates of the aggregate, despite unexpected peer departures and arrivals. Our analysis of these aggregate maintenance extensions demonstrates their effectiveness in unstructured environments despite high levels of node mobility.
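For readers unfamiliar with unstructured aggregation, the sketch below simulates push-sum, a well-known gossip protocol for estimating an average without any fixed topology. It is offered only as background: the abstract does not say which base protocols the paper extends, and this simulation assumes a static, fully connected set of nodes with no departures or arrivals, which is the setting the paper moves beyond. The function name and parameters are assumptions.

```python
import random
from typing import List, Sequence


def push_sum(values: Sequence[float], rounds: int, seed: int = 0) -> List[float]:
    """Simulate synchronous push-sum gossip among len(values) >= 2 nodes.

    Each node holds a (sum, weight) pair. Every round it keeps half of the
    pair and sends the other half to one uniformly chosen other node.
    The ratio sum / weight at each node converges to the global average.
    """
    n = len(values)
    if n < 2:
        raise ValueError("push_sum needs at least two nodes")
    rng = random.Random(seed)
    sums = [float(v) for v in values]
    weights = [1.0] * n
    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            # Pick a peer other than i, uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            half_s, half_w = sums[i] / 2.0, weights[i] / 2.0
            new_sums[i] += half_s
            new_weights[i] += half_w
            new_sums[j] += half_s
            new_weights[j] += half_w
        sums, weights = new_sums, new_weights
    return [s / w for s, w in zip(sums, weights)]


estimates = push_sum([1, 2, 3, 4, 5, 6, 7, 8], rounds=200)
```

Because each node only halves and forwards its pair, the total sum and total weight are conserved, so every local ratio drifts toward the true average (4.5 for the inputs above). A node that leaves takes its share of the mass with it, which is one reason churn calls for protocol extensions of the kind the abstract describes.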
ArXiv, 2016
The present state of the art in analytics requires high upfront investment of human effort and computational resources to curate datasets, even before the first query is posed. So-called pay-as-you-go data curation techniques allow these high costs to be spread out, first by enabling queries over uncertain and incomplete data, and then by assessing the quality of the query results. We describe the design of a system, called Mimir, around a recently introduced class of probabilistic pay-as-you-go data cleaning operators called Lenses. Mimir wraps around any deterministic database engine using JDBC, extending it with support for probabilistic query processing. Queries processed through Mimir produce uncertainty-annotated result cursors that allow client applications to quickly assess result quality and provenance. We also present a GUI that provides analysts with an interactive tool for exploring the uncertainty exposed by the system. Finally, we present optimizations that make Lenses...
Proceedings of the 1st International Conference on Operations Research and Enterprise Systems, 2012
Proceedings of the 2019 International Conference on Management of Data, 2019
Proceedings of the 2019 International Conference on Management of Data, 2019
Proceedings of the 2021 International Conference on Management of Data, 2021
2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
Data is becoming increasingly personal. Individuals regularly interact with a wide variety of structured data, from SQLite databases on phones, to HR spreadsheets, to personal sensors, to open government data appearing in news articles. Although these workloads are important, many of the classical challenges associated with scale and Big Data do not apply. This panel brings together experts in a variety of fields to explore the new opportunities and challenges presented by "Small Data".
Proceedings of the VLDB Endowment, 2019
Proceedings of the VLDB Endowment, 2018
Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, it is common for production databases to deal with millions or even more queries each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off between conciseness and information content. For example, simple workload sampling may miss rare but high-impact queries. In this paper, we present LogR, a lossy log compression scheme suitable for use by many automated log analytics tools, as well as for human inspection. We formalize and analyze the space/fidelity trade-off in the context of a broader family of "pattern" and "pattern mixture" log encodings to which LogR belongs. We show through a series of experiments that LogR compressed encodings can be created efficiently, come with provable information-theoretic bounds on their accuracy, and outperform state-of-the-art log summarization strategies.
BMJ (Clinical research ed.), Nov 22, 2017
Objectives: To evaluate the existing evidence for associations between coffee consumption and multiple health outcomes. Design: Umbrella review of the evidence across meta-analyses of observational and interventional studies of coffee consumption and any health outcome. Data sources: PubMed, Embase, CINAHL, Cochrane Database of Systematic Reviews, and screening of references. Eligibility criteria for selecting studies: Meta-analyses of both observational and interventional studies that examined the associations between coffee consumption and any health outcome in any adult population in all countries and all settings. Studies of genetic polymorphisms for coffee metabolism were excluded. Results: The umbrella review identified 201 meta-analyses of observational research with 67 unique health outcomes and 17 meta-analyses of interventional research with nine unique outcomes. Coffee consumption was more often associated with benefit than harm for a range of health outcomes across exposures incl...
Security modifications to legacy network protocols are expensive and disruptive. This paper outlines an approach, based on external security monitors, for securing legacy protocols by deploying additional hosts that locally monitor the inputs and outputs of each host executing the protocol, check the behavior of the host against a safety specification, and communicate using an overlay to alert other hosts about invalid behavior and to initiate remedial actions. Trusted computing hardware provides the basis for trust in external security monitors. This paper applies this approach to secure the Border Gateway Protocol, yielding an external security monitor called N-BGP. N-BGP can accurately monitor a BGP router using commodity trusted computing hardware. Deploying N-BGP at a random 10% of BGP routers is sufficient to guarantee the security of 80% of Internet routes where both endpoints are monitored by N-BGP. Overall, external security monitors secure the routing infrastructure using trusted computing hardware and construct a security plane for BGP without having to modify the large base of installed routers and servers.
Lecture Notes in Computer Science, 2016
Proceedings of the 16th International Workshop, Feb 12, 2015
Interdisciplinary learning flourishes in situations where students are self-motivated to draw on a broad range of knowledge in order to solve a challenging engineering problem. A key element in this picture is the need for a high degree of self-motivation, i.e., enthusiasm and excitement for the project coming from the students (not the teacher). The Formula SAE project,
Proceedings of the VLDB Endowment, 2015