Oliver Kennedy - Profile on Academia.edu
Papers by Oliver Kennedy
Benchmarking Databases "On-The-Go"
TPC Technology Conference, 2019
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve the uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notions of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.
Performance Evaluation and Benchmarking for the Era of Cloud(s), 2020
Embedded database libraries provide developers with a common and convenient data persistence layer. They are a key component of major mobile operating systems, and are used extensively on interactive devices like smartphones. Database performance affects the response times and resource consumption of millions of smartphone apps and billions of smartphone users. Given their wide use and impact, it is critical that we understand how embedded databases operate in realistic mobile settings, and how they interact with mobile environments. We argue that traditional database benchmarking methods produce misleading results when applied to mobile devices, due to evaluating performance only at saturation. To rectify this, we present PocketData, a new benchmark for mobile device database evaluation that uses typical workloads to produce representative performance results. We explain the performance measurement methodology behind PocketData, and address specific challenges. We analyze the results obtained, and show how different classes of workload interact with database performance. Notably, our study of mobile databases at non-saturated levels uncovers significant latency and energy variation in database workloads resulting from CPU frequency scaling policies called governors; we show that this variation is hidden by typical benchmark measurement techniques.
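To make the measurement argument concrete, here is a minimal sketch (not PocketData itself; the workload, schema, and numbers are invented) contrasting saturated, back-to-back query issue with a paced workload. On a real device, the idle gaps between paced queries are what let frequency-scaling governors down-clock the CPU, producing the latency variation the paper describes; on a desktop the effect may be smaller:

```python
# Sketch: saturated vs. paced ("typical workload") measurement of an embedded
# database, in the spirit of the PocketData argument. Illustrative only.
import sqlite3, time, statistics

def run_workload(pause_s: float, n: int = 500) -> list[float]:
    """Issue n point lookups, sleeping pause_s between them; return latencies (ms)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO kv VALUES (?, ?)",
                     [(i, f"value-{i}") for i in range(10_000)])
    latencies = []
    for i in range(n):
        start = time.perf_counter()
        conn.execute("SELECT v FROM kv WHERE k = ?", (i % 10_000,)).fetchone()
        latencies.append((time.perf_counter() - start) * 1000)
        time.sleep(pause_s)          # pause_s = 0 approximates saturation
    conn.close()
    return latencies

for label, pause in [("saturated", 0.0), ("paced (10 q/s)", 0.1)]:
    lat = run_workload(pause)
    print(f"{label}: median={statistics.median(lat):.3f} ms, "
          f"p95={sorted(lat)[int(0.95 * len(lat))]:.3f} ms")
```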
Collaboration between small-scale wireless devices hinges on their ability to infer properties shared across multiple nearby nodes. Wireless-enabled mobile devices in particular create a highly dynamic environment not conducive to distributed reasoning about such global properties. This paper addresses a specific instance of this problem: distributed aggregation. We present extensions to existing unstructured aggregation protocols that enable estimation of count, sum, and average aggregates in highly dynamic environments. With the modified protocols, devices with only limited connectivity can maintain estimates of the aggregate, despite unexpected peer departures and arrivals. Our analysis of these aggregate maintenance extensions demonstrates their effectiveness in unstructured environments despite high levels of node mobility.
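As a rough illustration of the class of protocols being extended, the following sketch implements plain push-sum gossip for estimating an average. It assumes a static, fully connected node set and does not include the paper's churn-handling extensions:

```python
# Sketch: push-sum gossip for estimating an average in an unstructured network.
# A simplified illustration of the protocol family the paper extends.
import random

def push_sum_average(values: list[float], rounds: int = 50) -> list[float]:
    n = len(values)
    s = values[:]                # running sums
    w = [1.0] * n                # running weights; estimate at node i is s[i]/w[i]
    for _ in range(rounds):
        inbox = [(0.0, 0.0)] * n
        for i in range(n):
            j = random.randrange(n)              # pick a random peer
            half_s, half_w = s[i] / 2, w[i] / 2
            s[i], w[i] = half_s, half_w          # keep half ...
            ds, dw = inbox[j]
            inbox[j] = (ds + half_s, dw + half_w)  # ... send half
        for i in range(n):
            s[i] += inbox[i][0]
            w[i] += inbox[i][1]
    return [s[i] / w[i] for i in range(n)]

vals = [random.uniform(0, 100) for _ in range(64)]
estimates = push_sum_average(vals)
print("true average:", sum(vals) / len(vals), "node 0 estimate:", estimates[0])
```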
arXiv, 2016
The present state of the art in analytics requires high upfront investment of human effort and computational resources to curate datasets, even before the first query is posed. So-called pay-as-you-go data curation techniques allow these high costs to be spread out, first by enabling queries over uncertain and incomplete data, and then by assessing the quality of the query results. We describe the design of a system, called Mimir, around a recently introduced class of probabilistic pay-as-you-go data cleaning operators called Lenses. Mimir wraps around any deterministic database engine using JDBC, extending it with support for probabilistic query processing. Queries processed through Mimir produce uncertainty-annotated result cursors that allow client applications to quickly assess result quality and provenance. We also present a GUI that provides analysts with an interactive tool for exploring the uncertainty exposed by the system. Finally, we present optimizations that make Lenses...
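A minimal sketch of the flavor of an uncertainty-annotated result: a hypothetical "domain repair lens" imputes missing values and marks the guessed cells as uncertain. All names here are illustrative, and Mimir's actual lenses are probabilistic and considerably richer:

```python
# Sketch: an uncertainty-annotated cursor wrapper. A toy "lens" imputes NULLs
# and flags every imputed cell. Names and schema are invented for illustration.
import sqlite3
from dataclasses import dataclass

@dataclass
class Annotated:
    value: object
    certain: bool   # False if the value was guessed by a lens

def domain_repair_lens(rows, col_index, default):
    """Replace NULLs in one column with a guess, annotating every cell."""
    for row in rows:
        out = []
        for i, v in enumerate(row):
            if i == col_index and v is None:
                out.append(Annotated(default, certain=False))  # imputed
            else:
                out.append(Annotated(v, certain=True))
        yield out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("a", 21.5), ("b", None), ("c", 19.0)])
for row in domain_repair_lens(conn.execute("SELECT * FROM readings"), 1, 20.0):
    print([(c.value, "certain" if c.certain else "uncertain") for c in row])
```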
Proceedings of the 1st International Conference on Operations Research and Enterprise Systems, 2012
We discuss a multi-objective/goal programming model for the allocation of inventory of graphical advertisements. The model considers two types of campaigns: guaranteed delivery (GD), which are sold months in advance, and non-guaranteed delivery (NGD), which are sold using real-time auctions. We investigate various advertiser and publisher objectives such as (a) revenue from the sale of impressions, clicks, and conversions, (b) future revenue from the sale of NGD inventory, and (c) "fairness" of allocation. While the first two objectives are monetary, the third is not. This combination of demand types and objectives leads to potentially many variations of our model, which we delineate and evaluate. Our experimental results, which are based on optimization runs using real data sets, demonstrate the effectiveness and flexibility of the proposed model.
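For intuition, here is a toy goal-programming allocation in the same spirit, posed as a linear program over one supply pool, two GD campaigns, and one NGD channel. scipy is an assumed dependency, and every number is invented; the paper's actual model is far larger:

```python
# Sketch: a toy goal-programming ad allocation. Shortfalls against GD targets
# are penalized (the "goals"); NGD impressions earn revenue. Illustrative only.
import numpy as np
from scipy.optimize import linprog

S = 100.0                        # available impressions
d = np.array([40.0, 50.0])       # GD demand targets
penalty = np.array([2.0, 1.5])   # per-impression shortfall penalties (goal weights)
ngd_price = 0.5                  # expected NGD revenue per impression

# Variables: x = [x_gd1, x_gd2, x_ngd, short1, short2]; minimize c @ x.
c = np.array([0.0, 0.0, -ngd_price, penalty[0], penalty[1]])
A_ub = np.array([
    [1, 1, 1, 0, 0],     # supply:  x_gd1 + x_gd2 + x_ngd <= S
    [-1, 0, 0, -1, 0],   # goal:    x_gd1 + short1        >= d1
    [0, -1, 0, 0, -1],   # goal:    x_gd2 + short2        >= d2
])
b_ub = np.array([S, -d[0], -d[1]])
bounds = [(0, d[0]), (0, d[1]), (0, None), (0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_gd1, x_gd2, x_ngd, s1, s2 = res.x
print(f"GD1={x_gd1:.0f}, GD2={x_gd2:.0f}, NGD={x_ngd:.0f}, "
      f"shortfalls=({s1:.0f}, {s2:.0f})")
```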
Proceedings of the 2019 International Conference on Management of Data, 2019
We present Vizier, a multi-modal data exploration and debugging tool. The system supports a wide range of operations by seamlessly integrating Python, SQL, and automated data curation and debugging methods. Using Spark as an execution backend, Vizier handles large datasets in multiple formats. Ease of use is attained through integration of a notebook with a spreadsheet-style interface and with visualizations that guide and support the user in the loop. In addition, native support for provenance and versioning enables collaboration and uncertainty management. In this demonstration we will illustrate the diverse features of the system using several realistic data science tasks based on real data.
Proceedings of the 2019 International Conference on Management of Data, 2019
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notion of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.
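A hand-rolled sketch of the core annotation idea, assuming each tuple carries an (under, over) Boolean pair, where under means "certainly in the result" and over means "possibly in the result"; a join then combines annotations pointwise. This illustrates the approximation scheme, not the paper's implementation or its general K-relation semantics:

```python
# Sketch: UA-style annotated relations. Each tuple maps to (under, over),
# with under implying over; operators combine annotations pointwise.
from itertools import product

r = {("alice", 1): (True, True),    # certain
     ("bob",   2): (False, True)}   # uncertain (possible but not certain)
s = {(1, "nyc"): (True, True),
     (2, "sfo"): (True, True)}

def ua_join(r, s):
    """Join on r's last column = s's first; annotations combine with AND."""
    out = {}
    for (t1, (u1, o1)), (t2, (u2, o2)) in product(r.items(), s.items()):
        if t1[-1] == t2[0]:
            out[t1 + t2[1:]] = (u1 and u2, o1 and o2)
    return out

for tup, (under, over) in ua_join(r, s).items():
    label = "certain" if under else ("possible" if over else "absent")
    print(tup, "->", label)
```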
Proceedings of the 2021 International Conference on Management of Data, 2021
Incomplete and probabilistic database techniques are principled methods for coping with uncertainty in data. Unfortunately, the class of queries that can be answered efficiently over such databases is severely limited, even when advanced approximation techniques are employed. We introduce attribute-annotated uncertain databases (AU-DBs), an uncertain data model that annotates tuples and attribute values with bounds to compactly approximate an incomplete database. AU-DBs are closed under relational algebra with aggregation using an efficient evaluation semantics. Using optimizations that trade accuracy for performance, our approach scales to complex queries and large datasets, and produces accurate results.
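As a sketch of attribute-level bounding, suppose each value is a (lower, guess, upper) triple; a SUM aggregate can then be computed componentwise with interval arithmetic. This is only illustrative of the idea, not the paper's semantics (which also bounds tuple multiplicities), and the field names are invented:

```python
# Sketch: interval-annotated attribute values and a componentwise SUM,
# loosely in the AU-DB spirit. Illustrative only.
from dataclasses import dataclass

@dataclass
class RangeVal:
    lo: float   # lower bound
    sg: float   # "selected guess" -- the most likely value
    hi: float   # upper bound

    def __add__(self, other: "RangeVal") -> "RangeVal":
        return RangeVal(self.lo + other.lo, self.sg + other.sg, self.hi + other.hi)

salaries = [RangeVal(45_000, 50_000, 55_000),   # imputed, noisy value
            RangeVal(60_000, 60_000, 60_000),   # known exactly
            RangeVal(30_000, 42_000, 70_000)]   # very uncertain

total = sum(salaries[1:], salaries[0])
print(f"SUM(salary) in [{total.lo:.0f}, {total.hi:.0f}], best guess {total.sg:.0f}")
```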
Small Data
2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
Data is becoming increasingly personal. Individuals regularly interact with a wide variety of structured data, from SQLite databases on phones, to HR spreadsheets, to personal sensors, to open government data appearing in news articles. Although these workloads are important, many of the classical challenges associated with scale and Big Data do not apply. This panel brings together experts in a variety of fields to explore the new opportunities and challenges presented by "Small Data."
Proceedings of the VLDB Endowment, 2019
Proceedings of the VLDB Endowment, 2018
Analyzing database access logs is a key part of performance tuning, intrusion detection, benchmark development, and many other database administration tasks. Unfortunately, it is common for production databases to handle millions of queries or more each day, so these logs must be summarized before they can be used. Designing an appropriate summary encoding requires trading off between conciseness and information content. For example, simple workload sampling may miss rare but high-impact queries. In this paper, we present LogR, a lossy log compression scheme suitable for use by many automated log analytics tools, as well as for human inspection. We formalize and analyze the space/fidelity trade-off in the context of a broader family of "pattern" and "pattern mixture" log encodings to which LogR belongs. We show through a series of experiments that LogR compressed encodings can be created efficiently, come with provable information-theoretic bounds on their accuracy, and outperform state-of-the-art log summarization strategies.
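To illustrate the "pattern" half of such an encoding, the following sketch normalizes constants out of queries so that structurally identical statements collapse into one counted template. It is a minimal stand-in for the kind of encoding LogR builds on, not LogR itself:

```python
# Sketch: pattern-style log summarization -- strip literals, count templates.
import re
from collections import Counter

def template(query: str) -> str:
    q = re.sub(r"'[^']*'", "?", query)        # string literals -> ?
    q = re.sub(r"\b\d+(\.\d+)?\b", "?", q)    # numeric literals -> ?
    return re.sub(r"\s+", " ", q).strip().upper()

log = [
    "SELECT * FROM users WHERE id = 17",
    "SELECT * FROM users WHERE id = 42",
    "select * from users where id = 99",
    "UPDATE users SET last_seen = '2018-01-01' WHERE id = 17",
]
summary = Counter(template(q) for q in log)
for pattern, count in summary.most_common():
    print(f"{count:>3}  {pattern}")
```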
BMJ (Clinical research ed.), Nov 22, 2017
Objectives: To evaluate the existing evidence for associations between coffee consumption and multiple health outcomes. Design: Umbrella review of the evidence across meta-analyses of observational and interventional studies of coffee consumption and any health outcome. Data sources: PubMed, Embase, CINAHL, Cochrane Database of Systematic Reviews, and screening of references. Eligibility criteria for selecting studies: Meta-analyses of both observational and interventional studies that examined the associations between coffee consumption and any health outcome in any adult population in all countries and all settings. Studies of genetic polymorphisms for coffee metabolism were excluded. Results: The umbrella review identified 201 meta-analyses of observational research with 67 unique health outcomes and 17 meta-analyses of interventional research with nine unique outcomes. Coffee consumption was more often associated with benefit than harm for a range of health outcomes across exposures incl...
Security modifications to legacy network protocols are expensive and disruptive. This paper outlines an approach, based on external security monitors, for securing legacy protocols by deploying additional hosts that locally monitor the inputs and outputs of each host executing the protocol, check the behavior of the host against a safety specification, and communicate using an overlay to alert other hosts about invalid behavior and to initiate remedial actions. Trusted computing hardware provides the basis for trust in external security monitors. This paper applies this approach to secure the Border Gateway Protocol, yielding an external security monitor called N-BGP. N-BGP can accurately monitor a BGP router using commodity trusted computing hardware. Deploying N-BGP at a random 10% of BGP routers is sufficient to guarantee the security of 80% of Internet routes where both endpoints are monitored by N-BGP. Overall, external security monitors secure the routing infrastructure using trusted computing hardware and construct a security plane for BGP without having to modify the large base of installed routers and servers.
Lecture Notes in Computer Science, 2016
Embedded database engines such as SQLite provide a convenient data persistence layer and have spread along with the applications using them to many types of systems, including interactive devices such as smartphones. Android, the most widely distributed smartphone platform, both uses SQLite internally and provides interfaces encouraging apps to use SQLite to store their own private structured data. As similar functionality appears in all major mobile operating systems, embedded database performance affects the response times and resource consumption of billions of smartphones and the millions of apps that run on them, making it more important than ever to characterize smartphone embedded database workloads. To do so, we present results from an experiment which recorded SQLite activity on 11 Android smartphones during one month of typical usage. Our analysis shows that Android SQLite usage produces queries and access patterns quite different from canonical server workloads. We argue that evaluating smartphone embedded databases will require a new benchmarking suite, and we use our results to outline some of its characteristics.
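The kind of instrumentation involved can be sketched with Python's sqlite3 module (the study hooked Android's SQLite, not Python's): a trace callback records every executed statement, here tallied by leading verb. The workload below is invented:

```python
# Sketch: recording and classifying the statements an app sends to SQLite.
import sqlite3
from collections import Counter

stats = Counter()

def record(statement: str) -> None:
    stats[statement.split(None, 1)[0].upper()] += 1   # tally by leading verb

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(record)          # invoked for every executed statement
conn.execute("CREATE TABLE prefs (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO prefs VALUES ('theme', 'dark')")
conn.execute("SELECT v FROM prefs WHERE k = 'theme'").fetchone()
conn.execute("UPDATE prefs SET v = 'light' WHERE k = 'theme'")
print(dict(stats))   # e.g. {'CREATE': 1, 'INSERT': 1, 'SELECT': 1, 'UPDATE': 1}
```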
Proceedings of the 16th International Workshop, Feb 12, 2015
One of the reasons programming mobile systems is so hard is the wide variety of environments a typical app encounters at runtime. As a result, in many cases only post-deployment user testing can determine the right algorithm to use, the rate at which something should happen, or when an app should attempt to conserve energy. Programmers should not be forced to make these choices at development time. Unfortunately, languages leave no way for programmers to express and structure uncertainty about runtime conditions, forcing them to adopt ineffective or fragile ad hoc solutions. We introduce a new approach based on structured uncertainty through a new language construct: the maybe statement. maybe statements allow programmers to defer choices about app behavior that cannot be made at development time, while providing enough structure to allow a system to later adaptively choose from multiple alternatives. Eliminating the uncertainty introduced by maybe statements can be done in a large variety of ways: through simulation, split testing, user configuration, temporal adaptation, or machine learning techniques, depending on the type of adaptation appropriate for each situation. Our paper motivates the maybe statement, presents its syntax, and describes a complete system for testing and choosing from maybe alternatives.
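Since the maybe statement is a language-level construct, a library can only approximate it; the following sketch emulates deferred choice with an epsilon-greedy chooser, one of the resolution strategies the abstract mentions. The class, reward signal, and intervals are all invented for illustration:

```python
# Sketch: emulating a `maybe` choice among alternatives, resolved adaptively
# at runtime from observed rewards (epsilon-greedy).
import random
from collections import defaultdict

class Maybe:
    """Defer a choice among alternatives; learn from observed rewards."""
    def __init__(self, *alternatives, epsilon=0.1):
        self.alternatives = alternatives
        self.epsilon = epsilon
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def choose(self):
        if random.random() < self.epsilon or not self.counts:
            return random.choice(self.alternatives)      # explore
        return max(self.alternatives,                    # exploit best-so-far
                   key=lambda a: self.totals[a] / max(self.counts[a], 1))

    def reward(self, alternative, value: float):
        self.totals[alternative] += value
        self.counts[alternative] += 1

# maybe { sync every 60s } or { sync every 300s }, resolved adaptively:
sync_interval = Maybe(60, 300)
for _ in range(100):
    choice = sync_interval.choose()
    battery_saved = 1.0 if choice == 300 else 0.3        # stand-in feedback
    sync_interval.reward(choice, battery_saved)
print("preferred interval:", sync_interval.choose())
```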
Promoting Interdisciplinary Learning in a Practical Environment Using the Formula SAE Competition
Interdisciplinary learning flourishes in situations where students are self-motivated to draw on a broad range of knowledge in order to solve a challenging engineering problem. A key element in this picture is the need for a high degree of self-motivation, i.e., enthusiasm and excitement coming from the students (not the teacher) for the project. The Formula SAE project...
External security monitors (ESMs) are a new network component for securing legacy protocols without requiring modifications to existing hardware, software, or the protocol. An ESM is an additional host that checks each message sent by a legacy host against a safety specification. ESMs use trusted hardware to assure remote principals that the safety specification is being enforced; ESMs use an overlay network to alert each other about invalid behavior and to initiate remedial actions. N-BGP is an ESM for securing the Internet's Border Gateway Protocol (BGP). When run on commodity hardware, N-BGP is fast enough to monitor a production BGP router. Deploying N-BGP at a random 10% of autonomous systems in the Internet suffices to guarantee security for 80% of Internet routes where both endpoints are monitored by N-BGP.
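The shape of an ESM's check loop can be sketched as follows: compare each observed protocol message against a safety specification and raise an alert on violation. This toy stands in for N-BGP's far more involved BGP semantics, and the prefix/AS data is invented:

```python
# Sketch: a toy safety-specification check in the style of an external
# security monitor. Real N-BGP checks full BGP update semantics.
AUTHORIZED_ORIGINS = {          # safety spec: prefix -> AS allowed to originate it
    "203.0.113.0/24": 64500,
    "198.51.100.0/24": 64501,
}

def check_announcement(prefix: str, as_path: list[int]) -> bool:
    """Return True if the announcement satisfies the spec; alert otherwise."""
    allowed = AUTHORIZED_ORIGINS.get(prefix)
    if allowed is None or as_path[-1] != allowed:   # origin AS is last on the path
        print(f"ALERT: invalid origin {as_path[-1]} for {prefix}")
        return False
    return True

check_announcement("203.0.113.0/24", [64510, 64500])   # satisfies the spec
check_announcement("203.0.113.0/24", [64510, 64666])   # hijack attempt -> alert
```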
British Homoeopathic Journal, 1980
In recent years there has been a reaction against the traditional approach of Kent and Hahnemann's Organon. These are both felt to be out of tune with modern medicine, both in theme and in their archaic presentation. The result, however, has been poor performance amongst the younger trainees in the out-patient department. In their enthusiasm they tend to introduce drugs on relatively short-term indications, which spoils the overall management that should be planned from the first consultation with the patient. I would like to review the situation with you in its modern context. During the second and subsequent consultations, the approach planned on first attendance should be followed if success is to be achieved. This must depend on some form of verifiable progressive data sheet which is sufficiently flexible to allow for reassessment of the patient's condition and allows for the totality of the problem to be considered. I consider that it should be dealt with under three separate headings: (1) assessment of progress; (2) review and confirmation of the facts; (3) management of the patient.