Liuba Shrira - Profile on Academia.edu
Papers by Liuba Shrira
The VLDB Journal, Jan 11, 2022
Optimistic concurrency control, or OCC, can achieve excellent performance on uncontended workloads for main-memory transactional databases. Contention causes OCC's performance to degrade, however, and recent concurrency control designs, such as hybrid OCC/locking systems and variations on multiversion concurrency control (MVCC), have claimed to outperform the best OCC systems. We evaluate several concurrency control designs under varying contention and varying workloads, including TPC-C, and find that implementation choices unrelated to concurrency control may explain much of OCC's previously reported degradation. When these implementation choices are made sensibly, OCC performance does not collapse on high-contention TPC-C. We also present two optimization techniques, commit-time updates and timestamp splitting, that can dramatically improve the high-contention performance of both OCC and MVCC. Though these techniques are known, we apply them in a new context and highlight their potency: when combined, they lead to performance gains of 3.4× for MVCC and 3.6× for OCC in a TPC-C workload.
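The commit-time update idea can be illustrated with a minimal sketch against a toy versioned store (class and method names are mine, not the paper's API): instead of reading a record only to compute its new value, the transaction registers an update function to run at commit, so the read never enters the read set and cannot cause a validation abort.

```python
# Sketch of commit-time updates in an OCC transaction. Illustrative only:
# single-threaded toy store; a real system runs commit in a critical section.

class Txn:
    def __init__(self, store):
        self.store = store          # shared dict: key -> (version, value)
        self.read_set = {}          # key -> version observed
        self.write_set = {}         # key -> new value
        self.commit_updates = {}    # key -> function applied at commit time

    def read(self, key):
        version, value = self.store[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def update_at_commit(self, key, fn):
        # Commit-time update: no read-set entry, so concurrent writers
        # to `key` cannot invalidate this transaction.
        self.commit_updates[key] = fn

    def commit(self):
        # Validation: every read must still be at the observed version.
        for key, version in self.read_set.items():
            if self.store[key][0] != version:
                return False  # abort
        # Install writes, then apply deferred updates to the latest value.
        for key, value in self.write_set.items():
            v, _ = self.store[key]
            self.store[key] = (v + 1, value)
        for key, fn in self.commit_updates.items():
            v, value = self.store[key]
            self.store[key] = (v + 1, fn(value))
        return True
```

For example, a new-order transaction in the style of TPC-C could bump a district's next-order-id with `update_at_commit(key, lambda v: v + 1)` instead of read-then-write, so concurrent orders on the same district no longer conflict on that counter.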
To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this approach is expensive. In this paper, we propose lazy replication as a way to preserve consistency by exploiting the semantics of the service's operations to relax the constraints on ordering. Three kinds of operations are supported: operations for which the clients define the required order dynamically during the execution, operations for which the service defines the order, and operations that must be globally ordered with respect to both client-ordered and service-ordered operations. The method performs well in terms of response time, amount of stored state, number of messages, and availability. It is especially well suited to applications in which most operations require only the client-defined order.
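Client-defined ordering can be sketched with multipart (vector) timestamps: each operation carries a timestamp of the operations it depends on, and a replica delays it until those dependencies have been applied locally. This is an illustrative model only; gossip between replicas and the service-ordered and globally ordered operation kinds are omitted.

```python
# Sketch of client-ordered operations in a lazily replicated service.
# (Assumed structure for illustration, not the paper's protocol.)

class Replica:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.applied = [0] * n_replicas   # ops applied from each origin
        self.pending = []                 # (dep, origin, op) not yet ready

    def submit(self, dep, origin, op):
        """`dep` is a vector timestamp of prior operations this op
        depends on; the replica buffers it until dep <= applied."""
        self.pending.append((dep, origin, op))
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                dep, origin, op = entry
                if all(d <= a for d, a in zip(dep, self.applied)):
                    op()                        # causal deps satisfied
                    self.applied[origin] += 1   # record the applied op
                    self.pending.remove(entry)
                    progress = True
```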
Persistent object stores require a way to automatically upgrade persistent objects, to change their code and storage representation. Automatic upgrades are a challenge for such systems. Upgrades must be performed in a way that is efficient both in space and time, and that does not stop application access to the store. In addition, however, the approach must be modular: it must allow programmers to reason locally about the correctness of their upgrades similar to the way they would reason about regular code. This paper provides solutions to both problems. The paper first defines upgrade modularity conditions that any upgrade system must satisfy to support local reasoning about upgrades. The paper then describes a new approach for executing upgrades efficiently while satisfying the upgrade modularity conditions. The approach exploits object encapsulation properties in a novel way. The paper also describes a prototype implementation and shows that our upgrade system imposes only a small overhead on application performance.
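A common shape for such an upgrade is a per-class transform run lazily, on first access after the upgrade is installed, so the store never stops serving applications. The sketch below assumes a toy dictionary store and made-up schema fields; it is not the paper's interface.

```python
# Sketch of a lazy persistent-object upgrade: each upgrade supplies a
# transform from the old representation to the new one, run the first
# time an object is touched after the upgrade is installed.

CURRENT_SCHEMA = 2

def transform_v1_to_v2(old):
    # Example representation change: v1 stored a full name; v2 splits it.
    first, _, last = old["name"].partition(" ")
    return {"schema": 2, "first": first, "last": last}

TRANSFORMS = {1: transform_v1_to_v2}

def load(store, oid):
    obj = store[oid]
    # Upgrade lazily, one version step at a time, on first access.
    while obj["schema"] < CURRENT_SCHEMA:
        obj = TRANSFORMS[obj["schema"]](obj)
        store[oid] = obj  # persist the upgraded representation
    return obj

store = {7: {"schema": 1, "name": "Ann Smith"}}
print(load(store, 7))   # {'schema': 2, 'first': 'Ann', 'last': 'Smith'}
```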
Retro: a methodology for retrospection everywhere
Changes in data over time are increasingly important in business intelligence and auditing applications. Having access to past states enables users to detect trends and anomalies, verify assumptions based on previous calculations, recover from input errors, test the efficacy of decisions, and make predictions about the future. Retro is a new methodology and portable design for systematically adding the ability to save and access past states to an existing transactional data store. We developed efficient algorithms for in-memory and on-disk indexing of past states, including Skippy, an index for past-state metadata. We designed novel protocol extensions to leverage existing data store multi-version concurrency control and recovery. We prototyped Retro in Berkeley DB. A key performance metric is non-disruptiveness: whether saving snapshots with Retro interferes with database performance. Retro can non-disruptively save snapshots with high frequency (after every transaction) with minimal impact on update throughput, with an average increase in checkpoint length of about 15%. Retro provides retrospection, a novel and simple interface that allows any read-only query to execute "as of" a snapshot. Existing query languages and programming APIs can be used with retrospection, simplifying the use of Retro for query programmers. In our experiments using selected TPC-H queries and custom SQL queries, running an I/O-bound query with retrospection is anywhere from 2.5x to 19x slower than running the same query without Retro. This slowdown is due to declustered I/O from copy-on-write (COW) snapshots, a cost comparable to other state-of-the-art approaches that keep the current state intact and copy snapshots out using COW. New storage technologies such as flash memory offer an opportunity to avoid the main source of slowdown (the cost of seeks due to declustering) while still taking advantage of the benefits of Retro. This thesis presents the design, implementation, and evaluation of a solution to the problem of integrating a snapshot system into an existing transactional data store without disrupting performance. A data store with Retro offers snapshots that stay "out of the way" when they aren't needed, and retrospection at any time on any snapshot using any query code.
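The essence of retrospection is that unmodified read-only query code runs against the state "as of" a chosen snapshot. A minimal sketch of what such an interface might look like, with an assumed toy versioned store (not Retro's actual design):

```python
# Sketch of an "as of" interface: ordinary read-only query code runs
# unchanged, but reads are redirected to a chosen snapshot.

class VersionedStore:
    def __init__(self):
        self.history = {}   # key -> list of (snapshot_id, value), ascending

    def put(self, snap_id, key, value):
        self.history.setdefault(key, []).append((snap_id, value))

    def get(self, key, as_of=None):
        versions = self.history.get(key, [])
        if as_of is None:
            return versions[-1][1]                     # current state
        # Latest version at or before the requested snapshot.
        older = [v for s, v in versions if s <= as_of]
        return older[-1] if older else None

store = VersionedStore()
store.put(1, "balance", 100)
store.put(5, "balance", 40)

def audit_query(read):          # ordinary read-only query code
    return read("balance")

print(audit_query(lambda k: store.get(k)))            # current: 40
print(audit_query(lambda k: store.get(k, as_of=3)))   # as of snapshot 3: 100
```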
Timebox: a high performance archive for split snapshots
Timebox is a novel high-performance snapshot system for object storage systems. The goal is to provide a snapshot service that is efficient enough to permit "back-in-time execution", where read-only programs run against application-specified snapshots. Back-in-time execution makes it possible to answer questions about what happened in the past by performing analysis that is often impossible to perform against rapidly evolving current state, because of interference or because the questions are posed in retrospect. A key innovation in Timebox is that, unlike earlier systems, Timebox supports update-in-place storage systems rather than no-overwriting systems, and provides snapshots that are transactionally consistent yet non-disruptive. It uses novel in-memory data structures to ensure that frequent snapshots do not block applications from accessing the storage system and do not cause unnecessary disk operations. Timebox takes a novel approach to snapshot metadata, using a new technique that supports both incremental metadata creation and efficient metadata reconstruction. A basic problem of storing captured past states over long time scales is how to manage snapshot storage so that important snapshots can be kept for the required time, even when archive space becomes limited. Timebox addresses this problem by providing a new capability: selective snapshot reclamation. The application specifies which snapshots are important; when archive space becomes limited, the system efficiently reclaims unimportant snapshots while retaining the important ones. Timebox uses novel snapshot storage management techniques that take advantage of flexible snapshot representation to avoid archive space fragmentation and to maintain efficient access to retained snapshots. We have implemented a Timebox prototype and analyzed its performance. Experimental results show that providing snapshots for back-in-time activities has low impact on system performance even when snapshots are frequent. Experimental results also show that for expected update workloads, Timebox reclaims selected unwanted snapshots with minimal impact on the storage system.
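The policy behind selective snapshot reclamation can be sketched in a few lines: when the archive exceeds its space budget, unimportant snapshots are reclaimed oldest-first while application-marked snapshots survive. This is an illustrative policy only, not Timebox's storage management machinery.

```python
# Sketch of selective snapshot reclamation under a space budget.

def reclaim(snapshots, important, budget):
    """snapshots: dict snap_id -> size; important: set of snap_ids."""
    used = sum(snapshots.values())
    for snap_id in sorted(snapshots):          # oldest first
        if used <= budget:
            break
        if snap_id in important:
            continue                           # application wants this one
        used -= snapshots.pop(snap_id)
    return snapshots

archive = {1: 10, 2: 10, 3: 10, 4: 10}
kept = reclaim(archive, important={2}, budget=25)
print(kept)   # {2: 10, 4: 10}: snapshot 2 survives despite its age
```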
International Workshop on Distributed Algorithms, Jul 8, 1987
In this paper we offer a formal, rigorous proof of the correctness of Awerbuch's algorithm for network synchronization. We specify both the algorithm and the correctness condition using the I/O automaton model, which has previously been used to describe and verify algorithms for concurrency control and resource allocation. We show that the model is also a powerful tool for reasoning about distributed graph algorithms. Our proof of correctness follows closely the intuitive arguments made by the designer of the algorithm by exploiting the model's natural support for such important design techniques as stepwise refinement and modularity. In particular, since the algorithm uses simpler algorithms for synchronization within and between 'clusters' of nodes, our proof can import as lemmas the correctness of these simpler algorithms.

As computer science has matured as a discipline, its activity has broadened from writing programs to include reasoning about those programs: proving their correctness and efficiency, and proving bounds on the performance of any program that accomplishes the same task. Recently distributed computing has begun to broaden in this way (albeit a decade or two later than the part of computer science concerned with sequential, uniprocessor algorithms). There are several reasons why particular care is necessary to prove the correctness of algorithms when the algorithms...
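An I/O automaton pairs state with input actions (always enabled) and output actions guarded by preconditions. A minimal rendering in code, using a hypothetical FIFO channel automaton rather than anything from Awerbuch's synchronizer:

```python
# Minimal rendering of an I/O automaton as code: a reliable FIFO channel
# with input action send(m) and output action receive(m). (Sketch of the
# model's style only.)

class ChannelAutomaton:
    def __init__(self):
        self.queue = []                # the automaton's state

    # Input actions are always enabled; the effect updates state.
    def send(self, m):
        self.queue.append(m)

    # Output actions have a precondition and an effect.
    def receive_enabled(self):
        return bool(self.queue)       # precondition: a message is queued

    def receive(self):
        assert self.receive_enabled()
        return self.queue.pop(0)      # effect: deliver in FIFO order
```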
Lecture Notes in Computer Science, 1986
Persistent object stores require a way to automatically upgrade persistent objects, to change their code and storage representation. Automatic upgrades are a challenge for such systems. Upgrades must be performed in a way that is efficient both in space and time, and that does not stop application access to the store. In addition, however, the approach must be modular: it must allow programmers to reason locally about the correctness of their upgrades similar to the way they would reason about regular code. This paper provides solutions to both problems. The paper first defines upgrade modularity conditions that any upgrade system must satisfy to support local reasoning about upgrades. The paper then describes a new approach for executing upgrades efficiently while satisfying the upgrade modularity conditions. The approach exploits object encapsulation properties in a novel way. The paper also describes a prototype implementation and shows that our upgrade system imposes only a small overhead on application performance.
The VLDB Journal, Aug 20, 2021
Modern distributed data management systems face a new challenge: how can autonomous, mutually distrusting parties cooperate safely and effectively? Addressing this challenge brings up familiar questions from classical distributed systems: how to combine multiple steps into a single atomic action, how to recover from failures, and how to synchronize concurrent access to data. Nevertheless, each of these issues requires rethinking when participants are autonomous and potentially adversarial. We propose the notion of a cross-chain deal, a new way to structure complex distributed computations that manage assets in an adversarial setting. Deals are inspired by classical atomic transactions, but are necessarily different, in important ways, to accommodate the decentralized and untrusting nature of the exchange. We describe novel safety and liveness properties, along with two alternative protocols for implementing cross-chain deals in a system of independent blockchain ledgers. One protocol, based on synchronous communication, is fully decentralized, while the other, based on semi-synchronous communication, requires a globally shared ledger. We also prove that some degree of centralization is required in the semi-synchronous communication model.

The emerging domain of electronic commerce spanning multiple blockchains is a kind of fun-house mirror of classical distributed computing: familiar features are recognizable, but distorted. For example, atomic transactions are often described in terms of the well-known ACID properties [29]: atomicity, consistency, isolation, and durability. We will see that cross-chain commerce requires structures superficially similar to, but fundamentally different from, atomic transactions. In particular, the notions of correctness for atomic transactions must be rethought. Here we propose the notion of a cross-chain deal, a new computational abstraction for structuring interactions. (Supported by NSF grant 1917990.)
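The overall shape of a deal can be sketched as escrow-then-settle: each party escrows its asset on its own ledger, and the escrows release only if every party commits. This simplified illustration omits what makes the real protocols interesting (adversarial parties, timeouts, and proofs across ledgers); names are assumptions.

```python
# Sketch of the shape of a cross-chain deal: per-ledger escrows with an
# all-or-nothing outcome. (Illustration only, not the paper's protocols.)

class Escrow:
    def __init__(self, owner, asset, counterparty):
        self.owner, self.asset, self.counterparty = owner, asset, counterparty
        self.state = "ESCROWED"

    def commit(self):   # release the asset to the counterparty
        if self.state == "ESCROWED":
            self.state = "COMMITTED"

    def abort(self):    # refund the asset to the original owner
        if self.state == "ESCROWED":
            self.state = "ABORTED"

def settle(escrows, votes):
    # All-or-nothing outcome across independent ledgers.
    if all(votes.values()):
        for e in escrows:
            e.commit()
    else:
        for e in escrows:
            e.abort()

deal = [Escrow("alice", "100 X-coin", "bob"),
        Escrow("bob", "5 Y-token", "alice")]
settle(deal, votes={"alice": True, "bob": True})
print([e.state for e in deal])   # ['COMMITTED', 'COMMITTED']
```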
Proceedings of the VLDB Endowment, 2020
Optimistic concurrency control, or OCC, can achieve excellent performance on uncontended workloads for main-memory transactional databases. Contention causes OCC's performance to degrade, however, and recent concurrency control designs, such as hybrid OCC/locking systems and variations on multiversion concurrency control (MVCC), have claimed to outperform the best OCC systems. We evaluate several concurrency control designs under varying contention and varying workloads, including TPC-C, and find that implementation choices unrelated to concurrency control may explain much of OCC's previously reported degradation. When these implementation choices are made sensibly, OCC performance does not collapse on high-contention TPC-C. We also present two optimization techniques, commit-time updates and timestamp splitting, that can dramatically improve the high-contention performance of both OCC and MVCC. Though these techniques are known, we apply them in a new context and highlight their potency: when combined, they lead to performance gains of 3.4× for MVCC and 3.6× for OCC in a TPC-C workload.
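The abstract's other optimization, timestamp splitting, can be sketched as giving a record's "hot" and "cold" column groups independent timestamps, so a write to a hot column does not invalidate readers of cold columns. This is a toy model with assumed names, not the paper's implementation.

```python
# Sketch of timestamp splitting: column groups carry separate timestamps,
# narrowing what a concurrent write invalidates.

class SplitRecord:
    GROUPS = {"balance": "hot", "name": "cold", "address": "cold"}

    def __init__(self, **fields):
        self.fields = fields
        self.ts = {"hot": 0, "cold": 0}   # one timestamp per column group

    def read(self, col):
        group = self.GROUPS[col]
        return self.fields[col], (group, self.ts[group])

    def write(self, col, value):
        group = self.GROUPS[col]
        self.fields[col] = value
        self.ts[group] += 1               # only this group's readers conflict

    def validate(self, observed):
        group, ts = observed
        return self.ts[group] == ts

rec = SplitRecord(balance=100, name="Ann", address="Main St")
_, obs = rec.read("name")      # reader observes the cold-group timestamp
rec.write("balance", 90)       # concurrent hot-column update
print(rec.validate(obs))       # True: the read of "name" is still valid
```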
Proceedings of the 4th Annual International Conference on Systems and Storage
International Conference on Systems, May 30, 2011
Journal of WSCG, 2017
The concept of a visualization pipeline is central to many applications providing scientific visualization. In practical usage scenarios, when pipelines fuse multiple datasets and combine various visualization methods, they can easily evolve into complex visualization networks directing data flow. Creating and managing complex visualization networks, especially when the data itself is time-dependent and requires time-dependent adjustment of multiple visualization parameters, is a tedious manual task with potential for improvement. Here we discuss the benefits of using Berkeley Database (BDB) snapshots to make it easier to create and manage visualization networks for time-dependent data. The idea is to represent visualization network states as BDB snapshots accessed via the widely used Hierarchical Data Format (HDF5), and to exploit the snapshot indexing system to flexibly navigate the high-dimensional space of visualization parameters. This enables us to support useful visualization system features, such as dynamic interpolation of visualization parameters between time points and flexible adjustment of camera parameters per time point. The former allows fast continuous navigation of the parameter space to increase animation frame rate, and the latter supports multi-viewpoint renderings when generating Virtual Reality panorama movies. The paper describes how the snapshot approach and the new features can be conveniently integrated into modern visualization systems, such as the Visualization Shell (Vish), and presents an evaluation study indicating that the performance penalty of this convenience, compared to maintaining visualization networks in HDF5 files, is negligible.
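At its core, dynamic interpolation amounts to blending the numeric parameters stored at two adjacent snapshots. A small sketch with assumed parameter names (not Vish's actual data model):

```python
# Sketch of interpolating visualization parameters between two snapshot
# states, e.g., to raise animation frame rate between stored time points.

def lerp(a, b, t):
    return a + (b - a) * t

def interpolate_params(snap_a, snap_b, t):
    """snap_a, snap_b: dicts of numeric parameters at two snapshots;
    t in [0, 1]: position between them."""
    return {k: lerp(snap_a[k], snap_b[k], t) for k in snap_a}

state_t0 = {"camera_x": 0.0, "camera_z": 10.0, "opacity": 1.0}
state_t1 = {"camera_x": 4.0, "camera_z": 6.0, "opacity": 0.5}
print(interpolate_params(state_t0, state_t1, 0.25))
# {'camera_x': 1.0, 'camera_z': 9.0, 'opacity': 0.875}
```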
Applications need to analyze the past state of their data to provide auditing and other forms of fact checking. Retrospective snapshot systems that support computations over data store snapshots allow applications using simple data stores, like Berkeley DB or SQLite, to provide past-state analysis in a convenient way. Current snapshot systems, however, offer no satisfactory support for computations that analyze multiple snapshots. We have developed the Retrospective Query Language (RQL), a simple declarative extension to SQL that makes it possible to specify and run multi-snapshot computations conveniently in a snapshot system, using a small number of simple mechanisms defined in terms of relational constructs familiar to programmers. We describe RQL mechanisms, explain how they translate into SQL computations in a snapshot system, and show how to express a number of common analysis patterns with illustrative examples. We also describe how we implemented RQL in a simple way utilizing SQLite UDF f...
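The flavor of a multi-snapshot computation expressed relationally can be shown with plain SQLite: tag rows with the snapshot they belong to, and a per-snapshot analysis becomes a GROUP BY over that tag. The schema is assumed for illustration; this is not RQL's actual translation.

```python
# Sketch of a multi-snapshot query over SQLite using a snapshot-id column.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (snap_id INTEGER, name TEXT, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [
    (1, "ann", 100.0), (1, "bob", 50.0),
    (2, "ann", 80.0),  (2, "bob", 90.0),
])

# Total balance per snapshot: one relational query spanning all snapshots.
for snap_id, total in db.execute(
        "SELECT snap_id, SUM(balance) FROM accounts GROUP BY snap_id"):
    print(f"snapshot {snap_id}: total balance {total}")
```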
Despite the increasing importance of protecting confidential data, building secure software remains as challenging as ever. This paper describes Aeolus, a new platform for building secure distributed applications. Aeolus uses information flow control to provide confidentiality and data integrity. It differs from previous information flow control systems in a way that we believe makes it easier to understand and use. Aeolus uses a new, simpler security model, the first to combine a standard principal-based scheme for authority management with thread-granularity information flow tracking. The principal hierarchy matches the way developers already reason about authority and access control, and the coarse-grained information flow tracking eases the task of defining a program's security restrictions. In addition, Aeolus provides a number of new mechanisms (authority closures, compound tags, boxes, and shared volatile state) that support common design patterns in secure application design.
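Thread-granularity information flow tracking in the style the abstract describes can be sketched as a per-thread label of secrecy tags: reading tainted data adds tags, and a write is allowed only where the target's label covers the thread's. This is an illustrative model, not Aeolus's API.

```python
# Sketch of coarse-grained (per-thread) information flow tracking.

class FlowError(Exception):
    pass

class Thread:
    def __init__(self):
        self.label = set()              # secrecy tags this thread has seen

    def read(self, data, data_label):
        self.label |= data_label        # taint propagates on read
        return data

    def write(self, sink_label, data):
        if not self.label <= sink_label:
            raise FlowError(f"leak of tags {self.label - sink_label}")
        return data                     # flow permitted

t = Thread()
secret = t.read("salary=90k", {"hr_secret"})
t.write({"hr_secret", "audit"}, secret)      # ok: sink covers the label
try:
    t.write(set(), secret)                   # public sink: refused
except FlowError as e:
    print(e)
```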
To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this approach is expensive. In this paper, we propose lazy replication as a way to preserve consistency by exploiting the semantics of the service's operations to relax the constraints on ordering. Three kinds of operations are supported: operations for which the clients define the required order dynamically during the execution, operations for which the service defines the order, and operations that must be globally ordered with respect to both client-ordered and service-ordered operations. The method performs well in terms of response time, amount of stored state, number of messages, and availability. It is especially well suited to applications in which most operations require only the client-defined order.
Mobilebuddy: consistent client-to-client exchange in disconnected collaborative groups
The thesis presents MobileBuddy, a novel data caching system that improves availability and performance for collaborative applications while maintaining data consistency in a disconnected environment or over a Wide Area Network. MobileBuddy supports mobile exchange, a novel capability for disconnected client-to-client data and reservation transfer. Mobile exchange makes mobile computing more useful because it allows users to accomplish collaborative work that would be impossible in other systems. Collaborators disconnected from servers can acquire missing or more recent data, and can obtain reservations that guarantee independent transaction validation. The thesis describes the new techniques we developed to implement mobile exchange. The technique for data and reservation transfer combines support for coarse-grained data transfer with fine-grained validation, in a way that avoids the problem of false sharing. The technique for supporting type-specific reservations is more flexible a...
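The basic move, client-to-client reservation transfer, can be sketched as one client handing a peer a server-granted reservation on an object version over a direct link, so the peer can update that object and validate independently on reconnection. The structure below is illustrative only, not MobileBuddy's actual protocol.

```python
# Sketch of disconnected client-to-client reservation transfer.

class Client:
    def __init__(self, name):
        self.name = name
        self.reservations = {}     # object id -> reserved version

    def transfer(self, oid, peer):
        """Hand the reservation (and implicitly the cached data) to a peer
        over a direct link, without any server involvement."""
        peer.reservations[oid] = self.reservations.pop(oid)

alice, bob = Client("alice"), Client("bob")
alice.reservations["doc42"] = 7       # granted by the server before disconnect
alice.transfer("doc42", bob)          # bob may now commit against version 7
print(bob.reservations)               # {'doc42': 7}
```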
RID: Deduplicating Snapshot Computations
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020
One can audit SQL applications by running SQL programs over sequences of persistent snapshots, but care is needed to avoid wasteful duplicate computation. This paper describes the design, implementation, and performance of RID, the first language-independent optimization framework that eliminates duplicate computations in SQL programs running over low-level snapshots by exploiting snapshot metadata efficiently.
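The deduplication idea can be sketched as follows: snapshot metadata reveals which partitions changed between consecutive snapshots, so a query over the sequence recomputes only the changed partitions and reuses cached partial results for the rest. This is an illustrative scheme, not RID's actual framework.

```python
# Sketch of deduplicating a computation over a sequence of snapshots.

def run_over_snapshots(snapshots, changed_parts, compute_part):
    """snapshots: list of {part_id: data}; changed_parts[i]: set of part ids
    that differ from snapshot i-1; compute_part: expensive per-part query."""
    cache, results = {}, []
    for i, snap in enumerate(snapshots):
        for part_id, data in snap.items():
            if i == 0 or part_id in changed_parts[i]:
                cache[part_id] = compute_part(data)   # recompute changed part
            # else: reuse cache[part_id] from the previous snapshot
        results.append(sum(cache.values()))
    return results

snaps = [{"p0": [1, 2], "p1": [3]}, {"p0": [1, 2], "p1": [30]}]
print(run_over_snapshots(snaps, [set(), {"p1"}], compute_part=sum))
# [6, 33]: only p1 is recomputed for the second snapshot
```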
Proceedings of the VLDB Endowment, 2019
Modern distributed data management systems face a new challenge: how can autonomous, mutually distrusting parties cooperate safely and effectively? Addressing this challenge brings up questions familiar from classical distributed systems: how to combine multiple steps into a single atomic action, how to recover from failures, and how to synchronize concurrent access to data. Nevertheless, each of these issues requires rethinking when participants are autonomous and potentially adversarial. We propose the notion of a cross-chain deal, a new way to structure complex distributed computations that manage assets in an adversarial setting. Deals are inspired by classical atomic transactions, but are necessarily different, in important ways, to accommodate the decentralized and untrusting nature of the exchange. We describe novel safety and liveness properties, along with two alternative protocols for implementing cross-chain deals in a system of independent blockchain ledgers. One protoc...