Writes that fall in the forest and make no sound: Semantics-based adaptive data consistency
Datastores today rely on distribution and replication to achieve improved performance and fault tolerance. But the correctness of many applications depends on strong consistency properties, which can impose substantial overheads because they require coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. The key insight is to allow the state of the system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. In contrast to previous work, our approach uses program analysis to extract semantic information about permissible levels of inconsistency and is fully automated. We then employ a novel homeostasis protocol that allows sites to operate independently, without communicating, as long as any inconsistency is governed by appropriate treaties between the nodes. We discuss mechanisms for optimizing treaties based on workload characteristics to minimize communication, as well as a prototype implementation and experiments that demonstrate the benefits of our approach on common transactional benchmarks.

* Work done while at Cornell University.

trade-off between responsiveness and consistency, we demonstrate that by carefully analyzing applications, it is possible to achieve the best of both worlds: strong consistency and low latency in the common case. The key idea is to exploit the semantics of the transactions involved in the execution of an application in a way that is safe and completely transparent to programmers.

It is well known that strong consistency is not always required to execute transactions correctly [20, 41], and this insight has been exploited in protocols that allow transactions to operate on slightly stale replicas as long as the staleness is "not enough to affect correctness" [5, 41].
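To make the idea of a treaty more concrete, the following is a minimal Python sketch of one site that applies updates locally as long as a treaty predicate over its local state holds, and coordinates only when an update would violate it. Everything here is an assumption for illustration: the `Site` class, the `renegotiate` callback, and the `upper_bound_treaty` helper are hypothetical; the treaty is written by hand rather than extracted by program analysis; and coordination with other sites is modeled only as a message counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]            # a site's local copy of the data
Treaty = Callable[[State], bool]  # predicate the local state must satisfy


def upper_bound_treaty(key: str, limit: int) -> Treaty:
    """Hypothetical treaty: the local value of `key` stays at or below `limit`."""
    return lambda state: state.get(key, 0) <= limit


@dataclass
class Site:
    state: State
    treaty: Treaty
    # Hypothetical stand-in for the sites agreeing on a new treaty.
    renegotiate: Callable[[State], Treaty]
    messages_sent: int = 0

    def apply(self, key: str, delta: int) -> bool:
        """Apply an update; return True if no communication was needed."""
        tentative = dict(self.state)
        tentative[key] = tentative.get(key, 0) + delta
        if self.treaty(tentative):
            # The treaty still holds, so no other site needs to observe
            # this update.
            self.state = tentative
            return True
        # The update would break the treaty, so the site must coordinate.
        # Here coordination is modeled only as a counter plus a new treaty.
        self.messages_sent += 1
        self.state = tentative
        self.treaty = self.renegotiate(tentative)
        return False


if __name__ == "__main__":
    site = Site(
        state={"x": 0},
        treaty=upper_bound_treaty("x", 10),
        renegotiate=lambda s: upper_bound_treaty("x", s["x"] + 10),
    )
    print([site.apply("x", 4) for _ in range(4)])  # [True, True, False, True]
    print(site.messages_sent, site.state)          # 1 {'x': 16}
```

With an upper bound of 10 on `x` and increments of 4, the first two updates stay local. The third violates the treaty and triggers one round of coordination, after which the hypothetical `renegotiate` raises the bound to 22. The fourth update is again local.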
This paper takes this basic idea much further and develops mechanisms for automatically extracting safety predicates from application source code. Our homeostasis protocol uses these predicates to allow sites to operate without communicating, as long as any inconsistency is appropriately governed. Unlike prior work, our approach is fully automated and does not require programmers to provide any information about the semantics of transactions.

Example: top-k query. To illustrate the key ideas behind our approach in further detail, consider a top-k query over a distributed datastore, as shown in Figure 1. For simplicity, we consider the case where k = 2. This system consists of a number of item sites, each of which maintains a collection of (key, value) pairs that could represent data such as airline reservations or customer purchases. An aggregator site maintains a list of the top-k items, sorted in descending order by value. Each item site periodically receives new insertions, and the aggregator site updates the top-k list as needed.

A simple algorithm that implements the top-k query is to have each item site communicate new insertions to the aggregator site, which inserts them into the current top-k list in order and removes the smallest element of the list. However, every insertion then requires a communication round with the aggregator site, even if most of the inserts are for objects not in the top-k. A better idea is to communicate with the aggregator site only if the new value is greater than the minimal value of the current top-k list. Each item site can maintain a cached copy of the smallest value in the top-k list and notify the aggregator site only if an item with a larger value is inserted into its local state. This algorithm is illustrated in Figure 2, where each item site has a variable min with the current lowest top-k value.
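The two top-k algorithms can be compared with a small single-process simulation. The sketch below is hypothetical and simplified: `Aggregator`, `naive_top_k`, and `threshold_top_k` are illustrative names, item sites are represented only by an index into a list of cached minima, and one "message" counts one notification from an item site to the aggregator. In the threshold variant, only the notifying site refreshes its cached `min`; other sites may hold a stale, smaller value. Staleness can cost extra messages but never hides an insert that belongs in the top-k, because the top-k minimum never decreases under insertions.

```python
import heapq
from typing import List, Tuple

K = 2  # the running example uses k = 2


class Aggregator:
    """Maintains the top-K values seen so far as a min-heap of size <= K."""

    def __init__(self) -> None:
        self.heap: List[int] = []
        self.messages = 0  # notifications received from item sites

    def notify(self, value: int) -> None:
        self.messages += 1
        if len(self.heap) < K:
            heapq.heappush(self.heap, value)
        elif value > self.heap[0]:
            # Evict the current smallest top-k value in favor of the new one.
            heapq.heapreplace(self.heap, value)

    def current_min(self) -> float:
        # Until K values are known, any insert could enter the top-k.
        return self.heap[0] if len(self.heap) == K else float("-inf")

    def top_k(self) -> List[int]:
        return sorted(self.heap, reverse=True)


Event = Tuple[int, int]  # (item site index, inserted value)


def naive_top_k(events: List[Event]) -> Tuple[List[int], int]:
    """Every insert at every item site is sent to the aggregator."""
    agg = Aggregator()
    for _site, value in events:
        agg.notify(value)
    return agg.top_k(), agg.messages


def threshold_top_k(events: List[Event], n_sites: int) -> Tuple[List[int], int]:
    """Item sites notify the aggregator only when value > their cached min."""
    agg = Aggregator()
    cached_min = [agg.current_min()] * n_sites
    for site, value in events:
        if value > cached_min[site]:
            agg.notify(value)
            # The aggregator replies with its current minimum; only the
            # notifying site's cache is refreshed, so others may be stale.
            cached_min[site] = agg.current_min()
        # Otherwise the insert stays local and the aggregator never sees it.
    return agg.top_k(), agg.messages


if __name__ == "__main__":
    events = [(0, 5), (1, 9), (0, 1), (1, 2), (0, 3), (1, 7), (0, 4)]
    print(naive_top_k(events))         # ([9, 7], 7)
    print(threshold_top_k(events, 2))  # ([9, 7], 4)
```

On the event sequence in the `__main__` block, both variants produce the same top-2 list, but the threshold variant sends four notifications instead of seven, because inserts at or below a site's cached min stay local.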
In expectation, most item inserts do not affect the aggregator's behavior, and consequently, it is safe for them to remain unobserved by the aggregator site. This improved top-k algorithm is essentially a simplified distributed version of the well-known threshold algorithm for top-k computation [14]. However, note that this algorithm can be ex