Replicated abstract data types: Building blocks for collaborative applications

Conflict-Free Partially Replicated Data Types

2015

Designers of large user-oriented distributed applications, such as social networks and mobile applications, have adopted measures to improve the responsiveness of their applications. Latency is a major concern as people are very sensitive to it. Geo-replication is a commonly used mechanism to bring the data closer to clients. Nevertheless, reaching the closest datacenter can still be considerably slow. Thus, in order to further reduce the access latency, mobile and web applications may be forced to replicate data at the client side. Unfortunately, fully replicating large data structures may still be a waste of resources, especially for thin clients. We propose a replication mechanism built upon conflict-free replicated data types (CRDTs) to seamlessly replicate parts of large data structures. We define partial replication and give an approach to keep the strong eventual consistency properties of CRDTs with partial replicas. We integrate our mechanism into SwiftCloud, a transactional system that brings geo-replication to clients. We evaluate the solution with a content-sharing application. Our results show improvements in bandwidth, memory, and latency over both classical geo-replication and the existing SwiftCloud solution.

On the semantics and implementation of replicated data types

Science of Computer Programming, 2018

Replicated data types (RDTs) concern the specification and implementation of data structures handled by replicated data stores, i.e., distributed data stores that maintain copies of the same data item on multiple devices. A distinctive feature of RDTs is that the behaviour of an operation depends on the state of the replica on which it is performed, and hence its result may differ from replica to replica. Abstractly, RDTs are specified in terms of two relations, visibility and arbitration. The former establishes whether an operation observes the effects of the execution of another operation; the latter is a total order on operations used to resolve conflicts between operations executed concurrently over different replicas. Traditionally, an operation of an RDT is specified as a function mapping a visibility and an arbitration into the expected result of the operation. This paper recasts such standard approaches into a denotational framework in which a data type is a function mapping visibility into admissible arbitrations. This characterisation provides a more abstract view of RDTs that (i) highlights some implicit assumptions shared in operational approaches to specification; (ii) accommodates underspecification and refinement; (iii) enables a direct characterisation of the correct implementations of an RDT in terms of a simulation relation between the states of a concrete implementation and of the abstract one determined by the specification.
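As a rough illustration of the traditional style described in this abstract, the following sketch specifies the read of a last-writer-wins register as a function of a visibility set and an arbitration order. The register, the operation identifiers, and the name `lww_read` are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set


def lww_read(visible: Set[str],
             arbitration: List[str],
             written: Dict[str, int]) -> Optional[int]:
    """Result of a read on a hypothetical last-writer-wins register.

    visible     -- identifiers of the write operations this read observes
    arbitration -- all write identifiers, listed in the agreed total order
    written     -- the value carried by each write
    """
    # The read returns the value of the visible write that comes last
    # in the arbitration order; with no visible write it returns None.
    for op in reversed(arbitration):
        if op in visible:
            return written[op]
    return None


arbitration = ["w1", "w2"]
written = {"w1": 1, "w2": 2}

# Two replicas that see different writes obtain different results.
replica_a_result = lww_read({"w1"}, arbitration, written)
replica_b_result = lww_read({"w1", "w2"}, arbitration, written)
```

The same operation yields a different result on each replica because visibility differs, while arbitration settles which write wins once both are visible.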

Conflict-free replicated data types

2011

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT).
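The abstract does not spell out the conditions. As a minimal sketch, assuming the commonly cited state-based condition (updates only grow the state, and merge is commutative, associative, and idempotent), a grow-only counter could look like this. The class name `GCounter` and its methods are illustrative choices, not an API from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Illustrative grow-only counter: one slot per replica."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Updates only ever grow the local slot, so the state is monotonic.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: commutative, associative, and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Each replica accepts updates locally, without remote synchronisation.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)

# Exchanging states in any order, any number of times, converges.
a.merge(b)
b.merge(a)
a.merge(b)
```

After the exchange both replicas report the same value, regardless of the order or repetition of merges.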

A comprehensive study of convergent and commutative replicated data types

2011

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs).

A commutative replicated data type for cooperative editing

2009

A commutative replicated data type (CRDT) is one where all concurrent operations commute. The replicas of a CRDT converge automatically, without complex concurrency control. This paper describes Treedoc, a novel CRDT design for cooperative text editing. An essential property is that the identifiers of Treedoc atoms are selected from a dense space. We discuss practical alternatives for implementing the identifier space based on an extended binary tree.
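To make the "dense space" property concrete, here is a simplified sketch in which an identifier is a path in a binary tree (0 for left, 1 for right) and identifiers are ordered by infix traversal. It assumes a single writer and omits the disambiguators that a real design needs for concurrent inserts; the names `sort_key` and `between` are hypothetical.

```python
from typing import Tuple

Path = Tuple[int, ...]  # 0 = left child, 1 = right child; () is the root


def sort_key(path: Path) -> Tuple[int, ...]:
    # Infix order: the left subtree precedes a node, the right subtree follows it.
    # Mapping left to -1, right to +1, and ending with 0 makes plain tuple
    # comparison agree with that order.
    return tuple(1 if bit else -1 for bit in path) + (0,)


def between(p: Path, q: Path) -> Path:
    """Return an identifier strictly between p and q (requires p < q)."""
    if not sort_key(p) < sort_key(q):
        raise ValueError("expected p to precede q")
    if q[:len(p) + 1] == p + (1,):
        # q lies in p's right subtree: descend to the left of q.
        return q + (0,)
    # Otherwise the right child of p still precedes q.
    return p + (1,)


first, last = (), (1,)
middle = between(first, last)    # a new atom between the two existing ones
earlier = between(first, middle)  # the space never runs out of room
ordered = sorted([last, middle, first, earlier], key=sort_key)
```

Because a fresh identifier can always be found between any two existing ones, an insert never forces other atoms to be renumbered.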

Consistent shared data types: Beyond memory

In large scale distributed systems, shared objects provide a valuable abstraction of communication. However, these objects can only be used reliably if they are specified precisely. Until now, a lot of work has been done on shared memory, to the detriment of other objects. This paper aims at extending this work to any update-query abstract data type, the types in which the operations are updates or queries. A shared object should be fully specified by two complementary aspects: a sequential specification that defines how the updates influence the queries, and a consistency criterion that discriminates which distributed histories are eligible according to the sequential specification. This paper formalizes the notions of sequential specification and consistency criterion. It then extends the definition of many consistency criteria, including causal consistency, to all update-query abstract data types. It also explores the notion of composability for consistency criteria and proves that no consistency criterion between pipelined consistency and sequential consistency is composable, which includes causal consistency.

How to design optimistic operations for peer-to-peer replication

Proceedings of the 9th Joint Conference on Information Sciences (JCIS), 2006

As collaboration over the Internet becomes an everyday affair, it is increasingly important to provide a high quality of interactivity. Distributed applications can replicate collaborative objects at every site in order to achieve high interactivity. Replication, however, has a fatal weakness: it is difficult to maintain consistency among replicas. This paper introduces operation commutativity as a key principle in designing operations that keep distributed replicas consistent. In addition, we suggest effective schemes that make operations commutative using the relations of objects and operations. Finally, we apply our approaches to some simple replicated abstract data types, and achieve their consistency without serialization and locking.

Designing a commutative replicated data type for cooperative editing systems

2008

Cooperative editing systems enable distributed users to collaborate by concurrently editing a shared document, but care is required to ensure convergence and correctness. This problem turns out to be surprisingly complex. In this paper, we present a new approach for managing replicated data in cooperative editing systems, Commutative Replicated Data Types (CRDT). In a CRDT, all concurrent operations commute.

Remove-Win: a Design Framework for Conflict-free Replicated Data Types

2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)

Internet-scale distributed systems often replicate data within and across data centers to provide low latency and high availability despite node and network failures. Replicas are required to accept updates without coordinating with each other, and the updates are then propagated asynchronously. This raises the issue of conflict resolution among concurrent updates, which is often challenging and error-prone. The Conflict-free Replicated Data Type (CRDT) framework provides a principled approach to address this challenge. This work focuses on a special type of CRDT, namely the Conflict-free Replicated Data Collection (CRDC), e.g., list and queue. A CRDC can have complex and compound data items, which are organized in structures with rich semantics. Complex CRDCs can greatly ease the development of upper-layer applications, but they also make conflict resolution notoriously difficult. This explains why existing CRDC designs are tricky and hard to generalize to other data types. A design framework is greatly needed to guide the systematic design of new CRDCs. To address the challenges above, we propose the Remove-Win Design Framework. The remove-win strategy for conflict resolution is simple but powerful. The remove operation just wipes out the data item, no matter how complex the value is. The user of the CRDC only needs to specify conflict resolution for non-remove operations. This resolution is decomposed into three basic cases, which are left as open terms in the CRDC design skeleton. Stubs containing user-specified conflict resolution logic are plugged into the skeleton to obtain concrete CRDC designs. We demonstrate the effectiveness of our design framework via a case study of designing a conflict-free replicated priority queue. Performance measurements also show the efficiency of the design derived from our design framework.
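The abstract does not give the design skeleton itself. The sketch below shows only the general remove-wins idea on a plain set, assuming a state-based design in which every remove leaves a unique token and an add cancels only the tokens it has observed. The class `RemoveWinsSet` and its fields are hypothetical and are not the framework's CRDC skeleton.

```python
import uuid
from dataclasses import dataclass, field
from typing import Hashable, Set, Tuple

Token = Tuple[Hashable, str]  # (element, unique id of one remove operation)


@dataclass
class RemoveWinsSet:
    """Illustrative remove-wins set built from three grow-only sets."""
    added: Set[Hashable] = field(default_factory=set)
    removes: Set[Token] = field(default_factory=set)
    cancelled: Set[Token] = field(default_factory=set)

    def _live_removes(self, element: Hashable) -> Set[Token]:
        return {t for t in self.removes - self.cancelled if t[0] == element}

    def add(self, element: Hashable) -> None:
        # An add cancels only the removes this replica has already seen.
        self.cancelled |= self._live_removes(element)
        self.added.add(element)

    def remove(self, element: Hashable) -> None:
        # A remove wipes out the element whatever its value; a concurrent
        # add cannot cancel this token because it never observed it.
        self.removes.add((element, uuid.uuid4().hex))

    def contains(self, element: Hashable) -> bool:
        return element in self.added and not self._live_removes(element)

    def merge(self, other: "RemoveWinsSet") -> None:
        # All three components only grow, so merging is a plain union.
        self.added |= other.added
        self.removes |= other.removes
        self.cancelled |= other.cancelled


a, b = RemoveWinsSet(), RemoveWinsSet()
a.add("x")
b.merge(a)

a.remove("x")  # concurrent with the add on replica b
b.add("x")
a.merge(b)
b.merge(a)
concurrent_result = (a.contains("x"), b.contains("x"))

a.add("x")     # this add has observed the remove, so it takes effect
b.merge(a)
later_result = (a.contains("x"), b.contains("x"))
```

When an add and a remove of the same element are concurrent, both replicas agree that the element is absent; a later add that has seen the remove restores it.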

Update propagation protocols for replicated databases

ACM SIGMOD Record, 1999

Replication is often used in many distributed systems to provide a higher level of performance, reliability and availability. Lazy replica update protocols, which propagate updates to replicas through independent transactions after the original transaction commits, have become popular with database vendors due to their superior performance characteristics. However, if lazy protocols are used indiscriminately, they can result in non-serializable executions. In this paper, we propose two new lazy update protocols that guarantee serializability but impose a much weaker requirement on data placement than earlier protocols. Further, many naturally occurring distributed systems, like distributed data warehouses, satisfy this requirement. We also extend our lazy update protocols to eliminate all requirements on data placement. The extension is a hybrid protocol that propagates as many updates as possible in a lazy fashion. We implemented our protocols on the Datablitz database system product developed at Bell Labs. We also conducted an extensive performance study which shows that our protocols outperform existing protocols over a wide range of workloads.