C-set: a commutative replicated data type for semantic stores
Related papers
Synchronizing semantic stores with commutative replicated data types
2012
Abstract Social semantic web technologies have led to huge amounts of data and information becoming available. Producing knowledge from this information is challenging, and major efforts, like DBpedia, have been made to make it a reality. Linked data provides interconnections between this information, extending the scope of knowledge production.
B-Set: a synchronization method for distributed semantic stores
— Nowadays, there is increasing interest in developing methods for synchronizing distributed triple-stores by ensuring eventual data consistency in a distributed architecture. The most well-known of them are designed as commutative replicated data types (CRDTs), where all concurrent operations commute without centralized control. In this context, CRDTs have been proposed for semantic stores, such as SWOOKI, C-Set and SU-Set. However, none of the existing synchronization solutions mentions how to ensure the Causality, Consistency and Intention preservation criteria of the CCI model. This paper proposes B-Set, a new CRDT for the synchronization of semantic stores. B-Set is designed not only to ensure convergence of triple replicas but also to preserve users' intentions in a distributed architecture. A set of operations is also defined in order to allow concurrent editing of the same shared triple-store.
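To make the commutativity idea concrete, here is a minimal Python sketch of a counter-based replicated set of RDF triples, in the general spirit of counter-based set CRDTs such as C-Set; the class and operation names are illustrative assumptions, and the actual B-Set design is not reproduced here.

from collections import defaultdict

class CounterTripleSet:
    def __init__(self):
        # triple -> signed counter; a triple is visible while its counter is positive
        self.counters = defaultdict(int)

    def add(self, triple):
        self.counters[triple] += 1      # local insert: increment

    def remove(self, triple):
        self.counters[triple] -= 1      # local delete: decrement

    def apply_remote(self, triple, delta):
        # remote operations carry only a signed delta, so they commute:
        # applying +1 and -1 in either order yields the same counter
        self.counters[triple] += delta

    def lookup(self):
        return {t for t, c in self.counters.items() if c > 0}

# Two replicas converge regardless of the order in which the same operations arrive
r1, r2 = CounterTripleSet(), CounterTripleSet()
t = ("ex:alice", "foaf:knows", "ex:bob")
r1.add(t); r1.apply_remote(t, -1)       # add, then a remote remove
r2.apply_remote(t, -1); r2.add(t)       # remote remove first, then the add
assert r1.lookup() == r2.lookup()

Because every operation reduces to a signed increment, replicas that receive the same operations in different orders end up with the same counters and therefore expose the same triples.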
Consistency awareness in a distributed collaborative system for semantic stores
In distributed collaborative systems for editing semantic stores, multiple users can add, delete and change RDF statements, starting from the same replicas and arriving at the same results at the end of the collaborative session. To improve the performance of such systems, developing an efficient awareness mechanism is very important in order to help users better understand the evolution of the semantic stores. Moreover, maintaining consistency in a replicated architecture is one of the most significant problems. However, none of the existing approaches describes how to define an awareness mechanism for distributed semantic stores undergoing concurrent changes. In this paper, we propose a new powerful optimistic replication solution called AB-Set, which not only ensures consistency criteria when editing data but also uses semantic web technologies to define an awareness mechanism that makes users aware of the different statuses of the store they share and update, regardless of the concurrency level.
An optimized conflict-free replicated set
2012
Abstract: Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types.
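For contrast with counter-based designs, the sketch below shows the widely documented observed-remove set (OR-Set), in which every add carries a fresh unique tag and a remove only affects the tags it has observed; this is a simplified illustration, not the specific optimization presented in the paper.

import uuid

class ORSet:
    def __init__(self):
        self.entries = set()        # (element, unique_tag) pairs currently visible
        self.tombstones = set()     # tags already removed, so re-delivery stays idempotent

    def add(self, element):
        tag = uuid.uuid4().hex              # every add mints a fresh tag
        self.apply_add(element, tag)
        return element, tag                 # broadcast (element, tag) to other replicas

    def apply_add(self, element, tag):
        if tag not in self.tombstones:
            self.entries.add((element, tag))

    def remove(self, element):
        observed = {(e, t) for (e, t) in self.entries if e == element}
        for _, tag in observed:
            self.apply_remove(element, tag)
        return observed                     # broadcast only the observed tags

    def apply_remove(self, element, tag):
        self.entries.discard((element, tag))
        self.tombstones.add(tag)

    def lookup(self):
        return {e for (e, _) in self.entries}

# A remove only cancels adds it has observed, so a concurrent add "wins"
s = ORSet()
element, tag = s.add("triple-1")
s.apply_remove("triple-1", "some-unseen-tag")   # remove issued concurrently elsewhere
assert "triple-1" in s.lookup()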
srCE: a collaborative editing of scalable semantic stores on P2P networks
Commutative Replicated Data Type (CRDT) is a convergence approach, a new generation of techniques that ensures consistency maintenance of replicas in collaborative editors over Peer-to-Peer (P2P) networks. This technique has been successfully applied in scalable collaborative editing to different data representations, such as linear and tree document structures and semi-structured data types, but not yet to the set data type while ensuring the Causality, Consistency and Intention (CCI) preservation criteria. In this paper, we propose srCE, a novel CRDT for a set structure that facilitates the collaborative and concurrent editing of Resource Description Framework (RDF) stores at large scale by different members of a virtual community. Our approach ensures the CCI model and is not tied to a specific case; it can therefore be applied to any document that complies with a set structure. A prototype implementation using Friend of a Friend (FOAF) data sets with and without the srCE model illustrates significant improvements in scalability and performance.
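The Causality leg of the CCI model is typically enforced by buffering a remote operation until everything it causally depends on has been applied locally. A minimal version-vector readiness check is sketched below; the message shape and replica identifiers are assumptions for illustration, not srCE's actual protocol.

def causally_ready(op_vv, op_origin, local_vv):
    # op_vv: version vector attached to the operation, i.e. the sender's state
    # when the operation was issued (including the operation itself).
    for replica, count in op_vv.items():
        if replica == op_origin:
            if local_vv.get(replica, 0) != count - 1:
                return False        # not the immediately next operation from its sender
        elif local_vv.get(replica, 0) < count:
            return False            # missing an operation the sender had already seen
    return True

local = {"A": 2, "B": 1}
assert causally_ready({"A": 2, "B": 2}, "B", local)      # next op from B: deliver it
assert not causally_ready({"A": 3, "B": 2}, "B", local)  # depends on A's 3rd op: buffer it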
Col-Graph: Towards Writable and Scalable Linked Open Data
Lecture Notes in Computer Science, 2014
Linked Open Data faces severe issues of scalability, availability and data quality. These issues are observed by data consumers performing federated queries; SPARQL endpoints do not respond and results can be wrong or out-of-date. If a data consumer finds an error, how can she fix it? This raises the issue of the writability of Linked Data. In this paper, we devise an extension of the federation of Linked Data to data consumers. A data consumer can make partial copies of different datasets and make them available through a SPARQL endpoint. A data consumer can update her local copy and share updates with data providers and consumers. Update sharing improves general data quality, and replicated data creates opportunities for federated query engines to improve availability. However, when updates occur in an uncontrolled way, consistency issues arise. We define fragments as SPARQL CONSTRUCT federated queries and propose a correction criterion to maintain these fragments incrementally without re-evaluating the query. We define a coordination-free protocol based on counting triple derivations and on provenance. We analyze the theoretical complexity of the protocol in time, space and traffic. Experimental results suggest that our approach scales to Linked Data.
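The counting idea can be illustrated with a small sketch in which each triple of a consumer's fragment carries the number of source derivations that currently produce it, so insertions and deletions from different providers can be applied in any order; the data structure below is an assumption for illustration and omits Col-Graph's provenance handling.

from collections import defaultdict

class CountedFragment:
    def __init__(self):
        self.count = defaultdict(int)   # triple -> number of live derivations

    def apply(self, triple, delta):
        # delta is +1 when a source derivation of the triple appears, -1 when it disappears
        self.count[triple] += delta
        if self.count[triple] == 0:
            del self.count[triple]      # no derivation left: the triple leaves the fragment

    def triples(self):
        return {t for t, c in self.count.items() if c > 0}

f = CountedFragment()
t = ("ex:paris", "ex:locatedIn", "ex:france")
f.apply(t, +1)              # derived from provider 1
f.apply(t, +1)              # also derived from provider 2
f.apply(t, -1)              # provider 1 retracts its copy
assert t in f.triples()     # still derivable via provider 2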
On Versioning and Archiving Semantic Web Data
This paper concerns versioning services over Semantic Web (SW) repositories. We propose a novel storage index based on partial orders, called POI, that exploits the fact that RDF Knowledge Bases (KBs) (a) do not have a unique serialization (unlike texts) and (b) their versions are usually related by containment (⊆). We discuss the benefits and drawbacks of this approach in terms of storage space and efficiency, both analytically and experimentally, in comparison with the existing approaches (including the change-based approach). We report experimental results over synthetic data sets showing that POI offers notable space savings, e.g. the compression ratio (i.e. uncompressed/compressed size) ranges between 1,800% and 18,163%, as well as efficiency in various cross-version operations. POI is equipped with three version insertion algorithms and can also be exploited in cases where the set of KBs does not fit in main memory. Although the focus of this work is SW data versioning, POI can be considered a generic indexing scheme for storing set-valued data.
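A rough illustration of how containment between versions can be exploited: keep one version materialized and store each new version as a delta against the stored version it overlaps most, recovering any version by replaying its delta chain. The heuristic and data layout below are assumptions for illustration, not the published POI algorithms.

def materialize(versions, vid):
    # versions: id -> (base_id or None, added_triples, removed_triples)
    base, added, removed = versions[vid]
    kb = materialize(versions, base) if base is not None else set()
    return (kb | added) - removed

def store_version(versions, new_id, new_kb):
    # pick the stored version with the largest overlap with the new KB (simplified heuristic)
    best, best_overlap = None, -1
    for vid in versions:
        overlap = len(materialize(versions, vid) & new_kb)
        if overlap > best_overlap:
            best, best_overlap = vid, overlap
    base_kb = materialize(versions, best) if best is not None else set()
    versions[new_id] = (best, new_kb - base_kb, base_kb - new_kb)

versions = {"v1": (None, {("s", "p", "o1"), ("s", "p", "o2")}, set())}
store_version(versions, "v2", {("s", "p", "o1"), ("s", "p", "o2"), ("s", "p", "o3")})
assert materialize(versions, "v2") == {("s", "p", "o1"), ("s", "p", "o2"), ("s", "p", "o3")}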
Ontology Consistency and Instance Checking For Real World Linked Data
Ontology Consistency and Instance Checking for Real World Linked Data, 2015
Many large ontologies have been created that make use of OWL's expressiveness for specification. However, tools to ensure that instance data complies with the schema are often not well integrated with triple-stores and cannot detect certain classes of schema-instance inconsistency due to the assumptions of the OWL axioms. This can lead to lower-quality, inconsistent data. We have developed a simple ontology consistency and instance checking service, SimpleConsist [8]. We also define a number of ontology design best practice constraints on OWL or RDFS schemas. Our implementation allows the user to specify which constraints should be applied to schema and instance data.
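As an illustration of the kind of closed-world check such a service can perform, the sketch below flags triples whose subject lacks the class declared as the property's rdfs:domain; SimpleConsist itself is not reproduced, and the schema and data are made up for the example.

from collections import defaultdict

def domain_violations(schema, instances):
    # schema: {property: expected_class} extracted from rdfs:domain declarations
    # instances: set of (subject, predicate, object) triples
    types = defaultdict(set)
    for s, p, o in instances:
        if p == "rdf:type":
            types[s].add(o)
    violations = []
    for s, p, o in instances:
        expected = schema.get(p)
        if expected and expected not in types[s]:
            violations.append((s, p))   # subject not explicitly typed with the declared domain
    return violations

schema = {"ex:hasAuthor": "ex:Document"}
data = {("ex:doc1", "rdf:type", "ex:Document"),
        ("ex:doc1", "ex:hasAuthor", "ex:alice"),
        ("ex:img1", "ex:hasAuthor", "ex:bob")}    # ex:img1 carries no rdf:type
assert domain_violations(schema, data) == [("ex:img1", "ex:hasAuthor")]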