Distributed Human Computation Framework for Linked Data Co-reference Resolution

Distributed Linked Data as a Framework for Human-Machine Collaboration

This paper presents a novel application of Linked Data as an indirect communication framework for human-machine collaboration. In a decentralised fashion, agents interact by publishing Linked Data resources without having access to a centralised knowledge base. This framework provides an initial set of solutions to the problems of dynamic Linked Data discovery, of querying frequently updated distributed datasets and of guaranteeing consistency in the case of concurrent updates. As a motivation for this framework we take the use case of human and machine agents collaborating on the execution of tasks. This use case is based on existing real-world Linked Data representations of human instructions and research on their integration with machine functionalities.

Scalable and Distributed Methods for Resolving, Consolidating, Matching and Disambiguating Entities in Linked Data Corpora

2010

With respect to large-scale, static, Linked Data corpora, in this paper we discuss scalable and distributed methods for: (i) entity consolidation (identifying entities which signify the same referent, a.k.a. smushing, entity resolution, object consolidation, etc.) using explicit owl:sameAs relations; (ii) extended entity consolidation based on a subset of OWL 2 RL/RDF rules, particularly over inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we purposefully avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the fecundity of the approach and the quality of the results.
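As a rough illustration of step (i), the sketch below consolidates entities linked by explicit owl:sameAs statements using an in-memory union-find structure. This is an assumption-laden toy, not the method evaluated in the paper, which relies on distributed sorts and scans rather than holding data in memory. The function names (`find`, `union`, `consolidate`), the tuple representation of triples, and the choice of the lexicographically smallest identifier as the canonical one are all hypothetical choices made for this example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def find(parent, term):
    """Return the canonical identifier for term, compressing paths on the way."""
    parent.setdefault(term, term)
    root = term
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every visited node directly at the root.
    while parent[term] != root:
        parent[term], term = root, parent[term]
    return root


def union(parent, a, b):
    """Merge the equivalence classes of a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        # Assumption: the lexicographically smaller root becomes canonical.
        low, high = sorted((root_a, root_b))
        parent[high] = low


def consolidate(triples):
    """Rewrite subjects and objects to canonical identifiers.

    triples is an iterable of (subject, predicate, object) string tuples.
    owl:sameAs statements are consumed and dropped from the output.
    """
    triples = list(triples)
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(parent, s, o)

    def canonical(term):
        # Terms never seen in a sameAs statement are left untouched.
        return find(parent, term) if term in parent else term

    return [
        (canonical(s), p, canonical(o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    ]


if __name__ == "__main__":
    data = [
        ("ex:b", OWL_SAME_AS, "ex:a"),
        ("ex:b", OWL_SAME_AS, "ex:c"),
        ("ex:c", "ex:name", "Alice"),
        ("ex:d", "ex:knows", "ex:b"),
    ]
    for triple in consolidate(data):
        print(triple)
```

Because owl:sameAs is symmetric and transitive, union-find captures the equivalence classes without materialising every pairwise statement. Extending the sketch to step (ii) would mean generating additional `union` calls from shared values of inverse-functional properties, which is left out here.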

Crowdsourcing tasks in Linked Data management

2011

Many aspects of Linked Data management (including exposing legacy data and applications to semantic formats, designing vocabularies to describe RDF data, identifying links between entities, query processing, and data curation) are necessarily tackled through the combination of human effort with algorithmic techniques. In the literature on traditional data management the theoretical and technical groundwork to realize and manage such combinations is being established.

Research on linked data and co-reference resolution

2009

This project report details work carried out in collaboration between the University of Southampton and the Korea Institute of Science and Technology Information, focussing on an RDF dataset of academic authors and publications. Activities included the conversion of the dataset to produce Linked Data, the identification of co-references in and between datasets, and the development of an ontology mapping service to facilitate the integration of the dataset with an existing Semantic Web application, RKBExplorer.com.

Introduction to Linked Data

Springer eBooks, 2019

This chapter presents Linked Data, a new form of distributed data on the web which is especially suitable to be manipulated by machines and to share knowledge. By adopting the linked data publication paradigm, anybody can publish data on the web, relate it to data resources published by others and run artificial intelligence algorithms in a smooth manner. Open linked data resources may democratize the future access to knowledge by the mass of internet users, either directly or mediated through algorithms. Governments have enthusiastically adopted these ideas, which is in harmony with the broader open data movement.

Community-Driven Linked Data Authoring and Production of Consolidated Linked Data

International Journal on Semantic Web and Information Systems, 2000

User-generated content can help the growth of linked data. However, we lack interfaces enabling ordinary people to author linked data. Secondly, people have multiple perspectives on the same concept and different contexts. Thirdly, not enough ontologies exist to model various data. Therefore, we propose an approach to enable people to share various data through an easy-to-use social platform. Users define their own concepts and multiple conceptualizations are allowed. These are consolidated using semi-automatic schema alignment techniques supported by the community. Further, concepts are grouped semi-automatically by similarity. As a result of consolidation and grouping, informal lightweight ontologies emerge gradually. We have implemented social software, called StYLiD, to realize our approach. It can serve as a platform motivating people to bookmark and share different things. It may also drive vertical portals for specific communities with integrated data from multiple sources. E...

Large-scale Semantic Integration of Linked Data

ACM Computing Surveys, 2020

A large number of published datasets (or sources) that follow Linked Data principles is currently available and this number grows rapidly. However, the major target of Linked Data, i.e., linking and integration, is not easy to achieve. In general, information integration is difficult, because (a) datasets are produced, kept, or managed by different organizations using different models, schemas, or formats, (b) the same real-world entities or relationships are referred to with different URIs or names and in different natural languages, (c) datasets usually contain complementary information, (d) datasets can contain data that are erroneous, out-of-date, or conflicting, (e) datasets even about the same domain may follow different conceptualizations of the domain, (f) everything can change (e.g., schemas, data) as time passes. This article surveys the work that has been done in the area of Linked Data integration, it identifies the main actors and use cases, it analyzes and factorizes the i...

Community-driven Consolidated Linked Data Technologies Used

INTRODUCTION Linked data is a method of exposing, sharing and connecting data on the Semantic Web. It provides the mechanisms for publishing and interlinking structured data into a Web of Data. This forms a data commons where people and organizations can post and consume data about anything. Due to the network effect, the usefulness of data increases the more it is linked with other data. Organizations benefit by being in this global data network, accessible to both people and machines. Linked data can be fully realized with existing technologies maintaining compatibility with legacy applica-

ABSTRACT User-generated content can help the growth of linked data. However, there is a lack of interfaces enabling ordinary people to author linked data. Secondly, people have multiple perspectives on the same concept and different contexts. Thirdly, there are not enough ontologies to model various data. Therefore, the authors of this chapter propose an approach to enable people to share various d...

Linked Data Meets Computational Intelligence - Position paper

National Conference on Artificial Intelligence, 2010

The Web of Data (WoD) is growing at an amazing rate and it will no longer be feasible to deal with it in a global way, by centralising the data or the reasoning processes that make use of that data. We believe that Computational Intelligence techniques provide the adaptiveness, robustness and scalability that will be required to exploit the full value of ever-growing amounts of dynamic Semantic Web data.