Knowledge Spring Process - Towards Discovering and Reusing Knowledge within Linked Open Data Foundations

Linked Open Data mining for democratization of big data

2014 IEEE International Conference on Big Data (Big Data), 2014

Data is everywhere, and non-expert users must be able to exploit it in order to extract knowledge, gain insights, and make well-informed decisions. Knowledge discovered from big data becomes even more valuable when it remains available for later consumption and reuse. In this paper, we present an infrastructure that allows non-expert users to (i) apply user-friendly data mining techniques to big data sources, and (ii) share the results as Linked Open Data (LOD). The main contribution of this paper is an approach for democratizing big data by reusing the knowledge gained from data mining processes: the results are semantically annotated as LOD, yielding Linked Open Knowledge. Our work adopts a model-driven viewpoint in order to deal easily with the wide diversity of open data formats.
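To make the "Linked Open Knowledge" idea concrete, the following sketch annotates a mined result (an association rule) as RDF using rdflib. This is an illustration only, not the paper's actual infrastructure; the ex namespace and its property names are assumptions.

```python
# Minimal sketch (not the paper's pipeline): publish a mined association
# rule as RDF so it can be shared and reused as Linked Open Data.
# The `EX` namespace and its properties are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import DCTERMS, XSD

EX = Namespace("http://example.org/lok/")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)
g.bind("dcterms", DCTERMS)

rule = EX["rule/42"]
g.add((rule, RDF.type, EX.AssociationRule))
g.add((rule, EX.antecedent, Literal("bread")))
g.add((rule, EX.consequent, Literal("butter")))
g.add((rule, EX.confidence, Literal(0.87, datatype=XSD.double)))
g.add((rule, DCTERMS.source, EX["dataset/retail"]))  # provenance link

print(g.serialize(format="turtle"))
```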

Leveraging linked open data information extraction for data mining applications

Turkish Journal of Electrical Engineering & Computer Sciences

The linked open data cloud, with its huge volume of data from heterogeneous and interlinked datasets, has turned the Web into a large data store. It has attracted the attention of both developers and researchers in the last few years, opening up new dimensions in machine learning and knowledge discovery. Information extraction procedures for these processes use different approaches, e.g., template-based extraction, federation over multiple sources, fixed-depth link traversal, etc. They are limited by problems with online access to datasets' SPARQL endpoints, such as servers being down for maintenance, bandwidth restrictions, limits on the number of accesses to a dataset within a given time slot, etc., which may result in imprecise and incomplete feature vectors, affecting the quality of the knowledge discovered. The work presented here addresses the disadvantages of online data retrieval by proposing a simple and automatic way to extract features from the linked open data cloud using a link traversal approach in a local environment with previously identified, known sets of interlinked RDF datasets. The user is given the flexibility to determine the depth to which neighboring properties are traversed for information extraction when generating the feature vector, which can then be used for machine learning and knowledge discovery. The experiment is performed locally, with the Virtuoso triple store used to store the datasets and a purpose-built interface used to build the feature vector. The evaluation compares the obtained feature vector with manually annotated gold-standard instances, and includes a case study estimating the effect of demography on a country's movie production. The advantages of the proposed approach lie in overcoming problems with online access to data from the linked data cloud, integrating RDF datasets in both local and web environments to build feature vectors for machine learning, and generating background knowledge from the linked data cloud.
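A minimal sketch of the fixed-depth link traversal at the core of the extraction step described above. The paper uses a Virtuoso triple store; here a plain rdflib Graph stands in, and the seed URI and file name are assumptions.

```python
# Sketch of fixed-depth link traversal over a locally stored RDF graph.
# A plain rdflib Graph stands in for the Virtuoso store used in the paper;
# `movies.ttl` and the seed URI are hypothetical.
from rdflib import Graph, URIRef

def extract_features(g: Graph, seed: URIRef, depth: int) -> dict:
    """Collect (hop, property, value) features reachable from `seed`
    within `depth` hops, as a flat binary feature dictionary."""
    features, frontier, seen = {}, {seed}, {seed}
    for hop in range(depth):
        next_frontier = set()
        for node in frontier:
            for p, o in g.predicate_objects(subject=node):
                features[(hop, str(p), str(o))] = 1  # binary feature
                if isinstance(o, URIRef) and o not in seen:
                    seen.add(o)
                    next_frontier.add(o)  # follow links one hop deeper
        frontier = next_frontier
    return features

g = Graph()
g.parse("movies.ttl")  # hypothetical local, interlinked dataset
vec = extract_features(g, URIRef("http://example.org/film/1"), depth=2)
print(len(vec), "features")
```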

Applications, Methodologies, and Technologies for Linked Open Data

International Journal on Semantic Web and Information Systems, 2020

Advances in semantic web technologies have sharply increased the volume of linked data published on the web. In this regard, linked open data (LOD) has long been a topic of great interest in a wide range of fields (e.g., open government, business, culture, education). This article reports the results of a systematic literature review on LOD. 250 articles were reviewed to provide a general overview of the current applications, technologies, and methodologies for LOD. The main findings are: (i) most of the studies conducted so far focus on the use of semantic web technologies and tools applied to contexts such as biology, social sciences, libraries, research, and education; (ii) there is a lack of research on a standardized methodology for managing LOD; and (iii) plenty of tools are available for managing LOD, but most of them lack user-friendly interfaces for querying datasets.

Towards Utilizing Open Data for Interactive Knowledge Transfer

The growth of heterogeneous Open Data is an ongoing trend in the current Social Semantic Web (s2w). Generic concepts and how-tos for higher-level reuse of this arbitrary information overload for interactive knowledge transfer and learning, notably in the Internet of Services (IoS), are not yet well covered. For the further, directed use of distributed services and sources, inquiry, interlinking, analysis, and machine- and human-interpretable representation are as essential as lightweight, user-oriented interoperation and ease of handling. In the following we introduce the qKAI application framework (qualifying Knowledge Acquisition and Inquiry): a service-oriented, generic, and hybrid approach that combines knowledge-related offers for convenient reuse and tweaks them with interaction for improved access and a rich user experience. qKAI aims at closing some residual gaps between the "sophisticated" Semantic Web and the "hands-on" Web 2.0, enabling loose-coupled knowledge...

What Goes Around Comes Around — Improving Linked Open Data through On-Demand Model Creation

2010

We present a method for growing the amount of knowledge available on the Web using a hermeneutic method that involves background knowledge, Information Extraction techniques, and validation through discourse and use of the extracted information. We exemplify this using Linked Data as background knowledge, automatic Model/Ontology creation for the IE part, and a Semantic Browser for evaluation. The hermeneutic approach, however, is open to being used with other IE techniques and other evaluation methods. We present results from the model creation and anecdotal evidence for the feasibility of "Validation through Use".

A visual exploration workflow as enabler for the exploitation of linked open data

Linked Open Data concisely and unambiguously describes a knowledge domain. However, the uptake of Linked Data depends on its usefulness to non-Semantic Web experts. Failing to help data consumers understand the added value of Linked Data and its possible exploitation opportunities could inhibit its diffusion. In this paper, we propose an interactive visual workflow for discovering and exploring Linked Open Data. We implemented the workflow for academic library metadata and carried out a qualitative evaluation. We assessed the workflow's potential impact on data consumers: it bridges the offer, i.e. published Linked Open Data, with the demand, i.e. requests for (i) higher-quality data and (ii) more applications that reuse data. More than 70% of the 34 test users agreed that the workflow fulfills its goal: it helps non-Semantic Web experts understand the potential of Linked Open Data.

Introduction to Linked Data

Springer eBooks, 2019

This chapter presents Linked Data, a form of distributed data on the web that is especially suitable for manipulation by machines and for sharing knowledge. By adopting the linked data publication paradigm, anybody can publish data on the web, relate it to data resources published by others, and run artificial intelligence algorithms over it in a smooth manner. Open linked data resources may democratize future access to knowledge for the mass of internet users, either directly or mediated through algorithms. Governments have enthusiastically adopted these ideas, in harmony with the broader open data movement.
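A minimal sketch of that publication paradigm: create a resource, relate it to data published by others, and serialize the result for the web. The example.org URIs and the DBpedia target are illustrative assumptions.

```python
# Sketch of the linked data publication paradigm: describe a resource and
# interlink it with someone else's published data. All URIs are illustrative.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, OWL, RDF

EX = Namespace("http://example.org/people/")  # hypothetical base URI

g = Graph()
g.bind("foaf", FOAF)
g.bind("owl", OWL)

alice = EX.alice
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
# Interlinking: assert that our resource denotes the same entity as a
# resource published elsewhere (a hypothetical DBpedia URI here).
g.add((alice, OWL.sameAs, URIRef("http://dbpedia.org/resource/Alice")))

print(g.serialize(format="turtle"))
```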

Capturing the age of linked open data: Towards a dataset-independent framework

An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. In such a scenario, understanding whether the data consumed are up-to-date is crucial. Outdated data are usually considered inappropriate for many crucial tasks; for example, a consumer needs confidence that the answers returned for a query are still valid at the time the query is formulated. In this paper we present a first dataset-independent framework for assessing the currency of Linked Open Data (LOD) graphs. Starting from an analysis of the 8,713,282 triples containing temporal metadata in the Billion Triple Challenge 2011 dataset, we investigate which vocabularies are used to represent versioning metadata. We then define OntoCurrency, an ontology that integrates the most frequent properties used in this domain and supports the collection of metadata from datasets that use different vocabularies. The proposed framework uses this ontology to assess the currency of an RDF graph or statement by extrapolating it from the currency of the documents that describe the resources occurring in the graph (or statement). The approach has been implemented and evaluated in two different scenarios.
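A rough sketch of the currency idea, assuming the common case where a document carries a dcterms:modified timestamp. OntoCurrency integrates several versioning vocabularies; this stand-in uses only Dublin Core, and the file and document URI are assumptions.

```python
# Sketch: estimate the age ("currency") of statements about a resource from
# the last-modification date of the document describing it. dcterms:modified
# stands in for the versioning properties the framework integrates.
from datetime import datetime, timezone
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

def currency_in_days(g: Graph, document: URIRef):
    """Age of `document` in days, read from its dcterms:modified value."""
    modified = g.value(document, DCTERMS.modified)
    if modified is None:
        return None  # no temporal metadata available
    # Accept a trailing "Z" (UTC) that older Pythons cannot parse directly.
    last = datetime.fromisoformat(str(modified).replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - last).total_seconds() / 86400

g = Graph()
g.parse("dataset.ttl")  # hypothetical LOD dump
age = currency_in_days(g, URIRef("http://example.org/doc/123"))
print(f"document age: {age:.1f} days" if age is not None else "no metadata")
```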

Adding value to Linked Open Data using a multidimensional model approach based on the RDF Data Cube Vocabulary

Computer Standards & Interfaces

Most organisations using Open Data currently focus on data processing and analysis. However, although Open Data may be available online, these data are generally of poor quality, which discourages others from contributing to and reusing them. This paper describes an approach to publishing statistical data from public repositories using Semantic Web standards published by the W3C, such as RDF and SPARQL, in order to facilitate the analysis of multidimensional models. We have defined a framework covering the entire lifecycle of data publication, including a novel Linked Open Data assessment step and the use of external repositories as a knowledge base for data enrichment. As a result, users are able to interact with data generated according to the RDF Data Cube vocabulary, which makes it possible for general users to avoid the complexity of SPARQL when analysing data. A use case applied to the Barcelona Open Data platform revealed the benefits of our approach, such as support for the decision-making process.
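To illustrate, the following sketch publishes a single statistical observation with the W3C RDF Data Cube vocabulary (qb:). The dataset, dimension, and measure URIs under example.org are illustrative assumptions, not the Barcelona model, and the population figure is a made-up placeholder.

```python
# Minimal sketch: one statistical observation expressed with the W3C
# RDF Data Cube vocabulary. Dataset/dimension/measure URIs are assumptions.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")  # the real qb: namespace
EX = Namespace("http://example.org/stats/")          # hypothetical model

g = Graph()
g.bind("qb", QB)
g.bind("ex", EX)

obs = EX["obs/2020-pop-bcn"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX.populationDataset))                # qb dataset link
g.add((obs, EX.refArea, URIRef("http://example.org/area/barcelona")))
g.add((obs, EX.refPeriod, Literal("2020", datatype=XSD.gYear)))
g.add((obs, EX.population, Literal(1600000, datatype=XSD.integer)))  # placeholder value

print(g.serialize(format="turtle"))
```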

Towards intelligent open data platforms: Discovering relatedness in datasets

2017 Intelligent Systems Conference (IntelliSys), 2017

Open data platforms are central to the management and exploitation of data ecosystems. While existing platforms provide basic search capabilities and features for filtering search results, none of them provides recommendations on related datasets. Knowledge of dataset relatedness is critical for determining which datasets can be mashed up or integrated for the purpose of analysis and the creation of data-driven services. For data platforms such as data.gov, with over 193,000 datasets, or data.gov.uk, with over 40,000 datasets, specifying dataset relatedness relationships manually is infeasible. In this paper, we approach the problem of discovering relatedness in datasets by employing the Kohonen Self-Organising Map (SOM) algorithm to analyse the metadata extracted from the Data Catalogue maintained on a platform. Our results show that this approach is very effective in discovering relatedness relationships among datasets. Findings also reveal that our approach can uncover interesting and valuable connections among the domains of the datasets, which could be further exploited for designing smarter data-driven services. Keywords: semantic relatedness of datasets; data recommendation; open data platforms; e-government.
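To make the approach concrete, here is a toy Self-Organising Map over already-vectorised metadata, with datasets mapped to the same grid cell treated as related. The vectorisation step, grid size, and decay schedule are assumptions, not the paper's configuration.

```python
# Toy SOM sketch: cluster metadata-derived feature vectors on a small grid;
# datasets whose best-matching unit coincides are treated as related.
# Grid size, iteration count, and decay schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def bmu_of(weights, x):
    """Grid coordinates of the best-matching unit for vector `x`."""
    rows, cols, _ = weights.shape
    dists = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(dists), (rows, cols))

def train_som(data, rows=4, cols=4, iters=500, lr0=0.5, sigma0=2.0):
    weights = rng.random((rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]       # random training sample
        bmu = bmu_of(weights, x)
        lr = lr0 * np.exp(-t / iters)           # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)     # shrinking neighbourhood
        dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)       # pull neighbourhood toward x
    return weights

# e.g. 50 datasets, each described by 10 metadata-derived features
meta = rng.random((50, 10))
w = train_som(meta)
cells = [bmu_of(w, m) for m in meta]  # datasets sharing a cell ~ related
print(cells[:5])
```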