Semantic measures based on RDF projections: Application to content-based recommendation systems (short paper)

2016

Linked Data allows structured data to be published in a standard manner so that datasets from diverse domains can be interlinked. By leveraging Semantic Web standards and technologies, a growing amount of semantic content has been published on the Web as Linked Open Data (LOD). The LOD cloud has made available a large volume of structured data in a range of domains via liberal licenses. The semantic content of LOD in conjunction with the advanced searching and querying mechanisms provided by SPARQL has opened up unprecedented opportunities not only for enhancing existing applications, but also for developing new and innovative semantic applications. However, SPARQL is inadequate to deal with functionalities such as comparing, prioritizing, and ranking search results which are fundamental to applications such as recommendation provision, matchmaking, social network analysis, visualization, and data clustering. This paper addresses this problem by developing a systematic measurement model of semantic similarity between resources in Linked Data. By drawing extensively on a feature-based definition of Linked Data, it proposes a generalized information content-based approach that improves on previous methods which are typically restricted to specific knowledge representation models and less relevant in the context of Linked Data. It is validated and evaluated for measuring item similarity in recommender systems. The experimental evaluation of the proposed measure shows that our approach can outperform comparable recommender systems that use conventional similarity measures.
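As a rough illustration of what an information content-based, feature-based similarity can look like, the sketch below weights each shared feature by how rare it is in the dataset. It is not the measure proposed in the paper: the toy resources, the (property, value) feature encoding, and the names `information_content` and `ic_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy data: each resource is described by a set of
# (property, value) features. Not taken from the paper.
RESOURCES = {
    "film_a": {("director", "d1"), ("genre", "drama"), ("country", "us")},
    "film_b": {("director", "d1"), ("genre", "comedy"), ("country", "us")},
    "film_c": {("director", "d2"), ("genre", "drama"), ("country", "us")},
}


def information_content(resources):
    """IC(f) = -log(fraction of resources that carry feature f)."""
    counts = Counter(f for feats in resources.values() for f in feats)
    total = len(resources)
    return {f: -math.log(c / total) for f, c in counts.items()}


def ic_similarity(a, b, resources, ic):
    """IC mass of the shared features divided by IC mass of all features."""
    shared = resources[a] & resources[b]
    union = resources[a] | resources[b]
    denominator = sum(ic[f] for f in union)
    if denominator == 0:
        # Both resources carry only features that every resource has.
        return 1.0
    return sum(ic[f] for f in shared) / denominator


IC = information_content(RESOURCES)
for pair in [("film_a", "film_b"), ("film_a", "film_c"), ("film_b", "film_c")]:
    print(pair, round(ic_similarity(*pair, RESOURCES, IC), 3))
```

In this toy data every film has the feature ("country", "us"), so its information content is zero and sharing it adds nothing to the score. An unweighted set overlap would count it like any other feature.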

Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes

The Linked Open Data cloud has been increasing in popularity, with DBpedia as a first-class citizen in this cloud that has been widely adopted across many applications. Measuring similarity between resources and identifying their relatedness could be used for various applications such as item-based recommender systems. To this end, several similarity measures such as LDSD (Linked Data Semantic Distance) have been proposed. However, some fundamental axioms for similarity measures such as "equal self-similarity", "symmetry" or "minimality" are violated, and property similarities have been ignored. Moreover, none of the previous studies has provided a comparison with other similarity measures. In this paper, we present a similarity measure, called Resim (Resource Similarity), built on top of a revised LDSD similarity measure. Resim aims to calculate the similarity of any resources in DBpedia by taking into account the similarity of the properties of these resources as well as satisfying the fundamental axioms. In addition, we evaluate our similarity measure against two state-of-the-art similarity measures (LDSD and Shakti) in terms of calculating the similarities for general resources (i.e., any resources without a domain restriction) in DBpedia and resources for music artist recommendations. Results show that our similarity measure can resolve some of the limitations of state-of-the-art similarity measures and performs better than them for calculating the similarities between general resources and music artist recommendations.
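The three axioms named in this abstract can be checked mechanically for any candidate measure. The sketch below does this for two stand-in measures, Jaccard overlap and an asymmetric containment score. Neither is LDSD or Resim, the artist data is hypothetical, and "minimality" is read here, as an assumption, to mean that no pair scores higher than a resource's similarity to itself.

```python
from itertools import product

# Hypothetical resources, each described by the set of things it links to.
ITEMS = {
    "artist_a": {"rock", "uk", "label_x"},
    "artist_b": {"rock", "uk"},
    "artist_c": {"jazz", "us"},
}


def jaccard(a, b):
    """Symmetric overlap: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def containment(a, b):
    """Asymmetric overlap: the share of a's links that b also has."""
    if not a:
        return 1.0
    return len(a & b) / len(a)


def check_axioms(sim, items, tol=1e-9):
    """Report which of the three axioms hold for sim on the given items."""
    names = list(items)
    pairs = list(product(names, repeat=2))
    self_scores = [sim(items[x], items[x]) for x in names]
    return {
        "equal_self_similarity": max(self_scores) - min(self_scores) <= tol,
        "symmetry": all(
            abs(sim(items[x], items[y]) - sim(items[y], items[x])) <= tol
            for x, y in pairs
        ),
        "minimality": all(
            sim(items[x], items[x]) >= sim(items[x], items[y]) - tol
            for x, y in pairs
        ),
    }


print("jaccard    ", check_axioms(jaccard, ITEMS))
print("containment", check_axioms(containment, ITEMS))
```

The containment score fails the symmetry check because artist_b's links are all shared with artist_a but not the other way round, which is the kind of violation the abstract refers to.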

Using proximity to compute semantic relatedness in RDF graphs

Computer Science and Information Systems

Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms in RDF graphs based on the notion of proximity. It proposes a formal definition of proximity in terms of the set of paths connecting two concept nodes, and an algorithm for finding this set and computing proximity with a given error margin. This algorithm was implemented in a tool called Shakti that extracts relevant ontological data for a given domain from DBpedia, a community effort to extract structured data from Wikipedia. To validate the proposed approach, Shakti was used to recommend web pages on a Portuguese social site related to alternative music, and the results of that experiment are also reported.
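The idea of deriving proximity from the set of paths between two nodes can be sketched as follows. This is not Shakti's definition or algorithm, which the abstract does not spell out. Treating edges as undirected, scoring each path as `decay ** length`, and cutting off at `max_len` edges are all assumptions made for this example, and the graph is hypothetical.

```python
from collections import defaultdict

# Hypothetical RDF-like edges, treated as undirected for this sketch.
EDGES = [
    ("band_a", "genre_x"),
    ("band_b", "genre_x"),
    ("band_a", "label_y"),
    ("band_b", "label_y"),
    ("band_c", "genre_x"),
]


def build_adjacency(edges):
    """Map every node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency


def path_lengths(adjacency, source, target, max_len):
    """Yield the length of every simple path with at most max_len edges."""
    stack = [(source, {source}, 0)]  # (current node, visited nodes, edges used)
    while stack:
        node, visited, length = stack.pop()
        if node == target and length > 0:
            yield length
            continue
        if length == max_len:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                stack.append((neighbour, visited | {neighbour}, length + 1))


def proximity(adjacency, source, target, decay=0.5, max_len=3):
    """Sum decay ** length over all bounded simple paths (assumed scoring)."""
    return sum(decay ** n for n in path_lengths(adjacency, source, target, max_len))


ADJACENCY = build_adjacency(EDGES)
print(proximity(ADJACENCY, "band_a", "band_b"))  # two 2-edge paths
print(proximity(ADJACENCY, "band_a", "band_c"))  # one 2-edge path
```

Bounding the path length stands in for the paper's error margin only loosely: longer paths contribute geometrically less, so truncating them changes the score by a limited amount.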

An Improvement in Measuring the Semantic Similarity Between RDF Ontologies

FAIR - NGHIÊN CỨU CƠ BẢN VÀ ỨNG DỤNG CÔNG NGHỆ THÔNG TIN - 2016, 2017

RDF (Resource Description Framework) ontologies have played an important role in many knowledge applications because they provide a source of precisely defined terms. However, the widespread use of RDF ontologies creates a demand for an automatic way of assessing their similarity. In this paper, we present a novel method to measure the semantic similarity between elements in different RDF ontologies. This measure is designed to extract the information encoded in RDF element descriptions and to take into account each element's relationships with its ancestors and children. We evaluate the proposed measure in the context of matching two RDF ontologies to determine the number of matches between them, and then compare the results with human estimation and with related methods. The experimental results show that our similarity values are better than those of other approaches with regard to the accuracy of semantic and structural similarities.

Computing the Semantic Relatedness of Music Genre using Semantic Web Data

2016

Computing the semantic relatedness between two entities has many application domains. In this paper, we show a new way to compute the semantic relatedness between two resources using semantic web data. Moreover, we show how this measure can be used to compute the semantic relatedness between music genres, which can be used for music recommendation systems. We first describe how to build vector representations for resources in an ontology. Subsequently, we show how these vector representations can be used to compute the semantic relatedness of two resources. Finally, as an application, we show that our measure can be used to compute the semantic relatedness of music genres. CCS Concepts: •Information systems → Similarity measures; Language models;
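A minimal sketch of the two steps described here, building a vector per resource and comparing vectors, might look like this. The abstract does not say how the paper constructs its vectors, so counting a resource's outgoing (predicate, object) pairs and comparing with cosine similarity are assumptions, as are the toy triples and the function names.

```python
import math
from collections import Counter

# Hypothetical triples about music genres; not taken from the paper.
TRIPLES = [
    ("punk_rock", "stylistic_origin", "rock"),
    ("punk_rock", "stylistic_origin", "garage_rock"),
    ("punk_rock", "instrument", "guitar"),
    ("hard_rock", "stylistic_origin", "rock"),
    ("hard_rock", "stylistic_origin", "blues"),
    ("hard_rock", "instrument", "guitar"),
    ("bebop", "stylistic_origin", "swing"),
    ("bebop", "instrument", "saxophone"),
]


def resource_vector(resource, triples):
    """Sparse vector: how often each (predicate, object) pair occurs."""
    return Counter((p, o) for s, p, o in triples if s == resource)


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(a, b, triples=TRIPLES):
    return cosine(resource_vector(a, triples), resource_vector(b, triples))


print(round(relatedness("punk_rock", "hard_rock"), 3))
print(round(relatedness("punk_rock", "bebop"), 3))
```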

Automated approach for quality assessment of RDF resources

BMC Medical Informatics and Decision Making

Introduction: The Semantic Web community provides a common Resource Description Framework (RDF) that allows representation of resources such that they can be linked. To maximize the potential of linked data (machine-actionable interlinked resources on the Web), a certain level of quality of RDF resources should be established, particularly in the biomedical domain in which concepts are complex and high-quality biomedical ontologies are in high demand. However, it is unclear which quality metrics for RDF resources exist that can be automated, which is required given the multitude of RDF resources. Therefore, we aim to determine these metrics and demonstrate an automated approach to assess such metrics of RDF resources. Methods: An initial set of metrics is identified through literature, standards, and existing tooling. Of these, metrics are selected that fulfil these criteria: (1) objective; (2) automatable; and (3) foundational. Selected metrics are represented in RDF and semantical...

A metric-driven approach for interlinking assessment of RDF graphs

2015 International Symposium on Computer Science and Software Engineering (CSSE), 2015

In recent years the web has evolved from a global information space of linked documents to one where both documents and data are linked. What supports this evolution is a set of best practices for publishing and connecting structured data on the web, known as linked data. The usefulness of linked data depends on how well related concepts are linked together. The aim of this research is to propose a metric-driven approach for interlinking assessment of a single dataset. The proposed metrics are categorized into three groups: internal linking, external linking, and link-ability from other datasets. These metrics consider both the graph structure (topology) and the schema of datasets (semantic information) to evaluate interlinking with appropriate accuracy.
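To make the distinction between internal and external linking concrete, the sketch below counts both kinds of link for a single dataset. The paper's actual metrics are not given in the abstract, so this is only an assumed, simplified reading: a link is a triple whose object is a URI, and it is internal when that URI lies in the dataset's own namespace. The namespace, triples, and function names are hypothetical, and link-ability from other datasets is not covered.

```python
DATASET_NS = "http://example.org/data/"  # hypothetical dataset namespace

# Hypothetical triples: two internal links, one external link, one literal.
TRIPLES = [
    (DATASET_NS + "a", "http://example.org/vocab/related", DATASET_NS + "b"),
    (DATASET_NS + "b", "http://example.org/vocab/related", DATASET_NS + "c"),
    (DATASET_NS + "a", "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/A"),
    (DATASET_NS + "c", "http://www.w3.org/2000/01/rdf-schema#label", "C"),
]


def is_uri(term):
    """Crude URI test for this sketch; literals are plain strings."""
    return term.startswith(("http://", "https://"))


def link_counts(triples, namespace):
    """Count links from the dataset's resources, split by target namespace."""
    counts = {"internal": 0, "external": 0}
    for subject, _predicate, obj in triples:
        if not subject.startswith(namespace) or not is_uri(obj):
            continue  # skip foreign subjects and literal objects
        kind = "internal" if obj.startswith(namespace) else "external"
        counts[kind] += 1
    return counts


def external_ratio(triples, namespace):
    """Share of the dataset's links that point outside its namespace."""
    counts = link_counts(triples, namespace)
    total = counts["internal"] + counts["external"]
    return counts["external"] / total if total else 0.0


print(link_counts(TRIPLES, DATASET_NS))
print(round(external_ratio(TRIPLES, DATASET_NS), 3))
```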

Measuring structural similarity between RDF graphs

Proceedings of the 33rd Annual ACM Symposium on Applied Computing - SAC '18, 2018

In recent years, there has been a huge effort to deploy large amounts of data, making it available in the form of RDF data thanks, among others, to the Linked Data initiative. In this context, using shared ontologies has been crucial to gain interoperability, and to be able to integrate and exploit third party datasets. However, using the same ontology does not suffice to successfully query or integrate external data within your own dataset: the actual usage of the vocabulary (e.g., which concepts have instances, which properties are actually populated and how, etc.) is crucial for these tasks. Being able to compare different RDF graphs at the actual usage level would indeed help in such situations. Unfortunately, the complexity of graph comparison is an obstacle to the scalability of many approaches. In this article, we present our structural similarity measure, designed to compare structural similarity of low-level data between two different RDF graphs according to the patterns they share. To obtain such patterns, we leverage a data mining method (KRIMP) which extracts the most descriptive patterns appearing in a transactional database. We adapt this method to the particularities of RDF data, proposing two different conversions for an RDF graph. Once we have the descriptive patterns, we evaluate how much two graphs can compress each other to give a numerical measure depending on the common data structures they share. We have carried out several experiments to show its ability to capture the structural differences of actual vocabulary usage.

Characterizing RDF graphs through graph-based measures – framework and assessment

Semantic Web

The topological structure of RDF graphs inherently differs from other types of graphs, like social graphs, due to the pervasive existence of hierarchical relations (TBox), which complement transversal relations (ABox). Graph measures capture such particularities through descriptive statistics. Besides the classical set of measures established in the field of network analysis, such as size and volume of the graph or the type of degree distribution of its vertices, there has been some effort to define measures that capture some of the aforementioned particularities RDF graphs adhere to. However, some of them are redundant, computationally expensive, and not meaningful enough to describe RDF graphs. In particular, it is not clear which of them are efficient metrics to capture specific distinguishing characteristics of datasets in different knowledge domains (e.g., Cross Domain vs. Linguistics). In this work, we address the problem of identifying a minimal set of measures that is effici...

Characterising RDF data sets

Journal of Information Science, 2017

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

Related papers