A Combination-based Semantic Similarity Measure using Multiple Information Sources

An information theoretic approach to improve semantic similarity assessments across multiple ontologies

Semantic similarity has become, in recent years, the backbone of numerous knowledge-based applications dealing with textual data. Of the different methods and paradigms proposed to assess semantic similarity, ontology-based measures and, more specifically, those based on quantifying the Information Content (IC) of concepts are the most widespread solutions due to their high accuracy. However, these measures were designed to exploit a single ontology. They thus cannot be leveraged in many contexts in which multiple knowledge bases are considered. In this paper, we propose a new approach to achieve accurate IC-based similarity assessments for concept pairs spread throughout several ontologies. Based on Information Theory, our method defines a strategy to accurately measure the degree of commonality between concepts belonging to different ontologies; this is the cornerstone for estimating their semantic similarity. Our approach therefore enables classic IC-based measures to be directly applied in a multiple ontology setting. An empirical evaluation, based on well-established benchmarks and ontologies related to the biomedical domain, illustrates the accuracy of our approach, and demonstrates that similarity estimations provided by our approach are significantly more correlated with human ratings of similarity than those obtained via related works. [...] unambiguously retrieved from ontologies and similarities can be assessed from structured knowledge that has been explicitly formalised by human experts.

Ontology-based semantic similarity: A new feature-based approach

Estimation of the semantic likeness between words is of great importance in many applications dealing with textual data, such as natural language processing, knowledge acquisition and information retrieval. Semantic similarity measures exploit knowledge sources as the basis for these estimations. In recent years, ontologies have grown in interest thanks to global initiatives such as the Semantic Web, offering a structured knowledge representation. Thanks to the possibilities that ontologies offer for the semantic interpretation of terms, many ontology-based similarity measures have been developed. Several families of measures can be identified, according to the principle on which they base the similarity assessment and the way in which ontologies are exploited or complemented with other sources. In this paper, we survey and classify most of the ontology-based approaches developed, in order to evaluate their advantages and limitations and to compare their expected performance from both theoretical and practical points of view. We also present a new ontology-based measure relying on the exploitation of taxonomical features. The evaluation and comparison of our approach's results against those reported by related works under a common framework suggest that our measure provides high accuracy without some of the limitations observed in other works.
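The idea of exploiting taxonomical features can be illustrated with a minimal sketch. It is not the measure proposed in the paper: it assumes, for illustration only, that the features of a concept are the concept itself plus all of its subsumers, and it scores two concepts by the Jaccard overlap of those sets. The taxonomy and the names `SUBSUMERS`, `features` and `feature_similarity` are hypothetical.

```python
# Hypothetical toy taxonomy with multiple inheritance:
# each concept maps to the set of its direct subsumers.
SUBSUMERS = {
    "entity": set(),
    "vehicle": {"entity"},
    "toy": {"entity"},
    "car": {"vehicle"},
    "bicycle": {"vehicle"},
    "toy_car": {"car", "toy"},
}


def features(concept):
    """Taxonomic feature set: the concept plus every concept that subsumes it."""
    found = {concept}
    stack = [concept]
    while stack:
        for parent in SUBSUMERS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def feature_similarity(a, b):
    """Jaccard overlap of the two feature sets (assumed scoring, not the paper's)."""
    fa, fb = features(a), features(b)
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    for pair in [("car", "bicycle"), ("car", "toy_car"), ("car", "toy")]:
        print(pair, feature_similarity(*pair))
```

Because the score depends only on subsumer sets, it needs no corpus statistics, and concepts with several parents contribute all of their ancestors as shared or distinguishing evidence.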

Comparison of ontology-based semantic-similarity measures

AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2008

Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Such a comparison can offer insight into the validity of different approaches. We compared 3 approaches to semantic-similarity metrics (which rely on expert opinion, ontologies only, and information content) with 4 metrics applied to SNOMED-CT. We found that there was poor agreement between the metrics based on information content and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that metrics based on the ontology only may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.

Description and Evaluation of Semantic Similarity Measures Approaches

International Journal of Computer Applications, 2013

In recent years, semantic similarity measures have attracted great interest in the Semantic Web and Natural Language Processing (NLP) communities. Several similarity measures have been developed, given the existence of structured knowledge representations offered by ontologies and corpora, which enable the semantic interpretation of terms. Semantic similarity measures compute the similarity between concepts/terms included in knowledge sources in order to perform estimations. This paper discusses the existing semantic similarity methods based on structure, information content and feature approaches. Additionally, we present a critical evaluation of several categories of semantic similarity approaches based on two standard benchmarks. The aim of this paper is to give an efficient evaluation of all these measures, which helps researchers and practitioners to select the measure that best fits their requirements.

An intrinsic information content-based semantic similarity measure considering the disjoint common subsumers of concepts of an ontology

Journal of the Association for Information Science and Technology, 2018

Finding similarity between concepts based on semantics has become a new trend in many applications (e.g., biomedical informatics, natural language processing). Measuring the Semantic Similarity (SS) with higher accuracy is a challenging task. In this context, the Information Content (IC)-based SS measure has gained popularity over the others. The notion of IC derives from information theory, which has very high potential to characterize the semantics of concepts. An IC-based SS framework comprises (i) an IC calculator, and (ii) an SS calculator. In this article, we propose a generic intrinsic IC-based SS calculator. We also introduce a new structural aspect of an ontology called DCS (Disjoint Common Subsumers) that plays a significant role in deciding the similarity between two concepts. We evaluated our proposed similarity calculator against the existing intrinsic IC-based similarity calculators, as well as corpora-dependent similarity calculators, using several benchmark data sets. The experimental results show that the proposed similarity calculator produces a higher correlation with human evaluation than the existing state-of-the-art IC-based similarity calculators.
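The two-part structure described above (an IC calculator feeding an SS calculator) can be sketched as follows. This is not the DCS-based calculator proposed in the article. As stand-ins, the sketch assumes the well-known intrinsic IC formulation of Seco et al., IC(c) = 1 - log(hypo(c) + 1) / log(N), and the Lin similarity, 2 * IC(LCS) / (IC(a) + IC(b)). The toy taxonomy and all function names are hypothetical.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), for illustration only.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def ancestors(concept):
    """Return the concept followed by all of its subsumers, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Number of proper descendants of the concept in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """IC calculator (Seco-style, assumed): leaves score 1, the root scores 0."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def lin_similarity(a, b):
    """SS calculator (Lin, assumed): shared IC relative to the concepts' total IC."""
    common = [c for c in ancestors(a) if c in ancestors(b)]
    lcs_ic = max(intrinsic_ic(c) for c in common)  # most informative subsumer
    total = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * lcs_ic / total if total > 0 else 0.0


if __name__ == "__main__":
    print("dog/cat:", lin_similarity("dog", "cat"))
    print("dog/oak:", lin_similarity("dog", "oak"))
```

In this toy taxonomy, "dog" and "cat" share the informative subsumer "animal", whereas "dog" and "oak" share only the root, whose IC is zero, so the second pair receives the lowest possible score.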

Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies

2006

Semantic Similarity relates to computing the similarity between concepts (terms) which are not necessarily lexically similar. We investigate approaches to computing semantic similarity by mapping terms to an ontology and by examining their relationships in that ontology. More specifically, we investigate approaches to computing the semantic similarity between natural language terms (using WordNet as the underlying reference ontology) and between medical terms (using the MeSH ontology of medical and biomedical terms). The most popular semantic similarity methods are implemented and evaluated using WordNet and MeSH. This work also focuses on cross-ontology methods, which are capable of computing the semantic similarity between terms stemming from different ontologies (WordNet and MeSH in this work). This is a far more difficult problem than the single-ontology one referred to above, and it has not been investigated adequately in the literature. X-Similarity, a novel cross-ontology similarity method, is also a contribution of this work. All methods examined in this work are integrated into a semantic similarity system which is accessible on the Web.

A semantic similarity method based on information content exploiting multiple ontologies

2012

The quantification of the semantic similarity between terms is an important research area that provides a valuable tool for text understanding. Among the different paradigms used by related works to compute semantic similarity, in recent years, information theoretic approaches have shown promising results by computing the information content (IC) of concepts from the knowledge provided by ontologies. These approaches, however, are hampered by the coverage offered by the single input ontology. In this paper, we propose extending IC-based similarity measures by considering multiple ontologies in an integrated way. Several strategies are proposed according to which ontology the evaluated terms belong to. Our proposal has been evaluated by means of a widely used benchmark of medical terms, with MeSH and SNOMED CT as ontologies. Results show an improvement in the similarity assessment accuracy when multiple ontologies are considered.

Towards the estimation of feature-based semantic similarity using multiple ontologies

A key application of ontologies is the estimation of the semantic similarity between terms. By means of this assessment, the comprehension and management of textual resources can be improved. However, most ontology-based similarity measures only support a single input ontology. If any of the compared terms do not belong to that ontology, their similarity cannot be assessed. To solve this problem, multiple ontologies can be considered. Even though there are methods that enable multi-ontology similarity assessment by integrating concepts from different ontologies, most of them are based on simple terminological and/or partial matchings. This hampers similarity measures that exploit a broad set of taxonomic evidence of similarity, like feature-based ones. In this paper, we tackle this problem by proposing a method to identify all the suitable matchings between concepts of different ontologies that intervene in the similarity assessment. In addition to the obvious terminological matching, we exploit the ontological structure and the notion of concept subsumption to discover non-trivial equivalences between heterogeneous ontologies. Our final goal is to enable the accurate application of feature-based similarity measures in a multi-ontology setting. Our proposal is evaluated with regard to human judgements of similarity for several benchmarks and ontologies. Results show an improvement over related works, with similarity accuracies that even rival those obtained in an ideal mono-ontology setting.

A New Approach for Measuring Semantic Similarity in Ontology and Its Application in Information Retrieval

Lecture Notes in Computer Science, 2012

Word similarity assessment is one of the most important elements in Natural Language Processing (NLP) and information retrieval. Evaluating semantic similarity of concepts is a problem that has been extensively investigated in the literature in different areas, such as artificial intelligence, cognitive science, databases and software engineering. Semantic similarity relates to computing the similarity between conceptually similar but not necessarily lexically similar terms. Currently, its importance is growing in different settings, such as digital libraries, heterogeneous databases and in particular the Semantic Web. In this paper, the authors present a search engine framework using the Google API that expands the user query based on similarity scores of each term of the user's query. The authors calculate the semantic similarity of nouns, using WordNet, to obtain the related concepts described by the search query. The user's query is replaced with concepts discovered from the similarity measures. The authors also present a new approach to compute the semantic similarity between words. A common data set of word pairs is used to evaluate the proposed approach: first, the semantic similarities of 30 word pairs are calculated; then, the correlation coefficients between human judgement and three computational measures are calculated. The experimental results show that the new approach is better than other existing computational models.
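The evaluation step described above, correlating a measure's scores with human judgements over a set of word pairs, can be sketched as follows. The sketch assumes the Pearson correlation coefficient, which the abstract does not specify, and the ratings and scores below are made-up illustrative values, not data from the paper or from any benchmark.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up example values: one human rating and one computed score per word pair.
HUMAN_RATINGS = [3.9, 3.1, 0.4, 1.2]
MEASURE_SCORES = [0.95, 0.70, 0.10, 0.35]

if __name__ == "__main__":
    print("correlation with human judgement:", pearson(HUMAN_RATINGS, MEASURE_SCORES))
```

Running the same function once per computational measure, against the same human ratings, gives the coefficients that such a comparison ranks; a coefficient closer to 1 indicates closer agreement with human judgement.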

A Survey on Semantic Similarity Measures

Measuring the semantic similarity between words, sentences and concepts is an important task in information retrieval, document clustering, web mining and word sense disambiguation. Semantic similarity is a measure of the extent to which two concepts are alike, based on the likeness of their meaning. This survey discusses the existing similarity measures by partitioning them into two approaches: corpus-based and knowledge-based. The features, performance, advantages and disadvantages of various semantic similarity measures are discussed. The aim of this paper is to provide an efficient evaluation of all these measures and to help researchers select the best measure according to their requirements.