A semantic similarity method based on information content exploiting multiple ontologies
Related papers
Semantic similarity has become, in recent years, the backbone of numerous knowledge-based applications dealing with textual data. Among the different methods and paradigms proposed to assess semantic similarity, ontology-based measures and, more specifically, those based on quantifying the Information Content (IC) of concepts are the most widespread solutions due to their high accuracy. However, these measures were designed to exploit a single ontology. They thus cannot be leveraged in many contexts in which multiple knowledge bases are considered. In this paper, we propose a new approach to achieve accurate IC-based similarity assessments for concept pairs spread throughout several ontologies. Based on Information Theory, our method defines a strategy to accurately measure the degree of commonality between concepts belonging to different ontologies, which is the cornerstone for estimating their semantic similarity. Our approach therefore enables classic IC-based measures to be directly applied in a multiple-ontology setting. An empirical evaluation, based on well-established benchmarks and ontologies related to the biomedical domain, illustrates the accuracy of our approach and demonstrates that its similarity estimations are significantly more correlated with human ratings of similarity than those obtained via related works. [...] unambiguously retrieved from ontologies and similarities can be assessed from structured knowledge that has been explicitly formalised by human experts.
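For orientation, the classic single-ontology IC-based measures mentioned in the abstract are usually written as below. These are the standard definitions attributed to Resnik and Lin, not the multi-ontology strategy proposed in the paper; the symbols $p(c)$ (probability of encountering concept $c$) and $S(c_1, c_2)$ (the set of concepts subsuming both $c_1$ and $c_2$) are notation assumed here.

```latex
\begin{align}
  IC(c) &= -\log p(c) \\
  sim_{res}(c_1, c_2) &= \max_{c \in S(c_1, c_2)} IC(c) \\
  sim_{lin}(c_1, c_2) &= \frac{2 \cdot \max_{c \in S(c_1, c_2)} IC(c)}{IC(c_1) + IC(c_2)}
\end{align}
```

Both measures depend on finding a common subsumer of the two concepts, which is why they do not apply directly when the concepts belong to different ontologies.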
Comparison of ontology-based semantic-similarity measures
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2008
Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Such a comparison can offer insight into the validity of different approaches. We compared 3 approaches to semantic-similarity metrics (relying on expert opinion, on the ontology only, and on information content) using 4 metrics applied to SNOMED-CT. We found poor agreement between the metrics based on information content and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that ontology-only metrics may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.
Journal of the Association for Information Science and Technology, 2018
Finding similarity between concepts based on semantics has become a new trend in many applications (e.g., biomedical informatics, natural language processing). Measuring Semantic Similarity (SS) with higher accuracy is a challenging task. In this context, Information Content (IC)-based SS measures have gained popularity over the others. The notion of IC derives from information theory, which has very high potential to characterize the semantics of concepts. An IC-based SS framework comprises (i) an IC calculator and (ii) an SS calculator. In this article, we propose a generic intrinsic IC-based SS calculator. We also introduce a new structural aspect of an ontology called DCS (Disjoint Common Subsumers) that plays a significant role in deciding the similarity between two concepts. We evaluated our proposed similarity calculator against existing intrinsic IC-based similarity calculators, as well as corpora-dependent similarity calculators, using several benchmark data sets. The experimental results show that the proposed similarity calculator correlates more highly with human evaluation than the existing state-of-the-art IC-based similarity calculators.
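The two-part framework described above (an IC calculator plus an SS calculator) can be illustrated with a minimal sketch. It is not the calculator proposed in this article and does not implement DCS: it assumes a Seco-style intrinsic IC (derived from the number of hyponyms rather than from a corpus) and Lin's measure as the SS calculator. The toy taxonomy and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from any real ontology.
PARENT = {
    "disease": None,
    "infection": "disease",
    "cancer": "disease",
    "flu": "infection",
    "cold": "infection",
}

def ancestors(concept):
    """Return the concept itself plus all of its subsumers up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def hyponym_count(concept):
    """Count the concepts strictly below `concept` in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))

def intrinsic_ic(concept):
    """IC calculator (assumed Seco-style): 1 for leaves, 0 for the root."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))

def lin_similarity(a, b):
    """SS calculator (Lin): twice the IC of the most informative common
    subsumer, divided by the sum of the ICs of the two concepts."""
    common = set(ancestors(a)) & set(ancestors(b))
    ic_lcs = max(intrinsic_ic(c) for c in common)
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    # Guard for the degenerate root-vs-root case, where both ICs are 0.
    return 2.0 * ic_lcs / denom if denom > 0 else 0.0
```

In this toy taxonomy, `lin_similarity("flu", "cold")` is higher than `lin_similarity("flu", "cancer")`, because the first pair shares the more specific subsumer `infection` while the second pair only shares the root.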
2011
The estimation of the semantic similarity between terms provides a valuable tool to enable the understanding of textual resources. Many semantic similarity computation paradigms have been proposed, either as general-purpose solutions or framed in concrete fields such as biomedicine. In particular, ontology-based approaches have been very successful due to their efficiency, scalability and lack of constraints, and thanks to the availability of large, consensus ontologies (like WordNet or those in the UMLS). These measures, however, are hampered by the fact that only one ontology is exploited and, hence, their recall depends on the ontological detail and coverage. In recent years, some authors have extended some of the existing methodologies to support multiple ontologies. The problem of integrating heterogeneous knowledge sources is tackled by means of simple terminological matchings between ontological concepts. In this paper, we aim to improve these methods by analysing the similarity between the modelled taxonomical knowledge and the structure of different ontologies. As a result, we are able to better discover the commonalities between different ontologies and, hence, improve the accuracy of the similarity estimation. Two methods are proposed to tackle this task. They have been evaluated and compared with related works by means of several widely used benchmarks of biomedical terms using two standard ontologies (WordNet and MeSH). Results show that our methods correlate better than related works with the similarity assessments provided by experts in biomedicine.
2011
Semantic similarity estimation is an important component of analysing natural language resources like clinical records. Proper understanding of concept semantics allows for improved use and integration of heterogeneous clinical sources as well as higher information retrieval accuracy. Semantic similarity has been the focus of much research, which has led to the definition of heterogeneous measures using different theoretical principles and knowledge resources in a variety of contexts and application domains.
Semantic similarity estimation from multiple ontologies
Applied Intelligence, 2012
The estimation of semantic similarity between words is an important task in many language-related applications. In the past, several approaches to assess similarity by evaluating the knowledge modelled in an ontology have been proposed. However, in many domains, knowledge is dispersed through several partial and/or overlapping ontologies. Because most previous works on semantic similarity only support a single input ontology, we propose a method to enable similarity estimation across multiple ontologies. Our method identifies different cases according to the ontology or ontologies to which the input terms belong. We propose several heuristics to deal with each case, aiming to solve missing values when partial knowledge is available, and to capture the strongest semantic evidence, which results in the most accurate similarity assessment, when dealing with overlapping knowledge. We evaluate and compare our method using several general-purpose and biomedical benchmarks of word pairs whose similarity has been assessed by human experts, and several general-purpose (WordNet) and biomedical (SNOMED CT and MeSH) ontologies. Results show that our method improves the accuracy of similarity estimation in comparison to single-ontology approaches and to state-of-the-art related works in multi-ontology similarity assessment.
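The case analysis described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' heuristics: each ontology is modelled as a hypothetical pair (set of terms, single-ontology similarity function), "strongest semantic evidence" is assumed to mean the highest score among the ontologies that contain both terms, and the case in which no ontology contains both terms is left unresolved because the abstract does not describe how it is handled.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical representation: (terms covered, single-ontology similarity function).
Ontology = Tuple[Set[str], Callable[[str, str], float]]

def multi_ontology_similarity(
    a: str, b: str, ontologies: List[Ontology]
) -> Optional[float]:
    # Collect the similarity functions of every ontology containing both terms.
    shared = [sim for terms, sim in ontologies if a in terms and b in terms]
    if shared:
        # One shared ontology: the plain single-ontology case.
        # Several (overlapping knowledge): keep the highest score, which is
        # an ASSUMED reading of "strongest semantic evidence".
        return max(sim(a, b) for sim in shared)
    # No ontology contains both terms: a cross-ontology bridging step would
    # be needed here; it is deliberately not sketched.
    return None

# Toy ontologies with constant similarity functions (illustrative values only).
general_like = ({"flu", "cold", "fever"}, lambda a, b: 0.4)
medical_like = ({"flu", "cold", "sepsis"}, lambda a, b: 0.7)
ONTOLOGIES = [general_like, medical_like]
```

With these toy inputs, `multi_ontology_similarity("flu", "cold", ONTOLOGIES)` takes the higher of the two available scores, while a pair such as `("fever", "sepsis")`, which no single ontology covers, returns `None`.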
X-Similarity: Computing Semantic Similarity between Concepts from Different Ontologies
Journal of Digital Information Management, 2006
Semantic similarity relates to computing the similarity between concepts (terms) which are not necessarily lexically similar. We investigate approaches to computing semantic similarity by mapping terms to an ontology and by examining their relationships in that ontology. More specifically, we investigate approaches to computing the semantic similarity between natural language terms (using WordNet as the underlying reference ontology) and between medical terms (using the MeSH ontology of medical and biomedical terms). The most popular semantic similarity methods are implemented and evaluated using WordNet and MeSH. This work also focuses on cross-ontology methods, which are capable of computing the semantic similarity between terms stemming from different ontologies (WordNet and MeSH in this work). This is a far more difficult problem than the single-ontology one referred to above, and it has not been investigated adequately in the literature. X-Similarity, a novel cross-ontology similarity method, is also a contribution of this work. All methods examined in this work are integrated into a semantic similarity system which is accessible on the Web.
Towards the estimation of feature-based semantic similarity using multiple ontologies
A key application of ontologies is the estimation of the semantic similarity between terms. By means of this assessment, the comprehension and management of textual resources can be improved. However, most ontology-based similarity measures only support a single input ontology. If any of the compared terms do not belong to that ontology, their similarity cannot be assessed. To solve this problem, multiple ontologies can be considered. Even though there are methods that enable multi-ontology similarity assessment by integrating concepts from different ontologies, most of them are based on simple terminological and/or partial matchings. This hampers similarity measures that exploit a broad set of taxonomic evidence of similarity, like feature-based ones. In this paper, we tackle this problem by proposing a method to identify all the suitable matchings between concepts of different ontologies that intervene in the similarity assessment. In addition to the obvious terminological matching, we exploit the ontological structure and the notion of concept subsumption to discover non-trivial equivalences between heterogeneous ontologies. Our final goal is to enable the accurate application of feature-based similarity measures in a multi-ontology setting. Our proposal is evaluated with regard to human judgements of similarity for several benchmarks and ontologies. Results show an improvement over related works, with similarity accuracies that even rival those obtained in an ideal mono-ontology setting.
Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies
IEEE Transactions on Systems, Man, and Cybernetics - TSMC, 2009
Most intelligent knowledge-based applications contain components for measuring semantic similarity between terms. Many of the existing semantic similarity measures that use ontology structure as their primary source cannot measure semantic similarity between terms and concepts using multiple ontologies. This research explores a new way to measure semantic similarity between biomedical concepts using multiple ontologies. We propose a new ontology-structure-based technique for measuring semantic similarity in a single ontology and across multiple ontologies in the biomedical domain within the framework of the Unified Medical Language System (UMLS). The proposed measure is based on three features: 1) cross-modified path length between two concepts; 2) a new feature of common specificity of concepts in the ontology; and 3) local granularity of ontology clusters. The proposed technique was evaluated relative to human similarity scores and compared with other existing measures using two terminologies within the UMLS framework: Medical Subject Headings (MeSH) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). The experimental results validate the efficiency of the proposed technique in single and multiple ontologies, and demonstrate that our proposed measure achieves the best correlation with human scores in all experiments.