Unifying ontological similarity measures: A theoretical and empirical investigation
Related papers
Gene Ontology (GO), which describes biological concepts related to genes, has attracted attention as a basis for measuring the semantic similarity of genes. This paper proposes a new method for measuring GO-based semantic similarity that extends and combines two existing methods, by Resnik and by Wang et al., in order to mitigate their drawback, the effect of shallow annotation. Experiments with pathway data show that the proposed method is superior to the existing methods.
Semantic Similarity in Biomedical Ontologies
Plos Computational Biology, 2009
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
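The node-based and groupwise strategies named in this abstract can be illustrated with a minimal sketch. The toy ontology, the term names and the annotation counts below are all hypothetical, and the two functions are simplified textbook forms (a Resnik-style information-content measure and a Jaccard index over ancestor sets), not the implementations surveyed in the review.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parents (is_a edges).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical number of entities annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "A1": 1, "A2": 1}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = -log(p), where p counts annotations to the term or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    freq = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(freq / total)


def resnik(term1, term2):
    """Node-based, pairwise: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(information_content(t) for t in common)


def groupwise_jaccard(terms1, terms2):
    """Groupwise: Jaccard index over the ancestor closures of two term sets."""
    closure1 = set().union(*(ancestors(t) for t in terms1))
    closure2 = set().union(*(ancestors(t) for t in terms2))
    return len(closure1 & closure2) / len(closure1 | closure2)


print(resnik("A1", "A2"))                 # shares the informative ancestor "A"
print(resnik("A1", "B"))                  # shares only the root
print(groupwise_jaccard({"A1"}, {"A2"}))  # compares whole annotation sets
```

In this sketch the pairwise measure compares two terms at a time, so comparing two genes would still require combining several term-to-term scores, whereas the groupwise measure compares the two annotation sets directly.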
Semantic Similarity Measures as Tools for Exploring the Gene Ontology
Pacific Symposium on Biocomputing, 2003
Many bioinformatics resources hold data in the form of sequences. Often this sequence data is associated with a large amount of annotation. In many cases this data has been hard to model, and has been represented as scientific natural language, which is not readily computationally amenable. The development of the Gene Ontology provides us with a more accessible
gSemSim: Semantic Similarity Measure for Intra Gene Ontology Terms
International Journal of Information Technology and Computer Science, 2013
Gene Ontology (GO) is an important bioinformatics scheme to unify the representation of gene and gene product attributes across all species. Measuring similarity or distance between GO terms is a key step for determining hidden relationships between genes. The notion of similarity between GO terms is a usual step in knowledge discovery related tasks. In the literature, various similarity measures between GO terms have been proposed. We introduce a novel similarity measure scheme that improves on three conventional similarity measures by reducing their limitations. The salient feature of the proposed GO Semantic Similarity (gSemSim) measure is its ability to show more realistic similarity between concepts from the perspective of domain knowledge. A comparison with other techniques is also presented, showing an improved contextual meaning of the proposed semantic similarity. This study is expected to assist the community of bioinformaticians in the selection of a better similarity measure required for correct annotation of genes in the Gene Ontology.
Hepatitis Monthly, 2017
Background: Gene ontology (GO) is a well-structured body of knowledge about biological terms that describes the roles of genes and their products in a standardized, organized, controlled-vocabulary format. Over the last decade, many measures have been developed that exploit GO to determine semantic similarities between biological entities. GO imposes some constraints that existing GO-based semantic similarity measures try to address. For instance, (1) edges in a GO graph do not indicate uniform distances and also have different densities, and (2) ignoring term levels in an ontology causes the "shallow annotation" drawback, i.e., two terms at a certain distance from each other near the root of the GO graph have the same semantic similarity as two terms at the same distance but far from the root. Methods: Here, we present wAIC, a two-stage hybrid semantic similarity measure using weighted aggregation of information contents. In wAIC, the impact of each common ancestor on the semantic similarity value is determined according to the location of the ancestor in the ontology graph. wAIC also filters out (from the annotating term set) terms that are in the upper levels of the ontology graph to reduce the shallow annotation problem. Results: Experimental results confirm that the proposed measure is more consistent with the major related constraints, in that wAIC semantic similarity values correlate more strongly with both sequence similarity values and gene expression based similarity values than state-of-the-art semantic similarity measures do. Conclusions: wAIC shows that using a weighted aggregation of common ancestors is completely consistent with human perception and can improve the accuracy of gene similarity measurement.
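The "shallow annotation" drawback described in this abstract can be made concrete with a small sketch. The toy tree and term names below are hypothetical, depth is counted in edges from the root (an assumption of this sketch), and the depth-aware function is a Wu-Palmer-style ratio used only for contrast; it is not the wAIC measure.

```python
# Hypothetical toy ontology as a tree: each term maps to its single parent.
PARENT = {
    "root": None,
    "X": "root",   # shallow pair: X and Y sit directly under the root
    "Y": "root",
    "P": "root",
    "Q": "P",
    "D1": "Q",     # deep pair: D1 and D2 sit three edges below the root
    "D2": "Q",
}


def path_to_root(term):
    """List of nodes from the term up to and including the root."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Number of edges between the term and the root."""
    return len(path_to_root(term)) - 1


def lowest_common_ancestor(a, b):
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node


def edge_similarity(a, b):
    """Purely edge-based: depends only on the path length between the terms."""
    lca = lowest_common_ancestor(a, b)
    distance = depth(a) + depth(b) - 2 * depth(lca)
    return 1.0 / (1.0 + distance)


def depth_aware_similarity(a, b):
    """Wu-Palmer-style ratio: rewards a deep (specific) common ancestor."""
    lca = lowest_common_ancestor(a, b)
    total = depth(a) + depth(b)
    return 2.0 * depth(lca) / total if total else 1.0


for pair in (("X", "Y"), ("D1", "D2")):
    print(pair, edge_similarity(*pair), depth_aware_similarity(*pair))
```

Both pairs are two edges apart, so the edge-based score cannot tell them apart, while the depth-aware score ranks the deep, specific pair higher than the pair just under the root.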
Journal of Biomedical Informatics, 2014
Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with non-ambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined over the last years. Nevertheless, some critical questions remain: which measure should be defined/chosen for a concrete application? Are some of the, a priori different, measures indeed equivalent? In order to bring some light to these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases. By demonstrating that groups of measures are just particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The application of the proposed framework and its practical usefulness is underlined by an empirical analysis of hundreds of semantic measures in a biomedical context. Please cite this article in press as: Harispe S et al. A framework for unifying ontology-based semantic similarity measures: A study in the biomedical domain. J Biomed Inform (2013), http://dx.
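The idea that several measures are instances of one parameterized function can be illustrated with a well-known case: the Tversky ratio model, which reduces to the Jaccard index for alpha = beta = 1 and to the Dice coefficient for alpha = beta = 0.5. The sketch below is not the framework from the paper; the function names and the example feature sets (ancestor sets of two hypothetical terms) are assumptions made for illustration.

```python
def ratio_model(a, b, alpha, beta):
    """Tversky ratio model over two feature sets a and b."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 1.0


def jaccard(a, b):
    """Jaccard index written out directly."""
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice coefficient written out directly."""
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical feature sets: ancestor sets of two sibling terms.
features_1 = {"root", "A", "A1"}
features_2 = {"root", "A", "A2"}

print(ratio_model(features_1, features_2, 1.0, 1.0), jaccard(features_1, features_2))
print(ratio_model(features_1, features_2, 0.5, 0.5), dice(features_1, features_2))
```

Each printed pair agrees, which shows on a small scale how two measures that look different can be recovered from a single expression by choosing its parameters.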
An ontology-based measure to compute semantic similarity in biomedicine
Journal of Biomedical Informatics, 2010
Proper understanding of textual data requires the exploitation and integration of unstructured and heterogeneous clinical sources, healthcare records or scientific literature, which are fundamental aspects in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing, classification and structuring of textual resources. In the past, several approaches for assessing word similarity by exploiting different knowledge sources (ontologies, thesauri, domain corpora, etc.) have been proposed. Some of these measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies (such as MeSH or SNOMED CT). In this paper, these approaches are introduced and analyzed in order to determine their advantages and limitations with respect to the considered knowledge bases. After that, a new measure based on the exploitation of the taxonomical structure of a biomedical ontology is proposed. Using SNOMED CT as the input ontology, the accuracy of our proposal is evaluated and compared against other approaches according to a standard benchmark of manually ranked medical terms. The correlation between the results of the evaluated measures and the human experts' ratings shows that our proposal outperforms most of the previous measures avoiding, at the same time, some of their limitations.
2013
There is a prominent trend to augment and improve the formality of biomedical ontologies. For example, this is shown by the current effort on adding description logic axioms, such as disjointness. One of the key ontology applications that can take advantage of this effort is conceptual (functional) similarity measurement. The presence of description logic axioms in biomedical ontologies makes the current structural or extensional approaches weaker and further away from providing sound semantics-based similarity measures. Although beneficial in small ontologies, the exploration of description logic axioms by semantics-based similarity measures is computationally expensive. This limitation is critical for biomedical ontologies, which normally contain thousands of concepts. Thus in the process of gaining their rightful place, biomedical functional similarity measures have to take the journey of finding how this rich and powerful knowledge can be fully explored while keeping feasible com...