Evaluating WordNet-based measures of lexical semantic relatedness

WordNet::Similarity: measuring the relatedness of concepts

Demonstration Papers at HLT- …, 2004

WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity and relatedness between a pair of concepts (or synsets). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.

Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures

Workshop on WordNet and Other Lexical …, 2001

Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath's measure gave the best results overall. That of Hirst and St-Onge seriously over-related, that of Resnik seriously under-related, and those of Lin and of Leacock and Chodorow fell in between.

Survey of Semantic Similarity by Wikipedia

2015

Semantic similarity or semantic relatedness is a metric defined over a set of documents or terms, where the idea of distance between them is based on the likeness of their meaning or semantic content, as opposed to similarity estimated from their syntactic representation (e.g., their string format). These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained by comparing information supporting their meaning or describing their nature. [1] Concretely, semantic similarity can be estimated by defining a topological similarity, using ontologies to define the distance between terms/concepts. For example, a naive metric for comparing concepts ordered in a partially ordered set and represented as nodes of a directed acyclic graph (e.g., a taxonomy) would be the length of the shortest path linking the two concept nodes. Based on text analysis, semantic relatedness between units of language (e.g., words, sentences) can also be estimated using statistical means such as a vector space model that correlates words and textual contexts over a suitable text corpus. An extensive survey dedicated to the notion of semantic measures and semantic similarity is proposed in: Semantic Measures for the Comparison of Units of Language, Concepts or Entities from Text and Knowledge Base Analysis. [1]
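The naive shortest-path metric described above can be sketched in a few lines. This is a minimal illustration on a hand-built toy taxonomy (all concept names and edges are made up for the example, not taken from WordNet):

```python
from collections import deque

# Toy "is-a" taxonomy, child -> list of parents. Illustrative data only.
PARENTS = {
    "dog": ["canine"], "cat": ["feline"],
    "canine": ["carnivore"], "feline": ["carnivore"],
    "carnivore": ["mammal"], "mammal": ["animal"], "animal": [],
}

def shortest_path_length(a, b):
    """BFS over the undirected taxonomy graph: the naive path-based distance."""
    # Build an undirected adjacency list from the child -> parent edges.
    adj = {n: set() for n in PARENTS}
    for child, parents in PARENTS.items():
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == b:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # the two concepts are disconnected

print(shortest_path_length("dog", "cat"))  # dog-canine-carnivore-feline-cat -> 4
```

A shorter path means a closer relationship; real systems typically convert this distance into a similarity score, e.g. by inverting or scaling it by taxonomy depth.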

A new semantic relatedness measurement using WordNet features

Knowledge and Information Systems, 2013

Computing semantic similarity/relatedness between concepts or words is an important issue in many research fields. Information-theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics together with a critical study. We then propose a new intrinsic IC computation method that uses taxonomical features extracted from an ontology for a particular concept. This approach quantifies the subgraph formed by the concept's subsumers, using depth and descendant count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun "is a" taxonomy, the nominalization relation (which allows use of the verb "is a" taxonomy), and the shared words (overlaps) in glosses. Our work has been evaluated and compared with related work using a wide set of benchmarks conceived for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
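Intrinsic IC metrics of the family this abstract describes combine a concept's depth with its descendant count: deep concepts with few descendants carry more information. The sketch below implements one published formula of that family (Zhou et al.'s combination) on a toy taxonomy, purely as an illustration; it is not the paper's own metric, and all names and data are invented:

```python
import math

# Toy taxonomy, parent -> children. Illustrative data only.
CHILDREN = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["car"],
    "dog": [], "cat": [], "car": [],
}

def descendants(c):
    """All concepts below c in the taxonomy."""
    out = set()
    for ch in CHILDREN[c]:
        out.add(ch)
        out |= descendants(ch)
    return out

def depth(c, root="entity"):
    """Depth of c, with the root at depth 1."""
    if c == root:
        return 1
    for parent, kids in CHILDREN.items():
        if c in kids:
            return depth(parent, root) + 1
    raise ValueError(c)

N = len(CHILDREN)                          # number of concepts in the taxonomy
MAX_DEPTH = max(depth(c) for c in CHILDREN)

def intrinsic_ic(c, k=0.5):
    """Intrinsic IC: higher for deep concepts with few descendants.
    Combines a descendant-count term with a depth term, weighted by k."""
    hypo = len(descendants(c))
    part_desc = 1 - math.log(hypo + 1) / math.log(N)
    part_depth = math.log(depth(c)) / math.log(MAX_DEPTH)
    return k * part_desc + (1 - k) * part_depth
```

Here a leaf such as "dog" gets the maximum IC of 1.0, while the root "entity" gets 0: it subsumes everything and is maximally deep in neither sense, so it conveys no information.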

Semantic Relatedness Calculation Method

The article describes a new method for determining semantic relatedness. The method is based on statistical data collected from text corpora and on the principles of distributional semantics. A set of basic hypotheses lies at the core of the method; each hypothesis is itself a feature of semantic relatedness and can be used independently of the method. In total, more than 170 hypotheses (including sub-hypotheses) are offered. The main hypotheses fall into the following classes: 1. Basic hypotheses, reflecting the frequency characteristics of the words' common occurrences. 2. Hypotheses with normalization by context length. 3. Hypotheses with normalization by the number of words. 4. Distance-based hypotheses. 5. Hypotheses with normalization by the number of documents. 6. Hypotheses based on computing weighted distances on a graph. 7. A set of methods with variational calculation of the combined information. 8. A set of methods with logarithms of values and variational calculation of the combined information. 9. Hypotheses with different PMI modifications. 10. Hypotheses that compute relatedness only for words whose common occurrences exceed certain thresholds, and mixed hypotheses based on PMI calculated over statistics from other hypotheses. The article shows plots of Spearman and Pearson correlations for each hypothesis on the benchmark sets; the plots mark the boundaries between hypothesis classes and report a correlation for each class. The same hypothesis is also noted to behave differently on different benchmark sets, which confirms our observation that the benchmark sets are of a different nature. Finally, an aggregating model for evaluating semantic relatedness was built on the basis of the proposed hypotheses and measured on all benchmark sets. The resulting scores exceeded those of other existing methods, confirming the effectiveness of the proposed model.
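Several of the hypothesis classes above build on pointwise mutual information over word co-occurrence statistics. A minimal sketch of a PMI-style relatedness score over document-level co-occurrence follows; the corpus and counting scheme are invented for the example, not taken from the article:

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus of short "documents"; co-occurrence is counted per document.
docs = [
    ["coffee", "cup", "morning"],
    ["coffee", "cup"],
    ["tea", "cup"],
    ["coffee", "morning"],
]

word_count = Counter()
pair_count = Counter()
for d in docs:
    uniq = set(d)                       # count each word once per document
    word_count.update(uniq)
    pair_count.update(frozenset(p) for p in combinations(sorted(uniq), 2))

N = len(docs)

def pmi(w1, w2):
    """Pointwise mutual information: log P(w1, w2) / (P(w1) * P(w2))."""
    p_joint = pair_count[frozenset((w1, w2))] / N
    p1, p2 = word_count[w1] / N, word_count[w2] / N
    if p_joint == 0:
        return float("-inf")            # never co-occur
    return math.log(p_joint / (p1 * p2))
```

Positive PMI means the pair co-occurs more often than independence would predict; the article's thresholded and normalized hypotheses can be read as variations on this basic quantity.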

Recent advances in methods of lexical semantic relatedness – a survey

Natural Language Engineering, 2012

Measuring lexical semantic relatedness is an important task in Natural Language Processing (NLP). It is often a prerequisite to many complex NLP tasks. Despite an extensive amount of work dedicated to this area of research, there is a lack of an up-to-date survey in the field. This paper aims to address this issue with a study that is focused on four perspectives: (i) a comparative analysis of background information resources that are essential for measuring lexical semantic relatedness; (ii) a review of the literature with a focus on recent methods that are not covered in previous surveys; (iii) a discussion of studies in the biomedical domain, where novel methods have been introduced but inadequately communicated across the domain boundaries; and (iv) an evaluation of lexical semantic relatedness methods and a discussion of useful lessons for the development and application of such methods. In addition, we discuss a number of issues in this field and suggest future research directions.

WordNet vs. distributional determination of word similarity

2007

In this paper we propose a kind of distributional word similarity computed after extracting syntactic relations from the sentence. As a supplement to the richness of WordNet, it acquires lexical knowledge from a large corpus. It is also a powerful tool for judging word similarity. Introduction and Aims: The problem of how similar words are, or how closely words are associated, has important applications in several areas, including automatic thesaurus construction (Grefenstette 1992; Hearst 1992; Riloff and Shepherd 1997; Pichon and Sébillot 1998; Berland and Charniak 1999; Caraballo 1999), word sense disambiguation (Dagan, Marcus et al. 1993; Lin 1997; Dominic 2003) and information retrieval (Salton 1973; Grefenstette 1992; Leacock, Towell et al. 1996). Word Association Norms (refs) is a knowledge base available for word association applications, based on empirical results from psychological word association experiments in which a subject is given a stimulus word and asked to...

Word Similarity In WordNet

Modeling, Simulation and Optimization of Complex Processes, 2008

We present a new information-theoretic approach to measuring the semantic similarity between concepts. By exploiting the advantages of the distance-based (edge-based) approach for taxonomic, tree-like concept hierarchies, we enhance the strength of the information-theoretic (node-based) approach. Our measure therefore gives a complete view of word similarity, which cannot be achieved by applying node-based approaches alone. Our experimental measure achieves an 88% correlation with human ratings.
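One well-known way to combine node-based IC with the taxonomy's edge structure, in the spirit this abstract describes, is the Jiang-Conrath distance. The sketch below is an illustration of that classic combination on a toy taxonomy with invented frequency counts; it is not the paper's own measure:

```python
import math

# Toy taxonomy (child -> parent) and raw corpus counts per concept.
PARENT = {"dog": "canine", "cat": "feline",
          "canine": "carnivore", "feline": "carnivore",
          "carnivore": "animal", "animal": None}
FREQ = {"dog": 30, "cat": 40, "canine": 10, "feline": 5,
        "carnivore": 10, "animal": 5}

def ancestors(c):
    """c and its chain of subsumers, ordered bottom-up."""
    chain = [c]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain

def cumulative(c):
    """A concept's count includes the counts of everything it subsumes."""
    return FREQ[c] + sum(cumulative(ch) for ch, p in PARENT.items() if p == c)

TOTAL = cumulative("animal")

def ic(c):
    """Resnik-style information content from corpus counts."""
    return -math.log(cumulative(c) / TOTAL)

def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c1 also subsuming c2."""
    anc2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in anc2)

def jiang_conrath_distance(c1, c2):
    """Node-based ICs combined along the taxonomy's edges:
    dist = IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
```

The distance is zero for identical concepts and grows as the pair's lowest common subsumer becomes more general, which is exactly the interplay of edge and node information the abstract points at.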