Evaluating sense disambiguation across diverse parameter spaces

A comparison between supervised learning algorithms for word sense disambiguation

Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, 2000

This paper describes a set of comparative experiments, including cross-corpus evaluation, between five alternative algorithms for supervised Word Sense Disambiguation (WSD), namely Naive Bayes, Exemplar-based learning, SNoW, Decision Lists, and Boosting. Two main conclusions can be drawn: 1) The LazyBoosting algorithm outperforms the other four state-of-the-art algorithms in terms of accuracy and ability to tune to new domains; 2) The domain dependence of WSD systems seems very strong and suggests that some kind of adaptation or tuning is required for cross-corpus application.

A New Supervised Learning Algorithm for Word Sense Disambiguation

1997

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criterion rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classification (PEBLS).
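The forward sequential search scored by AIC that this abstract describes can be sketched generically. The selection loop below is an illustration under stated assumptions, not the paper's method: the scoring function is a stand-in supplied by the caller, and the decomposable log-linear models themselves are not implemented.

```python
import math

def aic(log_likelihood, num_params):
    """Akaike's Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * num_params

def forward_search(candidates, score_fn):
    """Greedy forward selection: at each step, add the candidate that yields
    the lowest-scoring (e.g. lowest-AIC) model, and record the best-fitting
    model at each level of complexity."""
    selected, remaining = [], list(candidates)
    sequence = []  # (model, score) at each complexity level
    while remaining:
        best = min(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        sequence.append((list(selected), score_fn(selected)))
    return sequence
```

A Naive Mix, as the abstract describes it, would then combine the whole recorded sequence of models into one probabilistic classifier rather than keeping only the final one.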

Approaches for Word Sense Disambiguation - A Survey

International Journal of Recent Technology and Engineering, 2014

Word sense disambiguation is a technique in the field of natural language processing whose main task is to identify the correct sense in which a word occurs in a particular context. It is of vital help to applications such as question answering, machine translation, text summarization, text classification, and information retrieval. This has led to considerable interest in machine learning approaches that classify word senses automatically. The main motivation behind word sense disambiguation is to let users make full use of the available technologies: ambiguities present in any language hinder the use of information technology, because words in human language can be interpreted in more than one way depending on the context in which they occur. In this paper we put forward a survey of supervised, unsupervised and knowledge-based approaches and algorithms available for word sense disambiguation (WSD). Index Terms: Machine readable dictionary, Machine translation, Natural language processing, WordNet, Word sense disambiguation.

Machine learning techniques for word sense disambiguation

2006

In the Natural Language Processing (NLP) community, Word Sense Disambiguation (WSD) has been described as the task of selecting the appropriate meaning (sense) for a given word in a text or discourse, where this meaning is distinguishable from other senses potentially attributable to that word. These senses can be seen as the target labels of a classification problem. That is, Machine Learning (ML) seems a natural way to tackle this problem.

Simple features for statistical word sense disambiguation

In this paper, we describe our experiments on statistical word sense disambiguation (WSD) using two systems based on different approaches: Naïve Bayes on word tokens and Maximum Entropy on local syntactic and semantic features. In the first approach, we consider a context window, and a sub-window within it, around the word to disambiguate. Within the outer window, only content words are considered, but within the sub-window, all words are taken into account. Both window sizes are tuned by the system for each word to disambiguate, and accuracies of 75% and 67% were obtained for coarse- and fine-grained evaluations, respectively. In the second system, sense resolution is done using an approximate syntactic structure as well as the semantics of neighboring nouns as features to a Maximum Entropy learner. Accuracies of 70% and 63% were obtained for coarse- and fine-grained evaluations.
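The first system's window-based Naïve Bayes classifier can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy stopword list, add-alpha smoothing, and fixed window sizes are assumptions (the paper tunes window sizes per word).

```python
from collections import Counter, defaultdict
import math

STOPWORDS = {"the", "a", "of", "in", "to", "and"}  # toy stand-in list

def extract_features(tokens, target_idx, window=10, subwindow=3):
    """All words inside the sub-window; content words only in the
    rest of the outer window."""
    feats = []
    for i, tok in enumerate(tokens):
        tok = tok.lower()
        dist = abs(i - target_idx)
        if i == target_idx or dist > window:
            continue
        if dist <= subwindow or tok not in STOPWORDS:
            feats.append(tok)
    return feats

class NaiveBayesWSD:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (tokens, target_idx, sense)."""
        for tokens, idx, sense in examples:
            self.sense_counts[sense] += 1
            for f in extract_features(tokens, idx):
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)

    def predict(self, tokens, target_idx):
        total = sum(self.sense_counts.values())
        best, best_lp = None, float("-inf")
        for sense, n in self.sense_counts.items():
            lp = math.log(n / total)  # log prior
            denom = sum(self.feat_counts[sense].values()) + self.alpha * len(self.vocab)
            for f in extract_features(tokens, target_idx):
                lp += math.log((self.feat_counts[sense][f] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = sense, lp
        return best
```

Trained on a handful of sense-tagged sentences for an ambiguous word such as "bank", the classifier picks the sense whose context-word distribution best matches the test context.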

Supervised Word Sense Disambiguation

2016

Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word in a given context. In this paper we have surveyed the various approaches for WSD: knowledge-based, supervised, semi-supervised, and unsupervised methods. The paper further elaborates on the supervised methods used for WSD. The methods compared in this paper are: Decision Trees, Decision Lists, Support Vector Machines, Neural Networks, Naïve Bayes methods, and exemplar-based learning.

A perspective on word sense disambiguation methods and their evaluation

1997

In this position paper, we make several observations about the state of the art in automatic word sense disambiguation. Motivated by these observations, we offer several specific proposals to the community regarding improved evaluation criteria, common training and testing resources, and the definition of sense inventories.

An Attempt to Formalize Word Sense Disambiguation: Maximizing Efficiency by Minimizing Computational Costs

Revista Española de Lingüística Aplicada, 2009

This paper presents an algorithm based on collocational data for word sense disambiguation (WSD). The aim of this algorithm is to maximize efficiency by minimizing (1) computational costs and (2) linguistic tagging/annotation. The formalization of our WSD algorithm is based on discriminant function analysis (DFA). This statistical technique allows us to parameterize each collocational item with its meaning, using just bare text. The parameterized data allow us to classify cases (sentences with an ambiguous word) into the values of a categorical dependent variable (each of the meanings of the ambiguous word). To evaluate the validity and efficiency of our WSD algorithm, we first hand sense-tagged all the sentences containing ambiguous words and then cross-validated the hand sense-tagged data against the automatic WSD performance. Finally, we present the global results of our algorithm after applying it to a limited set of words in two languages, Spanish and English, highlighting the points...
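The two-group case of a discriminant analysis over collocational data can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the collocate inventory, binary presence features, window size, and ridge regularization are all choices made here for the sketch.

```python
import numpy as np

def collocate_vector(tokens, target_idx, collocates, window=3):
    """Binary feature vector: which collocates appear within +/-window
    of the ambiguous target word."""
    lo, hi = max(0, target_idx - window), target_idx + window + 1
    context = set(tokens[lo:hi]) - {tokens[target_idx]}
    return np.array([1.0 if c in context else 0.0 for c in collocates])

def fit_fisher_discriminant(X0, X1, ridge=1e-3):
    """Two-group linear discriminant: w = Sw^-1 (mu1 - mu0), with the
    decision threshold at the midpoint of the projected class means."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter, regularized so it is invertible
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw += ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = 0.5 * (w @ mu0 + w @ mu1)
    return w, threshold

def classify(x, w, threshold):
    """Return 1 (second sense) if the projection exceeds the threshold."""
    return 1 if w @ x > threshold else 0
```

Given a few sense-tagged sentences per meaning of an ambiguous word, the discriminant projects each collocate vector onto a single axis separating the two sense groups, which keeps classification cheap at run time.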