A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation

An Ensemble Approach to Corpus Based Word Sense Disambiguation

2000

This paper presents a corpus-based approach to word sense disambiguation that combines a number of Naive Bayesian classifiers into an ensemble that performs disambiguation via a majority vote. Each of the member classifiers is based on collocation and co-occurrence features found in varying sized windows of context. This approach is motivated by the observation that, in general, enhancing the feature set or learning algorithm used by a corpus-based approach does not improve disambiguation accuracy beyond ...
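To make the setup concrete, here is a minimal sketch of an ensemble of Naive Bayes classifiers, one per context window width, combined by a hard majority vote. It assumes scikit-learn; the toy sense-tagged instances, window widths, and plain bag-of-words features are illustrative stand-ins rather than the paper's exact collocation and co-occurrence feature set.

```python
# Sketch: majority-vote ensemble of Naive Bayes classifiers, one per window width.
# Assumes scikit-learn; the toy data and window widths are illustrative only.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def window(tokens, position, width):
    """Words within +/- `width` positions of the ambiguous word, as one string."""
    lo, hi = max(0, position - width), position + width + 1
    return " ".join(tokens[lo:position] + tokens[position + 1:hi])

# Toy sense-tagged instances: (tokenized sentence, target index, sense label).
train = [
    ("deposit the check at the bank before noon".split(), 5, "finance"),
    ("the bank raised its interest rates again".split(), 1, "finance"),
    ("we walked along the bank of the river".split(), 4, "river"),
    ("fish swam near the muddy bank at dusk".split(), 5, "river"),
]
widths = (2, 5, 10)  # each member classifier sees a different amount of context

members = []
for w in widths:
    texts = [window(toks, pos, w) for toks, pos, _ in train]
    labels = [sense for _, _, sense in train]
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    members.append((w, clf.fit(texts, labels)))

def disambiguate(tokens, position):
    """Each member votes on its own window; the majority sense wins."""
    votes = [clf.predict([window(tokens, position, w)])[0] for w, clf in members]
    return Counter(votes).most_common(1)[0][0]

print(disambiguate("she opened an account at the bank".split(), 6))
```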

Naive Bayes and exemplar-based approaches to word sense disambiguation revisited

2000

This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to help clarify some confusing information in the related literature about the comparison between the two methods. In doing so, several directions are explored, including testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed in order to deal with large attribute sets. This modification, which basically consists of using only the positive information appearing in the examples, greatly improves the efficiency of the methods with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available, containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used.
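For illustration, here is a minimal sketch of the kind of head-to-head comparison described above, pitting Naive Bayes against an exemplar-based learner (approximated by 1-nearest-neighbour) on identical bag-of-words features. It assumes scikit-learn and toy data; the paper's "positive information only" modification and its metric for symbolic attributes are not reproduced here.

```python
# Sketch: Naive Bayes vs. an exemplar-based (nearest-neighbour) learner on the
# same bag-of-words features.  Assumes scikit-learn; toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

contexts = [
    "deposit the check at the bank before noon",
    "the bank raised its interest rates again",
    "the bank approved the mortgage application",
    "we walked along the bank of the river",
    "fish swam near the muddy bank at dusk",
    "erosion wore away the steep river bank",
]
senses = ["finance", "finance", "finance", "river", "river", "river"]

for name, learner in [("Naive Bayes", MultinomialNB()),
                      ("Exemplar-based (1-NN)", KNeighborsClassifier(n_neighbors=1))]:
    clf = make_pipeline(CountVectorizer(), learner)
    scores = cross_val_score(clf, contexts, senses, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```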

A comparison between supervised learning algorithms for word sense disambiguation

Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning, 2000

This paper describes a set of comparative experiments, including cross-corpus evaluation, between five alternative algorithms for supervised Word Sense Disambiguation (WSD), namely Naive Bayes, Exemplar-based learning, SNoW, Decision Lists, and Boosting. Two main conclusions can be drawn: 1) The LazyBoosting algorithm outperforms the other four state-of-the-art algorithms in terms of accuracy and ability to tune to new domains; 2) The domain dependence of WSD systems seems very strong and suggests that some kind of adaptation or tuning is required for cross-corpus application.

A Naïve Bayes Approach for Word Sense Disambiguation

Word sense disambiguation (WSD) is the task of automatically selecting the correct sense of a word given a context, and it helps in solving many ambiguity problems that exist inherently in all natural languages. Statistical Natural Language Processing (NLP), which is based on probabilistic, stochastic and statistical methods, has been used to solve many NLP problems. The Naive Bayes algorithm, one of the supervised learning techniques, has worked well in many classification problems. In the present work, the WSD task of disambiguating the senses of different words from the standard corpora available in the 1998 SENSEVAL Word Sense Disambiguation (WSD) shared task is performed by applying the Naïve Bayes machine learning technique. It is observed that senses of ambiguous words with fewer parts of speech are disambiguated more accurately. Another key observation is that the fewer the senses to be disambiguated, the higher the chances of a word being disambiguated with the correct sense.

I. INTRODUCTION

Ambiguity in the senses of words exists inherently in all natural languages used by humans. Every language has many words that carry more than one meaning. For example, the word "chair" has one sense meaning a piece of furniture and another sense meaning a person chairing, say, a session. We therefore need some context to select the correct sense in a given situation. Automatically selecting the correct sense given a context is at the core of solving many ambiguity problems. Word sense disambiguation (WSD) is the task of automatically determining which of the senses of an ambiguous (target) word is intended in a specific use of the word, taking into consideration the context of the word's use [1, 2]. Accurate and reliable word sense disambiguation has long been a goal of the natural language community. The motivation and belief behind performing word sense disambiguation is that many tasks performed under the umbrella of NLP benefit greatly from properly disambiguated word senses. Statistical NLP, an approach to NLP based on probabilistic, stochastic and statistical methods, uses machine learning algorithms to solve many NLP problems. As a branch of artificial intelligence, machine learning involves computationally learning patterns from given data and applying the learned patterns to new or unseen data. Machine learning is defined by Tom M. Mitchell as: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [3]. Learning algorithms can generally be classified into three types: supervised learning, semi-supervised learning and unsupervised learning. Supervised learning is based on the idea of studying the features of positive and negative examples over a large collection of annotated corpora. Semi-supervised learning uses both labeled and unlabeled data in the learning process to reduce the dependence on training data. In unsupervised learning, decisions are made on the basis of unlabeled data; unsupervised methods are mostly built upon clustering techniques, similarity-based functions and distributional statistics. For automatic WSD, supervised learning is one of the most successful approaches.
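As a worked illustration of the underlying decision rule, the sketch below scores each candidate sense by P(sense) · Π P(word | sense) over the context words, with add-one smoothing, and picks the argmax. The toy training data and plain bag-of-words context features are assumptions for illustration, not the corpora or features used in the paper.

```python
# Minimal from-scratch sketch of Naive Bayes sense selection:
# choose argmax_s  P(s) * prod_w P(w | s)  over the context words,
# with Laplace (add-one) smoothing.  Toy data; features are illustrative.
import math
from collections import Counter, defaultdict

train = [
    ("finance", "deposit the check at the bank before noon".split()),
    ("finance", "the bank raised its interest rates again".split()),
    ("river",   "we walked along the bank of the river".split()),
    ("river",   "fish swam near the muddy bank at dusk".split()),
]

sense_counts = Counter(sense for sense, _ in train)
word_counts = defaultdict(Counter)          # word_counts[sense][word]
vocab = set()
for sense, words in train:
    word_counts[sense].update(words)
    vocab.update(words)

def disambiguate(context_words):
    best_sense, best_logp = None, float("-inf")
    for sense in sense_counts:
        # log P(sense)
        logp = math.log(sense_counts[sense] / len(train))
        total = sum(word_counts[sense].values())
        for w in context_words:
            # log P(word | sense) with add-one smoothing
            logp += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_sense, best_logp = sense, logp
    return best_sense

print(disambiguate("she opened an account at the bank".split()))
```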

A New Supervised Learning Algorithm for Word Sense Disambiguation

1997

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criteria rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classification (PEBLS).

Combining heterogeneous classifiers for word-sense disambiguation

Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions, 2002

This paper discusses ensembles of simple but heterogeneous classifiers for word-sense disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 English lexical sample task. First-order classifiers are combined by a second-order classifier, which variously uses majority voting, weighted voting, or a maximum entropy model. While individual first-order classifiers perform comparably to middle-scoring teams' systems, the combination achieves high performance. We discuss trade-offs and empirical performance. Finally, we present an analysis of the combination, examining how ensemble performance depends on error independence and task difficulty.
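A minimal sketch of a second-order combiner over heterogeneous first-order classifiers, using weighted (soft) voting, one of the combination schemes mentioned above. It assumes scikit-learn; the particular member models, weights, and toy data are illustrative and not the Stanford-CS224N configuration.

```python
# Sketch: a second-order combiner (weighted soft vote) over heterogeneous
# first-order classifiers.  Assumes scikit-learn; members, weights, and data
# are illustrative only.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

contexts = [
    "deposit the check at the bank before noon",
    "the bank raised its interest rates again",
    "we walked along the bank of the river",
    "fish swam near the muddy bank at dusk",
]
senses = ["finance", "finance", "river", "river"]

first_order = [
    ("nb",  make_pipeline(CountVectorizer(), MultinomialNB())),
    ("knn", make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))),
    ("lr",  make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())),
]

# Second-order classifier: weighted (soft) vote over the members' sense probabilities.
combiner = VotingClassifier(first_order, voting="soft", weights=[2, 1, 1])
combiner.fit(contexts, senses)
print(combiner.predict(["she opened an account at the bank"]))
```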

Simple features for statistical word sense disambiguation

In this paper, we describe our experiments on statistical word sense disambiguation (WSD) using two systems based on different approaches: Naïve Bayes on word tokens and Maximum Entropy on local syntactic and semantic features. In the first approach, we consider a context window and a sub-window within it around the word to disambiguate. Within the outer window, only content words are considered, but within the sub-window, all words are taken into account. Both window sizes are tuned by the system for each word to disambiguate, and accuracies of 75% and 67% were obtained for coarse- and fine-grained evaluations respectively. In the second system, sense resolution is done using an approximate syntactic structure as well as the semantics of neighboring nouns as features to a Maximum Entropy learner. Accuracies of 70% and 63% were obtained for coarse- and fine-grained evaluations.
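A small sketch of the two-window idea described above: every word inside a narrow sub-window is kept, while only content words are kept in the wider outer window. The window sizes, the stop-word list, and the example sentence are illustrative assumptions; the paper tunes the window sizes per target word.

```python
# Sketch: two-window context feature extraction -- all words inside a narrow
# sub-window, content words only in the wider outer window.  Sizes and the
# stop-word list are illustrative assumptions.
STOP_WORDS = {"the", "a", "an", "of", "at", "in", "on", "to", "and", "its", "we"}

def context_features(tokens, position, outer=10, inner=3):
    feats = []
    for i, tok in enumerate(tokens):
        if i == position:
            continue
        dist = abs(i - position)
        if dist <= inner:                                   # sub-window: keep every word
            feats.append(tok.lower())
        elif dist <= outer and tok.lower() not in STOP_WORDS:
            feats.append(tok.lower())                       # outer window: content words only
    return feats

tokens = "we walked along the grassy bank of the winding river".split()
print(context_features(tokens, tokens.index("bank")))
```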

Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014

We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. With additional lexical knowledge from WordNet, performance is further improved to surpass the state-of-the-art results.
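As a toy illustration of the contrast with Lesk, the sketch below compares an exact-overlap count against a smoothed, co-occurrence-based association score between the context and gloss word sets, so related but non-identical words can still contribute. The counts, smoothing, and scoring function here are illustrative assumptions, not the paper's actual Naive Bayes model or its estimates.

```python
# Toy contrast: Lesk counts exact word overlaps between context and gloss,
# while an association-style score sums smoothed per-word co-occurrence
# probabilities.  Counts and estimates below are illustrative only.
import math
from collections import Counter

def lesk_overlap(context, gloss):
    """Classic Lesk: number of word types shared by context and gloss."""
    return len(set(context) & set(gloss))

def association_score(context, gloss, cooc, vocab_size, alpha=1.0):
    """Sum over context words of a smoothed (add-alpha) probability that the
    word co-occurs with some gloss word, from raw co-occurrence counts."""
    total = sum(cooc.values())
    score = 0.0
    for w in context:
        count = sum(cooc[(w, g)] for g in gloss)
        score += math.log((count + alpha) / (total + alpha * vocab_size))
    return score

# Illustrative co-occurrence counts, e.g. harvested from a corpus.
cooc = Counter({("deposit", "money"): 8, ("account", "money"): 12,
                ("account", "institution"): 5, ("river", "water"): 9})
gloss_finance = ["financial", "institution", "money"]
context = ["deposit", "account"]

print(lesk_overlap(context, gloss_finance))                       # 0: no exact overlap
print(association_score(context, gloss_finance, cooc, vocab_size=5000))
```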

Learning Rules for Large-Vocabulary Word Sense Disambiguation: A Comparison of Various Classifiers

Lecture Notes in Computer Science, 2000

In this article we compare the performance of various machine learning algorithms on the task of constructing word-sense disambiguation rules from data. The distinguishing characteristic of our work from most of the related work in the field is that we aim at the disambiguation of all content words in the text, rather than focussing on a small number of words. In an earlier study we have shown that a decision tree induction algorithm performs well on this task. This study compares decision tree induction with other popular learning methods and discusses their advantages and disadvantages. Our results confirm the good performance of decision tree induction, which outperforms the other algorithms due to its ability to order the features used for disambiguation according to their contribution to assigning the correct sense.
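A minimal sketch of the property highlighted above: a decision tree learner induces an ordering over disambiguation features by how much each contributes to choosing the correct sense. It assumes scikit-learn; the bag-of-words features and toy data are illustrative, not the article's large-vocabulary setup.

```python
# Sketch: a decision tree orders disambiguation features by their contribution
# to the sense decision.  Assumes scikit-learn; toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

contexts = [
    "deposit the check at the bank before noon",
    "the bank raised its interest rates again",
    "we walked along the bank of the river",
    "fish swam near the muddy bank at dusk",
]
senses = ["finance", "finance", "river", "river"]

vec = CountVectorizer()
X = vec.fit_transform(contexts)
tree = DecisionTreeClassifier(random_state=0).fit(X, senses)

# Features ranked by their contribution to assigning the correct sense.
ranked = sorted(zip(vec.get_feature_names_out(), tree.feature_importances_),
                key=lambda p: p[1], reverse=True)
print(ranked[:5])
```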

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation

2004

The success of supervised learning approaches to word sense disambiguation is largely dependent on the features used to represent the context in which an ambiguous word occurs. Previous work has reached mixed conclusions; some suggest that combinations of syntactic and lexical features perform most effectively, while others have shown that simple lexical features perform well on their own. This paper evaluates the effect of using different lexical and syntactic features, both individually and in combination. We show that a very simple ensemble that utilizes a single lexical feature and a sequence of part-of-speech features can achieve disambiguation accuracy that is near the state of the art.
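To make that feature set concrete, here is a small sketch that extracts a single lexical feature (the word immediately to the right of the target, an illustrative choice) together with a short sequence of part-of-speech features around it. It assumes NLTK with its default POS tagger model downloaded via nltk.download; the exact lexical feature, POS window, and ensemble used in the paper may differ, and an ensemble would combine classifiers trained on views like these.

```python
# Sketch: one lexical feature plus a sequence of part-of-speech features around
# the target word.  Assumes NLTK and its default POS tagger model; the feature
# choices are illustrative only.
import nltk

def lexical_and_pos_features(tokens, position, pos_window=2):
    tagged = nltk.pos_tag(tokens)
    feats = {"right_word": tokens[position + 1].lower()
                           if position + 1 < len(tokens) else "<end>"}
    for offset in range(-pos_window, pos_window + 1):
        i = position + offset
        if 0 <= i < len(tokens) and offset != 0:
            feats[f"pos_{offset}"] = tagged[i][1]   # POS tag at this offset
    return feats

tokens = "we walked along the bank of the river".split()
print(lexical_and_pos_features(tokens, tokens.index("bank")))
```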