Automatic Word Sense Disambiguation Using Cooccurrence and Hierarchical
We review in detail a polished version of the systems with which we participated in the Senseval-2 competition English tasks (all words and lexical sample). The approach is based on a combination of selectional preference measured over a large corpus and hierarchical information taken from WordNet, as well as some additional heuristics. We use that information to expand the glosses of the senses in WordNet and compare the similarity between the context vectors and the word sense vectors in a way similar to that used by Yarowsky and Schuetze. A supervised extension of the system is also discussed. We provide a new and previously unpublished evaluation over the SemCor collection, which is two orders of magnitude larger than the Senseval-2 collections, as well as a comparison with baselines. Our systems scored first among unsupervised systems in both tasks. We note that the method is very sensitive to the quality of the characterizations of word senses; glosses being much better than training exa...
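The comparison between context vectors and word sense vectors mentioned above can be illustrated with a minimal sketch. This is not the authors' system: it omits the corpus-based cooccurrence expansion, the WordNet hierarchy, and the additional heuristics, and it uses plain bag-of-words counts with cosine similarity as an assumed stand-in for the similarity measure. The sense labels and glosses in `TOY_GLOSSES` are hypothetical and are not taken from WordNet.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count the tokens."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context, sense_glosses):
    """Return the sense whose gloss vector is closest to the context vector."""
    context_vec = bag_of_words(context)
    scores = {
        sense: cosine(context_vec, bag_of_words(gloss))
        for sense, gloss in sense_glosses.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, hand-written glosses for illustration only (not WordNet data).
TOY_GLOSSES = {
    "bank_finance": "financial institution that accepts deposits money loan account",
    "bank_river": "sloping land beside a body of water river shore",
}

best, scores = disambiguate(
    "she opened an account to deposit money at the bank", TOY_GLOSSES
)
print(best, scores)
```

In this toy setting the richer the gloss, the more overlap it has with the context, which is consistent with the abstract's remark that the method is sensitive to the quality of the sense characterizations.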