Part-of-speech tagging with minimal lexicalization

Feature-rich part-of-speech tagging with a cyclic dependency network

Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03, 2003

We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
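The first idea above, conditioning a tag on both its left and right neighbors, can be illustrated with a minimal sketch. This is not the paper's regularized log-linear dependency network; smoothed bigram counts and the toy count tables (`left_counts`, `right_counts`) are stand-in assumptions used only to show how left and right context evidence combine when scoring a candidate tag.

```python
import math

# Hypothetical toy counts of (previous_tag, tag) and (tag, next_tag) pairs.
left_counts = {("DT", "NN"): 8, ("DT", "VB"): 1}
right_counts = {("NN", "VBZ"): 6, ("VB", "VBZ"): 2}

def score(tag, prev_tag, next_tag):
    # Combine left- and right-context evidence in log space;
    # add-one smoothing stands in for the paper's smoothed log-linear model.
    left = left_counts.get((prev_tag, tag), 0) + 1
    right = right_counts.get((tag, next_tag), 0) + 1
    return math.log(left) + math.log(right)

def best_tag(candidates, prev_tag, next_tag):
    # Pick the candidate tag best supported by both neighboring tags.
    return max(candidates, key=lambda t: score(t, prev_tag, next_tag))
```

With these toy counts, `best_tag(["NN", "VB"], "DT", "VBZ")` selects `"NN"`, because both the preceding determiner and the following verb favor a noun reading.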

Improving part-of-speech tagging using lexicalized HMMs

Natural Language Engineering, 2004

We introduce a simple method to build Lexicalized Hidden Markov Models (L-HMMs) for improving the precision of part-of-speech tagging. This technique enriches the contextual language model by taking into account an empirically obtained set of selected words. The evaluation was conducted with different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. This lexicalization obtained about a 6% reduction of the tagging error on unseen test data, without reducing the efficiency of the system. We have also studied how the use of linguistic resources, such as dictionaries and morphological analyzers, improves the tagging performance. Furthermore, we have conducted an exhaustive experimental comparison showing that Lexicalized HMMs yield results that are better than or similar to other state-of-the-art part-of-speech tagging approaches. Finally, we have applied Lexicalized HMMs to the Spanish corpus LexEsp.
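The core of lexicalization in this style of model is splitting the HMM state space: for a selected word, the hidden state becomes a (tag, word) pair rather than the bare tag. A minimal sketch of that state rewriting, with a hypothetical `lexicalize` helper and an illustrative `^` state-naming convention that is not taken from the paper:

```python
def lexicalize(tagged_sentence, selected_words):
    # Split tag states for the empirically selected words: the state
    # becomes "TAG^word" for those words and the plain tag otherwise.
    # Transition/emission counts collected over these enriched states
    # give the lexicalized language model.
    return [
        (word, f"{tag}^{word.lower()}" if word.lower() in selected_words else tag)
        for word, tag in tagged_sentence
    ]

sent = [("that", "IN"), ("deal", "NN")]
lexicalize(sent, {"that"})
# -> [("that", "IN^that"), ("deal", "NN")]
```

Training then proceeds as for a standard HMM, but transitions like `IN^that -> NN` capture word-specific context that a plain `IN -> NN` transition averages away.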

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

We present a Markov part-of-speech tagger for which the P(w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string suffixes of w are cut off at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model.
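The emission replacement described above can be sketched as a weighted sum over suffix representations. This sketch uses fixed suffix lengths (3, 2, 1) in place of the paper's Normalized Backward Successor Variety cut points, and the `suffix_probs` table and `weights` are illustrative assumptions, not the paper's estimates.

```python
def emission(word, tag, suffix_probs, weights):
    # P(w|t) is replaced by a normalized weighted sum of P(suffix_i|t)
    # over a list of suffix representations of w, longest first.
    # Fixed lengths stand in for the NBSV-derived cut points.
    suffixes = [word[-k:] for k in (3, 2, 1) if len(word) >= k]
    probs = [suffix_probs.get((s, tag), 0.0) for s in suffixes]
    ws = weights[: len(probs)]
    z = sum(ws)  # renormalize if the word is too short for all suffixes
    return sum(w * p for w, p in zip(ws, probs)) / z
```

For example, an unseen word ending in "-ing" inherits emission mass from every seen word sharing that suffix, which is what lets the model tag out-of-vocabulary words without a lexicon.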

Bayesian Reinforcement for a Probabilistic Neural Net Part-of-Speech Tagger

2004

The present paper introduces a novel stochastic model for Part-Of-Speech tagging of natural language texts. While previous statistical approaches, such as Hidden Markov Models, are based on theoretical assumptions that are not always met in natural language, we propose a methodology which incorporates fundamental elements of two distinct machine learning disciplines. We make use of Bayesian knowledge representation to provide a robust classifier, namely a Probabilistic Neural Network, with additional context information in order to better infer the correct Part-Of-Speech label. As training material, we use minimal linguistic information, i.e. only a small lexicon which contains the words that belong to non-declinable POS categories and closed-class words. This minimal information is augmented by statistical parameters generated by Bayesian network learning, and the outcome is fed into the Probabilistic Neural Network classifier for the task of Part-Of-Speech tagging. Experimental results show satisfactory performance, with an error rate of 3.5%-4%.

Towards a Bayesian Stochastic Part-Of-Speech and Case Tagger of Natural Language Corpora

This paper introduces and evaluates a Bayesian Network probabilistic model for automatic Part-Of-Speech tagging of Modern Greek natural language texts. The Bayesian model for the task of POS tagging is mathematically formed and is compared to that of Hidden Markov Models, a broadly applied methodology. Our model is trained from annotated corpora, using lexical as well as contextual information. Unlike the majority of existing taggers, it uses minimal linguistic resources, namely a small lexicon which contains the words that belong to non-declinable POS categories and closed-class words. Furthermore, the model is augmented to infer the case of an unseen word as well. Experimental results show accuracy in the range of 91%-96% for POS tagging and 93%-97% for case tagging.

Unsupervised Part-of-Speech Tagging in the Large

Research on Language and Computation, 2009

Syntactic preprocessing is a step that is widely used in NLP applications. Traditionally, rule-based or statistical Part-of-Speech (POS) taggers are employed that either need considerable rule development time or a sufficient amount of manually labeled data. To alleviate this acquisition bottleneck and to enable preprocessing for minority languages and specialized domains, a method is presented that constructs a statistical syntactic tagger model from a large amount of unlabeled text data. The method presented here is called unsupervised POS-tagging, as its application results in corpus annotation comparable to what POS-taggers provide. Nevertheless, it produces slightly different categories from those assumed by a linguistically motivated POS-tagger. These differences hamper evaluation procedures that compare the output of the unsupervised POS-tagger to a tagging with a supervised tagger. To measure the extent to which unsupervised POS-tagging can contribute in application-based settings, the system is evaluated in supervised POS-tagging, word sense disambiguation, named entity recognition and chunking.

Unsupervised POS-tagging has been explored since the beginning of the 1990s. Unlike in previous approaches, the kind and number of different tags are here generated by the method itself. Another difference from other methods is that not all words above a certain frequency rank get assigned a tag: the method may exclude words from the clustering if their distribution does not match other words' closely enough. The lexicon size is considerably larger than in previous approaches, resulting in a lower out-of-vocabulary (OOV) rate and in a more consistent tagging. The system presented here is available for download as open-source software along with tagger models for several languages, so the contributions of this work can be easily incorporated into other applications.
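The clustering step behind such unsupervised taggers typically starts from distributional context vectors: counts of which frequent words appear immediately left and right of each target word. A minimal sketch, assuming a hypothetical `context_vectors` helper and an illustrative sparsity threshold for the "exclude words whose distribution does not match" behavior mentioned above (the actual method's similarity criterion is more involved):

```python
from collections import Counter

def context_vectors(corpus, targets, window_words):
    # Build left/right neighbor-count vectors for each target word over a
    # tokenized corpus. Words whose context counts are too sparse are
    # excluded, mirroring the method's option to leave words unclustered.
    vecs = {}
    for w in targets:
        c = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok != w:
                    continue
                if i > 0 and sent[i - 1] in window_words:
                    c[("L", sent[i - 1])] += 1
                if i + 1 < len(sent) and sent[i + 1] in window_words:
                    c[("R", sent[i + 1])] += 1
        if sum(c.values()) >= 2:  # illustrative sparsity threshold
            vecs[w] = c
    return vecs
```

Words with similar vectors (e.g. "cat" and "dog" both following "the") end up in the same induced cluster, which then serves as a tag; words failing the threshold simply receive no tag.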

Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

2000

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.

An Overview of Data-Driven Part-of-Speech Tagging

2016

Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the threat of data sparseness in standard statistical tagging, yet ensure that the full lexicon information is available for each wordform in the output. The paper overviews some of the major approaches to part-of-speech tagging and touches upon tagset design, which is of crucial importance for the accuracy of the process. Keywords: ambiguity class, data sparseness, lexical ambiguity, machine learning, multilinguality, part-of-speech tagging, tagset design.

Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

2012

Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
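The emission initialization described above, capturing a tag's "openness" and its estimated prevalence, can be sketched as follows. This is a simplified reading, not the paper's exact estimator: the hypothetical `init_emissions` helper spreads each tag's probability mass over the word types the (incomplete) dictionary allows for it, weighted by raw-corpus frequency with add-one smoothing.

```python
from collections import Counter

def init_emissions(tag_dict, raw_tokens):
    # tag_dict: word -> set of allowed tags (possibly incomplete);
    # raw_tokens: unlabeled corpus tokens. An "open" tag licensing many
    # word types spreads its mass thinly; raw frequency sets each
    # word's share of that mass.
    freq = Counter(raw_tokens)
    tags = {t for ts in tag_dict.values() for t in ts}
    emis = {}
    for t in tags:
        words = [w for w, ts in tag_dict.items() if t in ts]
        total = sum(freq[w] + 1 for w in words)  # add-one smoothing
        for w in words:
            emis[(w, t)] = (freq[w] + 1) / total
    return emis
```

These initial P(w|t) values then seed EM training of the HMM, biasing it toward dictionary-consistent analyses before any iterations run.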

UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)

2020

Part-of-speech tagging is becoming increasingly important as it is the starting point for higher-level tasks such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although state-of-the-art POS-taggers reach high accuracy (around 96-97%), the problem cannot yet be considered solved, because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The solution proposed in this work is a hybrid tagger, which does not use any prior lexical knowledge, consisting of two different POS-taggers used sequentially: an HMM tagger and RDRPOSTagger (Nguyen et al., 2014; Nguyen et al., 2016). We trained the hybrid model on the Development set and on the combination of the Development and Silver sets. The results show an accuracy of 0.8114 and 0.8100, respectively, for the main task.