Joint Ensemble Model for POS Tagging and Dependency Parsing (original) (raw)
Related papers
Combining POS Tagging, Dependency Parsing and Coreferential Resolution for Bulgarian
2013
This paper proposes a combined model for POS tagging, dependency parsing and co-reference resolution for Bulgarian — a pro-drop Slavic language with rich morphosyntax. We formulate an extension of the MSTParser algorithm that allows the simultaneous handling of the three tasks in a way that makes it possible for each task to benefit from the information available to the others, and conduct a set of experiments against a treebank of the Bulgarian language. The results indicate that the proposed joint model achieves state-of-theart performance for POS tagging task, and outperforms the current pipeline solution.
A data-driven dependency parser for Bulgarian
Proc. of the 4th Workshop on Treebanks and Linguistic Theories (TLT), 2005
One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representations and employs a deterministic algorithm to construct dependency structures in a single pass over the input string, guided by a memory-based ...
Feature-rich part-of-speech tagging for morphologically complex languages: Application to bulgarian
2012
We present experiments with part-ofspeech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.
2006
Typological diversity among the natural languages of the world poses interesting challenges for the models and algorithms used in syntactic parsing. In this paper, we apply a data-driven dependency parser to Turkish, a language characterized by rich morphology and flexible constituent order, and study the effect of employing varying amounts of morpholexical information on parsing performance. The investigations show that accuracy can be improved by using representations based on inflectional groups rather than word forms, confirming earlier studies. In addition, lexicalization and the use of rich morphological features are found to have a positive effect. By combining all these techniques, we obtain the highest reported accuracy for parsing the Turkish Treebank.
An improved joint model: POS tagging and dependency parsing
Journal of Artificial Intelligence and Data Mining, 2016
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipeline models, a tagging error propagates, but the model is not able to apply useful syntactic information. The goal of joint models simultaneously reduce errors of POS tagging and dependency parsing tasks. In this research, we attempted to utilize the joint model on the Persian and English language using Corbit software. We optimized the model's features and improved its accuracy concurrently. Corbit software is an implementation of a transition-based approach for word segmentation, POS tagging and dependency parsing. In this research, the joint accuracy of POS tagging and dependency parsing over the test data on Persian, reached 85.59% for coarse-grained and 84.24% for fine-grained POS. Also, we attained 76.01% for coarse-grained and 74.34% for fine-grained POS on English.
Constituency Parsing of Bulgarian: Word- vs Class-based Parsing
2014
In this paper, we report the obtained results of two constituency parsers trained with BulTreeBank, an HPSG-based treebank for Bulgarian. To reduce the data sparsity problem, we propose using the Brown word clustering to do an off-line clustering and map the words in the treebank to create a class-based treebank. The observations show that when the classes outnumber the POS tags, the results are better. Since this approach adds on another dimension of abstraction (in comparison to the lemma), its coarse-grained representation can be used further for training statistical parsers.
Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all ...
Croatian Dependency Treebank: Recent Development and Initial Experiments
We present the current state of development of the Croatian Dependency Treebank -with special empahsis on adapting the Prague Dependency Treebank formalism to Croatian language specifics -and illustrate its possible applications in an experiment with dependency parsing using MaltParser. The treebank currently contains approximately 2870 sentences, out of which the 2699 sentences and 66930 tokens were used in this experiment. Three linear-time projective algorithms implemented by the MaltParser system -Nivre eager, Nivre standard and stack projective -running on default settings were used in the experiment. The highest performing system, implementing the Nivre eager algorithm, scored (LAS 71.31 UAS 80.93 LA 83.87) within our experiment setup. The results obtained serve as an illustration of treebank's usefulness in natural language processing research and as a baseline for further research in dependency parsing of Croatian.
The paper reports on the recent forum RU-EVALa new initiative for evaluation of Russian NLP resources, methods and toolkits. It started in 2010 with evaluation of morphological parsers, and the second event RU-EVAL 2012 (2011-2012) focused on syntactic parsing. Eight participating IT companies and academic institutions submitted their results for corpus parsing. We discuss the results of this evaluation and describe the so-called "soПt" evaluation principles that allowed us to compare output dependency trees, which varied greatly depending on theoretical approaches, parsing methods, tag sets, and dependency orientations principles, adopted by the participants.
Parsing the SynTagRus treebank of Russian
Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08, 2008
We present the first results on parsing the SYNTAGRUS treebank of Russian with a data-driven dependency parser, achieving a labeled attachment score of over 82% and an unlabeled attachment score of 89%.