On Different Approaches to Syntactic Analysis into Bi-Lexical Dependencies: An Empirical Comparison of Direct, PCFG-Based, and HPSG-Based Parsers

Survey on parsing three dependency representations for English

In this paper we focus on practical issues of data representation for dependency parsing. We carry out an experimental comparison of (a) three syntactic dependency schemes; (b) three data-driven dependency parsers; and (c) the influence of two different approaches to lexical category disambiguation (aka tagging) prior to parsing. Comparing parsing accuracies in various setups, we study the interactions of these three aspects and analyze which configurations are easier to learn for a dependency parser.

Efficient Parsing of Syntactic and Semantic Dependency Structures

2009

In this paper, we describe our system for the CoNLL 2009 shared task on joint parsing of syntactic and semantic dependency structures in multiple languages. Our system combines efficient parsing techniques to achieve high accuracy together with very good parsing and training times. Parsing time and memory footprint are very important for applications of syntactic and semantic parsing; system development also profits, since more experiments can be run in the same amount of time. In the subtask of syntactic dependency parsing, we reached second place with an average accuracy of 85.68, only 0.09 points behind the first-ranked system. In this task, our system has the highest accuracy for English (89.88), for German (87.48), and on the out-of-domain data on average (78.79). The semantic role labeler does not perform as well as our parser, and we therefore reached fourth place (ranked by the macro F1 score) in the joint task of syntactic and semantic dependency parsing.

Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation

Computational Linguistics, 2008

A number of researchers have recently conducted experiments comparing "deep" hand-crafted wide-coverage parsers with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit such experiments, this time using sophisticated automatic LFG f-structure annotation methodologies, with surprising results. We compare various PCFG and history-based parsers to find the baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers, RASP and XLE. We evaluate using dependency-based gold standards and use the Approximate Randomization Test to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank, a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system, and an f-score of 80.23% against the CBS 500 Dependency Bank, a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system.

Improving data-driven dependency parsing using large-scale LFG grammars

Meeting of the Association for Computational Linguistics, 2009

This paper presents experiments which combine a grammar-driven and a data-driven parser. We show how the conversion of LFG output to a dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements stemming from the proposed method.
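The stacking technique described above can be sketched in a few lines: predictions of the grammar-driven parser are turned into per-token features that a data-driven parser can consume. This is a minimal illustration, not the paper's implementation; the feature names and the dictionary representation are illustrative assumptions.

```python
# Minimal sketch of parser stacking: a guide parse (1-based head ids,
# 0 = root, one dependency label per token) becomes per-token features.
# Feature names here are illustrative, not taken from the paper.

def stacking_features(tokens, guide_heads, guide_labels):
    """For each token, emit features derived from a guide parse."""
    feats = []
    for i, tok in enumerate(tokens):
        head = guide_heads[i]  # 1-based head id, 0 = artificial root
        feats.append({
            "form": tok,
            "guide_label": guide_labels[i],  # dependency label from the guide
            "guide_head_form": tokens[head - 1] if head else "ROOT",
            "guide_head_dir": ("L" if head - 1 < i else "R") if head else "NONE",
        })
    return feats

# Example: "She sleeps", with the guide attaching "She" to "sleeps" (token 2).
print(stacking_features(["She", "sleeps"], [2, 0], ["SBJ", "ROOT"]))
```

A data-driven parser would then treat `guide_label` and the other guide features like any other input feature, letting it learn when to trust the grammar-driven analysis.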

Inspecting the structural biases of dependency parsing algorithms

2010

This paper presents an algorithm for unsupervised co-occurrence-based parsing that improves and extends existing approaches. The proposed algorithm induces a context-free grammar of the language in question in an iterative manner. The resulting structure of a sentence is given as a hierarchical arrangement of constituents. Although this algorithm does not use any a priori knowledge about the language, it is able to detect heads, modifiers, and a phrase type's different compound composition possibilities. For evaluation purposes, the algorithm is applied to manually annotated part-of-speech (POS) tags as well as to word classes induced by an unsupervised part-of-speech tagger.

Does Universal Dependencies need a parsing representation? An investigation of English

2015

This paper investigates the potential of defining a parsing representation for English data in Universal Dependencies, a cross-lingual dependency scheme. We investigate structural transformations that change the choices of headedness in the dependency tree. The transformations make auxiliaries, copulas, subordinating conjunctions, and prepositions heads, whereas in UD they are dependents of a lexical head. We show experimental results for the performance of MaltParser, a data-driven transition-based parser, on the product of each transformation. While some transformed representations improve parsing performance, inverting the transformations to obtain UD for the final product propagates errors, in part due to the nature of lexical-head representations. This prevents the transformations from being profitably used to improve parser performance in that representation.
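One of the transformations described above, promoting a preposition to head of its nominal, can be sketched as a simple tree rewrite. This is a hedged illustration under simplifying assumptions (a single `case` dependent per noun, no re-attachment of the noun's other dependents); the label `pobj` for the inverted relation is invented for the example, not part of UD.

```python
# Sketch of a headedness transformation: promote a preposition (UD relation
# "case") to head of its nominal. Trees are lists of (id, form, head, deprel)
# tuples; ids are 1-based and head 0 is the artificial root.

def promote_prepositions(tree):
    out = [list(t) for t in tree]
    for tok_id, form, head, rel in tree:
        if rel == "case":
            noun = out[head - 1]
            # The preposition takes over the noun's attachment point and relation.
            out[tok_id - 1][2] = noun[2]
            out[tok_id - 1][3] = noun[3]
            # The noun now attaches under the preposition.
            noun[2] = tok_id
            noun[3] = "pobj"  # illustrative label for the inverted relation
    return [tuple(t) for t in out]

# "sat on mat": UD attaches "mat" to "sat" (obl) and "on" to "mat" (case);
# the transformation makes "on" head of "mat" and attaches "on" to "sat".
ud = [(1, "sat", 0, "root"), (2, "on", 3, "case"), (3, "mat", 1, "obl")]
print(promote_prepositions(ud))
```

Inverting such a transformation after parsing is what the paper shows to be error-prone: a wrong attachment of the promoted preposition propagates to the whole nominal when converting back to UD.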

The exploration of deterministic and efficient dependency parsing

Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006

In this paper, we propose a three-stage multilingual dependency parser: an efficient parsing algorithm in the first stage, followed by a root parser and a post-processor in the second and third stages. The main focus of our work is to provide an efficient parser that is practical to use, combining only lexical and part-of-speech features toward language-independent parsing. The experimental results show that our method outperforms MaltParser in 13 languages. We expect that such an efficient model is applicable to most languages.

A high-throughput dependency parser

2017

Dependency parsing is an important task in NLP, and it is used in many downstream tasks for analyzing the semantic structure of sentences. Analyzing very large corpora in a reasonable amount of time, however, requires a fast parser. In this thesis we develop a transition-based dependency parser with a neural-network decision function which outperforms spaCy, Stanford CoreNLP, and MaltParser in terms of speed while having comparable, and in some cases better, accuracy. We also develop several variations of our model to investigate the trade-off between accuracy and speed. This leads to a model with a greatly reduced feature set which is much faster but less accurate, as well as a more complex model involving a BiLSTM simultaneously trained to produce POS tags, which is more accurate but much slower. We compare the accuracy and speed of our different parser models against the three mentioned parsers on the Penn Treebank, Universal Dependencies English, and OntoNotes datasets using tw...
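The core of a transition-based parser like the one described above is a small state machine over a stack and a buffer. The sketch below shows the standard arc-standard transition system; the thesis' actual parser chooses actions with a neural network, whereas here a fixed action sequence is applied purely for illustration.

```python
# Minimal arc-standard transition system for dependency parsing.
# A real parser predicts the actions; here they are supplied directly.

def parse(words, actions):
    """Apply SHIFT / LEFT / RIGHT actions; return one head id per token (0 = root)."""
    buffer = list(range(1, len(words) + 1))  # token ids, 1-based
    stack = [0]                              # 0 is the artificial root
    heads = {}
    for act in actions:
        if act == "SHIFT":                   # move next buffer token onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT":                  # stack top becomes head of the item below
            s0 = stack.pop()
            s1 = stack.pop()
            heads[s1] = s0
            stack.append(s0)
        elif act == "RIGHT":                 # item below becomes head of the stack top
            s0 = stack.pop()
            heads[s0] = stack[-1]
    return [heads[i] for i in range(1, len(words) + 1)]

# "She sleeps": attach "She" under "sleeps", then "sleeps" under the root.
print(parse(["She", "sleeps"], ["SHIFT", "SHIFT", "LEFT", "RIGHT"]))  # [2, 0]
```

Because each of the n tokens is shifted once and popped at most once, parsing runs in linear time, which is why transition-based systems are the natural choice when throughput is the priority.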

Improving parsing accuracy by combining diverse dependency parsers

Proceedings of the Ninth International Workshop on Parsing Technology - Parsing '05, 2005

This paper explores the possibilities of improving parsing results by combining the outputs of several parsers. To some extent, we are porting earlier ideas on parser combination to the world of dependency structures, and we differ from prior work in exploring context features more deeply. All our experiments were conducted on Czech, but the method is language-independent. We were able to significantly improve over the best previously reported parsing result for the given setting. Moreover, our experiments show that even parsers far below the state of the art can contribute to the overall improvement.
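The simplest instance of the combination idea above is per-token voting over the head predictions of several parsers. The sketch below shows only this plain (optionally weighted) voting baseline; the paper's context-feature-based combination is richer, and the function name and interface here are illustrative assumptions.

```python
# Per-token (weighted) majority voting over several parsers' head predictions.
from collections import Counter

def vote_heads(parses, weights=None):
    """parses: list of head sequences, one per parser, all the same length."""
    weights = weights or [1] * len(parses)
    combined = []
    for i in range(len(parses[0])):
        ballot = Counter()
        for parse, w in zip(parses, weights):
            ballot[parse[i]] += w        # each parser votes for its head choice
        combined.append(ballot.most_common(1)[0][0])
    return combined

# Three parsers disagree on token 3; the majority head (1) wins.
print(vote_heads([[2, 0, 1], [2, 0, 2], [2, 0, 1]]))  # [2, 0, 1]
```

Note that independent per-token voting can produce an ill-formed structure (cycles, multiple roots); combination systems therefore typically add a tree-constrained search on top of the vote tallies.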

A Gold Standard Dependency Corpus for English

Language Resources and Evaluation, 2014

We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating dependency parsers like the one included as part of the Stanford Parser. We show that training a dependency parser on a mix of newswire and web data improves performance on that type of data without greatly hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be valuable for parsing in general. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser's dependency converter. In response to the challenges encountered by annotators in the EWT corpus, we revised and extended the Stanford Dependencies standard, and improved the Stanford Parser's dependency converter.