Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser

An Ensemble Method for Selection of High Quality Parses

2007

While the average performance of statistical parsers gradually improves, they still produce annotations of rather low quality for many sentences. The number of such sentences grows when the training and test data are taken from different domains, which is the case for major web applications such as information retrieval and question answering.

Precision-biased Parsing and High-Quality Parse Selection

2012

We introduce precision-biased parsing: a parsing task which favors precision over recall by allowing the parser to abstain from decisions deemed uncertain. We focus on dependency parsing and present an ensemble method which is capable of assigning parents to 84% of the text tokens while being over 96% accurate on these tokens. We use the precision-biased parsing task to solve the related high-quality parse-selection task: finding a subset of high-quality (accurate) trees in a large collection of parsed text. We present a method for choosing over a third of the input trees while keeping unlabeled dependency parsing accuracy of 97% on these trees. We also present a method which is not based on an ensemble but rather on directly predicting the risk associated with individual parser decisions. In addition to its efficiency, this method demonstrates that a parsing system can provide reasonable estimates of confidence in its predictions without relying on ensembles or aggregate corpus counts.
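The ensemble variant of precision-biased parsing can be sketched as a simple agreement vote: a token keeps a predicted head only when a sufficient fraction of the parsers in the ensemble agree on it, and the parser abstains otherwise. The function name and threshold below are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def precision_biased_heads(ensemble_heads, min_agreement=0.8):
    """For each token, keep a head only when enough parsers agree.

    ensemble_heads: one head sequence per parser; each sequence maps
    token index -> predicted head index.
    Returns a list where uncertain tokens are None (abstention).
    """
    n_parsers = len(ensemble_heads)
    n_tokens = len(ensemble_heads[0])
    selected = []
    for t in range(n_tokens):
        votes = Counter(heads[t] for heads in ensemble_heads)
        head, count = votes.most_common(1)[0]
        selected.append(head if count / n_parsers >= min_agreement else None)
    return selected
```

Raising `min_agreement` trades coverage for precision, which is exactly the precision/recall trade-off the task formalizes.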

Parser evaluation: a survey and a new proposal

1998

We present a critical overview of the state-of-the-art in parser evaluation methodologies and metrics. A discussion of their relative strengths and weaknesses motivates a new technique of measuring parser accuracy, based on the use of grammatical relations, which we claim is more informative and more generally applicable. We conclude with some preliminary results of experiments in which we use this new scheme to evaluate a robust parser of English.
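A relation-based metric of this kind reduces to set comparison over grammatical-relation triples. The sketch below shows the generic precision/recall/F1 computation over (relation, head, dependent) triples; the triple format is an assumption for illustration, not the paper's exact annotation scheme.

```python
def gr_score(gold, predicted):
    """Score a parse as precision/recall/F1 over sets of
    (relation, head, dependent) triples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                     # triples both agree on
    p = tp / len(predicted) if predicted else 0.0  # precision
    r = tp / len(gold) if gold else 0.0            # recall
    f = 2 * p * r / (p + r) if p + r else 0.0      # harmonic mean
    return p, r, f
```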

Modifying existing annotated corpora for general comparative evaluation of parsing

1998

We argue that the current dominant paradigm in parser evaluation work, which combines use of the Penn Treebank reference corpus and of the Parseval scoring metrics, is not well-suited to the task of general comparative evaluation of diverse parsing systems. Instead, we propose an alternative approach which has two key components. Firstly, we propose parsed corpora for testing that are much flatter than those currently used, whose "gold standard" parses encode only those grammatical constituents upon which there is broad agreement across a range of grammatical theories. Secondly, we propose modified evaluation metrics that require parser outputs to be 'faithful to', rather than mimic, the broadly agreed structure encoded in the flatter gold standard analyses. This paper addresses a crucial issue for the approach, namely, the creation of the evaluation resources that the approach requires, i.e. annotated corpora recording the flatter parse analyses. We argue that, due to the nature of the resources required, they can be derived in a comparatively inexpensive fashion from existing parse-annotated resources, where available.

Unsupervised Dependency Parsing: Let's Use Supervised Parsers

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015

We present a self-training approach to unsupervised dependency parsing that reuses existing supervised and unsupervised parsing algorithms. Our approach, called 'iterated reranking' (IR), starts with dependency trees generated by an unsupervised parser, and iteratively improves these trees using the richer probability models used in supervised parsing that are in turn trained on these trees. Our system achieves accuracy 1.8% higher than the state-of-the-art parser of Spitkovsky et al. (2013) on the WSJ corpus.
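The IR loop is simple to state in code: an unsupervised parser seeds the trees, and a supervised parser is repeatedly retrained on its own output. In the sketch below, `unsup_parse` and `train_supervised` are caller-supplied stand-ins for the actual parsers, which the paper leaves interchangeable.

```python
def iterated_reranking(sentences, unsup_parse, train_supervised, n_iters=3):
    """Iterated-reranking sketch.

    unsup_parse(sentence) -> tree: seeds the initial trees.
    train_supervised(sentences, trees) -> parse function: trains the
    richer supervised model on the current trees.
    """
    trees = [unsup_parse(s) for s in sentences]       # unsupervised seed
    for _ in range(n_iters):
        model = train_supervised(sentences, trees)    # retrain on own output
        trees = [model(s) for s in sentences]         # reparse the corpus
    return trees
```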

Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features

2008

We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes.

Automatic Prediction of Parser Accuracy

2008

Statistical parsers have become increasingly accurate, to the point where they are useful in many natural language applications. However, estimating parsing accuracy on a wide variety of domains and genres is still a challenge in the absence of gold-standard parse trees.

Improve Parsing Performance by Self-Learning

There are many methods to improve the performance of statistical parsers. Among them, resolving structural ambiguities is a major task. In our approach, the parser produces a set of n-best trees based on a feature-extended PCFG grammar and then selects the best tree structure based on association strengths of dependency word-pairs. However, there is no sufficiently large Treebank producing reliable statistical distributions of all word-pairs. This paper aims to provide a self-learning method to resolve this problem. The word association strengths were automatically extracted and learned by parsing a giga-word corpus. Although the automatically learned word associations were not perfect, the built structure evaluation model improved the bracketed f-score from 83.09% to 86.59%. We believe that the above iterative learning processes can improve parsing performance automatically by learning word-dependence knowledge continuously from the web.
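The selection step described above amounts to reranking the n-best list by the summed association strength of each tree's dependency word-pairs. The sketch below assumes association scores (e.g. learned from a large auto-parsed corpus) are given as a dictionary; the scoring function is an illustrative simplification of the paper's structure evaluation model.

```python
def rerank_nbest(nbest, assoc):
    """Pick the tree whose dependency word-pairs score highest.

    nbest: list of candidate trees, each a list of
    (head_word, dependent_word) pairs.
    assoc: dict mapping a word pair to its association strength;
    unseen pairs contribute 0.
    """
    def score(tree):
        return sum(assoc.get(pair, 0.0) for pair in tree)
    return max(nbest, key=score)
```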

Improved Fully Unsupervised Parsing with Zoomed Learning

2010

We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs (T_i, S_i) of T and S such that when the unsupervised parser is trained on a training subset T_i, its results on its paired test subset S_i are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm's effect on the leading algorithm for the task of fully unsupervised parsing in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.
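The core comparison in zoomed learning can be sketched as follows: for each candidate pair (T_i, S_i), train on the subset and on the full set, and keep subset training only where it wins on the paired test subset. The pairing procedure and the `train_and_score` callable are stand-ins here, not the paper's actual subset-identification method.

```python
def zoomed_learning(pairs, full_T, train_and_score):
    """For each (T_i, S_i) pair, decide between subset and full training.

    pairs: list of (training_subset, test_subset) pairs.
    full_T: the entire training set T.
    train_and_score(train_set, test_set) -> score of a parser trained
    on train_set and evaluated on test_set (caller-supplied stand-in).
    Returns, per pair, which regime won and its score.
    """
    results = {}
    for i, (T_i, S_i) in enumerate(pairs):
        zoomed = train_and_score(T_i, S_i)   # train on the subset only
        full = train_and_score(full_T, S_i)  # train on all of T
        results[i] = ("subset", zoomed) if zoomed > full else ("full", full)
    return results
```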

ULISSE: an unsupervised algorithm for detecting reliable dependency parses

2011

In this paper we present ULISSE, an unsupervised linguistically-driven algorithm to select reliable parses from the output of a dependency parser. Different experiments were devised to show that the algorithm is robust enough to deal with the output of different parsers and with different languages, as well as to be used across different domains. In all cases, ULISSE appears to outperform the baseline algorithms.