Probabilistic parsing and psychological plausibility
Related papers
A Probabilistic Earley Parser as a Psycholinguistic Model
In human sentence processing, cognitive load can be defined many ways. This report considers a definition of cognitive load in terms of the total probability of structural options that have been disconfirmed at some point in a sentence: the surprisal of word w_i given its prefix w_0..i−1 on a phrase-structural language model. These loads can be efficiently calculated using a probabilistic Earley parser which is interpreted as generating predictions about reading time on a word-by-word basis. Under grammatical assumptions supported by corpus-frequency data, the operation of Stolcke's probabilistic Earley parser correctly predicts processing phenomena associated with garden-path structural ambiguity and with the subject/object relative asymmetry.
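Concretely, the surprisal at word w_i is the log-ratio of successive prefix probabilities. A minimal sketch of that computation, with a hypothetical prefix-probability table standing in for the values a probabilistic Earley parser would supply:

```python
import math

# Hypothetical prefix probabilities P(w_0 .. w_i); in the paper these
# come from Stolcke's probabilistic Earley parser, not from a table.
PREFIX_PROB = {
    ("the",): 0.20,
    ("the", "horse"): 0.04,
    ("the", "horse", "raced"): 0.008,
    ("the", "horse", "raced", "past"): 0.004,
}

def surprisal(prefix_prob, words):
    """Surprisal of each w_i: log2 P(w_0..w_{i-1}) - log2 P(w_0..w_i)."""
    out, prev = [], 1.0  # the empty prefix has probability 1
    for i in range(len(words)):
        cur = prefix_prob[tuple(words[: i + 1])]
        out.append(math.log2(prev) - math.log2(cur))
        prev = cur
    return out

print(surprisal(PREFIX_PROB, ["the", "horse", "raced", "past"]))
```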
Wide-coverage probabilistic sentence processing
2000
This paper describes a fully implemented, broad-coverage model of human syntactic processing. The model uses probabilistic parsing techniques, which combine phrase structure, lexical category, and limited subcategory probabilities with an incremental, left-to-right "pruning" mechanism based on cascaded Markov models. The parameters of the system are established through a uniform training algorithm, which determines maximum-likelihood estimates from a parsed corpus. The probabilistic parsing mechanism enables the system to achieve good accuracy on typical, "garden-variety" language (i.e., when tested on corpora). Furthermore, the incremental probabilistic ranking of the preferred analyses during parsing also naturally explains observed human behavior for a range of garden-path structures. We do not make strong psychological claims about the specific probabilistic mechanism discussed here, which is limited by a number of practical considerations. Rather, we argue that incremental probabilistic parsing models are, in general, extremely well suited to explaining this dual nature of human linguistic performance: generally good, but occasionally pathological.
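The paper's pruning mechanism is built on cascaded Markov models; the generic idea, keeping only analyses whose probability falls within a fixed beam of the best one at each word, can be sketched as follows (numbers and readings hypothetical):

```python
import math

def prune(analyses, beam=math.log(100.0)):
    """Keep analyses whose log-probability lies within `beam` of the
    best one; discarded analyses cannot be recovered later, which is
    what produces garden-path behavior."""
    best = max(logp for logp, _ in analyses)
    return [(logp, r) for logp, r in analyses if best - logp <= beam]

# Hypothetical partial analyses after reading "the horse raced":
analyses = [
    (math.log(0.008),   "main-verb reading"),
    (math.log(0.00005), "reduced-relative reading"),
]
print(prune(analyses))  # the reduced-relative reading is pruned away
```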
Probabilistic parsing strategies
Journal of the ACM (JACM), 2006
We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counterparts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing.
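In rough symbols (notation mine, not the paper's), preservation of the probability distribution amounts to the following, which the paper shows holds exactly when the strategy has the correct-prefix property and strong predictiveness:

```latex
% A strategy maps a CFG G to a push-down device A; preservation means
% every probability assignment p_G to G's rules is matched by some
% assignment p_A to A's transitions, giving the same string distribution:
\forall p_G \; \exists p_A \; \forall w \in \Sigma^{*} :\quad
  p_A(w) = p_G(w)
```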
New Developments in Formal Languages and …, 2008
A paper in a previous volume [1] explained parsing, which is the process of determining the parses of an input string according to a formal grammar. Also discussed was tabular parsing, which solves the task of parsing in polynomial time by a form of dynamic programming. In passing, we also mentioned that parsing of input strings can be easily generalised to parsing of finite automata.
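As a reminder of what tabular parsing looks like, here is a minimal CYK-style recognizer; the dynamic-programming chart is the "table", and the toy CNF grammar is hypothetical:

```python
def cyk(words, lexical, binary):
    """chart[i][j] = set of nonterminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for width in range(2, n + 1):            # fill the table bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # split point
                for lhs, (b, c) in binary:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return chart[0][n]

LEXICAL = {"a": {"A"}, "b": {"B"}}                 # hypothetical grammar
BINARY = [("S", ("A", "B")), ("S", ("S", "B"))]    # S -> A B | S B
print("S" in cyk(list("abb"), LEXICAL, BINARY))    # True
```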
The Current Situation in Stochastic Parsing
The earliest corpus-based approaches to stochastic parsing (e.g., Sampson et al. (1989), Fujisaki et al. (1989), Sharman et al. (1990), Black (1992)) used a variety of data resources and evaluation techniques.
A memory-based model of syntactic analysis: data-oriented parsing
Journal of Experimental & Theoretical Artificial Intelligence, 1999
This paper presents a memory-based model of human syntactic processing: Data-Oriented Parsing. After a brief introduction (section 1), it argues that any account of disambiguation and many other performance phenomena inevitably has an important memory-based component (section 2). It discusses the limitations of probabilistically enhanced competence-grammars, and argues for a more principled memory-based approach (section 3). In sections 4 and 5, one particular memory-based model is described in some detail: a simple instantiation of the "Data-Oriented Parsing" approach ("DOP1"). Section 6 reports on experimentally established properties of this model, and section 7 compares it with other memory-based techniques. Section 8 concludes and points to future work.
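For reference, DOP1's probability model has the standard form below: a fragment is drawn in proportion to its corpus count among fragments sharing its root label, and a parse tree sums over all derivations that yield it:

```latex
% DOP1 in outline:
P(t) = \frac{\mathrm{count}(t)}
            {\sum_{t' :\, \mathrm{root}(t') = \mathrm{root}(t)} \mathrm{count}(t')},
\qquad
P(T) = \sum_{d \Rightarrow T} \, \prod_{t \in d} P(t)
```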
1999
This thesis investigates two complementary approaches to coping with ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be stored compactly in one shared data structure. It then proposes approaches to scoring the different interpretations using stochastic models. This raises the problem of estimating the probabilities of rare events that did not occur in the training data, for which novel methods are proposed.
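The thesis proposes its own estimators for such unseen events; purely as a baseline illustration of the problem, here is the simplest standard remedy, add-k (Lidstone) smoothing:

```python
def lidstone(count, total, vocab_size, k=0.5):
    """Add-k (Lidstone) smoothing: reserve some probability mass for
    events never seen in training. A baseline illustration only; the
    thesis proposes its own, more sophisticated estimators."""
    return (count + k) / (total + k * vocab_size)

# An event observed zero times still gets nonzero probability:
print(lidstone(0, total=1000, vocab_size=5000))  # ~0.000143
print(lidstone(3, total=1000, vocab_size=5000))  # 0.001
```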
An alternative method of training probabilistic LR parsers
2004
We discuss existing approaches to training LR parsers, which have been used for statistical resolution of structural ambiguity. These approaches are non-optimal, in the sense that certain probability distributions cannot be obtained. In particular, some probability distributions expressible in terms of a context-free grammar cannot be expressed in terms of the LR parser constructed from that grammar, under the restrictions of the existing approaches to training LR parsers.
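The restriction at issue can be stated roughly as follows (notation mine): action probabilities are normalized per LR state, and it is this per-state normalization that leaves some context-free distributions unreachable:

```latex
% Existing approaches normalize action probabilities per LR state q
% and score a parse t by the product over its action sequence a_1..a_n:
\sum_{a \in \mathrm{act}(q)} p(a \mid q) = 1,
\qquad
p_{\mathrm{LR}}(t) = \prod_{j=1}^{n} p(a_j \mid q_j)
% The negative result: some PCFG distributions p_G admit no such
% p_LR with p_LR(t) = p_G(t) for every parse t.
```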
A computational model of prediction in human parsing: Unifying locality and surprisal effects
2009
There is strong evidence that human sentence processing is incremental, i.e., that structures are built word by word. Recent experiments show that the processor also predicts upcoming linguistic material on the basis of previous input. We present a computational model of human parsing that is based on a variant of tree-adjoining grammar and includes an explicit mechanism for generating and verifying predictions, while respecting incrementality and connectedness. An algorithm for deriving a lexicon from a treebank, a fully implemented parser, and a probability model for this formalism are also presented. We devise a linking function that explains processing difficulty as a combination of prefix probability (surprisal) and verification cost. The resulting model captures locality effects such as the subject/object relative clause asymmetry, as well as surprisal effects such as prediction in either ... or constructions.
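The general shape of such a linking function is sketched below (VC is a placeholder name for the verification-cost term; the paper defines its exact weighting):

```latex
% Difficulty at word w_i as surprisal plus a verification term:
D(w_i) \;=\;
  \underbrace{-\log P(w_i \mid w_0 \ldots w_{i-1})}_{\text{surprisal}}
  \;+\; \mathrm{VC}(w_i)
```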