A strategy for the syntactic parsing of corpora: from Constraint Grammar output to unification-based processing

Syntactic parsing: a survey

Computers and the Humanities, 1989

The application of syntactic parsing to computer-assisted language instruction suggests approaches and presents problems not usually associated with non-educational parsing. The article identifies these issues and presents a general overview and assessment of grammar formalisms and parsing strategies in relation to language instruction. The discussion includes error analysis, morphology, bottom-up and top-down parsing, backtracking and deterministic parsing, wait-and-see parsing, context-free and augmented phrase structure grammar, augmented transition networks, logic grammars, and categorial grammars. Language teaching applications discussed include writing aids, reading aids, and conversational programs. The authors have worked together on "Spion," an adventure game using syntactic and semantic parsing for German language instruction, and "Syncheck," a syntactic parser-based writing aid for intermediate and advanced college German students.

Three studies of grammar-based surface-syntactic parsing of unrestricted English text. A summary and orientation

1994

The dissertation addresses the design of parsing grammars for automatic surface-syntactic analysis of unconstrained English text. It consists of a summary and three articles. "Morphological disambiguation" documents a grammar for morphological (or part-of-speech) disambiguation of English, done within the Constraint Grammar framework proposed by Fred Karlsson. The disambiguator seeks to discard those of the alternative morphological analyses proposed by the lexical analyser that are contextually illegitimate. The 1,100 constraints express some 23 general, essentially syntactic statements as restrictions on the linear order of morphological tags. The error rate of the morphological disambiguator is about ten times smaller than that of another state-of-the-art probabilistic disambiguator, given that both are allowed to leave some of the hardest ambiguities unresolved. This accuracy suggests the viability of the grammar-based approach to natural language parsing, thus also contributing to the more general debate concerning the viability of probabilistic vs. linguistic techniques. "Experiments with heuristics" addresses the question of how to resolve those ambiguities that survive the morphological disambiguator. Two approaches are presented and empirically evaluated: (i) heuristic disambiguation constraints and (ii) techniques for learning from the fully disambiguated part of the corpus and then applying this information to resolving remaining ambiguities.
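The core mechanism described above, discarding contextually illegitimate readings while never removing a token's last remaining analysis, can be sketched as follows. This is a minimal illustration with a single hypothetical rule and invented tags, not the actual 1,100-constraint grammar:

```python
# Constraint Grammar-style disambiguation sketch (hypothetical rule and tags).
# Each token carries a list of alternative morphological readings; constraints
# remove readings that are illegitimate in their linear context.

def apply_constraints(sentence, constraints):
    """Discard contextually illegitimate readings, keeping at least one."""
    for i, (word, readings) in enumerate(sentence):
        for constraint in constraints:
            survivors = [r for r in readings if not constraint(i, r, sentence)]
            if survivors:            # never delete the last reading
                readings[:] = survivors
    return sentence

# Example constraint: reject a verb reading immediately after a determiner.
def no_verb_after_det(i, reading, sentence):
    return reading == "V" and i > 0 and "DET" in sentence[i - 1][1]

# "the hand": 'hand' is noun/verb ambiguous; the constraint keeps only N.
sent = [("the", ["DET"]), ("hand", ["N", "V"])]
apply_constraints(sent, [no_verb_after_det])
# sent[1] is now ("hand", ["N"])
```

Real Constraint Grammar rules are written declaratively (e.g. REMOVE/SELECT operations with context conditions) rather than as host-language predicates; the predicate form here merely exposes the control flow.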

Parsing with Principles and Classes of Information

Studies in Linguistics and Philosophy, 1996

After Chomsky's (1981) introduction of the Government and Binding (GB) theory of grammar, a research area called GB parsing developed in the mid-eighties to explore parsing architectures based on that framework. In this area, parsing is viewed as the characterization of a mental process rather than a crude mapping from strings to syntactic structures. Therefore in GB parsing there is a need to develop a motivated mapping between the postulated model of humans' knowledge of language (the grammar) and the parsing architecture, an enterprise in which psychological as well as computational issues are at stake.

Grammars and parsing

2001

2 Context-Free Grammars: 2.1 Languages; 2.2 Grammars; 2.2.1 Notational conventions; 2.3 The language of a grammar; 2.3.1 Some basic languages; 2.4 Parse trees; 2.4.1 From context-free grammars to datatypes; 2.5 Grammar transformations; 2.6 Concrete and abstract syntax; 2.7 Constructions on grammars; 2.7.1 SL: an example; 2.8 Parsing; 2.9 Exercises
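The chapter's opening notions, a grammar as a set of productions and "the language of a grammar" as the set of terminal strings derivable from the start symbol, can be illustrated with a toy grammar (invented for this sketch, not taken from the book):

```python
# A tiny context-free grammar as a Python data structure. Nonterminals are
# dictionary keys; each maps to a list of right-hand sides. The grammar is
# non-recursive, so its language is finite and can be enumerated directly.

from itertools import product

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb", "NP"], ["verb"]],
}

def expand(symbol):
    """Yield every terminal string derivable from `symbol`."""
    if symbol not in GRAMMAR:                       # terminal symbol
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [tok for part in parts for tok in part]

language = sorted(" ".join(s) for s in expand("S"))
# ['det noun verb', 'det noun verb det noun']
```

With a recursive grammar the language is infinite, and enumeration must be bounded (e.g. by derivation depth); parsing then replaces enumeration with recognition.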

SYNTAGMA. A Linguistic Approach to Parsing

SYNTAGMA is a rule-based parsing system structured on two levels: a general grammar and language-specific grammars. The general grammar is implemented in the program; language-specific grammars are resources conceived as text files which contain a lexical database with meaning-related grammatical features, a description of constituent structures, a database of meaning-specific syntactic constraints, and a semantic network. Since its theoretical background is principally Tesnière's Éléments de syntaxe, SYNTAGMA's grammar emphasizes the role of argument structure (valency) in constraint satisfaction, and also allows horizontal bonds, for instance in treating coordination. Notions such as traces and empty categories are derived from Generative Grammar, and some solutions are close to Government & Binding Theory, although they are the result of autonomous research. These properties allow SYNTAGMA to manage complex syntactic configurations and well-known weak points in parsing engineering. An important resource is the semantic network, which SYNTAGMA uses in disambiguation tasks. In contrast to statistical and data-driven parsers, the system's behavior may be controlled and fine-tuned, since gaps, traces, and long-distance relations are structurally set, and its constituent generation process is not a linear left-to-right shift-and-reduce but a bottom-up, rule-driven procedure.
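The contrast drawn above, bottom-up rule-driven constituent building rather than a linear left-to-right shift-and-reduce, can be sketched schematically. The rules and tags below are hypothetical; SYNTAGMA itself additionally consults valency constraints and its semantic network:

```python
# Schematic bottom-up constituent builder: rather than shifting tokens
# left-to-right, it scans the whole tag string for ANY adjacent pair that
# matches a rule's right-hand side and reduces it, repeating until no rule
# applies. (Hypothetical grammar for illustration only.)

RULES = {
    ("DET", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

def reduce_bottom_up(tags):
    """Repeatedly replace any rule-matching adjacent pair, anywhere."""
    tags = list(tags)
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            lhs = RULES.get((tags[i], tags[i + 1]))
            if lhs:
                tags[i:i + 2] = [lhs]   # reduce the pair to its constituent
                changed = True
                break                   # rescan from the left after a change
    return tags

reduce_bottom_up(["DET", "N", "V", "DET", "N"])
# -> ["S"]
```

Because reductions are licensed only by rules (and, in the real system, by valency and semantic constraints), the derivation order is determined by the grammar rather than by the input's linear order, which is what makes gaps and long-distance relations tractable to state structurally.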