On theoretical and practical complexity of TAG parsers
Related papers
Generating XTAG parsers from algebraic specifications
Proceedings of the Eighth …, 2006
In this paper, a generic system that generates parsers from parsing schemata is applied to the particular case of the XTAG English grammar. In order to be able to generate XTAG parsers, some transformations are made to the grammar, and TAG parsing schemata are extended with feature structure unification support and a simple tree filtering mechanism. The generated implementations allow us to study the performance of different TAG parsers when working with a large-scale, wide-coverage grammar.
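The "simple tree filtering mechanism" mentioned above can be illustrated with a minimal sketch: before parsing, discard every elementary tree whose lexical anchor cannot occur in the input sentence, so the parser only ever considers trees that could possibly be used. The tree representation and names below are illustrative assumptions, not XTAG's actual data structures.

```python
# Hedged sketch of lexical tree filtering: keep only elementary trees
# whose anchor word appears in the sentence. The dict-based tree format
# and the tree names are invented for this example.
def filter_trees(trees, sentence):
    words = set(sentence)
    return [t for t in trees if t["anchor"] in words]

grammar = [
    {"name": "alpha_run", "anchor": "runs"},
    {"name": "alpha_eat", "anchor": "eats"},
]
print([t["name"] for t in filter_trees(grammar, ["John", "runs"])])
# ['alpha_run']
```

With a large lexicalized grammar such as XTAG, this kind of pre-filter can remove most elementary trees before any parsing work is done.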
Compiling declarative specifications of parsing algorithms
Database and Expert Systems …, 2007
The parsing schemata formalism allows us to describe parsing algorithms in a simple, declarative way by capturing their fundamental semantics while abstracting low-level detail. In this work, we present a compilation technique allowing the automatic transformation of parsing schemata to efficient executable implementations of their corresponding algorithms. Our technique is general enough to be able to handle all kinds of schemata for context-free grammars, tree adjoining grammars and other grammatical formalisms, providing an extensibility mechanism which allows the user to define custom notational elements.
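The declarative style the abstract describes can be sketched as follows: a deductive step is written as data (antecedent patterns with schema variables plus a consequent), and a generic interpreter applies it to item combinations by pattern matching. This is loosely modelled on parsing-schemata notation and is an assumption of this sketch, not the authors' actual compilation technique.

```python
# A minimal sketch of interpreting a deductive step declaratively,
# assuming items are tuples and schema variables are strings that
# start with "?". Illustrative only, not the compiled implementation.
def match(pattern, item, bindings):
    """Unify one pattern against one item, extending bindings or failing."""
    if len(pattern) != len(item):
        return None
    b = dict(bindings)
    for p, v in zip(pattern, item):
        if isinstance(p, str) and p.startswith("?"):
            if b.setdefault(p, v) != v:   # variable: bind or check consistency
                return None
        elif p != v:                      # literal: must match exactly
            return None
    return b

def apply_step(step, items):
    """Apply one (antecedents, consequent) step to all item combinations."""
    antecedents, consequent = step
    results = []
    def rec(rest, bindings):
        if not rest:
            results.append(tuple(bindings.get(x, x) for x in consequent))
            return
        for it in items:
            b = match(rest[0], it, bindings)
            if b is not None:
                rec(rest[1:], b)
    rec(antecedents, {})
    return results

# CYK-like completion: from [B, ?i, ?k] and [C, ?k, ?j] deduce [A, ?i, ?j]
step = ([("B", "?i", "?k"), ("C", "?k", "?j")], ("A", "?i", "?j"))
print(apply_step(step, {("B", 0, 1), ("C", 1, 2)}))  # [('A', 0, 2)]
```

A naive interpreter like this runs in time polynomial in the number of items per step application; the point of compiling schemata, as the paper argues, is to do much better than this brute-force enumeration.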
Practical aspects in compiling tabular TAG parsers
… of the 5th International Workshop on …, 2000
This paper describes the extension of the system DyALog to compile tabular parsers from Feature Tree Adjoining Grammars. The compilation process uses intermediary 2-stack automata to encode various parsing strategies and a dynamic programming interpretation to break automata derivations into tabulable fragments.
DPL – a computational method for describing grammars and modelling parsers
1985
constituent-pairs (i.e. words or recognized phrase constructs) and (3) description of constituent surroundings in the form of two-way automata. The compilation of DPL-grammars results in executable code for the corresponding parsers. To ease the modelling of grammars there exists a linguistically oriented programming environment, which contains e.g. a tracing facility for the parsing process, grammar-sensitive lexical maintenance programs, and routines for the interactive graphic display of parse trees and grammar definitions. Translator routines are also available for transporting compiled code between various LISP dialects. The DPL-compiler and associated tools can be used under INTERLISP and FRANZLISP. This paper focuses on knowledge engineering issues; the linguistic argumentation is presented in /3/ and /4/, and the detailed syntax of DPL, with examples, can be found in /2/.
Generation of indexes for compiling efficient parsers from formal specifications
Computer Aided Systems …, 2007
Parsing schemata provide a formal, simple and uniform way to describe, analyze and compare different parsing algorithms. The notion of a parsing schema comes from considering parsing as a deduction process which generates intermediate results called items. An initial set of items is directly obtained from the input sentence, and the parsing process consists of the application of inference rules (called deductive steps) which produce new items from existing ones. Each item contains a piece of information about the sentence's structure, and a successful parsing process will produce at least one final item containing a full parse tree for the sentence or guaranteeing its existence. Their abstraction of low-level details makes parsing schemata useful to define parsers in a simple and straightforward way. Comparing parsers, or considering aspects such as their correctness and completeness or their computational complexity, also becomes easier if we think in terms of schemata. However, when we want to actually use a parser by running it on a computer, we need to implement it in a programming language, so we have to abandon the high level of abstraction and worry about implementation details that were irrelevant at the schema level. In particular, we study in this article how the source parsing schema should be analysed to decide what kind of indexes need to be generated in order to obtain an efficient parser.
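The item/deductive-step/index picture described above can be made concrete with a small agenda-driven engine for CYK-style items (symbol, i, j). The key efficiency point is the two indexes keyed on item boundaries: a new item finds its combination partners by index lookup instead of scanning the whole chart. The toy grammar and all names are assumptions of this sketch, not the generated parsers from the article.

```python
from collections import defaultdict

# Toy deductive engine for CYK-style items (sym, i, j), assuming a CNF
# grammar given as binary rules {(B, C): {A, ...}} and a word lexicon.
BINARY = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "saw": {"V"}, "cat": {"N"}}

def parse(words):
    # Initial items come directly from the input sentence.
    agenda = [(a, i, i + 1) for i, w in enumerate(words) for a in LEXICON[w]]
    chart = set()
    by_start = defaultdict(set)   # index: items keyed by left boundary
    by_end = defaultdict(set)     # index: items keyed by right boundary
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        sym, i, j = item
        by_start[i].add(item)
        by_end[j].add(item)
        # Deductive step: combine adjacent items found via the indexes,
        # never by scanning the full chart.
        for (lsym, li, _) in list(by_end[i]):
            for a in BINARY.get((lsym, sym), ()):
                agenda.append((a, li, j))
        for (rsym, _, rj) in list(by_start[j]):
            for a in BINARY.get((sym, rsym), ()):
                agenda.append((a, i, rj))
    # Final item: an S item spanning the whole sentence.
    return ("S", 0, len(words)) in chart

print(parse("the dog saw the cat".split()))  # True
```

Which indexes are worth building depends on which positions of each item are instantiated when a deductive step fires; deciding that automatically from the schema is exactly the analysis the article proposes.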
On the Complexity of CCG Parsing
Computational Linguistics
We study the parsing complexity of Combinatory Categorial Grammar (CCG) in the formalism of Vijay-Shanker and Weir (1994). As our main result, we prove that any parsing algorithm for this formalism will take in the worst case exponential time when the size of the grammar, and not only the length of the input sentence, is included in the analysis. This sets the formalism of Vijay-Shanker and Weir (1994) apart from weakly equivalent formalisms such as Tree Adjoining Grammar, for which parsing can be performed in time polynomial in the combined size of grammar and input sentence. Our results contribute to a refined understanding of the class of mildly context-sensitive grammars, and inform the search for new, mildly context-sensitive versions of CCG.
Syntax-Semantics Interaction Parsing Strategies. Inside SYNTAGMA
This paper discusses SYNTAGMA, a rule-based NLP system addressing the tricky issues of syntactic ambiguity reduction and word sense disambiguation, as well as providing innovative and original solutions for constituent generation and constraint management. To provide an insight into how it operates, the system's general architecture and components, as well as its lexical, syntactic and semantic resources, are described. After that, the paper addresses the mechanism that performs selective parsing through an interaction between syntactic and semantic information, leading the parser to a coherent and accurate interpretation of the input text.
1. SYNTAGMA's architecture
The first section addresses the system's general architecture and components; the second describes its lexical, syntactic and semantic resources; the last shows how the system performs selective parsing through an interaction between syntactic and semantic information, leading the parser to the correct interpretation of the input text. A schema of the whole architecture is given in the appendix. SYNTAGMA is the result of research into parsing strategies started by the author in 1989. The development of the current architecture began in 2011, after some experiments in data-driven dependency parsing, which led the author to conceive a radically new parsing mechanism. Since its theoretical background has been discussed in a previous article, the focus here is on its architecture, components and resources. The parser's core engine is language independent. All language-specific rules and data are defined at the level of its lexical and syntactic resource files.
The current implementation uses lexical, syntactic and semantic information which has been extracted automatically from an Italian dictionary, but these data are linked to the lexicons and semantic nets of other languages. The system follows a bottom-up deterministic constituency parsing method: starting from the output of the part-of-speech tagger (PoS-Tagger), which is a list of terminal categories, it progressively builds more and more complex structures until it reaches the highest constituency level the given input text allows. At the end of the process, constituency parse trees are transformed into dependency trees; both output formats are available. The system's behavior can be modulated by adjusting parameters, including the target language, the linguistic register (which can allow or prohibit some types of linguistic structures) and the strength of the selection mechanism during the parsing process (CSBS, Constituent-Selection-By-Score).
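The final conversion step described above, from constituency parse trees to dependency trees, is commonly done with head rules: each phrase label designates one child as its head, and every non-head child's head word depends on the phrase's head word. The sketch below illustrates that general technique under an assumed head table and tree format; it is not SYNTAGMA's actual conversion procedure.

```python
# Illustrative constituency-to-dependency conversion via head rules.
# Trees are (label, children) tuples; a preterminal is (label, word).
# The head table below is an assumption for this example.
HEADS = {"S": "VP", "NP": "N", "VP": "V"}  # which child label heads each phrase

def head_word(tree):
    label, children = tree
    if isinstance(children, str):        # preterminal: ("N", "dog")
        return children
    for child in children:
        if child[0] == HEADS[label]:
            return head_word(child)      # descend into the head child

def to_dependencies(tree, deps=None):
    """Collect (head, dependent) word pairs from a constituency tree."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = head_word(tree)
    for child in children:
        ch = head_word(child)
        if ch != h:                      # non-head child depends on the head
            deps.append((h, ch))
        to_dependencies(child, deps)
    return deps

tree = ("S", [("NP", [("Det", "the"), ("N", "dog")]),
              ("VP", [("V", "barks")])])
print(to_dependencies(tree))  # [('barks', 'dog'), ('dog', 'the')]
```

Offering both formats then costs little: the constituency tree is the primary output, and the dependency tree is derived from it deterministically.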