Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages
Related papers
Weighted Languages Recognizable by Weighted Tree Automata
Acta Cybernetica, 2018
Yields of recognizable weighted tree languages, yields of local weighted tree languages, and weighted context-free languages are related. It is shown that the following five classes of weighted languages are the same: (i) the class of weighted languages generated by plain weighted context-free grammars, (ii) the class of weighted languages recognized by plain weighted tree automata, (iii) the class of weighted languages recognized by deterministic and plain top-down weighted tree automata, (iv) the class of weighted languages recognized by deterministic and plain bottom-up weighted tree automata, and (v) the class of weighted languages determined by plain weighted local systems.
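To make "recognized by a weighted tree automaton" concrete, here is a minimal sketch of a generic bottom-up weighted tree automaton evaluated over the ordinary real semiring (sum over runs, product along a run). It does not model the "plain" restriction from the abstract; the tuple encoding of trees, the transition format, and the names `run`, `tree_weight`, `COUNT_A`, and `FINAL` are all hypothetical choices for illustration.

```python
from collections import defaultdict
from math import prod

# Hypothetical encoding: a tree is a tuple (symbol, child_1, ..., child_k);
# a leaf is (symbol,). A transition (symbol, (q_1, ..., q_k), q, w) lets a node
# labeled `symbol` whose children are in states q_1..q_k be in state q with weight w.

def run(tree, transitions):
    """Return {state: total weight} at the root of `tree`, summed over all runs."""
    symbol, *children = tree
    child_vecs = [run(child, transitions) for child in children]
    result = defaultdict(float)
    for sym, child_states, state, w in transitions:
        if sym != symbol or len(child_states) != len(children):
            continue
        # Transition weight times the weights of the required child states.
        result[state] += w * prod(vec.get(q, 0.0) for vec, q in zip(child_vecs, child_states))
    return dict(result)

def tree_weight(tree, transitions, final):
    """Sum over root states of (root weight x final weight)."""
    return sum(final.get(q, 0.0) * w for q, w in run(tree, transitions).items())

# Toy automaton: the weight of a tree is the number of leaves labeled "a".
# State "c" means "the one counted a-leaf is below here"; "s" means "nothing counted".
COUNT_A = [
    ("a", (), "s", 1.0), ("a", (), "c", 1.0), ("b", (), "s", 1.0),
    ("f", ("s", "s"), "s", 1.0),
    ("f", ("c", "s"), "c", 1.0), ("f", ("s", "c"), "c", 1.0),
]
FINAL = {"c": 1.0}

print(tree_weight(("f", ("f", ("a",), ("b",)), ("a",)), COUNT_A, FINAL))  # prints 2.0
```

The example tree has yield "aba", so the weight it receives (2.0) is also a weight that the automaton assigns to that string via one of its trees, which is the sense in which yields of weighted tree languages give weighted string languages.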
Bidirectional Automata for Tree Adjoining Grammars
We define a new model of automata for the description of bidirectional parsing strategies for tree adjoining grammars and a tabulation mechanism that allows them to be executed in polynomial time. This new model of automata provides a modular way of describing bidirectional parsing strategies for TAG, separating the description of a strategy from its execution.
Mathematical Systems Theory, 1983
We define top-down pushdown tree automata (PDTAs), which extend the usual string pushdown automata by allowing trees instead of strings in both the input and the stack. We prove that PDTAs recognize the class of context-free tree languages. (Quasi-)real-time and deterministic PDTAs accept the classes of Greibach and deterministic tree languages, respectively. Finally, PDTAs are shown to be equivalent to restricted PDTAs, whose stack is linear: this both yields a more operational way of recognizing context-free tree languages and connects them with the class of indexed languages.
An Optimal Linear-Time Parallel Parser for Tree Adjoining Languages
SIAM Journal on Computing, 1990
An optimal parallel recognition/parsing algorithm is presented for languages generated by tree adjoining grammars (TAGs), a grammatical system for natural language. TAGs are strictly more powerful than context-free grammars (CFGs); e.g., they can generate $\{a^n b^n c^n \mid n \ge 0\}$, which is not context-free. However, serial parsing of TAGs is also slower, with time complexity $O(n^6)$ for inputs of length $n$ (as opposed to $O(n^3)$ for CFGs). The parallel algorithm achieves optimal speedup: it runs in linear time on a five-dimensional array of $n^5$ processors. Moreover, the processors are finite-state; i.e., their function and size depend only on the underlying grammar and not on the length of the input.

Key words: language recognition and parsing, tree adjoining languages, context-free languages, mesh-connected processor arrays
AMS(MOS) subject classifications: 68Q80, 68Q35, 68Q45, 68Q50, 68S05

1. Introduction. Language recognition and parsing are important problems that arise in many applications, e.g., compiler construction, natural language processing, and syntactic pattern recognition. Much of the work in this area has centered on context-free languages (CFLs) and their subclasses. Although many subclasses of CFLs can be parsed in linear time, the fastest known practical parsing algorithms for general CFLs (the Cocke-Younger-Kasami and Earley algorithms) have time complexity $O(n^3)$ for inputs of length $n$ [AHO72], [HOPC79]. An asymptotically faster algorithm that runs in $O(M(n))$ time has been given by Valiant [VALI75], where $M(n)$ is the time to multiply two $n \times n$ Boolean matrices. Currently, the best-known upper bound on $M(n)$ is $O(n^{2.376})$ [COPP87]. However, the constant of proportionality in Valiant's algorithm is too large for practical applications. Recent research has sought to decrease the time bound for CFL recognition and parsing by introducing parallelism.
The parallel recognition of CFLs was first considered by Kosaraju [KOSA75], who showed that CFLs can be recognized in linear time by two-dimensional arrays of finite-state machines. His construction is a parallelization of the Cocke-Younger-Kasami (CYK) dynamic programming algorithm for recognizing the strings generated by a context-free grammar in Chomsky normal form (CNF). Later, Chiang and Fu [CHIA84] extended this result to the parsing problem (i.e., if the string is in the language, output a parse tree of the string). Their algorithm, which performs both recognition and parsing, is a parallel implementation of Earley's algorithm (which does not require the grammar to be in CNF) and runs in linear time on a two-dimensional systolic array of $O(n^2)$ processors. Unfortunately, for the parsing phase of the algorithm, the processors are no longer finite-state because …
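For reference, the serial CYK recognizer that Kosaraju's array parallelizes can be sketched as follows. This is the textbook sequential version, not the parallel construction from the paper; the dictionary encoding of rules, the function name `cyk_recognize`, and the toy grammar are hypothetical.

```python
from itertools import product

def cyk_recognize(words, unary, binary, start="S"):
    """Serial CYK recognizer for a grammar in Chomsky normal form.

    unary:  terminal -> set of nonterminals A with a rule A -> terminal
    binary: (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(words)
    if n == 0:
        return False  # a CNF grammar without S -> epsilon derives no empty string
    # table[i][j] = set of nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(unary.get(word, ()))
    for span in range(2, n + 1):           # span length
        for i in range(n - span + 1):      # span start
            j = i + span
            for k in range(i + 1, j):      # split point; three nested loops give O(n^3)
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n]

# Hypothetical CNF grammar for {a^n b^n | n >= 1}:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

print(cyk_recognize(list("aabb"), UNARY, BINARY))  # prints True
print(cyk_recognize(list("aab"), UNARY, BINARY))   # prints False
```

Each cell `table[i][j]` depends only on cells for strictly shorter spans, which is what allows all cells on the same span length to be filled simultaneously in a parallel array.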
Tabulation of Automata for Tree-Adjoining Languages
Grammars, 2000
We propose a modular design of tabular parsing algorithms for tree-adjoining languages. The modularity is made possible by a separation of the parsing strategy from the mechanism of tabulation. The parsing strategy is expressed in terms of the construction of a nondeterministic automaton from a grammar; three distinct types of automaton will be discussed. The mechanism of tabulation leads to the simulation of these nondeterministic automata in polynomial time, independent of the parsing strategy. The proposed application of this work is the design of efficient parsing algorithms for tree-adjoining grammars and related formalisms.
Algorithms for Weighted Pushdown Automata
Cornell University - arXiv, 2022
Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing. As most existing dynamic programming algorithms are designed for context-free grammars (CFGs), algorithms for PDAs often resort to a PDA-to-CFG conversion. In this paper, we develop novel algorithms that operate directly on WPDAs. Our algorithms are inspired by Lang's algorithm, but use a more general definition of pushdown automaton and either reduce the space requirements by a factor of |Γ| (the size of the stack alphabet) or reduce the runtime by a factor of more than |Q| (the number of states). When run on the same class of PDAs as Lang's algorithm, our algorithm is both more space-efficient by a factor of |Γ| and more time-efficient by a factor of |Q| · |Γ|.
Journal of Computer and System Sciences, 1985
This paper presents a new type of automaton called a tree pushdown automaton (a bottom-up tree automaton augmented with internal memory in the form of a tree, similar to the way a stack is added to a finite state machine to produce a pushdown automaton) and shows that the class of languages recognized by such automata is identical to the class of context-free tree languages.
An algorithm for the inference of tree grammars
International Journal of Computer & Information Sciences, 1976
An algorithm for the inference of tree grammars from sample trees is presented. The procedure, which is based on the properties of self-embedding and regularity, produces a reduced tree grammar capable of generating all the samples used in the inference process as well as other trees similar in structure. The characteristics of the algorithm are illustrated by experimental results.
Error Correcting Analysis for Tree Languages
International Journal of Pattern Recognition and Artificial Intelligence, 2000
To undertake a syntactic approach to a pattern recognition problem, it is necessary to have good grammatical models as well as good parsing algorithms that allow distorted samples to be classified. Several methods take two trees as input and compute the edit distance between them. In this work, a polynomial-time algorithm that computes the distance between a tree and a tree automaton is presented. This measure can be used in pattern recognition problems as an error model inside a syntactic classifier.
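The idea of a distance between a tree and a tree automaton can be illustrated with a deliberately simplified error model in which the only edit is relabeling a node (cost 1) and arities never change. The paper's error model may well include insertions and deletions as well; this sketch is only an assumption-laden illustration, and the tuple encoding, rule format, and the names `relabel_costs`, `distance_to_automaton`, `RULES`, and `ACCEPT` are hypothetical. A single bottom-up pass computes, for every node and state, the cheapest way to make that subtree reach that state.

```python
import math

def relabel_costs(tree, rules):
    """Return {state: minimum number of relabelings} for the subtree `tree`.

    Trees are tuples (symbol, child_1, ..., child_k). A rule (symbol, (q_1..q_k), q)
    is a bottom-up transition of an (unweighted) tree automaton.
    """
    symbol, *children = tree
    child_costs = [relabel_costs(child, rules) for child in children]
    best = {}
    for rule_symbol, child_states, state in rules:
        if len(child_states) != len(children):
            continue  # this simplified error model never changes a node's arity
        cost = 0 if rule_symbol == symbol else 1  # relabeling a node costs 1
        for costs, q in zip(child_costs, child_states):
            cost += costs.get(q, math.inf)
        if cost < best.get(state, math.inf):
            best[state] = cost
    return best

def distance_to_automaton(tree, rules, accepting):
    """Minimum relabelings that make `tree` accepted; math.inf if impossible."""
    costs = relabel_costs(tree, rules)
    return min((costs.get(q, math.inf) for q in accepting), default=math.inf)

# Hypothetical automaton accepting exactly the tree f(a, b).
RULES = [("a", (), "qa"), ("b", (), "qb"), ("f", ("qa", "qb"), "qf")]
ACCEPT = {"qf"}

print(distance_to_automaton(("f", ("a",), ("a",)), RULES, ACCEPT))  # prints 1
print(distance_to_automaton(("g", ("b",), ("b",)), RULES, ACCEPT))  # prints 2
```

Each node examines every rule once, so the running time is polynomial, roughly proportional to the tree size times the number of rules times the maximum arity.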