The inference of tree languages from finite samples: an algebraic approach

RAIRO - Theoretical Informatics and Applications, 2007

We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability, as introduced by the author in Theoretical Computer Science, 290:1679-1711, 2003, can be generalized from string languages to tree languages. This provides a large source of identifiable classes of regular tree languages, each of which can be characterized in various ways. We also present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous work by Angluin, by Sakakibara, and by ourselves. Finally, we show that all regular tree languages can be approximately identified in this way.

Inference of reversible tree languages

IEEE Transactions on Systems, Man, and Cybernetics, 2004

In this paper, we study the notions of k-reversibility and k-testability for regular tree languages. We present an inference algorithm for learning a k-testable tree language that runs in time polynomial in the size of the sample. We also relate these tree language classes to other well-known ones and prove some properties of these languages.
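
To give a feel for what learning a k-testable tree language involves, here is a minimal Python sketch of one common formulation of k-testability in the strict sense for trees; it is not the authors' algorithm. It assumes trees are nested tuples (label, child_1, ..., child_n) and k >= 2, and the names top, features, and KTestableLearner are hypothetical. The learner records, for every positive example, its root window of depth k-1, all of its depth-k "forks", and all of its subtrees shallower than k, then accepts any tree whose features were all observed. Each update takes time polynomial in the size of the example.

```python
from typing import Iterator, Set, Tuple

Tree = tuple  # (label, child_1, ..., child_n); a leaf is (label,)

def depth(t: Tree) -> int:
    """Number of levels in t; a single node has depth 1."""
    return 1 + max((depth(c) for c in t[1:]), default=0)

def top(t: Tree, k: int) -> Tree:
    """Keep the top k levels of t; nodes on level k lose their children."""
    if k == 1:
        return (t[0],)
    return (t[0],) + tuple(top(c, k - 1) for c in t[1:])

def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield every subtree of t, including t itself."""
    yield t
    for c in t[1:]:
        yield from subtrees(c)

def features(t: Tree, k: int) -> Tuple[Tree, Set[Tree], Set[Tree]]:
    """Root window of depth k-1, depth-k forks, and subtrees shallower than k."""
    root = top(t, k - 1)
    forks = {top(s, k) for s in subtrees(t) if depth(s) >= k}
    shallow = {s for s in subtrees(t) if depth(s) < k}
    return root, forks, shallow

class KTestableLearner:
    """Incrementally collects the features of positive examples (hypothetical helper)."""

    def __init__(self, k: int):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.roots: Set[Tree] = set()
        self.forks: Set[Tree] = set()
        self.shallow: Set[Tree] = set()

    def update(self, t: Tree) -> None:
        root, forks, shallow = features(t, self.k)
        self.roots.add(root)
        self.forks |= forks
        self.shallow |= shallow

    def accepts(self, t: Tree) -> bool:
        # Accept iff every feature of t was seen in some training example.
        root, forks, shallow = features(t, self.k)
        return root in self.roots and forks <= self.forks and shallow <= self.shallow
```

For example, after training with k = 2 on f(a, a) and f(a, f(a, a)), the learner also accepts the deeper tree f(a, f(a, f(a, a))) but rejects f(f(a, a), a), whose root fork f(f, a) never occurred in the sample.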

Error-correcting tree language inference

Pattern Recognition Letters, 2002

A new tree language inference algorithm is proposed in this work. It extends a string language inference algorithm based on error correction (ECGI). The proposed algorithm reuses the substructures already accounted for in a tree automaton, modifying the automaton so that it accepts the new structures presented during the identification process. It allows the use of more powerful representation primitives than strings in pattern recognition tasks. It also takes advantage of the thoroughly tested ECGI features used in speech and planar shape recognition tasks.

Stochastic inference of regular tree languages

Lecture Notes in Computer Science, 1998

We generalize a former algorithm for regular language identification from stochastic samples to the case of tree languages. It can also be used to identify context-free languages when structural information about the strings is available. The procedure identifies equivalent subtrees in the sample and outputs the hypothesis in time linear in the number of examples. The results are evaluated with a method that efficiently computes the relative entropy between the target grammar and the inferred one.
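
Two ingredients of this kind of procedure can be sketched in Python under stated assumptions: counting how often each distinct subtree occurs in the sample (identical subtrees share one counter), and a statistical test of whether two observed relative frequencies differ. The test below is the Hoeffding-bound check used by the string algorithm ALERGIA; that the tree version uses exactly this test, the tuple encoding of trees, and the names subtree_counts and frequencies_differ are assumptions for illustration.

```python
import math
from collections import Counter

def subtree_counts(sample):
    """Count occurrences of each distinct subtree; identical subtrees share a key.

    Trees are nested tuples (label, child_1, ..., child_n).
    """
    counts = Counter()
    stack = list(sample)
    while stack:
        t = stack.pop()
        counts[t] += 1
        stack.extend(t[1:])  # visit the children as well
    return counts

def frequencies_differ(f1, n1, f2, n2, alpha=0.05):
    """Hoeffding-bound test (ALERGIA-style).

    Returns True if f1/n1 and f2/n2 differ by more than the confidence
    bound for significance level alpha, i.e. the two states look different.
    """
    if n1 == 0 or n2 == 0:
        return False  # no evidence that they differ
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)
    )
    return abs(f1 / n1 - f2 / n2) > bound
```

With the sample f(a, a), f(a, f(a, a)), the leaf a is counted five times and f(a, a) twice. Two states would be considered for merging only if none of their frequency comparisons reports a difference.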

An algorithm for the inference of tree grammars

International Journal of Computer & Information Sciences, 1976

An algorithm for the inference of tree grammars from sample trees is presented. The procedure, which is based on the properties of self-embedding and regularity, produces a reduced tree grammar capable of generating all the samples used in the inference process as well as other trees of similar structure. The characteristics of the algorithm are illustrated by experimental results.

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages

2022

The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of O(n|N|) (where n is the string length and |N| is the size of the nonterminal set) and more space-efficient by a factor of O(|Γ|) (where |Γ| is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and more time-efficient than the algorithm of Alonso et al. (2001), by factors of O(|Γ|^2) and O(|Γ|^3), respectively. Finally, we give the first PAA stringsum and allsum algorithms.
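
To make "stringsum" concrete, the following Python sketch computes it for the simplest base case only: a semiring-weighted CFG in Chomsky normal form, using the classic CKY-style inside algorithm. It is not the paper's algorithm for TAG, LIG, PAA, or EPDA; the Semiring helper, the rule encoding, and the name stringsum are assumptions for illustration. Swapping the semiring changes what is computed without changing the algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Semiring:
    """A semiring given by its identities and operations (hypothetical helper)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]

# Real (sum-product) semiring: weights of alternative derivations add up.
REAL = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b)
# Counting semiring: with every rule weight set to 1, counts derivations.
COUNT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

def stringsum(words, start, lexical, binary, sr):
    """Total weight of all derivations of `words` from `start`.

    lexical: {(A, w): weight} for rules A -> w
    binary:  {(A, B, C): weight} for rules A -> B C
    """
    n = len(words)
    chart: Dict[Tuple[int, int, str], Any] = {}  # (i, j, A) -> weight of A =>* words[i:j]

    def add(key, value):
        chart[key] = sr.plus(chart.get(key, sr.zero), value)

    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), wt in lexical.items():
            if word == w:
                add((i, i + 1, a), wt)

    # Longer spans combine two adjacent sub-spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), wt in binary.items():
                    left = chart.get((i, k, b))
                    right = chart.get((k, j, c))
                    if left is not None and right is not None:
                        add((i, j, a), sr.times(wt, sr.times(left, right)))
    return chart.get((0, n, start), sr.zero)
```

For the grammar S -> S S (weight 0.4), S -> a (weight 0.6), the string "a a a" has two derivations, each of weight 0.4 * 0.4 * 0.6^3, so its stringsum in the real semiring is 0.06912; in the counting semiring, "a a a a" yields 5 derivations.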

Error Correcting Analysis for Tree Languages

International Journal of Pattern Recognition and Artificial Intelligence, 2000

To take a syntactic approach to a pattern recognition problem, one needs good grammatical models as well as good parsing algorithms that allow distorted samples to be classified. Several methods take two trees as input and compute the edit distance between them. In this work, a polynomial-time algorithm that computes the distance between a tree and a tree automaton is presented. This measure can be used as an error model inside a syntactic classifier for pattern recognition problems.
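
The idea of a distance between a tree and a tree automaton can be illustrated in a deliberately restricted setting. The Python sketch below is an assumption, not the paper's algorithm: it allows only node relabelings, each costing 1, and computes bottom-up the cheapest way for every subtree to reach every state of a nondeterministic bottom-up tree automaton. A full error model would also need insertions and deletions. The transition encoding and the name relabel_distance are hypothetical. The running time is polynomial: each node scans the transition list once.

```python
import math
from typing import Dict, List, Set, Tuple

Tree = tuple  # (label, child_1, ..., child_n); a leaf is (label,)
# A bottom-up transition (symbol, (q_1, ..., q_n), q): a node labelled `symbol`
# whose children are in states q_1..q_n may move to state q (nondeterminism allowed).
Transition = Tuple[str, Tuple[str, ...], str]

def relabel_distance(t: Tree, transitions: List[Transition], finals: Set[str]) -> float:
    """Minimum number of node relabelings that make t accepted (math.inf if impossible)."""

    def costs(node: Tree) -> Dict[str, float]:
        # costs(node)[q] = cheapest relabeling that lets this subtree reach state q
        label, children = node[0], node[1:]
        child_costs = [costs(c) for c in children]
        best: Dict[str, float] = {}
        for symbol, args, q in transitions:
            if len(args) != len(children):
                continue  # relabeling cannot change a node's arity
            total = 0 if symbol == label else 1
            for cc, qi in zip(child_costs, args):
                total += cc.get(qi, math.inf)
            if total < best.get(q, math.inf):
                best[q] = total
        return best

    root = costs(t)
    return min((root.get(q, math.inf) for q in finals), default=math.inf)
```

With the transitions a -> q and f(q, q) -> q and final state q, the tree f(a, b) is at distance 1 (relabel b to a), while g(b, b) is at distance 3.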

Tree algebras and varieties of tree languages

Theoretical Computer Science, 2007

We consider several aspects of Wilke's [T. Wilke, An algebraic characterization of frontier testable tree languages, Theoret. Comput. Sci. 154 (1996) 85-106] tree algebra formalism for representing binary labelled trees and compare it with approaches that represent trees as terms in the traditional way. A convergent term rewriting system yields normal form representations of binary trees and contexts, as well as a new completeness proof and a computational decision method for the axiomatization of tree algebras. Varieties of binary tree languages are compared with varieties of tree languages studied earlier in the literature. We also prove a variety theorem, thus solving a problem noted by several authors. Syntactic tree algebras are studied and compared with ordinary syntactic algebras. The expressive power of the language of tree algebras is demonstrated by giving equational definitions for some well-known varieties of binary tree languages.

Probabilistic tree automata and context free languages

Israel Journal of Mathematics, 1970

A notion of a probabilistic tree automaton is defined and a condition is given under which it is equivalent to a usual tree automaton. A theorem about context free languages is stated.
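
To make the notion concrete, the following Python sketch evaluates one plausible kind of probabilistic tree automaton: a bottom-up automaton whose transitions map a symbol and a tuple of child states to a probability distribution over states, with acceptance by a cut-point in the style of Rabin's probabilistic automata. This formulation, the tuple encoding of trees, and the names state_distribution, acceptance_probability, and accepts are assumptions; the 1970 paper's exact definition may differ.

```python
from itertools import product
from typing import Dict, Set, Tuple

Tree = tuple  # (label, child_1, ..., child_n); a leaf is (label,)
# delta[(label, (q_1, ..., q_n))] = {q: probability}; each distribution sums to 1
Delta = Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]]

def state_distribution(t: Tree, delta: Delta) -> Dict[str, float]:
    """Probability of the automaton being in each state at the root of t."""
    label, children = t[0], t[1:]
    child_dists = [state_distribution(c, delta) for c in children]
    out: Dict[str, float] = {}
    # Sum over every combination of child states, weighted by their probabilities.
    # For a leaf, product() yields a single empty combination.
    for combo in product(*(d.items() for d in child_dists)):
        states = tuple(q for q, _ in combo)
        weight = 1.0
        for _, p in combo:
            weight *= p
        for q, p in delta.get((label, states), {}).items():
            out[q] = out.get(q, 0.0) + weight * p
    return out

def acceptance_probability(t: Tree, delta: Delta, finals: Set[str]) -> float:
    dist = state_distribution(t, delta)
    return sum(dist.get(q, 0.0) for q in finals)

def accepts(t: Tree, delta: Delta, finals: Set[str], cut_point: float) -> bool:
    """Cut-point acceptance: accept iff the acceptance probability exceeds cut_point."""
    return acceptance_probability(t, delta, finals) > cut_point
```

For instance, if each leaf a enters the final state p with probability 0.9 (and a rejecting state r otherwise), and f stays in p only when both children are in p, then f(a, a) is accepted with probability 0.81 and f(a, f(a, a)) with probability 0.729.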
