First-Order Logic Definability of Free Languages

Aperiodicity, Star-freeness, and First-order Logic Definability of Operator Precedence Languages

Logical Methods in Computer Science, 2023

A classic result in formal language theory is the equivalence among noncounting, or aperiodic, regular languages, and languages defined through star-free regular expressions, or first-order logic. Past attempts to extend this result beyond the realm of regular languages have met with difficulties: for instance, it is known that star-free tree languages may violate the noncounting property, and there are aperiodic tree languages that cannot be defined through first-order logic. We extend such classic equivalence results to a significant family of deterministic context-free languages, the operator-precedence languages (OPL), which strictly includes the widely investigated visibly pushdown, alias input-driven, family and other structured context-free languages. The OP model originated in the '60s for defining programming languages and is still used by high-performance compilers; its rich algebraic properties were investigated initially in connection with grammar learning and have recently been completed with further closure properties and with a monadic second-order logic definition. We introduce an extension of regular expressions, the OP-expressions (OPE), which define the OPLs and, under the star-free hypothesis, define first-order definable and noncounting OPLs. Then we prove, through a fairly articulated grammar transformation, that aperiodic OPLs are first-order definable. Thus, the classic equivalence of star-freeness, aperiodicity, and first-order definability is established for the large and powerful class of OPLs. We argue that the same approach can be exploited to obtain analogous results for visibly pushdown languages too.
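As a small illustration of the noncounting (aperiodic) property in the classic regular case only, the following sketch tests whether the transition monoid of a complete DFA is aperiodic, i.e. whether every element m satisfies m^k = m^(k+1) for some k. It is not taken from the paper: the function names and the DFA encoding (one tuple per letter, mapping each state to its successor) are assumptions made for this example, and the answer characterizes the language only when the DFA supplied is minimal.

```python
def compose(f, g):
    """Return the state map 'apply f, then g' (maps are tuples over states)."""
    return tuple(g[x] for x in f)


def transition_monoid(n_states, delta):
    """Close the letter maps in `delta` under composition, starting from identity.

    `delta` maps each letter to a tuple t where t[q] is the successor of state q.
    """
    identity = tuple(range(n_states))
    seen = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in delta.values():
            h = compose(f, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


def is_aperiodic(n_states, delta):
    """True iff every monoid element m satisfies m^k == m^(k+1) for some k."""
    for m in transition_monoid(n_states, delta):
        powers = []
        power = m
        # The powers m, m^2, ... eventually enter a cycle; stop at the first repeat.
        while power not in powers:
            powers.append(power)
            power = compose(power, m)
        # `power` lies on the cycle; the cycle has length 1 iff power * m == power.
        if compose(power, m) != power:
            return False
    return True


# Assumed example DFAs (state 0 is initial and accepting in both):
# (ab)* over {a, b}, with state 2 as a sink; and (aa)* over {a}.
AB_STAR = {"a": (1, 2, 2), "b": (2, 0, 2)}
AA_STAR = {"a": (1, 0)}

print(is_aperiodic(3, AB_STAR))  # (ab)* is noncounting
print(is_aperiodic(2, AA_STAR))  # (aa)* counts modulo 2
```

The language (ab)* passes the test, whereas (aa)* fails because the letter a acts as a permutation of order 2 on the states.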

Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization

SIAM Journal on Computing, 2015

Operator precedence languages were introduced half a century ago by Robert Floyd to support deterministic and efficient parsing of context-free languages. Recently, we renewed our interest in this class of languages thanks to a few distinguishing properties that make them attractive for exploiting various modern technologies. Precisely, their local parsability enables parallel and incremental parsing, whereas their closure properties make them amenable to automatic verification techniques, including model checking. In this paper we provide a fairly complete theory of this class of languages: we introduce a class of automata with the same recognizing power as the generative power of their grammars; we provide a characterization of their sentences in terms of monadic second-order logic, as has been done in previous literature for more restricted language classes such as regular, parenthesis, and input-driven ones; we investigate preserved and lost properties when extending the language sentences from finite length to infinite length (ω-languages). As a result, we obtain a class of languages that enjoys many nice properties of regular languages (closure and decidability properties, logic characterization) but is considerably larger than other families with the same properties (typically parenthesis and input-driven ones), covering "almost" all deterministic languages.

Precedence Automata and Languages

Lecture Notes in Computer Science, 2011

Operator precedence grammars define a classical Boolean and deterministic context-free family (called Floyd languages or FLs). FLs have been shown to strictly include the well-known visibly pushdown languages, and enjoy the same nice closure properties. We introduce here Floyd automata, an equivalent operational formalism for defining FLs. This also makes it possible to extend the class to infinite strings, for instance to perform model checking.

Context-Free Grammars with Storage

arXiv (Cornell University), 2014

Context-free S grammars are introduced, for arbitrary (storage) type S, as a uniform framework for recursion-based grammars, automata, and transducers, viewed as programs. To each occurrence of a nonterminal of a context-free S grammar an object of type S is associated, that can be acted upon by tests and operations, as indicated in the rules of the grammar. Taking particular storage types gives particular formalisms, such as indexed grammars, top-down tree transducers, attribute grammars, etc. Context-free S grammars are equivalent to pushdown S automata. The context-free S languages can be obtained from the deterministic one-way S automaton languages by way of the delta operations on languages, introduced in this paper.

On the grammar of first-order logic

Determining the computational requirements of formal reasoning is a key task in implementing David Hilbert's foundational program. This paper presents a number of results on the syntactical complexity of some versions of the language of first-order logic (L_FOL), some of which are already known. Two versions of L_FOL with inefficient (tally) indexing are shown to be context-free, and the sets of their sentences (the sub-language L_S-FOL) and a version of L_FOL with efficient (positional) indexing are shown to be not context-free. The latter is not even the intersection of a finite number of context-free languages but is in D-LogSpace and consequently rudimentary in Smullyan's sense.

Algebraic properties of structured context-free languages: old approaches and novel developments

arXiv e-print 0907.2130, 2009

The historical research line on the algebraic properties of structured CF languages initiated by McNaughton's Parenthesis Languages has recently attracted much renewed interest with the Balanced Languages, the Visibly Pushdown Automata languages (VPDA), the Synchronized Languages, and the Height-deterministic ones. Such families preserve to a varying degree the basic algebraic properties of Regular languages: Boolean closure, closure under reversal, under concatenation, and Kleene star. We prove that the VPDA family is strictly contained within the Floyd Grammars (FG) family, historically known as operator precedence. Languages over the same precedence matrix are known to be closed under Boolean operations, and are recognized by a machine whose pop or push operations on the stack are purely determined by terminal letters. We characterize VPDAs as the subclass of FG having a peculiarly structured set of precedence relations, and balanced grammars as a further restricted case. The non-counting invariance property of FG has a direct implication for VPDA too.

Aperiodicity, Star-freeness, and First-order Definability of Structured Context-Free Languages

ArXiv, 2020

A classic result in formal language theory is the equivalence among noncounting, or aperiodic, regular languages, and languages defined through star-free regular expressions, or first-order logic. Together with first-order completeness of linear temporal logic, these results constitute a theoretical foundation for model-checking algorithms. Extending these results to structured subclasses of context-free languages, such as tree languages, did not work as smoothly: for instance, W. Thomas showed that there are star-free tree languages that are counting. We show, instead, that investigating the same properties within the family of operator precedence languages leads to equivalences that perfectly match those on regular languages. The study of this old family of context-free languages has been recently resumed, not only to enhance parsing (the original motivation of its inventor R. Floyd) but also to exploit their algebraic and logic properties. We have been able to reproduce the classic r...

Generalizing input-driven languages: theoretical and practical benefits

arXiv (Cornell University), 2017

Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families.

A Formalisation of the Normal Forms of Context-Free Grammars in HOL4

Lecture Notes in Computer Science, 2010

For a reasonable sound and complete proof calculus for first-order logic consider the problem to decide, given a sentence ϕ of first-order logic and a natural number n, whether ϕ has no proof of length ≤ n. We show that there is a nondeterministic algorithm accepting this problem which, for fixed ϕ, has running time bounded by a polynomial in n if and only if there is an optimal proof system for the set TAUT of tautologies of propositional logic. This equivalence is an instance of a general result linking the complexity of so-called slicewise monotone parameterized problems with the existence of an optimal proof system for TAUT.

Higher-Order Operator Precedence Languages

Electronic Proceedings in Theoretical Computer Science

Floyd's Operator Precedence (OP) languages are a deterministic context-free family having many desirable properties. They are locally parsable and parsable in parallel, and languages having a compatible structure are closed under Boolean operations, concatenation and star; they properly include the family of Visibly Pushdown (or Input Driven) languages. OP languages are based on three relations between any two consecutive terminal symbols, which assign syntax structure to words. We extend such relations to k-tuples of consecutive terminal symbols, by using the model of strictly locally testable regular languages of order k ≥ 3. The new corresponding class of Higher-order Operator Precedence languages (HOP) properly includes the OP languages, and it is still included in the deterministic (also in reverse) context-free family. We prove Boolean closure for each subfamily of structurally compatible HOP languages. In each subfamily, the top language is called max-language. We show that such languages are defined by a simple cancellation rule and we prove several properties, in particular that max-languages make an infinite hierarchy ordered by parameter k. HOP languages are a candidate for replacing OP languages in the various applications where they have been successful though sometimes too restrictive.
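To make concrete how the three precedence relations between consecutive terminals assign structure to a word, here is a minimal sketch of classical (order-2) operator-precedence parsing, not of the higher-order extension. The precedence table is the usual textbook one for sums and products of an operand `n` and is an assumption of this example, as are the names `PREC`, `top_terminal`, and `op_parse`; `<`, `=`, and `>` stand for "yields precedence", "equal in precedence", and "takes precedence". The sketch assumes a nonempty, well-formed input and raises `KeyError` on a pair of terminals with no relation.

```python
# Assumed precedence matrix: PREC[(left, right)] for terminals n, +, * and
# the end delimiter '#'. '*' binds tighter than '+'; both are left-associative.
PREC = {
    ("#", "n"): "<", ("#", "+"): "<", ("#", "*"): "<",
    ("n", "+"): ">", ("n", "*"): ">", ("n", "#"): ">",
    ("+", "n"): "<", ("+", "*"): "<", ("+", "+"): ">", ("+", "#"): ">",
    ("*", "n"): "<", ("*", "*"): ">", ("*", "+"): ">", ("*", "#"): ">",
}


def top_terminal(stack):
    """Return the topmost terminal (a str); subtrees (lists) are skipped."""
    for item in reversed(stack):
        if isinstance(item, str):
            return item
    raise ValueError("stack has no terminal")


def op_parse(tokens):
    """Return the syntax structure of `tokens` as nested lists."""
    stack = ["#"]                 # terminals are strings, subtrees are lists
    inp = list(tokens) + ["#"]
    i = 0
    while True:
        top, nxt = top_terminal(stack), inp[i]
        if top == "#" and nxt == "#":
            return stack[1]       # the single subtree left above the delimiter
        if PREC[(top, nxt)] in "<=":
            stack.append(nxt)     # shift: the next terminal joins the stack
            i += 1
        else:
            # Reduce: pop back to the terminal that the one below yields to.
            handle = []
            while True:
                item = stack.pop()
                handle.append(item)
                if isinstance(item, str) and PREC[(top_terminal(stack), item)] == "<":
                    break
            if not isinstance(stack[-1], str):
                handle.append(stack.pop())  # subtree to the left of the handle
            handle.reverse()
            stack.append(handle)


print(op_parse(["n", "+", "n", "*", "n"]))
```

Every shift or reduce decision above depends only on the relation between the topmost terminal on the stack and the next input terminal, which is the locality property the abstract refers to. On the input n + n * n the product is grouped before the sum.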