Nelma Moreira - Academia.edu
Papers by Nelma Moreira
International Journal of Foundations of Computer Science, Aug 1, 2009
Antimirov and Mosses proposed a rewrite system for deciding the equivalence of two (extended) regular expressions. They argued that this method could lead to a better average-case algorithm than those based on the comparison of the equivalent minimal deterministic finite automata. In this paper we present a functional approach to that method, prove its correctness, and give some experimental comparative results. Besides an improved functional version of Antimirov and Mosses's algorithm, we present an alternative one using partial derivatives. Our preliminary results lead to the conclusion that, indeed, these methods are feasible and, most of the time, faster than the classical methods.
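A minimal Python sketch of the partial-derivative idea behind such equivalence tests, assuming a hypothetical tuple encoding of regular expressions (not the paper's data structures, and without the rewrite-system details). Two expressions r and s denote the same language iff every pair of derivative sets reachable from ({r}, {s}) agrees on nullability; Antimirov's finiteness result for partial derivatives guarantees the exploration terminates. The names `pd`, `nullable` and `equivalent` are illustrative.

```python
from itertools import chain

# Hypothetical encoding: ("empty",), ("eps",), ("sym", a), ("alt", r, s),
# ("cat", r, s), ("star", r).
EMPTY, EPS = ("empty",), ("eps",)


def sym(a):
    return ("sym", a)


def alt(r, s):
    return ("alt", r, s)


def star(r):
    return ("star", r)


def cat(r, s):
    """Concatenation simplified with eps.s = s, r.eps = r and empty.s = empty."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def nullable(r):
    """True iff the empty word belongs to L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def pd(a, r):
    """Antimirov partial derivative of r with respect to symbol a."""
    tag = r[0]
    if tag == "sym":
        return frozenset({EPS}) if r[1] == a else frozenset()
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "alt":
        return pd(a, r[1]) | pd(a, r[2])
    if tag == "star":
        return frozenset(cat(d, r) for d in pd(a, r[1]))
    left = frozenset(cat(d, r[2]) for d in pd(a, r[1]))  # "cat"
    return left | pd(a, r[2]) if nullable(r[1]) else left


def equivalent(r, s, alphabet):
    """Decide L(r) == L(s) by exploring pairs of sets of partial derivatives."""
    start = (frozenset({r}), frozenset({s}))
    seen, todo = {start}, [start]
    while todo:
        left, right = todo.pop()
        if any(map(nullable, left)) != any(map(nullable, right)):
            return False  # the word leading here is accepted by only one side
        for a in alphabet:
            nxt = (frozenset(chain.from_iterable(pd(a, x) for x in left)),
                   frozenset(chain.from_iterable(pd(a, x) for x in right)))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True
```

For example, `equivalent(star(alt(sym("a"), sym("b"))), star(cat(star(sym("a")), star(sym("b")))), "ab")` confirms the classic identity (a+b)* = (a*b*)*.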
Lecture Notes in Computer Science, 2018
We study the computational power of parsing expression grammars (PEGs). We begin by constructing PEGs with unexpected behaviour, and surprising new examples of languages with PEGs, including the language of palindromes whose length is a power of two, and a binary-counting language. We then propose a new computational model, the scaffolding automaton, and prove that it exactly characterises the computational power of PEGs. Several consequences follow from this characterisation: (1) we show that PEGs are computationally "universal", in a certain sense, which implies the existence of a PEG for a P-complete language; (2) we show that there can be no pumping lemma for PEGs; and (3) we show that PEGs are strictly more powerful than online Turing machines which do $o(n/(\log n)^2)$ steps of computation per input symbol.
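PEGs differ from context-free grammars chiefly through committed ordered choice and the syntactic predicates & and !. As a small illustration that is not one of the paper's constructions, the Python sketch below encodes the classic PEG for the non-context-free language {aⁿbⁿcⁿ : n ≥ 1} (an example going back to Ford's original PEG paper) with hypothetical parser combinators: each parser maps (text, position) to a new position, or None on failure.

```python
# Hypothetical PEG combinators: parser(s, i) -> new position, or None on failure.
def lit(ch):
    return lambda s, i: i + 1 if s[i:i + 1] == ch else None


def seq(*parsers):
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse


def choice(*parsers):
    """Ordered choice: the first alternative that succeeds is committed to."""
    def parse(s, i):
        for p in parsers:
            j = p(s, i)
            if j is not None:
                return j
        return None
    return parse


def opt(p):
    return choice(p, lambda s, i: i)


def plus(p):
    """Greedy one-or-more repetition; matched input is never given back."""
    def parse(s, i):
        i = p(s, i)
        if i is None:
            return None
        while (j := p(s, i)) is not None:
            i = j
        return i
    return parse


def and_pred(p):
    """&p: succeed without consuming input iff p matches here."""
    return lambda s, i: i if p(s, i) is not None else None


def not_pred(p):
    """!p: succeed without consuming input iff p fails here."""
    return lambda s, i: i if p(s, i) is None else None


def any_char(s, i):
    return i + 1 if i < len(s) else None


# A <- 'a' A? 'b'    B <- 'b' B? 'c'    S <- &(A !'b') 'a'+ B !.
# The inner lambdas defer the recursive lookups of A and B until parse time.
A = seq(lit("a"), opt(lambda s, i: A(s, i)), lit("b"))
B = seq(lit("b"), opt(lambda s, i: B(s, i)), lit("c"))
S = seq(and_pred(seq(A, not_pred(lit("b")))), plus(lit("a")), B, not_pred(any_char))


def accepts(text):
    return S(text, 0) is not None
```

The lookahead &(A !'b') checks that the a-block and b-block have equal length without consuming input; B then checks the b-block against the c-block, and !. demands the end of input.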
Lecture Notes in Computer Science, 2012
Theoretical Computer Science
In coding and information theory, it is desirable to construct maximal codes that can be either variable-length codes or error control codes of fixed length. However, deciding code maximality boils down to deciding whether a given NFA is universal, and this is a hard problem (including the case of whether the NFA accepts all words of a fixed length). On the other hand, it is acceptable to know whether a code is 'approximately' maximal, which then boils down to whether a given NFA is 'approximately' universal. Here we introduce the notion of a (1 − ε)-universal automaton and present polynomial randomized approximation algorithms to test NFA universality and related hard automata problems, for certain natural probability distributions on the set of words. We also conclude that the randomization aspect is necessary, as approximate universality remains hard for any polynomially computable ε.
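One way to make "approximately universal" operational is to sample words and look for a rejected one. The Python sketch below assumes the uniform distribution on words of one fixed length (just one example of a natural distribution; the paper's distributions and algorithms may differ) and a hypothetical dict encoding of the NFA. If more than an ε fraction of the words of that length were rejected, all k = ⌈ln(1/δ)/ε⌉ samples would be accepted with probability at most (1 − ε)^k ≤ e^(−εk) ≤ δ.

```python
import math
import random


def accepts(nfa, word):
    """Simulate the NFA on word by tracking the set of reachable states."""
    current = set(nfa["initial"])
    for a in word:
        current = {q2 for q in current for q2 in nfa["delta"].get((q, a), ())}
    return bool(current & nfa["final"])


def approx_universal(nfa, alphabet, length, eps, delta, rng=random):
    """Monte Carlo universality test on uniformly random words of a fixed length.

    Returns (False, w) with a rejected witness w, or (True, None), meaning that
    with probability >= 1 - delta at most an eps fraction of those words is rejected.
    """
    samples = math.ceil(math.log(1 / delta) / eps)
    for _ in range(samples):
        w = "".join(rng.choice(alphabet) for _ in range(length))
        if not accepts(nfa, w):
            return False, w
    return True, None
```

A `False` answer is always certain (it comes with a witness); only the `True` answer is probabilistic, which is the one-sided behaviour one expects from this kind of randomized test.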
Journal of Logical and Algebraic Methods in Programming
Descriptional Complexity of Formal Systems, 2021
Although regular expressions do not correspond univocally to regular languages, it is still worthwhile to study their properties and algorithms. For the average-case analysis one often relies on uniform random generation using a specific grammar for regular expressions, which can represent regular languages with more or less redundancy. Generators that are uniform on the set of expressions are not necessarily uniform on the set of regular languages. Nevertheless, it is not obvious that asymptotic estimates obtained by considering the whole set of regular expressions differ from those obtained using a more refined set that avoids some large class of equivalent expressions. In this paper we study a set of expressions that avoid a given absorbing pattern. It is shown that, although this set is significantly smaller than the standard one, the asymptotic average estimates for the size of the Glushkov automaton for these expressions do not differ from the standard case.
Developments in Language Theory, 2017
We contribute new relations to the taxonomy of different conversions from regular expressions to equivalent finite automata. In particular, we are interested in ordinary transformations that construct automata such as the follow automaton, the partial derivative automaton, the prefix automaton, the recently introduced and studied automata based on pointed expressions, and last but not least the position, or Glushkov, automaton ($A_{\mathrm{POS}}$), together with their double-reversed construction counterparts. We deepen the understanding of these constructions and show that most of them can be captured with the artefacts used to construct the Glushkov automaton. As a byproduct we define a dual version $\overleftarrow{A}_{\mathrm{POS}}$ of the position automaton, which plays a role similar to that of $A_{\mathrm{POS}}$, but for the reverse expression. It turns out that although the conversions of regular expressions and of their reversals to finite automata seem quite similar, there are significant differences.
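The artefacts behind the position (Glushkov) automaton are the sets First, Last and Follow of positions of the linearized expression. The Python sketch below computes them in one pass over a hypothetical tuple encoding of the expression and builds the standard position automaton, with initial state 0 and one state per letter occurrence; it illustrates only the classical construction, not the paper's other constructions or its dual automaton.

```python
def glushkov(r):
    """Position automaton of a regex given as nested tuples:
    ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r).
    Returns (letters, delta, finals); state 0 is initial, positions are 1..m.
    """
    letters = {}  # position -> letter occurring there
    follow = {}   # position -> positions that may come next

    def walk(node):
        """Return (nullable, first, last) for node, filling `follow` on the way."""
        tag = node[0]
        if tag == "eps":
            return True, set(), set()
        if tag == "sym":
            p = len(letters) + 1  # positions are numbered left to right
            letters[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if tag == "star":
            _, first, last = walk(node[1])
            for p in last:  # a last position can loop back to a first one
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if tag == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:  # "cat": the left part's last positions precede the right's first
            follow[p] |= f2
        return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2

    nullable, first, last = walk(r)
    delta = {}
    for src, targets in [(0, first)] + list(follow.items()):
        for q in targets:  # homogeneous: every edge into q is labelled letters[q]
            delta.setdefault((src, letters[q]), set()).add(q)
    finals = set(last) | ({0} if nullable else set())
    return letters, delta, finals


def accepts(automaton, word):
    _, delta, finals = automaton
    current = {0}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)
```

For (a+b)*ab the automaton has m + 1 = 5 states, one per letter occurrence plus the initial state.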
We define the class of forward injective finite automata (FIFA) and study some of their properties. Each FIFA has a unique canonical representation up to isomorphism. Using this representation an enumeration is given and an efficient uniform random generator is presented. We provide a conversion algorithm from a nondeterministic finite automaton or regular expression into an equivalent FIFA. Finally, we present some experimental results comparing the size of FIFA with other automata.
We introduce the concept of an f-maximal error-detecting block code, for some parameter f between 0 and 1, in order to formalize the situation where a block code is close to maximal with respect to being error-detecting. Our motivation for this is that constructing a maximal error-detecting code is a computationally hard problem. We present a randomized algorithm that takes as input two positive integers N, ℓ, a probability value f, and a specification of the errors permitted in some application, and generates an error-detecting, or error-correcting, block code having up to N codewords of length ℓ. If the algorithm finds fewer than N codewords, then those codewords constitute a code that is f-maximal with high probability. The error specification is modelled as a (nondeterministic) transducer, which allows one to model any rational combination of substitution and synchronization errors. We also present some elements of our implementation of various error-detecting properties and t...
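A simplified Python sketch of the randomized idea, under loudly stated assumptions: the error model is fixed to substitutions measured by Hamming distance (a minimum distance of 2 detects any single substitution) instead of the paper's transducer-based specification, and the stopping rule "give up after `max_misses` consecutive random words that cannot be added" is a hypothetical stand-in for the paper's f-maximality criterion.

```python
import random


def hamming(u, v):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(u, v))


def random_block_code(n_max, length, alphabet, min_dist, max_misses, rng=random):
    """Randomized greedy construction of a block code with minimum Hamming
    distance >= min_dist (hypothetical simplification of the error model).

    Stops after n_max codewords, or after max_misses consecutive random words
    that could not be added; the latter suggests, but does not prove, that the
    code is close to maximal.
    """
    code, misses = [], 0
    while len(code) < n_max and misses < max_misses:
        w = "".join(rng.choice(alphabet) for _ in range(length))
        if all(hamming(w, c) >= min_dist for c in code):
            code.append(w)
            misses = 0
        else:
            misses += 1  # includes repeats of existing codewords (distance 0)
    return code
```

Any result is a valid single-substitution-detecting code by construction; only its closeness to maximality is probabilistic.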
J. Autom. Lang. Comb., 2017
Extended regular expressions (with complement and intersection) are used in many applications due to their succinctness. In particular, regular expressions extended with intersection only (also called semi-extended) can already be exponentially smaller than standard regular expressions or equivalent nondeterministic finite automata. For practical purposes it is important to study the average behaviour of conversions between these models. In this paper, we focus on the conversion of regular expressions with intersection to nondeterministic finite automata, using partial derivatives and the notion of support. We give a tight upper bound of $2^{O(n)}$ for the worst-case number of states of the resulting partial derivative automaton, where n is the size of the expression. Using the framework of analytic combinatorics, we establish an upper bound of $(1.056+o(1))^n$ for its asymptotic average-state complexity, which is significantly smaller than the one for the worst case. Some experimental resu...
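The usual partial-derivative rule for intersection is ∂a(α ∩ β) = {α′ ∩ β′ : α′ ∈ ∂a(α), β′ ∈ ∂a(β)}, with α ∩ β nullable iff both operands are. A minimal Python sketch, assuming a hypothetical tuple encoding, that uses these rules to decide membership (it does not implement the paper's notion of support or its size analysis):

```python
from itertools import product

EMPTY, EPS = ("empty",), ("eps",)


def sym(a):
    return ("sym", a)


def alt(r, s):
    return ("alt", r, s)


def star(r):
    return ("star", r)


def inter(r, s):
    return ("inter", r, s)


def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat" and "inter"


def pd(a, r):
    """Partial derivative of r w.r.t. a, extended to intersection."""
    tag = r[0]
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag in ("empty", "eps"):
        return set()
    if tag == "alt":
        return pd(a, r[1]) | pd(a, r[2])
    if tag == "star":
        return {cat(d, r) for d in pd(a, r[1])}
    if tag == "cat":
        out = {cat(d, r[2]) for d in pd(a, r[1])}
        return out | pd(a, r[2]) if nullable(r[1]) else out
    # "inter": both operands must consume the same symbol
    return {inter(x, y) for x, y in product(pd(a, r[1]), pd(a, r[2]))}


def accepts(r, word):
    current = {r}
    for a in word:
        current = set().union(*(pd(a, x) for x in current))
    return any(nullable(x) for x in current)
```

For instance, intersecting "contains an a" with "contains a b" over {a, b} accepts "ab" and "ba" but rejects "aaa".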
The distinguishability language of a regular language L is the set of words distinguishing between pairs of words under the Myhill-Nerode equivalence induced by L, i.e., between pairs of distinct left quotients of L. The similarity relation induced by a language L is inspired by the Myhill-Nerode equivalence and was used to obtain compact representations of automata for a finite language L, namely deterministic finite cover automata, which are deterministic finite automata accepting all the words of L and possibly some other words that are longer than any word of L. The dissimilarity language of a finite language L is defined as the set of words that separate a pair of words which are not similar with respect to L. In this paper we extend the study of the distinguishability operation on regular languages to l-dissimilarity, for l ∈ N, and the dissimilarity operation on finite languages. We examine their properties, the state complexity, and relations...
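In a complete DFA whose states are all reachable, the language accepted from each state is a left quotient of L, so a word belongs to the distinguishability language exactly when it separates some pair of states (one run ends in a final state, the other does not). The brute-force Python sketch below, with a hypothetical dict encoding of the DFA, enumerates such words up to a length bound; it only makes the definition concrete and is not the paper's construction.

```python
from itertools import product


def distinguishes(dfa, w, p, q):
    """True iff exactly one of the runs from p and from q on w ends in a final state."""
    delta, finals = dfa["delta"], dfa["final"]
    for a in w:
        p, q = delta[p, a], delta[q, a]
    return (p in finals) != (q in finals)


def distinguishability_words(dfa, alphabet, max_len):
    """Words of length <= max_len in the distinguishability language of L(dfa).

    Assumes a complete DFA in which every state is reachable, so that the
    language accepted from each state is a left quotient of L(dfa).
    """
    states = dfa["states"]
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if any(distinguishes(dfa, w, p, q)
                   for i, p in enumerate(states) for q in states[i + 1:]):
                words.append(w)
    return words


# L = a* over {a, b}: state 0 accepts, state 1 is a rejecting sink.
a_star = {"states": [0, 1], "final": {0},
          "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1}}
print(distinguishability_words(a_star, "ab", 2))  # ['', 'a', 'aa']
```

Here only the words of a* separate the two quotients L and ∅, so the distinguishability language of a* is a* itself.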
GUItar is a GPL-licensed, cross-platform graphical user interface for drawing and manipulating automata, written in C++ and Qt5. The tool supports styling, automatic layouts and export to several formats, and can interface with any external finite automata manipulation library that can parse the serialized XML or JSON it produces. In this paper we describe a redesign of the GUItar framework, especially the method used to interface GUItar with automata manipulation libraries.
Formal Methods. FM 2019 International Workshops, 2020
This talk builds on Berry’s personal professional history as it attempts to explain why formal methods are not being used to develop large-scale software-intensive computer-based systems by appealing to the Reference Model for Requirements and Specifications by Gunter, Gunter, Jackson, and Zave.
Bull. EATCS, 2015
Because of their succinctness and clear syntax, regular expressions are the common choice to represent regular languages. Deterministic finite automata are an excellent representation for testing equivalence, containment or membership, as these problems are easily solved for this model. However, minimal deterministic finite automata can be exponentially larger than the associated regular expression, while the corresponding nondeterministic finite automata can be linearly larger. The worst case of both the complexity of the conversion algorithms, and of the size of the resulting automata, are well studied. However, for practical purposes, estimates for the average case can provide much more useful information. In this paper we review recent results on the average size of automata resulting from several constructions and suggest several directions of research. Most results were obtained within the framework of analytic combinatorics.
J. Autom. Lang. Comb., 2021
There are many different constructions for converting regular expressions to finite automata. In this paper we focus on the prefix automaton, $A_{\mathrm{pre}}$, introduced by Yamamoto in 2014. We present two different methods for the construction of $A_{\mathrm{pre}}$: the first is inductive, based on a system of expression equations; the second uses an iterative function to compute the states and transitions. We establish relationships between $A_{\mathrm{pre}}$ and other constructions, such as the position automaton, the partial derivative automaton and their double-reversal (dual) counterparts. We study the average size of these constructions, both experimentally and from an analytic combinatorics point of view. Finally, we extend the construction of the prefix automaton to regular expressions with intersection and show that the relationships with the other automaton constructions also hold for these expressions.
ArXiv, 2017
Descriptional complexity is the study of the conciseness of the various models representing formal languages. The state complexity of a regular language is the number of states of the smallest finite automaton, either deterministic or nondeterministic, that recognises it. Operational state complexity is the study of the state complexity of operations over languages. In this survey, we review the state complexities of individual regularity-preserving language operations on regular and some subregular languages. Then we revisit the state complexities of combinations of individual operations. We also review methods of estimation and approximation of the state complexity of more complex combined operations.
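As a small worked example of operational state complexity, the Python sketch below builds the product DFA for the union of two unary languages, aᵏ with k divisible by 2 and aᵏ with k divisible by 3 (complete DFAs with 2 and 3 states), and counts the states of the equivalent minimal DFA with Moore-style partition refinement. The encoding and function names are hypothetical; in this instance the product bound 2 · 3 = 6 is reached.

```python
def minimal_dfa_size(alphabet, delta, initial, finals):
    """Number of states of the minimal DFA equivalent to a complete DFA.

    Drops unreachable states, then refines the partition {final, non-final}
    (Moore's algorithm) until the number of classes stops growing.
    """
    reachable, todo = {initial}, [initial]
    while todo:
        p = todo.pop()
        for a in alphabet:
            q = delta[p, a]
            if q not in reachable:
                reachable.add(q)
                todo.append(q)
    block = {p: int(p in finals) for p in reachable}
    while True:
        # A state's signature: its own class plus the classes of its successors.
        signature = {p: (block[p],) + tuple(block[delta[p, a]] for a in alphabet)
                     for p in reachable}
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_block = {p: ids[signature[p]] for p in reachable}
        if len(set(new_block.values())) == len(set(block.values())):
            return len(ids)
        block = new_block


def union_product(d1, d2, alphabet):
    """Product DFA for L(d1) ∪ L(d2); each d is (delta, initial, finals)."""
    (t1, i1, f1), (t2, i2, f2) = d1, d2
    states1 = {p for p, _ in t1}
    states2 = {q for q, _ in t2}
    delta = {((p, q), a): (t1[p, a], t2[q, a])
             for p in states1 for q in states2 for a in alphabet}
    finals = {(p, q) for p in states1 for q in states2 if p in f1 or q in f2}
    return delta, (i1, i2), finals


def mod_counter(k):
    """Unary DFA with k states accepting a^n for n divisible by k."""
    return {(i, "a"): (i + 1) % k for i in range(k)}, 0, {0}


print(minimal_dfa_size("a", *union_product(mod_counter(2), mod_counter(3), "a")))  # 6
```

Refinement is needed because the raw product could in general contain equivalent states; here none merge, because the resulting periodic language has minimal period 6.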
Theoretical Computer Science, 2021
We are interested in regular expressions that represent word relations in an alphabet-invariant way, for example, the set of all word pairs (u, v) where v is a prefix of u, independently of what the alphabet is. Current software systems for formal language objects do not have a mechanism to define such objects. Labelled graphs (transducers and automata) with alphabet-invariant and user-defined labels were considered in a recent paper. In this paper we study derivatives of regular expressions over labels (atomic objects) in some set B. These labels can be any strings, as long as the strings represent subsets of a certain monoid. We show that the number of partial derivatives of any type B regular expression is linearly bounded, and that one can define partial derivative labelled graphs, whose transition labels can be elements of another label set X, as long as X and B refer to the same monoid. We also show how to use derivatives directly to decide whether a given word is in the language of a regular expression over set specs. Set specs and pairing specs are label sets that allow one to express languages and relations over large alphabets in a natural and concise way, such that many algorithms work directly on these labels without the need to expand them to linear or quadratic size expressions.
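A minimal Python sketch of deciding membership with derivatives directly on set-like labels, assuming a hypothetical label form `("set", chars, negated)` that stands for a possibly complemented set of symbols; this is not the paper's set-spec syntax, and it uses Brzozowski (not partial) derivatives. Each label is only ever tested against one concrete input symbol, so a label such as "any symbol except a digit" is never expanded over the alphabet.

```python
# Hypothetical label encoding (not the paper's set-spec syntax):
# ("set", chars, negated) matches one symbol c iff (c in chars) != negated.
EMPTY, EPS = ("empty",), ("eps",)


def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    return ("cat", r, s)


def alt(r, s):
    if r == EMPTY:
        return s
    if s == EMPTY:
        return r
    return ("alt", r, s)


def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "set"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def deriv(c, r):
    """Brzozowski derivative of r with respect to the concrete symbol c."""
    tag = r[0]
    if tag == "set":
        _, chars, negated = r
        return EPS if (c in chars) != negated else EMPTY
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "alt":
        return alt(deriv(c, r[1]), deriv(c, r[2]))
    if tag == "star":
        return cat(deriv(c, r[1]), r)
    d = cat(deriv(c, r[1]), r[2])  # "cat"
    return alt(d, deriv(c, r[2])) if nullable(r[1]) else d


def matches(r, word):
    for c in word:
        r = deriv(c, r)
        if r == EMPTY:
            return False  # no continuation can be accepted
    return nullable(r)


DIGITS = frozenset("0123456789")
# One non-digit symbol followed by any number of digits, over an open alphabet.
r = ("cat", ("set", DIGITS, True), ("star", ("set", DIGITS, False)))
```

Because the negated label is tested symbol by symbol, `matches(r, "é")` works even though "é" was never listed anywhere.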
Information and Computation, 2017
We generalize the partial derivative automaton and the position automaton to regular expressions with shuffle, and study their state complexity in the worst case as well as in the average case. The number of states of the partial derivative automaton ($A_{\mathrm{pd}}$) is, in the worst case, at most $2^m$, where m is the number of letters in the expression. The asymptotic average is bounded by $(4/3)^m$. We define a position automaton ($A_{\mathrm{pos}}$) that is homogeneous, but in which several states can correspond to the same position, and we show that $A_{\mathrm{pd}}$ is a quotient of $A_{\mathrm{pos}}$. The number of states of the position automaton is at most $1 + m(2^m - 1)$, while the asymptotic average is no more than $m(4/3)^m$.
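The standard partial-derivative rule for shuffle lets the next symbol be consumed by either operand: ∂a(α ⧢ β) = (∂a(α) ⧢ β) ∪ (α ⧢ ∂a(β)), and α ⧢ β is nullable iff both operands are. The Python sketch below uses a hypothetical tuple encoding with the simplifications ε ⧢ β = β and α ⧢ ε = α, and counts the states of the resulting partial derivative automaton. For the shuffle of m distinct letters every subset of not-yet-read letters yields its own state, so the count reaches 2^m, the worst-case bound given in the abstract.

```python
EMPTY, EPS = ("empty",), ("eps",)


def sym(a):
    return ("sym", a)


def cat(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def shuf(r, s):
    """Shuffle, simplified with eps ⧢ s = s and r ⧢ eps = r."""
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("shuf", r, s)


def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat" and "shuf"


def pd(a, r):
    tag = r[0]
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag in ("empty", "eps"):
        return set()
    if tag == "alt":
        return pd(a, r[1]) | pd(a, r[2])
    if tag == "star":
        return {cat(d, r) for d in pd(a, r[1])}
    if tag == "cat":
        out = {cat(d, r[2]) for d in pd(a, r[1])}
        return out | pd(a, r[2]) if nullable(r[1]) else out
    # "shuf": the symbol is consumed by either the left or the right operand
    return ({shuf(d, r[2]) for d in pd(a, r[1])}
            | {shuf(r[1], d) for d in pd(a, r[2])})


def pd_automaton_size(r, alphabet):
    """Number of distinct expressions reachable from r by partial derivatives."""
    states, todo = {r}, [r]
    while todo:
        s = todo.pop()
        for a in alphabet:
            for t in pd(a, s):
                if t not in states:
                    states.add(t)
                    todo.append(t)
    return len(states)


r = shuf(shuf(sym("a"), sym("b")), sym("c"))
print(pd_automaton_size(r, "abc"))  # 8
```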
Lecture Notes in Computer Science, 2016
Positions and derivatives are two essential notions in the conversion methods from regular expressions to equivalent finite automata. Partial derivative based methods have recently been extended to regular expressions with intersection. In this paper, we present a position automaton construction for those expressions. This construction generalizes the notion of position making it compatible with intersection. The resulting automaton is homogeneous and has the partial derivative automaton as its quotient.