Marinella Sciortino - Academia.edu

Papers by Marinella Sciortino

Forbidden Factors and Fragment Assembly

Lecture Notes in Computer Science, 2002

In this paper we approach the fragment assembly problem by using the notion of minimal forbidden factors introduced in a previous paper. Denoting by M(w) the set of minimal forbidden factors of a word w, we first focus on evaluating the size of the elements of M(w) and on designing an algorithm to recover the word w from M(w). We prove that, for a word w randomly generated by a memoryless source with identical symbol probabilities, the maximal length m(w) of words in M(w) is logarithmic and that the reconstruction algorithm runs in linear time. These results have an interesting application to the fragment assembly problem, i.e., reconstructing a word w from a given set I of substrings (fragments). Indeed, under a suitable hypothesis on the set of fragments I, one can detect the elements of M(w) by looking at the minimal forbidden factors of the elements of I and then apply the reconstruction algorithm.
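As an illustration of the set M(w) mentioned in the abstract, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes the usual definition: a word is a minimal forbidden factor of w if it is not a factor of w while all of its proper factors are. The function names and the explicit `alphabet` parameter are hypothetical choices made for this example.

```python
def factors(w):
    """Return the set of all factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def minimal_forbidden_factors(w, alphabet):
    """Brute-force computation of M(w) over the given alphabet.

    A word a+u+b is minimal forbidden when a+u and u+b are factors of w
    but a+u+b is not. Letters of the alphabet that never occur in w are
    minimal forbidden factors of length 1.
    """
    fact = factors(w)
    mff = {a for a in alphabet if a not in fact}
    for u in fact:
        for a in alphabet:
            if a + u not in fact:
                continue
            for b in alphabet:
                if u + b in fact and a + u + b not in fact:
                    mff.add(a + u + b)
    return mff


print(sorted(minimal_forbidden_factors("aab", "ab")))
```

The sketch enumerates all factors, so it is quadratic in space and only suitable for short words; it is meant to make the definition concrete rather than to reflect the linear-time behaviour reported in the abstract.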

Cyclic Complexity of Words

We introduce and study a new complexity function on words, which we call cyclic complexity, that counts the number of conjugacy classes of factors of each given length. We extend the famous Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words having different slopes. More precisely, we prove that if x is a Sturmian word and y is a word having the same cyclic complexity as x, then y is Sturmian and, up to renaming letters, has the same language of factors as x.
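To make the definition concrete, here is a small sketch that counts conjugacy classes of the length-n factors of a finite word. The abstract concerns infinite words; restricting to a finite word, the function names, and the choice of the lexicographically least rotation as class representative are assumptions made for this example. It expects n to be at least 1.

```python
def canonical_rotation(u):
    """Return the lexicographically least rotation of u (a class representative)."""
    return min(u[i:] + u[:i] for i in range(len(u)))


def cyclic_complexity(w, n):
    """Count the conjugacy classes among the factors of length n of the finite word w."""
    length_n_factors = {w[i:i + n] for i in range(len(w) - n + 1)}
    return len({canonical_rotation(u) for u in length_n_factors})


print(cyclic_complexity("aabab", 3))
```

In the word aabab the factors of length 3 are aab, aba and bab; the first two are rotations of each other, so only two classes remain.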

Fundamenta Informaticae XX (2003) 1--15, IOS Press

Nondeterministic Moore automata and Brzozowski's algorithm

Moore automata represent a model that has many applications. In this paper we define a notion of coherent nondeterministic Moore automaton (NMA) and show that such a model has the same computational power as the classical deterministic Moore automaton. We also consider the problem of constructing the minimal deterministic Moore automaton equivalent to a given NMA. We propose an algorithm that is a variant of Brzozowski's minimization algorithm, in the sense that it is essentially structured as a reverse operation and a subset construction, performed twice. Moreover, we explore more general classes of NMA and analyze the applicability of the algorithm. For some of these classes the algorithm does not return the minimal equivalent deterministic automaton.
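The "reverse, then subset construction, performed twice" structure can be sketched for ordinary finite automata (acceptors). This is the classical Brzozowski scheme, not the Moore-automaton variant proposed in the paper, and the representation (sets of initial and final states plus a dictionary mapping `(state, symbol)` to a set of states) is an assumption made for this example.

```python
from collections import defaultdict


def reverse(initials, finals, delta):
    """Reverse every transition and swap initial and final states."""
    rev = defaultdict(set)
    for (p, a), targets in delta.items():
        for q in targets:
            rev[(q, a)].add(p)
    return set(finals), set(initials), dict(rev)


def determinize(initials, finals, delta, alphabet):
    """Subset construction restricted to reachable, non-empty subsets."""
    start = frozenset(initials)
    seen = {start}
    stack = [start]
    det = {}
    while stack:
        subset = stack.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            if not target:
                continue  # the dead (empty) subset is omitted
            det[(subset, a)] = {target}
            if target not in seen:
                seen.add(target)
                stack.append(target)
    det_finals = {s for s in seen if s & set(finals)}
    return {start}, det_finals, det, seen


def brzozowski(initials, finals, delta, alphabet):
    """Reverse and determinize, twice; returns (initials, finals, delta, states)."""
    i, f, d = reverse(initials, finals, delta)
    i, f, d, _ = determinize(i, f, d, alphabet)
    i, f, d = reverse(i, f, d)
    return determinize(i, f, d, alphabet)


# A 3-state DFA for "words over {a, b} ending in a"; states 0 and 2 are equivalent.
delta = {
    (0, "a"): {1}, (0, "b"): {0},
    (1, "a"): {1}, (1, "b"): {2},
    (2, "a"): {1}, (2, "b"): {2},
}
_, _, _, states = brzozowski({0}, {1}, delta, "ab")
print(len(states))
```

On this input the two equivalent states are merged and a two-state automaton results. Because empty subsets are dropped, the output is a trim automaton without a sink state.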

Circular sturmian words and Hopcroft’s algorithm

Theoretical Computer Science, 2009

In order to analyze some extremal cases of Hopcroft's algorithm, we investigate the relationships between the combinatorial properties of a circular sturmian word (x) and the run of the algorithm on the cyclic automaton A_x associated to (x). The combinatorial properties of words taken into account make use of sturmian morphisms and give rise to the notion of reduction tree of a circular sturmian word. We prove that the shape of this tree uniquely characterizes the word itself. The properties of the run of Hopcroft's algorithm are expressed in terms of the derivation tree of the automaton, which is a tree that represents the refinement process that, in the execution of Hopcroft's algorithm, leads to the coarsest congruence of the automaton. We prove that the shape of the reduction tree of a circular sturmian word (x) coincides with that of the derivation tree T(A_x) of the automaton A_x. From this we derive a recursive formula to compute the running time of Hopcroft's algorithm on the automaton A_x, expressed in terms of parameters of the reduction tree of (x). As a special application, we obtain the time complexity Θ(n log n) of the algorithm in the case of automata associated to Fibonacci words.

Lightweight LCP construction for next-generation sequencing datasets

Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

Lecture Notes in Computer Science, 2003

... Theorem 3 is an assessment of the fact that the Burrows-Wheeler transform combined with Combinatorial Dependency is a general boosting method for a base compressor C. In Sections 5 and 6, we outline the compression algorithms HC and RHC that we use for boosting in ...

Suffix array and Lyndon factorization of a text

Journal of Discrete Algorithms, 2014

Sorting Conjugates and Suffixes of Words in a Multiset

International Journal of Foundations of Computer Science, 2014

The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words

Lecture Notes in Computer Science, 2013

An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression

Lecture Notes in Computer Science, 2005

... Recall that the Burrows Wheeler transform has been introduced as a tool to preprocess a word to be compressed, in order to get a word easier to compress. ... We denote by C a compressor that uses the transformation E as preprocessing. ...
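For readers unfamiliar with the preprocessing step recalled in this excerpt, here is a minimal sketch of the classical Burrows-Wheeler transform of a single non-empty word, computed by sorting its rotations. It is not the extended transformation E studied in the paper; the function name and the convention of returning the row index of the original word are assumptions made for this example.

```python
def bwt(w):
    """Classical BWT of a non-empty word: last column of the sorted rotations.

    Returns the transformed word together with the row index of w itself,
    which is needed to invert the transform.
    """
    n = len(w)
    # Sort rotation start positions by the rotation they produce.
    order = sorted(range(n), key=lambda i: w[i:] + w[:i])
    # The last letter of the rotation starting at i is w[i - 1].
    last_column = "".join(w[i - 1] for i in order)
    return last_column, order.index(0)


print(bwt("banana"))
```

The output groups equal letters together (here nnbaaa), which is what makes the transformed word easier to compress. Sorting full rotations is quadratic in the worst case and is used here only for clarity.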

Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological Data Mining

Techniques, Approaches and Applications, 2010

Lightweight LCP Construction for Next-Generation Sequencing Datasets

Lecture Notes in Computer Science, 2012

Indexing Structures for Approximate String Matching

Lecture Notes in Computer Science, 2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing with respect to the Hamming distance that have the following properties. i) Their size is, on average, linear times a polylog of the size of the text. ii) For each pattern x, the time spent by our algorithms to find the list occ(x) of all occurrences of x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

Prague Stringology Conference 2013

Epichristoffel Words and Minimization of Moore Automata

Sorting suffixes of a text via its Lyndon Factorization

Suffixes, Conjugates and Lyndon Words

Lecture Notes in Computer Science, 2013

Suffix Automata and Standard Sturmian Words

Lecture Notes in Computer Science, 2007

Blumer et al. showed (cf. [3,2]) that the suffix automaton of a word w must have at least |w| + 1 states and at most 2|w| − 1 states. In this paper we characterize the language L of all binary words w whose minimal suffix automaton S(w) has exactly |w| + 1 states; they are precisely all prefixes of standard Sturmian words. In particular, we give ...
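The state count in this abstract can be checked on small examples with a brute-force sketch. It relies on the standard fact that the states of the minimal suffix automaton of w correspond to the distinct sets of end positions of the factors of w (the empty word ends at every position 0..|w|). The function name is hypothetical, and this is a cubic-time check for short words, not the construction used by Blumer et al. or in the paper.

```python
def suffix_automaton_state_count(w):
    """Count states of the minimal suffix automaton of w via end-position sets."""
    n = len(w)
    classes = {frozenset(range(n + 1))}  # class of the empty word
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = w[i:j]
            # All positions at which an occurrence of u ends.
            ends = frozenset(
                k + len(u)
                for k in range(n - len(u) + 1)
                if w.startswith(u, k)
            )
            classes.add(ends)
    return len(classes)


for word in ("abaab", "aabb"):
    print(word, len(word) + 1, suffix_automaton_state_count(word))
```

The word abaab is a finite Fibonacci word, hence a prefix of a standard Sturmian word, and it attains the lower bound |w| + 1 = 6. The word aabb contains both aa and bb, so it is not such a prefix, and its automaton needs 6 states rather than 5.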

Universal lyndon words

A word w over an alphabet Σ is a Lyndon word if there exists an order defined on Σ for which w is lexicographically smaller than all of its conjugates (other than itself). We introduce and study universal Lyndon words, which are words over an n-letter alphabet that have length n! and such that all the conjugates are Lyndon words. We show that universal Lyndon words exist for every n and exhibit combinatorial and structural properties of these words. We then define particular prefix codes, which we call Hamiltonian lex-codes, and show that every Hamiltonian lex-code is in bijection with the set of the shortest unrepeated prefixes of the conjugates of a universal Lyndon word. This allows us to give an algorithm for constructing all the universal Lyndon words.
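The two definitions in this abstract can be tested directly on short words with the following brute-force sketch. It tries every order on the letters, so it is only practical for small alphabets, and it is not the construction algorithm of the paper. The function names are hypothetical, and the alphabet is assumed to consist exactly of the letters occurring in w.

```python
from itertools import permutations
from math import factorial


def is_lyndon_for_order(w, order):
    """True if w is strictly smaller than each of its non-trivial rotations under `order`."""
    rank = {c: i for i, c in enumerate(order)}
    key = [rank[c] for c in w]
    return all(key < key[i:] + key[:i] for i in range(1, len(w)))


def is_lyndon_for_some_order(w):
    """True if some order on the letters of w makes w a Lyndon word."""
    letters = sorted(set(w))
    return any(is_lyndon_for_order(w, p) for p in permutations(letters))


def is_universal_lyndon(w):
    """True if |w| = n! for n distinct letters and every conjugate is Lyndon for some order."""
    n = len(set(w))
    if len(w) != factorial(n):
        return False
    return all(is_lyndon_for_some_order(w[i:] + w[:i]) for i in range(len(w)))


print(is_universal_lyndon("ab"))
```

For n = 2 the word ab qualifies: ab is Lyndon for the order a < b, and its conjugate ba is Lyndon for the order b < a.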
