Periodicity algorithms and a conjecture on overlaps in partial words

Local periods and binary partial words: an algorithm

Theoretical Computer Science, 2004

The study of the combinatorial properties of strings of symbols from a finite alphabet (also referred to as words) is profoundly connected to numerous fields such as biology, computer science, mathematics, and physics. Research in combinatorics on words goes back roughly a century, and interest in the area has been renewed by emerging application areas such as molecular biology. Partial words were recently introduced in this context. The motivation behind the notion of a partial word is the comparison of genes (or proteins): aligning two genes (or two proteins) can be viewed as constructing partial words that are said to be compatible. While a word can be described by a total function, a partial word can be described by a partial function. More precisely, a partial word of length n over a finite alphabet A is a partial function from {1, ..., n} into A. Elements of {1, ..., n} without an image are called holes, and a word is just a partial word without holes. The notion of period of a word is central in combinatorics on words. For partial words there are two notions: period and local period. This paper extends to partial words with one hole the well-known result of Guibas and Odlyzko, which states that for every word u there exists a word v of the same length as u over the alphabet {0, 1} such that the set of all periods of u coincides with the set of all periods of v. Our result states that for every partial word u with one hole, there exists a partial word v of the same length as u, with at most one hole, over the alphabet {0, 1} such that the set of all periods of u coincides with the set of all periods of v and the set of all local periods of u coincides with the set of all local periods of v.
To prove our result, we use the technique of Halava, Harju and Ilie which they used

* This material is based upon work supported by the National Science Foundation under Grants CCR-9700228 and CCR-0207673. A Research Assignment from the University of North Carolina at Greensboro is gratefully acknowledged. I thank Phuongchi Thi Le for very valuable comments and suggestions. She received a research assistantship from the University of North Carolina at Greensboro to work with me on this project.
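To make the two notions concrete, here is a minimal brute-force sketch. It assumes holes are written as `?` (a hypothetical convention) and uses the usual definitions: p is a period if letters at positions congruent mod p agree wherever both are defined, and a local period if u[i] and u[i + p] agree whenever both are defined. The helper `binary_match` (also hypothetical) simply searches all binary partial words with at most one hole for one with the same period and local-period sets. It illustrates the statement of the result, not the paper's construction, and runs in exponential time.

```python
from itertools import product
from typing import Optional

HOLE = "?"  # hypothetical hole symbol chosen for this sketch


def strong_periods(u: str) -> list[int]:
    """Periods of u: letters at positions congruent mod p agree wherever defined."""
    n = len(u)
    return [
        p for p in range(1, n + 1)
        if all(len({u[i] for i in range(r, n, p) if u[i] != HOLE}) <= 1
               for r in range(p))
    ]


def local_periods(u: str) -> list[int]:
    """Local periods of u: u[i] and u[i + p] agree whenever both are defined."""
    n = len(u)
    return [
        p for p in range(1, n + 1)
        if all(u[i] == u[i + p] for i in range(n - p)
               if u[i] != HOLE and u[i + p] != HOLE)
    ]


def binary_match(u: str, max_holes: int = 1) -> Optional[str]:
    """Brute force: a partial word over {0, 1} with at most max_holes holes whose
    periods and local periods coincide with those of u (exponential time)."""
    target = (strong_periods(u), local_periods(u))
    for letters in product("01" + HOLE, repeat=len(u)):
        v = "".join(letters)
        if v.count(HOLE) <= max_holes and (strong_periods(v), local_periods(v)) == target:
            return v
    return None


print(strong_periods("a?b"), local_periods("a?b"))  # [3] [1, 3]
```

The example `a?b` shows why the two notions differ: the hole separates the only two letters, so 1 is a local period, but 1 is not a period because positions 0 and 2 carry different letters. Calling `binary_match("ab?ca")` returns a binary partial word with the same sets, as the result predicts.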

Periodic-like words, periodicity, and boxes

Acta Informatica, 2001

We introduce the notion of periodic-like word: a word whose longest repeated prefix is not right special. Several characterizations of this concept are given. In particular, we show that a word w is periodic-like if and only if it has a period not larger than |w| − R_w, where R_w is the least non-negative integer such that no prefix of w of length ≥ R_w is right special. We derive that if a word w has two periods p, q ≤ |w| − R_w, then the greatest common divisor of p and q is also a period of w. This result is, in fact, an improvement of the theorem of Fine and Wilf. We also prove that the minimal period of a word w is equal to the sum of the minimal periods of its components in a suitable canonical decomposition into periodic-like subwords. Moreover, we characterize periodic-like words having the same set of proper boxes in terms of the important notion of root-conjugacy. Finally, some new uniqueness conditions for words, related to the maximal box theorem, are given.
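The characterization "periodic-like if and only if some period is at most |w| − R_w" can be checked by brute force on short words. The sketch below assumes the standard reading of the terms: a factor u is right special in w if it is followed by two distinct letters somewhere in w, and a prefix is repeated if it occurs at least twice (overlaps allowed). All function names are hypothetical, and the code is quadratic-to-cubic, meant only for experimenting with small examples.

```python
from itertools import product


def occurrences(w: str, u: str) -> list[int]:
    """Start positions of (possibly overlapping) occurrences of u in w."""
    return [i for i in range(len(w) - len(u) + 1) if w.startswith(u, i)]


def is_right_special(w: str, u: str) -> bool:
    """u is right special in w if two distinct letters follow occurrences of u."""
    followers = {w[i + len(u)] for i in occurrences(w, u) if i + len(u) < len(w)}
    return len(followers) >= 2


def r_w(w: str) -> int:
    """Least R >= 0 such that no prefix of w of length >= R is right special."""
    special = [k for k in range(len(w) + 1) if is_right_special(w, w[:k])]
    return max(special) + 1 if special else 0


def min_period(w: str) -> int:
    """Smallest p >= 1 with w[i] == w[i + p] for all valid i."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[i] == w[i + p] for i in range(n - p)))


def is_periodic_like(w: str) -> bool:
    """Definition: the longest repeated prefix of the nonempty word w is not right special."""
    k = max(k for k in range(len(w) + 1) if len(occurrences(w, w[:k])) >= 2)
    return not is_right_special(w, w[:k])


# Brute-force check of the stated characterization on short binary words.
for n in range(1, 9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert is_periodic_like(w) == (min_period(w) <= n - r_w(w))

print(is_periodic_like("abaab"), r_w("abaab"), min_period("abaab"))  # True 2 3
print(is_periodic_like("aab"), r_w("aab"), min_period("aab"))        # False 2 3
```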

Computing Weak Periods of Partial Words (Extended Abstract)

Fine and Wilf's well-known theorem states that any word having periods p, q and length at least p + q − gcd(p, q) also has gcd(p, q), the greatest common divisor of p and q, as a period. Moreover, the length p + q − gcd(p, q) is critical, since counterexamples can be provided for shorter words. This result has since been extended to partial words, that is, finite sequences that may contain a number of "do not know" symbols or "holes." More precisely, any partial word u with H holes having weak periods p, q and length at least the bound l_H(p, q) also has strong period gcd(p, q), provided u is not (H,(p, q))-special. This extension was done for one hole by Berstel and Boasson (where the class of (1,(p, q))-special partial words is empty), for two or three holes by Blanchet-Sadri and Hegstrom, and for an arbitrary number of holes by Blanchet-Sadri. In this paper, we further extend these results, allowing an arbitrary number of weak periods. In addition to speciality, the concepts of intractable period sets and interference between periods play a role.

* This material is based upon work supported by the National Science Foundation under Grant No. DMS-0452020. We thank the referees of a preliminary version of this paper for their very valuable comments and suggestions.
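The critical length in Fine and Wilf's theorem can be observed directly on full words (no holes). The sketch below, with hypothetical helper names, searches binary words of a given length for one that has periods p and q but not gcd(p, q). At length p + q − gcd(p, q) the search comes back empty; one letter shorter it can succeed. This is an exhaustive check on small cases, not a proof.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    """p is a period of the full word w if w[i] == w[i + p] for all valid i."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def counterexample(p: int, q: int, n: int, alphabet: str = "ab"):
    """Return a word of length n with periods p and q but not gcd(p, q), or None."""
    d = gcd(p, q)
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        if has_period(w, p) and has_period(w, q) and not has_period(w, d):
            return w
    return None


print(counterexample(2, 3, 3))  # aba   (length p + q - gcd - 1 = 3)
print(counterexample(2, 3, 4))  # None  (length p + q - gcd = 4)
```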

Repetitions in strings: algorithms and combinatorics

The article is an overview of basic issues related to repetitions in strings, concentrating on algorithmic and combinatorial aspects. This area is important from both a theoretical and a practical point of view. Repetitions are highly periodic factors (substrings) in strings and are related to periodicities, regularities, and compression. The repetitive structure of strings leads to higher compression rates, and conversely, some compression techniques are at the core of fast algorithms for detecting repetitions. There are several types of repetitions in strings: squares, cubes, and maximal repetitions, also called runs. For these repetitions, we distinguish between the factors (sometimes qualified as distinct) and their occurrences (also called positioned factors). The combinatorics of repetitions is a very intricate area, full of open problems. For example, we know that the number of (distinct) primitively-rooted squares in a string of length n is no more than 2n − Θ(log n), conjectured to be n, and that their number of occurrences can be Θ(n log n). Similarly, we know that there are at most 1.029n and at least 0.944n maximal repetitions, and the conjecture is again that the exact bound is n. We know almost everything about the repetitions in Sturmian words, but despite the simplicity of these words, the results are nontrivial. One of the main motivations for writing this text is the development, during the last couple of years, of new techniques and results about repetitions. We report both the progress that has been achieved and the progress we expect to happen.
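The distinction between distinct squares and their occurrences is easy to see with a naive enumerator. In this sketch (function names are hypothetical), a square is a factor uu, and u is primitive when it is not a proper power; primitivity is tested with the classic fact that u is primitive iff its first occurrence in uu after position 0 is at position |u|. The enumeration is quadratic and only meant for small strings.

```python
def is_primitive(u: str) -> bool:
    """A nonempty word is primitive (not a proper power) iff its first
    occurrence in u + u after position 0 is at position len(u)."""
    return (u + u).find(u, 1) == len(u)


def primitively_rooted_squares(w: str):
    """Return (set of distinct squares uu with u primitive, number of occurrences)."""
    distinct, occurrences = set(), 0
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            u = w[i:i + half]
            if w[i + half:i + 2 * half] == u and is_primitive(u):
                distinct.add(u + u)
                occurrences += 1
    return distinct, occurrences


print(primitively_rooted_squares("aaaa"))  # ({'aa'}, 3)
```

In `aaaa` the factor `aaaa` is also a square, but its root `aa` is not primitive, so only `aa` is counted, once as a distinct square and three times as an occurrence.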

Relationally Periodic Sequences and Subword Complexity

Lecture Notes in Computer Science, 2008

By the famous theorem of Morse and Hedlund, an infinite word is ultimately periodic if and only if it has bounded subword complexity, i.e., for sufficiently large n, the number of its factors of length n is constant. In this paper we consider relational periods and relationally periodic sequences, where the relation is a similarity relation on words induced by a compatibility relation on letters. We investigate what a suitable definition of a relational subword complexity function would be, such that it yields a Morse and Hedlund-like theorem for relationally periodic words. We consider strong and weak relational periods and two candidates for subword complexity functions.
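The sketch below illustrates the two ingredients on finite words. `complexity` counts distinct factors of length n, as in the Morse and Hedlund setting. The relational checks assume the following reading (our assumption, not taken from the paper): p is a weak relational period if w[i] is related to w[i + p] for every i, and a strong relational period if any two positions congruent mod p carry related letters. `make_relation` is a hypothetical helper building a reflexive, symmetric compatibility relation on letters.

```python
def complexity(w: str, n: int) -> int:
    """Subword complexity of a finite word: number of distinct factors of length n."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


def make_relation(pairs):
    """Hypothetical helper: reflexive and symmetric closure of the given letter pairs,
    returned as a compatibility predicate on letters."""
    related = set(pairs) | {(b, a) for a, b in pairs}
    return lambda x, y: x == y or (x, y) in related


def is_weak_relational_period(w: str, p: int, rel) -> bool:
    """w[i] is related to w[i + p] for every valid i."""
    return all(rel(w[i], w[i + p]) for i in range(len(w) - p))


def is_strong_relational_period(w: str, p: int, rel) -> bool:
    """Any two positions congruent mod p carry related letters."""
    return all(rel(w[i], w[j])
               for i in range(len(w)) for j in range(i + p, len(w), p))


fib = "abaababaabaab"  # a prefix of the Fibonacci word
print([complexity(fib, n) for n in range(1, 5)])  # [2, 3, 4, 5]

# 'c' is compatible with both 'a' and 'b', which are not compatible with each other.
rel = make_relation([("a", "c"), ("b", "c")])
print(is_weak_relational_period("acb", 1, rel))    # True
print(is_strong_relational_period("acb", 1, rel))  # False
```

Because the relation is not transitive, a weak relational period need not be a strong one, which is why the two notions are treated separately.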

On the maximal number of highly periodic runs in a string

A run is a maximal occurrence of a repetition v with a period p such that 2p ≤ |v|. The maximal number of runs in a string of length n has been studied by several authors and is known to lie between 0.944n and 1.029n. We investigate highly periodic runs, in which the shortest period p satisfies 3p ≤ |v|. We show the upper bound 0.5n on the maximal number of such runs in a string of length n and construct a sequence of words for which we obtain the lower bound 0.406n.
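Following the definition above, runs can be listed naively: for each candidate period p, scan for maximal blocks where w[k] == w[k + p], keep those of length at least 2p, and retain only the ones whose shortest period is exactly p. This is a quadratic sketch with hypothetical names, not one of the linear-time run-computation algorithms; `highly_periodic_runs` keeps the runs with 3p ≤ |v|.

```python
def min_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for all valid i."""
    n = len(s)
    return next(p for p in range(1, n + 1)
                if all(s[i] == s[i + p] for i in range(n - p)))


def runs(w: str) -> list[tuple[int, int, int]]:
    """Brute-force list of runs as (start, end, shortest period), end exclusive."""
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if w[k] != w[k + p]:
                k += 1
                continue
            start = k
            while k < n - p and w[k] == w[k + p]:
                k += 1
            end = k + p  # w[start:end] is a maximal factor with period p
            if end - start >= 2 * p and min_period(w[start:end]) == p:
                found.add((start, end, p))
    return sorted(found)


def highly_periodic_runs(w: str) -> list[tuple[int, int, int]]:
    """Runs whose shortest period p satisfies 3p <= length."""
    return [(i, j, p) for (i, j, p) in runs(w) if 3 * p <= j - i]


print(runs("aabaab"))                # [(0, 2, 1), (0, 6, 3), (3, 5, 1)]
print(highly_periodic_runs("aaab"))  # [(0, 3, 1)]
```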

Partial words and a theorem of Fine and Wilf revisited

Theoretical Computer Science, 2002

A word of length n over a finite alphabet A is a map from {0, ..., n − 1} into A. A partial word of length n over A is a partial map from {0, ..., n − 1} into A. In the latter case, elements of {0, ..., n − 1} without an image are called holes (a word is just a partial word without holes). In this paper, we extend a fundamental periodicity result on words due to Fine and Wilf to partial words with two or three holes. This study was initiated by Berstel and Boasson for partial words with one hole. Partial words are motivated by molecular biology.
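For the one-hole case, the bound commonly cited for Berstel and Boasson's result is |u| ≥ p + q: a partial word with one hole having weak periods p and q then has (strong) period gcd(p, q). The sketch below checks this exhaustively for small p < q over a binary alphabet; it assumes `?` as the hole symbol and uses hypothetical function names. The assumption p < q matters: for p = q the word `a?b` has weak period 1 but not period 1.

```python
from itertools import product
from math import gcd

HOLE = "?"  # hypothetical hole symbol chosen for this sketch


def is_weak_period(u: str, p: int) -> bool:
    """u[i] and u[i + p] agree whenever both are defined."""
    return all(u[i] == u[i + p] for i in range(len(u) - p)
               if HOLE not in (u[i], u[i + p]))


def is_strong_period(u: str, p: int) -> bool:
    """Defined letters at positions congruent mod p are all equal."""
    return all(len({u[i] for i in range(r, len(u), p) if u[i] != HOLE}) <= 1
               for r in range(p))


def one_hole_words(n: int, alphabet: str = "ab"):
    """All partial words of length n over the alphabet with exactly one hole."""
    for letters in product(alphabet, repeat=n - 1):
        for h in range(n):
            yield "".join(letters[:h]) + HOLE + "".join(letters[h:])


def check_one_hole(p: int, q: int, n: int) -> bool:
    """True if every one-hole word of length n with weak periods p and q
    (assuming p < q) also has strong period gcd(p, q)."""
    d = gcd(p, q)
    return all(is_strong_period(u, d) for u in one_hole_words(n)
               if is_weak_period(u, p) and is_weak_period(u, q))


print(check_one_hole(2, 3, 5))  # True
print(check_one_hole(2, 3, 4))  # False (for example ?aba)
```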

New and Efficient Approaches to the Quasiperiodic Characterisation of a String

2012

A factor u of a string y is a cover of y if every letter of y lies within some occurrence of u in y; thus every cover u is also a border (both a prefix and a suffix) of y. A string y covered by u thus generalises the idea of a repetition, that is, a string composed of exact concatenations of u. Even though a string is coverable somewhat more frequently than it is a repetition, a string that can be covered by a single u is still rare. As a result, seeking a more generally applicable and descriptive notion of cover, many articles were written on the computation of a minimum k-cover of y, that is, a minimum-cardinality set of strings of length k that collectively cover y. Unfortunately, this computation turns out to be NP-hard. Therefore, in this article, we propose new, simple, easily computed, and widely applicable notions of string covering that provide an intuitive and useful characterisation of a string and its prefixes: the enhanced cover and the enhanced cover array.
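The sketch below makes the cover definition concrete and shows one plausible reading of an enhanced cover, which is our assumption rather than the article's exact definition: the proper border of y that covers the largest number of positions, preferring the shortest border on ties. All names are hypothetical and the algorithm is naive (quadratic or worse), unlike the efficient computations the article proposes.

```python
def borders(y: str) -> list[str]:
    """Proper nonempty borders of y, shortest first."""
    return [y[:k] for k in range(1, len(y)) if y[:k] == y[len(y) - k:]]


def covered_positions(u: str, y: str) -> int:
    """Number of positions of y lying inside some occurrence of u."""
    covered = [False] * len(y)
    for i in range(len(y) - len(u) + 1):
        if y.startswith(u, i):
            for j in range(i, i + len(u)):
                covered[j] = True
    return sum(covered)


def is_cover(u: str, y: str) -> bool:
    """u covers y if every position of y lies within an occurrence of u."""
    return covered_positions(u, y) == len(y)


def enhanced_cover(y: str):
    """Assumed reading: the border covering the most positions (shortest on ties),
    returned with its covered count, or None if y has no proper border."""
    best = None
    for u in borders(y):
        c = covered_positions(u, y)
        if best is None or c > best[1]:
            best = (u, c)
    return best


print(enhanced_cover("abaaba"))  # ('aba', 6)
print(is_cover("ab", "abcab"))   # False
```

For `abcab` no border covers the whole string, yet the border `ab` still covers 4 of its 5 positions, which is the kind of partial information an enhanced cover is meant to capture.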

On periodic properties of circular words

Discrete Mathematics, 2016

The conjugacy relation defines a partition of words into equivalence classes. We call these classes circular words. Periodic properties of circular words are investigated in this article. The Periodicity Theorem of Fine and Wilf does not hold for weak periods of circular words; instead we give a strict upper bound on the length of a non-unary circular word that has two given relatively prime weak periods. Weak periods also lead to a way of representing circular words in a more compact form. We investigate in which cases these representations are unique or minimal. We also analyze weak periods of circular Thue-Morse, Fibonacci and Christoffel words.
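The partition into circular words can be computed directly on small lengths. In this sketch (hypothetical names), two words are conjugate iff they have the same length and one occurs in the square of the other, and each circular word is represented by its lexicographically least rotation. This is a naive quadratic choice of representative; linear-time least-rotation algorithms exist but are not needed here.

```python
from itertools import product


def are_conjugate(u: str, v: str) -> bool:
    """u and v are conjugate (rotations of each other) iff |u| = |v| and v occurs in uu."""
    return len(u) == len(v) and v in u + u


def canonical(w: str) -> str:
    """Lexicographically least rotation of w, a representative of its circular word."""
    return min((w[i:] + w[:i] for i in range(len(w))), default=w)


def circular_words(n: int, alphabet: str = "ab") -> set[str]:
    """Representatives of all circular words of length n over the alphabet."""
    return {canonical("".join(t)) for t in product(alphabet, repeat=n)}


print(sorted(circular_words(4)))
# ['aaaa', 'aaab', 'aabb', 'abab', 'abbb', 'bbbb']
```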

An extension of the Lyndon-Schützenberger result to pseudoperiodic words

Information and Computation, 2011

One of the particularities of information encoded as DNA strands is that a string u contains basically the same information as its Watson-Crick complement, denoted here as θ(u). Thus, any expression consisting of repetitions of u and θ(u) can be considered, in some sense, periodic. In this paper, we give a generalization of Lyndon and Schützenberger's classical result about equations of the form u^l = v^n w^m to cases where both sides involve repetitions of words as well as their complements. Our main results show that, for such extended equations, if l ≥ 5 and n, m ≥ 3, then all three words involved can be expressed in terms of a common word t and its complement θ(t). Moreover, if l ≥ 5, then n = m = 3 is an optimal bound. These results are established based on a complete characterization of all possible overlaps between two expressions that involve only some word u and its complement θ(u), which is also obtained in this paper.
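To make "expressed in terms of t and θ(t)" concrete, the sketch below models the Watson-Crick complement as the reverse complement on {A, C, G, T}. This is an antimorphic involution, which is our assumption about how θ is taken here. The helper `in_t_theta_star` (hypothetical) tests whether a word is a concatenation of copies of t and θ(t); since both have length |t|, a fixed block split suffices.

```python
# Assumption: theta is modeled as the reverse complement on {A, C, G, T}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(u: str) -> str:
    """Reverse Watson-Crick complement of a DNA string (an antimorphic involution)."""
    return "".join(COMPLEMENT[c] for c in reversed(u))


def in_t_theta_star(w: str, t: str) -> bool:
    """True if w is a concatenation of copies of t and theta(t) (t nonempty)."""
    k = len(t)
    if k == 0 or len(w) % k != 0:
        return False
    blocks = {t, theta(t)}
    return all(w[i:i + k] in blocks for i in range(0, len(w), k))


print(theta("ACG"))                     # CGT
print(in_t_theta_star("ACGTAC", "AC"))  # True  (AC . GT . AC)
print(in_t_theta_star("ACGA", "AC"))    # False
```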