On a special class of primitive words

Language-theoretic aspects of DNA complementarity

Theoretical Computer Science, 2001

The optimism about the possibilities of DNA computing is based on two central issues: the Watson-Crick complementarity and the massive parallelism of DNA strands. While the latter issue renders exhaustive searches possible and thus may settle problems previously considered intractable, the former issue is the cause behind the universality of many models of DNA computing. Moreover, complementarity can be viewed as a purely language-theoretic operation: undesirable circumstances in a string trigger a transition to the complementary string. This aspect of complementarity is investigated in the present paper, mainly from the point of view of L systems. New types of word sequences will be discovered. Sometimes the resulting decision problems are equivalent to well-known open problems from other areas.

An extension of the Lyndon-Schützenberger result to pseudoperiodic words

Information and Computation/Information and Control, 2011

One of the particularities of information encoded as DNA strands is that a string u contains basically the same information as its Watson-Crick complement, denoted here as θ(u). Thus, any expression consisting of repetitions of u and θ(u) can be considered in some sense periodic. In this paper, we give a generalization of Lyndon and Schützenberger's classical result about equations of the form $u^l = v^n w^m$ to cases where both sides involve repetitions of words as well as their complements. Our main results show that, for such extended equations, if $l \geq 5$ and $n, m \geq 3$, then all three words involved can be expressed in terms of a common word t and its complement θ(t). Moreover, if $l \geq 5$, then $n = m = 3$ is an optimal bound. These results are established based on a complete characterization of all possible overlaps between two expressions that involve only some word u and its complement θ(u), which is also obtained in this paper.
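The notion of an expression built from repetitions of u and θ(u) can be made concrete with a minimal Python sketch. It assumes that θ is the usual reverse complement on the DNA alphabet {A, C, G, T} (an antimorphic involution); the function names `theta` and `is_pseudo_repetition` are illustrative and do not come from the paper.

```python
# Assumption: theta is modeled as the reverse complement on {A, C, G, T}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Reverse the word and complement each letter (Watson-Crick complement)."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def is_pseudo_repetition(w: str, u: str) -> bool:
    """Return True if w is a nonempty concatenation of copies of u and theta(u).

    Since u and theta(u) have the same length, it is enough to cut w into
    blocks of length |u| and compare each block with u and theta(u).
    """
    n = len(u)
    if n == 0 or len(w) == 0 or len(w) % n != 0:
        return False
    blocks = {u, theta(u)}
    return all(w[i:i + n] in blocks for i in range(0, len(w), n))


if __name__ == "__main__":
    u = "AAC"
    print(theta(u))                                # reverse complement of u
    print(is_pseudo_repetition("AACGTTAAC", u))    # u, theta(u), u
    print(is_pseudo_repetition("AACGTA", u))       # second block is neither
```

A morphic variant of θ would drop the reversal and only complement each letter; the block-wise check itself would stay the same.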

DNA-words and word posets

In the paper, two variants of a combinatorial problem for the set $F_q^n$ of sequences of length $n$ over the alphabet $F_q = \{0, 1, \dots, q-1\}$ are considered, with some applications. The original problem was the following: what is the smallest $k$ such that every word $v \in F_q^n$ is uniquely determined by the set of its subwords of length up to $k$? This problem was solved by Lothaire [1]. We consider the following variant of this problem: the $n$-letter word $w = w_1 \dots w_n$ (which is called a DNA-word) is composed over an alphabet consisting of $q$ complement pairs $\{i, \bar{i} : i = 0, \dots, q-1\}$, and we denote by $w^*$ its reverse complement, i.e. $w^* = \bar{w}_n \dots \bar{w}_1$. A DNA-word $u$ is called a subword of $w$ if it is a subword of either $w$ or $w^*$. As above, we are looking for the smallest $k$. We give a simple proof for $k = n - 1$, and apply this result to determine the automorphism group of the poset of DNA-words of length at most $n$, partially ordered by the above subword relation. Furthermore, we give a sharp result $k \sim 2n/3$, which is an analogue of the former result [1].

Primitive Sets of Words

arXiv (Cornell University), 2020

Given a (finite or infinite) subset X of the free monoid A* over a finite alphabet A, the rank of X is the minimal cardinality of a set F such that X ⊆ F*. We say that a submonoid M generated by k elements of A* is k-maximal if there does not exist another submonoid generated by at most k words containing M. We call a set X ⊆ A* primitive if it is the basis of a |X|-maximal submonoid. This definition encompasses the notion of primitive word; in fact, {w} is a primitive set if and only if w is a primitive word. By definition, for any set X, there exists a primitive set Y such that X ⊆ Y*. We therefore call Y a primitive root of X. As a main result, we prove that if a set has rank 2, then it has a unique primitive root. To obtain this result, we prove that the intersection of two 2-maximal submonoids is either the empty word or a submonoid generated by one single primitive word. For a single word w, we say that the set {x, y} is a bi-root of w if w can be written as a concatenation of copies of x and y and {x, y} is a primitive set. We prove that every primitive word w has at most one bi-root {x, y} such that |x| + |y| < |w|. That is, the bi-root of a word is unique provided the word is sufficiently long with respect to the size (sum of lengths) of the root. Our results are also compared to previous approaches that investigate pseudo-repetitions, where a morphic involutive function θ is defined on A*. In this setting, the notions of θ-power, θ-primitive and θ-root are defined, and it is shown that any word has a unique θ-primitive root. This result can be obtained with our approach by showing that a word w is θ-primitive if and only if {w, θ(w)} is a primitive set.
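The classical single-word case that this definition generalizes (w is primitive when it is not a proper power of a shorter word) can be illustrated with a short Python sketch. It relies on the well-known fact that the first occurrence of w inside ww after position 0 starts at the length of its primitive root; the function names are illustrative only and do not come from the paper.

```python
def primitive_root(w: str) -> str:
    """Return the shortest word r such that w is a power of r.

    The first occurrence of w in w + w that starts after position 0 begins
    at index |r|, so the root is the prefix of w of that length.
    """
    if not w:
        raise ValueError("the empty word has no primitive root")
    p = (w + w).find(w, 1)
    return w[:p]


def is_primitive(w: str) -> bool:
    """A nonempty word is primitive exactly when it is its own primitive root."""
    return bool(w) and primitive_root(w) == w


if __name__ == "__main__":
    for word in ["abab", "aba", "aaa", "abcabcabc"]:
        print(word, primitive_root(word), is_primitive(word))
```

The sketch covers only sets of the form {w}; it does not attempt the rank-2 or bi-root computations described in the abstract.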

Watson-Crick Conjugate and Commutative Words

Lecture Notes in Computer Science, 2008

This paper is a theoretical study of notions in combinatorics of words motivated by information being encoded as DNA strands in DNA computing. We generalize the classical notions of conjugacy and commutativity of words to incorporate the notion of an involution function, a formalization of the Watson-Crick complementarity of DNA single-strands. We define and study properties of Watson-Crick conjugate and commutative words, as well as Watson-Crick palindromes. We obtain, for example, a complete characterization of the set of all words that are not Watson-Crick palindromes. Our results hold for more general functions, such as arbitrary morphic and antimorphic involutions. They generalize classical results in combinatorics of words, while formalizing concepts meaningful for DNA computing experiments.

WATSON-CRICK BORDERED WORDS AND THEIR SYNTACTIC MONOID

International Journal of Foundations of Computer Science, 2008

DNA strands that, mathematically speaking, are finite strings over the alphabet {A, G, C, T} are used in DNA computing to encode information. Due to the fact that A is Watson-Crick complementary to T and G to C, DNA single strands that are Watson-Crick complementary can bind to each other or to themselves in either intended or unintended ways. One of the structures that is usually undesirable for biocomputation, since it makes the affected DNA string unavailable for future interactions, is the hairpin: if some subsequences of a DNA single string are complementary to each other, the string will bind to itself, forming a hairpin-like structure. This paper studies a

Partial Words for DNA Coding

Lecture Notes in Computer Science, 2005

A very basic problem in all DNA computations is finding a good encoding. Apart from the fact that they must provide a solution, the strands involved should not exhibit any undesired behaviour; in particular, they should not form secondary structures. Various combinatorial properties like repetition-freeness and involution-freeness have been proposed to exclude such misbehaviour. Another option that has been considered is requiring a large Hamming distance between the codewords. We propose to consider partial words for the solution of the coding problem. They, in some sense, already include the Hamming distance in the definition of compatibility and are investigated for many combinatorial properties. Thus, they can be used to guarantee a desired distance and simultaneously other properties. As the investigations on partial words are attracting more and more attention, they might be able to provide an ever-growing toolbox for finding good DNA encodings.

Iterated morphisms with complementarity on the DNA alphabet

Watson-Crick complementarity can be viewed as a language-theoretic operation: "bad" words obtained through a generative process are replaced by their complementary ones. This idea seems particularly suitable for Lindenmayer systems. D0L systems augmented with a specific complementarity transition, Watson-Crick D0L systems, have turned out to be a most interesting model and have already been extensively studied. In the present paper, attention is focused on Watson-Crick D0L systems, where the alphabet is the original four-letter DNA alphabet {A, G, T, C}. Growth functions and various decision problems will be investigated. Previously known cases about growth functions that are not Z-rational deal with alphabets bigger than the DNA alphabet, and it has been an open problem whether similar constructions can be carried out for the DNA alphabet. The main result in this paper shows that this is indeed the case.

Transducer Descriptions of DNA Code Properties and Undecidability of Antimorphic Problems

Lecture Notes in Computer Science, 2015

This work concerns formal descriptions of DNA code properties, and builds on previous work on transducer descriptions of classic code properties and on trajectory descriptions of DNA code properties. This line of research allows us to give a property as input to an algorithm, in addition to any regular language, which can then answer questions about the language and the property. Here we define DNA code properties via transducers and show that this method is strictly more expressive than that of trajectories, without sacrificing the efficiency of deciding the satisfaction question. We also show that the maximality question can be undecidable. Our undecidability results hold not only for the fixed DNA involution but also for any fixed antimorphic permutation. Moreover, we also show the undecidability of the antimorphic version of the Post Correspondence Problem, for any fixed antimorphic permutation.

2 Basic Notions and Background Information

In this section we lay down our notation for formal languages, (anti-)morphic permutations, transducers, and language properties. We assume the reader to be familiar with the fundamental concepts of language theory; see e.g., [12,26]. Then, in Sect. 2.2 we recall the method of transducers for describing classic code properties, and in Sect. 2.3 we recall the method of trajectories for describing DNA-related properties.

2.1 Formal Languages and (Anti-)morphic Permutations

An alphabet $A$ is a finite set of letters; $A^*$ is the set of all words or strings over $A$; $\varepsilon$ denotes the empty word; and $A^+ = A^* \setminus \{\varepsilon\}$. A language $L$ over $A$ is a subset $L \subseteq A^*$; the complement $L^c$ of $L$ is the language $A^* \setminus L$. For an integer $m \in \mathbb{N}$ we let $A^{\le m}$ denote the set of words whose length is at most $m$; i.e., $A^{\le m} = \bigcup_{i \le m} A^i$. The DNA alphabet is $\Delta = \{A, C, G, T\}$. Often it is convenient to consider the generic alphabet $A_k = \{0, 1, \dots, k-1\}$ of size $k$ rather than a general alphabet; note that $A_2 \subseteq A_3 \subseteq A_4 \subseteq \cdots$. Throughout this paper we only consider alphabets with at least two letters because our investigations would become trivial over unary alphabets. Let $w \in A^*$ be a word. Unless confusion arises, by $w$ we also denote the singleton language $\{w\}$, e.g., $L \cup w$ means $L \cup \{w\}$. If $w = xyz$ for some $x, y, z \in A^*$, then $x$, $y$, and $z$ are called prefix, infix (or factor), and suffix of $w$, respectively. For a language $L \subseteq A^*$, the set $\mathrm{Pref}(L) = \{x \in A^* \mid \exists y \in A^* : xy \in L\}$ denotes the language containing all prefixes of words in $L$. If $w = a_1 a_2 \cdots a_n$ for letters $a_1, a_2, \dots, a_n \in A$, then $|w| = n$ is the length of $w$; for $b \in A$, $|w|_b = |\{i \mid a_i = b, 1 \le i \le n\}|$ is the tally of $b$ occurring in $w$; the i-th

Relative Watson-Crick Primitivity of Words

J. Autom. Lang. Comb., 2018

We introduce the concept of relative Watson-Crick primitivity of words and its generalization, the relative θ-primitivity of words, where θ is a morphic or an antimorphic involution. Similar to relatively prime integers which do not share any common factors, we call two words u and v relatively θ-primitive if they do not share a common θ-primitive root. We study some combinatorial properties of relatively θ-primitive words, as well as establish relations between each of the two words u and v and the result of some binary word operation between u and v, from this perspective.
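The definition above can be explored with a small Python sketch. It rests on two assumptions that are not taken from the paper: the θ-primitive root of a nonempty word w is computed as the shortest prefix t of w such that w is a concatenation of copies of t and θ(t), and "sharing a common θ-primitive root" is read as the two roots being equal as words. The names `watson_crick`, `theta_primitive_root` and `relatively_theta_primitive` are hypothetical.

```python
from typing import Callable

# Assumption: the Watson-Crick involution is the reverse complement on DNA words.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick(word: str) -> str:
    """Reverse the word and complement each letter."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def theta_primitive_root(w: str, theta: Callable[[str], str]) -> str:
    """Shortest prefix t of w such that every length-|t| block of w is t or theta(t).

    Assumes theta is length-preserving, as morphic and antimorphic
    involutions are.
    """
    if not w:
        raise ValueError("the empty word has no theta-primitive root")
    n = len(w)
    for d in range(1, n + 1):
        if n % d:
            continue
        t = w[:d]
        blocks = {t, theta(t)}
        if all(w[i:i + d] in blocks for i in range(0, n, d)):
            return t
    return w  # not reached: d == n always succeeds


def relatively_theta_primitive(u: str, v: str, theta: Callable[[str], str]) -> bool:
    """True if u and v have different theta-primitive roots (assumed reading)."""
    return theta_primitive_root(u, theta) != theta_primitive_root(v, theta)


if __name__ == "__main__":
    print(theta_primitive_root("ACGTAC", watson_crick))
    print(relatively_theta_primitive("ACGTAC", "ACAC", watson_crick))
    print(relatively_theta_primitive("ACGTAC", "AG", watson_crick))
```

Passing θ as a parameter keeps the sketch usable for other length-preserving morphic or antimorphic involutions, not only the Watson-Crick one.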