Experimental Algorithms Research Papers - Academia.edu

Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that (i) is very simple to implement; (ii) provides O(1)-time random access to any symbol of the original sequence; and (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H_0(S) + 1) and H_0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. The new method is applied to text compression. We also propose average-case optimal string matching algorithms.
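To make the space bound concrete, the following minimal sketch computes the zeroth-order empirical entropy H_0(S) from symbol frequencies and evaluates the leading term 2h with h = n(H_0(S) + 1). It does not implement the compression method itself, and the o(h) term is ignored. The function names `zeroth_order_entropy` and `basic_space_bound_bits` are illustrative assumptions, not names from the paper.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Return H_0(S) in bits per symbol.

    Uses the standard definition: the sum over distinct symbols c of
    (n_c / n) * log2(n / n_c), where n_c is the number of occurrences of c.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def basic_space_bound_bits(seq):
    """Leading term 2h of the stated bound, with h = n * (H_0(S) + 1).

    The lower-order o(h) term is deliberately left out of this sketch.
    """
    n = len(seq)
    h = n * (zeroth_order_entropy(seq) + 1)
    return 2 * h


if __name__ == "__main__":
    text = "abracadabra"
    print(zeroth_order_entropy(text))
    print(basic_space_bound_bits(text))
```

For a sequence with a single distinct symbol, H_0(S) is 0, so h = n and the leading term is 2n bits. The "+ 1" in h keeps the bound meaningful even for such highly compressible inputs.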

We consider the following string matching problem. Pattern $p_0 p_1 p_2 \ldots p_{m-1}$ (δ,α)-matches the text substring $t_{i_0} t_{i_1} t_{i_2} \ldots t_{i_{m-1}}$ if $|p_j - t_{i_j}| \leq \delta$ for $j \in \{0, \ldots, m-1\}$, where $0 < i_{j+1} - i_j \leq \alpha + 1$. The task is then to find all text positions $i_{m-1}$ that (δ,α)-match the pattern. For a text of length n, the best previously known algorithms for this string matching problem run in time O(nm) and in time O(n⌈mα/w⌉), where the former is based on dynamic programming and the latter on bit-parallelism with w bits in a computer word (typically 32 or 64). We improve these to take O(nδ + ⌈n/w⌉m) and O(n⌈m log(α)/w⌉) worst-case time, respectively, using bit-parallelism. On average the algorithms run in O(⌈n/w⌉⌈αδ/σ⌉ + n) and O(n) time. Our experimental results show that the algorithms work extremely well in practice. Our algorithms handle general gaps as well, which has important applications in computational biology.
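The definition of a (δ,α)-match can be illustrated with a deliberately naive dynamic-programming sketch. It is not one of the algorithms from the paper: it runs in O(nmα) time and uses no bit-parallelism. It assumes that the pattern and text are sequences of integers (for example, pitch values), and the function name `delta_alpha_match` is hypothetical.

```python
def delta_alpha_match(pattern, text, delta, alpha):
    """Return all text positions i_{m-1} at which a (delta, alpha)-match ends.

    A match aligns p_j with t_{i_j} such that |p_j - t_{i_j}| <= delta for
    every j and 0 < i_{j+1} - i_j <= alpha + 1 for consecutive positions.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return []

    # prev[i] is True when p_0 .. p_{j-1} can be matched with p_{j-1} at t_i.
    prev = [abs(pattern[0] - t) <= delta for t in text]

    for j in range(1, m):
        cur = [False] * n
        for i in range(n):
            if abs(pattern[j] - text[i]) <= delta:
                # The previous pattern symbol must sit at most alpha + 1
                # positions to the left of i, and strictly before i.
                lo = max(0, i - alpha - 1)
                cur[i] = any(prev[lo:i])
        prev = cur

    return [i for i, ok in enumerate(prev) if ok]


if __name__ == "__main__":
    # Two text symbols lie between the matched positions, so alpha = 2 is needed.
    print(delta_alpha_match([1, 5], [1, 9, 9, 5], delta=0, alpha=2))
```

Setting `alpha=0` forces consecutive text positions and `delta=0` forces exact symbol equality, so with both set to zero the sketch reduces to ordinary exact string matching.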

Robustness is a property that pervades all aspects of nature. The ability of a system to adapt to perturbations due to internal and external agents, aging, wear, or environmental changes is one of the driving forces of evolution. At the molecular level, understanding the robustness of a protein has a great impact on the in-silico design of polypeptide chains and drugs. The ability to check computationally whether a protein preserves its structure in the native state may lead to the design of new compounds that work more effectively in a living cell. Inspired by the well-known robustness analysis framework used in Electronic Design Automation, we introduce a formal definition of robustness for proteins and a dimensionless quantity, called yield, to quantify the robustness of a protein. Then, we introduce a new robustness-centered protein design algorithm called Design-For-Yield (DFY). The aim of the algorithm is to discover new conformations with a specific functionality and high yield values. We present extensive characterizations of the robustness properties of many peptides, proteins, and drugs. Finally, we apply the DFY algorithm to the Crambin protein (1CRN) and to the Oxicitin drug (DB00107). The results confirm that the algorithm is able to discover a Crambin-like protein that is 23.61% more robust than the wild type. For the Oxicitin drug, a new protein sequence and the corresponding protein structure were discovered, with an improved robustness of 3% at the global level.