Suffix Tree Research Papers - Academia.edu

- by
- Suffix Tree, Efficient Algorithm for ECG Coding, Auxiliary information, Binary Search Tree
- by Wolfgang Gerlach
- Applied Mathematics, Pure Mathematics, Sequence Analysis, Data Structure

The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.
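One way to picture the vertical partitioning mentioned above is as a split of the suffixes into groups that share a prefix, where a prefix is lengthened until its group is small enough for the available memory. The abstract does not spell out the procedure, so the following is only a minimal sketch under that assumption. The function name `vertical_partitions`, the `max_suffixes` budget, and the requirement that the text ends with a unique terminator such as `$` are all illustrative choices, not part of ERa as described here.

```python
from collections import Counter


def vertical_partitions(text, max_suffixes):
    """Group the suffixes of `text` by variable-length prefixes.

    Assumes `text` ends with a unique terminator (e.g. "$") and
    max_suffixes >= 1. Returns {prefix: number of suffixes with that prefix},
    where every count is at most `max_suffixes`.
    """
    alphabet = sorted(set(text))
    done = {}
    pending = list(alphabet)  # all pending prefixes share the same length
    while pending:
        length = len(pending[0])
        wanted = set(pending)
        counts = Counter()
        # One sequential scan of the text per round.
        for i in range(len(text) - length + 1):
            prefix = text[i:i + length]
            if prefix in wanted:
                counts[prefix] += 1
        pending = []
        for prefix, count in counts.items():
            if count <= max_suffixes:
                done[prefix] = count  # this group fits the budget
            else:
                # Too many suffixes: refine by one more symbol.
                pending.extend(prefix + ch for ch in alphabet)
    return done


partitions = vertical_partitions("banana$", max_suffixes=2)
```

Each resulting prefix identifies a subtree that could be built independently of the others; the counts sum to the length of the text because every suffix falls into exactly one group.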

- by Essam Mansour
- Data Compression, Time series analysis, Data Structure, Shared memory
- by Arlindo Oliveira
- Engineering, Algorithms, Data Structure, Mathematical Sciences
- by Lorna Love
- Applied Mathematics, Data Structure, Suffix Tree, Empirical evidence
- by Wojciech Rytter and +1
- Multidisciplinary, Data Structure, Combinatorial Problems, Text Processing
- by Massimiliano Ruocco
- Event Detection, Image Retrieval, Text Analysis, Image Annotation

Suffix trees and suffix arrays are classical data structures that are used to represent the set of suffixes of a given string, and thereby facilitate the efficient solution of various string processing problems, in particular online string searching. Here we investigate the potential of suitably adapted binary search trees as competitors in this context. The suffix binary search tree (SBST) and its balanced counterpart, the suffix AVL-tree, are conceptually simple, relatively easy to implement, and offer time and space efficiency to rival suffix trees and suffix arrays, with distinct advantages in some circumstances, for instance in cases where only a subset of the suffixes need be represented. Construction of a suffix BST can be achieved in O(L) time, where L is the path length of the tree, and in the case of a suffix AVL-tree this is O(n log n), where n is the length of the input string. Searching for an m-long substring requires O(m + l) time, where l is the length of the ...
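The basic idea of a suffix binary search tree can be sketched as a plain BST whose keys are suffix start positions, ordered by the lexicographic order of the suffixes they denote. The version below is deliberately naive: it compares suffixes by slicing, is unbalanced, and carries none of the auxiliary information that the abstract's O(L) construction and O(m + l) search bounds rely on. The names `Node`, `build_suffix_bst` and `find` are hypothetical and chosen only for this illustration.

```python
class Node:
    """BST node holding the start position of one suffix."""
    __slots__ = ("start", "left", "right")

    def __init__(self, start):
        self.start = start
        self.left = None
        self.right = None


def build_suffix_bst(text):
    """Insert the suffixes of `text` in text order; naive comparisons."""
    root = None
    for i in range(len(text)):
        node = Node(i)
        if root is None:
            root = node
            continue
        cur = root
        while True:
            # Suffixes have distinct lengths, so they are never equal.
            if text[i:] < text[cur.start:]:
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
    return root


def find(text, root, pattern):
    """Return the start of some occurrence of `pattern`, or None."""
    m = len(pattern)
    cur = root
    while cur is not None:
        window = text[cur.start:cur.start + m]
        if window == pattern:
            return cur.start
        # Suffixes that begin with `pattern` are contiguous in sorted order,
        # so one comparison tells us which side they must lie on.
        cur = cur.left if pattern < window else cur.right
    return None


text = "banana"
root = build_suffix_bst(text)
hit = find(text, root, "ana")
```

Note that `find` returns whichever matching suffix it meets first on the search path, not necessarily the leftmost occurrence in the text.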

- by Lorna Love
- Mathematics, Applied Mathematics, Computer Science, Data Structure
- by Muhammad Rafi
- Data Mining, Semantics, Computational Modeling, Clustering Algorithms

In this chapter we deal with various string manipulation problems which originate from the field of computational biology and musicology. These problems are: "approximate string matching with gaps", "inference of maximal pairs in a set of strings" and "handling of weighted sequences". We provide new upper bounds for solving these problems and for the third we propose a novel

- by Katerina Perdikuri
- Computational Biology, Data Structure, Suffix Tree, Upper Bound
- by Chris Upton
- Suffix Tree
- by Yannis Panagis
- Information Systems, Web Mining, Data and Knowledge Modeling, Work Environment

A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to the use of suffix trees in real-life applications, the current methods for constructing suffix trees do not scale for large inputs. All the existing practical algorithms perform random access to the input string, thus requiring that the input be small enough to be kept in main memory.

- by Marina Barsky
- Computer Science, Design, Data Structure, External memory algorithms
- by Matthias Petri
- Data Structure, Suffix Tree, Data storage, Query processing
- by Michael Rodeh
- Suffix Tree, Electrical And Electronic Engineering
- by Shmuel Tomi Klein
- Information Systems, Data Compression, Library and Information Studies, Suffix Tree
- by Francisco Fernandes junior
- Computer Science, Information Theory, Medicine, Statistical Significance
- by Andrew McCallum
- Reinforcement Learning, Suffix Tree, Memory Based Learning, Short Term Memory
- by Lorna Love
- Suffix Tree, Efficient Algorithm for ECG Coding, Auxiliary information, Binary Search Tree
- by Matthias Petri
- Data Structure, Suffix Tree, Data storage, Query processing
- by Mario Cannataro
- Bioinformatics, Grid Computing, Mass Spectrometry, Performance Evaluation
- by Livio Colussi
- Engineering, Data Structure, Mathematical Sciences, Suffix Tree
- by Raphael Finkel
- Plagiarism Detection, Suffix Tree, World Wide Web
- by Anindya Poddar
- Algorithms, Computational Biology, Molecular Evolution, Pattern Recognition
- by Filipo Mignosi
- Data Compression, Text, CPM, Suffix Tree
- by Marina Barsky
- Design, Database Management Systems, Suffix Tree, Random Access
- by Kashyap Dixit
- Applied Mathematics, Pure Mathematics, Sequence Analysis, Data Structure
- by Chris Upton and +2
- Suffix Tree, Random Access, Indexation
- by Hamid A Basit
- Software Maintenance, Reverse Engineering, Data Structure, Suffix Tree
- by Rajesh Pampapathi
- Cognitive Science, Machine Learning, SPAM, Data Structure
- by Alexander Bolshoy
- Bioinformatics, Genome Size, Biological Sciences, Mathematical Sciences
- by Marinella Sciortino
- Data Compression, Parameter estimation, Data Structure, Probabilistic Model Checking

We propose a new algorithm called the MCCM (Match Chaining-based cDNA Mapping) algorithm that allows mapping cDNAs to genomes efficiently and accurately, utilizing local matches called MUMs (maximal unique matches) or MRMs (maximal rare matches) obtained with suffix trees. From the MUMs (or MRMs), our algorithm selects appropriate matches which are related to the cDNA mapping. We call the selection the match chaining problem. Several O(k log k)-time algorithms are known where k is the number of the input matches, but they do not permit overlaps of the matches. We propose a new O(k log k)-time algorithm for the problem with provision for overlaps. Previously, only an O(k^2)-time algorithm existed. Furthermore, we also incorporate a restriction on the distances between matches for accurate cDNA mapping. We examine the performance of our algorithm through computational experiments using sequences of the FANTOM mouse cDNA database and the mouse genome. According to the exp...
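For orientation, the simpler non-overlapping variant of match chaining mentioned above can be solved in O(k log k) time with a sweep and a Fenwick tree over prefix maxima. The sketch below is that classical variant, not the MCCM algorithm: it permits no overlaps and applies no distance restriction. It assumes each match is a hypothetical tuple `(q_start, t_start, length)` with positive length, giving its start in the cDNA, its start in the genome, and its length, and it scores a chain by the total length of its matches. The function name `chain_score` is illustrative.

```python
from bisect import bisect_left


def chain_score(matches):
    """Maximum total length of a chain of non-overlapping, colinear matches.

    A match j may precede match i only if j ends strictly before i starts
    in both the query (q) and the target (t) coordinate.
    """
    if not matches:
        return 0
    t_ends = sorted({t + n - 1 for _, t, n in matches})
    size = len(t_ends)
    tree = [0] * (size + 1)  # Fenwick tree storing prefix maxima

    def update(pos, value):  # pos is 1-based
        while pos <= size:
            if tree[pos] < value:
                tree[pos] = value
            pos += pos & -pos

    def query(pos):  # maximum over positions 1..pos
        best = 0
        while pos > 0:
            if tree[pos] > best:
                best = tree[pos]
            pos -= pos & -pos
        return best

    k = len(matches)
    by_start = sorted(range(k), key=lambda i: matches[i][0])
    by_end = sorted(range(k), key=lambda i: matches[i][0] + matches[i][2])
    score = [0] * k
    released = 0
    best = 0
    for i in by_start:
        q, t, n = matches[i]
        # Make available every match whose query interval ends before q.
        while released < k:
            j = by_end[released]
            qj, tj, nj = matches[j]
            if qj + nj - 1 >= q:
                break
            update(bisect_left(t_ends, tj + nj - 1) + 1, score[j])
            released += 1
        # Best predecessor among matches whose target end is < t.
        score[i] = n + query(bisect_left(t_ends, t))
        best = max(best, score[i])
    return best


example = chain_score([(0, 0, 5), (6, 10, 4), (3, 3, 4), (11, 20, 3)])
```

In the example, the match starting at query position 3 overlaps the first match in the query coordinate, so the best chain uses the other three matches.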

- by Igor Kurochkin
- Suffix Tree, Wabi
- by Fangrui Ma
- Genetics, Suffix Tree, Time Complexity, System Sciences
- by mohamed ibrahim
- Applied Mathematics, Comparative Genomics, Data Structure, Suffix Tree
- by Basith T H Thekkar
- Software Maintenance, Reverse Engineering, Data Structure, Suffix Tree
- by Usman Rafi
- Data Mining, Semantics, Computational Modeling, Clustering Algorithms

Finding motifs in biological sequences is one of the most intriguing problems for string algorithm designers, because it requires dealing with approximations, which complicates the problem. Existing algorithms run in time linear in the input size. Nevertheless, the output size can be very large due to the approximation. This often makes the output unreadable, in addition to slowing down the inference itself. Since only a subset of the motifs, i.e. the maximal motifs, could be enough to give the information of all of them, in this paper we aim at removing such redundancy. We define notions of maximality that we characterize in the suffix tree data structure. Given that this data structure is used by a whole class of motif extraction tools, we show how these tools can be modified to include the maximality requirement on the fly without changing the asymptotic complexity.

- by Maria Federico
- Molecular Biology, Complexity, Theoretical Computer Science, Data Structure
- by norma herrera
- Suffix Tree, Text Indexing
- by Srinivasa Rao
- Algorithms, Suffix Tree, Search Algorithm, Indexation
- by Mohammed Zaki
- Bioinformatics, Functional Analysis, Distributed Computing, Molecular Biology
- by Wing-kai Hon
- Information Retrieval, Data Structure, Suffix Tree, Succinct Data Structures
- by Wing-kai Hon
- Data Structure, Suffix Tree, Pattern Matching, Time Complexity
- by Wing-kai Hon
- Suffix Tree, Pattern Matching, Time Complexity, Total Length

- by Amin Allam
- Data Compression, Time series analysis, Data Structure, Shared memory
- by C. Epifanio
- Data Compression, CPM, Suffix Tree, Indexation
- by Ancha Baranova and +2
- Gene expression, real time PCR, Suffix Tree, Efficient Algorithm for ECG Coding

In this paper we consider the problem of web page usage prediction in a web site by modeling users’ navigation history with weighted suffix trees. This prediction of a user’s navigation can be exploited either in an online recommendation system in a website or in a web-page cache system. The method proposed has the advantage that it demands a constant amount of

- by Evangelos Theodoridis
- Web Mining, Work Environment, Suffix Tree, World Wide Web
- by Arno Buchner
- Biological Sciences, Software, Environmental Sciences, Suffix Tree