SWAMP: Smith-Waterman using associative massive parallelism

A local sequence alignment algorithm using an associative model of parallel computation

IASTED Computational and Systems …

Local sequence alignment is widely used to discover structural, and hence functional, similarities between biological sequences. While faster heuristic methods such as BLAST and FASTA are useful for comparing a single sequence to hundreds or even thousands of sequences in genetic databases such as GenBank, EMBL, and DDBJ, this work yields pairwise alignments with high sensitivity. The heuristic methods are ideal for narrowing down the number of "good" sequences. Rigorous alignment can then be used for an in-depth comparison between the query sequence and the newly found sequence subset. A data-parallel algorithm for local sequence alignment based on the Smith-Waterman algorithm has been adapted for an associative model of parallel computation known as ASC. The algorithm finds the best local alignment in O(m + n) time using m + 1 processing elements.
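The O(m + n) bound above can be made concrete with a small sketch. It is not the ASC algorithm from the paper: it is a sequential Python illustration, with assumed scoring parameters (`match`, `mismatch`, and a linear `gap` penalty), that fills the Smith-Waterman matrix one anti-diagonal at a time. Every cell on an anti-diagonal depends only on the two previous anti-diagonals, so a machine with one processing element per row could in principle compute each anti-diagonal in a single step, giving m + n - 1 steps in total.

```python
def sw_score_by_antidiagonals(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, filled one anti-diagonal at a time.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    m, n = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Anti-diagonal d holds the cells with i + j == d.
    for d in range(2, m + n + 1):
        # The cells in this inner loop are mutually independent; on a
        # data-parallel machine they could all be updated simultaneously.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

The outer loop runs m + n - 1 times, which is where the linear step count comes from once the inner loop is executed in parallel.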

A survey of Parallel models for Sequence Alignment using Smith Waterman Algorithm

The volume of biological data is growing steeply, so there is a need for smart ways to handle and process these data to extract meaningful information about biological life. The purpose of this survey is to study various parallel models that perform sequence alignment as fast as possible, which is a major challenge for both engineers and biologists. The parallel models discussed in this paper are: implementations using associative massive parallelism, covering architectures such as associative computing, the ClearSpeed coprocessor, and the Convey Computer; parallel programming models such as MPI, OpenMP, and a hybrid of the two; implementations of alignment using systolic arrays; and, lastly, implementations using single and multiple graphics processing units (GPUs).

A Parallel Pairwise Local Sequence Alignment Algorithm

IEEE Transactions on NanoBioscience, 2000

Researchers are compelled to use heuristic-based pairwise sequence alignment tools instead of the Smith-Waterman (SW) algorithm due to space and time constraints, thereby losing a significant amount of sensitivity. Parallelization is a possible solution, though, to date, parallelization has been restricted to database searching through database fragmentation. In this paper, the power of a cluster computer is utilized for developing a parallel algorithm, RPAlign, involving, first, the detection of regions that are potentially alignable, followed by their actual alignment. RPAlign is found to reduce the running time by a factor of up to 9 and 99 when used with the basic local alignment search tool (BLAST) and SW, respectively, while keeping the sensitivity similar to that of the corresponding method. For distantly related sequences, which remain undetected by BLAST, RPAlign with SW can be used. Likewise, for megabase-scale sequences, where SW becomes computationally intractable, the proposed method can still align them reasonably fast with high sensitivity. Index Terms: Basic local alignment search tool (BLAST), message passing interface (MPI), parallel computing, Smith-Waterman (SW).

FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment

Algorithmica, 2006

Pairwise sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein sequences of length m and n, full-matrix (FM), dynamic programming alignment algorithms such as Needleman-Wunsch and Smith-Waterman take O(m × n) time and use a possibly prohibitive O(m × n) space. Hirschberg's algorithm reduces the space requirements to O(min(m, n)), but requires approximately twice the number of operations required by the FM algorithms. The Fast Linear Space Alignment (FastLSA) algorithm adapts to the amount of space available by trading space for operations. FastLSA can effectively adapt to use either linear or quadratic space, depending on the amount of available memory. Our experiments show that, in practice, due to memory caching effects, FastLSA is always as fast or faster than Hirschberg and the FM algorithms. We have also parallelized FastLSA using a simple but effective form of wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups.
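The space trade-off described above rests on a simple observation, sketched here in Python. This is neither FastLSA nor the full Hirschberg algorithm: it only computes the global (Needleman-Wunsch) alignment score while keeping two rows of the matrix, which is the building block that linear-space methods rely on. The function name and the scoring values (`match`, `mismatch`, linear `gap`) are assumptions made for illustration.

```python
def nw_score_linear_space(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using O(min(len(a), len(b))) working space.

    Only the score is returned; recovering the alignment itself in linear
    space needs the divide-and-conquer step of Hirschberg's algorithm.
    """
    # Put the shorter sequence along the row so each row is as small as possible.
    if len(b) > len(a):
        a, b = b, a
    # Row 0: aligning an empty prefix of a against j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning i characters of a against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + s,    # match or mismatch
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr  # the older row is no longer needed
    return prev[-1]
```

The time is still O(m × n); only the memory footprint shrinks, which is why extra work is needed when the alignment path, and not just its score, must be reconstructed.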

A parallel strategy for biological sequence alignment in restricted memory space

Journal of Parallel and …, 2008

The algorithm proposed by Smith and Waterman is an exact method that obtains optimal local alignments in quadratic space and time. For long sequences, quadratic complexity makes the use of this algorithm impractical. In this scenario, parallel computing is a very attractive alternative. In this paper, we propose and evaluate z-align, a parallel exact strategy based on the divergence concept to locally align long biological sequences using an affine gap function. z-align runs in limited memory space, where the amount of memory used can be defined by the user. The results collected on a cluster with 16 processors showed very good speedups for long real DNA sequences. Comparing the results obtained with z-align and BLAST makes it clear that z-align is able to produce longer and more significant alignments.
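For readers unfamiliar with affine gap functions, the following Python sketch shows the standard three-state recurrence (often attributed to Gotoh) for local alignment scores. It is not z-align and does not implement the divergence concept; the function name and scoring values are assumptions. In the convention assumed here, a gap of length k costs `gap_open + (k - 1) * gap_extend`, so opening a gap is more expensive than extending one.

```python
NEG_INF = float("-inf")

def sw_affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps, keeping two rows only."""
    n = len(b)
    h_prev = [0] * (n + 1)        # H: best score ending at the cell
    f_prev = [NEG_INF] * (n + 1)  # F: best score ending with a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (n + 1)
        f = [NEG_INF] * (n + 1)
        e = NEG_INF               # E: best score ending with a horizontal gap
        for j in range(1, n + 1):
            # Either open a new gap from H or extend the running gap.
            e = max(h[j - 1] + gap_open, e + gap_extend)
            f[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[j] = max(0, h_prev[j - 1] + s, e, f[j])
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best
```

With the default values, aligning `"AAAACCCC"` against `"AAAAGGCCCC"` bridges the two extra characters with a single gap; making gaps very expensive forces an ungapped alignment with two mismatches instead.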

An Overview of Multiple Sequence Alignment Parallel Tools

Multiple sequence alignment is a key problem in most bioinformatics applications. The last ten years have witnessed major improvements to existing multiple alignment tools and the development of new ones. Various parallel architectures have been experimented with to reach the highest level of accuracy and speed. This paper surveys the most popular tools to clarify how parallelism accelerates the processing of large biological data sets and improves alignment accuracy. It aims at guiding biologists and other scientists to the appropriate software.

Fast and Exact Sequence Alignment with the Smith-Waterman Algorithm: The SwissAlign Webserver

2013

It has been demonstrated earlier that the exact Smith-Waterman algorithm yields more accurate results than the members of the heuristic BLAST family of algorithms. Unfortunately, the Smith-Waterman algorithm is much slower than BLAST and its clones. Here we present a technique and a webserver that use the exact Smith-Waterman algorithm and are approximately as fast as the BLAST algorithm. The technique unites earlier methods of extensive preprocessing of the target sequence database with CPU-specific coding of the Smith-Waterman algorithm. The SwissAlign webserver is available at http://swissalign.pitgroup.org.

A Survey of Multiple Sequence Alignment Parallel Tools

Motivation: Multiple sequence alignment is a key problem in most bioinformatics applications, from evolutionary studies to prediction of protein structure, molecular function, intermolecular interactions, gene finding, and phylogenetic analysis. The last ten years have witnessed major improvements to existing multiple alignment tools and the development of new ones. A variety of parallel architectures, such as supercomputers, clusters, grids, clouds, and multi-core machines, have been experimented with for the purpose of reaching the highest level of accuracy and speed. Results: This paper surveys the most popular tools to clarify how parallelism accelerates the processing of large biological data sets and improves alignment accuracy. It also introduces a comparative study that aims at guiding biologists to choose the appropriate software depending on their requirements and their hardware capabilities.

A Parallel Algorithm for Multiple Biological Sequence Alignment

Lecture Notes in Computer Science, 2012

The search for a multiple sequence alignment (MSA) is a well-known problem in bioinformatics that consists of finding a sequence alignment of three or more biological sequences. In this paper, we propose a parallel iterative algorithm for the global alignment of multiple biological sequences. In this algorithm, a number of processes work independently at the same time, searching for the best MSA of a set of sequences. It uses a Longest Common Subsequence (LCS) technique to generate a first MSA. An iterative process then improves the MSA by applying a number of operators that have been implemented to produce more accurate alignments. Simulations were made using sequences from the UniProtKB protein database. A preliminary performance analysis and comparison with several common methods for MSA shows promising results. The implementation was developed on a cluster platform using the standard Message Passing Interface (MPI) library.

A Tree-Based Method of Sequence Alignment

2010

We describe an original fast algorithm for sequence alignment and its computer implementation. In the examples given here, the aligned regions lie upstream of the same gene in different genomes. An alignment is constructed by the algorithm, which uses a binary tree representation of distances between any pair of sequences from the corresponding genomes (organisms). If the binary tree is unknown, it is inferred by resolving all non-binary nodes in a given non-binary tree, which is usually known. Thus, the algorithm rapidly generates binary trees compatible with a given non-binary tree and produces the best alignment by sampling the generated tree space. The algorithm was tested with biological data and simulations.
