Parallel Processing in Sequence Matching

Parallel Computation of Gene Sequence Matching

One of the main challenges in bioinformatics today is to create a framework that efficiently compares new DNA sequence information against large existing sequence and structure databases. Optimal methods, such as the Smith-Waterman algorithm, provide more sensitive results than heuristic approaches such as the dot matrix plot, FASTA and BLAST, at the cost of increased computational complexity. FPGA implementations of Smith-Waterman exploit the intrinsic parallelism of the algorithm and achieve reductions in computation time of several orders of magnitude. In this paper we propose an implementation of the Smith-Waterman algorithm based on a linear systolic array that doubles the speed of current approaches with a minimal increase in area. The design takes the bus I/O bottleneck (i.e. PCI) into account, so the processing speed improvement remains available even when the systolic array is connected to a bus. The implementation results on Xilinx Virtex and Virtex2 FPGA ...
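To make the idea of a linear systolic array more concrete, the following is a minimal software sketch, not the design proposed in the abstract above. It assumes one processing element (PE) per query character, a database sequence streamed through the array one character per clock, and a linear gap penalty; the function name `systolic_sw_score` and the scoring values are illustrative assumptions. At clock `t`, PE `j` computes cell `(t - j, j)` of the Smith-Waterman matrix, so every active PE works on the same anti-diagonal and needs only its own previous output and its left neighbour's last two outputs.

```python
def systolic_sw_score(query, db, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed one anti-diagonal per clock.

    Hypothetical mapping: PE j holds query[j]; db streams through the array.
    Scoring parameters are illustrative, not taken from the paper.
    """
    m, n = len(query), len(db)
    prev = [0] * m   # outputs of all PEs at clock t-1
    prev2 = [0] * m  # outputs of all PEs at clock t-2
    best = 0
    for t in range(n + m - 1):          # one iteration = one clock tick
        curr = [0] * m                  # idle PEs output 0 (matrix border)
        for j in range(m):              # in hardware these run concurrently
            i = t - j                   # db character seen by PE j right now
            if 0 <= i < n:
                up = prev[j]                            # cell (i-1, j)
                left = prev[j - 1] if j > 0 else 0      # cell (i, j-1)
                diag = prev2[j - 1] if j > 0 else 0     # cell (i-1, j-1)
                s = match if query[j] == db[i] else mismatch
                curr[j] = max(0, diag + s, up + gap, left + gap)
                best = max(best, curr[j])
        prev2, prev = prev, curr        # clock edge: shift the registers
    return best


if __name__ == "__main__":
    print(systolic_sw_score("ACGT", "TTACGTTT"))
```

The sequential loop needs `len(query) * len(db)` cell updates, whereas the array finishes in `len(query) + len(db) - 1` clocks, which is where the speedup of such designs comes from under these assumptions.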

A survey of Parallel models for Sequence Alignment using Smith Waterman Algorithm

The volume of biological data is growing steeply, so smart ways of handling and processing these data are needed to extract meaningful information about biological life. The purpose of this survey is to study various parallel models that perform sequence alignment as fast as possible, which is a major challenge for both engineers and biologists. The parallel models discussed in this paper are: implementations using associative massive parallelism, covering architectures such as associative computing, the ClearSpeed coprocessor and the Convey Computer; parallel programming models such as MPI, OpenMP and a hybrid combination of both; implementations of alignment using systolic arrays; and, lastly, implementations on single and multiple graphics processing units (GPUs).

A Comparison of Computation Techniques for Dna Sequence Comparison

This project presents a comparison survey of DNA sequence comparison techniques. The techniques implemented are sequential comparison, multithreading on a single computer and multithreading using parallel processing. The project shows the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform. Tiling is an important technique for extracting parallelism. Informally, tiling consists of partitioning the iteration space into several chunks of computation called tiles (blocks) such that sequential traversal of the tiles covers the entire iteration space. The idea behind tiling is to increase the granularity of computation and decrease the amount of communication incurred between processors. This makes tiling more suitable for distributed-memory architectures, where communication startup costs are very high and frequent communication is therefore undesirable. Our work on developing a sequence comparison mechanism and software supports the identification of DNA sequences.
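The tiling idea described above can be sketched as follows. This is an illustrative example rather than the project's implementation: the function name `tiled_sw_score`, the tile size and the scoring values are assumptions, and the recurrence used is the Smith-Waterman local alignment score with a linear gap penalty. The dynamic programming matrix is split into square tiles, and tiles are visited in anti-diagonal "waves". Tiles in the same wave do not depend on each other, so each could be handed to a different processor, with only tile borders being exchanged.

```python
def tiled_sw_score(a, b, tile=4, match=2, mismatch=-1, gap=-1):
    """Local alignment score computed tile by tile (illustrative sketch)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # full matrix, kept for clarity
    tiles_i = (n + tile - 1) // tile            # number of tile rows
    tiles_j = (m + tile - 1) // tile            # number of tile columns
    best = 0
    for wave in range(tiles_i + tiles_j - 1):
        # All (ti, tj) with ti + tj == wave are mutually independent.
        for ti in range(max(0, wave - tiles_j + 1), min(tiles_i, wave + 1)):
            tj = wave - ti
            # Inside a tile, cells are filled in ordinary row-major order.
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0,
                                  H[i - 1][j - 1] + s,
                                  H[i - 1][j] + gap,
                                  H[i][j - 1] + gap)
                    best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The score does not depend on the tile size, only the schedule does.
    print([tiled_sw_score("TTACGTTT", "GGACGGG", tile=t) for t in (1, 3, 100)])
```

A larger tile means more computation per exchanged border cell, which matches the granularity argument in the abstract; the sketch keeps the whole matrix in one address space, so the communication itself is not modelled.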

Parallelisation of sequence comparison algorithms using hybridised parallel techniques

2009

The aim of this work is to speed up biological (DNA and protein) sequence comparison by using a hybrid parallelisation technique that combines different parallel methods. The Smith-Waterman algorithm is known as an optimal algorithm for sequence comparison. Unfortunately, it is considered slow due to its quadratic time complexity. Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), and Single Program Multiple Data (SPMD) methods were chosen because of their efficiency and their wide availability in inexpensive off-the-shelf machines and simple network distributed systems. Based on the results, the combined (hybrid) algorithm succeeded in reducing the overall execution time.

The implementation of bit-parallelism for DNA sequence alignment

Journal of Physics: Conference Series, 2017

Dynamic programming (DP) remains the central algorithm of biological sequence alignment. Matching score computation is its most time-consuming step. Bit-parallelism is an approximate string matching technique that transforms cell-by-cell processing of the DP matrix into word-by-word processing (groups of cells). Bit-parallelism computes the scores column-wise. Adopted from the way computer systems process words, this technique promises to reduce the time spent computing scores in the DP matrix. In this paper, we implement the bit-parallelism technique for DNA sequence alignment. Our bit-parallelism implementation takes less time for score computation but still needs improvement in the reconstruction process.
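As an illustration of column-wise, word-level processing, here is a minimal sketch of the well-known bit-vector formulation of unit-cost edit distance (Myers' algorithm in Hyyrö's variant). It is a swapped-in example of the general technique and is not claimed to be the scoring scheme or the implementation used in the paper; the function name `bitparallel_edit_distance` is an assumption. Each DP column is encoded as two bit masks holding the +1 and -1 vertical differences, and a whole column is updated with a handful of bitwise operations. Python integers are unbounded, so one "word" here is as wide as the pattern.

```python
def bitparallel_edit_distance(pattern, text):
    """Unit-cost edit distance, one DP column per text character."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1          # keeps every vector at m bits
    high = 1 << (m - 1)          # bit of the last DP row
    peq = {}                     # per-character match masks
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv = mask, 0             # vertical +1 / -1 deltas of column 0
    score = m                    # D[m][0]
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask    # horizontal +1 deltas
        mh = pv & xh                     # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = ((ph << 1) | 1) & mask      # top row grows by 1 per column
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


if __name__ == "__main__":
    print(bitparallel_edit_distance("ACGTTGCA", "ACTTGGCA"))
```

The sketch returns only the score. Recovering the alignment itself requires storing or recomputing the per-column bit vectors, which is the reconstruction step the abstract identifies as still needing improvement.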

Long DNA Sequence Comparison on Multicore Architectures

Lecture Notes in Computer Science, 2010

Biological sequence comparison is one of the most important tasks in bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high-performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long-sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study the performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads.

DNA sequence matching system based on hardware accelerators utilized efficiently in a multithreaded environment

2009

Bioinformatics has emerged as one of those sciences in which knowledge, if exploited ethically, will result in the general benefit of mankind. The volume of DNA strand data has proved to be enormous. It is imperative to employ parallel and distributed supercomputing in order to process data of such magnitude. We have deployed a scalable array of linearly connected hardware accelerators to execute the Smith-Waterman algorithm, a technique used for sequence alignment of DNA strands. We have synthesized the system on a reconfigurable platform and carried out a performance analysis of the speedup factor achieved. The system is further connected to a powerful embedded microprocessor which, in a multithreaded environment, serves as an interface to the World Wide Web. This effort aims to bring high-performance computing in this domain to the doorstep of scientists and enthusiasts alike in a cost-effective manner, thereby triggering an avalanche of discoveries and providing much-needed impetus to scientific work in this area.

Exploiting Different Levels of Parallelism In the Biological Sequence Comparison Problem

sarc-ip.org

In recent years the fast growth of the bioinformatics field has attracted the attention of computer scientists. At the same time, the exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman (SW) algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The large inherent parallelism of the application makes it ideal for architectures supporting multiple dimensions of parallelism (thread-level parallelism, TLP; data-level parallelism, DLP; instruction-level parallelism, ILP). We study how this algorithm can take advantage of different parallel machines such as the SGI Altix, IBM Power6, IBM Cell BE and MareNostrum machines. Our study includes a qualitative analysis of the parallelization opportunities and a quantification of performance in terms of speedup and execution time. These measurements are collected taking into account the specific characteristics of each architecture. As an example, our results show that a shared-memory multiprocessor (SMP) architecture like the PowerPC 970MP of the MareNostrum machine can surpass a heterogeneous multiprocessor machine like the current IBM Cell BE.

Scalable multicore architectures for long DNA sequence comparison

Concurrency and Computation: Practice and Experience, 2011

Biological sequence comparison is one of the most important tasks in bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high-performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long-sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study the performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads.