Efficient Parallelization of a Protein Sequence Comparison Algorithm on Manycore Architecture

Design and implementation of a parallel architecture for biological sequence comparison

Lecture Notes in Computer Science, 1996

New generations of scientific codes tend to mix different types of parallelism. Algorithms are defined as a set of modules, with data parallelism inside modules and task parallelism between them. With high-speed networks, tasks running in a heterogeneous computing environment can exchange data within a reasonable delay. Data-parallel tasks distributed over different parallel computers can therefore interact efficiently by reading or writing Data Parallel Objects. These objects are distributed over the physical nodes according to the mapping directives. Migrating data parallel objects from one parallel computer to another leads us to define efficient algorithms for runtime array redistribution. In this work, we paid particular attention to handling distinct source and target processor sets during redistribution and to overlapping communication with computation. Performance results on a farm of ALPHA processors are discussed.

Scalable multicore architectures for long DNA sequence comparison

Concurrency and Computation: Practice and Experience, 2011

Biological sequence comparison is one of the most important tasks in Bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads.
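The dynamic programming recurrence behind Smith-Waterman can be sketched in a few lines. The following is a minimal illustration of the textbook algorithm with a linear gap penalty, not the CellBE implementations analyzed in the paper; the function name and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local similarity).
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The full matrix makes the quadratic time and space cost visible: every pair of positions (i, j) gets one cell.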

Long DNA Sequence Comparison on Multicore Architectures

Lecture Notes in Computer Science, 2010

Biological sequence comparison is one of the most important tasks in Bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The large inherent parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth at the minimum cost. Our results show that a heterogeneous architecture is a valid alternative for executing challenging bioinformatics workloads.
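The parallelism mentioned in the abstract comes from the data dependencies of the recurrence: each cell depends only on its upper, left and upper-left neighbours, so all cells on the same anti-diagonal can be computed independently. The sketch below walks the matrix in that wavefront order. It is a sequential illustration only, with assumed scoring values and an assumed function name, and it does not reproduce the paper's CellBE code.

```python
def sw_score_wavefront(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score computed anti-diagonal by anti-diagonal.

    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Anti-diagonal d contains every cell (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)   # keeps j = d - i <= m
        i_hi = min(n, d - 1)   # keeps j = d - i >= 1
        # Cells in this inner loop read only diagonals d-1 and d-2, so the
        # iterations are independent and could be split across threads or
        # SIMD lanes.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(sw_score_wavefront("TTACGTTT", "GGACGTGG"))
```

The traversal order changes, but the result is the same as a row-by-row fill, because every cell is still computed after the three cells it depends on.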

Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++

2011 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Biological Sequence Comparison is an important operation in Bioinformatics that is often used to relate organisms. Smith and Waterman proposed an exact algorithm (SW) that compares two sequences in quadratic time and space. Due to its high computing and memory requirements, SW is usually executed on HPC platforms such as multicore clusters and CellBEs. Since HPC architectures exhibit very different hardware characteristics, porting an application between them is an error-prone, time-consuming task. BSP++ is an implementation of BSP that aims to reduce the effort needed to write parallel code. In this paper, we propose and evaluate a parallel BSP++ strategy to execute SW on multiple platforms such as MPI, OpenMP, MPI/OpenMP, CellBE and MPI/CellBE. The results obtained with real DNA sequences show that the performance of our versions is comparable to that of versions in the literature, evidencing the appropriateness and flexibility of our approach.

Exploiting Different Levels of Parallelism In the Biological Sequence Comparison Problem

sarc-ip.org

In recent years, the fast growth of the bioinformatics field has attracted the attention of computer scientists. At the same time, the exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work, we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm (SW), a dynamic programming approach that explores the similarity between a pair of sequences. The large inherent parallelism of the application makes it ideal for architectures supporting multiple dimensions of parallelism (thread-level parallelism, TLP; data-level parallelism, DLP; instruction-level parallelism, ILP). We study how this algorithm can take advantage of different parallel machines such as the SGI Altix, IBM Power6, IBM Cell BE and MareNostrum machines. Our study includes a qualitative analysis of the parallelization opportunities and a quantification of performance in terms of speedup and execution time. These measures are collected taking into account the specific characteristics of each architecture. As an example, our results show that a shared-memory multiprocessor architecture (SMP) like the PowerPC 970MP of the MareNostrum machine can surpass a heterogeneous multiprocessor machine like the current IBM Cell BE.

Parallel protein sequence matching on multicore computers

Soft Computing and Pattern Recognition ( …, 2010

STRIKE was introduced and implemented to predict protein-protein interactions, where proteins interact if they contain similar substrings of amino acids. On the yeast protein interaction literature, STRIKE was shown to improve upon the existing state-of-the-art methods for protein-protein interaction prediction. Herein, we describe the parallelization of STRIKE and its multithreaded implementation and performance enhancement on multicore systems. On large protein sequence sets, execution time was reduced from about a week for a single-threaded implementation on a serial uniprocessor machine to 1.5 days for a 16-thread implementation on one quad-core x86 machine, and down to 4.5 hours on 8 such quad-core machines. Key optimizations to the implementation are also discussed.
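The run times quoted above can be turned into speedup factors with simple arithmetic. The snippet below is only a back-of-the-envelope check: it assumes that "about a week" means exactly 7 days, which the abstract does not state, and the variable and function names are made up for the example.

```python
HOURS_PER_DAY = 24

# Run times quoted in the abstract, converted to hours.
# "About a week" is taken as exactly 7 days; this is an assumption.
serial_h = 7 * HOURS_PER_DAY          # single-threaded baseline
one_machine_h = 1.5 * HOURS_PER_DAY   # one quad-core machine
eight_machines_h = 4.5                # eight quad-core machines


def speedup(before_h, after_h):
    """Ratio of the old run time to the new run time."""
    return before_h / after_h


if __name__ == "__main__":
    print(f"serial -> 1 machine:      {speedup(serial_h, one_machine_h):.1f}x")
    print(f"serial -> 8 machines:     {speedup(serial_h, eight_machines_h):.1f}x")
    print(f"1 machine -> 8 machines:  {speedup(one_machine_h, eight_machines_h):.1f}x")
```

Under that assumption the factors come out at roughly 4.7x, 37.3x and 8.0x, so going from one to eight machines scales close to linearly in the quoted figures.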

Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture

Parallel Computing, 2011

Computer engineering is evolving at an accelerated pace, with hardware advancing towards new chip multiprocessor (CMP) architectures and supporting software moving towards new programming and abstraction paradigms, in order to obtain maximum hardware efficiency at low cost. In this context, Tilera Corporation has developed a new CMP architecture with 64 cores (tiles) called Tile64, and has launched several Peripheral Component Interconnect Express (PCIe) cards to be used and monitored from a host Personal Computer (PC). These cards can execute parallel applications written in C/C++ and compiled with the Tile-GCC compiler. We have previously demonstrated the usefulness of the Tile64 architecture for bioinformatics [S. Gálvez, D. Díaz, P. Hernández, F.J. Esteban, J.A. Caballero, G. Dorado, Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment, Bioinformatics, 26 (2010) 683-686]. We chose a bioinformatics algorithm to test this many-core Tile64 architecture because of the challenging needs of current bioinformatics: data-intensive workloads, space- and time-consuming requirements, and massive calculation. This algorithm, known as Needleman-Wunsch/Smith-Waterman (NW/SW), obtains an optimal sequence alignment in quadratic time and space, yet needs to be optimized to take full advantage of parallel computing. In this paper we redesign, implement and fine-tune this algorithm, introducing key optimizations and changes that take advantage of specific Tile64 characteristics: RISC architecture, each tile's local cache, memory word length, shared memory usage, RAM file system, inter-tile communication and job selection from a pool. The resulting algorithm, named MC64-NW/SW (for Multicore64 Needleman-Wunsch/Smith-Waterman), achieves a gain of ~1000% when compared with the same algorithm on an x86 multi-core architecture. As far as we know, our NW/SW implementation is the fastest ever published for a standalone PC when aligning a pair of sequences larger than 20 kb.
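For reference, the global (Needleman-Wunsch) variant of the recurrence can be sketched as follows. This is the generic textbook form with a linear gap penalty, not the MC64-NW/SW code; the function name and scoring values are assumptions. The sketch keeps only two rows, which is enough when only the score is needed, whereas recovering the alignment itself requires traceback information.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Unlike the local variant, scores are not clamped at zero and the result is read from the last cell, because the whole of both sequences must be aligned.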

6 Sequence Alignment Application Model For Multi And Manycore A

Exponential growth in biological sequence data, combined with the computationally intensive nature of bioinformatics applications, results in a continuously rising demand for computational power. In this paper, we propose a performance model that captures the behavior and performance scalability of HMMER, a bioinformatics application that identifies similarities between protein sequences and a protein family model. With our analytical model, the optimal master-worker ratio for any specific user scenario can be estimated. The model is evaluated and found to be accurate, with an error lower than 2%. We applied our model to a widely used heterogeneous multicore architecture, the Cell BE, using the PPE and SPEs as master and workers respectively. Experimental results show that, for the current parallelization strategy, the I/O speed of reading the database from disk and the pre-processing of the inputs are the two most limiting factors in the Cell BE case.

Many-Core Processor Bioinformatics and Next-Generation Sequencing

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 2012

The new massive DNA sequencing methods demand both computer hardware and bioinformatics software capable of handling huge amounts of data. This paper shows how many-core processors (in which each core can execute a whole operating system) can be exploited to address problems that previously required expensive supercomputers. Thus, Needleman-Wunsch/Smith-Waterman pairwise alignments will be described using long DNA sequences (>100 kb), including the implications for progressive multiple alignments. Likewise, assembly algorithms used to generate contigs in sequencing projects (therefore using short sequences) and the future of computing methods for peptide (protein) folding will also be described. Our study also integrates the latest trends in many-core processors and their applications in the field of bioinformatics.