Large-Scale Pairwise Sequence Alignments on a Large-Scale GPU Cluster

A distributed CPU-GPU framework for pairwise alignments on large-scale sequence datasets

2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, 2013

Several problems in computational biology require all-against-all pairwise comparisons of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons with this algorithm in serial becomes extremely time-consuming. The massive computational power of graphics processing units (GPUs) makes them an appealing choice for accelerating these computations. As such, CPU-GPU clusters can enable all-against-all comparisons on large datasets.
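To give a sense of scale, the number of unordered pairs among n sequences is n(n-1)/2, so 10,000 sequences already yield roughly 50 million alignments. The sketch below counts the pairs and splits them across workers. The round-robin split and the names `pair_count` and `pairs_for_worker` are illustrative assumptions, not the distribution scheme used in the paper.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_for_worker(n, rank, num_workers):
    """Yield the (i, j) index pairs assigned to one worker.

    Assumption: a static round-robin split over the pair index. A real
    CPU-GPU cluster would likely batch pairs and balance by node speed.
    """
    for k, pair in enumerate(combinations(range(n), 2)):
        if k % num_workers == rank:
            yield pair


print(pair_count(10_000))  # 49995000
print(list(pairs_for_worker(5, rank=0, num_workers=3)))
```

A static split like this ignores differences in sequence length and node speed, which is one reason a cluster framework may prefer dynamic work assignment.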

Accurate Sequence Alignment using Distributed Filtering on GPU Clusters

2011

The advent of next-generation gene sequencing machines has led to computationally intensive alignment problems that can take many hours on a modern computer. Given the rapidly growing rate at which new short sequences are produced, the large number of existing sequences, and the inaccuracies of sequencing machines, short-sequence alignment has become a major challenge in high-performance computing.

Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space

Journal of Signal Processing Systems, 2014

Several problems in computational biology require all-against-all pairwise comparisons of tens of thousands of individual biological sequences. Each such comparison can be performed with the well-known Needleman-Wunsch alignment algorithm. However, with the rapid growth of biological databases, performing all possible comparisons with this algorithm in serial becomes extremely time-consuming. The massive computational power of graphics processing units (GPUs) makes them an appealing choice for accelerating these computations. As such, CPU-GPU clusters can enable all-against-all comparisons on large datasets. In this work, we present four GPU implementations for large-scale pairwise sequence alignment: TiledDScan-mNW, DScan-mNW, RScan-mNW and LazyRScan-mNW. The proposed GPU kernels exhibit different parallelization patterns: we discuss how each parallelization strategy affects the memory accesses and the utilization of the underlying GPU hardware. We evaluate our implementations on a variety of low- and high-end GPUs with different compute capabilities. Our results show that all the proposed solutions outperform the existing open-source implementation from the Rodinia Benchmark Suite, and that LazyRScan-mNW is the preferred solution for applications that require performing the traceback operation only on a subset of the considered sequence pairs (for example, the pairs whose alignment score exceeds a predefined threshold). Finally, we discuss the integration of the proposed GPU kernels into a hybrid MPI-CUDA framework for deployment on CPU-GPU clusters. In particular, our proposed distributed design targets both homogeneous and heterogeneous clusters with nodes that differ amongst themselves in their hardware configuration.
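As a point of reference for the parallelization patterns mentioned above, here is a minimal CPU sketch of the Needleman-Wunsch score computation, visiting cells one anti-diagonal at a time. Cells on the same anti-diagonal are independent of each other, which is the usual basis for parallelizing this recurrence on a GPU. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions for illustration. The abstract does not state how any of the four kernels traverses the matrix, so this should not be read as one of them.

```python
def nw_score_antidiagonal(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, linear gaps)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against the empty string costs only gaps.
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    # Cells with i + j == d depend only on anti-diagonals d - 1 and d - 2,
    # so the inner loop could run in parallel (one thread per cell).
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    return H[n][m]


print(nw_score_antidiagonal("ACGT", "AGT"))  # 2
```

In a score-then-filter workflow, a function like this would be run on every pair, with the traceback deferred to the pairs whose score exceeds the chosen threshold.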

Accelerating Smith-Waterman Local Sequence Alignment on GPU Cluster

Proceedings of the Annual International Conference on Advances in Distributed and Parallel Computing ADPC 2010, 2010

Although highly accurate, the Smith-Waterman local sequence alignment algorithm requires a very large amount of memory and computation, which makes implementations on common computing systems less practical. In this paper, we present swGPUCluster, an implementation of the Smith-Waterman algorithm on a cluster equipped with NVIDIA GPUs (a GPU cluster). Our tests were performed on a cluster of two nodes: one node is equipped with a dual-GPU NVIDIA GeForce GTX 295 card and a Tesla C1060 card, and the other node is equipped with two dual-GPU NVIDIA GeForce GTX 295 cards. The results show a significant performance improvement over the previous best implementations, such as SWPS3 and CUDASW++. The performance of swGPUCluster increases with the length of the query sequences, from 37.328 GCUPS to 46.706 GCUPS. These results demonstrate the great computing power of graphics cards and their high applicability to bioinformatics problems.
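For readers unfamiliar with the unit, GCUPS stands for giga (billion) cell updates per second: the number of dynamic-programming matrix cells filled, divided by the run time. The sketch below shows the arithmetic. The query length, database size, and run time are made-up illustrative values, not measurements from the paper, and `gcups` is a hypothetical helper name.

```python
def gcups(query_len, db_residues, seconds):
    """Billions of DP cell updates per second.

    One query of length query_len aligned against a database holding
    db_residues residues in total fills query_len * db_residues cells.
    """
    return query_len * db_residues / seconds / 1e9


# Hypothetical run: a 1,000-residue query against 100 million database
# residues, finishing in 2.5 seconds.
print(gcups(1_000, 100_000_000, 2.5))  # 40.0
```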