Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs

DNA sequences alignment in multi-GPUs: acceleration and energy payoff

BMC Bioinformatics, 2018

Background: We present a performance per watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal pairwise alignment of huge DNA sequences in multi-GPU platforms using the exact Smith-Waterman method. Results: Our study includes acceleration factors, performance, scalability, power efficiency and energy costs. We also quantify the influence of the contents of the compared sequences, identify potential scenarios for energy savings on speculative executions, and calculate performance and energy usage differences among distinct GPU generations and models. For a sequence alignment on a chromosome-wide scale (around 2 Petacells), we are able to reduce execution times from 9.5 h on a Kepler GPU to just 2.5 h on a Pascal counterpart, with energy costs cut by 60%. Conclusions: We find GPUs to be an order of magnitude ahead in performance per watt compared to Xeon Phis. Finally, versus typical low-power devices like FPGAs, GPUs keep similar GFLOPS/W ratios in 2017 with five times faster execution.
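The relation between the reported speedup and the reported energy saving can be checked with a little arithmetic. The sketch below assumes that the 60% energy reduction refers to the same 9.5 h and 2.5 h runs and that energy equals average power times run time; the function name `implied_power_ratio` is hypothetical and is not taken from the paper.

```python
def implied_power_ratio(t_old_h: float, t_new_h: float, energy_saving: float) -> float:
    """Average-power ratio (new / old) implied by run times and an energy saving.

    Assumes energy = average power * time, so
    P_new / P_old = (E_new / E_old) * (t_old / t_new).
    """
    energy_ratio = 1.0 - energy_saving  # E_new / E_old
    return energy_ratio * t_old_h / t_new_h


# Figures quoted in the abstract: 9.5 h -> 2.5 h, energy cut by 60%.
speedup = 9.5 / 2.5
power_ratio = implied_power_ratio(9.5, 2.5, 0.60)
print(f"speedup: {speedup:.2f}x, implied average power ratio: {power_ratio:.2f}x")
```

Under these assumptions the newer GPU would draw roughly 1.5 times the average power while finishing 3.8 times sooner, which is how a faster device can still lower the total energy cost.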

Long DNA Sequence Comparison on Multicore Architectures

Lecture Notes in Computer Science, 2010

Biological sequence comparison is one of the most important tasks in Bioinformatics. Due to the growth of biological databases, sequence comparison is becoming an important challenge for high performance computing, especially when very long sequences are compared. The Smith-Waterman (SW) algorithm is an exact method based on dynamic programming to quantify local similarity between sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP and ILP). In this work, we show how long sequence comparison takes advantage of current and future multicore architectures. We analyze two different SW implementations on the CellBE and use simulation tools to study the performance scalability in a multicore architecture. We study the memory organization that delivers the maximum bandwidth with the minimum cost. Our results show that a heterogeneous architecture is a valid alternative to execute challenging bioinformatics workloads.
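As a minimal sketch of the dynamic-programming recurrence behind Smith-Waterman, the Python function below computes only the best local alignment score, using a linear gap penalty and two rows of memory. The scoring values (match +1, mismatch -1, gap -2) and the name `sw_score` are illustrative assumptions, not parameters taken from the paper.

```python
def sw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row; row 0 is all zeros
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            cell = max(0, diag, up, left)  # the 0 lets an alignment restart anywhere
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(sw_score("TTACGTTT", "GGACGGG"))  # shared substring "ACG"
```

Each cell depends only on its left, upper and upper-left neighbors, so all cells on the same anti-diagonal can be computed independently. That independence is the source of the parallelism the abstract mentions.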

MASA‐OpenCL: Parallel pruned comparison of long DNA sequences with OpenCL

Concurrency and Computation: Practice and Experience, 2018

Biological sequence comparison is often used as an auxiliary task in the analysis of genetic material. Pairwise comparison algorithms like Smith-Waterman evaluate two strings representing sequences of proteins, DNA or RNA to obtain the optimal alignment between them. Many applications have been proposed to address the sequence comparison problem, prioritizing the use of graphics cards and proprietary languages such as CUDA. In this paper, we propose and evaluate MASA-OpenCL, an OpenCL solution for comparing long DNA sequences that is based on the MASA sequence alignment framework, with pruning capability proportional to the similarity of the sequences compared. The results of MASA-OpenCL were compared to its CUDA counterpart (MASA-CUDAlign) and, in most cases, MASA-OpenCL achieved better performance. In order to better understand the behavior of MASA-OpenCL, we performed a statistical analysis considering 11 comparisons of sequences with high, medium and low similarity on 4 GPUs. As a result, we obtained a multiple linear regression model that considers (a) the sizes of the sequences, (b) the similarity between them, (c) the computational power of the GPU, and (d) the GPU memory bandwidth. We used this model to predict the performance on two other GPUs, with low error rates.

SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

BMC Systems Biology, 2018

Background: The Smith-Waterman (SW) algorithm is the best choice for searching similar regions between two DNA or protein sequences. However, it may become impracticable in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi accelerators and Field Programmable Gate Arrays (FPGAs) to speed up large-scale workloads. Results: This paper presents and evaluates SWIFOLD: a Smith-Waterman parallel Implementation on FPGA with OpenCL for Long DNA sequences. First, we evaluate its performance and resource usage for different kernel configurations. Next, we carry out a performance comparison between our tool and other state-of-the-art implementations considering three different datasets. SWIFOLD offers the best average performance for small and medium test sets, achieving a performance that is independent of input size and sequence similarity. In addition, SWIFOLD provides competitive performance rates in comparison with GPU-based implementations on the latest GPU generation for the large dataset. Conclusions: The results suggest that SWIFOLD can be a serious contender for accelerating the SW alignment of DNA sequences of unrestricted size in an affordable way, reaching 125 GCUPS on average and a peak of almost 270 GCUPS.
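GCUPS stands for billions of dynamic-programming cells updated per second. The helper below is a small sketch of how that metric is conventionally derived from the two sequence lengths and the run time; the name `gcups` and the input values in the usage line are hypothetical and are not measurements from the paper.

```python
def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Billions of DP cells updated per second for a len_a x len_b comparison."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = len_a * len_b  # one DP cell per pair of positions
    return cells / seconds / 1e9


# Hypothetical inputs: two sequences of one million bases compared in 8 seconds.
print(gcups(1_000_000, 1_000_000, 8.0))
```

Because the cell count is the product of the two lengths, the metric is independent of problem size and allows runs on different datasets and devices to be compared.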

CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters

IEEE Transactions on Parallel and Distributed Systems, 2016

This paper proposes and evaluates CUDAlign 4.0, a parallel strategy to obtain the optimal alignment of huge DNA sequences in multi-GPU platforms, using the exact Smith-Waterman (SW) algorithm. In the first phase of CUDAlign 4.0, a huge Dynamic Programming (DP) matrix is computed by multiple GPUs, which asynchronously communicate border elements to the right neighbor in order to find the optimal score. After that, the traceback phase of SW is executed. The efficient parallelization of the traceback phase is very challenging because of the high amount of data dependency, which particularly impacts the performance and limits the application scalability. In order to obtain a multi-GPU highly parallel traceback phase, we propose and evaluate a new parallel traceback algorithm called Incremental Speculative Traceback (IST), which pipelines the traceback phase, speculating incrementally over the values calculated so far, producing results in advance. With CUDAlign 4.0, we were able to calculate SW matrices with up to 60 Petacells, obtaining the optimal local alignments of all Human and Chimpanzee homologous chromosomes, whose sizes range from 26 Million Base Pairs (MBP) up to 249 MBP. As far as we know, this is the first time such a comparison was made with the SW exact method. We also show that the IST algorithm is able to reduce the traceback time by factors from 2.15× up to 21.03×, when compared with the baseline traceback algorithm. The human × chimpanzee chromosome 5 comparison (180 MBP × 183 MBP) attained 10,370.00 GCUPS (Billions of Cells Updated per Second) using 384 GPUs, with a speculation hit ratio of 98.2%.
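The first phase described above, where each GPU owns a slice of the DP matrix and passes border elements to its right neighbor, can be illustrated with a sequential Python simulation. This is only a sketch under stated assumptions: the matrix is split into column slabs, the scoring values (match +1, mismatch -1, linear gap -2) are illustrative, the names `slab_score` and `multi_slab_score` are hypothetical, and the slabs run one after another, whereas the paper's GPUs exchange borders asynchronously and overlap their work.

```python
def slab_score(a, b_slab, left_border, match=1, mismatch=-1, gap=-2):
    """Compute one column slab; return (best score, right border column).

    left_border[i] holds H[i][first_col - 1] for rows i = 0..len(a).
    """
    prev = [0] * (len(b_slab) + 1)  # row 0 of the slab is all zeros
    prev[0] = left_border[0]
    right_border = [prev[-1]]
    best = 0
    for i, ca in enumerate(a, start=1):
        curr = [left_border[i]]  # value received from the left neighbor
        for j, cb in enumerate(b_slab, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        right_border.append(curr[-1])  # value sent to the right neighbor
        prev = curr
    return best, right_border


def multi_slab_score(a, b, num_devices):
    """Best local score when the columns of b are split across num_devices slabs."""
    bounds = [len(b) * d // num_devices for d in range(num_devices + 1)]
    border = [0] * (len(a) + 1)  # column 0 of the full matrix
    best = 0
    for d in range(num_devices):
        slab_best, border = slab_score(a, b[bounds[d]:bounds[d + 1]], border)
        best = max(best, slab_best)
    return best


# The split must not change the result: one slab and three slabs agree.
a, b = "GATTACAGATTACA", "CATTACGGATAACA"
print(multi_slab_score(a, b, 1), multi_slab_score(a, b, 3))
```

Only one border column crosses each slab boundary, so the communication volume grows with the length of one sequence rather than with the size of the whole matrix.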

Scalable multicore architectures for long DNA sequence comparison

Concurrency and Computation: Practice and Experience, 2011

Applying GPUs for Smith-Waterman Sequence Alignment Acceleration

GSTF International Journal on Computing, 2011

The Smith-Waterman algorithm is a common local sequence alignment method that gives high accuracy. However, it requires a large amount of computation and storage memory, so implementations based on common computing systems are impractical. Here, we present swGPUCluster, our implementation of the Smith-Waterman algorithm on a cluster that includes graphics cards (a GPU cluster). The implementation is tested on a cluster of two nodes: one node is equipped with two dual-GPU NVIDIA GeForce GTX 295 graphics cards, and the other node includes one dual-GPU NVIDIA GeForce GTX 295 graphics card and a Tesla C1060 card. Depending on the length of the query sequences, the swGPUCluster performance increases from 37.33 GCUPS to 46.71 GCUPS. This result demonstrates the great computing power of GPUs and their high applicability in the bioinformatics field.

GenCodex-A Novel Algorithm for Compressing DNA sequences on Multi-cores and GPUs

DNA sequences are huge in size and the databases are growing at an exponential rate. For example, the human genome in raw format ranges from 2 to 30 terabytes. The main reason for this is the discovery of new species and the increasing number of DNA profiles. The growth of DNA data affects storage as well as bandwidth when these sequences need to be transferred. Applications such as DNA profiling and real-time DNA crime investigation require access to the DNA sequences in real time.

Hybrid Framework for pairwise DNA Sequence Alignment Using the CUDA compatible GPU

2014

This paper provides a novel framework for accelerating the solution of the pairwise DNA sequence alignment problem using the CUDA parallel paradigm available on NVIDIA GPUs. The main idea is to implement a new algorithm that assigns different nucleotide weights on the GPU and then merges the matching subsequences on the CPU to get the optimum local alignment. The paper describes both the algorithm and its implementation, which uses the GPU and the CPU together as a hybrid model for solving the DNA sequence alignment problem. Experimental results demonstrate a considerable reduction in run time relative to a traditional Smith-Waterman implementation on traditional processors. Keywords: GPU, GPGPU, CUDA, sequence alignment algorithms, molecular biology.
