ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

T Rognes. Nucleic Acids Res. 2001.

Abstract

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
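The Smith-Waterman step that the abstract uses as its reference point can be illustrated with a short scalar sketch. This is not the parallelised implementation from the paper. It is a plain Gotoh-style affine-gap recurrence, under the assumption that a gap of length k costs q + r*k (matching the q and r symbols in the Figure 2 caption). The function name `smith_waterman_score` and the simple match/mismatch scores, which stand in for a substitution matrix such as BLOSUM62, are hypothetical choices made for the example.

```python
from typing import List

NEG_INF = float("-inf")


def smith_waterman_score(a: str, b: str, q: int = 11, r: int = 1,
                         match: int = 5, mismatch: int = -4) -> int:
    """Return the optimal local alignment score of a and b.

    Assumption: a gap of length k costs q + r*k. The match/mismatch
    values are illustrative stand-ins for a substitution matrix.
    """
    n = len(b)
    h_prev: List[float] = [0] * (n + 1)    # H values of the previous row
    f: List[float] = [NEG_INF] * (n + 1)   # best score ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur: List[float] = [0] * (n + 1)
        e = NEG_INF                        # best score ending in a horizontal gap
        for j in range(1, n + 1):
            # Extend an existing gap or open a new one (cost q + r).
            e = max(e - r, h_cur[j - 1] - q - r)
            f[j] = max(f[j] - r, h_prev[j] - q - r)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h = max(0, h_prev[j - 1] + s, e, f[j])
            h_cur[j] = h
            best = max(best, h)
        h_prev = h_cur
    return int(best)


if __name__ == "__main__":
    print(smith_waterman_score("AAAAAGGGGG", "AAAAATTTGGGGG"))
```

The sketch keeps only two rows in memory because only the score is needed to rank database sequences; recovering the alignment itself would require a traceback, which is omitted here.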

Figures

Figure 1

Computation of the diagonal scores using SIMD technology. Computation of the diagonal scores is performed efficiently in the order indicated using bands that are 32 diagonals wide.
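The quantity computed in Figure 1, the optimal ungapped alignment score of each diagonal, can be sketched in scalar form as follows. The sketch assumes the usual running-maximum recurrence (a temporary score that is reset to zero whenever it would become negative) and numbers diagonals as d = j - i. It does not reproduce the 32-diagonal SIMD bands or the processing order of the paper. The function name `diagonal_scores` and the match/mismatch scores are hypothetical stand-ins for the real substitution matrix.

```python
from typing import Dict


def diagonal_scores(query: str, target: str,
                    match: int = 2, mismatch: int = -1) -> Dict[int, int]:
    """Best ungapped local score on every diagonal d = j - i.

    i indexes the query and j the target sequence. The scoring values
    are illustrative assumptions, not a real substitution matrix.
    """
    m, n = len(query), len(target)
    best: Dict[int, int] = {}
    for d in range(-(m - 1), n):
        i = max(0, -d)        # first query position on this diagonal
        j = i + d             # matching target position
        running = 0           # temporary score along the diagonal
        top = 0               # best score seen on this diagonal
        while i < m and j < n:
            s = match if query[i] == target[j] else mismatch
            running = max(running + s, 0)
            top = max(top, running)
            i += 1
            j += 1
        best[d] = top
    return best


if __name__ == "__main__":
    for d, score in diagonal_scores("AAXAA", "AAYAA").items():
        print(d, score)
```

Each diagonal is independent of the others, which is what makes it possible to process many diagonals side by side in SIMD registers.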

Figure 2

Computation of the estimated gapped alignment score. The numbers within the matrix are the temporary scores (e_i) along the diagonals from the initial computation of the diagonal scores. The numbers directly outside the matrix are the optimal ungapped alignment scores (S_d) for each diagonal. The second numbers outside the matrix are the temporary scores (u_d) used in the calculation of the estimated gapped alignment score (T), which in this example is 3 + 11 + 1 + 22 = 37. The BLOSUM62 matrix was used in combination with the parameters q = 11, r = 1 and c = 3 in this example. The calculations were performed in order of increasing diagonal numbers.

Figure 3

Comparison of database search sensitivity and selectivity. The sensitivity (coverage) versus the selectivity (EPQ) is plotted for a range of database search programs using either (A) the BLOSUM50 matrix and a 10 + 2k gap penalty or (B) the BLOSUM62 matrix and an 11 + k gap penalty.
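The two gap penalty schemes named in this caption can be compared with a few lines of code. The sketch assumes that k is the gap length and that the penalty has the affine form q + r*k, so the BLOSUM50 setting corresponds to q = 10, r = 2 and the BLOSUM62 setting to q = 11, r = 1. The function name `gap_penalty` and the `SCHEMES` table are hypothetical names introduced for the example.

```python
from typing import Dict, Tuple


def gap_penalty(k: int, q: int, r: int) -> int:
    """Cost of a gap of length k under an affine penalty q + r*k."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return q + r * k


# (q, r) pairs read off the figure caption; the dictionary is illustrative.
SCHEMES: Dict[str, Tuple[int, int]] = {
    "BLOSUM50, 10 + 2k": (10, 2),
    "BLOSUM62, 11 + k": (11, 1),
}

if __name__ == "__main__":
    for name, (q, r) in SCHEMES.items():
        costs = [gap_penalty(k, q, r) for k in range(1, 6)]
        print(name, costs)
```

Under this reading both schemes charge the same for a single-residue gap, and the BLOSUM50 setting penalises longer gaps more steeply.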

Figure 4

Comparison of database search speed. Search time versus query sequence length is plotted for the different search programs and the 11 query sequences (see Results). The search time used is the total CPU time of the fastest of three consecutive runs on a minimally loaded computer. With a database of only 29 MB and 128 MB of RAM, all of the database was cached in the computer’s RAM; disk reading time should then be negligible.
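The timing protocol in this caption (CPU time of the fastest of three consecutive runs) can be approximated with the standard library. This is a minimal sketch, not the measurement setup used in the paper: the function name `fastest_cpu_time` and the `run` callable are hypothetical, and the workload in the usage example is a placeholder for an actual search.

```python
import time
from typing import Callable


def fastest_cpu_time(run: Callable[[], object], repeats: int = 3) -> float:
    """Run the workload `repeats` times and return the smallest CPU time.

    time.process_time() reports user plus system CPU time of the current
    process, so waiting on disk or other processes is not counted.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    times = []
    for _ in range(repeats):
        start = time.process_time()
        run()
        times.append(time.process_time() - start)
    return min(times)


if __name__ == "__main__":
    # Placeholder workload standing in for a database search.
    seconds = fastest_cpu_time(lambda: sum(i * i for i in range(200_000)))
    print(f"fastest of three runs: {seconds:.4f} s CPU time")
```

Taking the minimum rather than the mean reduces the influence of cache warm-up and background load. Note that `time.process_time()` does not include CPU time spent in child processes, so timing an external search program would need a different mechanism.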

Cited by

Human DNA glycosylases of the bacterial Fpg/MutM superfamily: an alternative pathway for the repair of 8-oxoguanine and other oxidation products in DNA.
Morland I, Rolseth V, Luna L, Rognes T, Bjørås M, Seeberg E. Nucleic Acids Res. 2002 Nov 15;30(22):4926-36. doi: 10.1093/nar/gkf618. PMID: 12433996.
Overexpression of the LexA-regulated tisAB RNA in E. coli inhibits SOS functions; implications for regulation of the SOS response.
Weel-Sneve R, Bjørås M, Kristiansen KI. Nucleic Acids Res. 2008 Nov;36(19):6249-59. doi: 10.1093/nar/gkn633. Epub 2008 Oct 1. PMID: 18832374.
PSimScan: algorithm and utility for fast protein similarity search.
Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PLoS One. 2013;8(3):e58505. doi: 10.1371/journal.pone.0058505. Epub 2013 Mar 7. PMID: 23505522.
The human ortholog of the rodent testis-specific ABC transporter Abca17 is a ubiquitously expressed pseudogene (ABCA17P) and shares a common 5' end with ABCA3.
Piehler AP, Wenzel JJ, Olstad OK, Haug KB, Kierulf P, Kaminski WE. BMC Mol Biol. 2006 Sep 12;7:28. doi: 10.1186/1471-2199-7-28. PMID: 16968533.
Exploring the utility of cross-laboratory RAD-sequencing datasets for phylogenetic analysis.
Gonen S, Bishop SC, Houston RD. BMC Res Notes. 2015 Jul 8;8:299. doi: 10.1186/s13104-015-1261-2. PMID: 26152111.
