PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology
Per Eystein Saebø et al. Nucleic Acids Res. 2005.
Abstract
PARALIGN is a rapid and sensitive similarity search tool for identifying distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented: accelerated Smith-Waterman and ParAlign. The ParAlign algorithm is similar to Smith-Waterman in sensitivity while being as quick as BLAST for protein searches. The high speed is achieved by exploiting a form of parallel computing technology known as multimedia technology, which is available in modern processors but rarely used by other bioinformatics software. The software is also designed to run efficiently on computer clusters using the Message Passing Interface (MPI) standard. A public search service powered by a large computer cluster has been set up and is freely available at www.paralign.org, where the major public databases can be searched. The software can also be downloaded free of charge for academic use.
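For readers unfamiliar with the baseline algorithm named in the abstract, the following is a minimal sketch of plain Smith-Waterman local alignment scoring. It is not PARALIGN's accelerated implementation and does not use any processor-level parallelism; the function name, the linear gap penalty, and the match, mismatch and gap values are assumptions chosen only for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Illustrative only: uses a simple match/mismatch score and a linear
    gap penalty (assumed values), and keeps one row of the dynamic
    programming matrix at a time.
    """
    prev = [0] * (len(target) + 1)  # scores for the previous query residue
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
```

A database search in this style would call the function once per database sequence and rank the sequences by score; the cost of doing that exhaustively is what motivates the acceleration described in the abstract.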
Figures
Figure 1
The PARALIGN home page contains the search form where the query sequence is entered and the database and the search parameters are selected. Clicking on a question mark opens a window with detailed help for each field.
Figure 2
The search results include a graphical overview of the hits, a list of matches and the sequence alignments. In the graphical overview, the position of the matches relative to the query sequence is indicated with lines coloured according to the _E_-value of the alignment. Hypertext links are provided to further sequence information.
Figure 3
The data flow for distributed searches on the computer cluster is illustrated in this diagram. Database sequences are loaded directly from a file server into memory on each node. The query sequence and the search parameters are transferred from the user, via the web server and queuing system to the nodes. Search results from each node are collected by the first node which then generates the final output that is presented to the user.
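The scatter-and-gather pattern in this caption can be sketched roughly as follows. This is a single-machine illustration under stated assumptions: threads stand in for cluster nodes, the database is a small in-memory list, and `toy_score`, `search_shard` and `distributed_search` are hypothetical names. The real system runs on a cluster and does not use this scoring function.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def toy_score(query, seq, k=3):
    """Placeholder similarity (assumption): count of distinct shared k-mers."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(query) & kmers(seq))


def search_shard(query, shard, top_n):
    """Work done by one node: score its in-memory shard, keep its best hits."""
    hits = [(toy_score(query, seq), name) for name, seq in shard]
    return heapq.nlargest(top_n, hits)


def distributed_search(query, database, n_nodes=3, top_n=2):
    """Send the query to every node, then merge the per-node results."""
    # Each node holds one slice of the database in memory.
    shards = [database[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial = pool.map(lambda s: search_shard(query, s, top_n), shards)
        # The collecting node gathers every node's hits into one list.
        merged = [hit for hits in partial for hit in hits]
    # The final output is the overall best hits, highest score first.
    return heapq.nlargest(top_n, merged)


if __name__ == "__main__":
    db = [("a", "ACGTACGT"), ("b", "TTTTTTTT"),
          ("c", "ACGTTTTT"), ("d", "GGGGGGGG")]
    print(distributed_search("ACGTACGT", db))
```

Because each node returns only its own top hits, the collecting node merges a small list rather than the scores for the whole database.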
Similar articles
- SIMAP: the similarity matrix of proteins.
Rattei T, Arnold R, Tischler P, Lindner D, Stümpflen V, Mewes HW. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D252-6. doi: 10.1093/nar/gkj106. PMID: 16381858. Free PMC article.
- PROMPT: a protein mapping and comparison tool.
Schmidt T, Frishman D. BMC Bioinformatics. 2006 Jul 4;7:331. doi: 10.1186/1471-2105-7-331. PMID: 16817977. Free PMC article.
- GenoMiner: a tool for genome-wide search of coding and non-coding conserved sequence tags.
Castrignanò T, De Meo PD, Grillo G, Liuni S, Mignone F, Talamo IG, Pesole G. Bioinformatics. 2006 Feb 15;22(4):497-9. doi: 10.1093/bioinformatics/bti754. Epub 2005 Nov 2. PMID: 16267081.
- Graphical design of primers with PerlPrimer.
Marshall O. Methods Mol Biol. 2007;402:403-14. doi: 10.1007/978-1-59745-528-2_21. PMID: 17951808. Review.
- Finding homologs to nucleotide sequences using network BLAST searches.
Ladunga I. Curr Protoc Bioinformatics. 2002 Aug;Chapter 3:Unit 3.3. doi: 10.1002/0471250953.bi0303s00. PMID: 18792938. Review.
Cited by
- An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice.
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. Genes (Basel). 2020 Sep 4;11(9):1046. doi: 10.3390/genes11091046. PMID: 32899740. Free PMC article. Review.
- SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences.
Rucci E, Garcia C, Botella G, De Giusti A, Naiouf M, Prieto-Matias M. BMC Syst Biol. 2018 Nov 20;12(Suppl 5):96. doi: 10.1186/s12918-018-0614-6. PMID: 30458766. Free PMC article.
- In Silico Design and Experimental Validation of siRNAs Targeting Conserved Regions of Multiple Hepatitis C Virus Genotypes.
ElHefnawi M, Kim T, Kamar MA, Min S, Hassan NM, El-Ahwany E, Kim H, Zada S, Amer M, Windisch MP. PLoS One. 2016 Jul 21;11(7):e0159211. doi: 10.1371/journal.pone.0159211. eCollection 2016. PMID: 27441640. Free PMC article.
- PSimScan: algorithm and utility for fast protein similarity search.
Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PLoS One. 2013;8(3):e58505. doi: 10.1371/journal.pone.0058505. Epub 2013 Mar 7. PMID: 23505522. Free PMC article.
- MODOMICS: a database of RNA modification pathways--2013 update.
Machnicka MA, Milanowska K, Osman Oglou O, Purta E, Kurkowska M, Olchowik A, Januszewski W, Kalinowski S, Dunin-Horkawicz S, Rother KM, Helm M, Bujnicki JM, Grosjean H. Nucleic Acids Res. 2013 Jan;41(Database issue):D262-7. doi: 10.1093/nar/gks1007. Epub 2012 Oct 30. PMID: 23118484. Free PMC article.