PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology

Per Eystein Saebø et al. Nucleic Acids Res. 2005.

Abstract

PARALIGN is a rapid and sensitive similarity search tool for the identification of distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented: accelerated Smith-Waterman and ParAlign. The ParAlign algorithm is similar to Smith-Waterman in sensitivity, while being as quick as BLAST for protein searches. The high speed is achieved by exploiting a form of parallel computing technology known as multimedia technology, which is available in modern processors but rarely used by other bioinformatics software. The software is also designed to run efficiently on computer clusters using the message-passing interface standard. A public search service powered by a large computer cluster has been set up and is freely available at www.paralign.org, where the major public databases can be searched. The software can also be downloaded free of charge for academic use.
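For readers unfamiliar with the Smith-Waterman baseline mentioned above, the following is a minimal, unaccelerated sketch of local alignment scoring. It is not the PARALIGN implementation: the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration. Production tools typically use substitution matrices and vectorised inner loops.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Illustrative scoring only (assumed values): fixed match/mismatch
    scores and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell depends only on its upper, left, and upper-left neighbours, which is why the computation lends itself to the kind of within-processor parallelism the abstract describes.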

Figures

Figure 1

The PARALIGN home page at www.paralign.org contains the search form where the query sequence is entered and the database and the search parameters are selected. Clicking on a question mark opens a window with detailed help for each field.

Figure 2

The search results include a graphical overview of the hits, a list of matches and the sequence alignments. In the graphical overview, the position of the matches relative to the query sequence is indicated with lines coloured according to the _E_-value of the alignment. Hypertext links are provided to further sequence information.

Figure 3

The data flow for distributed searches on the computer cluster is illustrated in this diagram. Database sequences are loaded directly from a file server into memory on each node. The query sequence and the search parameters are transferred from the user, via the web server and queuing system to the nodes. Search results from each node are collected by the first node which then generates the final output that is presented to the user.
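The data flow in this caption can be sketched as a scatter, search, and gather pattern. The example below is only an illustration under stated assumptions: threads stand in for cluster nodes (the abstract says the real software uses the message-passing interface standard), `toy_score` is a placeholder similarity measure rather than an alignment algorithm, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def toy_score(query, subject, k=3):
    """Placeholder similarity: count of distinct k-mers shared (not an alignment score)."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(query) & kmers(subject))


def search_on_node(query, chunk, top_n):
    """Work done by one node: score every sequence in its in-memory chunk."""
    hits = [(toy_score(query, seq), name) for name, seq in chunk]
    return heapq.nlargest(top_n, hits)


def distributed_search(query, database, n_nodes=4, top_n=3):
    """Split the database across nodes, search in parallel, merge on the first node."""
    items = sorted(database.items())
    # Each node loads its own slice of the database
    chunks = [items[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial = list(pool.map(lambda c: search_on_node(query, c, top_n), chunks))
    # The "first node" collects the partial results and builds the final list
    merged = heapq.nlargest(top_n, (hit for hits in partial for hit in hits))
    return [(name, score) for score, name in merged]
```

Because each node returns only its own best hits, the amount of data gathered for the final merge stays small regardless of database size.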

Cited by

An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice.
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. Genes (Basel). 2020 Sep 4;11(9):1046. doi: 10.3390/genes11091046. PMID: 32899740 Free PMC article. Review.
SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences.
Rucci E, Garcia C, Botella G, De Giusti A, Naiouf M, Prieto-Matias M. BMC Syst Biol. 2018 Nov 20;12(Suppl 5):96. doi: 10.1186/s12918-018-0614-6. PMID: 30458766 Free PMC article.
In Silico Design and Experimental Validation of siRNAs Targeting Conserved Regions of Multiple Hepatitis C Virus Genotypes.
ElHefnawi M, Kim T, Kamar MA, Min S, Hassan NM, El-Ahwany E, Kim H, Zada S, Amer M, Windisch MP. PLoS One. 2016 Jul 21;11(7):e0159211. doi: 10.1371/journal.pone.0159211. eCollection 2016. PMID: 27441640 Free PMC article.
PSimScan: algorithm and utility for fast protein similarity search.
Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PLoS One. 2013;8(3):e58505. doi: 10.1371/journal.pone.0058505. Epub 2013 Mar 7. PMID: 23505522 Free PMC article.
MODOMICS: a database of RNA modification pathways--2013 update.
Machnicka MA, Milanowska K, Osman Oglou O, Purta E, Kurkowska M, Olchowik A, Januszewski W, Kalinowski S, Dunin-Horkawicz S, Rother KM, Helm M, Bujnicki JM, Grosjean H. Nucleic Acids Res. 2013 Jan;41(Database issue):D262-7. doi: 10.1093/nar/gks1007. Epub 2012 Oct 30. PMID: 23118484 Free PMC article.

