Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features
Timo Lassmann et al. Nucleic Acids Res. 2009 Feb.
Abstract
In the growing field of genomics, multiple alignment programs are confronted with ever-increasing amounts of data. To address this, we have dramatically improved the running time and memory requirements of Kalign while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows external sequence annotation to be included in the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well with the best methods on protein alignments, while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).
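The abstract does not say how Kalign2 folds external annotation into its scoring, so the following is only a hypothetical illustration of the general idea, not Kalign2's method. It uses a plain Needleman-Wunsch global alignment with a linear gap penalty and gives an extra bonus when two aligned residues carry the same annotation label (for example, secondary structure labels H/E/C). The function `align` and the parameters `match`, `mismatch`, `gap` and `feature_bonus` are invented for this sketch.

```python
# Hypothetical illustration (not Kalign2's scoring scheme): Needleman-Wunsch
# global alignment with a linear gap penalty, where residues that share an
# annotation label earn a bonus. All parameter names and values are invented.


def align(a, b, fa=None, fb=None, match=2, mismatch=-1, gap=-2, feature_bonus=1):
    """Return (aligned_a, aligned_b, score) for a global alignment of a and b.

    fa and fb are optional per-residue annotation strings (e.g. secondary
    structure labels H/E/C); give both or neither, with lengths equal to a and b.
    """
    if (fa is None) != (fb is None):
        raise ValueError("give annotations for both sequences or for neither")
    if fa is not None and (len(fa) != len(a) or len(fb) != len(b)):
        raise ValueError("annotation length must equal sequence length")

    def pair_score(i, j):
        # Substitution part: simple match/mismatch scores.
        s = match if a[i] == b[j] else mismatch
        # Feature part: reward aligning residues that carry the same label.
        if fa is not None and fa[i] == fb[j]:
            s += feature_bonus
        return s

    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + pair_score(i - 1, j - 1),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + pair_score(i - 1, j - 1):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), dp[n][m]


print(align("A", "AA"))                    # ('-A', 'AA', 0)
print(align("A", "AA", fa="H", fb="HC"))  # ('A-', 'AA', 1)
```

On sequence alone, the single `A` can pair with either `A` in the second sequence and the tie-break picks the second one; once labels are supplied, the H-labelled residue is pulled onto the H-labelled position. This toy case only shows the mechanism by which annotation can change residue placement.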
Figures
Figure 1.
Running time of several multiple alignment methods in four scenarios using simulated alignments of varying evolutionary distance (PAM = 100 and PAM = 250), sequence length (L = 10–2000) and number of sequences (N = 10–1500). In each scenario, one parameter was varied (_x_-axis) while the other two were held constant (plot heading). Kalign2 scales much better than most of the other methods, especially as the number of sequences increases. All tests were carried out on an AMD64 3200+ processor with 2 GB of RAM running Linux.
Figure 2.
Accuracy on RNA alignments measured by the SPS score. Boxplots of accuracy on the Bralibase2.1 benchmark set. (**A**) Alignments with an average pairwise sequence identity (APSI) <40%. (**B**) Alignments with an APSI >40%. Kalign2 was the most accurate method, especially on alignments with low APSI.
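The 40% APSI cut-off splits the benchmark into the two panels of Figure 2. As a minimal sketch, APSI can be computed as the mean identity over all pairs of rows in an alignment. This assumes that pairwise identity is normalised by the number of columns where neither row has a gap; other conventions exist, and the benchmark's own definition may differ. The names `pairwise_identity` and `apsi` are hypothetical.

```python
from itertools import combinations

GAP = "-"


def pairwise_identity(a, b):
    """Identical residues divided by gap-free columns (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    both = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not both:
        return 0.0
    same = sum(1 for x, y in both if x.upper() == y.upper())
    return same / len(both)


def apsi(rows):
    """Average pairwise sequence identity over all pairs of aligned rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


aln = ["ACGU-ACGU", "ACGUUACGA", "ACG--ACGU"]
print(f"APSI = {apsi(aln):.1%}")  # APSI = 91.1%
```

The three pairwise identities here are 7/8, 7/7 and 6/7, so this toy alignment would fall on the >40% side of the split.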
Figure 3.
External feature alignment using protein secondary structure generally improves accuracy on the Balibase benchmark. An increase in the SPS score is seen mostly for cases with high structural coverage.
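The SPS score used in Figures 2 and 3 measures how many of the residue pairs aligned in a reference alignment are also aligned in the test alignment. The sketch below is a minimal version under that sum-of-pairs definition: it scores every column, whereas benchmark tools may restrict scoring to annotated core regions. The helper names `aligned_pairs` and `sps` are hypothetical.

```python
from itertools import combinations

GAP = "-"


def aligned_pairs(rows):
    """Set of residue pairs ((i, p), (j, q)) that share an alignment column.

    i < j are row indices; p, q are 0-based positions in the ungapped
    sequences, so the pairs can be compared across different alignments.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    next_pos = [0] * len(rows)
    pairs = set()
    for col in range(len(rows[0])):
        present = []  # (row index, residue position) for non-gap cells
        for i, row in enumerate(rows):
            if row[col] != GAP:
                present.append((i, next_pos[i]))
                next_pos[i] += 1
        pairs.update(combinations(present, 2))
    return pairs


def ungapped(rows):
    """Strip gaps so the underlying sequences can be compared."""
    return [r.replace(GAP, "") for r in rows]


def sps(test_rows, ref_rows):
    """Fraction of reference residue pairs also aligned in the test alignment."""
    if ungapped(test_rows) != ungapped(ref_rows):
        raise ValueError("alignments must hold the same sequences in the same order")
    ref = aligned_pairs(ref_rows)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref & aligned_pairs(test_rows)) / len(ref)


reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
print(sps(test, reference))  # 0.75
```

The reference aligns four residue pairs; the test alignment misplaces `G`, so it recovers three of them and scores 0.75.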
Similar articles
- KalignP: improved multiple sequence alignments using position specific gap penalties in Kalign2. Shu N, Elofsson A. Bioinformatics. 2011 Jun 15;27(12):1702-3. doi: 10.1093/bioinformatics/btr235. Epub 2011 Apr 19. PMID: 21505030. Free PMC article.
- Kalign--an accurate and fast multiple sequence alignment algorithm. Lassmann T, Sonnhammer EL. BMC Bioinformatics. 2005 Dec 12;6:298. doi: 10.1186/1471-2105-6-298. PMID: 16343337. Free PMC article.
- Kalign, Kalignvu and Mumsa: web servers for multiple sequence alignment. Lassmann T, Sonnhammer EL. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W596-9. doi: 10.1093/nar/gkl191. PMID: 16845078. Free PMC article.
- Protein multiple sequence alignment benchmarking through secondary structure prediction. Le Q, Sievers F, Higgins DG. Bioinformatics. 2017 May 1;33(9):1331-1337. doi: 10.1093/bioinformatics/btw840. PMID: 28093407. Free PMC article.
- PicXAA-Web: a web-based platform for non-progressive maximum expected accuracy alignment of multiple biological sequences. Sahraeian SM, Yoon BJ. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W8-12. doi: 10.1093/nar/gkr244. Epub 2011 Apr 22. PMID: 21515632. Free PMC article.
Cited by
- Apprehending the NAD+-ADPr-Dependent Systems in the Virus World. Iyer LM, Burroughs AM, Anantharaman V, Aravind L. Viruses. 2022 Sep 7;14(9):1977. doi: 10.3390/v14091977. PMID: 36146784. Free PMC article.
- ALOG domains: provenance of plant homeotic and developmental regulators from the DNA-binding domain of a novel class of DIRS1-type retroposons. Iyer LM, Aravind L. Biol Direct. 2012 Nov 12;7:39. doi: 10.1186/1745-6150-7-39. PMID: 23146749. Free PMC article.
- Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L. Mol Biosyst. 2009 Dec;5(12):1636-60. doi: 10.1039/b917682a. Epub 2009 Oct 13. PMID: 20023723. Free PMC article.
- Evolutionarily ancient BAH-PHD protein mediates Polycomb silencing. Wiles ET, McNaught KJ, Kaur G, Selker JML, Ormsby T, Aravind L, Selker EU. Proc Natl Acad Sci U S A. 2020 May 26;117(21):11614-11623. doi: 10.1073/pnas.1918776117. Epub 2020 May 11. PMID: 32393638. Free PMC article.
- Profile Comparer Extended: phylogeny of lytic polysaccharide monooxygenase families using profile hidden Markov model alignments. Voshol GP, Punt PJ, Vijgenboom E. F1000Res. 2019 Oct 31;8:1834. doi: 10.12688/f1000research.21104.1. eCollection 2019. PMID: 31956399. Free PMC article.