Large-scale comparison of protein sequence alignment algorithms with structure alignments
Comparative Study
J M Sauder et al. Proteins. 2000.
Abstract
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling.
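The abstract distinguishes two ways of scoring a sequence alignment against a structure alignment: accuracy on a per-residue basis, and the fraction of the structure alignment's residue pairs that the sequence alignment reproduces. The abstract does not give formulas, so the Python sketch below is only one plausible reading. The function names, the gapped-row input format, and the `start_a`/`start_b` offsets for local alignments are assumptions made for illustration, not the authors' implementation.

```python
from typing import Set, Tuple

# (residue index in protein A, residue index in protein B), both 0-based
Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str,
                  start_a: int = 0, start_b: int = 0,
                  gap: str = "-") -> Set[Pair]:
    """Collect the residue-index pairs that share a column in a gapped alignment.

    start_a/start_b are hypothetical offsets for local alignments that begin
    partway into each sequence (as BLAST-style alignments typically do).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i, j = start_a, start_b
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            pairs.add((i, j))
        # Advance each residue counter only when a residue was consumed.
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def alignment_scores(seq_pairs: Set[Pair],
                     struct_pairs: Set[Pair]) -> Tuple[float, float]:
    """Return (per-residue accuracy, fraction of the structure alignment recovered).

    Assumed definitions:
      per-residue accuracy  = correct pairs / pairs in the sequence alignment
      fraction of structure = correct pairs / pairs in the structure alignment
    """
    correct = len(seq_pairs & struct_pairs)
    per_residue = correct / len(seq_pairs) if seq_pairs else 0.0
    of_structure = correct / len(struct_pairs) if struct_pairs else 0.0
    return per_residue, of_structure


# Toy data (invented for illustration): a reference structure alignment and
# two short local sequence alignments that both start at residue 2.
struct = aligned_pairs("MKTAYIAK", "MKT-YIAK")
good_local = aligned_pairs("TAYI", "T-YI", start_a=2, start_b=2)
bad_local = aligned_pairs("TAYI", "TYI-", start_a=2, start_b=2)

good_scores = alignment_scores(good_local, struct)
bad_scores = alignment_scores(bad_local, struct)
```

Under these assumed definitions, a short but accurate alignment can score well on per-residue accuracy while covering only a small fraction of the structure alignment. That is consistent with the abstract's remark that longer PSI-BLAST alignments correctly align a larger fraction of structurally aligned residues even though per-residue accuracy is only slightly better than BLAST's.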
Copyright 2000 Wiley-Liss, Inc.