Embedding strategies for effective use of information from multiple sequence alignments
S Henikoff et al. Protein Sci. 1997 Mar.
Abstract
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
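The consensus-embedding idea described in the abstract can be illustrated with a minimal sketch. It assumes that each conserved region is available as an ungapped block of aligned family members together with its offset in the representative sequence, and that the consensus residue is simply the most frequent residue in each column; the abstract does not say how consensus residues were derived, so that rule is an assumption. The function names `column_consensus` and `embed_consensus` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def column_consensus(block):
    """Return the most frequent residue in each column of an ungapped block.

    Assumption: simple majority vote per column. Ties resolve to the residue
    seen first in that column.
    """
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all rows of a block must have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*block))


def embed_consensus(representative, blocks):
    """Replace conserved regions of a representative sequence with consensus residues.

    blocks is a list of (start, rows) pairs, where start is the 0-based offset
    of the block in the representative and rows are the aligned family members.
    Residues outside the blocks are left unchanged.
    """
    seq = list(representative)
    for start, rows in blocks:
        consensus = column_consensus(rows)
        end = start + len(consensus)
        if start < 0 or end > len(seq):
            raise ValueError("block falls outside the representative sequence")
        seq[start:end] = consensus
    return "".join(seq)


# Toy data, invented for this example only.
representative = "MKTAYIAKQRQISFVKSHFSRQ"
blocks = [
    (2, ["TAYIA", "TAFIA", "SAFIA"]),
    (12, ["SFVK", "TFVR", "TFVK"]),
]
embedded = embed_consensus(representative, blocks)
print(embedded)  # MKTAFIAKQRQITFVKSHFSRQ
```

The result is still an ordinary single sequence of the same length, which is why a query of this kind could be passed to single-sequence search programs such as BLAST or FASTA. Embedding PSSMs instead would require a search program that accepts position-specific scores within the query, which this sketch does not attempt.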
Similar articles
- Comparison of methods for searching protein sequence databases.
Pearson WR. Protein Sci. 1995 Jun;4(6):1145-60. doi: 10.1002/pro.5560040613. PMID: 7549879. Free PMC article.
- Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues.
Anand B, Gowri VS, Srinivasan N. Bioinformatics. 2005 Jun 15;21(12):2821-6. doi: 10.1093/bioinformatics/bti432. Epub 2005 Apr 7. PMID: 15817691.
- A comparison of position-specific score matrices based on sequence and structure alignments.
Panchenko AR, Bryant SH. Protein Sci. 2002 Feb;11(2):361-70. doi: 10.1110/ps.19902. PMID: 11790846. Free PMC article.
- Sensitive methods for determining the relatedness of proteins with limited sequence homology.
Argos P. Curr Opin Biotechnol. 1994 Aug;5(4):361-71. doi: 10.1016/0958-1669(94)90044-2. PMID: 7765168. Review.
- Scores for sequence searches and alignments.
Henikoff S. Curr Opin Struct Biol. 1996 Jun;6(3):353-60. doi: 10.1016/s0959-440x(96)80055-8. PMID: 8804821. Review.
Cited by
- Identification of novel mazEF/pemIK family toxin-antitoxin loci and their distribution in the Staphylococcus genus.
Bukowski M, Hyz K, Janczak M, Hydzik M, Dubin G, Wladyka B. Sci Rep. 2017 Oct 18;7(1):13462. doi: 10.1038/s41598-017-13857-4. PMID: 29044211. Free PMC article.
- Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold.
Pearson WR, Li W, Lopez R. Nucleic Acids Res. 2017 Apr 20;45(7):e46. doi: 10.1093/nar/gkw1207. PMID: 27923999. Free PMC article.
- Site-Specific Amino Acid Preferences Are Mostly Conserved in Two Closely Related Protein Homologs.
Doud MB, Ashenberg O, Bloom JD. Mol Biol Evol. 2015 Nov;32(11):2944-60. doi: 10.1093/molbev/msv167. Epub 2015 Jul 29. PMID: 26226986. Free PMC article.
- Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.
Licciardello C, D'Agostino N, Traini A, Recupero GR, Frusciante L, Chiusano ML. BMC Plant Biol. 2014 Feb 3;14:39. doi: 10.1186/1471-2229-14-39. PMID: 24490620. Free PMC article.
- Domain analysis of symbionts and hosts (DASH) in a genome-wide survey of pathogenic human viruses.
Gonzalez MW, Spouge JL. BMC Res Notes. 2013 May 24;6:209. doi: 10.1186/1756-0500-6-209. PMID: 23706066. Free PMC article.