Database of homology-derived protein structures and the structural meaning of sequence alignment
C Sander et al. Proteins. 1991.
Abstract
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
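The length-dependent threshold described above can be illustrated with a short sketch. The abstract does not state the curve itself; the functional form and the constants below (290.15, -0.562, and a 24.8% plateau for alignments longer than 80 residues) are the values commonly quoted for the HSSP curve in later literature and should be treated as assumptions to check against the full paper. The function names are hypothetical and chosen only for this example.

```python
def hssp_threshold(alignment_length: int) -> float:
    """Percent sequence identity above which structural homology is assumed.

    ASSUMPTION: the constants below are the commonly quoted form of the
    HSSP threshold curve; they are not given in the abstract.
    """
    if alignment_length < 10:
        # Assumption: the curve is not defined for very short alignments.
        raise ValueError("threshold not defined for alignments shorter than 10 residues")
    if alignment_length > 80:
        # Long alignments: flat cutoff.
        return 24.8
    # Short alignments need a much higher identity to imply similar structure.
    return 290.15 * alignment_length ** -0.562


def is_structurally_homologous(identity_percent: float, alignment_length: int) -> bool:
    """True if an alignment lies on or above the (assumed) threshold curve."""
    return identity_percent >= hssp_threshold(alignment_length)


if __name__ == "__main__":
    for length in (10, 20, 50, 80, 200):
        print(length, round(hssp_threshold(length), 1))
```

Under these assumed constants, an alignment with 30% identity would count as structurally homologous over 100 residues but not over 20 residues, which is the length dependence the abstract's point (4) refers to.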