Comparative Study
Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families
Olga V Kalinina et al. Protein Sci. 2004 Feb.
Abstract
The increasing volume of genomic data opens new possibilities for the analysis of protein function. We introduce a method for the automated selection of residues that determine the functional specificity of proteins sharing a common general function (the specificity-determining positions [SDP] prediction method). Such residues are expected to be conserved within groups of orthologs (which can be assumed to share the same specificity) and to vary between paralogs. Thus, given a multiple sequence alignment of a protein family divided into orthologous groups, one can select positions where the distribution of amino acids correlates with this division. Unlike previously published techniques, the method directly accounts for the nonuniformity of amino acid substitution frequencies. In addition, it does not require arbitrary thresholds; instead, it implements a formal threshold-selection procedure based on the Bernoulli estimator. We tested the SDP prediction method on the LacI family of bacterial transcription factors and on a sample of bacterial water and glycerol transporters belonging to the major intrinsic protein (MIP) family. In both cases, comparison with available experimental and structural data strongly supported our predictions.
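The abstract does not spell out how the correlation between an alignment column and the division into orthologous groups is measured. A minimal sketch, assuming mutual information between residue type and group label as the correlation measure and a Z-score against randomly shuffled columns as the significance measure, could look like this. It deliberately omits the correction for nonuniform amino acid substitution frequencies that the method applies, so it illustrates the general idea rather than the authors' implementation; `mutual_information` and `shuffled_z_score` are hypothetical helper names.

```python
import math
import random
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (bits) between the residues in one alignment column
    and the orthologous-group label of each sequence."""
    n = len(column)
    joint = Counter(zip(column, groups))
    res_counts = Counter(column)
    grp_counts = Counter(groups)
    mi = 0.0
    for (aa, grp), count in joint.items():
        p_joint = count / n
        p_indep = (res_counts[aa] / n) * (grp_counts[grp] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def shuffled_z_score(column, groups, n_shuffles=200, seed=0):
    """Z-score of the observed mutual information against values obtained by
    randomly permuting the column (an assumed null model; group sizes are kept)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        samples.append(mutual_information(shuffled, groups))
    mean = sum(samples) / n_shuffles
    var = sum((s - mean) ** 2 for s in samples) / (n_shuffles - 1)
    sd = math.sqrt(var)
    # A column that never varies under shuffling carries no signal.
    return (observed - mean) / sd if sd > 0 else 0.0
```

With four sequences in each of two orthologous groups, a column reading `LLLLFFFF` (conserved within each group, different between them) gives 1 bit of mutual information, while a fully conserved column `LLLLLLLL` gives 0, because it says nothing about group membership.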
Figures
Figure 1.
The phylogenetic tree of the analyzed proteins from the MIP family. Proteins of the AQP and the GLP training sets are in bold. Eukaryotic members of the MIP family are in bold and underlined.
Figure 2.
Candidate SDP for the LacI (A) and TreR (B) repressors. Effector molecules are shown by space filling and colored yellow; SDP are shown by space filling and colored by function: red, residues in close contact with DNA; green, residues in close contact with effectors; blue, residues in close contact with the other subunit; white, residues near the DNA-binding or effector-binding region but not satisfying the contact criteria (see the legend to Table 1); gray, overprediction (residues with no obvious function).
Figure 3.
Residues making close contacts with the effector (minimal distance <5 Å) in PurR, LacI, and TreR repressors (numbering as in PurR from E. coli).
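The contact criterion used in Figure 3 (minimal interatomic distance below 5 Å) reduces to a nearest-pair distance check. A minimal sketch, assuming atoms are supplied as plain (x, y, z) tuples in ångströms rather than parsed from a structure file; the helper names are hypothetical:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def in_close_contact(residue_atoms: Sequence[Coord],
                     ligand_atoms: Sequence[Coord],
                     cutoff: float = 5.0) -> bool:
    """True if any residue atom lies closer than `cutoff` (Å) to any ligand atom."""
    return min_distance(residue_atoms, ligand_atoms) < cutoff
```

The all-pairs loop is quadratic, which is adequate for one residue against one effector molecule; a whole-structure scan would normally use a spatial index instead.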
Figure 4.
The Bernoulli estimator for the training set (17 bacterial MIP proteins). Horizontal axis: k, the number of accepted positions. Vertical axis: probability that there are at least k Z-scores satisfying $Z \geq Z_k$.
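The threshold selection plotted in Figure 4 can be sketched as follows, assuming that under the null hypothesis each position's Z-score behaves like an independent standard normal variable. For each k, the sketch takes the k-th largest observed score Z_k, computes the chance p_k that a single random score reaches it, and evaluates the binomial tail probability that at least k of the N positions do so; the sketch then accepts the k at which this probability is smallest. The function names are hypothetical, and the published procedure may differ in its details.

```python
import math


def normal_tail(z):
    """P(Z >= z) for a standard normal variable (assumed null distribution)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bernoulli_estimator(z_scores):
    """Return (best_k, curve), where curve[k - 1] is the probability that at
    least k of N independent null scores reach the k-th largest observed score."""
    zs = sorted(z_scores, reverse=True)
    n = len(zs)
    curve = [binomial_tail(n, k, normal_tail(zs[k - 1])) for k in range(1, n + 1)]
    best_k = min(range(1, n + 1), key=lambda k: curve[k - 1])
    return best_k, curve
```

For example, with ten positions of which three score 8.0, 7.5, and 7.0 and the rest lie between -1.0 and 0.5, the curve reaches its minimum at k = 3, so exactly the three outlying positions are accepted without any hand-set cutoff.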
Figure 5.
Candidate SDP for GlpF from E. coli and for bovine AQP1. (A) Structure of GlpF from E. coli with three glycerol molecules (top). (B) Structure of bovine AQP1 with several water molecules in the channel (top). Substrate molecules are shown by space filling and colored green. Candidate SDP are shown by space filling and colored yellow if they form the channel and red if they may establish subunit interactions (see the text for discussion).
Figure 6.
The phylogenetic tree of the proteins from the MIP family (after realignment; both training and test sets are included). The branch colors indicate orthology relationships: blue, bidirectional best hits (BETs) of AqpZ from E. coli (AQPZ_ECOLI); green, true GlpF orthologs, that is, BETs of GlpF from E. coli for gram-negative bacteria and BETs of GlpF from Bacillus subtilis for gram-positive bacteria if their genes lie in operons related to glycerol metabolism; light green, recent GlpF paralogs whose genes lie in operons involved in glycerol metabolism; purple, proteins homologous to PduF from Salmonella typhimurium whose genes are located in the gene cluster related to propanediol degradation; brown, glyceroaquaporins, that is, BETs of GLA from Lactococcus lactis; magenta, true paralogs, that is, GlpF homologs whose genes lie in operons with functions other than glycerol metabolism; orange, proteins with unresolved orthology relationships. The colors of the protein names indicate the specificity assigned by the SDP profiles: blue, proteins selected by the AQP SDP profile (W_AQP_3); green, proteins selected by the GLP SDP profile (W_GLP_3). The names of the proteins from the AQP and GLP training sets are in bold. Bold red, eukaryotic MIP proteins.
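The orthology assignments in Figure 6 rest on bidirectional best hits: proteins a and b are paired when b is a's highest-scoring match in the second genome and a is b's highest-scoring match in the first. A minimal sketch, assuming precomputed similarity scores (for instance, BLAST bit scores) stored as nested dictionaries; it does not model the additional operon-context criteria described in the legend, and the identifiers `x1` and `x2` below are hypothetical:

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (ties: first seen wins)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Sorted (a, b) pairs that are each other's best hit in both directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: genome A genes glpF/aqpZ against genome B genes x1/x2.
a_vs_b = {"glpF": {"x1": 300, "x2": 120}, "aqpZ": {"x1": 90, "x2": 250}}
b_vs_a = {"x1": {"glpF": 310, "aqpZ": 80}, "x2": {"glpF": 100, "aqpZ": 240}}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)  # [("aqpZ", "x2"), ("glpF", "x1")]
```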
Figure 7.
The protein function predicted by the SDP profile score. (A) Average identity of the test proteins to the proteins from the AQP and GLP training sets. (B) Scores of the proteins from the test set computed using the AQP and GLP profiles. For the interpretation of colors, see the legend to Figure 6.
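Figure 7 assigns specificity to test proteins by comparing their scores against the AQP and GLP SDP profiles. The exact scoring function is not given here. A minimal sketch, assuming a profile is a table of residue frequencies (with pseudocounts) at the selected SDP columns and a protein's score is the sum of log-probabilities of its residues at those columns, could be written as follows; the function names, pseudocount, and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(training_seqs, positions, pseudocount=0.5, alphabet=AMINO_ACIDS):
    """Residue frequencies (with pseudocounts) at each chosen SDP column."""
    profile = {}
    total = len(training_seqs) + pseudocount * len(alphabet)
    for pos in positions:
        counts = Counter(seq[pos] for seq in training_seqs)
        profile[pos] = {aa: (counts[aa] + pseudocount) / total for aa in alphabet}
    return profile


def profile_score(seq, profile, floor=1e-6):
    """Sum of log-probabilities of the sequence's residues at the profile columns;
    residues outside the alphabet (e.g. gaps) get a small floor probability."""
    return sum(math.log(profile[pos].get(seq[pos], floor)) for pos in profile)


def assign_specificity(seq, profiles):
    """Name of the group whose profile gives the highest score."""
    return max(profiles, key=lambda name: profile_score(seq, profiles[name]))


# Toy aligned sequences; columns 0 and 1 stand in for predicted SDPs.
positions = [0, 1]
profiles = {
    "AQP": build_profile(["HRA", "HRA", "HRG"], positions),
    "GLP": build_profile(["WGA", "WGA", "WRA"], positions),
}
```

Here `assign_specificity("HRA", profiles)` returns `"AQP"` and `assign_specificity("WGS", profiles)` returns `"GLP"`. Scoring only the SDP columns, rather than whole-sequence identity as in panel A, is what lets such a profile separate groups whose members are otherwise similar.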
Similar articles
- Analysis and prediction of functional sub-types from protein sequence alignments. Hannenhalli SS, Russell RB. J Mol Biol. 2000 Oct 13;303(1):61-76. doi: 10.1006/jmbi.2000.4036. PMID: 11021970.
- Phylogeny-independent detection of functional residues. Pazos F, Rausell A, Valencia A. Bioinformatics. 2006 Jun 15;22(12):1440-8. doi: 10.1093/bioinformatics/btl104. Epub 2006 Mar 21. PMID: 16551661.
- Prediction of amino acid positions specific for functional groups in a protein family based on local sequence similarity. Karasev DA, Veselovsky AV, Oparina NY, Filimonov DA, Sobolev BN. J Mol Recognit. 2016 Apr;29(4):159-69. doi: 10.1002/jmr.2515. Epub 2015 Nov 8. PMID: 26549790.
- Heterotachy and functional shift in protein evolution. Philippe H, Casane D, Gribaldo S, Lopez P, Meunier J. IUBMB Life. 2003 Apr-May;55(4-5):257-65. doi: 10.1080/1521654031000123330. PMID: 12880207. Review.
- Using evolutionary information to find specificity-determining and co-evolving residues. Kolesov G, Mirny LA. Methods Mol Biol. 2009;541:421-48. doi: 10.1007/978-1-59745-243-4_18. PMID: 19381538. Review.
Cited by
- A conserved hymenopteran-specific family of cytochrome P450s protects bee pollinators from toxic nectar alkaloids. Haas J, Beck E, Troczka BJ, Hayward A, Hertlein G, Zaworra M, Lueke B, Buer B, Maiwald F, Beck ME, Nebelsiek B, Glaubitz J, Bass C, Nauen R. Sci Adv. 2023 Apr 14;9(15):eadg0885. doi: 10.1126/sciadv.adg0885. Epub 2023 Apr 12. PMID: 37043574.
- Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Oliveira LS, Reyes A, Dutilh BE, Gruber A. Viruses. 2023 Feb 13;15(2):519. doi: 10.3390/v15020519. PMID: 36851733.
- Transfer of knowledge from model organisms to evolutionarily distant non-model organisms: The coral Pocillopora damicornis membrane signaling receptome. Kumar L, Brenner N, Sledzieski S, Olaosebikan M, Roger LM, Lynn-Goin M, Klein-Seetharaman R, Berger B, Putnam H, Yang J, Lewinski NA, Singh R, Daniels NM, Cowen L, Klein-Seetharaman J. PLoS One. 2023 Feb 3;18(2):e0270965. doi: 10.1371/journal.pone.0270965. eCollection 2023. PMID: 36735673.
- Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins. Pascarelli S, Laurino P. PLoS Comput Biol. 2022 Apr 4;18(4):e1010016. doi: 10.1371/journal.pcbi.1010016. eCollection 2022 Apr. PMID: 35377869.
- Bioinformatic analysis of subfamily-specific regions in 3D-structures of homologs to study functional diversity and conformational plasticity in protein superfamilies. Timonina D, Sharapova Y, Švedas V, Suplatov D. Comput Struct Biotechnol J. 2021 Feb 23;19:1302-1311. doi: 10.1016/j.csbj.2021.02.005. eCollection 2021. PMID: 33738079.