Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information
Sergey Ovchinnikov et al. Elife. 2014.
Abstract
Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein-protein interfaces in the 50S ribosomal unit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. DOI: http://dx.doi.org/10.7554/eLife.02030.001
Keywords: protein coevolution; protein complexes; pseudo-likelihood.
Copyright © 2014, Ovchinnikov et al.
Conflict of interest statement
The authors declare that no competing interests exist.
Figures
Figure 1. Residue pairs with high normalized coupling strengths are in contact in the 50S ribosomal subunit.
(A) Coupling strengths and inter-residue distances for each residue pair in the 50S subunit (black dots). Residue pairs with coupling strength greater than 1.5 are nearly always less than 8 Å apart. (B) Locations of coevolving (high coupling strength) residue pairs in the protein component of the 50S subunit. The monomers have been pulled apart slightly for clarity. Lines connect residue pairs with coupling strength greater than 1.5; yellow, distance less than 8 Å; orange, distance less than 12 Å. (C) Protein pairs with strong inter-residue covariation (colors) make contact in the three-dimensional structure (black boxes). For each pair of 50S subunit proteins, the sum of the coupling strengths greater than 1.5 is indicated; black boxes indicate contacts in the crystal structure. (D) Dependence of contact prediction accuracy on coupling strength and the number of sequences in the alignments. For each of the indicated coupling strength cutoffs (colors), the frequency of contact in the 50S structure (y axis) was computed for subalignments with different sequence depths (x axis). DOI: http://dx.doi.org/10.7554/eLife.02030.003
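The per-protein-pair summary described for panel C can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `sum_strong_couplings`, the tuple layout, and the toy protein names and values are all assumptions; only the 1.5 cutoff comes from the caption.

```python
from collections import defaultdict

COUPLING_CUTOFF = 1.5  # cutoff quoted in the Figure 1 caption


def sum_strong_couplings(couplings, cutoff=COUPLING_CUTOFF):
    """Sum coupling strengths above `cutoff` for each protein pair.

    `couplings` is an iterable of (protein_a, protein_b, strength) tuples,
    one per inter-protein residue pair (a hypothetical layout).
    """
    totals = defaultdict(float)
    for protein_a, protein_b, strength in couplings:
        if strength > cutoff:
            # Sort the names so (A, B) and (B, A) share one key.
            key = tuple(sorted((protein_a, protein_b)))
            totals[key] += strength
    return dict(totals)


# Toy values for illustration only; not data from the paper.
toy_couplings = [
    ("protA", "protB", 2.0),
    ("protB", "protA", 1.75),
    ("protA", "protC", 1.0),  # below the cutoff, so ignored
]
print(sum_strong_couplings(toy_couplings))
```

Protein pairs with no residue pair above the cutoff simply do not appear in the result, which corresponds to an uncolored cell in a matrix like the one in panel C.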
Figure 1—figure supplement 1. Determining GREMLIN scores from normalized coupling strengths.
Top row: (A) Normalized coupling strengths. (B) GREMLIN score obtained by fitting a sigmoidal function of normalized coupling strengths to observed frequencies on the 50S ribosome (left column), evaluated on the benchmark set (complexes from the NADH dehydrogenase, middle column; the remaining complexes, right column). (C) The GREMLIN score is well-calibrated: the fraction of predictions with a GREMLIN score of x that are correct (distance <12 Å) is roughly x (x in [0, 1]). The overall behavior is similar across the three datasets. DOI: http://dx.doi.org/10.7554/eLife.02030.004
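The two ideas in this caption, a sigmoidal map from coupling strength to a score and a calibration check, can be sketched as below. This is an illustrative sketch only: the logistic form, the `midpoint` and `slope` values, and both function names are assumptions, since the fitted parameters are not given here.

```python
import math


def gremlin_like_score(coupling, midpoint=1.0, slope=4.0):
    """Logistic map from a normalized coupling strength to a value in (0, 1).

    `midpoint` and `slope` are placeholder values, not the fitted parameters.
    """
    return 1.0 / (1.0 + math.exp(-slope * (coupling - midpoint)))


def calibration_table(scores, is_contact, n_bins=5):
    """Group predictions by score and report the fraction that are correct.

    `scores` are values in [0, 1]; `is_contact` holds booleans, for example
    whether the pair is closer than 12 Å in a reference structure.
    Returns rows of (bin_low, bin_high, mean_score, fraction_correct, count).
    """
    bins = [[] for _ in range(n_bins)]
    for score, correct in zip(scores, is_contact):
        index = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[index].append((score, correct))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_score = sum(s for s, _ in members) / len(members)
        fraction = sum(1 for _, ok in members if ok) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_score, fraction, len(members)))
    return rows
```

For a well-calibrated score, `mean_score` and `fraction_correct` should be close to each other in every row, which is the property panel C reports.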
Figure 2. Residue covariation in complexes with known structures.
(A) Residue pairs across protein chains with high GREMLIN scores almost always make contact across protein interfaces in experimentally determined complex structures. All contacts with GREMLIN scores greater than 0.6 are shown; the structures are pulled apart for clarity. Labels are according to chains in the PDB structure. (B) Complex I of the electron transport chain has an unusually large number of highly co-varying inter-residue pairs not in contact in the crystal structure of 4HEA; these contacts may be formed in a different state of the complex. Residue pairs within 8 Å are in yellow, between 8 Å and 12 Å in orange, and greater than 12 Å in red. Distances are the minimal distances between any pair of side chain heavy atoms. Labels are according to chains in 4HEA. (C) Dependence of inter-residue distance distributions on GREMLIN score. All residue–residue pairs between subunits in the benchmark set were grouped into four bins based on their GREMLIN score (colors), and the distribution of residue–residue distances (x axis) within each bin computed from the three-dimensional structures. See Figure 2—source data 1 for the table of all the interfaces used in the calculation. DOI: http://dx.doi.org/10.7554/eLife.02030.005
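The distance measure and color classes used in this figure can be sketched as follows. The function names and the coordinate layout are assumptions made for illustration; the 8 Å and 12 Å thresholds come from the caption, which does not say how a distance of exactly 8 Å or 12 Å is classified.

```python
import math
from itertools import product


def min_heavy_atom_distance(atoms_a, atoms_b):
    """Smallest distance (Å) between any atom of residue A and any atom of residue B.

    Each argument is a non-empty list of (x, y, z) coordinates of side chain
    heavy atoms (a hypothetical input layout).
    """
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))


def distance_class(distance):
    """Map a distance in Å to the color classes named in the caption."""
    if distance < 8.0:
        return "yellow"
    if distance <= 12.0:  # boundary handling is an assumption
        return "orange"
    return "red"


# Toy coordinates for illustration only.
residue_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
residue_b = [(3.0, 4.0, 0.0)]
d = min_heavy_atom_distance(residue_a, residue_b)
print(d, distance_class(d))
```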
Figure 3. Predicted residue–residue interactions across protein interfaces of unknown structure.
Strongly co-evolving residue pairs for complexes without known structure that had at least one prediction with GREMLIN score greater than or equal to 0.85. Each row shows the residue pairs, their sequence identity and the GREMLIN score. Structure models for complexes highlighted in red are shown in Figure 5. The full dataset is provided with the deposited data. DOI: http://dx.doi.org/10.7554/eLife.02030.007
Figure 4. Contact-guided protein–protein docking on a benchmark set of 18 protein complexes.
(A) Structure models for each complex were generated by docking structures of its constituents, guided by coevolution-derived distance restraints; at least one of the constituent structures (blue) was not taken from the structure of the complex. The interface C-alpha RMSD (iRMSD) of the lowest-energy structural model to the experimentally determined structure and the fraction of native contacts are shown. Structure models for cases in red are shown in B, C, and D. (B and C) Comparison between native and docked structure for the two largest failures in the benchmark: the large iRMSD is due to large conformational changes in the monomers upon docking, but the interface is still modeled correctly in the region not involved in conformational change. (D) Multiple minima in the docking landscape (right) correspond to distinct interfaces in the complex (left). DOI: http://dx.doi.org/10.7554/eLife.02030.008
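One of the two quality measures named in panel A, the fraction of native contacts, can be sketched as below. The caption does not state how contacts were defined, so this sketch assumes that each contact is already given as a (residue in chain A, residue in chain B) pair; the function name and toy values are likewise assumptions.

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Fraction of contacts in the reference structure that the model reproduces.

    Both arguments are iterables of (residue_a, residue_b) pairs; how a
    contact is detected (for example, a distance cutoff) is left to the caller.
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the reference structure has no interface contacts")
    recovered = native & set(model_contacts)
    return len(recovered) / len(native)


# Toy contact lists for illustration only.
native = [(10, 55), (11, 55), (14, 60), (20, 71)]
model = [(10, 55), (14, 60), (30, 90)]
print(fraction_native_contacts(native, model))
```

Contacts that appear only in the model, such as `(30, 90)` here, do not lower this measure; they would count against a precision-style measure instead.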
Figure 4—figure supplement 1. Docking landscapes showing iRMSD (x-axis) vs GREMLIN restraint score (y-axis).
Each point represents a structure model generated by docking the subunits guided by the GREMLIN score. Dark blue points are from calculations in which at least one subunit was solved independently of the complex; light blue points, from positive control calculations in which both subunits are from the bound complex. DOI: http://dx.doi.org/10.7554/eLife.02030.010
Figure 4—figure supplement 2. Bound set.
Docking landscapes with GREMLIN restraint score. X-axis, iRMSD; y-axis, GREMLIN restraint score. DOI: http://dx.doi.org/10.7554/eLife.02030.011
Figure 5. Structure models for complexes with unknown structures.
Residue pairs with GREMLIN scores ≥ 0.60 are connected by yellow bars; the structures are pulled apart for clarity. For METQ-METI and PFLA-PFLB, GREMLIN scores ≥ 0.3 are shown. For each docking calculation the docking energy landscape is shown, with iRMSD to the selected model on the x-axis. The multiple minima correspond to permutations of the labels on the subunits of the homo-oligomer complex. Predicted structures of each complex are provided with the deposited data. DOI: http://dx.doi.org/10.7554/eLife.02030.012
Grants and funding
- P41 GM103533/GM/NIGMS NIH HHS/United States
- R01 GM092802/GM/NIGMS NIH HHS/United States
- T32 GM007270/GM/NIGMS NIH HHS/United States
- 1R01GM092802-04/GM/NIGMS NIH HHS/United States