Identification of near-native structures by clustering protein docking conformations

A new scoring function for protein-protein docking that identifies native structures with unprecedented accuracy

Physical Chemistry Chemical Physics (PCCP), 2015

Protein-protein (P-P) 3D structures are fundamental to structural biology and drug discovery. However, most of them have never been determined experimentally. Many docking algorithms have been developed to predict them, but their accuracy in generating native-like structures and in identifying the most correct one is very limited, particularly when a single answer is required. With such a low success rate, it is difficult to single out one docked structure as native-like. Here we present a new, high-accuracy scoring method to identify the 3D structure of P-P complexes among a set of trial poses. It incorporates experimental alanine scanning mutagenesis data, which must be obtained a priori. The scoring scheme works by matching the computational and experimental alanine scanning mutagenesis results, and it also takes the size of the trial P-P interface area into account. We show that the method ranks the trial structures and identifies the native-like structures with unprecedented accuracy (∼94...
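The abstract does not give the exact scoring formula, so the following is only a hypothetical sketch of the idea it describes: call each experimentally scanned residue a hot spot or not (the 2.0 kcal/mol cutoff, the residue IDs, `TrialPose`, `agreement`, `rank_poses`, and the 0.1 area weight are all assumptions), score each trial pose by how often its computed calls agree with experiment, and add a small interface-area bonus.

```python
from dataclasses import dataclass

# Hypothetical cutoff: residues with ddG >= 2.0 kcal/mol are called hot spots.
HOT_SPOT_CUTOFF = 2.0


@dataclass
class TrialPose:
    name: str
    computed_ddg: dict     # residue id -> computed alanine-scanning ddG (kcal/mol)
    interface_area: float  # buried interface area of the pose (Å^2)


def agreement(computed_ddg, experimental_ddg, cutoff=HOT_SPOT_CUTOFF):
    """Fraction of experimentally scanned residues whose hot-spot call matches.

    Residues missing from computed_ddg (e.g. not at this pose's interface)
    are treated as ddG = 0, i.e. not hot spots.
    """
    matches = sum(
        (computed_ddg.get(res, 0.0) >= cutoff) == (ddg >= cutoff)
        for res, ddg in experimental_ddg.items()
    )
    return matches / len(experimental_ddg)


def rank_poses(poses, experimental_ddg, area_weight=0.1):
    """Rank by agreement plus a small area-normalized bonus (the weight is hypothetical)."""
    max_area = max(p.interface_area for p in poses)

    def score(p):
        return agreement(p.computed_ddg, experimental_ddg) + area_weight * p.interface_area / max_area

    return sorted(poses, key=score, reverse=True)


experimental = {"R59": 3.1, "Y29": 0.4, "D39": 2.5}  # made-up example data
poses = [
    TrialPose("pose_b", {"R59": 0.5, "Y29": 2.4, "D39": 0.1}, 1800.0),
    TrialPose("pose_a", {"R59": 2.8, "Y29": 0.2, "D39": 2.2}, 1600.0),
]
print([p.name for p in rank_poses(poses, experimental)])  # ['pose_a', 'pose_b']
```

Here the larger interface of `pose_b` cannot compensate for its complete disagreement with the experimental hot spots, which illustrates why the area term is kept small relative to the agreement term in this sketch.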

Consensus scoring for enriching near-native structures from protein-protein docking decoys

Proteins, 2009

The identification of near-native protein–protein complexes among a set of decoys remains highly challenging. One strategy for improving the success rate of near-native detection is to enrich near-native docking decoys among a small number of top-ranked decoys. Recently, we found that a combination of three scoring functions (energy, conservation, and interface propensity) can predict the location of binding interface regions with reasonable accuracy. Here, these three scoring functions are modified and combined into a consensus scoring function called ENDES for enriching near-native docking decoys. We found that each individual score yields enrichment for the majority of the 28 targets in the ZDOCK2.3 decoy set and the 22 targets in Benchmark 2.0. Among the three scores, the interface propensity score yields the highest enrichment in both sets of protein complexes. When these scores are combined into the ENDES consensus score, a significant increase in enrichment of near-native structures is found. For example, when 2000 docking decoys are reduced to 200 decoys by ENDES, the fraction of near-native structures increases by a factor of about six on average. ENDES was implemented in a computer program that is available for download at http://sparks.informatics.iupui.edu. Proteins 2009. © 2008 Wiley-Liss, Inc.
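The "factor of about six" is an enrichment factor: the near-native fraction among the top-ranked decoys divided by the near-native fraction in the full set. The sketch below computes it for a consensus score built as a plain sum of z-scores; that combination rule, the function names, and the toy data are assumptions for illustration, not necessarily how ENDES weights its three terms.

```python
import numpy as np


def zscore(x):
    """Standardize one score vector; a constant vector maps to zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def consensus_score(*scores):
    """Sum of z-scores. Each input must be oriented so that larger = more native-like."""
    return sum(zscore(s) for s in scores)


def enrichment_factor(is_near_native, ranking_score, top_n):
    """(near-native fraction in the top_n decoys) / (near-native fraction overall).

    Assumes the full set contains at least one near-native decoy.
    """
    is_near_native = np.asarray(is_near_native, dtype=bool)
    order = np.argsort(ranking_score)[::-1]  # best-scoring decoys first
    return is_near_native[order[:top_n]].mean() / is_near_native.mean()


# Toy set of 10 decoys; decoys 0 and 1 are the near-native ones.
neg_energy = [10, 9, 1, 2, 3, 4, 2, 1, 0, 5]  # energy negated so higher is better
conservation = [2.0, 1.8, 0.5, 0.4, 0.3, 0.9, 0.2, 0.1, 0.6, 0.7]
propensity = [0.9, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.4, 0.1, 0.0]
near_native = [True, True] + [False] * 8

combined = consensus_score(neg_energy, conservation, propensity)
print(enrichment_factor(near_native, combined, top_n=2))  # 5.0
```

Reducing 2000 decoys to 200, as in the abstract, corresponds to `top_n=200`; an enrichment factor of 1.0 means the ranking does no better than random selection.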

Evaluation of multiple protein docking structures using correctly predicted pairwise subunits

BMC Bioinformatics, 2012

Background: Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, since few methods for multiple protein docking have been developed, no study has investigated how accurate structural models of multiple protein complexes need to be to allow scientists to gain biological insights. Methods: We generated a series of predicted models (decoys) of various accuracies with our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations they contain. Results and conclusion: We found that pairs of chains with the correct mutual orientation exist even in decoys with a large overall root mean square deviation (RMSD) to the native. Therefore, in addition to a global structure similarity measure such as the global RMSD, the quality of models of multiple-chain complexes can be better evaluated with a local measure: the number of chain pairs with correct mutual orientation. We term the fraction of correctly predicted pairs (interface RMSD of less than 4.0 Å) fpair and propose using it to evaluate the accuracy of multiple protein docking.
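A minimal sketch of computing fpair, assuming each chain pair's interface RMSD is taken after optimal (Kabsch) superposition of matched interface atoms, as in CAPRI-style evaluation, and that the caller supplies the set of chain pairs to evaluate (the abstract does not specify the denominator). `kabsch_rmsd`, `fpair`, and the example values are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (n, 3) coordinate sets after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def fpair(pair_irmsd, cutoff=4.0):
    """Fraction of evaluated chain pairs whose interface RMSD (Å) is below cutoff."""
    if not pair_irmsd:
        raise ValueError("no chain pairs to evaluate")
    return sum(r < cutoff for r in pair_irmsd.values()) / len(pair_irmsd)


# A rigidly rotated and translated copy of an interface superposes back to RMSD ~0.
rng = np.random.default_rng(1)
native_iface = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
model_iface = native_iface @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(model_iface, native_iface) < 1e-6)  # True

# Hypothetical interface RMSDs for four chain pairs of a 4-chain decoy.
irmsd = {("A", "B"): 2.1, ("A", "C"): 3.9, ("B", "D"): 7.5, ("C", "D"): 12.0}
print(fpair(irmsd))  # 0.5
```

This makes the abstract's point concrete: a decoy can have a poor global RMSD yet still have half of its chain pairs correctly oriented.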

Optimal Clustering for Detecting Near-Native Conformations in Protein Docking

Biophysical Journal, 2005

Clustering is one of the most powerful tools in computational biology. The conventional wisdom is that events that occur in clusters are probably not random. In protein docking, the underlying principle is that clustering occurs because long-range electrostatic and/or desolvation forces steer the proteins to a low free-energy attractor at the binding region. Something similar occurs in the docking of small molecules, although in this case shorter-range van der Waals forces play a more critical role. Based on the above, we have developed two different clustering strategies to predict docked conformations from the clustering properties of a uniform sampling of low free-energy protein-protein and protein-small molecule complexes. We report significant improvements in the automated prediction and discrimination of docked conformations by using cluster size and consensus as ranking criteria. We show that the success of clustering depends on identifying the appropriate clustering radius of the system. The clustering radius for protein-protein complexes is consistent with the range of the electrostatic and desolvation free energies (i.e., between 4 and 9 Å); for protein-small molecule docking, the radius is set by van der Waals interactions (i.e., at ~2 Å). Without any a priori information, a simple analysis of the histogram of distance separations between the set of docked conformations can evaluate the clustering properties of the data set. Clustering is observed when the histogram is bimodal. Data clustering is optimal if one chooses the clustering radius to be the minimum after the first peak of the bimodal distribution. We show that using this optimal radius further improves the discrimination of near-native complex structures.

Accounting for pairwise distance restraints in FFT-based protein–protein docking

Bioinformatics, 2016

Summary: ClusPro is a heavily used protein–protein docking server based on the fast Fourier transform (FFT) correlation approach. While FFT enables global docking, accounting for pairwise distance restraints using penalty terms in the scoring function is computationally expensive. We use a different approach and directly select low-energy solutions that also satisfy the given restraints. As expected, accounting for restraints generally improves the rank of near-native predictions, while retaining or even improving the numerical efficiency of FFT-based docking. Availability and Implementation: The software is freely available as part of the ClusPro web-based server at http://cluspro.org/nousername.php. Contact: midas@laufercenter.org or vajda@bu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
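A simplified sketch of "directly select low-energy solutions that also satisfy the given restraints". It is an assumption-laden illustration, not ClusPro's implementation: ClusPro applies the restraint check within its FFT search, whereas this sketch post-filters a plain list of scored poses. `DistanceRestraint`, `satisfied_count`, `select_restrained`, and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DistanceRestraint:
    receptor_atom: int  # index into the receptor coordinate array
    ligand_atom: int    # index into a pose's ligand coordinate array
    min_dist: float     # Å
    max_dist: float     # Å


def satisfied_count(receptor_xyz, ligand_xyz, restraints):
    """Number of restraints whose atom-atom distance lies within [min_dist, max_dist]."""
    count = 0
    for r in restraints:
        d = np.linalg.norm(receptor_xyz[r.receptor_atom] - ligand_xyz[r.ligand_atom])
        if r.min_dist <= d <= r.max_dist:
            count += 1
    return count


def select_restrained(poses, receptor_xyz, restraints, min_satisfied, n_keep):
    """Keep the n_keep lowest-energy (energy, ligand_xyz) poses meeting >= min_satisfied restraints."""
    ok = [(e, lig) for e, lig in poses
          if satisfied_count(receptor_xyz, lig, restraints) >= min_satisfied]
    ok.sort(key=lambda p: p[0])  # lower energy is better
    return ok[:n_keep]


rec_xyz = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
restraints = [DistanceRestraint(0, 0, 0.0, 5.0), DistanceRestraint(1, 1, 0.0, 5.0)]
poses = [
    (-50.0, np.array([[20.0, 0.0, 0.0], [30.0, 0.0, 0.0]])),  # best energy, violates both
    (-40.0, np.array([[3.0, 0.0, 0.0], [12.0, 0.0, 0.0]])),   # satisfies both
    (-30.0, np.array([[4.0, 0.0, 0.0], [40.0, 0.0, 0.0]])),   # satisfies one
]
print([e for e, _ in select_restrained(poses, rec_xyz, restraints, min_satisfied=1, n_keep=2)])
# [-40.0, -30.0]
```

The lowest-energy pose is discarded because it violates every restraint, which is how restraints can raise the rank of near-native predictions without adding penalty terms to the energy.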

ClusPro: a fully automated algorithm for protein-protein docking

Nucleic Acids Research, 2004

ClusPro (http://nrc.bu.edu/cluster) represents the first fully automated, web-based program for the computational docking of protein structures. Users may upload the coordinate files of two protein structures through ClusPro's web interface, or enter the PDB codes of the respective structures, which ClusPro will then download from the PDB server (http://www.rcsb.org/pdb/). The docking algorithms evaluate billions of putative complexes, retaining a preset number with favorable surface complementarities. A filtering method is then applied to this set of structures, selecting those with good electrostatic and desolvation free energies for further clustering. The program output is a short list of putative complexes ranked according to their clustering properties, which is automatically sent back to the user via email.
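The two filtering stages described above can be sketched as follows. The pose fields, the numbers kept at each stage, and the choice to take the union of the best-electrostatics and best-desolvation subsets are assumptions made for illustration; the abstract does not state these details.

```python
def two_stage_filter(poses, n_keep, m_elec, m_desolv):
    """Stage 1: keep the n_keep poses with the best surface complementarity (higher is better).
    Stage 2: from those, keep the m_elec lowest electrostatic and m_desolv lowest
    desolvation energies, and return their union (electrostatic picks first).
    """
    kept = sorted(poses, key=lambda p: p["complementarity"], reverse=True)[:n_keep]
    by_elec = sorted(kept, key=lambda p: p["elec"])[:m_elec]
    by_desolv = sorted(kept, key=lambda p: p["desolv"])[:m_desolv]
    union = {p["id"]: p for p in by_elec + by_desolv}  # deduplicate by pose id
    return list(union.values())


# Hypothetical poses; "e" has the best energies but poor complementarity.
poses = [
    {"id": "a", "complementarity": 9, "elec": -5, "desolv": 1},
    {"id": "b", "complementarity": 8, "elec": -1, "desolv": -6},
    {"id": "c", "complementarity": 7, "elec": -2, "desolv": 0},
    {"id": "d", "complementarity": 6, "elec": 0, "desolv": 2},
    {"id": "e", "complementarity": 1, "elec": -9, "desolv": -9},
]
selected = two_stage_filter(poses, n_keep=4, m_elec=1, m_desolv=1)
print(sorted(p["id"] for p in selected))  # ['a', 'b']
```

Because stage 2 only sees what survives stage 1, pose `e` never reaches the energy filters; the selected set is what would then be passed on to clustering.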

ClusPro: an automated docking and discrimination method for the prediction of protein complexes

Bioinformatics, 2004

Predicting protein interactions is one of the most challenging problems in functional genomics. Given two proteins known to interact, current docking methods evaluate billions of docked conformations with simple scoring functions and, in addition to near-native structures, yield many false positives, i.e. structures with good surface complementarity but far from the native. Results: We have developed a fast algorithm for filtering docked conformations with good surface complementarity and ranking them based on their clustering properties. The free energy filters select complexes with the lowest desolvation and electrostatic energies. Clustering is then used to smooth the local minima and to select the ones with the broadest energy wells, a property associated with the free energy at the binding site. The robustness of the method was tested on sets of 2000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. The docking and discrimination method also provides good results for a number of complexes that were used as targets in the Critical Assessment of PRedictions of Interactions (CAPRI) experiment. Availability: The fully automated docking and discrimination server ClusPro can be found at http://structure.
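Ranking by clustering properties can be sketched as greedy neighbor-count clustering: repeatedly take the pose with the most not-yet-clustered neighbors within the clustering radius, remove it and its neighbors, and repeat. This is a hypothetical sketch of that general scheme, together with the "at least one near-native complex in the top 10" success check; the function names, the 1-D toy coordinates, and the RMSD values are assumptions.

```python
import numpy as np


def greedy_cluster(dist, radius, max_clusters=10):
    """Repeatedly pick the pose with the most unclustered neighbors within radius.

    Returns (center_index, cluster_size) tuples, largest cluster first.
    """
    dist = np.asarray(dist, dtype=float)
    remaining = np.ones(len(dist), dtype=bool)
    clusters = []
    while remaining.any() and len(clusters) < max_clusters:
        neighbors = (dist <= radius) & remaining[:, None] & remaining[None, :]
        counts = neighbors.sum(axis=1)
        center = int(np.argmax(counts))
        clusters.append((center, int(counts[center])))
        remaining &= ~neighbors[center]  # drop the center and its neighbors
    return clusters


def near_native_in_top(clusters, rmsd_to_native, cutoff, top=10):
    """True if any of the top cluster centers lies within cutoff (Å) of the native."""
    return any(rmsd_to_native[c] <= cutoff for c, _ in clusters[:top])


pos = np.array([0.0, 0.5, 1.0, 1.5, 10.0, 10.5, 20.0])  # toy 1-D "pose coordinates"
dist = np.abs(pos[:, None] - pos[None, :])
clusters = greedy_cluster(dist, radius=1.0)
print(clusters)  # [(1, 4), (4, 2), (6, 1)]

rmsd_to_native = [12.0, 4.5, 6.0, 8.0, 15.0, 16.0, 20.0]  # hypothetical values
print(near_native_in_top(clusters, rmsd_to_native, cutoff=5.0))  # True
```

A densely populated region (a broad energy well) yields a large cluster and therefore a high rank, even if no single pose in it has the best energy.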

PIPER: An FFT-based protein docking program with pairwise potentials

Proteins: Structure, Function, and Bioinformatics, 2006

The Fast Fourier Transform (FFT) correlation approach to protein-protein docking can evaluate the energies of billions of docked conformations on a grid if the energy is described in the form of a correlation function. Here, this restriction is removed, and the approach is efficiently used with pairwise interaction potentials that substantially improve the docking results. The basic idea is to approximate the interaction matrix by its eigenvectors corresponding to the few dominant eigenvalues, resulting in an energy expression written as the sum of a few correlation functions, and to solve the problem by repeated FFT calculations. In addition to describing how the method is implemented, we present a novel class of structure-based pairwise intermolecular potentials. The DARS (Decoys As the Reference State) potentials are extracted from structures of protein-protein complexes and use large sets of docked conformations as decoys to derive atom pair distributions in the reference state. The current version of the DARS potential works well for enzyme-inhibitor complexes. With the new FFT-based program, DARS provides much better docking results than the earlier approaches, in many cases generating 50% more near-native docked conformations. Although the potential is far from optimal for antibody-antigen pairs, the results are still slightly better than those given by an earlier FFT method. The docking program PIPER is freely available for non-commercial applications.
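The eigenvector trick can be checked numerically. With per-atom-type grids R_I and L_J and a symmetric pairwise potential P, the energy for every translation is a sum over type pairs of correlations, which takes T*T FFT correlations for T types. Writing P as a sum of eigenvalue-weighted outer products collapses this to one correlation per retained eigenvector. The sketch below is a toy 1-D illustration under stated assumptions (random symmetric P, random integer grids, hypothetical function names); PIPER itself works on 3-D grids and samples rotations separately.

```python
import numpy as np


def circular_correlation(R, L):
    """c[t] = sum_x R[x] * L[(x - t) % N] for real 1-D grids, via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(R) * np.conj(np.fft.fft(L))))


def direct_energy(P, R_grids, L_grids):
    """E[t] = sum_{I,J} P[I, J] * corr(R_I, L_J)[t]: T*T correlations for T atom types."""
    T, N = R_grids.shape
    E = np.zeros(N)
    for I in range(T):
        for J in range(T):
            E += P[I, J] * circular_correlation(R_grids[I], L_grids[J])
    return E


def eigen_energy(P, R_grids, L_grids, k):
    """Approximate E[t] with the k dominant eigenpairs of the symmetric matrix P (k correlations)."""
    w, V = np.linalg.eigh(P)
    dominant = np.argsort(np.abs(w))[::-1][:k]  # largest |eigenvalue| first
    E = np.zeros(R_grids.shape[1])
    for idx in dominant:
        u = V[:, idx]
        # Collapse the per-type grids into one grid per side, weighted by eigenvector u.
        E += w[idx] * circular_correlation(u @ R_grids, u @ L_grids)
    return E


rng = np.random.default_rng(0)
T, N = 4, 32  # 4 hypothetical atom types on a 1-D grid of 32 cells
A = rng.normal(size=(T, T))
P = (A + A.T) / 2  # random symmetric stand-in for a pairwise potential
R_grids = rng.integers(0, 3, size=(T, N)).astype(float)
L_grids = rng.integers(0, 3, size=(T, N)).astype(float)

exact = direct_energy(P, R_grids, L_grids)
print(np.allclose(exact, eigen_energy(P, R_grids, L_grids, k=T)))  # True: full rank is exact
print(np.abs(exact - eigen_energy(P, R_grids, L_grids, k=2)).max())  # truncation error, 2 terms
```

Keeping all T eigenpairs reproduces the direct sum exactly; the savings come from keeping only the few dominant ones, trading a small approximation error for far fewer FFTs.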