Automatic selection of near-native protein-ligand conformations using a hierarchical clustering and volunteer computing
Related papers
Cross Docking Benchmark for automated Pose and Ranking prediction of ligand binding
Protein Science
Significant efforts have been devoted in the last decade to improving molecular docking techniques to predict both accurate binding poses and ranking affinities. Some shortcomings in the field are the limited number of standard methods for measuring docking success and the limited availability of widely accepted standard data sets for use as benchmarks in comparing different docking algorithms throughout the field. In order to address these issues, we have created a Cross-Docking Benchmark server. The server is a versatile cross-docking data set containing 4,399 protein-ligand complexes across 95 protein targets intended to serve as a benchmark set and gold standard for state-of-the-art pose and ranking prediction in easy, medium, hard, or very hard docking targets. The benchmark along with a customizable cross-docking data set generation tool is available at http://disco.csb.pitt.edu. We further demonstrate the potential uses of the server in questions outside of basic benchmarking such as the selection of the ideal docking reference structure.
PLOS ONE, 2017
Protein-protein docking protocols aim to predict the structures of protein-protein complexes based on the structure of individual partners. Docking protocols usually include several steps of sampling, clustering, refinement and re-scoring. The scoring step is one of the bottlenecks in the performance of many state-of-the-art protocols. The performance of scoring functions depends on the quality of the generated structures and its coupling to the sampling algorithm. A tool kit, GRADSCOPT (GRid Accelerated Directly SCoring OPTimizing), was designed to allow rapid development and optimization of different knowledge-based scoring potentials for specific objectives in protein-protein docking. Different atomistic and coarse-grained potentials can be created by a grid-accelerated directly scoring dependent Monte-Carlo annealing or by a linear regression optimization. We demonstrate that the scoring functions generated by our approach are similar to or even outperform state-of-the-art scoring functions for predicting near-native solutions. Of additional importance, we find that potentials specifically trained to identify the native bound complex perform rather poorly on identifying acceptable or medium quality (near-native) solutions. In contrast, atomistic long-range contact potentials can increase the average fraction of near-native poses by up to a factor of 2.5 in the best-scored 1% of decoys (compared to existing scoring), emphasizing the need for specific docking potentials for different steps in the docking protocol.
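The "fraction of near-native poses in the best-scored 1% of decoys" can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from GRADSCOPT: the function names, the convention that lower scores are better, and the rounding of the top fraction are all assumptions.

```python
def top_fraction_near_native(scores, is_near_native, fraction=0.01,
                             lower_is_better=True):
    """Fraction of near-native decoys among the best-scored `fraction`."""
    # Assumption: at least one decoy is always inspected.
    n_top = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    top = order[:n_top]
    return sum(1 for i in top if is_near_native[i]) / n_top


def improvement_factor(new_scores, ref_scores, is_near_native, fraction=0.01):
    """Ratio of top-fraction hit rates: new scoring versus reference scoring."""
    ref = top_fraction_near_native(ref_scores, is_near_native, fraction)
    new = top_fraction_near_native(new_scores, is_near_native, fraction)
    return new / ref if ref > 0 else float("inf")
```

Under these assumptions, a returned ratio of 2.5 would correspond to the kind of improvement quoted in the abstract; how the published figure was averaged over targets is not specified here.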
Selecting Molecular Docking Sites by Neighbor Selection and Various Factors
A method for finding docking sites in molecular docking simulation is proposed in this paper. The method distinguishes the surface and interior atoms of the receptor by selecting a suitable distance that maximizes the standard deviation of the corresponding neighboring degrees of the molecules. With various considerations and different setups of the underlying parametric spaces, the search space for the docking simulation problem can be significantly reduced. The method is implemented on top of the widely used automated molecular docking simulation software package, AutoDock. Experiments were set up on Japanese encephalitis-related biomolecules from virology research. On average, the proposed k-gridbox algorithm is about 2.3-fold faster. Hadoop MapReduce frameworks are used in our experiments to parallelize the massive computation over the ligand-receptor pairs examined. The experiments show that the proposed method is much more efficient than the general parametric setups.
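The neighbor-based separation of surface and interior atoms can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the "neighboring degree" is taken to be the number of atoms within a cutoff distance, the candidate cutoffs are supplied by the caller, and the rule "fewer neighbors than average means surface" is a hypothetical threshold chosen for the example.

```python
import math
from statistics import pstdev


def neighbor_counts(coords, cutoff):
    """Number of other atoms within `cutoff` of each atom (O(n^2) scan)."""
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def best_cutoff(coords, candidates):
    """Candidate cutoff whose neighbor counts have the largest spread."""
    return max(candidates, key=lambda r: pstdev(neighbor_counts(coords, r)))


def surface_atoms(coords, cutoff):
    """Indices of atoms with fewer neighbors than the mean (assumed rule)."""
    counts = neighbor_counts(coords, cutoff)
    mean = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if c < mean]
```

A cutoff that is too large gives every atom the same count (zero spread), which is why maximizing the standard deviation picks a distance that actually discriminates between exposed and buried atoms.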
Journal of Cheminformatics
Background: In drug design, an efficient structure-based optimization of a ligand needs the precise knowledge of the protein-ligand interactions. In the absence of experimental information, docking programs are necessary for ligand positioning, and the choice of a reliable program is essential for the success of such an optimization. The performances of four popular docking programs, Gold, Glide, Surflex and FlexX, were investigated using 100 crystal structures of complexes taken from the Directory of Useful Decoys-Enhanced database. Results: The ligand conformational sampling was rather efficient, with a correct pose found for a maximum of 84 complexes, obtained by Surflex. However, the ranking of the correct poses was not as efficient, with a maximum of 68 top-rank or 75 top-4 rank correct poses given by Glidescore. No relationship was found between either the sampling or the scoring performance of the four programs and the properties of either the targets or the small molecules, except for the number of ligand rotatable bonds. Likewise, no exploitable relationship was found between each program's performance in docking and in virtual screening; a wrong top-rank pose may obtain a good score that allows it to be ranked among the most active compounds and vice versa. Also, to improve the results of docking, the strengths of the programs were combined either by using a rescoring procedure or the United Subset Consensus (USC). Oddly, positioning with Surflex and rescoring with Glidescore did not improve the results. However, USC based on docking allowed us to obtain a correct pose in the top-4 rank for 87 complexes. Finally, nine complexes were scrutinized, because a correct pose was found by at least one program but poorly ranked by all four programs. Contrary to what was expected, except for one case, this was not due to weaknesses of the scoring functions.
Conclusions: We conclude that the scoring functions should be improved to detect the correct poses, but sometimes their failure may be due to other varied considerations. To increase the chances of success, we recommend using several programs and combining their results.
Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. Particularly, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
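Selecting representative trajectory frames with k-means can be sketched in a few lines. The example below is only an illustration: each frame is assumed to be already reduced to a tuple of cavity features (which features the authors used is not stated here), the deterministic farthest-first initialization is a choice made for reproducibility rather than the paper's, and all function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations over feature tuples; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def representatives(points, labels, centers):
    """Index of the frame closest to each non-empty cluster center."""
    reps = []
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda i: math.dist(points[i], center)))
    return reps
```

Docking only against the frames returned by `representatives` is what reduces the cost relative to docking against every snapshot of the trajectory.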
VoteDock: Consensus docking method for prediction of protein-ligand interactions
Journal of Computational Chemistry, 2011
Molecular recognition plays a fundamental role in all biological processes, and that is why great efforts have been made to understand and predict protein-ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and still remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if protein structure is known, various protein-ligand docking programs can be used. The aim of the docking procedure is to predict correct poses of the ligand in the binding site of the protein as well as to score them according to the strength of interaction in a reasonable time frame. The purpose of our studies was to present a novel consensus approach to predict both the protein-ligand complex structure and its corresponding binding affinity. Our method used as input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on the extensive benchmark dataset of 1300 protein-ligand pairs from the refined PDBbind database for which the structural and affinity data was available. We compared independently its ability of proper scoring and posing to the previously proposed methods. In most cases, our method is able to dock properly approximately 20% of pairs more than docking methods on average, and over 10% of pairs more than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by 0.5 Å. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5.
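One simple way to form a pose consensus across several docking programs is to let each program's pose "vote" for every other pose that lies within an RMSD threshold. The sketch below illustrates that generic idea only; it is not VoteDock's actual voting scheme, which the abstract does not spell out. It assumes all poses list the same ligand atoms in the same order, applies no symmetry correction, and uses a hypothetical 2.0 Å threshold.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def consensus_pose(poses, threshold=2.0):
    """Pick the pose that the most other programs agree with.

    `poses` maps a program name to its predicted ligand coordinates.
    Returns (winning program name, vote counts per program).
    """
    votes = {}
    for name, pose in poses.items():
        votes[name] = sum(
            1 for other, other_pose in poses.items()
            if other != name and rmsd(pose, other_pose) <= threshold)
    winner = max(votes, key=votes.get)
    return winner, votes
```

A pose that no other program reproduces receives zero votes, so an outlier from a single program cannot win unless every program disagrees with every other.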
Physical chemistry chemical physics : PCCP, 2015
Protein-protein (P-P) 3D structures are fundamental to structural biology and drug discovery. However, most of them have never been determined. Many docking algorithms were developed for that purpose, but they have a very limited accuracy in generating native-like structures and identifying the most correct one, in particular when a single answer is asked for. With such a low success rate it is difficult to point out one docked structure as being native-like. Here we present a new, high accuracy, scoring method to identify the 3D structure of P-P complexes among a set of trial poses. It incorporates alanine scanning mutagenesis experimental data that need to be obtained a priori. The scoring scheme works by matching the computational and the experimental alanine scanning mutagenesis results. The size of the trial P-P interface area is also taken into account. We show that the method ranks the trial structures and identifies the native-like structures with unprecedented accuracy (∼94...
A biased random key genetic algorithm for the protein–ligand docking problem
Soft Computing, 2018
Molecular docking is a valuable tool for drug discovery. Receptor and flexible ligand docking is a very computationally expensive process due to the large number of degrees of freedom of the ligand and the roughness of the molecular binding search space. A molecular docking simulation starts with receptor and ligand unbound structures, and the algorithm tests hundreds of thousands of ligand conformations and orientations to find the best receptor-ligand binding affinity by assigning and optimizing an energy function. Despite advances in methods and computational strategies for searching for the best protein-ligand binding affinity, new strategies, the adaptation and investigation of new approaches, and the combination of existing and state-of-the-art computational methods and techniques for the molecular docking problem are still needed. We developed a Biased Random Key Genetic Algorithm as a sampling strategy to search the protein-ligand conformational space. We use a different method to discretize the search space. The proposed method (namely, BRKGA-DOCK) has been tested on a selection of protein-ligand complexes and compared to existing tools AUTODOCK VINA, DOCKTHOR, and a multiobjective approach (jMETAL). Compared to other traditional docking software, the proposed method shows the best average Root-Mean-Square Deviation. Structural results were also statistically analyzed. The proposed method proved to be efficient and a good alternative for the molecular docking problem.
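The general shape of a biased random key genetic algorithm can be sketched as below. This is a generic BRKGA loop, not BRKGA-DOCK itself: the population sizes, the inheritance probability `rho`, and especially the decoder `toy_energy` (which maps three keys to a ligand position in a box and scores it by squared distance to a made-up target) are assumptions for illustration. A real docking decoder would also map keys to orientation and torsion angles and evaluate a docking energy function.

```python
import random


def brkga(decode, n_keys, pop_size=40, elite_frac=0.2, mutant_frac=0.1,
          rho=0.7, generations=60, seed=0):
    """Minimize decode(keys) over vectors of random keys in [0, 1)."""
    rng = random.Random(seed)
    n_elite = max(1, int(pop_size * elite_frac))
    n_mutant = max(1, int(pop_size * mutant_frac))
    pop = [[rng.random() for _ in range(n_keys)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=decode)                      # best (lowest energy) first
        elite, rest = pop[:n_elite], pop[n_elite:]
        children = []
        while len(children) < pop_size - n_elite - n_mutant:
            e, o = rng.choice(elite), rng.choice(rest)
            # Biased uniform crossover: each key comes from the elite
            # parent with probability rho, otherwise from the other parent.
            children.append([ek if rng.random() < rho else ok
                             for ek, ok in zip(e, o)])
        mutants = [[rng.random() for _ in range(n_keys)]
                   for _ in range(n_mutant)]
        pop = elite + children + mutants          # elite survive unchanged
    return min(pop, key=decode)


def toy_energy(keys, box=(-10.0, 10.0), target=(1.0, -2.0, 3.0)):
    """Hypothetical decoder: keys -> position in box -> squared distance."""
    lo, hi = box
    position = [lo + k * (hi - lo) for k in keys]
    return sum((p - t) ** 2 for p, t in zip(position, target))
```

Calling `brkga(toy_energy, n_keys=3)` returns the best key vector found. Because the elite set is copied unchanged into each new generation, the best energy never gets worse from one generation to the next.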
An interaction-motif-based scoring function for protein-ligand docking
BMC Bioinformatics, 2010
Background: A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions. Results: A novel scoring function for protein-ligand docking, MotifScore, was developed. It is non-energy-based, and docking is, instead, scored by counting the occurrences of motifs of protein-ligand interaction networks constructed using structures of protein-ligand complexes. MotifScore has been tested on a benchmark set established by others to assess its ability to identify near-native complex conformations among a set of decoys. In this benchmark test, 84% of the highest-scored docking conformations had root-mean-square deviations (rmsds) below 2.0 Å from the native conformation, which is comparable with the best of several energy-based docking scoring functions. Many of the top motifs, which comprise a multitude of chemical groups that interact simultaneously and make a highly significant contribution to MotifScore, capture recurrent interacting patterns beyond pairwise interactions. Conclusions: While providing quite good docking scores, MotifScore is quite different from conventional energy-based functions. MotifScore thus represents a new, network-based approach for exploring problems associated with molecular docking.
Scoring docking conformations using predicted protein interfaces
Background: Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations from 3D models that docking software produced by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results: First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion: Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing the quality of complex conformations.