Experimental validation of FINDSITE(comb) virtual ligand screening results for eight proteins yields novel nanomolar and micromolar binders

FINDSITEcomb2.0: A New Approach for Virtual Ligand Screening of Proteins and Virtual Target Screening of Biomolecules

Journal of Chemical Information and Modeling, 2018

Computational approaches for predicting protein-ligand interactions can facilitate drug lead discovery and drug target determination. We have previously developed a threading/structural-based approach, FINDSITE, for the virtual ligand screening of proteins that has been extensively experimentally validated. Even when low resolution predicted protein structures are employed, FINDSITE has the advantage of being faster and more accurate than traditional high-resolution structure-based docking methods. It also overcomes the limitations of traditional QSAR methods that require a known set of seed ligands that bind to the given protein target. Here, we further improve FINDSITE by enhancing its template ligand selection from the PDB/DrugBank/ChEMBL libraries of known protein-ligand interactions by (1) parsing the template proteins and their corresponding binding ligands in the DrugBank and ChEMBL libraries into domains so that the ligands with falsely matched domains to the targets will no...

FRAGSITE: A Fragment-Based Approach for Virtual Ligand Screening

Journal of Chemical Information and Modeling, 2021

To reduce time and cost, virtual ligand screening (VLS) often precedes experimental ligand screening in modern drug discovery. Traditionally, high-resolution structure-based docking approaches rely on experimental structures, while ligand-based approaches need known binders to the target protein and only explore their nearby chemical space. In contrast, our structure-based FINDSITEcomb2.0 approach takes advantage of predicted, low-resolution structures and information from ligands that bind distantly related proteins whose binding sites are similar to the target protein. Using a boosted tree regression machine learning framework, we significantly improved FINDSITEcomb2.0 by integrating ligand fragment scores as encoded by molecular fingerprints with the global ligand similarity scores of FINDSITEcomb2.0. The new approach, FRAGSITE, exploits our observation that ligand fragments, e.g., rings, tend to interact with stereochemically conserved protein subpockets that also occur in evolutionarily unrelated proteins. FRAGSITE was benchmarked on the 102 protein DUD-E set, where any template protein whose sequence identity to the target exceeded 30% was excluded. Within the top 100 ranked molecules, FRAGSITE improves VLS precision and recall by 14.3% and 18.5%, respectively, relative to FINDSITEcomb2.0. Moreover, the mean top 1% enrichment factor increases from 25.2 to 30.2. On average, both outperform state-of-the-art deep learning-based methods such as AtomNet. On the more challenging unbiased set LIT-PCBA, FRAGSITE also shows better performance than ligand similarity-based and docking approaches such as two-dimensional ECFP4 and Surflex-Dock v.3066. On a subset of 23 targets from DEKOIS 2.0, FRAGSITE shows much better performance than the boosted tree regression-based vScreenML scoring function. Experimental testing of FRAGSITE's predictions shows that it has more hits and covers a more diverse region of chemical space than FINDSITEcomb2.0. For the two proteins that were experimentally tested, DHFR, a well-studied protein that catalyzes the conversion of dihydrofolate to tetrahydrofolate, and the kinase ACVR1, FRAGSITE identified new small-molecule nanomolar binders. Interestingly, one new binder of DHFR is a kinase inhibitor predicted to bind in a new subpocket. For ACVR1, FRAGSITE identified new molecules that have diverse scaffolds and estimated nanomolar to micromolar affinities. Thus, FRAGSITE shows significant improvement over prior state-of-the-art ligand virtual screening approaches. A web server is freely available for academic users at http://sites.gatech.edu/cssb/FRAGSITE.
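The abstract above reports precision and recall within the top 100 ranked molecules. As a minimal sketch of how such top-N metrics are commonly computed (not the authors' evaluation code), the Python below uses a hypothetical helper `precision_recall_at_n` and made-up molecule identifiers; the exact definitions used in the paper are assumed to match the standard ones.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_n(
    ranked_ids: Sequence[str], actives: Set[str], n: int
) -> Tuple[float, float]:
    """Precision and recall of the first n entries of a ranked screening list.

    ranked_ids: molecule identifiers sorted from best to worst predicted score.
    actives:    identifiers of the known true binders (illustrative data here).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_ids[:n]
    hits = sum(1 for mol in top if mol in actives)
    precision = hits / len(top) if top else 0.0   # hits among the selected
    recall = hits / len(actives) if actives else 0.0  # hits among all actives
    return precision, recall


# Toy ranked list: two of the top four molecules are true actives.
ranked = ["m1", "m2", "m3", "m4", "m5", "m6"]
actives = {"m1", "m3", "m6", "m9"}
precision, recall = precision_recall_at_n(ranked, actives, n=4)
print(precision, recall)  # 0.5 0.5
```

With a fixed cutoff such as N = 100, precision rewards lists whose early entries are mostly binders, while recall depends on how many actives exist for the target in total.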

FINDSITEcomb: A Threading/Structure-Based, Proteomic-Scale Virtual Ligand Screening Approach

Journal of Chemical Information and Modeling, 2013

Virtual ligand screening is an integral part of the modern drug discovery process. Traditional ligand-based virtual screening approaches are fast but require a set of structurally diverse ligands known to bind to the target. Traditional structure-based approaches require high-resolution target protein structures and are computationally demanding. In contrast, the recently developed threading/structure-based FINDSITE-based approaches have the advantage that they are as fast as traditional ligand-based approaches and yet overcome the limitations of traditional ligand- or structure-based approaches. These new methods can use predicted low-resolution structures and infer the likelihood of a ligand binding to a target by utilizing ligand information excised from the target's remote or close homologous proteins and/or libraries of ligand binding databases. Here, we develop an improved version of FINDSITE, FINDSITEfilt, that filters out false positive ligands in threading-identified templates by a better binding site detection procedure that includes information about the binding site amino acid similarity. We then combine FINDSITEfilt with FINDSITEX, which uses the publicly available binding databases ChEMBL and DrugBank for virtual ligand screening. The combined approach, FINDSITEcomb, is compared to two traditional docking methods, AUTODOCK Vina and DOCK 6, on the DUD benchmark set. It is shown to be significantly better in terms of enrichment factor, dependence on target structure quality, and speed. FINDSITEcomb is then tested for virtual ligand screening on a large set of 3576 generic targets from the DrugBank database as well as a set of 168 human GPCRs. Excluding close homologues, FINDSITEcomb gives an average enrichment factor of 52.1 for generic targets and 22.3 for GPCRs within the top 1% of the screened compound library. Around 65% of the targets have better than random enrichment factors. The performance is insensitive to target structure quality, as long as it has a TM-score ≥ 0.4 to native. Thus, FINDSITEcomb makes the screening of millions of compounds across entire proteomes feasible. The FINDSITEcomb web service is freely available for academic users at http://cssb.biology.gatech.edu/skolnick/webservice/FINDSITE-COMB/index.html
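The enrichment factor within the top 1% of the screened library is the headline metric in this abstract. The sketch below shows the commonly used definition, the hit rate in the selected top fraction divided by the hit rate in the whole library; it is an illustration under that assumption, and the function name `enrichment_factor` and the toy library are hypothetical.

```python
from typing import Sequence, Set


def enrichment_factor(
    ranked_ids: Sequence[str], actives: Set[str], fraction: float = 0.01
) -> float:
    """Enrichment factor of the top `fraction` of a ranked compound library.

    EF = (actives in top fraction / size of top fraction)
         / (actives in library / size of library)
    A value of 1.0 corresponds to random selection.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = len(ranked_ids)
    n_actives = sum(1 for mol in ranked_ids if mol in actives)
    if total == 0 or n_actives == 0:
        return 0.0
    # Round to the nearest whole compound, but always select at least one.
    n_top = max(1, int(round(fraction * total)))
    hits_top = sum(1 for mol in ranked_ids[:n_top] if mol in actives)
    # Multiply before dividing to keep the arithmetic exact for small integers.
    return (hits_top * total) / (n_top * n_actives)


# Toy library of 1000 compounds with 10 actives, 5 of them ranked in the top 10.
library = [f"c{i}" for i in range(1000)]
known_actives = {"c0", "c1", "c2", "c3", "c4",
                 "c500", "c600", "c700", "c800", "c900"}
ef_1pct = enrichment_factor(library, known_actives, fraction=0.01)
print(ef_1pct)  # 50.0
```

Under this definition the maximum attainable value at 1% is 100, reached when every selected compound is an active and the actives fill the selection.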

SPOT-ligand 2: improving structure-based virtual screening by binding-homology search on an expanded structural template library

Bioinformatics (Oxford, England), 2017

The high cost of drug discovery motivates the development of accurate virtual screening tools. Binding-homology, which takes advantage of known protein-ligand binding pairs, has emerged as a powerful discrimination technique. In order to exploit all available binding data, modelled structures of ligand-binding sequences may be used to create an expanded structural binding template library. SPOT-Ligand 2 has demonstrated significantly improved screening performance over its previous version by expanding the template library 15 times over the previous one. It also performed better than or similar to other binding-homology approaches on the DUD and DUD-E benchmarks. The server is available online at http://sparks-lab.org. Contact: yaoqi.zhou@griffith.edu.au or yuedong.yang@griffith.edu.au. Supplementary data are available at Bioinformatics online.

HierVLS Hierarchical Docking Protocol for Virtual Ligand Screening of Large-Molecule Databases

Journal of Medicinal Chemistry, 2004

To provide practical means for rapidly scanning the extensive experimental combinatorial chemistry libraries now available for high-throughput screening (HTS), it is essential to establish computational virtual ligand screening (VLS) techniques that rapidly identify, out of a large library, all active compounds against a particular protein target. Toward this goal we developed HierVLS, a fast hierarchical docking approach that starts with a coarse-grain conformational search over a large number of configurations filtered with a fast but crude energy function, followed by a succession of finer-grain levels, using successively more accurate but more expensive descriptions of the ligand-protein-solvent interactions to filter successively fewer cases. The final step of this procedure optimizes one configuration of the ligand in the protein site using our most accurate energy expression and description of the solvent, which would be impractical for all conformations and sites sampled at the coarse level. HierVLS is based on the HierDock approach, but rather than allowing an hour or more to determine the best binding site and energy for each ligand (as in HierDock), we have adapted our procedure so that it can lead to reliable results while using only 4 min (866 MHz Pentium III processor) per ligand. To validate the accuracy of HierVLS in predicting the experimentally observed binding conformation, we considered 37 cocrystal structures comprising 11 target proteins. We find that HierVLS identifies the correct binding mode for all 37 cocrystals. In addition, the calculated binding energies correlate well with available experimental binding constants. To validate how well HierVLS can identify the correct ligand in an extensive library of decoys, we considered a library of over 10 000 molecules. HierVLS identifies 26 out of the 37 cases in the top 2% ranked by binding affinity among the 10 037 molecules. The failures result from either metal-containing sites on the protein or water-mediated ligand-protein interactions, which we anticipate can be solved within the constraints of practical VLS. We then applied HierVLS to screen a 55 000-compound virtual library against the target protein-tyrosine phosphatase 1B (ptp1b). The top 250 compounds by binding affinity included all six ptp1b cocrystal ligands added to the library plus three other experimentally confirmed binders. The best (top 1) binder is an experimentally confirmed positive. We conclude that HierVLS is useful for selecting leads for a particular target out of large combinatorial databases.
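The coarse-to-fine funnel described in this abstract (cheap scoring of many configurations, then costlier scoring of successively fewer survivors) can be illustrated with a short sketch. This is not the HierVLS implementation: the function `hierarchical_filter`, the two toy scoring functions, and the one-dimensional "poses" are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One stage of the funnel: (scoring function, number of candidates to keep).
# Lower scores are treated as better, as with an energy.
Stage = Tuple[Callable[[float], float], int]


def hierarchical_filter(candidates: Sequence[float], stages: Sequence[Stage]) -> List[float]:
    """Pass candidates through successively stricter (and costlier) stages."""
    survivors = list(candidates)
    for score, keep in stages:
        # Only the survivors of the previous stage are rescored, so the
        # expensive functions late in the list see few candidates.
        survivors = sorted(survivors, key=score)[:keep]
    return survivors


def coarse_score(x: float) -> float:
    """Hypothetical fast, crude score: distance from 3 on an integer grid."""
    return abs(round(x) - 3)


def fine_score(x: float) -> float:
    """Hypothetical slower, more accurate score with its minimum at 3.2."""
    return (x - 3.2) ** 2


poses = [0.1, 2.6, 3.1, 3.4, 5.0, 2.9, 7.7]
best = hierarchical_filter(poses, [(coarse_score, 4), (fine_score, 1)])
print(best)  # [3.1]
```

The design trade-off is that a true best candidate can be lost if the crude stage ranks it poorly, so the early keep counts are usually chosen generously.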

Discovering high-affinity ligands from the computationally predicted structures and affinities of small molecules bound to a target: A virtual screening approach

2000

We describe a 'virtual NMR screening' method to assist in the design of inhibitors that occupy different sites within a target. We dock small molecules into the active site of an enzyme and score them. Keeping the tightest-binding lead fixed in space, we dock and score other small molecules in its presence. Using this approach, linker groups are used to join the compounds together to form a high-affinity inhibitor. We present validation of our computational approach by reproducing experimental results for FKBP and stromelysin. Docking simulations are not subject to experimental problems such as proteolysis, protein or compound insolubility, or enzyme size. Because docking is fast and our scoring method can distinguish between high- and low-affinity inhibitors, this docking procedure shows promise as an integral part of a drug-design strategy.

Virtual ligand screening: strategies, perspectives and limitations

Drug Discovery Today, 2006

In contrast to high-throughput screening, in virtual ligand screening (VS), compounds are selected using computer programs to predict their binding to a target receptor. A key prerequisite is knowledge about the spatial and energetic criteria responsible for protein-ligand binding. The concepts and prerequisites to perform VS are summarized here, and explanations are sought for the enduring limitations of the technology. Target selection, analysis and preparation are discussed, as well as considerations about the compilation of candidate ligand libraries. The tools and strategies of a VS campaign, and the accuracy of scoring and ranking of the results, are also considered.

Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results

Journal of Computer-Aided Molecular Design, 2008

As an extension to a previously published study (McGaughey et al., J Chem Inf Model 47:1504-1519, 2007) comparing 2D and 3D similarity methods to docking, we apply a subset of those virtual screening methods (TOPOSIM, SQW, ROCS-color, and Glide) to a set of protein/ligand pairs where the protein is the target for docking and the cocrystallized ligand is the target for the similarity methods. Each protein is represented by a maximum of five crystal structures. We search a diverse subset of the MDDR as well as a diverse small subset of the MCIDB, Merck's proprietary database. It is seen that the relative effectiveness of virtual screening methods, as measured by the enrichment factor, is highly dependent on the particular crystal structure or ligand, and on the database being searched. 2D similarity methods appear very good for the MDDR, but poor for the MCIDB. However, ROCS-color (a 3D similarity method) does well for both databases.

Protein Binding Pocket Optimization for Virtual High-Throughput Screening (vHTS) Drug Discovery

ACS Omega, 2020

The virtual high-throughput screening (vHTS) approach has been widely used for large database screening to identify potential lead compounds for drug discovery. Due to its high computational demands, docking that allows receptor flexibility has been a challenging problem for virtual screening. Therefore, the selection of protein target conformations is crucial to produce useful vHTS results. Since only a single protein structure is used to screen large databases in most vHTS studies, the main challenge is to reduce false negative rates in selecting compounds for in vitro tests. False negatives are most likely to occur when using apo structures or homology models of protein targets due to the small volume of the binding pocket formed by incorrect side-chain conformations. Even holo protein structures can exhibit high false negative rates due to ligand-induced fit effects, since the shape of the binding pocket highly depends on its bound ligand. To reduce false negative rates and improve success rates for vHTS in drug discovery, we have developed a new Monte Carlo-based approach that optimizes the binding pocket of protein targets. This newly developed Monte Carlo pocket optimization (MCPO) approach was assessed on several datasets showing promising results. The binding pocket optimization approach could be a useful tool for vHTS-based drug discovery, especially in cases when only apo structures or homology models are available.
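The abstract does not describe the move set or energy function used by MCPO, so the following is only a generic Metropolis Monte Carlo minimizer on a toy one-dimensional energy, included to illustrate the kind of stochastic search that "Monte Carlo-based optimization" usually refers to. The names `metropolis_minimize` and `toy_energy`, the step size, and the temperature are all assumptions.

```python
import math
import random
from typing import Callable, Tuple


def metropolis_minimize(
    energy: Callable[[float], float],
    x0: float,
    n_steps: int = 2000,
    step: float = 0.5,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[float, float]:
    """Generic Metropolis search; returns the lowest-energy state visited."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # random trial move
        e_new = energy(x_new)
        delta = e_new - e
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x: float) -> float:
    """Hypothetical stand-in for a pocket energy; minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = metropolis_minimize(toy_energy, x0=-5.0)
print(best_x, best_e)
```

In a real pocket optimization the state would be a set of side-chain or backbone coordinates rather than a single number, and the occasional acceptance of uphill moves is what lets the search escape local minima.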

Structure-Based Virtual Screening Approach for Discovery of Covalently Bound Ligands

Journal of Chemical Information and Modeling, 2014

We present a fast and effective covalent docking approach suitable for large-scale virtual screening (VS). We applied this method to four targets (HCV NS3 protease, Cathepsin K, EGFR, and XPO1) with known crystal structures and known covalent inhibitors. We implemented a customized "VS mode" of the Schrödinger Covalent Docking algorithm (CovDock), which we refer to as CovDock-VS. Known actives and target-specific sets of decoys were docked to selected X-ray structures and poses were filtered based on noncovalent protein-ligand interactions known to be important for activity. We were able to retrieve 71%, 72%, and 77% of the known actives for Cathepsin K, HCV NS3 Protease, and EGFR within 5% of the decoy library, respectively. With the more challenging XPO1 target, where no specific interactions with the protein could be used for postprocessing of the docking results, we were able to retrieve 95% of the actives within 30% of the decoy library and achieved an early enrichment (EF1%) of 33. The poses of the known actives bound to existing crystal structures of 4 targets were predicted with an average RMSD of 1.9 Å. To the best of our knowledge, CovDock-VS is the first fully automated tool for efficient virtual screening of covalent inhibitors. Importantly, CovDock-VS can handle multiple chemical reactions within the same library, only requiring a generic SMARTS-based predefinition of the reaction. CovDock-VS provides a fast and accurate way of differentiating actives from decoys without significantly deteriorating the accuracy of the predicted poses for covalent protein-ligand complexes.
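Results such as "71% of the known actives within 5% of the decoy library" describe how many actives are recovered before a given fraction of decoys has been passed in the ranked list. The sketch below is one plausible reading of that metric, not the authors' script: an active counts as retrieved if no more than the allowed number of decoys score better than it. The function name `actives_recovered`, the convention that lower scores are better, and the toy scores are assumptions.

```python
import bisect
from typing import Sequence


def actives_recovered(
    active_scores: Sequence[float],
    decoy_scores: Sequence[float],
    decoy_fraction: float,
) -> float:
    """Fraction of actives ranked within the first `decoy_fraction` of decoys.

    Lower scores are assumed to be better (as for docking energies).
    """
    if not active_scores:
        return 0.0
    sorted_decoys = sorted(decoy_scores)
    n_allowed = int(round(decoy_fraction * len(sorted_decoys)))
    retrieved = 0
    for score in active_scores:
        # Number of decoys that score strictly better than this active.
        better_decoys = bisect.bisect_left(sorted_decoys, score)
        if better_decoys <= n_allowed:
            retrieved += 1
    return retrieved / len(active_scores)


# Toy data: 100 decoys with scores 0..99 and five actives.
decoys = [float(i) for i in range(100)]
actives = [-2.0, 1.5, 4.5, 7.0, 50.0]
recovered = actives_recovered(actives, decoys, decoy_fraction=0.05)
print(recovered)  # 0.6
```

Here three of the five actives are outranked by at most five decoys, so 60% are recovered within 5% of the decoy set.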