High throughput virtual screening with data level parallelism in multi-core processors

High Performance in silico Virtual Drug Screening on Many-Core Processors

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multicore CPUs with SIMD instruction sets.

High Throughput Computing Validation for Drug Discovery Using the DOCK Program on a Massively Parallel System

2008

This IBM® Redpaper publication presents a virtual screening study of the DOCK Version 6.0 molecular docking software package on a massively parallel system, the IBM System Blue Gene® supercomputer, Blue Gene/L. Virtual screening of very large libraries of small ligands requires not only efficient algorithms but an efficient implementation for docking thousands, if not millions, of compounds simultaneously in a reasonable amount of time.

Accelerating Molecular Docking by Parallelized Heterogeneous Computing - A Case Study of Performance, Quality of Results, and Energy-Efficiency using CPUs, GPUs, and FPGAs

2019

Molecular Docking (MD) is a key tool in computer-aided drug design that aims to predict the binding pose between a small molecule and a macromolecular target. At its core, MD calculates the strength of possible binding poses, and searches for the energetically-stronger ones among those generated during simulation. Automatic Docking (AutoDock) is a widely-used MD code that employs a physics-based scoring function to quantify the binding strength. AutoDock also uses a Lamarckian Genetic Algorithm (LGA), and in turn, the Solis-Wets method, as a local-search algorithm, in order to find strong interactions of such molecular systems. Due to the highly-parallel nature of the LGA tasks involved, AutoDock can benefit from runtime acceleration based on parallelization. This thesis presents an OpenCL-based parallelization of AutoDock, and a corresponding evaluation in terms of execution performance, quality-of-results, and compute-energy efficiency, achieved on different platforms based on: mu...
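The Solis-Wets local search named in this abstract can be illustrated with a minimal sketch. This is a generic textbook-style version on a toy objective, not AutoDock's implementation: the function name `solis_wets`, the bias-update constants (0.2, 0.4, 0.5), and the expand/contract thresholds are assumptions taken from common descriptions of the method.

```python
import random


def solis_wets(f, x0, rho=1.0, max_iters=300, expand_after=4,
               contract_after=4, rho_min=1e-6, seed=0):
    """Minimize f from x0 with a Solis-Wets style random local search.

    All constants are illustrative assumptions, not AutoDock's settings.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    bias = [0.0] * len(x)
    successes = failures = 0
    for _ in range(max_iters):
        if rho < rho_min:
            break
        # Random step drawn around the current bias vector.
        step = [b + rng.gauss(0.0, rho) for b in bias]
        plus = [xi + s for xi, s in zip(x, step)]
        f_plus = f(plus)
        if f_plus < fx:
            # Forward step improved: accept and pull the bias toward it.
            x, fx = plus, f_plus
            bias = [0.2 * b + 0.4 * s for b, s in zip(bias, step)]
            successes, failures = successes + 1, 0
        else:
            minus = [xi - s for xi, s in zip(x, step)]
            f_minus = f(minus)
            if f_minus < fx:
                # Opposite step improved: accept and push the bias away.
                x, fx = minus, f_minus
                bias = [b - 0.4 * s for b, s in zip(bias, step)]
                successes, failures = successes + 1, 0
            else:
                # Neither direction helped: shrink the bias.
                bias = [0.5 * b for b in bias]
                successes, failures = 0, failures + 1
        # Adapt the step size after runs of successes or failures.
        if successes >= expand_after:
            rho, successes = rho * 2.0, 0
        elif failures >= contract_after:
            rho, failures = rho * 0.5, 0
    return x, fx


def sphere(v):
    """Toy objective standing in for a docking score."""
    return sum(c * c for c in v)


best_x, best_f = solis_wets(sphere, [3.0, -2.0])
```

Because a move is accepted only when it lowers the objective, the returned value can never be worse than the starting point. In a Lamarckian scheme, the improved coordinates would be written back into the individual's genotype.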

DOVIS: an implementation for high-throughput virtual screening using AutoDock

BMC Bioinformatics, 2008

Background: Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.

Parallelizing Irregular Computations for Molecular Docking

2020

AUTODOCK is a molecular docking software widely used in computational drug design. Its time-consuming executions have motivated the development of AUTODOCK-GPU, an OpenCL-accelerated version that can run on GPUs and CPUs. This work discusses the development of AUTODOCK-GPU from a programming perspective, detailing how our design addresses the irregularity of AUTODOCK while pushing towards higher performance. Details on required data transformations, re-structuring of complex functionality, as well as the performance impact of different configurations are also discussed. While AUTODOCK-GPU reaches speedup factors of 341x on a Titan V GPU and 51x on a 48-core Xeon Platinum 8175M CPU, experiments show that performance gains are highly dependent on the molecular complexity under analysis. Finally, we summarize our preliminary experiences when porting AUTODOCK onto FPGAs.

DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0

Chemistry Central Journal, 2008

Background: Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability.

Surrogate docking: structure-based virtual screening at high throughput speed

Journal of Computer-Aided Molecular Design, 2005

Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High quality docking of a million-molecule library can take days even on a cluster with hundreds of CPUs. This performance issue prohibits the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast structure-based screening method, which utilizes docking of a limited number of compounds to build a 2D QSAR model used to rapidly score the rest of the database. We compare here a model based on radial basis functions and a Bayesian categorization model. The number of compounds that need to be actually docked depends on the number of docking hits found. In our case studies reasonable quality models are built after docking of the number of molecules containing ~50 docking hits. The rest of the library is screened by the QSAR model. Optionally a fraction of the QSAR-prioritized library can be docked in order to find the true docking hits. The quality of the model only depends on the training set size, not on the size of the library to be screened. Therefore, for larger libraries the method yields a higher gain in speed with no change in performance. Prioritizing a large library with these models provides a significant enrichment with docking hits: it attains the values of ~13 and ~35 at the beginning of the score-sorted libraries in our two case studies: screening of the NCI collection and a combinatorial library on CDK2 kinase structure. With such enrichments, only a fraction of the database must actually be docked to find many of the true hits. The throughput of the method allows its use in screening of large compound collections and in the design of large combinatorial libraries. The strategy proposed has an important effect on efficiency but does not affect retrieval of actives, the latter being determined by the quality of the docking method itself.
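The abstract reports enrichment at the top of a score-sorted library without defining it. The sketch below assumes the common definition: the hit rate in the top-ranked fraction divided by the hit rate in the whole library. The function name `enrichment_factor`, the lower-is-better score convention, and the toy data are illustrative assumptions, not taken from the paper.

```python
def enrichment_factor(scores, is_hit, top_fraction):
    """Hit rate in the top-scored fraction relative to the library-wide hit rate.

    Assumes lower scores are better, as with docking energies.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    # Rank molecule indices by score, best first.
    order = sorted(range(n), key=lambda i: scores[i])
    hits_top = sum(1 for i in order[:n_top] if is_hit[i])
    hits_all = sum(1 for h in is_hit if h)
    if hits_all == 0:
        return 0.0
    # (hits_top / n_top) / (hits_all / n), written to avoid rounding noise.
    return hits_top * n / (n_top * hits_all)


# Toy library: 100 molecules, 5 true docking hits, 4 of them ranked in the top 10.
toy_scores = list(range(100))
toy_hits = [i in (0, 1, 2, 3, 50) for i in range(100)]
ef_top10 = enrichment_factor(toy_scores, toy_hits, 0.10)
```

Under this definition, a value of 1 means the ranking is no better than random selection, and larger values mean fewer molecules must be docked to recover the same number of hits.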

Porting and Optimizing Molecular Docking onto the SX-Aurora TSUBASA Vector Computer

Supercomputing Frontiers and Innovations, 2021

In computer-aided drug design, the rapid identification of drugs is critical for combating diseases. A key method in this field is molecular docking, which aims to predict the interactions between two molecules. Molecular docking involves long simulations running compute-intensive algorithms, and thus, can profit a lot from hardware-based acceleration. In this work, we investigate the performance efficiency of the SX-Aurora TSUBASA vector computer for such simulations. Specifically, we present our methodology for porting and optimizing AutoDock, a widely-used molecular docking program. Using a number of platform-specific code optimizations, we achieved executions on the SX-Aurora TSUBASA that are on average 3.6× faster than on modern 128-core CPU servers, and to a certain extent, competitive with V100 and A100 GPUs. To the best of our knowledge, this is the first molecular docking implementation for the SX-Aurora TSUBASA.

Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers

Journal of Computational Chemistry, 2011

A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein–ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011
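The docking strategy described here (most flexible ligands first, with an evaluation budget that grows with the number of rotatable bonds) can be sketched as a simple static scheduler. This is not the Autodock4.lga.MPI code: the linear rule in `evals_for`, its constants, and the greedy least-loaded assignment are assumptions chosen for illustration, since the abstract does not give the actual function or dispatch scheme.

```python
import heapq


def evals_for(rotatable_bonds, base=250_000, per_bond=250_000):
    """Hypothetical linear budget of energy evaluations per ligand."""
    return base + per_bond * rotatable_bonds


def schedule(ligands, n_workers):
    """Assign (name, rotatable_bonds) ligands to workers, most flexible first.

    Each ligand goes to the currently least-loaded worker, so the expensive
    ligands are placed early and the cheap ones even out the tail.
    """
    ordered = sorted(ligands, key=lambda lig: lig[1], reverse=True)
    heap = [(0, w) for w in range(n_workers)]  # (accumulated evaluations, worker id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    for name, bonds in ordered:
        load, worker = heapq.heappop(heap)
        plan[worker].append(name)
        heapq.heappush(heap, (load + evals_for(bonds), worker))
    loads = [0] * n_workers
    for load, worker in heap:
        loads[worker] = load
    return plan, loads


toy_ligands = [("a", 0), ("b", 8), ("c", 3), ("d", 5), ("e", 1)]
plan, loads = schedule(toy_ligands, n_workers=2)
```

Placing the most flexible ligands first reduces the chance that one long docking job starts late and leaves the other workers idle at the end of the run.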

1001 Ways to run AutoDock Vina for virtual screening

Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and setup for their future large virtual screening experiments.
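Points (1) and (2) can be sketched together: several single-threaded Vina processes run side by side on one multi-core machine, and the seed for every ligand is recorded. This is a sketch under stated assumptions, not the authors' setup. The helper names, the `conf.txt` file (assumed to hold the receptor and search box), and the output naming are hypothetical; the command-line options used (`--config`, `--ligand`, `--out`, `--seed`, `--cpu`) are standard AutoDock Vina options.

```python
import random
import subprocess
from concurrent.futures import ThreadPoolExecutor


def vina_command(ligand, seed, cpu=1, config="conf.txt"):
    """Build one Vina command line; conf.txt is a hypothetical config file."""
    out = ligand.replace(".pdbqt", "_out.pdbqt")
    return ["vina", "--config", config, "--ligand", ligand,
            "--out", out, "--seed", str(seed), "--cpu", str(cpu)]


def run_vina(cmd):
    """Run one docking job and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def screen(ligands, n_jobs, runner=run_vina, master_seed=0):
    """Dock ligands with n_jobs concurrent processes, recording each seed."""
    rng = random.Random(master_seed)
    seeds = {lig: rng.randrange(1, 2**31) for lig in ligands}
    cmds = [vina_command(lig, seeds[lig]) for lig in ligands]
    # Threads only wait on the external processes, so they are sufficient here.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        codes = list(pool.map(runner, cmds))
    return [(lig, seeds[lig], code) for lig, code in zip(ligands, codes)]


# Dry run with a stand-in runner, so no Vina installation is needed.
dry_run = screen(["lig1.pdbqt", "lig2.pdbqt"], n_jobs=2, runner=lambda cmd: 0)
```

As the abstract notes, logging the seed is necessary but not sufficient for reproducibility on heterogeneous systems, so a real log would likely also record the machine and software version used for each ligand.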