Olexandr Isayev - Academia.edu
Papers by Olexandr Isayev
Olexandr Isayev talks about using neural networks to create fast and accurate molecular potentials trained on high-level QM data. The resulting ANI model (ANAKIN-ME: Accurate NeurAl networK engINe for Molecular Energies) seems to be very promising. The uploaded material contains presentation slides and video.
arXiv: Chemical Physics, 2018
We use HIP-NN, a neural network architecture that excels at predicting molecular energies, to predict atomic charges. The charge predictions are accurate over a wide range of molecules (both small and large) and for a diverse set of charge assignment schemes. To demonstrate the power of charge prediction on non-equilibrium geometries, we use HIP-NN to generate IR spectra from dynamical trajectories on a variety of molecules. The results are in good agreement with reference IR spectra produced by traditional theoretical methods. Critically, for this application, HIP-NN charge predictions are about 10⁴ times faster than direct DFT charge calculations. Thus, ML provides a pathway to greatly increase the range of feasible simulations while retaining quantum-level accuracy. In summary, our results provide further evidence that machine learning can replicate high-level quantum calculations at a tiny fraction of the computational cost.
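For readers curious about the post-processing step described above, the sketch below (an illustration, not the authors' code) shows how per-frame ML-predicted charges and positions can be turned into an IR line shape via the Fourier transform of the dipole time-derivative autocorrelation; the array shapes and the omitted absolute-intensity prefactor are assumptions.

```python
import numpy as np

def ir_spectrum_from_charges(positions, charges, dt_fs):
    """Estimate an IR line shape from per-frame atomic charges and positions.

    positions : (n_frames, n_atoms, 3) array, Angstrom
    charges   : (n_frames, n_atoms) array, e (e.g., ML-predicted partial charges)
    dt_fs     : trajectory time step in femtoseconds

    Returns (frequencies in cm^-1, unnormalized intensities).
    """
    # Total dipole moment per frame (e * Angstrom); an absolute prefactor is omitted.
    dipole = np.einsum("fa,fax->fx", charges, positions)

    # Time derivative of the dipole; its power spectrum gives the IR line shape.
    d_dipole = np.gradient(dipole, dt_fs, axis=0)

    # FFT power spectrum of each Cartesian component, summed over x, y, z.
    n = d_dipole.shape[0]
    intensities = np.sum(np.abs(np.fft.rfft(d_dipole, axis=0)) ** 2, axis=1)

    # Convert FFT bin frequencies (1/fs) to wavenumbers: 1/fs = 1e15 Hz, c = 2.9979e10 cm/s.
    freqs_cm = np.fft.rfftfreq(n, d=dt_fs) * 1.0e15 / 2.9979e10
    return freqs_cm, intensities
```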
Handbook of Materials Modeling, 2018
The traditional paradigm for materials discovery has been recently expanded to incorporate substantial data-driven research. With the intent to accelerate the development and the deployment of new technologies, the AFLOW Fleet for computational materials design automates high-throughput first-principles calculations and provides tools for data verification and dissemination for a...
Modern polymer science is plagued by the curse of multidimensionality; the large chemical space imposed by including combinations of monomers into a statistical copolymer overwhelms polymer synthesis and characterization technology and limits the ability to systematically study structure-property relationships. To tackle this challenge in the context of ¹⁹F MRI agents, we pursued a computer-guided materials discovery approach that combines synergistic innovations in automated flow synthesis and machine learning (ML) method development. A software-controlled, continuous polymer synthesis platform was developed to enable iterative experimental-computational cycles that resulted in the synthesis of 397 unique copolymer compositions within a six-variable compositional space. The non-intuitive design criteria identified by ML, obtained by exploring less than 0.9% of the overall compositional space, upended conventional wisdom in the design of ¹⁹F MRI agents and led to the identification of >10 copolymer compositions that outperformed state-of-the-art materials.
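A minimal sketch of the kind of iterative design loop the abstract describes, assuming a surrogate model is refit after each synthesized batch. The `measure_performance` function is a made-up placeholder for the automated flow-synthesis and ¹⁹F MRI characterization step, and the batch and pool sizes are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def random_compositions(n, n_monomers=6):
    """Random points on the composition simplex (monomer fractions sum to 1)."""
    return rng.dirichlet(np.ones(n_monomers), size=n)

def measure_performance(compositions):
    """Placeholder for synthesis + characterization; here a toy objective."""
    return -np.sum((compositions - 1.0 / compositions.shape[1]) ** 2, axis=1)

# Seed the loop with a small initial batch, then iterate: fit a surrogate,
# score a large pool of candidate compositions, "synthesize" the best ones.
X = random_compositions(24)
y = measure_performance(X)

for cycle in range(5):
    surrogate = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
    pool = random_compositions(5000)
    scores = surrogate.predict(pool)
    batch = pool[np.argsort(scores)[-8:]]            # top-8 predicted performers
    X = np.vstack([X, batch])
    y = np.concatenate([y, measure_performance(batch)])

print("best composition:", X[np.argmax(y)], "score:", y.max())
```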
Nature Communications, 2021
Interatomic potentials derived with Machine Learning algorithms such as Deep Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow massive simulations. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol, and spin-charges with errors of ~0.01e, for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model makes it possible to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory q...
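The reactivity indices mentioned above follow from a few single-point energies at different charge and spin states. The sketch below illustrates that arithmetic under a hypothetical `model.energy(coords, species, charge, multiplicity)` interface (not the actual AIMNet-NSE API), with energies assumed to be in hartree at a fixed vertical geometry and hardness given in the Parr–Pearson convention.

```python
# Hypothetical single-point interface; energies assumed in hartree at a fixed geometry.
HARTREE_TO_EV = 27.2114

def reactivity_indices(model, coords, species):
    e_neutral = model.energy(coords, species, charge=0, multiplicity=1)
    e_cation = model.energy(coords, species, charge=+1, multiplicity=2)
    e_anion = model.energy(coords, species, charge=-1, multiplicity=2)

    ip = (e_cation - e_neutral) * HARTREE_TO_EV    # vertical ionization potential
    ea = (e_neutral - e_anion) * HARTREE_TO_EV     # vertical electron affinity

    chi = 0.5 * (ip + ea)   # Mulliken electronegativity
    eta = 0.5 * (ip - ea)   # chemical hardness (Parr-Pearson convention)
    return {"IP_eV": ip, "EA_eV": ea, "electronegativity_eV": chi, "hardness_eV": eta}
```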
Chemical Society Reviews, 2021
We cover diverse methodologies, computational approaches, and case studies illustrating the ongoing efforts to develop viable drug candidates for treatment of COVID-19.
Physical Chemistry Chemical Physics, 2020
We formulate a reaction prediction problem in terms of node classification in a disconnected graph of source molecules and generalize a graph convolution neural network for disconnected graphs.
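A minimal numpy sketch of the general idea, not the authors' architecture: one graph-convolution layer applied to a (possibly block-diagonal) adjacency matrix spanning all disconnected source molecules, followed by a linear per-node readout that yields node-classification logits.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    `adj` may be block-diagonal over several disconnected source molecules;
    the symmetric normalization treats each connected component independently.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def node_logits(adj, feats, w_hidden, w_out):
    """One convolution followed by a linear per-node readout; the logits can be
    used to classify each atom (node), e.g. as participating in the reaction or not."""
    hidden = gcn_layer(adj, feats, w_hidden)
    return hidden @ w_out
```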
Accounts of Chemical Research, 2021
Conspectus: Machine learning interatomic potentials (MLIPs) are widely used for describing molecular energy and continue bridging the speed and accuracy gap between quantum mechanical (QM) and classical approaches like force fields. In this Account, we focus on out-of-the-box approaches to developing transferable MLIPs for diverse chemical tasks. First, we introduce the "Accurate Neural Network engine for Molecular Energies" (ANAKIN-ME) method, or ANI for short. The ANI model utilizes Justin Smith Symmetry Functions (JSSFs) and realizes training for vast data sets. A training data set several orders of magnitude larger than before has become the key factor in the knowledge transferability and flexibility of MLIPs. As the quantity, quality, and types of interactions included in the training data set dictate the accuracy of MLIPs, the task of proper data selection and model training can be assisted with advanced methods like active learning (AL), transfer learning (TL), and multitask learning (MTL). Next, we describe the AIMNet "Atoms-in-Molecules Network," which was inspired by the quantum theory of atoms in molecules. The AIMNet architecture lifts multiple limitations of MLIPs: it encodes long-range interactions and learnable representations of chemical elements. We also discuss the AIMNet-ME model, which expands the applicability domain of AIMNet from neutral molecules toward open-shell systems. AIMNet-ME encompasses a dependence of the potential on molecular charge and spin, bringing ML and physical models one step closer and ensuring the correct molecular energy behavior over the total molecular charge. We finally describe perhaps the simplest possible physics-aware model, which combines ML and the extended Hückel method. In ML-EHM, the "Hierarchically Interacting Particle Neural Network" (HIP-NN) generates the set of molecule- and environment-dependent Hamiltonian elements αμμ and K. As a test example, we show how, in contrast to traditional Hückel theory, ML-EHM correctly describes orbital crossing with bond rotations. Hence it learns the underlying physics, highlighting that the inclusion of proper physical constraints and symmetries can significantly improve ML model generalization.
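As an illustration of the descriptor family behind ANI, the sketch below computes the radial part of a Behler–Parrinello-style symmetry function with a cosine cutoff. The parameter grid (eta, radial shifts, cutoff radius) is an assumption, and ANI's actual descriptors also include angular terms.

```python
import numpy as np

def cutoff(r, r_c):
    """Smooth cosine cutoff used in Behler-Parrinello-style symmetry functions."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_function(r_ij, eta, r_s, r_c):
    """Radial descriptor for one atom: sum over neighbor distances r_ij of
    exp(-eta * (r_ij - r_s)^2) * f_c(r_ij), evaluated on a grid of radial shifts r_s."""
    r_ij = np.asarray(r_ij)[:, None]        # (n_neighbors, 1)
    shifts = np.asarray(r_s)[None, :]       # (1, n_shifts)
    g = np.exp(-eta * (r_ij - shifts) ** 2) * cutoff(r_ij, r_c)
    return g.sum(axis=0)                    # one descriptor value per radial shift

# Example: three neighbors, four radial shifts, 5.2 Angstrom cutoff (illustrative values).
print(radial_symmetry_function([1.1, 2.3, 4.8], eta=4.0, r_s=[1.0, 2.0, 3.0, 4.0], r_c=5.2))
```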
Advanced Theory and Simulations, 2020
The screening of novel materials is an important topic in the field of materials science. Although traditional computational modeling, especially first-principles approaches, is a very useful and accurate tool to predict the properties of novel materials, it still demands extensive and expensive state-of-the-art computational resources. Additionally, such calculations can often be extremely time consuming. A time- and resource-efficient machine learning approach to create a dataset of structural properties of 18 million van der Waals layered structures is described. In particular, the authors focus on the interlayer energy and the elastic constant of layered materials composed of two different 2D structures that are important for novel solid lubricant and super-lubricant materials. It is shown that machine learning models can predict results of computationally expensive approaches (i.e., density functional theory) with high accuracy.
Despite decades of intensive search for compounds that modulate the activity of particular targets, small molecules are currently available for only a small proportion of the human proteome. Effective approaches are therefore required to map the massive space of unexplored compound-target interactions for novel and potent activities. Here, we carried out a crowdsourced benchmarking of predictive models for kinase inhibitor potencies across multiple kinase families using unpublished bioactivity data. The top-performing predictions were based on kernel learning, gradient boosting, and deep learning, and their ensemble resulted in predictive accuracy exceeding that of kinase activity assays. We then performed new experiments based on the model predictions, which further improved the accuracy of experimental mapping efforts and identified unexpected potencies even for under-studied kinases. The open-source algorithms together with the novel bioactivities between 95 compounds and 295 kin...
Advanced Theory and Simulations, 2019
Calculating vibrational properties of crystals using quantum mechanical (QM) methods is a challenging problem in computational materials science. This problem is solved here using complementary machine learning methods that rapidly and reliably recapitulate entropy, specific heat, effective polycrystalline dielectric function, and a non-vibrational property (band gap) for materials calculated by accurate but lengthy QM methods. The materials are described mathematically using property-labeled materials fragment descriptors. The machine learning models predict the QM properties with root mean square errors of 0.31 meV per atom per K for entropy, 0.18 meV per atom per K for specific heat, 4.41 for the trace of the dielectric tensor, and 0.5 eV for band gap. These models are sufficiently accurate to allow rapid screening of large numbers of crystal structures to accelerate materials discovery.
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data set diversity, we visualize them with a dimensionality reduction scheme and contrast them against existing data sets. The ANI-1x data set contains multiple QM properties from 5M density functional theory calculations, while the ANI-1ccx data set contains 500k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM properties from density functional theory and co...
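A hedged sketch of the kind of dimensionality-reduction overlay used to compare data set coverage. The random `descriptors_a`/`descriptors_b` arrays stand in for fixed-length conformational descriptors computed for two data sets (the paper uses its own scheme), and PCA is shown only because it is the simplest option.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder descriptor matrices; random data keeps the sketch runnable.
rng = np.random.default_rng(1)
descriptors_a = rng.normal(0.0, 1.0, size=(2000, 64))
descriptors_b = rng.normal(0.5, 1.2, size=(2000, 64))

# Fit a single projection on the union so both data sets share one set of axes.
pca = PCA(n_components=2).fit(np.vstack([descriptors_a, descriptors_b]))
za = pca.transform(descriptors_a)
zb = pca.transform(descriptors_b)

plt.scatter(za[:, 0], za[:, 1], s=4, alpha=0.3, label="data set A")
plt.scatter(zb[:, 0], zb[:, 1], s=4, alpha=0.3, label="data set B")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()
```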
Science Advances, 2019
We introduce a modular, chemically inspired deep neural network model for prediction of several atomic and molecular properties.
The Journal of Chemical Physics, 2018
The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach we develop the COMP6 benchmark (publicly available on GitHub), which contains a diverse set of organic molecules. Through the AL process, it is shown that the AL-based potentials perform as well as the ANI-1 potential on COMP6 with only 10% of the data, and vastly outperform ANI-1 with 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements C, H, N, and O.
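A minimal sketch of the Query by Committee selection step described above, assuming each committee member exposes a hypothetical `predict_energy` method; the disagreement measure (per-candidate standard deviation) and the selection size are illustrative choices.

```python
import numpy as np

def qbc_select(ensemble, candidate_pool, n_select):
    """Pick the candidate conformations on which the committee disagrees most.

    ensemble       : list of trained potentials, each assumed to expose a
                     hypothetical predict_energy(candidates) -> (n_candidates,) method
    candidate_pool : array of candidate conformations (any shape the models accept)
    n_select       : how many conformations to send for new QM reference calculations
    """
    predictions = np.stack([model.predict_energy(candidate_pool) for model in ensemble])
    disagreement = predictions.std(axis=0)          # committee spread per candidate
    return np.argsort(disagreement)[-n_select:]     # indices of the most uncertain candidates
```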
Computational Materials Science, 2018
Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials, neglecting the non-synthesizable systems and those without the desired properties, thus reducing the amount of resources spent on expensive computations and/or time-consuming experimental synthesis. However, using these predictive models is not always straightforward. Often, they require a panoply of technical expertise, creating barriers for general users. AFLOW-ML (AFLOW Machine Learning) overcomes this problem by streamlining the use of the machine learning methods developed within the AFLOW consortium. The framework provides an open RESTful API to directly access the continuously updated algorithms, which can be transparently integrated into any workflow to retrieve predictions of electronic, thermal, and mechanical properties. These types of interconnected cloud-based applications are envisioned to be capable of further accelerating the adoption of machine learning methods into materials development.
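A hedged usage sketch of the RESTful workflow described above: submit a structure, then poll for the prediction. The base URL, endpoint pattern, and response fields follow the published API description, but all of them should be verified against the current AFLOW-ML documentation before use.

```python
import time
import requests

# Base URL and endpoint pattern as described in the AFLOW-ML API publication;
# verify both against the current documentation before relying on them.
API = "http://aflow.org/API/aflow-ml/v1.0"

def predict_plmf(poscar_text, poll_seconds=5):
    """Submit a structure (POSCAR text) to the 'plmf' model and poll for the result."""
    job = requests.post(f"{API}/plmf/prediction", data={"file": poscar_text}).json()
    task_id = job["id"]   # field name assumed from the published description
    while True:
        result = requests.get(f"{API}/prediction/result/{task_id}").json()
        if result.get("status") == "SUCCESS":
            return result   # predicted electronic, thermal, and mechanical properties
        time.sleep(poll_seconds)
```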
Science Advances, 2018
We introduce an artificial intelligence approach to de novo design of molecules with desired physical or biological properties.
Advanced Theory and Simulations, 2018
Nature Communications, 2017
Although historically materials discovery has been driven by a laborious trial-and-error process, knowledge-driven materials design can now be enabled by the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository for ab initio calculations is combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk/shear moduli, Debye temperature, and heat capacities. The predictions' accuracy compares well with the quality of the training data for virtually any stoichiometric inorganic crystalline material, reciprocating the available thermomechanical experimental data. The universality of the approach is attributed to the construction of the descriptors: Property-Labelled Materials Fragments. The representations require only minimal structural input, allowing straightforward implementations of simple heuristic design rules.
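As a generic illustration of the modeling workflow (not the authors' exact pipeline), the sketch below cross-validates a gradient-boosting regressor on descriptor vectors. The random `X`/`y` arrays stand in for property-labeled materials fragment descriptors and a target such as the band gap; they only keep the sketch runnable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Placeholder descriptors and target; substitute real fragment descriptors and DFT labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(n_estimators=400, learning_rate=0.05, max_depth=4)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("cross-validated RMSE:", -scores.mean())
```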