Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential

Exploring the frontiers of chemistry with a general reactive machine learning potential

2022

Atomistic simulation of reactive chemistry has a broad range of applications, from drug design to energy to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive quantum chemistry simulations. In practice, developing reactive MLIPs requires prior knowledge of reaction networks to generate fitting data, and refitting to extensive datasets for each new application. In this work, we develop a general reactive MLIP through unbiased active learning with an atomic configuration sampler inspired by nanoreactor molecular dynamics. The resulting potential (ANI-nr) is then applied to study five distinct condensed-phase reactive chemistry systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early-Earth small molecules. In all studies, ANI-nr closely matches experiment and/or previous studies using traditional model chemistry methods. ANI-nr proves to be a highly general reactive MLIP that does not need to be refit for each application, enabling high-throughput in silico reactive chemistry experimentation.

[Figure: workflow schematic. Panel labels: Applications; Dynamical sampler; System builder; Random molecular concentrations; Time-varying temperatures and system volumes; Active Learning Loop.]
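The abstract does not say how the active learning loop decides which sampled configurations need new reference calculations. One common criterion is disagreement within an ensemble of models; the sketch below assumes that criterion. The function name, the threshold value and the toy numbers are all hypothetical, and this is not presented as the ANI-nr procedure.

```python
from statistics import pstdev

def select_uncertain(ensemble_energies, threshold):
    """Return indices of configurations where the ensemble disagrees.

    ensemble_energies: list of rows, one row of predicted energies per model
                       (all rows cover the same configurations, in order).
    threshold: hypothetical cutoff on the per-configuration standard deviation.
    """
    per_config = zip(*ensemble_energies)  # transpose: one tuple per configuration
    return [i for i, values in enumerate(per_config) if pstdev(values) > threshold]

# Toy predictions from three models for four sampled configurations (made-up numbers).
predictions = [
    [1.00, 2.00, 3.00, 4.00],
    [1.01, 2.50, 3.00, 4.02],
    [0.99, 1.50, 3.00, 3.98],
]
selected = select_uncertain(predictions, threshold=0.1)
print(selected)
```

In a loop of this kind, the selected configurations would be sent for quantum chemistry labeling and added to the training set before the models are refit.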

Active Learning Accelerates Ab Initio Molecular Dynamics on Pericyclic Reactive Energy Surfaces

Modeling dynamical effects in chemical reactions, such as post-transition state bifurcation, requires ab initio molecular dynamics simulations due to the breakdown of simpler static models like transition state theory. However, these simulations tend to be restricted to lower-accuracy electronic structure methods and scarce sampling because of their high computational cost. Here, we report the use of statistical learning to accelerate reactive molecular dynamics simulations by combining high-throughput ab initio calculations, graph-convolution interatomic potentials and active learning. This pipeline was demonstrated on an ambimodal trispericyclic reaction involving 8,8-dicyanoheptafulvene and 6,6-dimethylfulvene. With a dataset of approximately 31,000 M06-2X/def2-SVP quantum mechanical calculations, the computational cost of exploring the reactive potential energy surface was reduced by an order of magnitude. Thousands of virtually costless picosecond-long reactive trajectories ...

Neural Network Potentials for Reactive Chemistry: CASPT2 Quality Potential Energy Surfaces for Bond Breaking

Neural network potentials are developed that accurately make and break bonds for use in molecular simulations. We report a neural network potential that can describe the potential energy surface for carbon-carbon bond dissociation with less than 1 kcal/mol error compared to complete active space second-order perturbation theory (CASPT2), and that maintains this accuracy for both the minimum energy path and molecular dynamics calculations up to 2000 K. We utilize a transfer learning algorithm to develop neural network potentials to generate potential energy surfaces; this method aims to use the minimum amount of CASPT2 data on small systems to train neural network potentials while maintaining excellent transferability to larger systems. First, we generate homolytic carbon-carbon bond dissociation data of small alkanes with density functional theory (DFT) energies to train the potentials to accurately predict bond dissociation at the DFT level. Then, using transfer learning, we retrain...

ML4Chem: A Machine Learning Package for Chemistry and Materials Science

ML4Chem is an open-source machine learning library for chemistry and materials science. It provides an extendable platform to develop and deploy machine learning models and pipelines and is targeted at both non-expert and expert users. ML4Chem follows user-experience design and offers the tools needed to go from data preparation to inference. Here we introduce its atomistic module for the implementation, deployment, and reproducibility of atom-centered models. This module is composed of six core building blocks: data, featurization, models, model optimization, inference, and visualization. We present their functionality and ease of use with demonstrations utilizing neural networks and kernel ridge regression algorithms.

Developing Machine-Learned Potentials for Coarse-Grained Molecular Simulations: Challenges and Pitfalls

2022

Coarse graining (CG) enables the investigation of molecular properties for larger systems and at longer timescales than the ones attainable at the atomistic resolution. Machine learning techniques have been recently proposed to learn CG particle interactions, i.e. develop CG force fields. Graph representations of molecules and supervised training of a graph convolutional neural network architecture are used to learn the potential of mean force through a force matching scheme. In this work, the force acting on each CG particle is correlated to a learned representation of its local environment that goes under the name of SchNet, constructed via continuous filter convolutions. We explore the application of SchNet models to obtain a CG potential for liquid benzene, investigating the effect of model architecture and hyperparameters on the thermodynamic, dynamical, and structural properties of the simulated CG systems, reporting and discussing challenges encountered and future directions envisioned.

CCS Concepts: Applied computing → Physical sciences and engineering; Computing methodologies → Machine learning → Machine learning approaches → Neural networks; Computing methodologies → Modeling and simulation → Simulation types and techniques → Molecular simulation
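As a rough illustration of the force matching scheme mentioned above, the sketch below computes a mean squared difference between forces predicted by a CG model and reference forces mapped from the atomistic system onto the CG particles. The function name, the per-component normalization and the toy force values are assumptions made for illustration; the abstract does not give the exact loss used.

```python
def force_matching_loss(predicted_forces, reference_forces):
    """Mean squared error per force component between two sets of CG forces.

    Each argument is a list of (fx, fy, fz) tuples, one tuple per CG particle.
    """
    if len(predicted_forces) != len(reference_forces):
        raise ValueError("both force lists must cover the same CG particles")
    total = 0.0
    for f_pred, f_ref in zip(predicted_forces, reference_forces):
        # Squared Euclidean distance between the two force vectors.
        total += sum((p - r) ** 2 for p, r in zip(f_pred, f_ref))
    return total / (3 * len(reference_forces))

# Two CG particles with made-up forces; only the second prediction is off.
predicted = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
reference = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
loss = force_matching_loss(predicted, reference)
print(loss)
```

Minimizing a loss of this form over many sampled configurations is what drives the learned CG forces toward the gradient of the potential of mean force.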

ICHOR: A Modern Pipeline for Producing Gaussian Process Regression Models for Atomistic Simulations

Practical use of machine learning involves more than the model architecture and the optimisation technique itself. A modern machine learning method also needs to be supported by a robust set of tools for the creation and manipulation of data sets. ICHOR is one such tool, designed to create fast and accurate atomistic Gaussian process regression (GPR) models through the use of active learning. ICHOR operates in the context of FFLUX, a fully polarisable force field based on the energies and multipole moments of quantum topological atoms. ICHOR interacts with the in‐house GPR program FEREBUS for training, and with DL_FFLUX (derived from DL_POLY) for geometry optimisation and molecular simulation. ICHOR utilises the latest technologies in HPC cluster management to produce GPR models reliably at scale.

Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

Physical Review Letters, 2012

We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ∼10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
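The abstract does not name the representation built from nuclear charges and atomic positions. One descriptor of this kind is the Coulomb matrix, with diagonal entries 0.5 Z_i^2.4 and off-diagonal entries Z_i Z_j / |R_i - R_j|. The sketch below is offered only as an illustration of such a descriptor; the function name and the H2 geometry (1.4 bohr separation) are assumptions.

```python
import math

def coulomb_matrix(charges, positions):
    """Build a Coulomb matrix from nuclear charges and Cartesian positions.

    charges: list of nuclear charges Z_i.
    positions: list of (x, y, z) tuples in the same order, in atomic units.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: a fit to free-atom energies.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear Coulomb repulsion.
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix

# Hydrogen molecule with an assumed bond length of 1.4 bohr.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
print(h2)
```

A matrix like this, or a sorted or eigenvalue form of it, can then serve as the input to a non-linear regression model of atomization energies.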

Choosing the right molecular machine learning potential

Chemical Science

Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra....

Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates

Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28,000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution-phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal/mol for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kin...
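To connect the free energy errors to rate constants, the sketch below assumes the standard transition state theory relation in which a solvation correction ΔΔG‡solv rescales the rate constant by exp(-ΔΔG‡solv / RT). The function names and the default temperature of 298.15 K are assumptions; the abstract does not state how its rate comparison was computed.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)

def relative_rate(ddg_solv_kcal, temperature_k=298.15):
    """Ratio k_solution / k_gas for a given ΔΔG‡solv in kcal/mol (TST assumption)."""
    return math.exp(-ddg_solv_kcal / (R_KCAL_PER_MOL_K * temperature_k))

def rate_error_factor(ddg_error_kcal, temperature_k=298.15):
    """Multiplicative rate error implied by an error in ΔΔG‡solv."""
    return math.exp(abs(ddg_error_kcal) / (R_KCAL_PER_MOL_K * temperature_k))

# Rate factor implied by the reported 0.71 kcal/mol mean absolute error.
factor = rate_error_factor(0.71)
print(round(factor, 2))
```

Under these assumptions, an error of 0.71 kcal/mol in ΔΔG‡solv corresponds to a rate constant that is off by a factor of about 3.3 at 298.15 K.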