Choosing the right molecular machine learning potential
Related papers
The Journal of Physical Chemistry A, 2020
High-fidelity quantum-chemical calculations can provide accurate predictions of molecular energies, but their high computational costs limit their utility, especially for larger molecules. We have shown in previous work that machine learning models trained on high-level quantum-chemical calculations (G4MP2) for organic molecules with one to nine non-hydrogen atoms can provide accurate predictions for other molecules of comparable size at much lower cost. Here we demonstrate that such models can also be used to effectively predict energies of molecules larger than those in the training set. To implement this strategy, we first established a set of 191 molecules with 10−14 non-hydrogen atoms having reliable experimental enthalpies of formation. We then assessed the accuracy of computed G4MP2 enthalpies of formation for these 191 molecules. The error in the G4MP2 results was somewhat larger than that for smaller molecules, and the reason for this increase is discussed. Two density functional methods, B3LYP and ωB97X-D, were also used on this set of molecules, with ωB97X-D found to perform better than B3LYP at predicting energies. The G4MP2 energies for the 191 molecules were then predicted using these two functionals with two machine learning methods, the FCHL-Δ and SchNet-Δ models, with the learning done on calculated energies of the molecules with one to nine non-hydrogen atoms. The better-performing model, FCHL-Δ, gave atomization energies of the 191 organic molecules with 10−14 non-hydrogen atoms within 0.4 kcal/mol of their G4MP2 energies. Thus, this work demonstrates that quantum-chemically informed machine learning can be used to successfully predict the energies of large organic molecules whose size is beyond that in the training set.
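The Δ-learning idea in this abstract (learn the difference between a cheap method and an expensive one, then add the learned correction to the cheap result) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the one-dimensional toy descriptor, the functions `toy_low` and `toy_high`, and the generic Gaussian kernel ridge regression stand in for real molecular descriptors, DFT and G4MP2 energies, and the FCHL-Δ or SchNet-Δ models, none of which are reproduced here.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian similarity between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_delta_model(X, e_low, e_high, sigma, lam):
    """Fit kernel ridge regression to the difference e_high - e_low."""
    deltas = [h - l for h, l in zip(e_high, e_low)]
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, deltas)

    def predict_high(x, e_low_x):
        # High-level estimate = cheap energy + learned correction.
        corr = sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X))
        return e_low_x + corr

    return predict_high

# Hypothetical toy "methods": the cheap one misses a smooth correction term.
def toy_low(x):
    return x[0] ** 2

def toy_high(x):
    return x[0] ** 2 + 0.3 * math.sin(x[0])

X_train = [[0.5 * i] for i in range(-4, 5)]  # nine points on [-2, 2]
e_low = [toy_low(x) for x in X_train]
e_high = [toy_high(x) for x in X_train]
predict_high = fit_delta_model(X_train, e_low, e_high, sigma=0.5, lam=1e-6)

x_new = [0.75]  # a point not in the training set
prediction = predict_high(x_new, toy_low(x_new))
```

The design point is that the correction is usually smoother and smaller than the total energy, so it may be learned from fewer expensive reference calculations; the cheap method still has to be run for every new molecule.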
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
Physical Review Letters, 2012
We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ∼10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
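A descriptor that depends on nuclear charges and atomic positions only can be sketched as follows. The abstract does not name its representation; the sketch assumes a Coulomb-matrix-style descriptor, with diagonal entries 0.5·Z^2.4 and off-diagonal entries Z_i·Z_j/|R_i − R_j| in atomic units. The function name `coulomb_matrix` and the H2 geometry (a bond length of 1.4 bohr) are illustrative choices, not taken from the source.

```python
import math

def coulomb_matrix(charges, coords):
    """Build a Coulomb-matrix-style descriptor.

    charges: nuclear charges Z_i
    coords:  Cartesian positions R_i in bohr (assumed units)
    """
    n = len(charges)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: a per-atom term that depends on Z only.
                M[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: nuclear Coulomb repulsion between atoms i and j.
                M[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return M

# Illustrative input: H2 with an assumed bond length of 1.4 bohr.
h2_charges = [1, 1]
h2_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
cm = coulomb_matrix(h2_charges, h2_coords)
```

The matrix is unchanged by translating or rotating the molecule because it depends only on interatomic distances, but it does depend on atom ordering, so in practice it is typically sorted or reduced to its eigenvalues before being passed to a regression model.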
Journal of Physical Chemistry Letters, 2015
Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the "holy grail" of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (the so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
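A vectorized "bag" representation of the kind mentioned above can be sketched in a simplified form. The sketch assumes that each bag collects pairwise terms Z_i·Z_j/|R_i − R_j| for one pair of element types, that entries are sorted within each bag, and that bags are zero-padded to a fixed length so every molecule maps to a vector of the same size. The function name `bag_of_bonds`, the `bag_sizes` argument, and the approximate water geometry are hypothetical, and details of the published model may differ.

```python
import math
from itertools import combinations

def bag_of_bonds(charges, coords, bag_sizes):
    """Return a fixed-length vector of sorted, zero-padded pairwise terms.

    bag_sizes maps an element pair (Z_a, Z_b) with Z_a >= Z_b to the padded
    length of that bag (chosen to fit the largest molecule in a data set).
    """
    bags = {}
    for i, j in combinations(range(len(charges)), 2):
        key = tuple(sorted((charges[i], charges[j]), reverse=True))
        term = charges[i] * charges[j] / math.dist(coords[i], coords[j])
        bags.setdefault(key, []).append(term)

    vector = []
    for key in sorted(bag_sizes):  # fixed bag order across molecules
        entries = sorted(bags.get(key, []), reverse=True)
        if len(entries) > bag_sizes[key]:
            raise ValueError(f"bag {key} is too small for this molecule")
        vector.extend(entries + [0.0] * (bag_sizes[key] - len(entries)))
    return vector

# Illustrative input: an approximate water geometry in bohr (assumed values).
water_charges = [8, 1, 1]
water_coords = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (-0.45, 1.74, 0.0)]
sizes = {(1, 1): 3, (8, 1): 4}  # hypothetical padded bag lengths

bob = bag_of_bonds(water_charges, water_coords, sizes)

# Reordering the atoms should leave the vector unchanged.
bob_permuted = bag_of_bonds([1, 8, 1],
                            [water_coords[1], water_coords[0], water_coords[2]],
                            sizes)
```

Sorting within bags removes the dependence on atom ordering, and padding lets molecules of different sizes share one input dimension for a kernel or other regression model.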
Towards more efficient and performant computations in quantum chemistry with machine learning
2020
Kernel methods allow an efficient solution of highly non-linear regression problems often encountered in quantum chemistry. Because kernels are so flexible, it is unclear how to design a similarity matrix, represented by the kernel, that encodes a given learning problem in a compact and beneficial way. In this thesis, we propose novel kernels for quantum mechanical systems which are composed of two- and three-body interaction terms. Specifically, we develop descriptors of molecules which are of fixed size and invariant with respect to translation, rotation and atom indexing. For these representations, we demonstrate their ability to accurately predict quantum mechanical properties in combination with kernel ridge regression. A feature importance analysis reveals insights about the two- and three-body interactions in small organic molecules. Our descriptors are extended by novel decomposition kernels which encode the comparison of two- and three-body combinations of atoms directly into the simi...
2019
High-throughput approximations of quantum mechanics calculations and combinatorial experiments have been traditionally used to reduce the search space of possible molecules, drugs and materials. However, the interplay of structural and chemical degrees of freedom introduces enormous complexity, which the current state-of-the-art tools are not yet designed to handle. The availability of large molecular databases generated by first-principles quantum mechanics (QM) computations opens new avenues for data science to accelerate the discovery of new compounds. In recent years, models that combine QM with machine learning (ML), known as QM/ML models, have been successful at delivering the accuracy of QM at the speed of ML. The goals are to develop a framework that will accelerate the extraction of knowledge and to get insights from quantitative process-structure-property-performance relationships hidden in materials data via a better search of the chemical compound space, and to infer n...
Description of Potential Energy Surfaces of Molecules Using FFLUX Machine Learning Models
Journal of Chemical Theory and Computation, 2018
A new type of model, FFLUX, to describe the interaction between atoms has been developed as an alternative to traditional force fields. FFLUX models are constructed by applying the kriging machine learning method to the topological energy partitioning method, Interacting Quantum Atoms (IQA). The effect of varying parameters in the construction of the FFLUX models is analyzed, with the most dominant effects found to be the structure of the molecule and the number of conformations used to build the model. Using these models, the optimization of a variety of small organic molecules is performed, with sub-kJ mol⁻¹ accuracy in the energy of the optimized molecules. The FFLUX models are also evaluated in terms of their performance in describing the potential energy surfaces (PESs) associated with specific degrees of freedom within molecules. While the accurate description of PESs presents greater challenges than individual minima, FFLUX models are able to achieve errors of <2.5 kJ mol⁻¹ across the full C-C-C-C dihedral PES of n-butane, indicating the future possibilities of the technique.
Research Square (Research Square), 2023
Reactive chemistry atomistic simulation has a broad range of applications from drug design to energy to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive quantum chemistry simulations. In practice, developing reactive MLIPs requires prior knowledge of reaction networks to generate fitting data and refitting to extensive datasets for each new application. For this reason, many fields of chemistry would greatly benefit from a general reactive MLIP, i.e., an MLIP that is applicable to a broad range of reactive chemistry such that it can be applied to new systems without the need for retraining. In this work, we develop a general reactive MLIP through unbiased active learning with an atomic configuration sampler inspired by nanoreactor molecular dynamics. The resulting potential (ANI-1xnr) is then applied to study five distinct condensed-phase reactive chemistry systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early-earth small molecules. In all studies, ANI-1xnr closely matches experiment and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N, and O elements that does not need to be refit for each application, enabling high-throughput in silico reactive chemistry experimentation.
Machine learning of molecular electronic properties in chemical compound space
New Journal of Physics, 2013
The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic...
Quantum machine learning for electronic structure calculations
Nature Communications
Considering recent advancements and successes in the development of efficient quantum algorithms for electronic structure calculations, alongside impressive results using machine learning techniques for computation, hybridizing quantum computing with machine learning to perform electronic structure calculations is a natural progression. Here we report a hybrid quantum algorithm employing a restricted Boltzmann machine to obtain accurate molecular potential energy surfaces. By exploiting a quantum algorithm to help optimize the underlying objective function, we obtained an efficient procedure for the calculation of the electronic ground state energy for a small molecule system. Our approach achieves high accuracy for the ground state energy of H₂, LiH, and H₂O at a specific location on each potential energy surface with a finite basis set. With the future availability of larger-scale quantum computers, quantum machine learning techniques are set to become powerful tools to obtain accurate values for electronic structures.