Intramolecular Polarisable Multipolar Electrostatics from the Machine Learning Method Kriging
Related papers
Theoretical Chemistry Accounts, 2012
We present a polarisable multipolar interatomic electrostatic potential energy function for force fields and describe its application to the pilot molecule MeNH-Ala-COMe (AlaD). The total electrostatic energy associated with 1, 4 and higher interactions is partitioned into atomic contributions by application of quantum chemical topology (QCT). The exact atom-atom interaction is expressed in terms of atomic multipole moments. The machine learning method Kriging is used to model the dependence of these multipole moments on the conformation of the entire molecule. The resulting models are able to predict the QCT-partitioned multipole moments for arbitrary chemically relevant molecular geometries. The interaction energies between atoms are predicted for these geometries and compared to their true values. The computational expense of the procedure is compared to that of the point charge formalism. Published as part of the special collection of articles: From quantum mechanics to force fields: new methodologies for the classical simulation of complex systems. M. J. L. Mills Manchester Interdisciplinary Biocentre (MIB),
Journal of Physical Chemistry Letters, 2015
Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the "holy grail" of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
Journal of Computational Chemistry, 2013
We propose a generic method to model polarization in the context of high-rank multipolar electrostatics. This method involves the machine learning technique kriging, here used to capture the response of an atomic multipole moment of a given atom to a change in the positions of the atoms surrounding this atom. The atoms are malleable boxes with sharp boundaries; they do not overlap and they exhaust space. The method is applied to histidine, where it is able to predict atomic multipole moments (up to hexadecapole) for unseen configurations, after training on 600 geometries distorted using normal modes of each of its 24 local energy minima at B3LYP/apc-1 level. The quality of the predictions is assessed by calculating the Coulomb energy between an atom for which the moments have been predicted and the surrounding atoms (having exact moments). Only interactions between atoms separated by three or more bonds ("1, 4 and higher" interactions) are included in this energy error. This energy is compared with that of a central atom with exact multipole moments interacting with the same environment. The resulting energy discrepancies are summed for 328 atom-atom interactions, for each of the 29 atoms of histidine being a central atom in turn. For 80% of the 539 test configurations (outside the training set), this summed energy deviates by less than 1 kcal mol⁻¹.
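The error measure described in this abstract (energy of a central atom with predicted moments versus exact moments, restricted to atoms three or more bonds away) can be illustrated with a minimal Python sketch. It is not the authors' code: it truncates the multipole expansion at rank 0 (charge-charge terms only), works in atomic units, and uses a made-up five-atom chain; the function names, charges, coordinates and the predicted charge are all hypothetical.

```python
import numpy as np
from collections import deque

HARTREE_TO_KCAL = 627.509  # kcal/mol per hartree


def bond_separations(n_atoms, bonds, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in neighbours[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist


def central_atom_energy(center, q_center, charges, coords, bonds):
    """Charge-charge energy (kcal/mol) between `center` and atoms >= 3 bonds away.

    Assumption: only the monopole term is kept; higher-rank terms are omitted.
    Charges are in units of e and coordinates in bohr.
    """
    sep = bond_separations(len(charges), bonds, center)
    energy = 0.0
    for j, n_bonds in sep.items():
        if n_bonds >= 3:  # "1,4 and higher" interactions only
            r = np.linalg.norm(coords[center] - coords[j])
            energy += q_center * charges[j] / r
    return energy * HARTREE_TO_KCAL


# Hypothetical linear chain 0-1-2-3-4 with 2 bohr spacing (illustration only).
coords = np.array([[2.0 * i, 0.0, 0.0] for i in range(5)])
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
charges = np.array([0.30, -0.10, 0.20, -0.40, 0.10])  # stand-in "exact" charges
q_predicted = 0.31  # stand-in for a model-predicted charge on atom 0

e_exact = central_atom_energy(0, charges[0], charges, coords, bonds)
e_pred = central_atom_energy(0, q_predicted, charges, coords, bonds)
error = e_pred - e_exact
print(f"energy error for atom 0: {error:.3f} kcal/mol")
```

Summing the absolute value of such errors over every atom taken as the central atom in turn would give a per-configuration figure comparable in spirit to the summed deviation quoted above, although the published work uses moments up to hexadecapole rather than charges alone.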
Polarizable Atomic Multipole-Based Molecular Mechanics for Organic Molecules
Journal of Chemical Theory and Computation, 2011
An empirical potential based on permanent atomic multipoles and atomic induced dipoles is reported for alkanes, alcohols, amines, sulfides, aldehydes, carboxylic acids, amides, aromatics, and other small organic molecules. Permanent atomic multipole moments through quadrupole moments have been derived from gas phase ab initio molecular orbital calculations. The van der Waals parameters are obtained by fitting to gas phase homodimer QM energies and structures, as well as experimental densities and heats of vaporization of neat liquids. As a validation, the hydrogen bonding energies and structures of gas phase heterodimers with water are evaluated using the resulting potential. For 32 homo- and heterodimers, the association energy agrees with ab initio results to within 0.4 kcal/mol. The RMS deviation of the hydrogen bond distance from the QM optimized geometry is less than 0.06 Å. In addition, liquid self-diffusion and static dielectric constants computed from a molecular dynamics simulation are consistent with experimental values. The force field is also used to compute the solvation free energy of 27 compounds not included in the parametrization process, with an RMS error of 0.69 kcal/mol. The results obtained in this study suggest that the AMOEBA force field performs well across different environments and phases. The key algorithms involved in the electrostatic model and a protocol for developing parameters are detailed to facilitate extension to additional molecular systems.
Prediction of conformationally dependent atomic multipole moments in carbohydrates
Journal of Computational Chemistry, 2015
The conformational flexibility of carbohydrates is challenging within the field of computational chemistry. This flexibility causes the electron density to change, which leads to fluctuating atomic multipole moments. Quantum Chemical Topology (QCT) allows for the partitioning of an "atom in a molecule," thus localizing electron density to finite atomic domains, which permits the unambiguous evaluation of atomic multipole moments. By selecting an ensemble of physically realistic conformers of a chemical system, one evaluates the various multipole moments at defined points in configuration space. The subsequent implementation of the machine learning method kriging delivers the evaluation of an analytical function, which smoothly interpolates between these points. This allows for the prediction of atomic multipole moments at new points in conformational space, not trained for but within prediction range. In this work, we demonstrate that the carbohydrates erythrose and threose are amenable to the above methodology. We investigate how kriging models respond when the training ensemble incorporates multiple energy minima and their environment in conformational space. Additionally, we evaluate the gains in predictive capacity of our models as the size of the training ensemble increases. We believe this approach to be entirely novel within the field of carbohydrates. For a modest training set size of 600, more than 90% of the external test configurations have an error in the total (predicted) electrostatic energy (relative to ab initio) of at most 1 kJ mol⁻¹ for open chains, and just over 90% have an error of at most 4 kJ mol⁻¹ for rings.
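The interpolation step described in this abstract, a smooth function fitted through multipole moments known at sampled geometries, can be sketched with a small kriging predictor in Python. This is a minimal illustration under stated assumptions, not the published implementation: it uses a Gaussian correlation function with fixed (not optimised) parameters `theta`, a constant mean estimated by generalised least squares, two made-up geometric features, and a toy analytic function standing in for a multipole moment. All names are hypothetical.

```python
import numpy as np


def gaussian_corr(A, B, theta):
    """Correlation exp(-sum_k theta_k * (a_k - b_k)**2) between rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff**2, axis=2))


def fit_kriging(X, y, theta, nugget=1e-10):
    """Precompute the constant mean and weights from training features X, targets y."""
    R = gaussian_corr(X, X, theta) + nugget * np.eye(len(X))  # nugget aids stability
    ones = np.ones(len(X))
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu)
    return {"X": X, "theta": theta, "mu": mu, "weights": weights}


def predict_kriging(model, X_new):
    """Predict at new feature vectors: mean plus correlation-weighted residuals."""
    r = gaussian_corr(X_new, model["X"], model["theta"])
    return model["mu"] + r @ model["weights"]


# Toy training set: a 5 x 5 grid over two hypothetical geometric features.
grid = np.linspace(-1.0, 1.0, 5)
X_train = np.array([[a, b] for a in grid for b in grid])
# Stand-in for one atomic multipole moment as a function of geometry.
y_train = np.sin(2.0 * X_train[:, 0]) + 0.5 * X_train[:, 1] ** 2

theta = np.array([4.0, 4.0])  # assumed values; in practice these are optimised
model = fit_kriging(X_train, y_train, theta)

# Predict at geometries that were not in the training set.
X_test = np.array([[0.25, -0.25], [-0.6, 0.1], [0.0, 0.8]])
y_pred = predict_kriging(model, X_test)
print(y_pred)
```

Because the nugget is tiny, the fitted function passes through the training values to within numerical precision, which is the interpolating behaviour the abstract refers to. One such model would be needed per multipole moment per atom, and a realistic application would choose `theta` by maximising a likelihood rather than fixing it by hand.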
Multipolar electrostatics based on the Kriging machine learning method: an application to serine
Journal of Molecular Modeling, 2014
Next-generation force fields must incorporate improved electrostatic potentials in order to increase the reliability of their predictions. A crucial decision toward this goal is to abandon point charges in favour of multipole moments centered on nuclear sites. Here we compare the geometries generated by quantum topological multipole moments with those generated by four popular point charge models (TAFF, OPLS-AA, MMFF94x and PFROSST) for a hydrated serine. A main feature of this study is the dual comparison made, both at static level (geometry optimisation via energy minimisation) and at dynamic level (via molecular dynamics and radial/spatial distribution function analysis). At static level, multipolar electrostatics best reproduces the ab initio reference geometry. At dynamic level, multipolar electrostatics produces more structure than point charge electrostatics does, over the whole range. From our previous work on liquid water [Int. J. Quantum. Chem., 2004, 99, 685], where agreement with experiment only occurs when using multipole moments, we deduce that our predictions for hydrated serine will also be closer to experiment when using multipolar electrostatics. The spatial distribution function shows that only multipolar electrostatics shows pronounced structure at long range. Even at short range there are many regions where waters appear in the system governed by multipolar electrostatics but not in that governed by point charges.
Journal of cheminformatics, 2018
Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) by B3LYP/6-31G(d,p) on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database was used with 10,071 structures, new molecular descriptors were designed and the models were validated with external test sets. Several ML algorithms were screened. Random forest regression models predicted an external test set of 3368 compounds achieving mean absolute error up to 0.44 D. The results represent a significant improvement of the dipole moments calculated using empirical point charges located at the nucleus, even assuming the DFT-optimized geometry (root mean square error, RMSE, of 0.68 D vs. 1.53 D and R = 0.87 vs. 0.66).