Daniel C Elton | National Institutes of Health

Papers by Daniel C Elton

Automatic recognition of abdominal lymph nodes from clinical text

Proceedings of the 3rd Clinical Natural Language Processing Workshop

Exclusion Zone Phenomena in Water—A Critical Review of Experimental Findings and Theories

International Journal of Molecular Sciences

The existence of the exclusion zone (EZ), a layer of water in which plastic microspheres are repelled from hydrophilic surfaces, has now been independently demonstrated by several groups. A better understanding of the mechanisms which generate EZs would help with understanding the possible importance of EZs in biology and in engineering applications such as filtration and microfluidics. Here we review the experimental evidence for EZ phenomena in water and the major theories that have been proposed. We review experimental results from birefringence, neutron radiography, nuclear magnetic resonance, and other studies. Pollack theorizes that water in the EZ has a different structure than bulk water, and that this accounts for the EZ. We present several alternative explanations for EZs and argue that Schurr’s theory based on diffusiophoresis presents a compelling alternative explanation for the core EZ phenomenon. Among other things, Schurr’s theory makes predictions about the gr...

Phonon Lifetimes and Thermal Conductivity of the Molecular Crystal α-RDX

MRS Advances

The heat transfer properties of the organic molecular crystal α-RDX were studied using three phonon scattering based thermal conductivity models. It was found that the widely used Peierls-Boltzmann model for thermal transport in crystalline materials breaks down for α-RDX. We show this breakdown is due to a large degree of anharmonicity that leads to a dominance of diffusive-like carriers. Despite being developed for disordered systems, the Allen-Feldman theory for thermal conductivity actually gives the best description of thermal transport. This is likely because diffusive carriers contribute to over 95% of the thermal conductivity in α-RDX. The dominance of diffusive carriers is larger than previously observed in other fully ordered crystalline systems. These results indicate that van der Waals bonded organic crystalline solids conduct heat in a manner more akin to amorphous materials than simple atomic crystals.

Deep learning for molecular design - a review of the state of the art

Molecular Systems Design & Engineering

In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired...

Using a monomer potential energy surface to perform approximate path integral molecular dynamics simulation of ab-initio water at near-zero added cost

Physical Chemistry Chemical Physics

It is now established that nuclear quantum motion plays an important role in determining water's hydrogen bonding, structure, and dynamics. Such effects are important to include in density functional theory...

The origin of the Debye relaxation in liquid water and fitting the high frequency excess response

Physical chemistry chemical physics : PCCP, Jan 19, 2017

We critically review the literature on the Debye absorption peak of liquid water and the excess response found on the high frequency side of the Debye peak. We find a lack of agreement on the microscopic phenomena underlying both of these features. To better understand the molecular origin of the Debye peak we ran large scale molecular dynamics simulations and performed several different distance-dependent decompositions of the low frequency dielectric spectra, finding that it involves processes that take place on scales of 1.5-2.0 nm. We also calculated the k-dependence of the Debye relaxation, finding it to be highly dispersive. These findings are inconsistent with models that relate Debye relaxation to local processes such as the rotation/translation of molecules after H-bond breaking. We introduce the spectrumfitter Python package for fitting dielectric spectra and analyze different ways of fitting the high frequency excess, such as including one or two additional Debye peaks. We pr...

Connexions between density and dielectric properties of water

Applying machine learning techniques to predict the properties of energetic materials

Scientific Reports, 2018

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods: sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with ≈300 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

During the past few decades, enormous resources have been invested in research efforts to discover new energetic materials with improved performance, thermodynamic stability, and safety. A key goal of these efforts has been to find replacements for a handful of energetics which have been used almost exclusively in the world's arsenals since World War II: HMX, RDX, TNT, PETN, and TATB [1]. While hundreds of new energetic materials have been synthesized as a result of this research, many of which have remarkable properties, very few compounds have made it to industrial production. One exception is CL-20 [2,3], the synthesis of which came about as a result of a development effort that lasted about 15 years [1]. After its initial synthesis, the transition of CL-20 to industrial production took another 15 years [1]. This time scale (20-40 years) from the initial start of a materials research effort until the successful application of a novel material is typical of what has been found in materials research more broadly. Currently, the development of new materials requires expensive and time consuming synthesis and characterization loops, with many synthesis experiments leading to dead ends and/or yielding little useful information. Therefore, computational screening and lead generation are critical to speeding up the pace of materials development. Traditionally screening has been done either using ad-hoc rules of thumb, which are usually limited in their domain of applicability, or by running large numbers of expensive quantum chemistry calculations which require significant supercomputing time. Machine learning (ML) from data holds the promise of allowing for rapid screening of materials at much lower computational cost. A properly trained ML model can make useful predictions about the properties of a candidate material in milliseconds rather than hours or days [4]. Recently, machine learning has been shown to accelerate the discovery of new materials for dielectric polymers [5], OLED displays [6], and polymeric dispersants [7]. In the realm of molecules, ML has been applied successfully to the prediction of atomization energies [8], bond energies [9], dielectric breakdown strength in polymers [10], critical point properties of molecular liquids [11], and exciton dynamics in photosynthetic complexes [12].

In the materials science realm, ML has recently yielded predictions for dielectric polymers [5,10], superconducting materials [13], nickel-based superalloys [14], elpasolite crystals [15], perovskites [16], nanostructures [17], Heusler alloys [18], and the thermodynamic stabilities of half-Heusler compounds [19]. In the pharmaceutical realm the use of ML has a longer history than in other fields of materials development, having first been used under the moniker of quantitative
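As a rough illustration of the combination the abstract reports as best (sum over bonds featurization with kernel ridge regression), the following minimal sketch uses only the Python standard library. The bond vocabulary, the three toy "molecules", their target values, the Gaussian (RBF) kernel, and the hyperparameters `gamma` and `lam` are all hypothetical choices made for this example and are not taken from the paper.

```python
import math
from collections import Counter

def sum_over_bonds(bonds, vocabulary):
    """Bond-counting feature vector: one count per bond type in the vocabulary."""
    counts = Counter(bonds)
    return [float(counts[b]) for b in vocabulary]

def rbf_kernel(x, y, gamma):
    """Gaussian kernel (an assumed choice; other kernels are possible)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krr_fit(X, y, gamma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve(K, y)

def krr_predict(X_train, alpha, x, gamma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))

# Hypothetical toy data: bond lists and arbitrary target values.
vocabulary = ["C-C", "C-H", "C-N", "N-O", "N=O"]
molecules = [
    ["C-H", "C-H", "C-H", "C-N", "N-O", "N=O"],
    ["C-C", "C-H", "C-H", "C-N", "C-N", "N-O", "N-O", "N=O", "N=O"],
    ["C-C", "C-C", "C-H", "C-H", "C-H", "C-H"],
]
X = [sum_over_bonds(m, vocabulary) for m in molecules]
y = [1.0, 2.0, 3.0]
alpha = krr_fit(X, y, gamma=0.1, lam=1e-6)
```

With a very small regularization `lam`, the model nearly interpolates the training targets; in practice `gamma` and `lam` would be chosen by cross-validation on held-out molecules.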

Using natural language processing techniques to extract information on the properties and functionalities of energetic materials from large text corpora

The number of scientific journal articles and reports being published about energetic materials every year is growing exponentially, and therefore extracting relevant information and actionable insights from the latest research is becoming a considerable challenge. In this work we explore how techniques from natural language processing and machine learning can be used to automatically extract chemical insights from large collections of documents. We first describe how to download and process documents from a variety of sources - journal articles, conference proceedings (including NTREM), the US Patent & Trademark Office, and the Defense Technical Information Center archive on archive.org. We present a custom NLP pipeline which uses open source NLP tools to identify the names of chemical compounds and relates them to function words ("underwater", "rocket", "pyrotechnic") and property words ("elastomer", "non-toxic"). After explaining how word embeddings work we compare the utility of two popular word embeddings - word2vec and GloVe. Chemical-chemical and chemical-application relationships are obtained by doing computations with word vectors. We show that word embeddings capture latent information about energetic materials, so that related materials appear close together in the word embedding space.
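The statement that related materials appear close together in the embedding space is usually checked with cosine similarity between word vectors. The sketch below shows that computation on a tiny hand-made table; the three-dimensional vectors are invented for illustration and do not come from word2vec, GloVe, or the paper's corpus.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings):
    """Return the other word whose vector is closest in cosine similarity."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))

# Hypothetical toy embeddings (real ones have hundreds of dimensions).
toy_embeddings = {
    "RDX": [0.9, 0.1, 0.0],
    "HMX": [0.8, 0.2, 0.1],
    "elastomer": [0.0, 0.1, 0.9],
}
nearest = most_similar("RDX", toy_embeddings)
```

In this toy table the nearest neighbour of "RDX" is "HMX", mimicking the kind of chemical-chemical relationship the abstract describes.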

The microscopic origin of the Debye relaxation in liquid water and fitting the high frequency excess response

We critically review the literature on the Debye absorption peak of liquid water and the excess response found on the high frequency side of the Debye peak. We find a lack of agreement on the microscopic phenomena underlying both of these features. To better understand the molecular origin of the Debye peak we ran large scale molecular dynamics simulations and performed several different distance-dependent decompositions of the low frequency dielectric spectra, finding that it involves processes that take place on scales of 1-2 nm. We also calculated the k-dependence of the Debye relaxation, finding it to be highly dispersive. These findings are inconsistent with models that relate Debye relaxation to local processes such as the rotation/translation of molecules after H-bond breaking. We introduce the "spectrumfitter" Python package for fitting dielectric spectra and analyze different ways of fitting the high frequency excess, such as including one or two additional Debye peaks. We propose using the generalized Lyddane-Sachs-Teller (gLST) equation as a way of testing the physicality of model dielectric functions. Our gLST analysis indicates that fitting the excess dielectric response of water with secondary and tertiary Debye relaxations is problematic. We suggest that a distribution of Debye and oscillatory modes or a truncated power-law is the correct way to fit the excess response. Our work is consistent with the recent theory of Popov et al. (2016) that Debye relaxation is due to the propagation of Bjerrum-like defects in the hydrogen bond network, similar to the mechanism in ice.
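For readers unfamiliar with the model functions being fitted, a single Debye relaxation has the standard form ε(ω) = ε∞ + Δε / (1 - iωτ) under the e^{-iωt} sign convention, and fits with "additional Debye peaks" simply sum several such terms. The sketch below evaluates these forms. It does not use or reproduce the spectrumfitter package's API, and the parameter values are rounded, illustrative assumptions rather than results from the paper.

```python
def debye(omega, eps_inf, delta_eps, tau):
    """Single Debye relaxation (e^{-i omega t} convention, so Im(eps) >= 0)."""
    return eps_inf + delta_eps / (1.0 - 1j * omega * tau)

def multi_debye(omega, eps_inf, terms):
    """Sum of Debye terms; `terms` is a list of (delta_eps, tau) pairs."""
    return eps_inf + sum(d / (1.0 - 1j * omega * t) for d, t in terms)

# Illustrative, rounded parameters (assumed for this example only).
EPS_INF, DELTA_EPS, TAU = 5.0, 73.0, 8.0e-12  # tau in seconds

static_value = debye(0.0, EPS_INF, DELTA_EPS, TAU)      # low-frequency limit
peak_value = debye(1.0 / TAU, EPS_INF, DELTA_EPS, TAU)  # loss peak at omega*tau = 1
```

At ω = 0 the function returns ε∞ + Δε, and at ωτ = 1 the imaginary part (the dielectric loss) reaches its maximum value of Δε/2.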

The hydrogen bond network of water supports propagating optical phonon-like modes

The local structure of liquid water as a function of temperature is a source of intense research. This structure is intimately linked to the dynamics of water molecules, which can be measured using Raman and infrared spectroscopies. The assignment of spectral peaks depends on whether they are collective modes or single molecule motions. Vibrational modes in liquids are usually considered to be associated with the motions of single molecules or small clusters. Using molecular dynamics simulations we find dispersive optical phonon-like modes in the librational and OH stretching bands. We argue that on subpicosecond time scales these modes propagate through water's hydrogen bond network over distances of up to two nanometers. In the long wavelength limit these optical modes exhibit longitudinal-transverse splitting, indicating the presence of coherent long range dipole-dipole interactions, as in ice. Our results indicate the dynamics of liquid water have more similarities to ice than previously thought.

Polar nanoregions in water - a study of the dielectric properties of TIP4P/2005, TIP4P/2005f and TTM3F

We present a critical comparison of the dielectric properties of three models of water: TIP4P/2005, TIP4P/2005f and TTM3F. Dipole spatial correlation is measured using the distance dependent Kirkwood function along with one dimensional and two dimensional dipole correlation functions. We find that the introduction of flexibility alone does not significantly affect dipole correlation and only affects ε(ω) at high frequencies. By contrast the introduction of polarizability increases dipole correlation and yields a more accurate ε(ω). Additionally the introduction of polarizability creates temperature dependence in the dipole moment even at fixed density, yielding a more accurate value for dε/dT compared to non-polarizable models. To better understand the physical origin of the dielectric properties of water we make analogies to the physics of polar nanoregions in relaxor ferroelectric materials. We show that ε(ω, T) and τ_D(T) for water have striking similarities with relaxor ferroelectrics, a class of materials characterized by large frequency dispersion in ε(ω, T), Vogel-Fulcher-Tammann behaviour in τ_D(T), and the existence of polar nanoregions.

Accurate estimation of third-order moments from turbulence measurements

Politano and Pouquet's law, a generalization of Kolmogorov's four-fifths law to incompressible MHD, makes it possible to measure the energy cascade rate in incompressible MHD turbulence by means of third-order moments. In hydrodynamics, accurate measurement of third-order moments requires large amounts of data because the probability distributions of velocity-differences are nearly symmetric and the third-order moments are relatively small. Measurements of the energy cascade rate in solar wind turbulence have recently been performed for the first time, but without careful consideration of the accuracy or statistical uncertainty of the required third-order moments. This paper investigates the statistical convergence of third-order moments as a function of the sample size N. It is shown that the accuracy of the third-moment (δv)^3 depends on the number of correlation lengths spanned by the data set, and a method of estimating the statistical uncertainty of the third-moment is developed. The technique is illustrated using both wind tunnel data and solar wind data.
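To make the quantity concrete, the sketch below computes the sample third-order moment of velocity increments at a given lag, together with a rough standard error. The error estimate is a generic heuristic and not the method developed in the paper: it assumes the number of effectively independent samples is roughly the number of correlation lengths spanned by the record, with the correlation length (in samples) supplied by the caller.

```python
import math

def cubed_increments(v, lag):
    """Cubes of the velocity increments delta_v = v[i + lag] - v[i]."""
    return [(v[i + lag] - v[i]) ** 3 for i in range(len(v) - lag)]

def third_moment(v, lag):
    """Sample mean of (delta_v)^3 at the given lag."""
    cubes = cubed_increments(v, lag)
    return sum(cubes) / len(cubes)

def third_moment_error(v, lag, corr_len_samples):
    """Heuristic standard error using an assumed effective sample count.

    n_eff = n / corr_len_samples approximates the number of correlation
    lengths spanned by the data (an assumption of this sketch).
    """
    cubes = cubed_increments(v, lag)
    n = len(cubes)
    mean = sum(cubes) / n
    variance = sum((c - mean) ** 2 for c in cubes) / (n - 1)
    n_eff = max(n / corr_len_samples, 1.0)
    return math.sqrt(variance / n_eff)

# Tiny hand-made series: increments are 1, 2, 3, so the cubes are 1, 8, 27.
series = [0.0, 1.0, 3.0, 6.0]
moment = third_moment(series, lag=1)
```

Because the distribution of increments is nearly symmetric in real turbulence data, the mean of the cubes is small compared with their spread, which is why the effective sample count matters so much for the uncertainty.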

Drafts by Daniel C Elton

Deep learning for molecular generation and optimization - a review of the state of the art

In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules; in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.

Electronic mail: daniel.elton@nih.gov

The average cost to bring a new drug to market is now well over one billion USD [1], with an average time from discovery to market of 13 years [2]. Outside of pharmaceuticals the average time from discovery to commercial production can be even longer, for instance for energetic molecules it is 25 years [3]. A critical first step in molecular discovery is generating a pool of candidates for computational study or synthesis and characterization. This is a daunting task because the space of possible molecules is enormous: the number of potential drug-like compounds has been estimated to be between 10^23 and 10^60 [4], while the number of all compounds that have been synthesized is on the order of 10^8. Heuristics, such as Lipinski's "rule of five" for pharmaceuticals [5], can help narrow the space of possibilities, but the task remains daunting. High throughput screening (HTS) [6] and high throughput virtual screening (HTVS) [7] techniques have made larger parts of chemical space accessible to computational and experimental study. Machine learning has been shown to be capable of yielding rapid and accurate property predictions for many properties of interest and is being integrated into screening pipelines, since it is orders of magnitude faster than traditional computational chemistry methods [8]. Techniques for the interpretation and "inversion" of a machine learning model can illuminate structure-property relations that have been learned by the model, which can in turn be used to guide the design of new lead molecules [9,10]. However, even with these new techniques bad leads still waste limited supercomputer and laboratory resources, so minimizing the number of bad leads generated at the start of the pipeline remains a key priority. The focus of this review is on the use of deep learning techniques for the targeted generation of molecules and guided exploration of chemical space.

We note that machine learning (and more broadly artificial intelligence) is having an impact on accelerating other parts of the chemical discovery pipeline as well, via machine learning accelerated ab-initio simulation [8], machine learning based reaction prediction [11,12], deep learning based synthesis planning [13], and the development of high-throughput "self-driving" robotic laboratories [14,15].

Deep neural networks, which are often defined as networks with more than three layers, have been around for many decades but until recently were difficult to train and fell behind other techniques for classification and regression. By most accounts, the deep learning revolution in machine learning began in 2012, when deep neural network based models began to win several different competitions for the first time. First came a demonstration by Cireşan et al. of how deep neural networks could achieve near-human performance on the task of handwritten digit classification [16]. Next came groundbreaking work by Krizhevsky et al. which showed how deep convolutional networks achieved superior performance on the 2010 ImageNet image classification challenge [17]. Finally, around the same time in 2012, a multitask neural network developed by Dahl et al. won the "Merck Molecular Activity Challenge" to predict the molecular activities of molecules at 15 different sites in the body, beating out more traditional machine learning approaches such as boosted decision trees [18]. One of the key technical advances published that year and used by both Krizhevsky et al. and Dahl et al. was a novel regularization trick called "dropout".

Research paper thumbnail of Automatic recognition of abdominal lymph nodes from clinical text

Proceedings of the 3rd Clinical Natural Language Processing Workshop

Research paper thumbnail of Exclusion Zone Phenomena in Water—A Critical Review of Experimental Findings and Theories

International Journal of Molecular Sciences

The existence of the exclusion zone (EZ), a layer of water in which plastic microspheres are repe... more The existence of the exclusion zone (EZ), a layer of water in which plastic microspheres are repelled from hydrophilic surfaces, has now been independently demonstrated by several groups. A better understanding of the mechanisms which generate EZs would help with understanding the possible importance of EZs in biology and in engineering applications such as filtration and microfluidics. Here we review the experimental evidence for EZ phenomena in water and the major theories that have been proposed. We review experimental results from birefringence, neutron radiography, nuclear magnetic resonance, and other studies. Pollack theorizes that water in the EZ exists has a different structure than bulk water, and that this accounts for the EZ. We present several alternative explanations for EZs and argue that Schurr’s theory based on diffusiophoresis presents a compelling alternative explanation for the core EZ phenomenon. Among other things, Schurr’s theory makes predictions about the gr...

Research paper thumbnail of Phonon Lifetimes and Thermal Conductivity of the Molecular Crystal α-RDX

MRS Advances

ABSTRACTThe heat transfer properties of the organic molecular crystal α-RDX were studied using th... more ABSTRACTThe heat transfer properties of the organic molecular crystal α-RDX were studied using three phonon scattering based thermal conductivity models. It was found that the widely used Peierls-Boltzmann model for thermal transport in crystalline materials breaks down for α-RDX. We show this breakdown is due to a large degree of anharmonicity that leads to a dominance of diffusive-like carriers. Despite being developed for disordered systems, the Allen-Feldman theory for thermal conductivity actually gives the best description of thermal transport. This is likely because diffusive carriers contribute to over 95% of the thermal conductivity in α-RDX. The dominance of diffusive carriers is larger than previously observed in other fully ordered crystalline systems. These results indicate that van der Waals bonded organic crystalline solids conduct heat in a manner more akin to amorphous materials than simple atomic crystals.

Research paper thumbnail of Deep learning for molecular design - a review of the state of the art

Molecular Systems Design & Engineering

In the space of only a few years, deep generative modeling has revolutionized how we think of art... more In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired...

Research paper thumbnail of Using a monomer potential energy surface to perform approximate path integral molecular dynamics simulation of ab-initio water at near-zero added cost

Physical Chemistry Chemical Physics

It is now established that nuclear quantum motion plays an important role in determining water&#3... more It is now established that nuclear quantum motion plays an important role in determining water's hydrogen bonding, structure, and dynamics. Such effects are important to include in density functional theory...

Research paper thumbnail of The origin of the Debye relaxation in liquid water and fitting the high frequency excess response

Physical chemistry chemical physics : PCCP, Jan 19, 2017

We critically review the literature on the Debye absorption peak of liquid water and the excess r... more We critically review the literature on the Debye absorption peak of liquid water and the excess response found on the high frequency side of the Debye peak. We find a lack of agreement on the microscopic phenomena underlying both of these features. To better understand the molecular origin of Debye peak we ran large scale molecular dynamics simulations and performed several different distance-dependent decompositions of the low frequency dielectric spectra, finding that it involves processes that take place on scales of 1.5-2.0 nm. We also calculated the k-dependence of the Debye relaxation, finding it to be highly dispersive. These findings are inconsistent with models that relate Debye relaxation to local processes such as the rotation/translation of molecules after H-bond breaking. We introduce the spectrumfitter Python package for fitting dielectric spectra and analyze different ways of fitting the high frequency excess, such as including one or two additional Debye peaks. We pr...

Research paper thumbnail of Connexions between density and dielectric properties of water

Research paper thumbnail of The origin of the Debye relaxation in liquid water and fitting the high frequency excess response

Physical chemistry chemical physics : PCCP, Jan 19, 2017

We critically review the literature on the Debye absorption peak of liquid water and the excess r... more We critically review the literature on the Debye absorption peak of liquid water and the excess response found on the high frequency side of the Debye peak. We find a lack of agreement on the microscopic phenomena underlying both of these features. To better understand the molecular origin of Debye peak we ran large scale molecular dynamics simulations and performed several different distance-dependent decompositions of the low frequency dielectric spectra, finding that it involves processes that take place on scales of 1.5-2.0 nm. We also calculated the k-dependence of the Debye relaxation, finding it to be highly dispersive. These findings are inconsistent with models that relate Debye relaxation to local processes such as the rotation/translation of molecules after H-bond breaking. We introduce the spectrumfitter Python package for fitting dielectric spectra and analyze different ways of fitting the high frequency excess, such as including one or two additional Debye peaks. We pr...

Research paper thumbnail of Applying machine learning techniques to predict the properties of energetic materials

Scientific Reports, 2018

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods: sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the out-of-sample prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties. By including another dataset with ≈300 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.

During the past few decades, enormous resources have been invested in research efforts to discover new energetic materials with improved performance, thermodynamic stability, and safety. A key goal of these efforts has been to find replacements for a handful of energetics which have been used almost exclusively in the world's arsenals since World War II: HMX, RDX, TNT, PETN, and TATB [1]. While hundreds of new energetic materials have been synthesized as a result of this research, many of which have remarkable properties, very few compounds have made it to industrial production. One exception is CL-20 [2,3], the synthesis of which came about as a result of a development effort that lasted about 15 years [1]. After its initial synthesis, the transition of CL-20 to industrial production took another 15 years [1]. This time scale (20-40 years) from the initial start of a materials research effort until the successful application of a novel material is typical of what has been found in materials research more broadly. Currently, the development of new materials requires expensive and time consuming synthesis and characterization loops, with many synthesis experiments leading to dead ends and/or yielding little useful information. Therefore, computational screening and lead generation is critical to speeding up the pace of materials development. Traditionally, screening has been done using either ad-hoc rules of thumb, which are usually limited in their domain of applicability, or by running large numbers of expensive quantum chemistry calculations which require significant supercomputing time. Machine learning (ML) from data holds the promise of allowing for rapid screening of materials at much lower computational cost. A properly trained ML model can make useful predictions about the properties of a candidate material in milliseconds rather than hours or days [4].

Recently, machine learning has been shown to accelerate the discovery of new materials for dielectric polymers [5], OLED displays [6], and polymeric dispersants [7]. In the realm of molecules, ML has been applied successfully to the prediction of atomization energies [8], bond energies [9], dielectric breakdown strength in polymers [10], critical point properties of molecular liquids [11], and exciton dynamics in photosynthetic complexes [12]. In the materials science realm, ML has recently yielded predictions for dielectric polymers [5,10], superconducting materials [13], nickel-based superalloys [14], elpasolite crystals [15], perovskites [16], nanostructures [17], Heusler alloys [18], and the thermodynamic stabilities of half-Heusler compounds [19]. In the pharmaceutical realm the use of ML has a longer history than in other fields of materials development, having first been used under the moniker of quantitative
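The sum over bonds (bond counting) featurization named in the abstract can be illustrated with a minimal sketch. It assumes each molecule is already given as a list of bond-type labels; the function name `sum_over_bonds`, the label strings, and the two example molecules are illustrative choices, not taken from the paper, which does not describe its implementation here.

```python
from collections import Counter


def sum_over_bonds(molecules):
    """Turn lists of bond labels into fixed-length bond-count vectors.

    `molecules` is a list of molecules, each a list of bond-type strings.
    Returns (bond_types, rows), where rows[i][j] is the number of bonds of
    type bond_types[j] in molecule i.
    """
    counts = [Counter(bonds) for bonds in molecules]
    # The feature columns are every bond type seen anywhere in the dataset.
    bond_types = sorted(set().union(*counts))
    rows = [[c[b] for b in bond_types] for c in counts]
    return bond_types, rows


# Hand-written bond lists (assumed input; one Lewis structure per molecule).
molecules = [
    ["C-H", "C-H", "C-H", "C-H"],                  # methane
    ["C-H", "C-H", "C-H", "C-N", "N=O", "N-O"],    # nitromethane
]
bond_types, features = sum_over_bonds(molecules)
for row in features:
    print(dict(zip(bond_types, row)))
```

Each row is a vector that could then be passed to a regression model such as kernel ridge regression; that step is omitted here.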

Using natural language processing techniques to extract information on the properties and functionalities of energetic materials from large text corpora

The number of scientific journal articles and reports being published about energetic materials every year is growing exponentially, and therefore extracting relevant information and actionable insights from the latest research is becoming a considerable challenge. In this work we explore how techniques from natural language processing and machine learning can be used to automatically extract chemical insights from large collections of documents. We first describe how to download and process documents from a variety of sources - journal articles, conference proceedings (including NTREM), the US Patent & Trademark Office, and the Defense Technical Information Center archive on archive.org. We present a custom NLP pipeline which uses open source NLP tools to identify the names of chemical compounds and relates them to function words ("underwater", "rocket", "pyrotechnic") and property words ("elastomer", "non-toxic"). After explaining how word embeddings work, we compare the utility of two popular word embeddings - word2vec and GloVe. Chemical-chemical and chemical-application relationships are obtained by doing computations with word vectors. We show that word embeddings capture latent information about energetic materials, so that related materials appear close together in the word embedding space.
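As an illustration of "doing computations with word vectors", the sketch below ranks words by cosine similarity, a common way to measure closeness in an embedding space. The three-dimensional vectors are made-up toy values, not embeddings from the paper, and the helper names are hypothetical; real word2vec or GloVe vectors typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


# Toy embedding table (assumed values, for illustration only).
vectors = {
    "RDX": [0.9, 0.1, 0.0],
    "HMX": [0.8, 0.2, 0.1],
    "elastomer": [0.0, 0.1, 0.9],
}
print(nearest("RDX", vectors))
```

With these toy values the two nitramine names point in nearly the same direction, so each is the other's nearest neighbour.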

The microscopic origin of the Debye relaxation in liquid water and fitting the high frequency excess response

We critically review the literature on the Debye absorption peak of liquid water and the excess response found on the high frequency side of the Debye peak. We find a lack of agreement on the microscopic phenomena underlying both of these features. To better understand the molecular origin of the Debye peak we ran large scale molecular dynamics simulations and performed several different distance-dependent decompositions of the low frequency dielectric spectra, finding that it involves processes that take place on scales of 1-2 nm. We also calculated the k-dependence of the Debye relaxation, finding it to be highly dispersive. These findings are inconsistent with models that relate Debye relaxation to local processes such as the rotation/translation of molecules after H-bond breaking. We introduce the "spectrumfitter" Python package for fitting dielectric spectra and analyze different ways of fitting the high frequency excess, such as including one or two additional Debye peaks. We propose using the generalized Lyddane-Sachs-Teller (gLST) equation as a way of testing the physicality of model dielectric functions. Our gLST analysis indicates that fitting the excess dielectric response of water with secondary and tertiary Debye relaxations is problematic. We suggest that a distribution of Debye and oscillatory modes or a truncated power law is the correct way to fit the excess response. Our work is consistent with the recent theory of Popov et al. (2016) that Debye relaxation is due to the propagation of Bjerrum-like defects in the hydrogen bond network, similar to the mechanism in ice.
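For readers unfamiliar with the model functions being fitted, here is a minimal sketch of a dielectric function built from a sum of Debye peaks. It uses the standard Debye form with the exp(-iωt) sign convention (an assumption, since the abstract does not state one), the function name is hypothetical, and it does not reproduce the API of the "spectrumfitter" package. The parameter values are illustrative, not fitted results.

```python
import math


def debye_permittivity(omega, eps_inf, peaks):
    """Complex permittivity eps_inf + sum_j delta_eps_j / (1 - i*omega*tau_j).

    `peaks` is a list of (delta_eps, tau) pairs, one per Debye relaxation.
    With this sign convention the imaginary part (dielectric loss) is positive.
    """
    return eps_inf + sum(d / (1 - 1j * omega * tau) for d, tau in peaks)


# Illustrative parameters (assumed): one main peak plus one faster, weaker peak.
eps_inf = 5.0
peaks = [(70.0, 8.0e-12), (2.0, 1.0e-12)]

for freq_hz in (1e9, 2e10, 1e12):
    eps = debye_permittivity(2 * math.pi * freq_hz, eps_inf, peaks)
    print(f"{freq_hz:.0e} Hz: eps' = {eps.real:.2f}, eps'' = {eps.imag:.2f}")
```

For a single peak the loss is largest at ω = 1/τ, where it equals half the peak strength; adding secondary or tertiary peaks is the fitting strategy the abstract argues is problematic for water.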

The hydrogen bond network of water supports propagating optical phonon-like modes

The local structure of liquid water as a function of temperature is a source of intense research. This structure is intimately linked to the dynamics of water molecules, which can be measured using Raman and infrared spectroscopies. The assignment of spectral peaks depends on whether they are collective modes or single molecule motions. Vibrational modes in liquids are usually considered to be associated with the motions of single molecules or small clusters. Using molecular dynamics simulations we find dispersive optical phonon-like modes in the librational and OH stretching bands. We argue that on subpicosecond time scales these modes propagate through water's hydrogen bond network over distances of up to two nanometers. In the long wavelength limit these optical modes exhibit longitudinal-transverse splitting, indicating the presence of coherent long range dipole-dipole interactions, as in ice. Our results indicate the dynamics of liquid water have more similarities to ice than previously thought.

Polar nanoregions in water: a study of the dielectric properties of TIP4P/2005, TIP4P/2005f and TTM3F

We present a critical comparison of the dielectric properties of three models of water: TIP4P/2005, TIP4P/2005f and TTM3F. Dipole spatial correlation is measured using the distance dependent Kirkwood function along with one dimensional and two dimensional dipole correlation functions. We find that the introduction of flexibility alone does not significantly affect dipole correlation and only affects ε(ω) at high frequencies. By contrast, the introduction of polarizability increases dipole correlation and yields a more accurate ε(ω). Additionally, the introduction of polarizability creates temperature dependence in the dipole moment even at fixed density, yielding a more accurate value for dε/dT compared to non-polarizable models. To better understand the physical origin of the dielectric properties of water we make analogies to the physics of polar nanoregions in relaxor ferroelectric materials. We show that ε(ω, T) and τ_D(T) for water have striking similarities with relaxor ferroelectrics, a class of materials characterized by large frequency dispersion in ε(ω, T), Vogel-Fulcher-Tammann behaviour in τ_D(T), and the existence of polar nanoregions.

Accurate estimation of third-order moments from turbulence measurements

Politano and Pouquet's law, a generalization of Kolmogorov's four-fifths law to incompressible MHD, makes it possible to measure the energy cascade rate in incompressible MHD turbulence by means of third-order moments. In hydrodynamics, accurate measurement of third-order moments requires large amounts of data because the probability distributions of velocity differences are nearly symmetric and the third-order moments are relatively small. Measurements of the energy cascade rate in solar wind turbulence have recently been performed for the first time, but without careful consideration of the accuracy or statistical uncertainty of the required third-order moments. This paper investigates the statistical convergence of third-order moments as a function of the sample size N. It is shown that the accuracy of the third-order moment (δv)^3 depends on the number of correlation lengths spanned by the data set, and a method of estimating the statistical uncertainty of the third-order moment is developed. The technique is illustrated using both wind tunnel data and solar wind data.
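To make the role of correlation lengths concrete, the following sketch estimates the third-order moment of velocity increments and attaches a crude standard error. This is not the estimator developed in the paper, whose details are not given in the abstract. It assumes the number of effectively independent samples is roughly the record length divided by the correlation length; the function name and the parameter `corr_len_samples` are hypothetical.

```python
import math


def third_order_moment(v, lag, corr_len_samples=1.0):
    """Mean cubed increment of series `v` at separation `lag`, with a rough error.

    Returns (moment, stderr). The error assumes only n / corr_len_samples of
    the n increments are statistically independent (an assumption, not the
    paper's method).
    """
    cubes = [(v[i + lag] - v[i]) ** 3 for i in range(len(v) - lag)]
    n = len(cubes)
    moment = sum(cubes) / n
    variance = sum((c - moment) ** 2 for c in cubes) / (n - 1)
    n_eff = max(1.0, n / corr_len_samples)
    return moment, math.sqrt(variance / n_eff)


# Tiny hand-made series, just to show the call.
moment, stderr = third_order_moment([0.0, 1.0, 3.0, 6.0], lag=1)
print(moment, stderr)
```

Because the cubed increments nearly cancel for a nearly symmetric distribution, the error bar can easily exceed the moment itself unless the record spans many correlation lengths.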

Deep learning for molecular generation and optimization - a review of the state of the art

In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules; in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum likelihood based training.

a) Electronic mail: daniel.elton@nih.gov

The average cost to bring a new drug to market is now well over one billion USD [1], with an average time from discovery to market of 13 years [2]. Outside of pharmaceuticals the average time from discovery to commercial production can be even longer; for energetic molecules, for instance, it is 25 years [3]. A critical first step in molecular discovery is generating a pool of candidates for computational study or synthesis and characterization. This is a daunting task because the space of possible molecules is enormous: the number of potential drug-like compounds has been estimated to be between 10^23 and 10^60 [4], while the number of all compounds that have been synthesized is on the order of 10^8. Heuristics, such as Lipinski's "rule of five" for pharmaceuticals [5], can help narrow the space of possibilities, but the task remains daunting. High throughput screening (HTS) [6] and high throughput virtual screening (HTVS) [7] techniques have made larger parts of chemical space accessible to computational and experimental study. Machine learning has been shown to be capable of yielding rapid and accurate property predictions for many properties of interest and is being integrated into screening pipelines, since it is orders of magnitude faster than traditional computational chemistry methods [8]. Techniques for the interpretation and "inversion" of a machine learning model can illuminate structure-property relations that have been learned by the model, which can in turn be used to guide the design of new lead molecules [9,10]. However, even with these new techniques bad leads still waste limited supercomputer and laboratory resources, so minimizing the number of bad leads generated at the start of the pipeline remains a key priority. The focus of this review is on the use of deep learning techniques for the targeted generation of molecules and guided exploration of chemical space. We note that machine learning (and more broadly artificial intelligence) is having an impact on accelerating other parts of the chemical discovery pipeline as well, via machine learning accelerated ab-initio simulation [8], machine learning based reaction prediction [11,12], deep learning based synthesis planning [13], and the development of high-throughput "self-driving" robotic laboratories [14,15].

Deep neural networks, which are often defined as networks with more than three layers, have been around for many decades but until recently were difficult to train and fell behind other techniques for classification and regression. By most accounts, the deep learning revolution in machine learning began in 2012, when deep neural network based models began to win several different competitions for the first time. First came a demonstration by Cireşan et al. of how deep neural networks could achieve near-human performance on the task of handwritten digit classification [16]. Next came groundbreaking work by Krizhevsky et al. which showed how deep convolutional networks achieved superior performance on the 2010 ImageNet image classification challenge [17]. Finally, around the same time in 2012, a multitask neural network developed by Dahl et al. won the "Merck Molecular Activity Challenge" to predict the molecular activities of molecules at 15 different sites in the body, beating out more traditional machine learning approaches such as boosted decision trees [18]. One of the key technical advances published that year and used by both Krizhevsky et al. and Dahl et al. was a novel regularization trick called "dropout".
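The idea behind dropout can be sketched in a few lines: during training each activation is randomly zeroed, which discourages units from co-adapting. The version below is the "inverted" variant common in current libraries, which rescales the surviving activations during training so nothing needs to change at test time. It is an illustrative stand-in, not the exact formulation used in the works cited above, and the function name and signature are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations with probability `drop_prob` (inverted dropout).

    Kept activations are divided by (1 - drop_prob) so the expected value of
    each output matches its input. At test time the input passes through.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded so repeated runs give the same mask
hidden = [1.0] * 10
print(dropout(hidden, 0.5, rng))                  # training: mix of 0.0 and 2.0
print(dropout(hidden, 0.5, rng, training=False))  # inference: unchanged
```

Passing the random generator explicitly keeps the sketch deterministic under a fixed seed, which makes the behaviour easy to check.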