Distributed representations of atoms and materials for machine learning
Related papers
Learning atoms for materials discovery
Proceedings of the National Academy of Sciences of the United States of America, 2018
Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques have brought human-competitive performance to tasks across many fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, AI players have already beaten human world champions convincingly, both with and without learning from human play. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from an extensive database of known compounds and materials. These learned properties are represented as high-dimensional vectors, and clustering of atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which d...
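As a hedged illustration of the idea behind learned atom vectors (the vectors below are hypothetical toy values, not actual Atom2Vec output), chemically similar elements should end up close together in the learned vector space, which a simple cosine-similarity check can expose:

```python
import math

# Toy 4-dimensional "atom vectors" for two alkali metals and two halogens
# (hypothetical values for illustration only).
atom_vectors = {
    "Na": [0.9, 0.1, 0.2, 0.0],
    "K":  [0.8, 0.2, 0.1, 0.1],
    "F":  [0.1, 0.9, 0.0, 0.3],
    "Cl": [0.2, 0.8, 0.1, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Elements from the same group sit closer in vector space.
print(cosine(atom_vectors["Na"], atom_vectors["K"]))   # high (same group)
print(cosine(atom_vectors["Na"], atom_vectors["Cl"]))  # lower (different groups)
```

Clustering such vectors (e.g. with k-means) is what recovers periodic-table-like groupings in the paper's setting.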
Crystal structure representations for machine learning models of formation energies
International Journal of Quantum Chemistry, 2015
We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix by using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a data set of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) 0.37 eV/atom for the respective representations.
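The three periodic representations (i)-(iii) all generalize the molecular Coulomb matrix, whose standard definition uses self-interaction terms 0.5 * Z_i**2.4 on the diagonal and pairwise Coulomb repulsion Z_i * Z_j / |R_i - R_j| off the diagonal. A minimal sketch of that base representation (toy geometry, atomic units assumed):

```python
import math

def coulomb_matrix(charges, positions):
    """Molecular Coulomb matrix:
    M_ii = 0.5 * Z_i**2.4 (fit to atomic energies),
    M_ij = Z_i * Z_j / |R_i - R_j| for i != j."""
    n = len(charges)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = 0.5 * charges[i] ** 2.4
            else:
                d = math.dist(positions[i], positions[j])
                M[i][j] = charges[i] * charges[j] / d
    return M

# Toy H2 molecule with a bond length of 1.4 bohr.
M = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The sine-matrix ansatz (iii) keeps this pairwise form but replaces the 1/d term with a lattice-periodic function of the crystal coordinates, which is what makes it cheap compared to the full Ewald sum in (i).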
Machine learning of molecular electronic properties in chemical compound space
New Journal of Physics, 2013
The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic properties.
Journal of Applied Physics, 2022
Machine Learning (ML) techniques are revolutionizing the way efficient materials modeling is performed. Nevertheless, not all ML approaches allow for an understanding of the microscopic mechanisms at play in different phenomena. To address the latter aspect, we propose a combinatorial machine-learning approach to obtain physical formulas based on simple and easily accessible ingredients, such as atomic properties. The latter are used to build materials features that are finally employed, through linear regression, to predict the energetic stability of semiconducting binary compounds with respect to zincblende and rocksalt crystal structures. The adopted models are trained using a dataset built from first-principles calculations. Our results show that even one-dimensional (1D) formulas describe the energetics well; a simple grid-search optimization of the automatically obtained 1D formulas enhances prediction performance at very small computational cost. In addition, our approach allows us to highlight the role of the different atomic properties involved in the formulas. The computed formulas clearly indicate that "spatial" atomic properties (i.e. radii indicating maximum probability densities for s, p, d electronic shells) drive the stabilization of one crystal structure with respect to the other, suggesting the major relevance of the radius associated with the p-shell of the cation species.
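The workflow this abstract describes, building candidate 1D features from atomic properties, fitting each by linear regression, and keeping the best, can be sketched as follows. All feature names and numbers here are hypothetical toy values, not the paper's data:

```python
def fit_1d(xs, ys):
    """Closed-form least-squares fit y ~ a*x + b; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

# Hypothetical per-compound candidate features (e.g. shell radii of the
# constituent atoms) and a toy target: the RS-ZB stability difference.
features = {
    "r_p_cation": [1.0, 1.2, 1.5, 1.9],
    "r_s_anion":  [0.4, 0.9, 0.5, 1.1],
}
targets = [0.11, 0.19, 0.33, 0.50]

# Grid search over candidate 1D formulas: keep the best-fitting feature.
best = min(features, key=lambda k: fit_1d(features[k], targets)[2])
print(best)
```

In the paper's setting the candidate pool is far richer (combinations of atomic properties rather than raw ones), but the selection principle, minimize fitting error over a discrete grid of formulas, is the same.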
ML4Chem: A Machine Learning Package for Chemistry and Materials Science
ML4Chem is an open-source machine learning library for chemistry and materials science. It provides an extendable platform to develop and deploy machine learning models and pipelines and is targeted at both non-expert and expert users. ML4Chem follows user-experience design principles and offers the tools needed to go from data preparation to inference. Here we introduce its atomistic module for the implementation, deployment, and reproducibility of atom-centered models. This module is composed of six core building blocks: data, featurization, models, model optimization, inference, and visualization. We present their functionality and ease of use with demonstrations utilizing neural networks and kernel ridge regression algorithms.
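Kernel ridge regression, one of the two model families the abstract mentions, reduces to solving a small linear system. A minimal sketch, not ML4Chem's API, using the Laplacian kernel with the Manhattan norm (the same kernel choice as in the formation-energy study above) on two toy training points:

```python
import math

def laplacian_kernel(x, y, gamma=0.5):
    """Laplacian kernel with Manhattan (L1) norm."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))

# Two toy training descriptors and targets (hypothetical values).
X = [[0.0, 1.0], [2.0, 3.0]]
y = [1.0, -1.0]
lam = 1e-3  # ridge regularization strength

# Solve (K + lam*I) alpha = y; for the 2x2 case, Cramer's rule suffices.
K = [[laplacian_kernel(a, b) for b in X] for a in X]
a11, a12 = K[0][0] + lam, K[0][1]
a21, a22 = K[1][0], K[1][1] + lam
det = a11 * a22 - a12 * a21
alpha = [(y[0] * a22 - a12 * y[1]) / det,
         (a11 * y[1] - a21 * y[0]) / det]

def predict(x):
    """KRR prediction: weighted sum of kernel similarities to training data."""
    return sum(a * laplacian_kernel(x, xi) for a, xi in zip(alpha, X))

print(predict([0.0, 1.0]))  # close to +1, the first training target
```

Real pipelines solve the same system with a Cholesky factorization over thousands of points; the small regularizer lam keeps the system well conditioned.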
Annual Review of Materials Research
Machine learning, applied to chemical and materials data, is transforming the field of materials discovery and design, yet significant work is still required to fully take advantage of machine learning algorithms, tools, and methods. Here, we review the accomplishments to date of the community and assess the maturity of state-of-the-art, data-intensive research activities that combine perspectives from materials science and chemistry. We focus on three major themes—learning to see, learning to estimate, and learning to search materials—to show how advanced computational learning technologies are rapidly and successfully used to solve materials and chemistry problems. Additionally, we discuss a clear path toward a future where data-driven approaches to materials discovery and design are standard practice.
Opportunities and Challenges for Machine Learning in Materials Science
Annual Review of Materials Research, 2020
Advances in machine learning have impacted myriad areas of materials science, such as the discovery of novel materials and the improvement of molecular simulations, with likely many more important developments to come. Given the rapid changes in this field, it is challenging to understand both the breadth of opportunities and the best practices for their use. In this review, we address aspects of both problems by providing an overview of the areas in which machine learning has recently had significant impact in materials science, and then we provide a more detailed discussion on determining the accuracy and domain of applicability of some common types of machine learning models. Finally, we discuss some opportunities and challenges for the materials community to fully utilize the capabilities of machine learning.
The Journal of Chemical Physics
One endeavor of modern physical chemistry is to use bottom-up approaches to design materials and drugs with desired properties. Here, we introduce an atomistic structure learning algorithm (ASLA) that utilizes a convolutional neural network to build 2D structures and planar compounds atom by atom. The algorithm takes no prior data or knowledge on atomic interactions but queries a first-principles quantum mechanical program for thermodynamic stability. Using reinforcement learning, the algorithm accumulates knowledge of chemical compound space for a given number and type of atoms and stores this in the neural network, ultimately learning the blueprint for the optimal structural arrangement of the atoms. ASLA is demonstrated to work on diverse problems, including grain boundaries in graphene sheets, organic compound formation, and a surface oxide structure.
Chemical Bond-Based Representation of Materials
arXiv (Cornell University), 2017
This paper introduces a new representation method that is mainly based on chemical bonds among atoms in materials. Each chemical bond and its surrounding atoms are considered as a unified unit, or a local structure, that is expected to reflect a part of the material's nature. First, a material is separated into local structures, which are then represented as matrices, each computed using information about the corresponding chemical bond as well as the orbital-field matrices of the two related atoms. After that, all local structures of the material are aggregated from a statistical point of view. In the experiment, the new method was applied to a materials informatics application that aims at predicting atomization energies using the QM7 data set. The results of the experiment show that the new method is more effective than two state-of-the-art representation methods in most cases. Recently, the development of materials informatics [1, 23], known as a combination of materials science and data science, has opened up a new opportunity for accelerating the discovery of new materials knowledge. In the literature, data science [6] is a field of study that employs a wide range of data-driven techniques from a large number of research fields, such as applied mathematics, statistics, computational science, information science, and computer science, in order to understand and analyze data. In materials informatics, data-driven techniques are applied to existing materials data for the purpose of automatically discovering new materials knowledge such as hidden features, hidden chemical and new physical rules, and new patterns [10, 11, 25]. Remarkably, materials informatics is expected not only to provide foundations for a new paradigm of materials discovery [22], but also to be the next generation of exploring new materials [27].
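The decompose-featurize-aggregate pattern the abstract describes can be sketched as follows. This is a hedged toy version, not the paper's exact construction: the cutoff, geometry, and the scalar per-bond feature (Z_i * Z_j / d instead of the paper's orbital-field matrices) are all illustrative assumptions:

```python
import math

# Toy structure: element symbols with 3D positions (hypothetical geometry).
atoms = [("C", (0.0, 0.0, 0.0)), ("H", (1.1, 0.0, 0.0)),
         ("H", (-1.1, 0.0, 0.0)), ("O", (0.0, 0.0, 1.4))]
charges = {"H": 1, "C": 6, "O": 8}
CUTOFF = 1.6  # assumed bond-length cutoff in angstrom

def bond_features(atoms):
    """Split the structure into bond-centered local units and featurize each."""
    feats = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (si, ri), (sj, rj) = atoms[i], atoms[j]
            d = math.dist(ri, rj)
            if d <= CUTOFF:  # treat the close pair as a chemical bond
                # One toy scalar per local structure: Z_i * Z_j / d.
                feats.append(charges[si] * charges[sj] / d)
    return feats

f = bond_features(atoms)
# Aggregate the variable-length set of local structures statistically
# into a fixed-length vector usable by any ML model.
vector = [min(f), max(f), sum(f) / len(f)]
```

The statistical aggregation step is what turns a variable number of bonds per material into a fixed-size input, the same role pooling plays in the paper's scheme.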
DScribe: Library of descriptors for machine learning in materials science
Computer Physics Communications
DScribe is a software package for machine learning that provides popular feature transformations ("descriptors") for atomistic materials simulations. DScribe accelerates the application of machine learning for atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations for Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Function (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated for two different applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0.
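To make one of the listed descriptors concrete, here is a standalone sketch of a radial atom-centered symmetry function (the G2 form of ACSF), written in plain Python rather than against DScribe's API; the geometry and parameter values are toy assumptions:

```python
import math

def fc(r, r_cut):
    """Smooth cosine cutoff function used by atom-centered symmetry functions:
    0.5*(cos(pi*r/r_cut) + 1) inside the cutoff, 0 outside."""
    if r > r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)

def g2(center, neighbors, eta=1.0, r_s=0.0, r_cut=6.0):
    """Radial ACSF: G2 = sum_j exp(-eta*(r_ij - r_s)**2) * fc(r_ij)."""
    total = 0.0
    for pos in neighbors:
        r = math.dist(center, pos)
        total += math.exp(-eta * (r - r_s) ** 2) * fc(r, r_cut)
    return total

# Toy environment: one central atom with two neighbors at 1.0 and 2.0 angstrom.
value = g2((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
```

Libraries like DScribe evaluate many such functions with different (eta, r_s) pairs and per-species channels, concatenating them into the fixed-length feature vector a model consumes.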