Structure-based classification and ontology in chemistry

Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification

Journal of Cheminformatics

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial...

Ontologies in Medicinal Chemistry: Current Status and Future Challenges

Current Topics in Medicinal Chemistry, 2013

Recent years have seen a dramatic increase in the amount and availability of data in the diverse areas of medicinal chemistry, making it possible to achieve significant advances in fields such as the design, synthesis and biological evaluation of compounds. However, with this data explosion, the storage, management and analysis of available data to extract relevant information has become an even more complex task that poses challenging research problems for Artificial Intelligence (AI) scientists. Ontologies have emerged in AI as a key tool to formally represent and semantically organize aspects of the real world. Beyond glossaries or thesauri, ontologies facilitate communication between experts and allow the application of computational techniques to extract useful information from available data. In medicinal chemistry, multiple ontologies have been developed in recent years that contain knowledge about chemical compounds and the processes used to synthesize pharmaceutical products. This article reviews the principal standards and ontologies in medicinal chemistry, analyzes their main applications and suggests future directions.

ChEBI: a database and ontology for chemical entities of biological interest

Nucleic Acids Research, 2008

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. In addition to molecular entities, ChEBI contains groups (parts of molecular entities) and classes of entities. ChEBI includes an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. ChEBI is available online at http://www.ebi.ac.uk/chebi/
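
The parent/child relationships described in this abstract can be illustrated with a minimal sketch. The hierarchy below is a hypothetical toy example written for illustration only: the class names, the `IS_A` mapping and the helper functions `ancestors` and `children` are assumptions, not ChEBI's actual identifiers, data or API.

```python
from collections import deque

# Hypothetical toy hierarchy (not real ChEBI data):
# each entry maps a term to its direct parents via "is_a" links.
IS_A = {
    "ethanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic compound"],
    "organic compound": [],
}


def ancestors(term, is_a):
    """Return every class reachable from `term` by following is_a links upward."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def children(term, is_a):
    """Return the direct children of `term`, sorted by name."""
    return sorted(child for child, parents in is_a.items() if term in parents)


if __name__ == "__main__":
    print(sorted(ancestors("ethanol", IS_A)))
    print(children("alcohol", IS_A))
```

Storing only direct parents and computing ancestors on demand keeps the toy data small. A real ontology may give a term several parents, which the breadth-first traversal above also handles.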

Semantic access to chemistry data with the ChEBI ontology and web services

The Chemical Entities of Biological Interest (ChEBI) ontology is an ontology of chemical entities and their roles, being developed at the European Bioinformatics Institute (EBI). Recent developments include a submission tool for direct user submissions and enhancements to the search facilities available by web services.

ChemFOnt: the chemical functional ontology resource

Nucleic Acids Research

The Chemical Functional Ontology (ChemFOnt), located at https://www.chemfont.ca, is a hierarchical, OWL-compatible ontology describing the functions and actions of >341 000 biologically important chemicals. These include primary metabolites, secondary metabolites, natural products, food chemicals, synthetic food additives, drugs, herbicides, pesticides and environmental chemicals. ChemFOnt is a FAIR-compliant resource intended to bring the same rigor, standardization and formal structure to the terms and terminology used in biochemistry, food chemistry and environmental chemistry as the Gene Ontology (GO) has brought to molecular biology. ChemFOnt is available as both a freely accessible, web-enabled database and a downloadable Web Ontology Language (OWL) file. Users may download and deploy ChemFOnt within their own chemical databases or integrate ChemFOnt into their own analytical software to generate machine-readable relationships that can be used to make new inferences, enrich...

Investigation of classification methods for the prediction of activity in diverse chemical libraries

Journal of Computer-Aided Molecular Design, 1999

Classification methods based on linear discriminant analysis, recursive partitioning, and hierarchical agglomerative clustering are examined for their ability to separate active and inactive compounds in a diverse chemical database. Topology-based descriptions of chemical structure from the Molconn-X and ISIS programs are used in conjunction with these classification techniques to identify ACE inhibitors, beta-adrenergic antagonists, and H2 receptor antagonists. Overall, discriminant analysis misclassifies the smallest number of active compounds, while recursive partitioning yields the lowest rate of misclassification among inactives. Binary structural keys from the ISIS package are found to generally outperform the whole-molecule Molconn-X descriptors, especially for identification of inactive compounds. For all targets and classification methods, sensitivity toward active compounds is increased by making repetitive classification using training sets that contain equal numbers of a...

Chemical vocabularies and ontologies for bioinformatics

Proceedings of the 2003 International Chemical Information Conference, 2003

The diversity of objects and concepts in biological chemistry is reflected in the number of ways used to describe an ‘elementary’ biochemical event such as an enzymatic reaction. The terminology used in publications or biological databases is often a mixture of terms borrowed from widely different or even contradictory classifications. The ever-growing knowledge cannot be processed meaningfully (e.g. efficiently and correctly referenced in biological databases) without organisation, from controlled vocabularies to dictionaries and thesauri to taxonomies and formal ontologies. An ontology of a domain of knowledge is a controlled vocabulary of terms with defined logical relationships to each other. The unique types of relationships between terms have to be included in biochemical ontologies. The relevance of chemical thesauri and ontologies to bioinformatics is illustrated by current resources and projects at the European Bioinformatics Institute, such as IntEnz (Enzyme Nomenclature), COMe (the bioinorganic motif database) and the IUPHAR Receptor Database.

Chemical Informatics and the Drug Discovery Knowledge Pyramid

Bentham Science Publishers

The magnitude of the challenges in preclinical drug discovery is evident in the large amount of capital invested in such efforts in pursuit of a small, static number of eventually successful marketable therapeutics. An explosion in the availability of potentially drug-like compounds and chemical biology data on these molecules can provide us with the means to improve the eventual success rates for compounds being considered at the preclinical level, but only if the community is able to access available information in an efficient and meaningful way. Thus, chemical database resources are critical to any serious drug discovery effort. This paper explores the basic principles underlying the development and implementation of chemical databases, and examines key issues of how molecular information may be encoded within these databases so as to enhance the likelihood that users will be able to extract meaningful information from data queries. In addition to a broad survey of conventional data representation and query strategies, key enabling technologies such as new context-sensitive chemical similarity measures and chemical cartridges are examined, with recommendations on how such resources may be integrated into a practical database environment.

An Upper-Level Ontology for Chemistry

2008

Chemical entities are the foundation of biochemistry and biology, but until now there have been few coherent attempts to produce a top-level ontology for chemistry to connect ontological descriptions of reality at the molecular level, such as ChEBI, with upper-level ontologies such as BFO, or indeed familiar laboratory-scale concepts such as mixtures. We work out relationships between chemical types that are compatible with the OBO Relation Ontology, describe macroscopic chemical systems in terms of grains and collectives, and propose a top-level ontology for chemically-relevant continuants, which we discuss in relation to BFO and BioTop.

Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology

BMC Genomics, 2013

Background: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. Results: We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI.