The Scaffold Tree − Visualization of the Scaffold Universe by Hierarchical Scaffold Classification
Related papers
Scaffold Topologies. 2. Analysis of Chemical Databases
Journal of Chemical Information and Modeling, 2008
We have systematically enumerated graph representations of scaffold topologies for up to 8-ring molecules and 4-valence atoms, thus providing coverage of the lower portion of the chemical space of small molecules (Pollock et al.¹). Here, we examine scaffold topology distributions for several databases: ChemNavigator and PubChem for commercially available chemicals, the Dictionary of Natural Products, a set of 2,742 launched drugs, WOMBAT, a database of medicinal chemistry compounds, and two subsets of PubChem, "actives" and DSSTox comprising toxic substances. We also examined a virtual database of exhaustively enumerated small organic molecules, GDB,² and contrast the scaffold topology distribution from these collections to the complete coverage of up to 8-ring molecules. For reasons related, perhaps, to synthetic accessibility and complexity, scaffolds exhibiting 6 rings or more are poorly represented. Among all collections examined, PubChem has the greatest scaffold topological diversity, whereas GDB is the most limited. More than 50% of all entries (13,000,000+ actual and 13,000,000+ virtual compounds) exhibit only 8 distinct topologies, one of which is the non-scaffold topology that represents all treelike structures. However, most of the topologies are represented by a single or very small number of examples. Within topologies, we found that 3-way scaffold connections (3-nodes) are much more frequent than 4-way (4-node) connections. Fused rings have a slightly higher frequency in biologically oriented databases. Scaffold topologies can be the first step toward an efficient coarse-grained classification scheme of the molecules found in chemical databases.
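The quantities this abstract reports (ring count, 3-nodes, 4-nodes, and the treelike "non-scaffold" case) can be illustrated with a minimal Python sketch. It is not the enumeration method of the paper: it only assumes a connected molecule given as a list of atom-index pairs, strips acyclic side chains, and counts branch points. The function name `topology_summary` and the example bond lists are hypothetical.

```python
def topology_summary(bonds):
    """Return (ring_count, three_nodes, four_nodes) for one connected molecule.

    Assumption: `bonds` is a list of (atom, atom) index pairs describing a
    connected simple graph; bond orders and element types are ignored.
    """
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Iteratively strip terminal atoms so that only ring systems and the
    # linkers between them remain.
    pending = [n for n, nbrs in adj.items() if len(nbrs) <= 1]
    while pending:
        n = pending.pop()
        if n not in adj:
            continue  # already removed via an earlier queue entry
        for m in adj.pop(n):
            adj[m].discard(n)
            if len(adj[m]) <= 1:
                pending.append(m)

    if not adj:
        return (0, 0, 0)  # treelike structure: no scaffold remains

    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    rings = n_edges - len(adj) + 1  # cyclomatic number of a connected graph
    three = sum(1 for nbrs in adj.values() if len(nbrs) == 3)
    four = sum(1 for nbrs in adj.values() if len(nbrs) == 4)
    return (rings, three, four)


# Hypothetical inputs: a six-membered ring with one substituent, and two
# fused six-membered rings sharing the 4-5 bond.
toluene_like = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
naphthalene_like = toluene_like[:6] + [(4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]

print(topology_summary(toluene_like))      # (1, 0, 0)
print(topology_summary(naphthalene_like))  # (2, 2, 0)
```

This summary is coarser than a scaffold topology: two rings joined by a single bond also give (2, 2, 0), so distinguishing them would require contracting the 2-valent atoms and comparing the resulting multigraphs.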
“Molecular Anatomy”: a new multi-dimensional hierarchical scaffold analysis tool
Journal of Cheminformatics
The scaffold representation is widely employed to classify bioactive compounds on the basis of common core structures or to correlate compound classes with specific biological activities. In this paper, we present a novel approach called “Molecular Anatomy” as a flexible and unbiased molecular scaffold-based metric to cluster large sets of compounds. We introduce a set of nine molecular representations at different abstraction levels, combined with fragmentation rules, to define a multi-dimensional network of hierarchically interconnected molecular frameworks. We demonstrate that the introduction of a flexible scaffold definition and multiple pruning rules is an effective method to identify relevant chemical moieties. This approach allows active molecules belonging to different molecular classes to be clustered together, capturing most of the structure-activity information, in particular when libraries containing a huge number of singletons are analyzed. We also propose a procedure to deriv...
2022
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of molecular scaffolds, scaffold trees and networks. The new library is based on the Chemistry Development Kit (CDK) and highly customisable through multiple settings, e.g. five different structural framework definitions are available. For display of scaffold hierarchies, the open GraphStream Java library is utilised. Performance snapshots with natural products (NP) from the COCONUT database and drug molecules from DrugBank are reported. The generation of a scaffold network from more than 450,000 NP can be achieved within a single day.
Corrigendum: Interactive exploration of chemical space with Scaffold Hunter
Nature Chemical Biology, 2009
We describe Scaffold Hunter, a highly interactive computer-based tool for navigation in chemical space that fosters intuitive recognition of complex structural relationships associated with bioactivity. The program reads compound structures and bioactivity data, generates compound scaffolds, correlates them in a hierarchical tree-like arrangement, and annotates them with bioactivity. Brachiation along tree branches from structurally complex to simple scaffolds allows identification of new ligand types. We provide proof of concept for pyruvate kinase.
Indirect Similarity Based Methods for Effective Scaffold-Hopping in Chemical Compounds
Journal of Chemical Information and Modeling, 2008
Methods that can screen large databases to retrieve a structurally diverse set of compounds with desirable bioactivity properties are critical in the drug discovery and development process. This paper presents a set of such methods that are designed to find compounds that are structurally different to a certain query compound while retaining its bioactivity properties (scaffold hops). These methods utilize various indirect ways of measuring the similarity between the query and a compound that take into account additional information beyond their structure-based similarities. The set of techniques that are presented capture these indirect similarities using approaches based on analyzing the similarity network formed by the query and the database compounds. Experimental evaluation shows that most of these methods substantially outperform previously developed approaches both in terms of their ability to identify structurally diverse active compounds as well as active compounds in general.
Structure-based classification and ontology in chemistry
Journal of cheminformatics, 2012
Background: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies.
PKOM: A tool for clustering, analysis and comparison of big chemical collections
Digital Signal Processing, 2016
We describe the algorithm underlying PKOM, a methodology for clustering, analysis and visualization of multi-dimensional data onto a two-dimensional map. PKOM is based on the mixture of two very popular methods that have been widely used by the pharmaceutical industry for the clustering of genomic or SAR (Structure Activity Relationship) chemical information. The first method at the origin of PKOM is SOM (Self-Organizing Maps), a clustering technique based on neural networks. The second method is TREE MAPS, a visualization method based on hierarchical clustering by dendrograms. We initially describe herein the two methods and the reasons why we have taken the best of both to merge them into PKOM. We then describe in detail the PKOM algorithm and its advantages compared to the two former. Examples are given on how to apply this kind of 2-D topological clustering technique to the organization of big pharmaceutical collections in practical cases.
Nucleic Acids Research
Similarity-based clustering and classification of compounds enable the search of drug leads and the structural and chemogenomic studies for facilitating chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for facilitating these tasks. The CFam Chemical Family database (http://bidd2.cse.nus.edu.sg/cfam) was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds based on the literature-reported high, intermediate and remote similarity measures. The compounds were represented by molecular fingerprints and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were from hierarchically clustered drugs, bioactive molecules, human metabolites,...
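The fingerprint comparison mentioned in this abstract can be illustrated with a minimal Python sketch of the Tanimoto coefficient, T(A, B) = |A ∩ B| / |A ∪ B|. It assumes each fingerprint is given as the set of its "on" bit positions. The function name `tanimoto` and the example bit sets are hypothetical, and the abstract does not state which fingerprint type or similarity thresholds CFam uses, so none are assumed here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as having zero
        # similarity; other conventions exist.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical on-bit sets for two compounds.
fp_query = {1, 2, 3, 4}
fp_candidate = {3, 4, 5, 6}

print(tanimoto(fp_query, fp_candidate))  # 2 shared bits / 6 distinct bits
print(tanimoto(fp_query, fp_query))      # 1.0
```

Representing fingerprints as sets keeps the sketch independent of any particular cheminformatics toolkit; a production pipeline would more likely use packed bit vectors.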