User Assisted Substructure Extraction in Molecular Data Mining

Molecular Substructure Mining Approaches for Computer-Aided Drug Discovery: A Review

Proceedings of ITAB, 2006

Substructure mining is a well-established technique used frequently in drug discovery. Its aim is to discover and characterize interesting 2D substructures present in chemical datasets. The popularity of the approach owes a lot to the success of the structure-activity relationship practice, which states that biological properties of molecules are a result of molecular structure, and to expert medicinal chemists who tend to view, organize and treat chemical compounds as a collection of their substructural parts. Several substructure mining algorithms have been developed over the years to accommodate the needs of an ever-changing drug discovery process. This paper reviews the most important of these algorithms and highlights some of their applications. Emphasis is placed on recent developments in the field.

Molecular Fragment Mining for Drug Discovery

Lecture Notes in Computer Science, 2005

The main task of drug discovery is to find novel bioactive molecules, i.e., chemical compounds that, for example, protect human cells against a virus. One way to support solving this task is to analyze a database of known and tested molecules in order to find structural properties of molecules that determine whether a molecule will be active or inactive, so that future chemical tests can be focused on the most promising candidates. A promising approach to this task was presented in [2]: an algorithm for finding molecular fragments that discriminate between active and inactive molecules. In this paper we review this approach as well as two extensions: a special treatment of rings and a method to find fragments with wildcards based on chemical expert knowledge.

Mining molecular fragments: Finding relevant substructures of molecules

2002

We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute's HIV-screening dataset.

Interactive Data Mining for Molecular Graphs

Journal of Automated Methods & Management in Chemistry, 2009

Designing new medical drugs for a specific disease requires extensive analysis of many molecules that have an activity for the disease. The main goal of these extensive analyses is to discover substructures (fragments) that account for the activity of these molecules. Once they are discovered, these fragments are used to understand the structure of new drugs and design new medicines for the disease. In this paper, we propose an interactive approach for visual molecule mining to discover fragments of molecules that are responsible for the desired activity with respect to a specific disease. Our approach visualizes molecular data in a form that can be interpreted by a human expert. Using a pipelining structure, it enables experts to contribute to the solution with their expertise at different levels. In order to derive desired fragments, it combines histogram-based filtering and clustering methods in a novel way. This combination enables a flexible determination of frequent fragments that repeat in molecules exactly or with some variations.

MoSS: A Program for Molecular Substructure Mining

Molecular substructure mining is currently an intensively studied research area. In this paper we present an implementation of an algorithm for finding frequent substructures in a set of molecules, which may also be used to find substructures that discriminate well between a focus and a complement group. In addition to the basic algorithm, we discuss advanced pruning techniques, demonstrating their effectiveness with experiments on two publicly available molecular data sets, and briefly mention some other extensions.
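The focus/complement idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the MoSS implementation: molecules are reduced to sets of fragment labels (a hypothetical simplification that skips the actual subgraph search), and the names `support`, `discriminative_fragments`, `min_focus` and `max_complement` are assumptions chosen for illustration. A fragment is kept when it is frequent in the focus group and rare in the complement group.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical simplification: a molecule is the set of fragment labels it contains.
Molecule = FrozenSet[str]


def support(fragment: str, molecules: List[Molecule]) -> float:
    """Fraction of molecules that contain the fragment (0.0 for an empty group)."""
    if not molecules:
        return 0.0
    return sum(fragment in m for m in molecules) / len(molecules)


def discriminative_fragments(
    focus: List[Molecule],
    complement: List[Molecule],
    min_focus: float = 0.5,       # assumed threshold: minimum support in the focus group
    max_complement: float = 0.2,  # assumed threshold: maximum support in the complement
) -> Dict[str, Tuple[float, float]]:
    """Return fragments that are frequent in the focus and rare in the complement."""
    candidates = set().union(*focus) if focus else set()
    result = {}
    for frag in sorted(candidates):
        s_focus = support(frag, focus)
        s_comp = support(frag, complement)
        if s_focus >= min_focus and s_comp <= max_complement:
            result[frag] = (s_focus, s_comp)
    return result


# Toy data, invented for illustration only.
focus = [
    frozenset({"C-N", "C=O"}),
    frozenset({"C-N", "C-C"}),
    frozenset({"C-N"}),
]
complement = [
    frozenset({"C-C"}),
    frozenset({"C=O", "C-C"}),
    frozenset({"C-C"}),
    frozenset({"C-C"}),
]

print(discriminative_fragments(focus, complement))
# {'C-N': (1.0, 0.0)}
```

In a real miner the candidate fragments would be grown by subgraph extension rather than read off a precomputed set; only the two-threshold filter is shown here.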

Mining statistically significant molecular substructures for efficient molecular classification

Journal of chemical information and …, 2009

The increased availability of large repositories of chemical compounds has created new challenges in designing efficient molecular querying and mining systems. Molecular classification is an important problem in drug development where libraries of chemical compounds are screened and molecules with the highest probability of success against a given target are selected. We have developed a technique called GraphSig to mine significantly over-represented molecular substructures in a given class of molecules. GraphSig successfully overcomes the scalability bottleneck of mining patterns at a low frequency. Patterns mined by GraphSig display correlation with biological activities and serve as an excellent platform on which to build molecular analysis tools. The potential of GraphSig as a chemical descriptor is explored, and support vector machines are used to classify molecules described by patterns mined using GraphSig. Furthermore, the over-represented patterns are more informative than features generated exhaustively by traditional fingerprints; this has potential in providing scaffolds and lead generation. Extensive experiments are carried out to evaluate the proposed techniques, and empirical results show promising performance in terms of classification quality. An implementation of the algorithm is available free for academic use at
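To make "significantly over-represented" concrete, the sketch below computes a generic hypergeometric tail probability for a substructure within one class. This is a standard over-representation test swapped in for illustration, not the procedure GraphSig uses; the function name and parameters are assumptions.

```python
from math import comb


def overrepresentation_p_value(total: int, in_class: int, with_pattern: int,
                               in_class_with_pattern: int) -> float:
    """Hypergeometric tail: probability of seeing at least
    `in_class_with_pattern` class members among the `with_pattern` molecules
    that contain the pattern, if the pattern were unrelated to the class."""
    denom = comb(total, with_pattern)
    upper = min(with_pattern, in_class)
    numer = sum(
        comb(in_class, i) * comb(total - in_class, with_pattern - i)
        for i in range(in_class_with_pattern, upper + 1)
    )
    return numer / denom


# Toy numbers, invented for illustration: 10 molecules, 5 of them active,
# 3 contain the pattern, and all 3 of those are active.
p = overrepresentation_p_value(10, 5, 3, 3)
print(round(p, 4))
# 0.0833
```

A small p-value suggests that the pattern occurs in the class more often than chance would explain. When many patterns are tested, a multiple-testing correction would normally be applied on top.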

Automated Discovery of Active Motifs In Three Dimensional Molecules

Proceedings of the 3rd …, 1997

In this paper we present a method for discovering approximately common motifs (also known as active motifs) in three dimensional (3D) molecules. Each node in a molecule is represented by a 3D point in the Euclidean space and each edge is represented by an undirected line segment connecting two nodes in the molecule. Motifs are rigid substructures which may occur in a molecule after allowing for an arbitrary number of rotations and translations as well as a small number (specified by the user) of node insert/delete operations in the motifs or the molecule. (We call this "approximate occurrence.") The proposed method combines the geometric hashing technique and block detection algorithms for undirected graphs. To demonstrate the utility of our algorithms, we discuss their applications to classifying three families of molecules pertaining to antibacterial sulfa drugs, anti-anxiety agents (benzodiazepines) and antiadrenergic agents (β receptors). Experimental results indicate the good performance of our algorithms and the high quality of the discovered motifs.

A Family of Ring System-Based Structural Fragments for Use in Structure−Activity Studies: Database Mining and Recursive Partitioning

Journal of Chemical Information and Modeling, 2006

In earlier work from our laboratory, we have described the use of the ring system and ring scaffold as descriptors. We showed that these descriptors could be used for fast compound clustering, novelty determination, compound acquisition, and combinatorial library design. Here we extend the concept to a whole family of structural descriptors with the ring system as the centerpiece. We show how this simple idea can be used to build powerful search tools for mining chemical databases in useful ways. We have also built recursive partition trees using these fragments as descriptors. We will discuss how these trees can help in analyzing complex structure-activity data.

Graph mining: procedure, application to drug discovery and recent advances

Drug Discovery Today, 2013

Combinatorial chemistry has generated chemical libraries and databases with a huge number of chemical compounds, which include prospective drugs. Chemical structures of compounds can be represented as molecular graphs, to which a variety of graph-based techniques in computer science, specifically graph mining, can be applied. The most basic way of analyzing molecular graphs is to use structural fragments, called subgraphs in graph theory. The mainstream technique in graph mining is frequent subgraph mining, by which we can retrieve essential subgraphs from given molecular graphs. In this article we explain the idea and procedure of mining frequent subgraphs from given molecular graphs, present some real applications, and describe recent advances in graph mining.
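The first level of frequent subgraph mining, counting single labeled bonds, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the article: molecules are given as lists of labeled edges, and the names `canonical`, `frequent_edges` and `min_support` are hypothetical. Full miners extend the frequent edges found here into larger subgraphs and repeat the support count.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: an edge is (atom label, bond label, atom label).
Edge = Tuple[str, str, str]


def canonical(edge: Edge) -> Edge:
    """Order the two atom labels so that C-O and O-C count as the same edge."""
    a, bond, b = edge
    return (a, bond, b) if a <= b else (b, bond, a)


def frequent_edges(molecules: List[List[Edge]], min_support: int) -> Dict[Edge, int]:
    """Return labeled edges that occur in at least `min_support` molecules.

    Support counts molecules, not occurrences: an edge that appears several
    times in one molecule still contributes 1 for that molecule.
    """
    counts: Counter = Counter()
    for edges in molecules:
        counts.update({canonical(e) for e in edges})
    return {edge: n for edge, n in counts.items() if n >= min_support}


# Toy molecular graphs, invented for illustration only.
molecules = [
    [("C", "-", "O"), ("C", "-", "C")],
    [("O", "-", "C"), ("C", "=", "O")],
    [("C", "-", "C"), ("C", "-", "N")],
]

print(frequent_edges(molecules, min_support=2))
# {('C', '-', 'O'): 2, ('C', '-', 'C'): 2}
```

Because any subgraph containing an infrequent edge is itself infrequent, this first pass already prunes most of the search space before larger candidates are generated.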

Finding discriminative molecular fragments

2003

The main task of drug discovery is to find novel bioactive molecules, i.e., chemical compounds that, for example, protect human cells against a virus. One way to support solving this task is to analyze a database of known and tested molecules with the aim of building a classifier that predicts whether a novel molecule will be active or inactive, so that future chemical tests can be focused on the most promising candidates. In [1] an algorithm for constructing such a classifier was proposed that uses molecular fragments to discriminate between active and inactive molecules. In this paper we review this approach and present two extensions: a special treatment of rings and a method that finds fragments with wildcards based on chemical expert knowledge.