MoSS: A Program for Molecular Substructure Mining
Related papers
Molecular Substructure Mining Approaches for Computer-Aided Drug Discovery: A Review
Proceedings of ITAB, 2006
Substructure mining is a well-established technique used frequently in drug discovery. Its aim is to discover and characterize interesting 2D substructures present in chemical datasets. The popularity of the approach owes much to the success of the structure-activity relationship practice, which states that the biological properties of molecules are a result of molecular structure, and to expert medicinal chemists, who tend to view, organize and treat chemical compounds as collections of their substructural parts. Several substructure mining algorithms have been developed over the years to accommodate the needs of an ever-changing drug discovery process. This paper reviews the most important of these algorithms and highlights some of their applications. Emphasis is placed on recent developments in the field.
Mining molecular fragments: Finding relevant substructures of molecules
2002
We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds. This results in a substantially faster search, because it eliminates the need for frequent, computationally expensive re-embeddings and suppresses redundant search. We demonstrate the usefulness of our algorithm by discovering activity-related groups of chemical compounds in the well-known National Cancer Institute's HIV-screening dataset.
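The central idea, keeping every embedding of a fragment and extending those embeddings directly instead of re-searching each molecule, can be illustrated with a heavily simplified sketch. It is not the authors' algorithm: it assumes linear (path) fragments only, ignores bond labels, and omits the local atom/bond ordering and the pruning. The functions `grow_paths` and `discriminative`, and the "minimum support in actives, maximum support in inactives" criterion, are hypothetical choices made for illustration.

```python
from collections import defaultdict

# A molecule is (atom_labels, adjacency): adjacency[i] lists the atoms bonded to atom i.
# Simplification (assumption): fragments are simple paths and bond labels are ignored.


def canonical(labels):
    """Treat a path and its reverse as the same fragment."""
    return min(labels, labels[::-1])


def grow_paths(mols, max_atoms):
    """Map each path fragment (up to max_atoms atoms) to {mol_id: [embeddings]}.

    An embedding is a tuple of atom indices. Longer fragments are produced by
    extending existing embeddings by one neighbouring atom, never by
    re-searching the molecules from scratch.
    """
    level = defaultdict(lambda: defaultdict(list))
    for mid, (labels, _) in enumerate(mols):
        for atom, lab in enumerate(labels):
            level[(lab,)][mid].append((atom,))
    found = dict(level)
    for _ in range(max_atoms - 1):
        nxt = defaultdict(lambda: defaultdict(list))
        for emb_by_mol in level.values():
            for mid, embs in emb_by_mol.items():
                labels, adj = mols[mid]
                for emb in embs:
                    for nb in adj[emb[-1]]:
                        if nb not in emb:  # keep paths simple (no repeated atoms)
                            new = emb + (nb,)
                            key = canonical(tuple(labels[i] for i in new))
                            nxt[key][mid].append(new)
        found.update(nxt)
        level = nxt
    return found


def discriminative(found, active, inactive, min_active, max_inactive):
    """Fragments frequent in the active set but rare in the inactive set."""
    result = []
    for frag, emb_by_mol in found.items():
        hits = set(emb_by_mol)  # molecules with at least one embedding
        if (len(hits & active) / len(active) >= min_active
                and len(hits & inactive) / len(inactive) <= max_inactive):
            result.append(frag)
    return result
```

For example, with two "active" molecules C-N-O and C-N and one "inactive" molecule C-O, `discriminative(grow_paths(mols, 2), {0, 1}, {2}, 1.0, 0.0)` keeps only the fragments `("N",)` and `("C", "N")`. Because the embeddings are stored, support is read off the embedding lists rather than recomputed by subgraph-isomorphism tests.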
User Assisted Substructure Extraction in Molecular Data Mining
Lecture Notes in Computer Science, 2008
In molecular fragment mining, scientists use both manual techniques and purely computer-based methods. In this paper, we propose a novel molecular fragment mining approach that incorporates interactive user assistance to speed up traditional fragment mining processes and increase their success rates. The proposed approach visualizes 3D molecular data in 2D form that can be easily interpreted by a human expert, who evaluates and filters the 2D molecular images manually. The approach differs from others in the literature in that it does not search for substructures containing specific atoms, as graph mining methods do. Instead, the user-assisted approach graphically highlights significant substructures with specific properties and topologies. Initial experiments indicate that with the user-assisted approach, active and inactive fragments of compounds are quickly identified for drug design with high success rates.
Subgraph Relative Frequency Approach for Extracting Interesting Substructures from Molecular Data
IAEME PUBLICATION, 2013
An unseen molecule in molecular data is classified on the basis of its substructures. Mining interesting substructures for classification yields subgraphs that are characteristic of different classes. In this paper, the authors propose a Subgraph Relative Frequency (SRF) method that screens each frequent subgraph to determine whether a frequently occurring substructure is interesting. SRF thus discovers interesting subgraphs for each class, using relative frequencies. To classify an unknown molecule, SRF first finds the subgraphs of the molecule and then calculates the interestingness of each subgraph for each class, based on its weight. The performance of SRF is compared against MISMOC, and SRF is found to be just as accurate. The MISMOC approach requires probability calculations to find absolute frequencies, which increases its complexity; the proposed method reduces this complexity by calculating only relative frequencies to determine interestingness. The method was tested on a small predefined molecular dataset and the results were analyzed; the performance of the proposed SRF approach was found to be satisfactory and efficient.
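The abstract does not give SRF's exact formulas, so the following is only a minimal sketch of classification by relative frequency under stated assumptions. It assumes subgraph extraction has already been done (each molecule is a set of hashable subgraph identifiers), defines the relative frequency of a subgraph in a class as the fraction of that class's molecules containing it, and uses a hypothetical weight: a class's share of the subgraph's summed relative frequencies. The functions `relative_frequencies` and `classify` are illustrative names.

```python
from collections import Counter, defaultdict


def relative_frequencies(train):
    """train: list of (class_label, set_of_subgraph_ids).

    Returns rf[subgraph][cls] = fraction of class-`cls` molecules containing the subgraph.
    """
    class_sizes = Counter(cls for cls, _ in train)
    counts = defaultdict(Counter)
    for cls, subgraphs in train:
        for s in subgraphs:
            counts[s][cls] += 1
    return {s: {c: cnt[c] / class_sizes[c] for c in class_sizes}
            for s, cnt in counts.items()}


def classify(rf, subgraphs):
    """Score each class by summing the per-class weights of the molecule's known subgraphs."""
    scores = Counter()
    for s in subgraphs:
        if s not in rf:
            continue  # subgraph never seen in training: no evidence for any class
        total = sum(rf[s].values())  # > 0, since s occurred at least once in training
        for c, f in rf[s].items():
            # Hypothetical weight: class c's share of s's total relative frequency.
            scores[c] += f / total
    return max(scores, key=scores.get) if scores else None
```

Relative frequencies (rather than raw counts) keep a large class from dominating simply because it has more molecules, which is the kind of simplification over probability-based scoring that the abstract describes.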
Mining statistically significant molecular substructures for efficient molecular classification
Journal of chemical information and …, 2009
The increased availability of large repositories of chemical compounds has created new challenges in designing efficient molecular querying and mining systems. Molecular classification is an important problem in drug development, where libraries of chemical compounds are screened and the molecules with the highest probability of success against a given target are selected. We have developed a technique called GraphSig to mine significantly over-represented molecular substructures in a given class of molecules. GraphSig successfully overcomes the scalability bottleneck of mining patterns at a low frequency. Patterns mined by GraphSig display correlation with biological activities and serve as an excellent platform on which to build molecular analysis tools. The potential of GraphSig as a chemical descriptor is explored, and support vector machines are used to classify molecules described by patterns mined using GraphSig. Furthermore, the over-represented patterns are more informative than features generated exhaustively by traditional fingerprints; this has potential for providing scaffolds and for lead generation. Extensive experiments are carried out to evaluate the proposed techniques, and empirical results show promising performance in terms of classification quality. An implementation of the algorithm is available free for academic use at
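GraphSig's own significance model is more involved than this; the sketch below only illustrates the general notion of a pattern being "significantly over-represented" in a class. It assumes a one-sided binomial test in which each class molecule independently contains the pattern with probability equal to the pattern's frequency across the whole dataset. The function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresentation_pvalue(hits_in_class, class_size, hits_overall, dataset_size):
    """One-sided p-value for seeing the pattern in at least `hits_in_class` of the
    `class_size` class molecules, assuming each contains it independently with the
    pattern's dataset-wide frequency (an assumed background model)."""
    background = hits_overall / dataset_size
    return binomial_tail(hits_in_class, class_size, background)
```

For instance, a pattern found in 8 of 10 active molecules but in only 20 of 100 molecules overall gets a p-value of about 7.8e-5, so it would count as over-represented at any conventional threshold. Low-frequency patterns can still be significant under such a test, which is why significance-driven mining has to cope with low support thresholds.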
Advanced pruning strategies to speed up mining closed molecular fragments
2004
In recent years several algorithms for mining frequent subgraphs in graph databases have been proposed, with a major application area being the discovery of frequent substructures of biomolecules. Unfortunately, most of these algorithms still struggle with fairly long execution times if larger substructures or molecular fragments are desired. In this paper we describe two advanced pruning strategies, equivalent sibling pruning and perfect extension pruning, that can be used to speed up the MoFa algorithm (introduced in [2]) in the search for closed molecular fragments, as we demonstrate with experiments on the NCI's HIV database.
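Both pruning strategies target closed fragments, i.e. fragments that have no proper super-fragment with the same support. As a minimal illustration of that definition only (not of the pruning strategies themselves), the sketch below assumes fragments are represented as sets of bond identifiers, so "super-fragment" reduces to "proper superset"; real molecular fragments would need subgraph-isomorphism tests instead. The function name `closed_fragments` is hypothetical.

```python
def closed_fragments(support):
    """support: {frozenset_of_bond_ids: support_count}.

    A fragment is closed if no proper super-fragment has the same support.
    Simplification (assumption): super-fragment == proper superset of bond ids.
    """
    return [
        frag
        for frag, sup in support.items()
        if not any(frag < other and sup == other_sup
                   for other, other_sup in support.items())
    ]
```

For example, if `{"C-N"}` and `{"C-N", "N-O"}` both have support 3, only the larger fragment is closed: every molecule containing C-N also contains N-O next to it, so reporting the smaller fragment adds no information.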
Transputer implementations of chemical substructure searching algorithms
1988
Two chemical substructure searching algorithms, the relaxation algorithm and the set reduction algorithm, are introduced and described. Transputer-based serial implementations of both are compared for performance; the relaxation algorithm is shown to be both more effective and more efficient. Strategies are discussed for multi-transputer implementations of the relaxation algorithm. Experimental results show that near-linear speedups are obtained with networks containing up to 21 transputers.
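The abstract does not spell out the relaxation algorithm, so the following is a generic sketch of neighbourhood-consistency relaxation as commonly used for substructure screening, not the paper's exact procedure. Each query atom starts with candidate target atoms of the same element and sufficient degree; a candidate is then discarded whenever some neighbour of the query atom has no candidate among that target atom's neighbours, repeating until nothing changes. The function names are hypothetical, and the molecule representation (atom labels plus adjacency lists) is an assumption.

```python
def relax(q_labels, q_adj, t_labels, t_adj):
    """Neighbourhood-consistency relaxation for substructure screening.

    Returns cand, where cand[q] is the set of target atoms that query atom q
    could still be mapped to. An empty set proves the query is not a substructure.
    """
    # Initial candidates: same element and at least as many bonded neighbours.
    cand = [
        {t for t in range(len(t_labels))
         if t_labels[t] == q_labels[q] and len(t_adj[t]) >= len(q_adj[q])}
        for q in range(len(q_labels))
    ]
    changed = True
    while changed:  # iterate until no candidate set shrinks any further
        changed = False
        for q in range(len(q_labels)):
            for t in list(cand[q]):
                t_nbrs = set(t_adj[t])
                # t survives only if every query neighbour of q has a candidate among t's neighbours.
                if any(not (cand[qn] & t_nbrs) for qn in q_adj[q]):
                    cand[q].discard(t)
                    changed = True
    return cand


def may_contain(q_labels, q_adj, t_labels, t_adj):
    """Necessary (not sufficient) condition for the query to be a substructure."""
    return all(relax(q_labels, q_adj, t_labels, t_adj))
```

Non-empty candidate sets do not by themselves prove a match; a final atom-by-atom (backtracking) mapping step over the surviving candidates is still needed. The relaxation step is valuable because it shrinks that backtracking search, often to a single candidate per query atom.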
High performance subgraph mining in molecular compounds
2005
Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem for discovering interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm: a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening dataset, where the approach attains close-to-linear speedup on a network of workstations.
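The general idea of receiver-initiated load balancing on an irregular search tree can be illustrated with a small single-process, synchronous simulation. This is not the paper's algorithm or its peer-to-peer framework: the random choice of peer and the "donor hands over half of its stack" split policy are assumptions, and `simulate` is a hypothetical name. Each worker explores its part of the tree depth-first; a worker that runs out of work asks a randomly chosen peer for some.

```python
import random


def simulate(children, root, n_workers, seed=0):
    """Synchronous simulation of receiver-initiated load balancing on a search tree.

    children: {node: [child nodes]}; nodes without an entry are leaves.
    Returns (order in which nodes were expanded, number of rounds).
    """
    rng = random.Random(seed)
    stacks = [[] for _ in range(n_workers)]
    stacks[0].append(root)  # all work initially sits on one worker
    expanded, rounds = [], 0
    while any(stacks):
        rounds += 1
        # Receiver-initiated step: every idle worker asks one random peer for work.
        for w in range(n_workers):
            if not stacks[w]:
                donor = rng.randrange(n_workers)
                if donor != w and len(stacks[donor]) >= 2:
                    half = len(stacks[donor]) // 2
                    # Hand over the bottom half of the donor's stack; in depth-first
                    # search these entries tend to be shallower, i.e. larger subtrees.
                    stacks[w] = stacks[donor][:half]
                    del stacks[donor][:half]
        # Work step: every busy worker expands one node depth-first.
        for w in range(n_workers):
            if stacks[w]:
                node = stacks[w].pop()
                expanded.append(node)
                stacks[w].extend(children.get(node, []))
    return expanded, rounds
```

Because idle workers initiate the transfer, busy workers are never interrupted to estimate their own load, which suits a search tree whose subtree sizes cannot be predicted in advance. With one worker the simulation degenerates to a plain depth-first search taking one round per node.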