Interactive Data Mining for Molecular Graphs
Related papers
User Assisted Substructure Extraction in Molecular Data Mining
Lecture Notes in Computer Science, 2008
In molecular fragment mining, scientists use both manual techniques and purely computer-based methods. In this paper, we propose a novel molecular fragment mining approach that incorporates interactive user assistance to speed up traditional fragment mining processes and increase their success rates. The proposed approach visualizes 3D molecular data in a 2D form that can be easily interpreted by a human expert, who evaluates and filters the 2D molecular images manually. The proposed approach differs from others in the literature in that it does not search for substructures containing specific atoms, as graph mining methods do. Instead, the user-assisted approach graphically highlights significant substructures with specific properties and topologies. Initial experiments indicate that, with the user-assisted approach, active and inactive fragments of compounds are quickly determined for drug design with high success rates.
Visualization and Grouping of Graph Patterns in Molecular Databases
Research and Development in Intelligent Systems XXIV, 2008
Mining subgraphs is an area of research where we have a given set of graphs, and we search for (connected) subgraphs contained in these graphs. In this paper we focus on the analysis of graph patterns where the graphs are molecules and the subgraphs are patterns. In the analysis of fragments one is interested in the molecules in which the patterns occur. These occurrence data can be very extensive, and in this paper we introduce a visualization technique that makes them more accessible. The user does not have to browse all the occurrences in search of patterns occurring in the same molecules; instead the user can directly see which subgraphs are of interest.
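The idea of finding patterns that occur in the same molecules can be illustrated with a minimal Python sketch. It is not the visualization technique of the paper above: it assumes each pattern is already mapped to the set of molecule IDs in which it occurs, and it uses Jaccard similarity of those sets as one possible grouping criterion. The function names, the pattern strings, the molecule IDs, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two occurrence sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def co_occurring_patterns(occurrences, threshold=0.5):
    """Return (pattern, pattern, score) triples whose occurrence sets overlap
    by at least `threshold`, strongest overlap first."""
    pairs = []
    for p, q in combinations(sorted(occurrences), 2):
        score = jaccard(occurrences[p], occurrences[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return sorted(pairs, key=lambda t: -t[2])


# Hypothetical toy data: pattern -> IDs of the molecules that contain it.
occurrences = {
    "C-C-O": {"m1", "m2", "m3"},
    "C=O": {"m1", "m2", "m3", "m4"},
    "c1ccccc1": {"m4", "m5"},
}

pairs = co_occurring_patterns(occurrences)
for p, q, score in pairs:
    print(p, q, score)
```

A pairwise score table like this is the kind of input a grouping or layout step could consume, so that the user does not have to browse every occurrence by hand.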
Clustering approaches for visual knowledge exploration in molecular interaction networks
BMC Bioinformatics, 2018
Background: Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually.
Results: We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance using the underlying network and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach together with an evolutionary algorithm for informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge. We calculate the number of Gene and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Also, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing a bi-level optimization approach, we evaluated the relative importance of distance functions and found that the order in which they are combined indeed affects clustering performance. Next, through enrichment analysis of the clustering results, we found that both hierarchical and bi-level clustering schemes discovered more Gene and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
Conclusions: In this work we examined the impact of different distance functions on clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering, and improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both the combination and the order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.
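As a rough illustration of clustering with a pairwise combination of distance metrics, the following Python sketch normalizes two distance matrices, mixes them with a weighted sum, and runs a plain single-linkage agglomerative clustering. This is a simplification under stated assumptions and not the bi-level optimization or evolutionary algorithm described in the abstract: the weighted sum, the `weight` parameter, the choice of single linkage, and the toy matrices are all hypothetical.

```python
def normalize(d):
    """Scale a square distance matrix so that its largest entry is 1."""
    peak = max(max(row) for row in d)
    return [[v / peak if peak else 0.0 for v in row] for row in d]


def combine(d1, d2, weight=0.5):
    """Weighted sum of two normalized distance matrices (assumed scheme)."""
    n1, n2 = normalize(d1), normalize(d2)
    size = len(d1)
    return [
        [weight * n1[i][j] + (1 - weight) * n2[i][j] for j in range(size)]
        for i in range(size)
    ]


def single_linkage(dist, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical toy data for four diagram elements.
layout = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 1], [5, 4, 1, 0]]   # Euclidean distance in the drawing
network = [[0, 1, 3, 3], [1, 0, 2, 2], [3, 2, 0, 1], [3, 2, 1, 0]]  # shortest-path hops

clusters = single_linkage(combine(layout, network, weight=0.5), k=2)
print(clusters)
```

Varying `k` corresponds to the clustering depth mentioned in the abstract, and varying `weight` is one simple way to explore how much each metric contributes.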
InfVis − Platform-Independent Visual Data Mining of Multidimensional Chemical Data Sets
Journal of Chemical Information and Modeling, 2005
The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. During recent years, visual data mining has become an important tool within the life sciences and drug discovery area with the potential to help prevent data analysis from turning into a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for chemists, who usually have little experience with classical data mining tools, for the visualization, exploration, and analysis of multivariate data sets. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time, interactive data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. In this paper, we present examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.
Information Visualization with Text Data Mining for Knowledge Discovery Tools in Bioinformatics
Key Engineering Materials, 2005
An abundant amount of information is produced in the digital domain, and an effective information extraction (IE) system is required to surf through this sea of information. In this paper, we show that an interactive visualization system works effectively to complement an IE system. In particular, three-dimensional (3D) visualization can turn a data-centric system into a user-centric one by enabling the human visual system, a powerful pattern recognizer, to become a part of the IE cycle. Because information as data is multidimensional in nature, 2D visualization has been the preferred mode. However, we argue that the extra dimension available to us in a 3D mode provides a valuable space where we can pack an orthogonal aspect of the available information. As candidates for this orthogonal information, we have considered the following two aspects: 1) abstraction of the unstructured source data, and 2) the history line of the discovery process. We have applied our proposal to text data mining in bioinformatics. Through case studies of data mining for molecular interaction in the yeast and mitogen-activated protein kinase pathways, we demonstrate the possibility of interpreting the extracted results with a 3D visualization system. Supported by the Korea Science and Engineering Foundation through AITrc.
Discovering Similar Frequent Fragments in Drug Design: A Clustering-Based Approach
2009
Designing new medical drugs requires analysis of many molecules that have an activity for a specific disease. The main goal of these extensive analyses is to discover active substructures (fragments) that account for the activity of these molecules. Once these fragments are discovered, they are used to synthesize new drugs for the disease. Current approaches for discovering active fragments rely heavily on frequent subgraph mining algorithms that search for exactly repeating morphological substructures within a graph database. However, in this paper, we argue that, in many settings, active fragments do not repeat in exactly the same form but with some fine differences. This prevents frequent subgraph mining approaches from discovering these fragments. In this work, we propose a clustering-based approach to discover similar substructures that repeat in active molecules in a molecular graph database. We have experimentally compared our approach with the current methods using real-life and synthesized datasets. Our experiments show that the proposed approach is successful in determining fragments that are responsible for the desired biological activity and that, unlike other methods, it can determine frequent substructures that repeat in the graphs with some fine differences.
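The contrast between exact repetition and repetition "with some fine differences" can be shown with a small Python sketch. It is only an illustration and not the clustering method of the paper above: it assumes a fragment can be coarsely represented as a set of bond labels, compares fragments by Jaccard similarity, and groups them with a greedy leader clustering. The representation, the function names, the 0.7 threshold, and the toy fragments are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two bond-label sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def leader_clusters(fragments, threshold):
    """Greedy leader clustering: each fragment joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for frag in fragments:
        for cluster in clusters:
            if jaccard(frag, cluster[0]) >= threshold:
                cluster.append(frag)
                break
        else:
            clusters.append([frag])
    return clusters


# Hypothetical fragments, each reduced to the set of bond types it contains.
fragments = [
    frozenset({"C-C", "C-N", "C=O"}),
    frozenset({"C-C", "C-N", "C=O", "C-O"}),  # same core plus one extra bond
    frozenset({"C-C", "C-N", "C=O"}),
    frozenset({"C-S", "S=O"}),
]

exact_counts = Counter(fragments)                  # exact-match view
clusters = leader_clusters(fragments, threshold=0.7)  # similarity-based view
sizes = [len(c) for c in clusters]
print(max(exact_counts.values()), sizes)
```

In this toy case exact counting sees the core fragment only twice, while the similarity-based grouping also collects the near-identical variant into the same cluster.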
A graphic tool for curating molecular interaction networks from the literature
Computers in Biology and Medicine, 2005
We propose a graphic tool for curating molecular interaction networks constructed from the literature by information extraction (IE). In order to turn preliminary results from IE into useful biomedical resources, we propose to use a controlled environment in which visualization and IE work synergistically. The usability of the proposed graphic tool is shown with respect to the identification of incorrectly extracted results that are due to the troublesome coordination phenomena in natural language texts. Through an experiment on molecular interactions in Saccharomyces cerevisiae, we have seen a meaningful increase (from 91.5% to 97.5%) in correctly extracted interaction information.
BMC Bioinformatics, 2020
Background: Interactions between proteins and non-proteic small molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. Results: To illustrate the potential of visGReMLIN, we conducted two cases in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in a totally visual manner. In addition, we found some motifs that we...
Visual analysis of biological data-knowledge networks
BMC Bioinformatics, 2015
Background: The interpretation of the results from genome-scale experiments is a challenging and important problem in contemporary biomedical research. Biological networks that integrate experimental results with existing knowledge from biomedical databases and published literature can provide a rich resource and powerful basis for hypothesizing about mechanistic explanations for observed gene-phenotype relationships. However, the size and density of such networks often impede their efficient exploration and understanding. Results: We introduce a visual analytics approach that integrates interactive filtering of dense networks based on degree-of-interest functions with attribute-based layouts of the resulting subnetworks. The comparison of multiple subnetworks representing different analysis facets is facilitated through an interactive super-network that integrates brushing-and-linking techniques for highlighting components across networks. An implementation is freely available as a Cytoscape app. Conclusions: We demonstrate the utility of our approach through two case studies using a dataset that combines clinical data with high-throughput data for studying the effect of β-blocker treatment on heart failure patients. Furthermore, we discuss our team-based iterative design and development process as well as the limitations and generalizability of our approach.
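Filtering a dense network with a degree-of-interest function can be sketched in a few lines of Python. The abstract above does not spell out its degree-of-interest functions, so this example assumes one common formulation in which a node's interest is its a priori importance minus its hop distance from a focus node. The function names, the `prior` scores, the threshold, and the toy gene network are hypothetical.

```python
from collections import deque


def hop_distances(graph, focus):
    """Breadth-first hop counts from the focus node to every reachable node."""
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def filter_by_interest(graph, prior, focus, threshold):
    """Keep nodes whose score (prior importance minus hop distance) meets the
    threshold, and return the subnetwork induced by those nodes."""
    dist = hop_distances(graph, focus)
    keep = {n for n in dist if prior.get(n, 0) - dist[n] >= threshold}
    return {n: sorted(m for m in graph[n] if m in keep) for n in sorted(keep)}


# Hypothetical undirected network as adjacency lists.
graph = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA", "geneD"],
    "geneC": ["geneA"],
    "geneD": ["geneB", "geneE"],
    "geneE": ["geneD"],
}
# Hypothetical a priori importance, e.g. derived from experimental evidence.
prior = {"geneA": 1, "geneB": 1, "geneC": 0, "geneD": 3, "geneE": 0}

subnetwork = filter_by_interest(graph, prior, focus="geneA", threshold=0)
print(subnetwork)
```

Here a distant but highly rated node (geneD) survives the filter together with the path connecting it to the focus, while low-interest neighbors are dropped; the resulting subnetwork is what a layout step would then draw.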