Browsing large scale cheminformatics data with dimension reduction

The Molecule Cloud - compact visualization of large collections of molecules

Journal of cheminformatics, 2012

Analysis and visualization of large collections of molecules is one of the most frequent challenges facing cheminformatics experts in the pharmaceutical industry. Various sophisticated methods are available to perform this task, including clustering, dimensionality reduction, and scaffold frequency analysis. In any case, however, viewing and analyzing large tables of molecular structures is necessary. We present a new visualization technique that conveys basic information about the composition of molecular data sets at a single glance. The method represents the most common structural features of chemical databases in the form of a cloud diagram. The frequency of molecules containing a particular substructure is indicated by the size of the respective structural image. The method is useful for quickly perceiving the most prominent structural features present in the data set. This approach was inspired by popular word cloud diagrams that are used to visualize...
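The size-by-frequency idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scaffold labels, the `cloud_sizes` helper, the pixel bounds, and the linear scaling are all hypothetical, and the abstract does not say which scaling the published method uses.

```python
from collections import Counter


def cloud_sizes(scaffolds_per_molecule, min_px=40, max_px=200, top_n=10):
    """Map each scaffold's molecule count to an image edge length in pixels.

    Linear scaling between min_px and max_px is an assumption made for
    this sketch, not the scaling of the published method.
    """
    counts = Counter()
    for scaffolds in scaffolds_per_molecule:
        # set() ensures a molecule is counted once per scaffold it contains
        counts.update(set(scaffolds))

    top = counts.most_common(top_n)
    if not top:
        return {}

    hi = top[0][1]   # count of the most frequent scaffold
    lo = top[-1][1]  # count of the least frequent scaffold that is kept
    span = hi - lo

    sizes = {}
    for scaffold, n in top:
        # If all counts are equal, draw every image at the maximum size
        frac = (n - lo) / span if span else 1.0
        sizes[scaffold] = round(min_px + frac * (max_px - min_px))
    return sizes


# Hypothetical data set: each molecule is listed by the scaffolds it contains
molecules = [
    ["benzene", "pyridine"],
    ["benzene"],
    ["benzene", "indole"],
    ["pyridine"],
]
sizes = cloud_sizes(molecules)
```

In a real workflow the scaffold labels would come from a substructure or scaffold perception step, and the resulting sizes would be passed to a layout routine that packs the structure images into the cloud.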

DataWarrior, An Open-Source Program For Chemistry Aware Data Visualization And Analysis

Journal of chemical information and modeling, 2015

Drug discovery projects in the pharmaceutical industry accumulate thousands of chemical structures and tens of thousands of data points from a dozen or more biological and pharmacological assays. A sufficient interpretation of the data requires understanding which molecular families are present, which structural motifs correlate with measured properties, and which tiny structural changes cause large property changes. Data visualization and analysis software with sufficient chemical intelligence to support chemists in this task is rare. In an attempt to help fill the gap, we released our in-house developed, chemistry-aware data analysis program DataWarrior for free public use. This paper gives an overview of DataWarrior's functionality and architecture. As an example, a new unsupervised, 2-dimensional scaling algorithm is presented, which employs vector-based or non-vector-based descriptors to visualize the chemical or pharmacophore space of even large datasets. DataWarrior us...

InfVis − Platform-Independent Visual Data Mining of Multidimensional Chemical Data Sets

Journal of Chemical Information and Modeling, 2005

The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. In recent years, visual data mining has become an important tool within the life sciences and drug discovery area, with the potential to help prevent data analysis from becoming a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for the visualization, exploration, and analysis of multivariate data sets, aimed at chemists, who usually have little experience with classical data mining tools. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time, interactive data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. We present examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.

The next frontier for bio- and cheminformatics visualization

IEEE Computer Graphics and Applications, 2002

Life-science research is increasingly reliant on computation, as affirmed by the recent mapping of the human genome and the analysis questions it poses. Our task is to make sense of these genetic blueprints to develop treatments and therapies for disease. This market is huge, as is the commitment by pharmaceutical companies, biotech firms, and the investment community. The stage is set for scientific discovery and technology advances, the stakes are high, and researchers have many analysis alternatives. The question is whether visualization can be a player in this market and whether it's up to the challenges. This is the question we attempted to answer as panelists at the Visualization 2001 conference. As a group of researchers and practitioners in this burgeoning field, we've noticed three broad problem-solving themes: the visual integration of analyses, high-dimensional analytic visualization, and the emergence of new visualization designs to solve problems. Analytical high-dimensional visualizations tightly couple analytical substance with high-dimensional visualization. We can think of this new breed of analytical tool as an intelligent computational probe or query. To review a few other key terms before reading further, see the sidebar "Bio- and Cheminformatics."

STM3: a chemistry visualization platform

Zeitschrift für Kristallographie, 2000

To support CSCS research users we built STM3, a software platform on which advanced chemistry visualization techniques can be integrated. Its main goal is not to replace existing tools, but to provide functionalities not covered by them. STM3's unusual characteristic among chemistry visualization tools is its ability to combine chemistry and general visualization techniques in the same view. STM3 is built on top of a proven visualization environment (AVS/Express) that lets CSCS's visualization staff concentrate its efforts on developing new technologies rather than investing time on graphical and user interface implementation issues.

EnVision: A Web-Based Tool for Scientific Visualization

2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Scientific visualization is the process of transforming raw numeric data into a visual form, and is a key element of computational science. While many tools exist, they are unnecessarily difficult to use. This complexity increases time to insight and inhibits casual inquiry. The complexity derives from the need to support arbitrarily formatted data and many visualization algorithms. EnVision addresses both sources of complexity. Its design is predicated on two key insights. First, though the number of data file formats is unbounded, the structure of any one can be described using a small number of parameters. Second, the set of visualization algorithms applicable to a given type of data is small, and the subset used within a specific scientific discipline is smaller. EnVision utilizes domain-specific knowledge and user-directed semi-automation to dramatically simplify data importation and visualization algorithm selection. Its web-based interface facilitates access to remote hardware resources and provides a collaborative visualization environment.
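The first insight, that a file's layout can be captured by a small number of parameters, can be illustrated with a short Python sketch. The `RawFormat` descriptor, its fields, and the `read_raw` helper are hypothetical names invented for this example; they are not EnVision's actual parameter set or API.

```python
import struct
from dataclasses import dataclass


@dataclass
class RawFormat:
    """Hypothetical descriptor: a few parameters describing a raw 2D array file."""
    header_bytes: int        # bytes to skip before the data starts
    rows: int                # number of rows in the grid
    cols: int                # number of columns in the grid
    type_code: str = "f"     # struct type code, "f" = 32-bit float
    byte_order: str = "<"    # "<" little-endian, ">" big-endian


def read_raw(blob: bytes, fmt: RawFormat):
    """Decode a row-major 2D grid from raw bytes using only the descriptor."""
    count = fmt.rows * fmt.cols
    layout = f"{fmt.byte_order}{count}{fmt.type_code}"
    values = struct.unpack_from(layout, blob, fmt.header_bytes)
    # Split the flat tuple into rows
    return [list(values[r * fmt.cols:(r + 1) * fmt.cols]) for r in range(fmt.rows)]


# Build a small in-memory "file": a 4-byte header followed by six floats
blob = b"HDR!" + struct.pack("<6f", 0, 1, 2, 3, 4, 5)
grid = read_raw(blob, RawFormat(header_bytes=4, rows=2, cols=3))
```

The same reader handles a different file simply by changing the descriptor values, which is the kind of simplification the abstract attributes to describing formats by parameters rather than writing one importer per format.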

Visualizing chemical space networks with RDKit and NetworkX

Journal of Cheminformatics

This article demonstrates how to create Chemical Space Networks (CSNs) using a Python workflow built on RDKit and NetworkX. CSNs are a type of network visualization that depicts compounds as nodes connected by edges, where each edge is defined by a pairwise relationship such as a 2D fingerprint similarity value. A step-by-step approach is presented for creating two different CSNs in this manuscript, one based on RDKit 2D fingerprint Tanimoto similarity values, and another based on maximum common substructure similarity values. Several different CSN visualization features are included in the tutorial, including methods to represent nodes with color based on bioactivity attribute value, edges with different line styles based on similarity value, as well as replacing the circle nodes with 2D structure depictions. Finally, some common network property and analysis calculations are presented including the clustering coefficient, degree assortativity, and modularity. All code is provided in the form of Jupyter Noteboo...
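The core CSN construction described above (nodes are compounds, edges join pairs whose similarity passes a cutoff) can be sketched without any third-party packages. This is a plain-Python stand-in, not the article's RDKit and NetworkX code: fingerprints are modeled as hypothetical sets of "on" bit indices, the compound names and the 0.5 threshold are invented, and the clustering coefficient is computed by hand.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_csn(fingerprints: dict, threshold: float = 0.5) -> dict:
    """Return an adjacency dict linking compounds with similarity >= threshold."""
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph: dict, node: str) -> float:
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


# Hypothetical fingerprints: three close analogs and one unrelated compound
fps = {
    "cpd_a": {1, 2, 3, 4},
    "cpd_b": {1, 2, 3, 5},
    "cpd_c": {1, 2, 3, 6},
    "cpd_d": {1, 7, 8, 9},
}
csn = build_csn(fps, threshold=0.5)
```

Here the three analogs form a fully connected triangle while `cpd_d` stays isolated. With real data, the sets would be replaced by fingerprints computed from structures, and a graph library would handle layout, drawing, and further network statistics.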

Remote Scientific Visualization for Large Datasets

Remote scientific visualization, where rendering services are provided by larger-scale systems than are available on the desktop, is becoming increasingly important as dataset sizes increase beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We apply five rules from the e-Science community to meet these goals with the porting of a commercial visualization package to a large-scale system and the integration of this code with the Access Grid. Example use cases from Materials Science are considered.

Volume-rendering on a 3D hyperwall: A molecular visualization platform for research, education and outreach

We present a unique platform for molecular visualization and design that uses novel subatomic feature detection software in tandem with 3D hyperwall visualization technology. We demonstrate the fleshing-out of pharmacophores in drug molecules, as well as reactive sites in catalysts, focusing on subatomic features. Topological analysis with picometer resolution, in conjunction with interactive volume-rendering of the Laplacian of the electronic charge density, leads to new insight into docking and catalysis. Visual data-mining is done efficiently and in parallel using a 4 × 4 3D hyperwall (a tiled array of 3D monitors driven independently by slave GPUs but displaying high-resolution, synchronized and functionally-related images). The visual texture of images for a wide variety of molecular systems is intuitive to experienced chemists but also appealing to neophytes, making the platform useful both as a tool for advanced research and for pedagogical and STEM education outreach purposes.