Glyph-based Overviews of Large Datasets in Structural Bioinformatics
Related papers
GRAPE: GRaphical Abstracted Protein Explorer
Nucleic Acids Research, 2010
The region surrounding a protein, known as the surface of interaction or molecular surface, can provide valuable insight into its function. Unfortunately, due to the complexity of both their geometry and their surface fields, study of these surfaces can be slow and difficult and important features may be hard to identify. Here, we describe our GRaphical Abstracted Protein Explorer, or GRAPE, a web server that allows users to explore abstracted representations of proteins. These abstracted surfaces effectively reduce the level of detail of the surface of a macromolecule, using a specialized algorithm that removes small bumps and pockets, while preserving large-scale structural features. Scalar fields, such as electrostatic potential and hydropathy, are smoothed to further reduce visual complexity. This entirely new way of looking at proteins complements more traditional views of the molecular surface. GRAPE includes a thin 3D viewer that allows users to quickly flip back and forth between both views. Abstracted views provide a fast way to assess both a molecule's shape and its different surface field distributions. GRAPE is freely available at
Visualizing the Protein Sequence Universe
Modern biology is experiencing a rapid increase in data volumes that challenges our analytical skills and existing cyberinfrastructure. Exponential expansion of the Protein Sequence Universe (PSU), the protein sequence space, together with the costs and complexities of manual curation creates a major bottleneck in life sciences research. Existing resources lack scalable visualization tools that are instrumental for functional annotation.
Integrated visual analysis of protein structures, sequences, and feature data
BMC Bioinformatics, 2015
Background: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. Results: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. Conclusions: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.
Data Visualization Tools for Large Biological Data Sets
2017
Researchers have access to an ever-growing volume of data available at multiple levels of biological analysis. Many visual analytic tools have been developed to display a variety of biological data types but many of these tools are challenging to use and only examine one biological level of analysis at a time. The development and testing of hypotheses is difficult when the information is hard to integrate and laborious to interpret. The application of data visualization principles and user experience design best practices could improve systems biology research workflows by providing visual analytic tools with what is known in the information visualization community as a “transparent” user interface. This thesis consists of four chapters that explore two central questions: 1) What is the best way to represent biological information at different levels of analysis? and 2) How do we enable researchers to explore and interact with their data as naturally and intuitively as possible? The...
Visual exploration across biomedical databases
2011
Though biomedical research often draws on knowledge from a wide variety of fields, few visualization methods for biomedical data incorporate meaningful cross-database exploration. A new approach is offered for visualizing and exploring a query-based subset of multiple heterogeneous biomedical databases. Databases are modeled as an entity-relation graph containing nodes (database records) and links (relationships between records).
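The entity-relation graph described in this abstract can be illustrated with a minimal sketch. The class name `RecordGraph`, its methods, and the record identifiers below are hypothetical and chosen only for illustration; the paper's actual data model and query mechanism are not given here.

```python
from collections import defaultdict, deque


class RecordGraph:
    """Toy entity-relation graph: nodes are database records, links are relationships."""

    def __init__(self):
        self.nodes = {}                 # record id -> (source database, label)
        self.links = defaultdict(set)   # record id -> ids of directly linked records

    def add_record(self, record_id, database, label):
        self.nodes[record_id] = (database, label)

    def add_link(self, a, b):
        # Links are stored in both directions so exploration can start from either record.
        self.links[a].add(b)
        self.links[b].add(a)

    def neighborhood(self, start, max_hops):
        """Return the ids of all records within `max_hops` links of `start` (breadth-first)."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if dist == max_hops:
                continue
            for nxt in self.links[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return seen
```

A query-based subset could then be approximated by taking the neighborhood of the records that match a query, which pulls in related records from other databases.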
Information visualization techniques in bioinformatics during the postgenomic era
Drug Discovery Today: BIOSILICO, 2004
Information visualization techniques, which take advantage of the bandwidth of human vision, are powerful tools for organizing and analyzing a large amount of data. In the postgenomic era, information visualization tools are indispensable for biomedical research. This paper aims to present an overview of current applications of information visualization techniques in bioinformatics for visualizing different types of biological data, such as from genomics, proteomics, expression profiling and structural studies. Finally, we discuss the challenges of information visualization in bioinformatics related to dealing with more complex biological information in the emerging fields of systems biology and systems medicine.
Zomit: biological data visualization and browsing
Bioinformatics, 1998
Motivation: The problems caused by the difficulty in visualizing and browsing biological databases have become crucial. Scientists can no longer interact directly with the huge amount of available data. However, future breakthroughs in biology depend on this interaction. We propose a new metaphor for biological data visualization and browsing that allows navigation in very large databases in an intuitive way. The concepts underlying our approach are based on navigation and visualization with zooming, semantic zooming and portals; and on data transformation via magic lenses. We think that these new visualization and navigation techniques should be applied globally to a federation of biological databases. Results: We have implemented a generic tool, called Zomit, that provides an application programming interface for developing servers for such navigation and visualization, and a generic architecture-independent client (Java™ applet) that queries such servers. As an illustration of the capabilities of our approach, we have developed ZoomMap, a prototype browser for the HuGeMap human genome map database. Availability: Zomit and ZoomMap are available at the URL
Visualisation of bioinformatics datasets
2015
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design, transplantation rejection, etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants on both synthetic data and an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend the GGTM model to estimate feature saliencies while training a visualisation model, and this is called GGTM with feature saliency (GGTM-FS). We demonstrate the effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as the distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use the quality metrics of KL divergence and nearest neighbour classification error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.
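The abstract does not say which EM steps are log-transformed. As an illustration only, the sketch below shows the standard log-sum-exp trick applied to the E-step of a Gaussian mixture with equal priors and a shared inverse variance `beta`, which is the kind of mixture GTM uses. The function names and the toy setup are assumptions, not the thesis's formulation.

```python
import math


def log_sum_exp(values):
    """Compute log(sum(exp(v) for v in values)) without overflow or underflow."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def responsibilities(x, centres, beta):
    """Posterior probability of each mixture component for one data point x.

    Assumes isotropic Gaussian noise with inverse variance `beta` and equal
    prior weight on every centre. The Gaussian normalising constant is the
    same for all components, so it cancels and is omitted.
    """
    log_p = [
        -0.5 * beta * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        for c in centres
    ]
    log_norm = log_sum_exp(log_p)
    # Subtracting in log space keeps the ratio well defined even when
    # every exp(log_p) on its own would underflow to zero.
    return [math.exp(lp - log_norm) for lp in log_p]
```

In high dimensions the squared distances grow with the number of descriptors, so exponentiating each term directly can yield zero for every component and a division by zero in the normalisation. Working in log space avoids that failure.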
Glyphmaker: creating customized visualizations of complex data
Computer, 1994
Glyphmaker's general approach to data visualization and analysis lets users build customized representations of multivariate data and provides interactive tools for exploring patterns and relationships. Current visualization/analysis tools offer a limited selection of visual representations, and they're optimized for only a few well-known applications, such as computational fluid dynamics (CFD) or finite-element analysis. The available representations (for example, surface or volume rendering of scalar or vector fields) are less useful for spatially complex multivariate data with a high correlation among variables. Such data is common in physics and materials science, but it can also appear in CFD and finite-element results. In these cases, when users attempt to replace the usual visual presentation with one that better characterizes their data, they lose the system's built-in, programmerless functionality. Thus, with extensible systems, such as dataflow systems, they must suddenly know quite a bit about graphics programming or employ someone who does. With nonextensible systems, they are just out of luck. Our system, called Glyphmaker, allows nonexpert users to customize their own graphical representations using a simple glyph editor and a point-and-click binding mechanism. In particular, users can create and then alter bindings to visual representations, bring in new data or glyphs with associated bindings, change ranges for bound data, and do these operations interactively. They can also focus on data down to any level of detail, including individual elements, and then isolate or highlight the focused region. These features empower users, letting them employ their specialized domain knowledge to create customized visual representations for further exploration and analysis. For ease of design and use, we built Glyphmaker on top of Iris Explorer, the Silicon Graphics Inc. (SGI) dataflow visualization system.
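The idea of binding data variables to glyph attributes, with adjustable ranges, can be sketched in a few lines. This is a hypothetical illustration: `make_binding`, `build_glyphs`, and the field and attribute names are assumptions and do not reflect Glyphmaker's actual interface, which is a point-and-click editor rather than code.

```python
def make_binding(data_range, visual_range):
    """Return a function that linearly maps a data value into a visual attribute range."""
    d_lo, d_hi = data_range
    v_lo, v_hi = visual_range

    def bind(value):
        if d_hi == d_lo:
            return v_lo
        # Clamp so out-of-range data cannot produce an out-of-range glyph attribute.
        t = min(1.0, max(0.0, (value - d_lo) / (d_hi - d_lo)))
        return v_lo + t * (v_hi - v_lo)

    return bind


def build_glyphs(records, bindings):
    """Apply every binding to every record, producing one glyph (a dict) per record.

    `bindings` maps a glyph attribute name to a (data field, binding function) pair.
    """
    return [
        {attr: bind(rec[field]) for attr, (field, bind) in bindings.items()}
        for rec in records
    ]
```

Changing the range for bound data then amounts to replacing one entry in `bindings` with a new `make_binding(...)` call and rebuilding the glyphs. This loosely mirrors the interactive rebinding the abstract describes.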
The current version of Glyphmaker has been successfully tested on a materials system simulation. We are planning a series of tests and evaluations by scientists and engineers using real data and welcome contacts from users with spatially complex multivariate data. Glyphs: Glyphs are graphical objects whose attributes (position, size, shape, color, orientation, etc.) are bound to data. These objects can be effective in depicting discrete