Web-based visual analysis for high-throughput genomics

Bluejay: A Highly Scalable and Integrative Visual Environment for Genome Exploration

2007 IEEE Congress on Services (Services 2007), 2007

Many questions that biologists want to answer using the information available from completely sequenced genomes are complex. A graphical environment allows users to visually explore and operate on a sequence. Sequence and annotation data exist in a bewildering variety of types and levels of detail. The graphical environment therefore needs to adapt to this variation to provide the user with the best possible visual representation of genomic data in a given context. As more and more online tools and services become available for biologists, mediating software should also be able to integrate and link to them. The Bluejay browser is a Java-based visual environment for exploring biological sequences. Uniquely, Bluejay fully integrates existing gene expression software into a genomic context. Bluejay also differentiates itself from most form-based HTML sequence browsers because it: (i) is highly scalable so that it can visualize a wide range of genomic objects ranging from a large whole genome down to individual nucleotides by using data-transformational Web services; and (ii) dynamically discovers and provides links to disparate resources such as gene annotation data (via XLinks) and semantically described biological Web Services (via the BioMOBY protocol).

Epiviz: Integrative Visual Analysis Software for Genomics

2015

Title of Dissertation: Epiviz: Integrative Visual Analysis Software for Genomics. Florin Chelaru, Doctor of Philosophy, 2015. Directed by: Professor Héctor Corrada Bravo, Department of Computer Science. Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. Commonly, the data visualized in these tools is the output of analyses performed in powerful computing environments like R/Bioconductor or Python. Two essential aspects of data analysis are usually treated as distinct, in spite of being part of the same explora...

CoolBox: a flexible toolkit for visual analysis of genomics data

BMC Bioinformatics

Background: Data visualization, especially genome track plots, is crucial for genomics researchers to discover patterns in large-scale sequencing datasets. Although existing tools work well for producing a standard view of the input data, they are not convenient when users want to create customized data representations. This gap between visualization and data processing prevents users from uncovering more of the hidden structure in a dataset. Results: We developed CoolBox, an open-source toolkit for visual analysis of genomics data. This user-friendly toolkit is highly compatible with the Python ecosystem and customizable with a well-designed user interface. It can be used in various visualization situations like a Swiss army knife: for example, to produce high-quality genome track plots or fetch commonly used genomic data files with a Python script or command line, or to explore genomic data interactively within a Jupyter environment or web browser. Moreover, owing to the highly extensibl...

Epiviz: a view inside the design of an integrated visual analysis software for genomics

BMC Bioinformatics, 2015

Background: Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results: In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. Conclusions: Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a record of our own approaches and lessons learned and as a starting point for future efforts in the same direction for the genomics community.

InstaCircos: a Web Application for Fast and Interactive Circular Visualization of Large Genomic Data (Work in Progress)

2020 24th International Conference Information Visualisation (IV), 2020

One of the most effective visualizations for genomics data is the circular one, supported by popular packages and visualization suites. Many tools are available; however, most of them share a number of drawbacks, including limited ease of installation and use, slow performance and memory limitations (making them infeasible for very large genomes such as the human one), and a lack of interactivity. In this paper we present the ongoing work on InstaCircos, a web application born from the scientific collaboration between Big Data Analytics and Bioinformatics researchers and aimed at overcoming the limitations of the available tools. It provides advanced visualization features through an easy-to-use web interface and offers interactive functionality and near real-time performance thanks to an integrated big data management back-end based on MongoDB.
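To make the idea of a circular genome view concrete, the following is a minimal Python sketch of the coordinate mapping that any such layout needs: each chromosome receives an arc proportional to its length, and a genomic position is converted to an angle and then to x/y coordinates. It is a generic illustration, not InstaCircos code; the function names, the `gap_degrees` parameter and the toy chromosome sizes are all assumptions, and the MongoDB back-end is not covered.

```python
import math


def build_layout(chrom_sizes, gap_degrees=2.0):
    """Give each chromosome an arc proportional to its length.

    chrom_sizes: dict mapping chromosome name -> length in bp (hypothetical input).
    Returns a dict: name -> (arc start in degrees, arc span in degrees, length).
    """
    total_bp = sum(chrom_sizes.values())
    usable = 360.0 - gap_degrees * len(chrom_sizes)  # degrees left after gaps
    layout = {}
    start = 0.0
    for chrom, size in chrom_sizes.items():
        span = usable * size / total_bp
        layout[chrom] = (start, span, size)
        start += span + gap_degrees
    return layout


def to_angle(layout, chrom, position):
    """Map a base-pair position on a chromosome to an angle in degrees."""
    start, span, size = layout[chrom]
    if not 0 <= position <= size:
        raise ValueError("position lies outside the chromosome")
    return start + span * position / size


def to_xy(angle_degrees, radius=1.0):
    """Convert an angle on the circle to Cartesian coordinates."""
    theta = math.radians(angle_degrees)
    return (radius * math.cos(theta), radius * math.sin(theta))


if __name__ == "__main__":
    # Toy sizes, chosen only to keep the arithmetic easy to follow.
    layout = build_layout({"chr1": 300, "chr2": 100}, gap_degrees=0.0)
    angle = to_angle(layout, "chr2", 50)
    print(angle, to_xy(angle))
```

Because the mapping is a pure function of chromosome sizes and position, it can be evaluated on the client or the server, which is one reason the expensive part of such tools tends to be data retrieval rather than geometry.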

Bioinformatics visualization and integration with open standards: the Bluejay genomic browser

In silico biology, 2005

We have created a new Java-based integrated computational environment for the exploration of genomic data, called Bluejay. The system is capable of using almost any XML file related to genomic data. Non-XML data sources can be accessed via a proxy server. Bluejay has several features that are new to bioinformatics, including an unlimited semantic zoom capability, coupled with Scalable Vector Graphics (SVG) outputs; an implementation of the XLink standard, which features access to MAGPIE Genecards as well as any BioMOBY service accessible over the Internet; and the integration of gene chip analysis tools with the functional assignments. The system can be used as a signed web applet, via Web Start, or as a local stand-alone application, with or without connection to the Internet. It is available free of charge and as open source via http://bluejay.ucalgary.ca.

SVIST4GET: A Simple Visualization Tool for Genomic Tracks from Sequencing Experiments

BMC Bioinformatics

Background: High-throughput sequencing often provides a foundation for experimental analyses in the life sciences. For many such methods, an intermediate layer of bioinformatics data analysis is the genomic signal track constructed by short read mapping to a particular genome assembly. There are many software tools to visualize genomic tracks in a web browser or with a stand-alone graphical user interface. However, there are only a few command-line applications suitable for automated usage or production of publication-ready visualizations. Results: Here we present svist4get, a command-line tool for customizable generation of publication-quality figures based on data from genomic signal tracks. Like generic genome browser software, svist4get visualizes signal tracks at a given genomic location and is able to aggregate data from several tracks on a single plot along with the transcriptome annotation. The resulting plots can be saved as vector or high-resolution bitmap images. We demonstrate practical use cases of svist4get for Ribo-Seq and RNA-Seq data. Conclusions: svist4get is implemented in Python 3 and runs on Linux. The command-line interface of svist4get allows for easy integration into bioinformatics pipelines in a console environment.
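As a rough illustration of what combining several signal tracks at one genomic location can involve, the Python sketch below extracts per-base values for a window from bedGraph-style intervals and averages them across tracks. This is not svist4get's code or command-line interface: the function names, the `(start, end, value)` tuple layout and the choice of a per-base mean are assumptions made for the example, and the tool itself may aggregate differently.

```python
def window_signal(intervals, start, end):
    """Per-base signal for the half-open window [start, end).

    intervals: iterable of (start, end, value) tuples, 0-based and half-open,
    in the style of a bedGraph file for a single chromosome (assumed layout).
    Positions not covered by any interval are reported as 0.0.
    """
    values = [0.0] * (end - start)
    for i_start, i_end, value in intervals:
        # Clip each interval to the window before filling positions.
        for pos in range(max(i_start, start), min(i_end, end)):
            values[pos - start] = value
    return values


def aggregate_tracks(tracks, start, end):
    """Average several tracks base by base over the same window."""
    per_track = [window_signal(track, start, end) for track in tracks]
    return [sum(column) / len(column) for column in zip(*per_track)]


if __name__ == "__main__":
    # Two toy tracks; the values are invented for the example.
    replicate_a = [(0, 4, 2.0)]
    replicate_b = [(2, 6, 4.0)]
    print(aggregate_tracks([replicate_a, replicate_b], 1, 5))
```

The per-base representation is simple but memory-hungry for long windows; a production tool would more likely bin the signal before plotting.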

NX4: a web-based visualization of large multiple sequence alignments

Bioinformatics, 2019

Multiple Sequence Alignments (MSAs) are a fundamental operation in genome analysis. However, MSA visualizations such as sequence logos and matrix representations have changed little since the nineties and are not well suited for displaying large-scale alignments. We propose a novel, web-based MSA visualization tool called NX4, which can handle genome alignments comprising thousands of sequences. NX4 calculates the frequency of each nucleotide along the alignment and visually summarizes the results using a color-blind friendly palette that helps identify regions of high genetic diversity. NX4 also provides the user with additional assistance in finding these regions with a 'focus + context' mechanism that uses a line chart of the Shannon entropy across the alignment. The tool offers geneticists an easy-to-use and scalable analysis for large MSA studies.
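The per-column statistic mentioned in this abstract can be sketched in a few lines of Python. For each alignment column, the nucleotide frequencies p_b give a Shannon entropy H = sum over b of p_b * log2(1 / p_b), which is 0 bits for a fully conserved column and 2 bits when all four bases are equally frequent. The function name, the decision to ignore gaps and ambiguous characters, and the toy alignment are assumptions of this sketch rather than details taken from NX4.

```python
import math
from collections import Counter


def column_entropies(alignment, alphabet="ACGT"):
    """Return the Shannon entropy (in bits) of every alignment column.

    alignment: list of equal-length aligned sequences.
    Characters outside `alphabet` (gaps, N, ...) are ignored, which is an
    assumption of this sketch; a column with no countable base scores 0.0.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        observed = [counts[base] for base in alphabet if counts[base] > 0]
        total = sum(observed)
        if total == 0:
            entropies.append(0.0)
            continue
        # p * log2(1/p) summed over the bases seen in this column.
        entropies.append(sum((n / total) * math.log2(total / n) for n in observed))
    return entropies


if __name__ == "__main__":
    msa = ["ACGT", "ACGA", "ATGC", "AGGG"]  # toy alignment
    print(column_entropies(msa))  # [0.0, 1.5, 0.0, 2.0]
```

Plotting these values as a line chart along the alignment gives an overview in which peaks mark variable regions, which is the role the entropy chart plays in the description above.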

Sungear: interactive visualization and functional analysis of genomic datasets

Bioinformatics/Computer Applications in the Biosciences, 2007

Sungear is a software system that supports a rapid, visually interactive and biologist-driven comparison of large data sets. The data sets can come from microarray experiments (e.g. genes induced in each experiment), from comparative genomics (e.g. genes present in each genome), or even from non-biological applications (e.g. demographics or baseball statistics). Sungear represents multiple data sets as vertices in a polygon. Each possible intersection among the sets is represented as a circle inside the polygon. The position of the circle is determined by the position of the vertices represented in the intersection and the area of the circle is determined by the number of elements in the intersection. Sungear shows which GO terms are over-represented in a subset of circles or anchors. The intuitive Sungear interface has enabled biologists to determine quickly which data set or groups of data sets play a role in a biological function of interest. Availability: A live online version of Sungear can be found at
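The geometry described in this abstract can be approximated with a short Python sketch: data sets become vertices of a regular polygon, elements are grouped by the exact combination of sets they belong to, and each group becomes a circle whose area is proportional to its size. Two details are assumptions of the sketch, not statements from the abstract: the circle center is taken as the plain centroid of the participating vertices, and intersections are treated as exclusive (an element counts only toward the exact set combination it belongs to). All names and the `area_per_element` parameter are hypothetical.

```python
import math


def polygon_vertices(set_names, radius=1.0):
    """Place one vertex (anchor) per data set evenly on a circle."""
    n = len(set_names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(set_names)
    }


def intersection_circles(datasets, area_per_element=0.01):
    """Return one circle per non-empty combination of data sets.

    datasets: dict mapping data set name -> set of element identifiers.
    """
    names = sorted(datasets)
    anchors = polygon_vertices(names)

    # Record, for every element, which data sets contain it.
    membership = {}
    for name in names:
        for element in datasets[name]:
            membership.setdefault(element, set()).add(name)

    # Group elements that share exactly the same combination of sets.
    groups = {}
    for element, owners in membership.items():
        groups.setdefault(frozenset(owners), []).append(element)

    circles = []
    for owners, elements in groups.items():
        xs = [anchors[o][0] for o in owners]
        ys = [anchors[o][1] for o in owners]
        area = area_per_element * len(elements)  # area scales with element count
        circles.append({
            "sets": sorted(owners),
            "count": len(elements),
            "center": (sum(xs) / len(xs), sum(ys) / len(ys)),  # assumed: centroid
            "radius": math.sqrt(area / math.pi),
        })
    return circles


if __name__ == "__main__":
    experiments = {  # toy gene lists
        "A": {"g1", "g2", "g3"},
        "B": {"g2", "g3"},
        "C": {"g3", "g4"},
    }
    for circle in sorted(intersection_circles(experiments), key=lambda c: c["sets"]):
        print(circle)
```

Scaling the area rather than the radius keeps visual size proportional to the number of elements, so a group with four times as many genes appears four times as large rather than sixteen.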