PTR Explorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics
Related papers
Integrative Analysis of Transcriptomic and Proteomic Data: Challenges, Solutions and Applications
2008
Recent advances in high-throughput technologies enable quantitative monitoring of the abundance of various biological molecules and allow determination of their variation between biological states on a genomic scale. Two popular platforms are DNA microarrays that measure messenger RNA transcript levels, and gel-free proteomic analyses that quantify protein abundance. Obviously, no single approach can fully unravel the complexities of fundamental biology and it is equally clear that integrative analysis of multiple levels of gene expression would be valuable in this endeavor. However, most integrative transcriptomic and proteomic studies have thus far either failed to find a correlation or only observed a weak correlation. In addition to various biological factors, it is suggested that the poor correlation could be quite possibly due to the inadequacy of available statistical tools to compensate for biases in the data collection methodologies. To address this issue, attempts have recently been made to systematically investigate the correlation patterns between transcriptomic and proteomic datasets, and to develop sophisticated statistical tools to improve the chances of capturing a relationship. The goal of these efforts is to enhance understanding of the relationship between transcriptomes and proteomes so that integrative analyses may be utilized to reveal new biological insights that are not accessible through one-dimensional datasets. In this review, we outline some of the challenges associated with integrative analyses and present some preliminary statistical solutions. In addition, some new applications of integrated transcriptomic and proteomic analysis to the investigation of post-transcriptional regulation are also discussed.
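The correlation between transcript and protein abundance that this review discusses can be illustrated with a small rank-correlation sketch. This is only a minimal example, not the statistical tooling the review describes: the gene values and the names `average_ranks`, `pearson`, and `spearman` are hypothetical, and Spearman correlation is used here simply as one common choice for comparing measurements on different scales.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical abundances for five genes (illustrative values only).
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(mrna, protein)
```

In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written helpers; the pure-Python version is shown only to keep the example self-contained.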
2022
Functional analysis of high throughput experiments using pathway analysis is now ubiquitous. Though powerful, these methods often produce thousands of redundant results owing to knowledgebase redundancies upstream. This scale of results hinders extensive exploration by biologists and often leads to investigator biases due to previous knowledge and expectations. To address this issue, we present vissE, a flexible network-based analysis method that summarises redundancies into biological themes and provides various analytical modules to characterise and visualise them with respect to the underlying data, thus providing a comprehensive view of the biological system. We demonstrate vissE’s versatility by applying it to three different technologies: bulk, single-cell and spatial transcriptomics. Applying vissE to a factor analysis of a breast cancer spatial transcriptomic data, we identified stromal phenotypes that support tumour dissemination. Its adaptability allows vissE to enhance al...
Journal of Biomedicine and Biotechnology, 2006
Genome-wide gene expression profile studies encompass an increasingly large number of samples, posing a challenge to their presentation and interpretation without losing the notion that each transcriptome constitutes a complex biological entity. Much like pathologists who visually analyze information-rich histological sections as a whole, we propose here an integrative approach. We use a self-organizing maps-based software, the gene expression dynamics inspector (GEDI), to analyze gene expression profiles of various lung tumors. GEDI allows the comparison of tumor profiles based on direct visual detection of transcriptome patterns. Such intuitive "gestalt" perception promotes the discovery of interesting relationships in the absence of an existing hypothesis. We uncovered qualitative relationships between squamous cell tumors, small-cell tumors, and carcinoid tumors that would have escaped existing algorithmic classifications. These results suggest that GEDI may be a valuable explorative tool that combines global and gene-centered analyses of molecular profiles from large-scale microarray experiments.
Statistical modeling and visualization of molecular profiles in cancer
BioTechniques, 2003
Current cancer classifications using morphological criteria produce heterogeneous classes with variable prognosis and clinical course. By measuring gene expression for thousands of genes in a single hybridization experiment, microarrays have the potential to contribute to more effective classifications based on molecular information. This gives hope to improve both prognosis and treatment. Statistical methods for molecular classification have focused on using high dimensional representations of molecular profiles to identify subclasses. These can be noisy, unstable, and highly platform-specific. In this article, we emphasize the notion of molecular profiles based on latent categories signifying under-, over-, and baseline expression. Following this approach, we can generate results that are more easily interpretable, more easily translated into clinical tools, more robust to noise, and less platform-dependent. We illustrate both the methods and the associated software for molecular ...
From signatures to models: understanding cancer using microarrays
Nature Genetics, 2005
Genomics has the potential to revolutionize the diagnosis and management of cancer by offering an unprecedented comprehensive view of the molecular underpinnings of pathology. Computational analysis is essential to transform the masses of generated data into a mechanistic understanding of disease. Here we review current research aimed at uncovering the modular organization and function of transcriptional networks and responses in cancer. We first describe how methods that analyze biological processes in terms of higher-level modules can identify robust signatures of disease mechanisms. We then discuss methods that aim to identify the regulatory mechanisms underlying these modules and processes. Finally, we show how comparative analysis, combining human data with model organisms, can lead to more robust findings. We conclude by discussing the challenges of generalizing these methods from cells to tissues and the opportunities they offer to improve cancer diagnosis and management.
EdgeScaping: Mapping the spatial distribution of pairwise gene expression intensities
PLOS ONE, 2019
Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but it also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and nonconventional interdependent non-linear behavior associated with brain-specific tumor subtypes that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.
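The idea of turning a gene pair into an image can be sketched as a two-dimensional histogram of the pair's expression values across samples. This is an illustrative assumption about what such a conversion might look like, not EdgeScaping's actual implementation (see the linked repository for that); the function name `pair_to_image`, the bin count, the intensity range, and the sample values are all hypothetical.

```python
def pair_to_image(x, y, bins=4, lo=0.0, hi=8.0):
    """Bin paired expression values into a bins x bins grid of sample counts.

    Rows index the second gene's intensity bin, columns the first gene's.
    Values outside [lo, hi] are clamped into the edge bins.
    """
    width = (hi - lo) / bins
    image = [[0] * bins for _ in range(bins)]
    for a, b in zip(x, y):
        col = max(0, min(int((a - lo) / width), bins - 1))
        row = max(0, min(int((b - lo) / width), bins - 1))
        image[row][col] += 1
    return image


# Hypothetical expression values for two genes across five samples.
gene_a = [0.5, 1.0, 3.0, 7.9, 8.0]
gene_b = [0.2, 1.5, 3.5, 7.0, 8.0]
image = pair_to_image(gene_a, gene_b)
```

A fixed-size grid like this compresses any number of samples into the same number of features per gene pair and keeps the intensity context, since low-low and high-high co-expression land in different cells.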
Nucleic Acids Research, 2017
Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be ac...
An intuitive graphical visualization technique for the interrogation of transcriptome data
Nucleic Acids Research, 2011
The complexity of gene expression data generated from microarrays and high-throughput sequencing makes their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, there is a lack of easily implemented methods that allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A freely available MATLAB-implemented graphical user interface to perform t-SNE and nearest neighbour plots on genomic data sets is available at www.nimr.mrc.ac.uk/research/james-briscoe/visgenex.
Proteogenomic convergence for understanding cancer pathways and networks
Clinical Proteomics, 2014
During the past several decades, the understanding of cancer at the molecular level has focused primarily on the mechanisms by which signaling molecules transform homeostatically balanced cells into malignant ones within an individual pathway. However, it is becoming more apparent that pathways are dynamic and crosstalk at different control points of the signaling cascades, making the traditional linear signaling models inadequate to interpret complex biological systems. Recent technological advances in high-throughput deep sequencing of human genomes, in proteomic technologies that comprehensively characterize human proteomes, and in multiplexed targeted proteomic assays that measure panels of proteins involved in biologically relevant pathways have enabled significant progress in understanding cancer at the molecular level. It is undeniable that proteomic profiling of differentially expressed proteins under many perturbation conditions, or between normal and "diseased" states, is important to capture a first glance at the overall proteomic landscape, which has been a main focus of proteomics research during the past 15-20 years. However, the research community is gradually shifting its heavy focus from that initial discovery step to protein target verification using multiplexed quantitative proteomic assays capable of measuring changes in proteins and their interacting partners, isoforms, and post-translational modifications (PTMs) in response to stimuli in the context of signaling pathways and protein networks.
With a critical link to genotypes (i.e., high-throughput genomics and transcriptomics data), new and complementary information can be gleaned from multi-dimensional omics data to (1) assess the effect of genomic and transcriptomic aberrations on such complex molecular machinery in the context of cell signaling architectures associated with pathological diseases such as cancer (i.e., from genotype to proteotype to phenotype); and (2) target pathway- and network-driven changes and map the fluctuations of these functional units (proteins) responsible for cellular activities in response to perturbation in a spatiotemporal fashion to better understand cancer biology as a whole system.