Discovering biological progression underlying microarray samples

Peng Qiu et al. PLoS Comput Biol. 2011 Apr.

Abstract

In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that the individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, cell cycle, or disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and, simultaneously, to identify the subsets of genes responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets generated by sampling a biological process at different points along its progression, without providing SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was given no prior knowledge of the samples' time order or of which genes are cell-cycle regulated, yet it recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of the stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages, together with genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
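
The abstract states that SPD identifies subsets of genes alongside the sample ordering, and the Figure 2 caption reports 154 gene modules derived from 3196 genes. The abstract does not say how those modules are formed, so the Python sketch below uses an assumed, deliberately simple rule: a gene joins the first existing module whose every member it correlates with at or above a threshold (0.8 here, an arbitrary choice), and otherwise starts a new module. The names `pearson` and `gene_modules` are hypothetical and this is not presented as SPD's published clustering step.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two genes' expression profiles across samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def gene_modules(expr: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Greedily group genes (rows of `expr`) into co-expressed modules.

    Hypothetical rule for illustration: a gene joins the first module in which
    it correlates with every member at >= threshold; otherwise it starts a new one.
    """
    modules: List[List[int]] = []
    for g, profile in enumerate(expr):
        for module in modules:
            if all(pearson(profile, expr[m]) >= threshold for m in module):
                module.append(g)
                break
        else:
            modules.append([g])
    return modules


if __name__ == "__main__":
    # Toy data: 5 genes x 4 samples; genes 0, 1, 3 rise, genes 2, 4 fall.
    expr = [
        [1, 2, 3, 4],
        [2, 4, 6, 8],
        [4, 3, 2, 1],
        [1.1, 2.0, 3.2, 3.9],
        [8, 6, 4, 2],
    ]
    print(gene_modules(expr))  # [[0, 1, 3], [2, 4]]
```

Because the greedy pass depends on the order in which genes are visited, a real analysis would more likely use an established clustering algorithm; the sketch only illustrates what a "module" of co-expressed genes means here.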

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Sample Progression Discovery (SPD) framework.

Figure 2. SPD applied to a cell cycle gene expression dataset.

(a) Based on an expression matrix of 3196 genes for 17 unordered samples from one cell cycle, SPD derived 154 modules and a progression similarity matrix between them. (b) Zoomed-in view of the progression similarity matrix, highlighting nine modules that are similar in terms of progression. (c) SPD constructed an overall minimum spanning tree (MST) to describe the common progression supported by the nine modules, showing a near-perfect reconstruction of the correct time order. (d) Hierarchical clustering analysis did not recover the correct time order. In (c) and (d), samples were color-coded according to the time point (in hours) at which they were taken: blue corresponds to earlier time points and red to later time points. (e) The average expression of the nine modules across the 17 time points shows that some of the nine modules were uncorrelated (e.g., modules 10 and 30), yet SPD identified them as similar in terms of progression. (f) The mean expression of the nine modules across all three cell cycles. The number in parentheses above each plot is the number of genes in the corresponding module.
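
The caption describes two steps: scoring pairs of modules for progression similarity, and building an MST over samples. The Python sketch below shows one plausible way to combine them, under assumptions that are not taken from the paper: each module's samples are points in Euclidean space over that module's genes, the MST over samples is built with Prim's algorithm, and module B is judged to follow module A's progression when B's sample profiles, placed on A's tree, give a shorter total edge length than random relabellings of the samples do. The resulting empirical p-value (smaller means more similar), the permutation count, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two samples' profiles over one module's genes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points: List[Vector]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete graph of samples; returns MST edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight into the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):  # relax distances from the newly added sample
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def tree_length(edges: List[Tuple[int, int]], points: List[Vector]) -> float:
    """Total edge length of a fixed tree when sample i sits at points[i]."""
    return sum(dist(points[a], points[b]) for a, b in edges)


def fit_pvalue(tree, points, n_perm: int = 999, seed: int = 0) -> float:
    """Fraction of random sample relabellings that fit `tree` at least as well
    (total length <= observed) as the true labels, with a +1 correction."""
    rng = random.Random(seed)
    observed = tree_length(tree, points)
    idx = list(range(len(points)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if tree_length(tree, [points[i] for i in idx]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def progression_pvalue(mod_a: List[Vector], mod_b: List[Vector], n_perm: int = 999) -> float:
    """Symmetric score: small when each module's samples fit the other's MST."""
    p_ab = fit_pvalue(mst_edges(mod_a), mod_b, n_perm)
    p_ba = fit_pvalue(mst_edges(mod_b), mod_a, n_perm)
    return max(p_ab, p_ba)


if __name__ == "__main__":
    # Two toy modules over 10 samples that both change monotonically in time.
    mod_a = [[t, 2 * t] for t in range(10)]
    mod_b = [[3 * t] for t in range(10)]
    # Both modules order the samples identically, so the score is near its
    # minimum of 1 / (n_perm + 1).
    print(progression_pvalue(mod_a, mod_b))
```

Because this score compares how samples are arranged along a tree rather than comparing gene profiles directly, it does not require two modules' average expression to be correlated, which is in the spirit of the observation in panel (e). Taking the maximum of the two directional p-values is one arbitrary way to make the score symmetric.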

Figure 3. SPD applied to gene expression data of B-cell differentiation.

(a) Analysis based on all samples in this dataset. (b) Analysis of normal samples only, after removing the cancer samples and the outliers located next to them. Samples are color-coded according to their classification: HSC (violet), CLP (blue), proB (light blue), preB (green), immature (yellow), naiveB/CB/CC/memoryB/CD19+ (red), preB-ALL (brown). Circles highlight each class of samples.

Figure 4. SPD applied to mouse embryonic stem cell differentiation data.

SPD revealed a landscape of mouse embryonic stem cell differentiation in which samples were perfectly ordered in time, with progressively later stages of differentiating cells radiating outwards from a core cluster of ESC samples. Circles highlight each lineage. Nodes were color-coded by the expression level of a gene module (blue, low; green/yellow, medium; red, high). Module 228 was progressively induced in all differentiating lineages and was enriched for Suz12 and Ezh1 targets. Module 3, enriched for TNF targets, was specifically regulated along a single lineage, the trophoblast.
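
The caption colors each node by the expression level of one gene module. The Python sketch below shows one way to compute such a per-sample module score and bin it into the caption's three color levels. Z-scoring each gene, averaging over the module's genes, and splitting the score range into three equal-width bins are assumptions for illustration; `module_score` and `node_colors` are hypothetical names, and the medium level is labeled `green` here although the caption describes it as green/yellow.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def module_score(expr: Dict[str, Sequence[float]], module: List[str]) -> List[float]:
    """Per-sample average of z-scored expression over the module's genes."""
    n_samples = len(expr[module[0]])
    zs = []
    for gene in module:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with constant expression contributes zeros rather than dividing by 0.
        zs.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(z[s] for z in zs) for s in range(n_samples)]


def node_colors(scores: Sequence[float]) -> List[str]:
    """Map scores to blue/green/red using three equal-width bins from min to max."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 3 or 1.0  # avoid zero width when all scores are equal
    palette = ["blue", "green", "red"]  # low, medium, high
    return [palette[min(int((s - lo) / width), 2)] for s in scores]


if __name__ == "__main__":
    # Toy data: three genes over six samples; g1 and g2 rise together.
    expr = {
        "g1": [0, 1, 2, 3, 4, 5],
        "g2": [0, 2, 4, 6, 8, 10],
        "g3": [5, 5, 5, 5, 5, 5],
    }
    print(node_colors(module_score(expr, ["g1", "g2"])))
    # ['blue', 'blue', 'green', 'green', 'red', 'red']
```

Equal-width bins keep the color scale tied to the module's observed range; quantile bins would be an equally reasonable choice if each color should cover the same number of samples.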

Figure 5. SPD applied to a prostate cancer dataset, deriving a tree structure that describes the underlying progression.

(a) Nodes were color-coded according to their classification: normal, normal adjacent to tumor (NAP), tumor, and metastatic samples (Mets). In (b), (c) and (d), nodes were color-coded by the average expression of modules 2, 32 and 19, respectively, to show how the expression of these modules changes gradually during the progression.
