In silico microdissection of microarray data from heterogeneous cell populations

Harri Lähdesmäki et al. BMC Bioinformatics. 2005.

Abstract

Background: Few analytical approaches have been reported for resolving the variability in microarray measurements that stems from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with surrounding or infiltrating cell types. This heterogeneity hinders further statistical analysis, especially when different samples contain different proportions of these cell types. As a result, sample heterogeneity can lead to the identification of differentially expressed genes that are unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in gene expression based classification.

Results: We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types.

Conclusion: The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample and cell type specific expression values from heterogeneous mixtures and that the mixing percentages of different cell types can also be estimated. Furthermore, a general purpose model selection method can be used to select the correct number of cell types.
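A minimal sketch of the inversion described in the Results paragraph, assuming a linear mixing model in which each measured sample is a fraction-weighted sum of the pure cell-type profiles. The abstract does not state the model explicitly, so the linearity assumption, the function name `estimate_pure_profiles`, and the synthetic data below are illustrative only and not the authors' implementation.

```python
import numpy as np

def estimate_pure_profiles(mixtures, fractions):
    """Least-squares estimate of pure cell-type expression profiles.

    mixtures:  (n_samples, n_genes) expression of the heterogeneous samples
    fractions: (n_samples, n_cell_types) known mixing fractions, rows sum to 1
    returns:   (n_cell_types, n_genes) estimated pure profiles
    """
    mixtures = np.asarray(mixtures, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    # Assumed model: mixtures ~= fractions @ pure; solve for pure gene by gene.
    pure, *_ = np.linalg.lstsq(fractions, mixtures, rcond=None)
    return pure

# Synthetic, noise-free example: two cell types, four genes, five mixtures.
rng = np.random.default_rng(0)
true_pure = rng.uniform(1.0, 10.0, size=(2, 4))
tumour_frac = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
fractions = np.column_stack([tumour_frac, 1.0 - tumour_frac])
mixtures = fractions @ true_pure
estimated = estimate_pure_profiles(mixtures, fractions)
```

With at least as many samples as cell types and mixing fractions that differ between samples, the system is overdetermined and the least-squares solution is unique. With noisy data the estimate is only approximate.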

Figures

Figure 1

Results of the sample heterogeneity inversion in the 2-dimensional PCA space. All five heterogeneous samples are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Symbols: estimated expression profiles of the pure colon cancer cells and lymphocytes (gray stars), mixture samples (green triangles), and reference samples (red circles). The label next to each green triangle (resp. red circle) denotes the number of the heterogeneous (resp. reference) sample, e.g., 'm1' = mixture sample #1 and 'r1' = reference sample #1 (see also Table 3). The estimated expression profiles of the pure colon cancer cells and lymphocytes are labelled 'e1' and 'e5', respectively. See text for further details.

Figure 2

Results of the sample heterogeneity inversion in the 1-dimensional PCA space. (a) All five heterogeneous samples, and (b) only the heterogeneous samples #2, #3, and #4 are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. The height of each bar corresponds to the value of the most significant PCA component. Each bar corresponds to a heterogeneous sample, reference sample, or estimated expression profile and is labelled with the corresponding text.
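The PCA projections used in these figures can be sketched as follows. This is an illustrative example only: it assumes the mixture, reference, and estimated profiles are stacked as rows of one matrix and projected together, which the captions do not confirm. The function name `pca_scores` and the random data are hypothetical.

```python
import numpy as np

def pca_scores(profiles, n_components=2):
    """Project row-wise expression profiles onto their leading principal components."""
    X = np.asarray(profiles, dtype=float)
    centered = X - X.mean(axis=0)          # remove the per-gene mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

# Hypothetical data: 12 profiles (mixtures, references, estimates) over 50 genes.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(12, 50))
scores_2d = pca_scores(profiles, n_components=2)   # coordinates for a 2-D scatter plot
scores_1d = pca_scores(profiles, n_components=1)   # bar heights for a 1-D plot
```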

Figure 3

Evolution of the value of the objective function. The red (resp. blue) graph corresponds to the value of the objective function after step 2 (resp. step 3).

Figure 4

Results of the combined sample heterogeneity inversion and the estimation of the most likely values of the mixing parameters in the 2-dimensional PCA space. All five heterogeneous samples are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Symbols: estimated expression profiles (gray stars), mixture samples (green triangles), and reference samples (red circles). See text for further details.

Figure 5

Results of the combined sample heterogeneity inversion and the estimation of the most likely values of the mixing parameters in the 1-dimensional PCA space. (a) All five heterogeneous samples, and (b) only the heterogeneous samples #2, #3, and #4 are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Each bar corresponds to a heterogeneous sample, reference sample, or estimated expression profile and is labelled with the corresponding text. The height of each bar corresponds to the value of the most significant PCA component.

Figure 6

Estimated 90 % confidence intervals for the estimated expression values of the pure cell types. The horizontal and vertical axes correspond to the fraction of lymph node cells and the normalized expression value, respectively. Symbols: the measured expression values (blue circles), the estimated expression values of the pure cell types (red stars), regression-based confidence intervals (red points), and bootstrap-based confidence intervals (red x-marks).
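A bootstrap interval of the kind mentioned in this caption might be computed as in the sketch below. It is not the authors' procedure: it assumes a linear mixing model for a single gene and uses a percentile residual bootstrap, one common variant. The function name `bootstrap_ci` and the synthetic values are hypothetical.

```python
import numpy as np

def bootstrap_ci(y, fractions, n_boot=2000, level=0.90, seed=0):
    """Percentile residual-bootstrap interval for one gene's pure-cell expression values.

    y:         (n_samples,) measured expression of the gene in each mixture
    fractions: (n_samples, n_cell_types) mixing fractions
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    A = np.asarray(fractions, dtype=float)
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    fitted = A @ coef
    resid = y - fitted
    boot = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        # Resample residuals with replacement and refit the same model.
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        boot[b] = np.linalg.lstsq(A, y_star, rcond=None)[0]
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [alpha, 1.0 - alpha], axis=0)
    return coef, lower, upper

# Synthetic example: five mixtures with known lymphocyte fractions.
rng = np.random.default_rng(2)
lymph_frac = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
A = np.column_stack([1.0 - lymph_frac, lymph_frac])
y = A @ np.array([3.0, 7.0]) + rng.normal(0.0, 0.2, size=5)
estimate, lower, upper = bootstrap_ci(y, A)
```

With only a handful of samples the resampled residuals take few distinct values, so intervals obtained this way should be read as rough indications.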

Figure 7

Detecting differentially expressed genes. The figure shows a set of genes that are not found to be significantly differentially expressed based on the heterogeneous measurements (samples #2 and #4, blue circles). After the inversion of the mixing effect, however, the expression difference between the estimated pure colon cancer cells and lymphocytes (red stars) meets an even more stringent criterion of differential expression. The horizontal and vertical axes correspond to the fraction of lymph node cells and the normalized expression value, respectively. Symbols: the heterogeneous samples (blue circles), the estimated expression values (red stars), and the measured expression values of the pure colon cancer cells (blue squares). See text for more details.

Figure 8

The two-step optimization algorithm. Details of the two-step algorithm used for the optimization problem shown in Equation (4).
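Equation (4) and the algorithm itself are not reproduced on this page, so the following is only a generic sketch of how a joint estimation of mixing fractions and pure profiles could alternate between two steps. It assumes a linear mixing model with a squared-error objective, and it swaps in a simple heuristic for the constrained step: an unconstrained least-squares solve followed by projection of each sample's fractions onto the probability simplex. The names `project_to_simplex` and `alternating_fit` are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def alternating_fit(mixtures, init_fractions, n_iter=50):
    """Alternate between updating pure profiles and mixing fractions."""
    mixtures = np.asarray(mixtures, dtype=float)
    fractions = np.array(init_fractions, dtype=float)
    history = []
    for _ in range(n_iter):
        # Fractions fixed: least-squares update of the pure profiles.
        pure = np.linalg.lstsq(fractions, mixtures, rcond=None)[0]
        # Profiles fixed: least-squares update of each sample's fractions,
        # then projection so they are non-negative and sum to one.
        raw = np.linalg.lstsq(pure.T, mixtures.T, rcond=None)[0]
        fractions = np.apply_along_axis(project_to_simplex, 1, raw.T)
        history.append(float(np.sum((mixtures - fractions @ pure) ** 2)))
    return pure, fractions, history

# Synthetic example: two cell types, 30 genes, five mixtures, perturbed start.
rng = np.random.default_rng(1)
true_pure = rng.uniform(1.0, 10.0, size=(2, 30))
frac = np.array([0.9, 0.7, 0.5, 0.3, 0.1])
true_fractions = np.column_stack([frac, 1.0 - frac])
mixtures = true_fractions @ true_pure
noisy = true_fractions + rng.normal(0.0, 0.1, size=true_fractions.shape)
start = np.apply_along_axis(project_to_simplex, 1, noisy)
pure_hat, fractions_hat, history = alternating_fit(mixtures, start)
```

Because the projection is a heuristic, the recorded objective values in `history` are not guaranteed to decrease at every iteration, and the result depends on the starting fractions. Without additional constraints the factorization is also generally not unique, so recovered fractions need not match the true ones.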
