In silico microdissection of microarray data from heterogeneous cell populations
Harri Lähdesmäki et al. BMC Bioinformatics. 2005.
Abstract
Background: Very few analytical approaches have been reported to resolve the variability in microarray measurements stemming from sample heterogeneity. For example, tissue samples used in cancer studies are usually contaminated with the surrounding or infiltrating cell types. This heterogeneity in the sample preparation hinders further statistical analysis, significantly so if different samples contain different proportions of these cell types. Thus, sample heterogeneity can result in the identification of differentially expressed genes that may be unrelated to the biological question being studied. Similarly, irrelevant gene combinations can be discovered in the case of gene expression based classification.
Results: We propose a computational framework for removing the effects of sample heterogeneity by "microdissecting" microarray data in silico. The computational method provides estimates of the expression values of the pure (non-heterogeneous) cell samples. The inversion of the sample heterogeneity can be facilitated by providing accurate estimates of the mixing percentages of different cell types in each measurement. For those cases where no such information is available, we develop an optimization-based method for joint estimation of the mixing percentages and the expression values of the pure cell samples. We also consider the problem of selecting the correct number of cell types.
Conclusion: The efficiency of the proposed methods is illustrated by applying them to carefully controlled cDNA microarray data obtained from heterogeneous samples. The results demonstrate that the methods are capable of reconstructing both the sample-specific and the cell-type-specific expression values from heterogeneous mixtures, and that the mixing percentages of the different cell types can also be estimated. Furthermore, a general-purpose model selection method can be used to select the correct number of cell types.
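The inversion with known mixing percentages can be illustrated with a minimal sketch. It assumes a linear mixing model, in which each heterogeneous measurement is the fraction-weighted sum of the pure cell-type profiles, and it solves for the pure profiles by ordinary least squares. The model form, the function name `estimate_pure_profiles`, and the synthetic data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def estimate_pure_profiles(mixed, fractions):
    """Least-squares estimate of pure cell-type expression profiles.

    mixed:     (n_samples, n_genes) expression of the heterogeneous samples
    fractions: (n_samples, n_types) mixing fractions, each row summing to 1
    returns:   (n_types, n_genes) estimated pure profiles
    """
    mixed = np.asarray(mixed, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    # Solve fractions @ pure ~= mixed for all genes at once.
    pure, *_ = np.linalg.lstsq(fractions, mixed, rcond=None)
    return pure

# Synthetic illustration (assumed data): two cell types, five mixtures.
rng = np.random.default_rng(0)
true_pure = rng.uniform(0.0, 10.0, size=(2, 50))
alpha = np.array([0.9, 0.75, 0.5, 0.25, 0.1])      # fraction of cell type 1
fractions = np.column_stack([alpha, 1.0 - alpha])
mixed = fractions @ true_pure + rng.normal(0.0, 0.01, size=(5, 50))

estimated = estimate_pure_profiles(mixed, fractions)
```

With at least as many samples as cell types and sufficiently different mixing fractions across samples, the least-squares problem is well posed; samples with nearly identical fractions carry little information for separating the profiles.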
Figures
Figure 1
Results of the sample heterogeneity inversion in the 2-dimensional PCA space. All five heterogeneous samples are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Symbols: estimated expression profiles of the pure colon cancer cells and lymphocytes (gray stars), mixture samples (green triangles), and reference samples (red circles). The label next to each green triangle (resp. red circle) denotes the number of the heterogeneous (resp. reference) sample, e.g., 'm1' = mixture sample #1 and 'r1' = reference sample #1 (see also Table 3). The estimated expression profiles of the pure colon cancer cells and lymphocytes are labelled 'e1' and 'e5', respectively. See text for further details.
Figure 2
Results of the sample heterogeneity inversion in the 1-dimensional PCA space. (a) All five heterogeneous samples, and (b) only the heterogeneous samples #2, #3, and #4 are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. The height of each bar corresponds to the value of the most significant PCA component. Each bar corresponds to a heterogeneous sample, reference sample, or estimated expression profile and is labelled with the corresponding text.
Figure 3
Evolution of the value of the objective function. The red (resp. blue) graph corresponds to the value of the objective function after step 2 (resp. step 3).
Figure 4
Results of the combined sample heterogeneity inversion and the estimation of the most likely values of the mixing parameters in the 2-dimensional PCA space. All five heterogeneous samples are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Symbols: estimated expression profiles (gray stars), mixture samples (green triangles), and reference samples (red circles). See text for further details.
Figure 5
Results of the combined sample heterogeneity inversion and the estimation of the most likely values of the mixing parameters in the 1-dimensional PCA space. (a) All five heterogeneous samples, and (b) only the heterogeneous samples #2, #3, and #4 are used to estimate the expression profiles of the pure colon cancer cells and lymphocytes. Each bar corresponds to a heterogeneous sample, reference sample, or estimated expression profile and is labelled with the corresponding text. The height of each bar corresponds to the value of the most significant PCA component.
Figure 6
Estimated 90 % confidence intervals for the estimated expression values of the pure cell types. The horizontal and vertical axes correspond to the fraction of lymph node cells and the normalized expression value, respectively. Symbols: the measured expression values (blue circles), the estimated expression values of the pure cell types (red stars), regression-based confidence intervals (red points), and bootstrap-based confidence intervals (red x-marks).
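A bootstrap-based interval of the kind mentioned in this caption could be computed along the following lines. The sketch assumes a linear mixing model fitted by least squares and uses a residual bootstrap with percentile intervals; the caption does not say which bootstrap scheme the authors used, so the scheme, the function name `bootstrap_ci`, and the synthetic data are all assumptions.

```python
import numpy as np

def bootstrap_ci(y, fractions, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap intervals for the pure expression values of one gene.

    y:         (n_samples,) measured expression of one gene
    fractions: (n_samples, n_types) mixing fractions
    returns:   (estimate, lower, upper), each of shape (n_types,)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    estimate = np.linalg.lstsq(fractions, y, rcond=None)[0]
    fitted = fractions @ estimate
    residuals = y - fitted
    boot = np.empty((n_boot, fractions.shape[1]))
    for b in range(n_boot):
        # Residual bootstrap: resample residuals with replacement and refit.
        resampled = rng.choice(residuals, size=residuals.size, replace=True)
        boot[b] = np.linalg.lstsq(fractions, fitted + resampled, rcond=None)[0]
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [tail, 1.0 - tail], axis=0)
    return estimate, lower, upper

# Synthetic illustration (assumed data): one gene measured in five mixtures.
rng = np.random.default_rng(1)
alpha = np.array([0.9, 0.75, 0.5, 0.25, 0.1])
fractions = np.column_stack([alpha, 1.0 - alpha])
y = fractions @ np.array([2.0, 8.0]) + rng.normal(0.0, 0.2, size=5)

estimate, lower, upper = bootstrap_ci(y, fractions)
```

Resampling residuals rather than whole samples keeps the set of mixing fractions fixed in every replicate, which matters when only a handful of mixtures is available.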
Figure 7
Detecting differentially expressed genes. The figure shows a set of genes that are not found to be significantly differentially expressed based on the heterogeneous measurements (samples #2 and #4, blue circles). After the inversion of the mixing effect, however, the expression differences between the estimated pure colon cancer cells and lymphocytes (red stars) meet an even more stringent criterion of differential expression. The horizontal and vertical axes correspond to the fraction of lymph node cells and the normalized expression value, respectively. Symbols: the heterogeneous samples (blue circles), the estimated expression values (red stars), and the measured expression values of the pure colon cancer cells (blue squares). See text for more details.
Figure 8
The two-step optimization algorithm. Details of the two-step algorithm used for the optimization problem shown in Equation (4).
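Equation (4) and the algorithm details are not reproduced on this page, so the following is only a generic sketch of how a joint estimation of mixing fractions and pure profiles can alternate between two updates. It assumes two cell types, a linear mixing model, and a squared reconstruction error as the objective; the function names and the synthetic data are hypothetical, and the steps are not claimed to match those in the figure.

```python
import numpy as np

def objective(mixed, alpha, pure):
    """Squared reconstruction error of the two-type linear mixing model."""
    fractions = np.column_stack([alpha, 1.0 - alpha])
    return float(np.sum((mixed - fractions @ pure) ** 2))

def alternate(mixed, alpha0, n_iter=50):
    """Alternate between updating pure profiles and mixing fractions."""
    mixed = np.asarray(mixed, dtype=float)
    alpha = np.clip(np.asarray(alpha0, dtype=float), 0.0, 1.0)
    history = []
    for _ in range(n_iter):
        # Profile update: fractions fixed, least squares for the pure profiles.
        fractions = np.column_stack([alpha, 1.0 - alpha])
        pure = np.linalg.lstsq(fractions, mixed, rcond=None)[0]
        # Fraction update: profiles fixed, project each sample onto the
        # segment between the two profiles and clip to [0, 1].
        diff = pure[0] - pure[1]
        denom = max(float(diff @ diff), 1e-12)   # guard against identical profiles
        alpha = np.clip((mixed - pure[1]) @ diff / denom, 0.0, 1.0)
        history.append(objective(mixed, alpha, pure))
    return alpha, pure, history

# Synthetic illustration (assumed data) with a rough initial guess.
rng = np.random.default_rng(2)
true_pure = rng.uniform(0.0, 10.0, size=(2, 50))
true_alpha = np.array([0.9, 0.75, 0.5, 0.25, 0.1])
mixed = np.column_stack([true_alpha, 1.0 - true_alpha]) @ true_pure
mixed = mixed + rng.normal(0.0, 0.05, size=mixed.shape)

alpha_hat, pure_hat, history = alternate(mixed, alpha0=[0.8, 0.7, 0.5, 0.3, 0.2])
```

Each update minimizes the objective exactly with the other block held fixed, so the recorded objective values never increase. Without additional constraints or prior information the joint problem generally has more than one minimizer, so the result depends on the initial guess.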
Similar articles
- Evaluating concentration estimation errors in ELISA microarray experiments.
  Daly DS, White AM, Varnum SM, Anderson KK, Zangar RC. BMC Bioinformatics. 2005 Jan 26;6:17. doi: 10.1186/1471-2105-6-17. PMID: 15673468. Free PMC article.
- Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data.
  Shedden K, Chen W, Kuick R, Ghosh D, Macdonald J, Cho KR, Giordano TJ, Gruber SB, Fearon ER, Taylor JM, Hanash S. BMC Bioinformatics. 2005 Feb 10;6:26. doi: 10.1186/1471-2105-6-26. PMID: 15705192. Free PMC article.
- Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study.
  Gaujoux R, Seoighe C. Infect Genet Evol. 2012 Jul;12(5):913-21. doi: 10.1016/j.meegid.2011.08.014. Epub 2011 Sep 10. PMID: 21930246.
- Microarray data analysis: from disarray to consolidation and consensus.
  Allison DB, Cui X, Page GP, Sabripour M. Nat Rev Genet. 2006 Jan;7(1):55-65. doi: 10.1038/nrg1749. PMID: 16369572. Review.
- Where statistics and molecular microarray experiments biology meet.
  Kelmansky DM. Methods Mol Biol. 2013;972:15-35. doi: 10.1007/978-1-60327-337-4_2. PMID: 23385529. Review.
Cited by
- FastMix: a versatile data integration pipeline for cell type-specific biomarker inference.
  Zhang Y, Sun H, Mandava A, Aevermann BD, Kollmann TR, Scheuermann RH, Qiu X, Qian Y. Bioinformatics. 2022 Oct 14;38(20):4735-4744. doi: 10.1093/bioinformatics/btac585. PMID: 36018232. Free PMC article.
- DEBay: A computational tool for deconvolution of quantitative PCR data for estimation of cell type-specific gene expression in a mixed population.
  Devaraj V, Bose B. Heliyon. 2020 Jul 22;6(7):e04489. doi: 10.1016/j.heliyon.2020.e04489. eCollection 2020 Jul. PMID: 32728643. Free PMC article.
- ADAPTS: Automated deconvolution augmentation of profiles for tissue specific cells.
  Danziger SA, Gibbs DL, Shmulevich I, McConnell M, Trotter MWB, Schmitz F, Reiss DJ, Ratushny AV. PLoS One. 2019 Nov 19;14(11):e0224693. doi: 10.1371/journal.pone.0224693. eCollection 2019. PMID: 31743345. Free PMC article.
- Quantifying tumor-infiltrating immune cells from transcriptomics data.
  Finotello F, Trajanoski Z. Cancer Immunol Immunother. 2018 Jul;67(7):1031-1040. doi: 10.1007/s00262-018-2150-z. Epub 2018 Mar 14. PMID: 29541787. Free PMC article. Review.
- A sequential Monte Carlo approach to gene expression deconvolution.
  Ogundijo OE, Wang X. PLoS One. 2017 Oct 19;12(10):e0186167. doi: 10.1371/journal.pone.0186167. eCollection 2017. PMID: 29049343. Free PMC article.