Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments

Andrew McDavid et al. Bioinformatics. 2013.

Abstract

Motivation: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell quantitative polymerase chain reaction (qPCR) data.

Results: We present a statistical framework for the exploration, quality control and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. Although developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events.
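The combined test can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the SingleCellAssay implementation: it assumes two groups of cells, values already on a log-like scale in which 0 means the gene is off, a Bernoulli model for the on/off component, and a normal model with a variance shared by both groups for the nonzero values. The function names (`bernoulli_loglik`, `residual_ss`, `combined_lrt`) are hypothetical. Because the likelihood factorizes into a discrete and a continuous part, the two 1-df LRT statistics are summed and, under the usual asymptotic assumptions, referred to a chi-square distribution with 2 df, whose survival function is exp(-x/2).

```python
import math


def bernoulli_loglik(k, n):
    """Binomial log-likelihood of k 'on' wells out of n at the MLE p = k/n."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def residual_ss(values):
    """Sum of squared deviations from the sample mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def combined_lrt(group_a, group_b):
    """Two-group LRT with a discrete (on/off) and a continuous (nonzero) part.

    Assumption: 0 marks 'off'; positive values are expression levels.
    Returns both component statistics with 1-df p-values, plus the
    combined statistic with its 2-df p-value.
    """
    pos_a = [v for v in group_a if v > 0]
    pos_b = [v for v in group_b if v > 0]
    if len(pos_a) < 2 or len(pos_b) < 2:
        raise ValueError("need at least two nonzero values per group")

    # Discrete part: H0 says both groups share one frequency of expression.
    stat_d = 2 * (bernoulli_loglik(len(pos_a), len(group_a))
                  + bernoulli_loglik(len(pos_b), len(group_b))
                  - bernoulli_loglik(len(pos_a) + len(pos_b),
                                     len(group_a) + len(group_b)))

    # Continuous part: normal model on nonzero values with a shared variance
    # (an assumption); H0 says one common mean. With MLE variances the LRT
    # statistic reduces to m * log(RSS0 / RSS1).
    rss1 = residual_ss(pos_a) + residual_ss(pos_b)
    rss0 = residual_ss(pos_a + pos_b)
    if rss1 == 0:
        raise ValueError("nonzero values have no within-group variation")
    m = len(pos_a) + len(pos_b)
    stat_c = m * math.log(rss0 / rss1)

    # Clamp tiny negative round-off; both statistics are >= 0 in exact arithmetic.
    stat_d, stat_c = max(stat_d, 0.0), max(stat_c, 0.0)
    stat = stat_d + stat_c
    return {
        "stat_discrete": stat_d,
        "p_discrete": math.erfc(math.sqrt(stat_d / 2)),    # chi-square, 1 df
        "stat_continuous": stat_c,
        "p_continuous": math.erfc(math.sqrt(stat_c / 2)),  # chi-square, 1 df
        "stat_combined": stat,
        "p_combined": math.exp(-stat / 2),                 # chi-square, 2 df
    }


if __name__ == "__main__":
    # Hypothetical expression values for one gene in two treatment groups.
    result = combined_lrt([0, 0, 0, 5.0, 6.0, 7.0], [0, 8.0, 9.0, 10.0, 8.5, 9.5])
    print(result["p_discrete"], result["p_continuous"], result["p_combined"])
```

The closed-form tail probabilities keep the sketch dependency-free; with SciPy available, `scipy.stats.chi2.sf(x, df)` gives the same values.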

Availability: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay).


Figures

Fig. 1.

Histogram and theoretical (normal) distribution of [formula] for single-cell (left, light gray) and 100-cell experiments (right, dark gray). Genes FASLG, IFN-γ, BIRC3 and CD69 are depicted. The frequency of expression of each gene in the single-cell experiments is printed above each histogram. The mean of the 100-cell and single-cell experiments is indicated by a thick black line along the _x_-axis.
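The frequency of expression is the fraction of single-cell wells in which a gene is detected. A minimal Python sketch, assuming a value of 0 marks a well with no detected expression (the function name `expression_summary` is hypothetical), shows how it relates to the two means one could compute for a gene; which mean the figure draws is not specified here.

```python
def expression_summary(values):
    """Summaries for one gene across single-cell wells.

    `values` holds one expression measurement per well; 0 is assumed to mean
    the gene was not detected (the 'off' state). Returns a tuple
    (frequency of expression, mean over all wells, mean over expressed wells).
    """
    if not values:
        raise ValueError("no wells")
    expressed = [v for v in values if v > 0]
    freq = len(expressed) / len(values)
    mean_all = sum(values) / len(values)
    mean_on = sum(expressed) / len(expressed) if expressed else float("nan")
    return freq, mean_all, mean_on


# The mean over all wells equals freq * mean_on, so zeros pull it down
# in proportion to how often the gene is off. Values are hypothetical.
freq, mean_all, mean_on = expression_summary([0, 0, 4.0, 6.0])
```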

Fig. 2.

Concordance between the 100-cell expression value and the in silico average of single-cell wells, for datasets A, B and C. In the top row, wells with [formula] are included and treated as exact zeroes. In the middle row, they are excluded, resulting in a clear lack of concordance. In the final row, wells are filtered as per Section 2.3. Dark, thin lines connect each gene's location before filtering to its location after filtering. Each panel reports the concordance correlation coefficient and the average weighted squared deviation of the expression measurements. The dotted black line shows a loess fit through the data. In all cases, expression values are transformed using a shifted log-transformation, log(x + 1), so a graphed value of zero corresponds to a zero expression value.
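Assuming the standard definition due to Lin, the concordance correlation coefficient is rho_c = 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2); unlike Pearson's correlation, it penalizes any departure from the identity line, not just scatter. The Python sketch below computes it after a log(x + 1) transform. It assumes the 100-cell and single-cell values are already on a comparable per-cell scale and that the in silico average is the arithmetic mean of single-cell wells with zeros kept; the helper names and example numbers are hypothetical.

```python
import math


def shifted_log(x):
    """log(x + 1): a zero expression value stays at zero after transforming."""
    return math.log(x + 1)


def in_silico_average(single_cell_wells):
    """Arithmetic mean over single-cell wells, zeros included (assumption)."""
    return sum(single_cell_wells) / len(single_cell_wells)


def concordance_correlation(xs, ys):
    """Lin's concordance correlation coefficient, using 1/n moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical data: one 100-cell value and three single-cell wells per gene.
hundred_cell = [3.0, 10.0, 0.5]
single_cells = [[0, 4.0, 5.0], [12.0, 9.0, 8.0], [0, 0, 1.5]]
x = [shifted_log(v) for v in hundred_cell]
y = [shifted_log(in_silico_average(w)) for w in single_cells]
ccc = concordance_correlation(x, y)
```

A constant offset between the two measurements lowers rho_c even when Pearson's correlation is exactly 1, which is why it suits a concordance check.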

Fig. 3.

Number of discoveries (genes × units) versus FDR, by treatment, dataset A. The combined LRT is compared with a Bernoulli-only or normal-theory-only LRT, as well as with a _t_-test of the raw expression values ([formula] scale), including zero measurements.
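A number-of-discoveries curve can be sketched by counting, at each FDR level, how many tests a multiple-testing procedure rejects. The Python sketch below assumes the Benjamini-Hochberg step-up procedure; the exact FDR procedure used for the figure is not stated in this abstract, so treat this as one plausible choice. The function names are hypothetical.

```python
def bh_discoveries(pvalues, fdr):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    The count is the largest rank i (1-based, p-values sorted ascending)
    with p_(i) <= fdr * i / m, or 0 if no rank qualifies.
    """
    m = len(pvalues)
    count = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * i / m:
            count = i
    return count


def discovery_curve(pvalues, fdr_grid):
    """Discoveries at each FDR level: one curve per test, as in such a plot."""
    return [bh_discoveries(pvalues, q) for q in fdr_grid]
```

Running `discovery_curve` on the p-values from each test (combined, Bernoulli-only, normal-only, t-test) over the same FDR grid yields one curve per test, and a curve that sits higher at a given FDR indicates more discoveries at that level.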

Fig. 4.

[formula] of tests (genes × units) versus the frequencies of expression of the genes. The Bernoulli, normal-theory and combined LRTs are plotted. An asterisk indicates that a test differs from the combined test at 5% significance in a Wilcoxon signed-rank test.
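The asterisks rest on a Wilcoxon signed-rank test that compares paired per-test results from two procedures. Below is a stdlib-only Python sketch using the large-sample normal approximation without tie or continuity correction; what exactly is paired (for example, per-gene statistics from the Bernoulli test and from the combined test) is an assumption, and the function name is hypothetical. `scipy.stats.wilcoxon` provides exact and continuity-corrected variants.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test, normal approximation.

    Zero differences are dropped and tied |differences| get average ranks.
    No tie or continuity correction is applied. Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |differences|, assigning the average 1-based rank to each tie group.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p
```

A test would be flagged with an asterisk when the returned p-value falls below 0.05; with few genes, an exact test is preferable to this approximation.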

Fig. 5.

Heatmap of signed [formula] for selected genes (rows, see main text) and all 16 individuals (columns). The color above each column indicates the antigen stimulation applied to the cells; individuals are arranged randomly within each antigen block. Red and purple are two different CMV antigen pools; yellow and orange are two different HIV antigen pools.
