A regression model approach to enable cell morphology correction in high-throughput flow cytometry

Theo A Knijnenburg et al. Mol Syst Biol. 2011.

Abstract

Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an 'unresponsive population,' may provide the population with the advantage of diversified bet hedging.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1

Compensating for the effect of cell size and cell granularity using regression. In this example, the experiment consists of two biological samples (sample 1 and sample 2). During preprocessing in step 1, spurious events (depicted in gray) are discarded. In step 2, the FSC and SSC measurements are used to estimate the density of cells in the two-dimensional FSC/SSC space. The regression model of FL on FSC and SSC for each sample is indicated by the colored lines in step 3. (For visualization purposes, only the FSC is depicted as an independent variable. The SSC is also an independent variable and the actual regression model represents a surface, not a curve.) The average fluorescence intensity for each sample is computed by evaluating the regression model across the complete two-dimensional FSC/SSC space and weighting each location in this space by its corresponding density (estimated in step 2) before averaging. The colors within the regression lines indicate the weights and are directly related to the colors of the density estimate in step 2. The average fluorescence values are indicated by the green and purple crosses on the y axis for samples 1 and 2, respectively. Step 4 depicts a histogram of the fluorescence intensities compensated for the effect of cell size and cell granularity (as measured by FSC and SSC, respectively). The values are obtained as the residuals (distances from the regression model) offset by the average fluorescence intensity.
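The steps in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. It rests on several assumptions: the regression surface is taken to be a quadratic polynomial in FSC and SSC, the density in step 2 is approximated with a plain two-dimensional histogram rather than a kernel density estimate, and that density is assumed to be pooled over all samples so every sample is averaged over the same FSC/SSC distribution. All function names are hypothetical.

```python
import numpy as np

def design(fsc, ssc):
    # Assumed model form: a quadratic surface in FSC and SSC.
    return np.column_stack(
        [np.ones_like(fsc), fsc, ssc, fsc * ssc, fsc**2, ssc**2]
    )

def fit_surface(fsc, ssc, fl):
    # Step 3: least-squares regression of fluorescence (FL) on FSC and SSC.
    coef, *_ = np.linalg.lstsq(design(fsc, ssc), fl, rcond=None)
    return coef

def grid_density(fsc, ssc, bins=20):
    # Step 2 (stand-in): a normalized 2D histogram as the density estimate.
    hist, f_edges, s_edges = np.histogram2d(fsc, ssc, bins=bins)
    f_centers = 0.5 * (f_edges[:-1] + f_edges[1:])
    s_centers = 0.5 * (s_edges[:-1] + s_edges[1:])
    gf, gs = np.meshgrid(f_centers, s_centers, indexing="ij")
    return gf.ravel(), gs.ravel(), (hist / hist.sum()).ravel()

def corrected_fluorescence(fsc, ssc, fl, gf, gs, weights):
    coef = fit_surface(fsc, ssc, fl)
    # Density-weighted average of the fitted surface over the FSC/SSC grid.
    mean_fl = float(np.sum(weights * (design(gf, gs) @ coef)))
    # Step 4: residuals from the surface, offset by the average intensity.
    residuals = fl - design(fsc, ssc) @ coef
    return mean_fl, residuals + mean_fl
```

In this sketch, `grid_density` would be called once on the FSC and SSC values of all samples combined, and `corrected_fluorescence` once per sample with that shared grid. Two samples that differ only in cell size and granularity then receive the same average fluorescence.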

Figure 2

Analysis of Gal1–GFP in response to galactose in WT and mutant strains. (A) Fraction of responding cells over time after a galactose induction at _t_=0. (B) Mean intensity of all cells. (C) CV of active cells. (D) Fraction of cells gated or used for the regression model. All results show the average of three replicates. Error bars have been omitted for clarity, but they are similar between the two methods. Panels (A, B) correspond to Figure 2 of Ramsey et al (2006).

Figure 3

Comparing the variance components between gating and regression. (A) The blue polygon represents the median and the interquartile range of the variance of the fluorescence after gating with different sizes of the gate (x axis). The red lines indicate the median and the interquartile range of the variance of the fluorescence after regression. (The latter median and interquartile values are constants that do not depend on the x axis, since the regression model uses all data points; they are depicted as lines for representation only.) The green dashed line indicates the median of the variance of the fluorescence after regression with a simple linear model, including only linear effects of FSC and SSC. (B) Distribution of the mean-subtracted fluorescence intensities based on gates with different sizes (shades of blue) and the regression model (red) for one biological sample (_n_=4469). The distribution is computed by normalized histogram binning using 50 bins and is represented by a continuous line that connects the centers of the histogram bins. (C) Histogram of the difference in variance between the regression model and the ‘converged’ gate across all biological samples.

Figure 4

Scatter plots of bar-coded samples. (A) Density plot of the FL2 and FL3 raw data with their corresponding histograms of a mixture of 6 × 6 populations stained with different concentrations of PACblu-NHS (in FL2) and Alexa 488-NHS (in FL3). The gray lines (minima in the histograms) separate the 36 populations. Colors represent density of cells with a gradation from red (more dense) to blue (less dense). This figure is similar to Figure 3A in Krutzik and Nolan (2006). (B) Identical to (A), except the FL2 and FL3 data are compensated for cell size and granularity using the regression model.

Figure 5

Comparing biological samples without overlap in the FSC/SSC space. (A) The cells of one biological sample in the FSC/SSC space are split into two parts (green and magenta) using the first principal component axis (black line). Inlay: the distributions of the fluorescence densities obtained from the regression model. The green and magenta lines represent the two ‘halved’ samples, while the black line represents the distribution of the complete sample. The crosses represent the means (i.e. the average fluorescence intensities). In this case, the means of the magenta and green distributions differ by 0.6 and 5.0% from the mean of the whole sample and by 0.06 and 0.23 in terms of absolute difference in density, respectively. (B) Histogram of the deviation between the correct average (of the non-split sample) and the ‘halved’ samples across all biological samples. (C) Histogram of the L1 (absolute) difference in density between the ‘halved’ samples and the whole sample across all biological samples. These absolute differences range from 0 (identical densities) to 2 (completely different densities).
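The 0-to-2 range of the L1 difference in panel (C) follows from comparing two distributions that each sum to 1. The sketch below illustrates this with normalized histograms on shared bin edges. The choice of histograms with 50 bins is an assumption made here for illustration (the caption does not say how the densities were discretized), and the function name is hypothetical.

```python
import numpy as np

def l1_density_difference(a, b, bins=50):
    # Shared bin edges so both histograms are compared bin by bin.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    pa, _ = np.histogram(a, bins=edges)
    pb, _ = np.histogram(b, bins=edges)
    # Normalize counts so each histogram sums to 1.
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    # 0 for identical histograms, 2 when no bin is shared.
    return float(np.abs(pa - pb).sum())
```

Applied to the corrected fluorescence values of a ‘halved’ sample and of the whole sample, a value near 0 would indicate that the regression model recovers nearly the same distribution from only part of the FSC/SSC space.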

Figure 6

Decision tree divides the 148 deletion strains into 8 clusters. The green numbers near the red arrows indicate the number of deletion strains in the branch of the tree. For clusters 1 and 2, a heatmap representation of the expression over time is shown for one gene in each cluster. For clusters 3–8, a histogram representation is shown of one gene per cluster. The dashed lines are aligned to the mean of the WT distribution for unimodal time points. For the bimodal WT time points (i.e., 6, 8, 10 and 12 h), the lines are aligned to the mean of the high expressing distribution.

References

    1. Acar M, Mettetal JT, van Oudenaarden A (2008) Stochastic switching as a survival strategy in fluctuating environments. Nat Genet 40: 471–475
    2. Batenchuk C, St-Pierre S, Tepliakova L, Adiga S, Szuto A, Kabbani N, Bell JC, Baetz K, Kærn M (2011) Chromosomal position effects are linked to sir2-mediated variation in transcriptional burst size. Biophys J 100: L56
    3. Botev Z, Grotowski J, Kroese D (2010) Kernel density estimation via diffusion. Ann Stat 38: 2916–2957
    4. Brickner DG, Cajigas I, Fondufe-Mittendorf Y, Ahmed S, Lee PC, Widom J, Brickner JH (2007) H2A.Z-mediated localization of genes at the nuclear periphery confers epigenetic memory of previous transcriptional state. PLoS Biol 5: e81
    5. Cabal GG, Genovesio A, Rodriguez-Navarro S, Zimmer C, Gadal O, Lesne A, Buc H, Feuerbach-Fournier F, Olivo-Marin JC, Hurt EC, Nehrbass U (2006) SAGA interacting factors confine sub-diffusion of transcribed genes to the nuclear envelope. Nature 441: 770–773
