An approach for extensibly profiling the molecular states of cellular subpopulations

Lit-Hsin Loo et al. Nat Methods. 2009 Oct.

Abstract

Microscopy often reveals the existence of phenotypically distinct cellular subpopulations. However, additional characterization of observed subpopulations can be limited by the number of biomolecular markers that can be simultaneously monitored. Here we present a computational approach for extensibly profiling cellular subpopulations by freeing one or more imaging channels to monitor additional probes. In our approach, we trained classifiers to re-identify subpopulations accurately based on an enhanced collection of phenotypic features extracted from only a subset of the original markers. Then we constructed subpopulation profiles step-wise from replicate experiments, in which cells were labeled with different but overlapping marker sets. We applied our approach to identify molecular differences among subpopulations and to identify functional groupings of markers, in populations of differentiating mouse preadipocytes, polarizing human neutrophil-like cells and dividing human cancer cells.

Figures

Figure 1. Building extensible Virtual Phenotypic Profiles of cellular subpopulations

Schematic showing the three major steps: (a) Step 1, stain and image cells with an initial biomolecular marker set (MA, MB, and MC), and identify subpopulations based on an initial phenotypic feature set (F1, F2, and F3). (b) Step 2, free up an imaging channel by dropping a marker (MC), and add more high-content features (F4 and F5) to partially compensate for the information lost. Train a classifier on the new “reference” feature set to separate subpopulations. (c) Step 3, co-stain cells with the reduced marker set (MA and MB) and new markers (MD, ME, … , MZ), and use the trained classifier to separate cells into subpopulations. For each subpopulation, measure the averaged values of a specific chosen feature (Fs) for every marker, and combine the values into a common Virtual Phenotypic Profile. (Fi(Mj) = phenotypic feature i based on marker j; S1, S2 = subpopulations 1 and 2; pink and yellow squares = cells from S1 and S2; ellipsoids = boundaries of subpopulations; dashed lines or hyperplanes = trained classifiers)
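
The final part of Step 3 (averaging a chosen feature Fs per subpopulation for every marker, then combining the values) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `build_profile`, the data layout, and the example values are all hypothetical, and the subpopulation labels are assumed to have already been assigned by the trained classifier.

```python
from collections import defaultdict
from statistics import mean


def build_profile(experiments):
    """Combine per-subpopulation averages from several staining experiments.

    experiments: dict mapping a marker name to a list of
    (subpopulation_label, feature_value) pairs, one pair per cell.
    Returns {subpopulation: {marker: mean feature value}}.
    """
    profile = defaultdict(dict)
    for marker, cells in experiments.items():
        # Group the chosen feature (Fs) of this marker by subpopulation.
        by_subpop = defaultdict(list)
        for label, value in cells:
            by_subpop[label].append(value)
        # Average within each subpopulation and add to the shared profile.
        for label, values in by_subpop.items():
            profile[label][marker] = mean(values)
    return dict(profile)


# Hypothetical data: each marker comes from a separate co-staining
# experiment; labels are assumed to come from the trained classifier.
experiments = {
    "MD": [("S1", 1.0), ("S1", 3.0), ("S2", 10.0)],
    "ME": [("S1", 0.5), ("S2", 2.0), ("S2", 4.0)],
}
profile = build_profile(experiments)
print(profile)
```

Because every experiment shares the same subpopulation labels, markers that were never imaged together end up in one profile per subpopulation.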

Figure 2. High-content features partially compensated for the decrease in classification performance due to dropping a marker

(a) Immunofluorescence images of 3T3-L1 cells near the centroids of the four identified 3T3-L1 subpopulations (S1 to S4) during adipogenesis. White curves = automated segmentation boundaries; scale bar = 20 μm. (b) 2D scatter plot showing the Day 9 distribution of the subpopulations identified based on LD and AdipoQ levels. The percentages of cells are shown in parentheses. Data shown were sub-sampled to 25% for visualization (triangles = subpopulation centroids). (c) Average minimum class accuracy (MCA) of subpopulation classification after systematic dropping of each initial marker. The accuracies for the best classifier type for each feature set (Supplementary Fig. 1a) are shown. (d) Average MCA of subpopulation classification after dropping the LD marker and adding high-content features from the AdipoQ marker. Shown are the accuracies for the best overall classifier type, the SVM-RBF classifier (Supplementary Fig. 1b). All features implicitly contain information from the DNA marker, which was used for segmentation. (INT/ALL = all initial features; INT/AdipoQ = initial adiponectin feature; INT/LD = initial LD feature; HC/ALL = all additional high-content features; HC/SFFS = high-content feature subset selected by sequential floating forward search; error bars = standard errors of cross-validation; ** = P<0.01; *** = P<0.001; one-sided paired t-test; n = 3)
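
The caption does not define minimum class accuracy. A minimal sketch, assuming MCA means the smallest per-class accuracy (the fraction of cells of each true subpopulation that the classifier labels correctly), is shown below. The function name and the example labels are hypothetical.

```python
from collections import Counter


def minimum_class_accuracy(true_labels, predicted_labels):
    """Return the lowest per-class accuracy across all true classes.

    Assumption: MCA = min over classes of
    (correctly classified cells of that class) / (cells of that class).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return min(correct[c] / totals[c] for c in totals)


# Hypothetical labels for eight cells in three subpopulations.
true_labels = ["S1", "S1", "S1", "S1", "S2", "S2", "S3", "S3"]
predicted = ["S1", "S1", "S1", "S2", "S2", "S2", "S3", "S1"]
mca = minimum_class_accuracy(true_labels, predicted)
print(mca)  # per-class accuracies are 0.75, 1.0 and 0.5, so this prints 0.5
```

Under this definition a classifier cannot score well by favoring large subpopulations: overall accuracy in the example is 0.75, but the MCA is set by the worst-classified subpopulation.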

Figure 3. Virtual Phenotypic Profiles had low noise levels and were significantly different from population averages

(a) Distributions of the nuclear or cellular levels of five new markers in subpopulations identified based on all initial features (INT/ALL), initial adiponectin feature alone (INT/AdipoQ), or high-content adiponectin features (HC/SFFS). (PPARγ = peroxisome proliferator-activated receptor gamma; C/EBPα = CCAAT/enhancer binding protein alpha; PLINA = perilipin A; HSL = hormone sensitive lipase; pHSL = HSL phosphorylated at S565; error bars = medians ± upper and lower quartiles). (b) Absolute noise levels of the median cellular levels for the dropped marker (LD, left subpanel) and a new marker (PLINA, right subpanel) on subpopulations identified from either INT/AdipoQ or HC/SFFS (error bars = means ± standard errors; ** = P<0.01; *** = P<0.001; one-sided paired t-test; n = 3). (c) Number of initial markers (left subpanel) or new markers (right subpanel) with HC/SFFS noise level significantly lower than INT/AdipoQ noise level (one-sided paired t-test; n = 3). (d) Multi-dimensional scaling plot showing the relative distances of the Virtual Phenotypic Profiles of the four subpopulations identified from HC/SFFS to the profiles of the whole cellular population, or of randomly selected subpopulations. The P-value shown is the probability that a profile could be obtained from randomly selected subpopulations (Supplementary Methods).

Figure 4. High-content features from AdipoQ gave clustering and heatmaps of Virtual Phenotypic Profiles similar to those from the initial features

Dendrograms and heatmaps showing the hierarchical clustering of the normalized Virtual Phenotypic Profiles for the subpopulations identified based on either (a) all the initial features (INT/ALL), (b) initial adiponectin feature alone (INT/AdipoQ), or (c) high-content adiponectin features (HC/SFFS). The profile of each marker was normalized by dividing it by its maximum value. The known functions of the tested biomolecules are indicated. The distance metric used was Euclidean distance, and the clustering algorithm used was average linkage.
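
The normalization and clustering named in this caption (divide each marker's profile by its maximum, then apply average-linkage clustering on Euclidean distances) can be sketched in plain Python. This is an illustrative sketch only: the marker names and values are hypothetical, the functions are not from the paper, and each profile is assumed to have a positive maximum.

```python
from math import dist


def normalize_by_max(profile):
    """Divide a marker's profile by its maximum (assumed to be > 0)."""
    peak = max(profile)
    return [value / peak for value in profile]


def average_linkage(points):
    """Agglomerative clustering with average linkage, Euclidean distance.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between two clusters.
                total = sum(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical profiles: one value per subpopulation (S1..S4) per marker.
raw = {
    "marker_A": [2.0, 4.0, 8.0, 8.0],
    "marker_B": [1.0, 2.0, 4.0, 4.0],
    "marker_C": [10.0, 5.0, 1.0, 1.0],
}
names = list(raw)
normalized = [normalize_by_max(raw[name]) for name in names]
merges = average_linkage(normalized)
for left, right, d in merges:
    print([names[i] for i in left], [names[i] for i in right], round(d, 3))
```

In this example `marker_A` and `marker_B` differ only in scale, so after normalization they are identical and merge first at distance 0. That is the purpose of the normalization: markers are grouped by the shape of their profile across subpopulations rather than by absolute intensity.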

Figure 5. Virtual Phenotypic Profiling of polarizing and of dividing cells

(a) Immunofluorescence images showing polarizing HL-60 cells near the centroids of the three identified subpopulations, S1, S2, and S3 (white curves = automated segmentation boundaries; scale bar = 20 μm). (b) Dendrogram and heatmap of normalized Virtual Phenotypic Profiles showing the hierarchical clustering of the biomolecules based on polarization indices and F-actin co-localization (Supplementary Methods). The known functions of the biomolecules are indicated (pPTEN = phosphorylated phosphatase and tensin homolog; pAKT = phosphorylated AKT; Hem1 = hematopoietic protein 1). (c) 2D scatter plot of the original (un-normalized) Virtual Phenotypic Profile for S3 cells showed that molecules from related pathways had similar values of polarization index and F-actin co-localization. (d) Immunofluorescence images showing dividing H460 cells near the centroids of the four identified subpopulations, S1, S2, S3, and S4 (corresponding to G1, S, G2 and M-phase cells respectively; white curves = automated segmentation boundaries; scale bar = 20 μm). (e) Dendrogram and heatmap of normalized Virtual Phenotypic Profiles showing the hierarchical clustering of biomolecules based on their total cellular levels in different subpopulations (pRb = phosphorylated retinoblastoma protein). (f) 2D scatter plots of the Virtual Phenotypic Profiles showing subcellular localization relative to DNA and expression levels of the tested biomolecules in different phases of the cell cycle.

Figure 6. Effectiveness of subpopulation profiling depends on the degree of cell-to-cell variability

(a) Simulated co-staining levels of a set of biomolecular markers on individual cells at different degrees of cell-to-cell variability (as measured by the average cell-to-cell dissimilarity). (Fref = a reference feature used to divide the cells into two subpopulations; Fs = a specific feature of marker A or B; dashed line = decision boundary of the classifier used). (b) Medians and interquartile ranges of the single-cell distributions of Fs in each of the two subpopulations. These features formed the basis for the Virtual Phenotypic Profiles (error bars = interquartile ranges). (c) The effectiveness of subpopulation profiling evaluated using three criteria.
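
The summary in panel (b) can be sketched as follows: cells are split into two subpopulations by a decision boundary on Fref, and the median and interquartile range of Fs are computed in each. This is a simplified illustration, not the simulation used in the paper: the single-threshold boundary, the function name, and the example values are assumptions, and each subpopulation is assumed to contain at least two cells.

```python
from statistics import median, quantiles


def summarize_by_subpopulation(cells, boundary):
    """Split cells on a reference feature and summarize a specific feature.

    cells: list of (f_ref, f_s) pairs, one pair per cell.
    boundary: hypothetical threshold on f_ref separating S1 from S2.
    Returns {subpopulation: {"median": ..., "iqr": ...}} for f_s.
    """
    groups = {"S1": [], "S2": []}
    for f_ref, f_s in cells:
        groups["S1" if f_ref < boundary else "S2"].append(f_s)
    summary = {}
    for name, values in groups.items():
        # Quartiles computed with the inclusive method; IQR = Q3 - Q1.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[name] = {"median": median(values), "iqr": q3 - q1}
    return summary


# Hypothetical cells: low Fref with low Fs, high Fref with high Fs.
cells = [
    (0.1, 1.0), (0.2, 2.0), (0.1, 3.0), (0.3, 4.0), (0.2, 5.0),
    (0.9, 11.0), (0.8, 12.0), (0.7, 13.0), (0.9, 14.0), (0.8, 15.0),
]
summary = summarize_by_subpopulation(cells, boundary=0.5)
print(summary)
```

When the two medians are far apart relative to the interquartile ranges, as in this example, the subpopulations are well separated in Fs. Higher cell-to-cell variability would widen the ranges and blur that separation.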
