Automated identification of stratifying signatures in cellular subpopulations

Robert V Bruggner et al. Proc Natl Acad Sci U S A. 2014.

Abstract

Elucidation and examination of cellular subpopulations that display condition-specific behavior can play a critical role in understanding disease mechanisms, as well as provide a focal point for developing diagnostic criteria that link such mechanisms to clinical prognosis. Despite recent advancements in single-cell measurement technologies, the identification of relevant cell subsets through manual effort remains standard practice. As new technologies such as mass cytometry increase the number of parameters measured per cell, the limited scalability and inherent subjectivity of manual analysis slow both analysis and progress. We therefore developed Citrus (cluster identification, characterization, and regression), a data-driven approach for the identification of stratifying subpopulations in multidimensional cytometry datasets. The methodology of Citrus is demonstrated through the identification of known and unexpected pathway responses in a dataset of stimulated peripheral blood mononuclear cells (PBMCs) measured by mass cytometry. Additionally, the performance of Citrus is compared with that of existing methods through the analysis of several publicly available datasets. As the complexity of flow cytometry datasets continues to increase, methods such as Citrus will be needed to help investigators perform unbiased, and potentially more thorough, correlation-based mining and inspection of cell subsets nested within high-dimensional datasets.

Keywords: biomarker discovery; informatics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Overview of Citrus. Cells from all samples (i) are combined and clustered by using hierarchical clustering (ii). Descriptive features of identified cell subsets are calculated on a per-sample basis (iii) and used in conjunction with additional experimental metadata (iv) to train a regularized regression model predictive of the experimental endpoint (v). Predictive subset features are plotted as a function of experimental endpoint (vi), along with scatter or density plots of the corresponding informative subset (vii). In this example, the abundance of cells in subset A was found to differ between healthy and diseased samples (vi; H, subset A abundance in healthy patients; D, subset A abundance in diseased patients). Scatter plots show that cells in subset A have high expression of marker 1 and low expression of marker 2 relative to all measured cells (shown in gray).
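
Steps (iii) to (v) can be illustrated with a minimal pure-Python sketch of the per-sample feature calculation. It assumes each cell has already been assigned to one or more (nested) hierarchical clusters and computes two feature types per sample and cluster: abundance, the fraction of the sample's cells that fall in the cluster (the feature used in the example above), and, as an illustrative assumption, the median of one measured marker. The function name `cluster_features` and the input layout are hypothetical, not Citrus's own interface.

```python
from statistics import median


def cluster_features(cell_sample, cell_clusters, cell_marker, samples, clusters):
    """Per-sample features for each (possibly nested) cluster.

    cell_sample[i]   -- sample ID of cell i
    cell_clusters[i] -- set of cluster IDs containing cell i (hierarchical
                        clusters nest, so a cell can belong to several)
    cell_marker[i]   -- value of one measured marker for cell i

    Returns {(sample, cluster): (abundance, median_marker)}, where abundance is
    the fraction of the sample's cells in the cluster and median_marker is the
    marker median over those cells (None if the sample has no cells there).
    """
    features = {}
    for s in samples:
        in_sample = [i for i, si in enumerate(cell_sample) if si == s]
        for c in clusters:
            members = [i for i in in_sample if c in cell_clusters[i]]
            abundance = len(members) / len(in_sample) if in_sample else 0.0
            med = median([cell_marker[i] for i in members]) if members else None
            features[(s, c)] = (abundance, med)
    return features
```

Each (sample, cluster) pair yields one entry; arranged as a samples-by-features matrix, these values are what a regularized regression model in step (v) would take as input.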

Fig. 2.

Identification of stratifying cell subsets between unstimulated and BCR/FCR cross-linked PBMCs. (A) Estimated model accuracy and feature false discovery rates (FDRs) as a function of model regularization threshold. The regularization threshold selected to constrain the final model is shown by the dotted red line. (B) The first 4 of 117 identified stratifying features between the unstimulated and stimulated samples. Levels of phosphorylated S6 in cluster 75561 were found to be the best predictor of sample stimulation group. All stratifying features and corresponding clusters are shown in SI Appendix, Figs. S1 and S2. (C) Scatter plots showing lineage marker values from cells in cluster 75561. Expression of the same lineage markers in all other cells is shown in gray. High expression of CD45, CD20, and HLA-DR combined with low expression of CD7 and CD3 indicates that cluster 75561 comprises B cells. (D) S6 phosphorylation levels as a function of dasatinib concentration in cluster 75561. S6 phosphorylation induced by BCR/FCR cross-linking was reduced to baseline levels by dasatinib in a dose-dependent manner.
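
Panel A selects a regularization threshold that trades model accuracy against the number of retained features. The sketch below illustrates that mechanism with an L1 (lasso) penalty on a logistic regression, fit by proximal gradient descent; the penalty, solver, step size, and names such as `l1_logistic` are assumptions for illustration, not a description of Citrus's implementation, and the FDR estimation shown in the figure is not reproduced.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    # Proximal operator of t * |x|: shrinks x toward zero, exactly zero if |x| <= t.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def l1_logistic(X, y, lam, step=0.1, iters=2000):
    """L1-penalized logistic regression fit by proximal gradient descent (ISTA).

    X: rows of per-sample features; y: 0/1 group labels (e.g. unstimulated vs
    stimulated); lam: regularization threshold. The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Gradient of the mean logistic loss at the current (w, b).
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= step * gb
        # Gradient step on the weights, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, gw)]
    return w, b
```

Raising `lam` drives more coefficients to exactly zero, so the features that survive at the chosen threshold are the ones the model treats as stratifying.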

Fig. 3.

Clustering sensitivity of hierarchical clustering in FlowCAP-I datasets. (A) Clustering sensitivity measures from hierarchical clustering and other FlowCAP-I methods in FlowCAP-I datasets. Methods are ordered by their sensitivity across all datasets. For hierarchical clustering, the minimum cluster size threshold (MCST) was set to 0.5% of the clustered dataset size. The number of manually gated populations used to calculate clustering sensitivity is reported in SI Appendix, Table S2. (B) Clustering sensitivity as a function of the number of identified clusters for hierarchical clustering. Smaller MCSTs increase clustering sensitivity but identify more clusters (hclust MCST 0.01–5.0%).
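
The MCST decides which nodes of the clustering hierarchy are kept as candidate subsets. A naive sketch, assuming single linkage on Euclidean distance purely to keep the code short (this is not necessarily the linkage used in the paper): every cluster formed during agglomeration is recorded, and those containing at least `ceil(mcst_fraction * n)` cells are returned. `hierarchy_clusters` is a hypothetical name, and the brute-force loop is only suitable for toy inputs.

```python
import math


def hierarchy_clusters(points, mcst_fraction):
    """Agglomerative clustering that returns every cluster in the merge tree
    whose size is at least mcst_fraction * len(points).

    Single linkage on Euclidean distance is an assumption made for brevity.
    """
    n = len(points)
    min_size = math.ceil(mcst_fraction * n)
    active = [frozenset([i]) for i in range(n)]
    all_clusters = list(active)  # leaves are nodes of the hierarchy too

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(active) > 1:
        # Merge the closest pair of active clusters (ties broken by index).
        _, ia, ib = min(
            (dist(a, b), ia, ib)
            for ia, a in enumerate(active)
            for ib, b in enumerate(active)
            if ia < ib
        )
        merged = active[ia] | active[ib]
        active = [c for k, c in enumerate(active) if k not in (ia, ib)] + [merged]
        all_clusters.append(merged)
    return [c for c in all_clusters if len(c) >= min_size]
```

Lowering `mcst_fraction` keeps smaller nodes of the same tree, which matches the caption's observation that a smaller MCST yields more clusters.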

Fig. 4.

Prognostic performance of Citrus and flowType Cox models. (A) Time-dependent ROC curves for Citrus and flowType models. Curves were evaluated at the mean patient survival time of 1,025 d. (B and C) Kaplan–Meier curves of AIDS-free survival time in testing patients. Each model (Citrus, B; and flowType, C) was used to estimate the relative risk for each patient, and average patient risk was calculated across all testing-cohort patients. Patients with higher- and lower-than-average risk were assigned to high- and low-risk groups, respectively. Differences in survival time between groups in testing patients were assessed by using the log-rank test. (D) Phenotype plots of clusters that were selected in all 10 cross-validation models. Both naive CD8+ T cells and Ki-67+ cells were identified as having prognostic utility in previous analyses.
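
Panels B and C follow a standard survival-analysis workflow: split testing patients at the mean predicted risk, draw a Kaplan–Meier curve per group, and compare the groups with a log-rank test. A minimal pure-Python sketch of those three steps, assuming right-censored survival times with an event indicator; the function names are hypothetical, and placing patients exactly at the mean in the low-risk group is an assumption the caption does not address.

```python
import math


def split_by_mean_risk(risks):
    """True = high-risk (above the cohort's mean predicted risk), False = low-risk.
    Patients exactly at the mean go to the low-risk group (an assumption)."""
    mean = sum(risks) / len(risks)
    return [r > mean for r in risks]


def _event_times(times, events):
    # Distinct times at which at least one event (not a censoring) occurred.
    return sorted({t for t, e in zip(times, events) if e})


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate as [(t, S(t))] at each distinct event time.
    events[i] is 1 if the endpoint occurred at times[i], 0 if censored."""
    surv, curve = 1.0, []
    for t in _event_times(times, events):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times, events, group):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p-value)."""
    o_minus_e, var = 0.0, 0.0
    for t in _event_times(times, events):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, group) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, group) if u == t and e and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In the caption's workflow, `risks` would come from a fitted Cox model applied to testing-cohort patients; the Cox fit itself and the time-dependent ROC analysis in panel A are not reproduced here.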
