Automated identification of stratifying signatures in cellular subpopulations
Robert V Bruggner et al. Proc Natl Acad Sci U S A. 2014.
Abstract
Elucidation and examination of cellular subpopulations that display condition-specific behavior can play a critical contributory role in understanding disease mechanism, as well as provide a focal point for development of diagnostic criteria linking such a mechanism to clinical prognosis. Despite recent advancements in single-cell measurement technologies, the identification of relevant cell subsets through manual efforts remains standard practice. As new technologies such as mass cytometry increase the parameterization of single-cell measurements, the limited scalability and the subjectivity inherent in manual analyses slow both analysis and progress. We therefore developed Citrus (cluster identification, characterization, and regression), a data-driven approach for the identification of stratifying subpopulations in multidimensional cytometry datasets. The methodology of Citrus is demonstrated through the identification of known and unexpected pathway responses in a dataset of stimulated peripheral blood mononuclear cells measured by mass cytometry. Additionally, the performance of Citrus is compared with that of existing methods through the analysis of several publicly available datasets. As the complexity of flow cytometry datasets continues to increase, methods such as Citrus will be needed to aid investigators in the performance of unbiased (and potentially more thorough) correlation-based mining and inspection of cell subsets nested within high-dimensional datasets.
Keywords: biomarker discovery; informatics.
Conflict of interest statement
The authors declare no conflict of interest.
Figures
Fig. 1.
Overview of Citrus. Cells from all samples (i) are combined and clustered by using hierarchical clustering (ii). Descriptive features of identified cell subsets are calculated on a per-sample basis (iii) and used in conjunction with additional experimental metadata (iv) to train a regularized regression model predictive of the experimental endpoint (v). Predictive subset features are plotted as a function of experimental endpoint (vi), along with scatter or density plots of the corresponding informative subset (vii). In this example, the abundance of cells in subset A was found to differ between healthy and diseased samples (vi; H, subset A abundance in healthy patients; D, subset A abundance in diseased patients). Scatter plots show that cells in subset A have high expression of marker 1 and low expression of marker 2 relative to all measured cells (shown in gray).
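Step (iii) of the overview, the per-sample calculation of subset features, can be sketched as follows. This is a minimal illustration and not the Citrus implementation: the function name `per_sample_features`, the toy data, and the choice of the median as the marker summary are assumptions made here. It also assumes one cluster label per cell, whereas clusters from a hierarchy are nested, so a cell may belong to several of them.

```python
import numpy as np

def per_sample_features(sample_ids, cluster_ids, marker_values):
    """Compute per-sample abundance and median marker level for each cluster.

    Hypothetical helper: every cell carries a sample label, a single
    cluster label, and one marker measurement.
    """
    sample_ids = np.asarray(sample_ids)
    cluster_ids = np.asarray(cluster_ids)
    marker_values = np.asarray(marker_values, dtype=float)

    samples = np.unique(sample_ids)    # sorted sample names
    clusters = np.unique(cluster_ids)  # sorted cluster names
    abundance = np.zeros((len(samples), len(clusters)))
    medians = np.full((len(samples), len(clusters)), np.nan)

    for i, s in enumerate(samples):
        in_sample = sample_ids == s
        n_cells = in_sample.sum()
        for j, c in enumerate(clusters):
            mask = in_sample & (cluster_ids == c)
            # Fraction of the sample's cells that fall in this cluster.
            abundance[i, j] = mask.sum() / n_cells
            if mask.any():
                medians[i, j] = np.median(marker_values[mask])
    return samples, clusters, abundance, medians

# Toy data (invented): one healthy sample (H1) and one diseased sample (D1).
sample_ids = ["H1", "H1", "H1", "H1", "D1", "D1", "D1", "D1"]
cluster_ids = ["A", "B", "B", "B", "A", "A", "A", "B"]
marker = [1.0, 0.2, 0.3, 0.4, 1.2, 0.8, 1.0, 0.1]

samples, clusters, abundance, medians = per_sample_features(
    sample_ids, cluster_ids, marker
)
```

Each row of `abundance` and `medians` is one sample and each column one cluster. Matrices of this shape are what a regularized regression model would take as predictors, with the experimental endpoint as the response.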
Fig. 2.
Identification of stratifying cell subsets between unstimulated and BCR/FCR cross-linked PBMCs. (A) Estimated model accuracy and feature FDRs as a function of model regularization threshold. The regularization threshold selected to constrain the final model is shown by the dotted red line. (B) The first 4 of 117 identified stratifying features between the unstimulated and stimulated samples. Levels of phosphorylated S6 in cluster 75561 were found to be the best predictor of sample stimulation group. All stratifying features and corresponding clusters are shown in SI Appendix, Figs. S1 and S2. (C) Scatter plots showing lineage marker values from cells in cluster 75561. Expression of the same lineage markers in all other cells is shown in gray. High expression of CD45, CD20, and HLA-DR combined with low expression of CD7 and CD3 indicate that cluster 75561 comprises B cells. (D) S6 phosphorylation levels as a function of dasatinib concentration in cluster 75561. S6 phosphorylation induced by BCR/FCR cross-linking was reduced to baseline levels by dasatinib in a dose-dependent manner.
Fig. 3.
Clustering sensitivity of hierarchical clustering in FlowCAP-I datasets. (A) Clustering sensitivity measures from hierarchical clustering and other FlowCAP-I methods in FlowCAP-I datasets. Methods are ordered by their sensitivity across all datasets. For hierarchical clustering, the minimum cluster size threshold (MCST) was set to 0.5% of the clustered dataset size. The number of manually gated populations used to calculate clustering sensitivity is reported in SI Appendix, Table S2. (B) Clustering sensitivity as a function of number of identified clusters for hierarchical clustering. Smaller minimum cluster size thresholds for hierarchical clustering increase clustering sensitivity but identify more clusters (hclust MCST 0.01–5.0%).
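The trade-off in panel B can be illustrated with a short sketch of how a minimum cluster size threshold might filter the clusters of a hierarchy. The function name `clusters_above_mcst` and the cluster sizes are hypothetical, and the sketch assumes the threshold is applied as a fraction of the total number of clustered cells.

```python
def clusters_above_mcst(cluster_sizes, n_cells, mcst_fraction):
    """Keep clusters holding at least mcst_fraction of all clustered cells.

    Hypothetical helper: cluster_sizes maps a cluster name to its cell count.
    """
    min_cells = mcst_fraction * n_cells
    return {name: size for name, size in cluster_sizes.items() if size >= min_cells}

# Invented sizes for nested clusters from a hierarchy over 10,000 cells.
sizes = {"root": 10000, "c1": 6000, "c2": 4000, "c3": 450, "c4": 40, "c5": 3}

kept_5pct = clusters_above_mcst(sizes, 10000, 0.05)      # needs >= 500 cells
kept_05pct = clusters_above_mcst(sizes, 10000, 0.005)    # needs >= 50 cells
kept_001pct = clusters_above_mcst(sizes, 10000, 0.0001)  # needs >= 1 cell
```

Lowering the threshold retains progressively smaller clusters, which is consistent with the caption: rarer populations become detectable, at the cost of more clusters to evaluate.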
Fig. 4.
Prognostic performance of Citrus and flowType Cox models. (A) Time-dependent ROC curves for Citrus and flowType models. Curves were evaluated at the mean patient survival time of 1,025 d. (B and C) Kaplan–Meier curves of AIDS-free survival time in testing patients. Each model (Citrus, B; and flowType, C) was used to estimate the relative risk for each patient, and average patient risk was calculated across all testing-cohort patients. Patients with higher- and lower-than-average risk were assigned to high- and low-risk groups, respectively. Differences in survival time between groups in testing patients were calculated by using the log-rank test. (D) Phenotype plots of clusters that were selected in all 10 cross-validation models. Both naive CD8+ T-Cells and Ki-67+ cells were identified as having prognostic utility in previous analyses.
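The risk-group comparison in panels B and C can be sketched as follows. This is a generic two-group log-rank test written with NumPy, not the authors' code. The function names and toy values are assumptions, and the risk scores are taken as given rather than estimated from a Cox model.

```python
import math
import numpy as np

def split_by_mean_risk(risk):
    """Return True for patients whose risk exceeds the cohort average."""
    risk = np.asarray(risk, dtype=float)
    return risk > risk.mean()

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    time:  follow-up time per patient
    event: True if the event was observed, False if censored
    group: True for the high-risk group, False for the low-risk group
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)

    observed = expected = variance = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()             # patients still at risk
        n1 = (at_risk & group).sum()  # of those, in the high-risk group
        dead = event & (time == t)
        d = dead.sum()                # events at this time
        d1 = (dead & group).sum()     # events in the high-risk group
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    # The variance is zero if both groups are never at risk together;
    # this sketch does not handle that case.
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy cohort (invented values): four patients, all events observed.
risk = [2.0, 1.5, 0.3, 0.2]
time = [1, 2, 3, 4]
event = [True, True, True, True]

high_risk = split_by_mean_risk(risk)
chi2, p_value = logrank_test(time, event, high_risk)
```

Splitting at the mean risk follows the caption. Other cut points, such as the median, are common alternatives and would change group sizes.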
Grants and funding
- UL1RR025744/RR/NCRR NIH HHS/United States
- HHSF223201210194C/PHS HHS/United States
- R01 CA130826/CA/NCI NIH HHS/United States
- RFA CA 09-009/CA/NCI NIH HHS/United States
- PN2 EY018228/EY/NEI NIH HHS/United States
- P01 CA034233/CA/NCI NIH HHS/United States
- U19 AI057229/AI/NIAID NIH HHS/United States
- N01-HV-00242/HV/NHLBI NIH HHS/United States
- RFA CA 09 011/CA/NCI NIH HHS/United States
- UL1 RR025744/RR/NCRR NIH HHS/United States
- UL1 TR001085/TR/NCATS NIH HHS/United States
- 5U54CA143907/CA/NCI NIH HHS/United States
- U54 CA149145/CA/NCI NIH HHS/United States
- HHSN272200700038C/AI/NIAID NIH HHS/United States
- PN2EY018228/EY/NEI NIH HHS/United States
- N01HV28183/HL/NHLBI NIH HHS/United States
- P01 CA034233-22A1/CA/NCI NIH HHS/United States
- U54 CA143907/CA/NCI NIH HHS/United States
- N01-HV-28183/HV/NHLBI NIH HHS/United States
- U54CA149145/CA/NCI NIH HHS/United States
- T15 LM007033/LM/NLM NIH HHS/United States
- 1R01CA130826/CA/NCI NIH HHS/United States