Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types
Vincent van Unen et al. Nat Commun. 2017.
Abstract
Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. In particular, dimensionality reduction-based techniques like t-SNE offer single-cell resolution but are limited in the number of cells that can be analyzed. Here we introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for the analysis of mass cytometry data sets. HSNE constructs a hierarchy of non-linear similarities that can be interactively explored with a stepwise increase in detail up to the single-cell level. We apply HSNE to a study on gastrointestinal disorders and three other available mass cytometry data sets. We find that HSNE efficiently replicates previous observations and identifies rare cell populations that were previously missed due to downsampling. Thus, HSNE removes the scalability limit of conventional t-SNE analysis, a feature that makes it highly suitable for the analysis of massive high-dimensional data sets.
Conflict of interest statement
The authors declare no competing financial interests.
Figures
Fig. 1
Schematic overview of Cytosplore+HSNE for exploring the mass cytometry data. By creating a multi-level hierarchy of an illustrative 3D data set (a), we achieve a clear separation of different cell groups in an overview embedding (left panel b) that conserves non-linear relationships (i.e., follows the distance indicated by the dashed line in a, instead of the grey arrow) and more detail within the separate groups on the data level (right panel b). c Construction and exploration of the hierarchy. The hierarchy is constructed starting with the data level (left two columns). On the basis of the high-dimensional expression patterns of the cells, a weighted kNN graph is constructed, which is used to find representative cells used as landmarks in the next coarser level. By administering the area of influence (AoI) of the landmarks, cells/landmarks can be aggregated without losing the global structure of the underlying data or creating shortcuts. The exploration of the hierarchy is shown in the two rightmost columns. At the bottom, we see the overview level (in this example the 3rd level in the hierarchy), which shows that a group of landmarks has low expression in marker c (bottom-right panel). Selecting this group of landmarks for further exploration results in a look-up of the landmarks in the preceding level (neighborhood graph, intermediate level) that are in the AoI, with which a new embedding can be created at the 2nd level of the hierarchy (middle-right panel). Marker b shows a strong separation between the upper and lower landmarks at this level. Zooming-in on the landmarks with low expression of marker b reveals further separation in marker a at the lowest level, the full data level (top-right panel)
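The construction step described in the Fig. 1 caption (weighted kNN graph, landmark selection, area of influence) can be illustrated with a small sketch. This is not the Cytosplore+HSNE implementation: the function names, the per-point bandwidth, the quantile threshold, and the use of random walks for both landmark selection and the AoI estimate are assumptions made for illustration, and the brute-force distance computation only suits toy data.

```python
import numpy as np

def knn_transition_matrix(X, k=5):
    """Row-stochastic transition matrix over a weighted kNN graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (brute force, toy data only).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    T = np.zeros((n, n))
    for i in range(n):
        d = d2[i, nbrs[i]]
        sigma2 = d.mean() + 1e-12  # simple per-point bandwidth (assumption)
        w = np.exp(-d / sigma2)
        T[i, nbrs[i]] = w / w.sum()
    return T

def select_landmarks(T, n_walks=20, walk_len=10, quantile=0.8, seed=0):
    """Pick points where many short random walks end (assumed criterion)."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    hits = np.zeros(n, dtype=int)
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(walk_len):
                node = int(rng.choice(n, p=T[node]))
            hits[node] += 1
    threshold = np.quantile(hits, quantile)
    return np.flatnonzero(hits >= threshold)

def area_of_influence(T, landmarks, n_walks=20, max_steps=50, seed=0):
    """A[i, j]: fraction of walks from point i that first reach landmark j."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    col = {int(l): j for j, l in enumerate(landmarks)}
    A = np.zeros((n, len(landmarks)))
    for start in range(n):
        for _ in range(n_walks):
            node = start
            for _ in range(max_steps):
                if node in col:  # walk stops at the first landmark it meets
                    A[start, col[node]] += 1
                    break
                node = int(rng.choice(n, p=T[node]))
    return A / n_walks  # rows may sum to < 1 if some walks find no landmark

# Toy 3D data set with two well-separated groups (synthetic, for illustration).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 3)), rng.normal(3, 0.3, (30, 3))])
T = knn_transition_matrix(X, k=5)
landmarks = select_landmarks(T)
A = area_of_influence(T, landmarks)
print(len(landmarks), A.shape)
```

In this sketch, the landmarks would form the next coarser level, and summing a column of `A` gives a rough count of the cells a landmark represents, which is what the captions use to size the landmarks.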
Fig. 2
Gain of information by analyzing the mass cytometry data at full resolution with Cytosplore+HSNE. a Pie chart showing cellular composition of the mass cytometry data set. Color represents the subsets (N = 142), as identified in our previous study. Black represents the cells discarded by stochastic downsampling and grey represents the cells discarded by ACCENSE clustering. b Embeddings of the 1.1 million cells annotated in ref showing the top three levels of the HSNE-hierarchy (five levels in total). Color represents annotations as in a. Size of the landmarks is proportional to the number of cells in the AoI that each landmark represents. Bottom map shows density features depicting the local probability density of cells for the level 3 embedding, where black dots indicate the centroids of identified cluster partitions using GMS clustering. c Embeddings of all 5.2 million cells, again showing only the top three levels of the hierarchy (five levels in total). Colors as in a. Right panels visualize landmarks representing cells discarded by stochastic downsampling (black) and the cells discarded by ACCENSE (grey). Bottom map shows density features for the level 3 embedding as described in (b). d Frequency of annotated cells for 145 clusters identified by Cytosplore+HSNE at the third hierarchical level using GMS clustering in c. Color coding as in a
Fig. 3
Analysis of the CD7+CD3− innate lymphocyte compartment in inflammatory intestinal diseases. a First HSNE level embedding of 5.2 million cells. Color represents arcsin5-transformed marker expression as indicated. Size of the landmarks represents AoI. Blue encirclement indicates selection of landmarks representing CD7+CD3− innate lymphocytes and CD4+ T cells further discussed in Fig. 5. b The major immune lineages, annotated on the basis of lineage marker expression. c Third HSNE level embedding of the CD7+CD3− innate lymphocytes (5.0 × 10⁵ cells). Color represents arcsin5-transformed marker expression in top panels, and tissue-origin and clinical features in bottom panels. Blue encirclement indicates selection of landmarks representing CD127+ILC and ILC-like cells. d Third HSNE level embedding shows density features depicting the local probability density of cells, where black dots indicate the centroids of identified cluster partitions using GMS clustering. e Embedding of the CD127+ILC and ILC-like cells (6.0 × 10⁴ cells) at single-cell resolution. Arrows indicate ILC1 (blue), ILC2 (orange) and ILC3 (green). Bottom-right panel shows corresponding cluster partitions using GMS clustering based on density features (top-right panel). f A heatmap summary of median expression values (same color coding as for the embeddings) of cell markers expressed by CD127+ILC and ILC-like clusters identified in b and hierarchical clustering thereof. g Composition of cells for each cluster is represented graphically by a horizontal bar in which segment lengths represent the proportion of cells with: (left) tissue-of-origin, (middle) disease status and (right) sampling status
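The marker transformation named in the caption can be sketched as follows, assuming that "arcsin5" refers to the inverse hyperbolic sine with a cofactor of 5, a scaling commonly applied to mass cytometry counts. The caption does not spell out the formula, so the function name and the default cofactor are illustrative assumptions.

```python
import numpy as np

def arcsinh_transform(counts, cofactor=5.0):
    """Inverse hyperbolic sine of counts scaled by a cofactor.

    Assumed form: arcsinh(x / cofactor). Roughly linear near zero and
    logarithmic for large values.
    """
    return np.arcsinh(np.asarray(counts, dtype=float) / cofactor)

# Hypothetical raw marker intensities for a handful of cells.
raw = np.array([0.0, 1.0, 5.0, 50.0, 500.0])
transformed = arcsinh_transform(raw)
print(transformed)
```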
Fig. 4
CD127+ILC and ILC-like subsets identified by Cytosplore+HSNE. Table showing cluster number, distinguishing phenotypic marker expression profiles and biological annotation for the clusters identified in Fig. 3e. Black color indicates clusters described in previous reports and red color additional unknown clusters. Hierarchical clustering of clusters based on marker expression profile shown in the heatmap depicted in Fig. 3f
Fig. 5
Analysis of the CD4+ T-cell compartment in inflammatory intestinal diseases. a Third HSNE level embedding of the CD4+ T cells (1.4 × 10⁶ cells, selected in Fig. 3). Color and size of landmarks as described in Fig. 3. Right panel shows density features for the level 3 embedding. Blue encirclement indicates selection of landmarks representing CD28−CD4+ T cells. b Embedding of the CD28−CD4+ T cells (2.6 × 10⁴ cells) at single-cell resolution. Bottom-left panel shows yellow and black dashed encirclements based on CD56− and CD56+ expression, respectively. Three bottom-right panels show cells colored according to: (left) from subjects with different disease status (CeD, Crohn, EATLII, RCDII, and controls), (middle) sampling status (annotated subset, discarded by ACCENSE and downsampled) and (right) tissue-of-origin (blood and intestine)