Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types

Vincent van Unen et al. Nat Commun. 2017.

Abstract

Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. In particular, dimensionality reduction-based techniques like t-SNE offer single-cell resolution but are limited in the number of cells that can be analyzed. Here we introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for the analysis of mass cytometry data sets. HSNE constructs a hierarchy of non-linear similarities that can be interactively explored with a stepwise increase in detail up to the single-cell level. We apply HSNE to a study on gastrointestinal disorders and three other available mass cytometry data sets. We find that HSNE efficiently replicates previous observations and identifies rare cell populations that were previously missed due to downsampling. Thus, HSNE removes the scalability limit of conventional t-SNE analysis, a feature that makes it highly suitable for the analysis of massive high-dimensional data sets.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Fig. 1

Schematic overview of Cytosplore+HSNE for exploring the mass cytometry data. By creating a multi-level hierarchy of an illustrative 3D data set (a), we achieve a clear separation of different cell groups in an overview embedding (left panel b) that conserves non-linear relationships (i.e., follows the distance indicated by the dashed line in a, instead of the grey arrow) and more detail within the separate groups on the data level (right panel b). c Construction and exploration of the hierarchy. The hierarchy is constructed starting with the data level (left two columns). On the basis of the high-dimensional expression patterns of the cells, a weighted kNN graph is constructed, which is used to find representative cells that serve as landmarks in the next coarser level. By keeping track of the area of influence (AoI) of the landmarks, cells/landmarks can be aggregated without losing the global structure of the underlying data or creating shortcuts. The exploration of the hierarchy is shown in the two rightmost columns. At the bottom, we see the overview level (in this example the 3rd level in the hierarchy), which shows that a group of landmarks has low expression in marker c (bottom-right panel). Selecting this group of landmarks for further exploration results in a look-up of the landmarks in the preceding level (neighborhood graph, intermediate level) that are in the AoI, with which a new embedding can be created at the 2nd level of the hierarchy (middle-right panel). Marker b shows a strong separation between the upper and lower landmarks at this level. Zooming in on the landmarks with low expression of marker b reveals further separation in marker a at the lowest level, the full data level (top-right panel).
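The construction step in this caption (weighted kNN graph, landmark selection, area of influence) can be illustrated with a small sketch. The caption does not say how representative cells are found, so the random-walk selection below is an assumption modelled loosely on the HSNE idea, not the Cytosplore implementation. All function names, the Gaussian edge weighting, and the parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def knn_graph(points, k):
    """Map each point index to its k nearest neighbours with normalised Gaussian weights."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        sigma = dists[-1][0] or 1.0  # scale by the k-th neighbour distance
        weights = [(j, math.exp(-(d / sigma) ** 2)) for d, j in dists]
        total = sum(w for _, w in weights)
        graph[i] = [(j, w / total) for j, w in weights]
    return graph

def step(graph, node, rng):
    """Move to a neighbour with probability proportional to the edge weight."""
    neighbours, probs = zip(*graph[node])
    return rng.choices(neighbours, weights=probs)[0]

def select_landmarks(graph, n_landmarks, walks=20, steps=10, seed=0):
    """Assumed heuristic: keep the nodes where short random walks end most often."""
    rng = random.Random(seed)
    hits = Counter()
    for start in graph:
        for _ in range(walks):
            node = start
            for _ in range(steps):
                node = step(graph, node, rng)
            hits[node] += 1
    return [node for node, _ in hits.most_common(n_landmarks)]

def area_of_influence(graph, landmarks, walks=20, max_steps=50, seed=1):
    """For each point, the share of walks that first reach each landmark."""
    rng = random.Random(seed)
    landmark_set = set(landmarks)
    aoi = {}
    for start in graph:
        counts = Counter()
        for _ in range(walks):
            node = start
            for _ in range(max_steps):
                if node in landmark_set:
                    counts[node] += 1
                    break
                node = step(graph, node, rng)
        total = sum(counts.values())
        aoi[start] = {lm: c / total for lm, c in counts.items()} if total else {}
    return aoi

# Toy data: two well-separated 3D groups of 10 points each (indices 0-9 and 10-19).
rng = random.Random(42)
points = [tuple(rng.gauss(c, 1.0) for _ in range(3)) for c in (0.0, 100.0) for _ in range(10)]
graph = knn_graph(points, k=3)
landmarks = select_landmarks(graph, n_landmarks=4)
aoi = area_of_influence(graph, landmarks)
```

Because walks only follow kNN edges, a point's influence shares stay within its own group; this is the sense in which aggregation through the graph avoids shortcuts between groups that are close in Euclidean terms but far apart along the data.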

Fig. 2

Gain of information by analyzing the mass cytometry data at full resolution with Cytosplore+HSNE. a Pie chart showing cellular composition of the mass cytometry data set. Color represents the subsets (N = 142), as identified in our previous study. Black represents the cells discarded by stochastic downsampling and grey represents the cells discarded by ACCENSE clustering. b Embeddings of the 1.1 million cells annotated in ref showing the top three levels of the HSNE-hierarchy (five levels in total). Color represents annotations as in a. Size of the landmarks is proportional to the number of cells in the AoI that each landmark represents. Bottom map shows density features depicting the local probability density of cells for the level 3 embedding, where black dots indicate the centroids of identified cluster partitions using GMS clustering. c Embeddings of all 5.2 million cells, again showing only the top three levels of the hierarchy (five levels in total). Colors as in a. Right panels visualize landmarks representing cells discarded by stochastic downsampling (black) and the cells discarded by ACCENSE (grey). Bottom map shows density features for the level 3 embedding as described in (b). d Frequency of annotated cells for 145 clusters identified by Cytosplore+HSNE at the third hierarchical level using GMS clustering in c. Color coding as in a
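To make concrete why stochastic downsampling can hide rare populations, the following sketch computes how many cells of a small subset are expected to survive uniform random downsampling, and the probability that none survive. The subset size and sampling depth used in the usage note are purely illustrative assumptions, not values from the study; only the 5.2 million total comes from the caption. Function names are hypothetical.

```python
import math

def expected_retained(subset_size, total_cells, sample_size):
    """Expected number of subset cells kept when sampling without replacement."""
    return sample_size * subset_size / total_cells

def prob_none_retained(subset_size, total_cells, sample_size):
    """Hypergeometric probability that the sample contains no cell of the subset."""
    if sample_size + subset_size > total_cells:
        return 0.0  # the sample is too large to miss the subset entirely
    # Product of (N - n - i) / (N - i) for i in 0..K-1, summed in log space.
    log_p = sum(
        math.log(total_cells - sample_size - i) - math.log(total_cells - i)
        for i in range(subset_size)
    )
    return math.exp(log_p)
```

For instance, with a hypothetical subset of 500 cells among 5.2 million and a 10% sample, `expected_retained(500, 5_200_000, 520_000)` gives 50 cells, which may be too few to form a visible cluster in an embedding even though the subset is rarely lost completely.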

Fig. 3

Analysis of the CD7+CD3− innate lymphocyte compartment in inflammatory intestinal diseases. a First HSNE level embedding of 5.2 million cells. Color represents arcsin5-transformed marker expression as indicated. Size of the landmarks represents AoI. Blue encirclement indicates selection of landmarks representing CD7+CD3− innate lymphocytes and CD4+ T cells further discussed in Fig. 5. b The major immune lineages, annotated on the basis of lineage marker expression. c Third HSNE level embedding of the CD7+CD3− innate lymphocytes (5.0 × 10⁵ cells). Color represents arcsin5-transformed marker expression in top panels, and tissue-origin and clinical features in bottom panels. Blue encirclement indicates selection of landmarks representing CD127+ILC and ILC-like cells. d Third HSNE level embedding shows density features depicting the local probability density of cells, where black dots indicate the centroids of identified cluster partitions using GMS clustering. e Embedding of the CD127+ILC and ILC-like cells (6.0 × 10⁴ cells) at single-cell resolution. Arrows indicate ILC1 (blue), ILC2 (orange) and ILC3 (green). Bottom-right panel shows corresponding cluster partitions using GMS clustering based on density features (top-right panel). f A heatmap summary of median expression values (same color coding as for the embeddings) of cell markers expressed by CD127+ILC and ILC-like clusters identified in b and hierarchical clustering thereof. g Composition of cells for each cluster is represented graphically by a horizontal bar in which segment lengths represent the proportion of cells with: (left) tissue-of-origin, (middle) disease status and (right) sampling status
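The caption does not define "arcsin5-transformed". In mass cytometry this usually refers to the inverse hyperbolic sine with a cofactor of 5, and the sketch below assumes that reading. The function name and the default cofactor are illustrative assumptions.

```python
import math

def arcsinh_transform(counts, cofactor=5.0):
    """Apply asinh(x / cofactor) to each raw marker intensity (assumed transform)."""
    return [math.asinh(value / cofactor) for value in counts]
```

The transform is roughly linear near zero and logarithmic for large values, so low-intensity noise is compressed while differences among strongly positive cells remain visible in the color scale.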

Fig. 4

CD127+ILC and ILC-like subsets identified by Cytosplore+HSNE. Table showing cluster number, distinguishing phenotypic marker expression profiles and biological annotation for the clusters identified in Fig. 3e. Black color indicates clusters described in previous reports and red color additional unknown clusters. Hierarchical clustering of clusters based on marker expression profile shown in the heatmap depicted in Fig. 3f

Fig. 5

Analysis of the CD4+ T-cell compartment in inflammatory intestinal diseases. a Third HSNE level embedding of the CD4+ T cells (1.4 × 10⁶ cells, selected in Fig. 3). Color and size of landmarks as described in Fig. 3. Right panel shows density features for the level 3 embedding. Blue encirclement indicates selection of landmarks representing CD28−CD4+ T cells. b Embedding of the CD28−CD4+ T cells (2.6 × 10⁴ cells) at single-cell resolution. Bottom-left panel shows yellow and black dashed encirclements based on CD56− and CD56+ expression, respectively. Three bottom-right panels show cells colored according to: (left) from subjects with different disease status (CeD, Crohn, EATLII, RCDII, and controls), (middle) sampling status (annotated subset, discarded by ACCENSE and downsampled) and (right) tissue-of-origin (blood and intestine)
