Automated mapping of phenotype space with single-cell data

Nikolay Samusik et al. Nat Methods. 2016 Jun.

Abstract

Accurate identification of cell subsets in complex populations is key to discovering novelty in multidimensional single-cell experiments. We present X-shift (http://web.stanford.edu/~samusik/vortex/), an algorithm that processes data sets using fast k-nearest-neighbor estimation of cell event density and arranges populations by marker-based classification. X-shift enables automated cell-subset clustering and access to biological insights that 'prior knowledge' might prevent the researcher from discovering.
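To make "k-nearest-neighbor estimation of cell event density" concrete, here is a minimal sketch in Python. The abstract does not specify the estimator, so this assumes the plain unweighted k-NN estimate (density proportional to k divided by the volume of the ball reaching the k-th neighbor) and a brute-force neighbor search; the function name `knn_density` is hypothetical and this is not the X-shift implementation.

```python
import math

def knn_density(points, k):
    """Unweighted k-NN density estimate at every point (illustrative assumption).

    density_i = k / (n * V_d * r_k^d), where r_k is the distance from point i
    to its k-th nearest neighbor and V_d is the volume of the unit ball in d
    dimensions. Brute force, O(n^2 log n); fine only for small examples.
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    densities = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]
        densities.append(k / (n * unit_ball * r_k ** d) if r_k > 0 else math.inf)
    return densities

# Four tightly packed events and two isolated ones.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (8.0, 5.0)]
rho = knn_density(dense + sparse, k=2)
```

Events in the tight group receive a much higher estimate than the isolated ones, which is the property a density-based clustering step relies on. A smaller k gives a more local, noisier estimate; a larger k smooths it.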

Conflict of interest statement

Competing Financial Interests

The authors declare no competing financial interests.

Figures

Figure 1. X-shift algorithm design and validation

(a–c) Workflow of the X-shift algorithm. (a) Synthetic two-dimensional dataset with three 'point clouds'. (b) K-nearest-neighbor density estimation. Example sets of 20 nearest neighbors are shown for three data points. (c) Connecting data points against the gradient of the density estimate and finding local maxima. (d) Testing neighboring populations for density separation. (e) X-shift clustering of synthetic data. Randomly generated datasets with 10 populations in 15 dimensions, 20 populations in 25 dimensions, and 30 populations in 35 dimensions were clustered with X-shift while the number of nearest neighbors (K) used for the density estimate was varied from 100 to 5. The blue line shows the curve fitted by line-plus-exponent regression. (f) Assessment of X-shift performance in automatic parameter-finding mode on the 12-color FlowCAP I Normal Donor dataset, compared with FlowCAP I Challenge I submissions. (g) Scheme for evaluating X-shift performance against hand-gated CyTOF data. (h) X-shift clusterings of mouse bone marrow data at various K settings were compared with hand gates, and the median F-measures over 10 biological replicates were plotted as stacked areas. Population labels are positioned at the point where each F-measure first reaches 90% of its maximum. (i) Results of X-shift analysis of bone marrow data when K was automatically selected for each of the 10 replicates. Bars show median values across replicates, and error bars represent the interquartile range.
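The linking step in panel (c) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each event is linked to the densest of its k nearest neighbors when that neighbor is denser than the event itself, and events with no denser neighbor are treated as local maxima that seed clusters. The caption describes linking based on the density gradient, which this sketch does not reproduce exactly, and the density-separation test of panel (d) is omitted. The function name `density_ascent_clusters` is hypothetical, and the density values are supplied by the caller.

```python
import math

def density_ascent_clusters(points, density, k):
    """Assign each point to the local density maximum it climbs to.

    Simplified illustration: link every point to the densest of its k nearest
    neighbors if that neighbor is strictly denser; otherwise the point is a
    local maximum. Labels are the indices of the local maxima.
    """
    n = len(points)
    parent = list(range(n))
    for i in range(n):
        # k nearest neighbors of point i by Euclidean distance (brute force).
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        best = max(neighbors, key=lambda j: density[j])
        if density[best] > density[i]:
            parent[i] = best
    labels = []
    for i in range(n):
        root = i
        # Links always point to strictly higher density, so this terminates.
        while parent[root] != root:
            root = parent[root]
        labels.append(root)
    return labels

# Two well-separated groups of three events with made-up density values.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
density = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0]
labels = density_ascent_clusters(points, density, k=2)
# labels == [1, 1, 1, 4, 4, 4]
```

In this toy input, events 1 and 4 are the local maxima, and every other event climbs to the maximum in its own group. With noisy density estimates a procedure like this tends to over-split, which is one reason a later merging test such as the one in panel (d) is useful.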

Figure 2. X-shift clustering reveals novel features of mouse hematopoietic differentiation

(a) Clustering of bone marrow replicate #7 with X-shift (K = 20 was auto-selected by the switch-point-finding algorithm), represented as a Divisive Marker Tree. Node radii are proportional to the cube root of the number of cell events contained at each node. The tree is a nested representation, i.e., each parent node contains the union of the cell events of its children. Labels on nodes show the marker cutoff values that define each sub-branch, expressed on the arsinh(x/5) scale. (b) X-shift finds biologically relevant subsets within the hand-gated cell populations (bone marrow replicate #7, X-shift K = 20). (c) Single-cell force-directed layout of mouse bone marrow replicate #7 (X-shift K = 20, color-coded for 48 clusters). Color code shows X-shift clusters, and grey boxes show the locations of hand-gated cell populations. (d) Force-directed layout of populations related to monocyte development. Color code represents expression levels of the indicated markers. (e) Force-directed layout of populations related to pDC development. Color code represents expression levels of the indicated markers.
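Two numeric conventions in panel (a) can be written out in a few lines of Python: the arsinh(x/5) scale used for the marker cutoffs, and node radii proportional to the cube root of the event count. The helper names and the `scale` parameter are hypothetical and serve only to illustrate the arithmetic.

```python
import math

COFACTOR = 5.0  # from the caption: cutoffs are expressed as arsinh(x/5)

def to_arsinh_scale(raw):
    """Map a raw marker intensity onto the arsinh(x/5) scale."""
    return math.asinh(raw / COFACTOR)

def from_arsinh_scale(scaled):
    """Invert the transform: recover the raw intensity from a cutoff label."""
    return COFACTOR * math.sinh(scaled)

def node_radius(n_events, scale=1.0):
    """Radius proportional to the cube root of the number of cell events.

    `scale` is an arbitrary drawing constant (an assumption of this sketch).
    """
    return scale * n_events ** (1.0 / 3.0)

cutoff = to_arsinh_scale(100.0)      # a raw intensity of 100 on the tree's scale
raw_again = from_arsinh_scale(cutoff)
ratio = node_radius(8000) / node_radius(1000)  # 8x the events, about 2x the radius
```

The transform is close to linear near zero and close to logarithmic for large intensities. Cube-root scaling keeps very large and very small nodes visible in the same tree.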
