Automated mapping of phenotype space with single-cell data
Nikolay Samusik et al. Nat Methods. 2016 Jun.
Abstract
Accurate identification of cell subsets in complex populations is key to discovering novelty in multidimensional single-cell experiments. We present X-shift (http://web.stanford.edu/~samusik/vortex/), an algorithm that processes data sets using fast k-nearest-neighbor estimation of cell event density and arranges populations by marker-based classification. X-shift enables automated cell-subset clustering and access to biological insights that 'prior knowledge' might prevent the researcher from discovering.
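The idea of clustering by k-nearest-neighbor density can be illustrated with a minimal sketch. This is not the X-shift implementation: the density formula (inverse mean distance to the K nearest neighbors), the brute-force neighbor search, and the helper names `knn_density`, `mode_seeking_clusters`, and `make_blobs` are all assumptions made for illustration. Each point is linked to its densest neighbor if that neighbor is denser than itself, and points with no denser neighbor act as local density maxima that label their cluster.

```python
import math
import random


def knn_density(points, k):
    """Return (densities, neighbor_lists) from a brute-force k-NN search.

    Density is approximated as 1 / (mean distance to the k nearest neighbors).
    This is a simplified stand-in chosen for illustration only.
    """
    densities, neighbors = [], []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dist = sum(d for d, _ in nearest) / len(nearest)
        densities.append(1.0 / (mean_dist + 1e-12))  # epsilon guards duplicates
        neighbors.append([j for _, j in nearest])
    return densities, neighbors


def mode_seeking_clusters(points, k):
    """Label each point with the index of the local density maximum it reaches."""
    densities, neighbors = knn_density(points, k)
    parent = list(range(len(points)))
    for i in range(len(points)):
        best = max(neighbors[i], key=lambda j: densities[j])
        if densities[best] > densities[i]:  # strict increase, so no cycles
            parent[i] = best

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(len(points))]


def make_blobs(centers, n_per_blob, seed=0):
    """Hypothetical helper: uniform 2-D 'point clouds' around the given centers."""
    rng = random.Random(seed)
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centers for _ in range(n_per_blob)]


points = make_blobs([(0, 0), (100, 0), (0, 100)], n_per_blob=50)
labels = mode_seeking_clusters(points, k=20)
print(len(set(labels)), "local density maxima found")
```

A sketch like this can report more than one density maximum inside a single point cloud, because it has no step for merging neighboring peaks that are not well separated by density.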
Conflict of interest statement
Competing Financial Interests
The authors declare no competing financial interests.
Figures
Figure 1. X-shift algorithm design and validation
(a–c) Workflow of the X-shift algorithm.
(a) Synthetic two-dimensional dataset with three 'point clouds'.
(b) K-nearest-neighbor density estimation. Example sets of 20 nearest neighbors are shown for three data points.
(c) Connecting data points against the gradient of the density estimate and finding local maxima.
(d) Testing neighboring populations for density separation.
(e) X-shift clustering of synthetic data. Randomly generated datasets with 10 populations in 15 dimensions, 20 populations in 25 dimensions, and 30 populations in 35 dimensions were clustered with X-shift, varying the number of nearest neighbors (K) used for the density estimate from 100 to 5. The blue line shows the curve fitted by line-plus-exponent regression.
(f) Assessment of X-shift performance in automatic parameter-finding mode on the 12-color FlowCAP I Normal Donor dataset, compared to FlowCAP I Challenge I submissions.
(g) Scheme for evaluating X-shift performance against hand-gated CyTOF data.
(h) X-shift clusterings of mouse bone marrow data at various K settings were compared to hand-gates, and the median F-measures over 10 biological replicates were plotted as stacked areas. Population labels are positioned at the point where each F-measure first reaches 90% of its maximum.
(i) Results of X-shift analysis of bone marrow data when K was automatically selected for each of the 10 replicates. Bars show median values across replicates and error bars represent the interquartile range.
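As a rough illustration of how clusters can be scored against hand-gates with an F-measure, the sketch below treats each cluster and each gate as a set of cell-event identifiers. The matching rule (score each gate by its best-matching single cluster) and the function names are assumptions made for illustration. The caption does not describe the evaluation scheme of panel (g) in detail, so this should not be read as the authors' procedure.

```python
from statistics import median


def f_measure(cluster, gate):
    """Harmonic mean of precision and recall between two sets of event ids."""
    overlap = len(cluster & gate)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)  # fraction of the cluster inside the gate
    recall = overlap / len(gate)        # fraction of the gate recovered
    return 2 * precision * recall / (precision + recall)


def best_f_per_gate(clusters, gates):
    """For each gate name, the highest F-measure achieved by any one cluster.

    Assumption: one-to-one best match; other matching schemes are possible.
    """
    return {name: max((f_measure(c, gate) for c in clusters), default=0.0)
            for name, gate in gates.items()}


def median_over_replicates(per_replicate_scores):
    """Median F-measure per gate across a list of {gate_name: score} dicts."""
    names = per_replicate_scores[0].keys()
    return {n: median(rep[n] for rep in per_replicate_scores) for n in names}


# Toy usage with made-up event ids (not real data).
clusters = [{1, 2, 3, 4}, {5, 6, 7}]
gates = {"gate_A": {3, 4, 5, 6}, "gate_B": {5, 6, 7}}
print(best_f_per_gate(clusters, gates))
```

Taking the median across replicates, as the caption describes, keeps a single poorly clustered replicate from dominating the summary.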
Figure 2. X-shift clustering reveals novel features of mouse hematopoietic differentiation
(a) Clustering of bone marrow replicate #7 with X-shift (K = 20 was auto-selected by the switch-point-finding algorithm), represented as a Divisive Marker Tree. Node radii are proportional to the cube root of the number of cell events contained at each node. The tree is a nested representation, i.e., each parent node contains the union of the cell events of its children. Labels on nodes show the marker cutoff values that define each sub-branch, expressed on the arsinh(x/5) scale.
(b) X-shift finds biologically relevant subsets within the hand-gated cell populations (bone marrow replicate #7, X-shift K = 20).
(c) Single-cell force-directed layout of mouse bone marrow replicate #7 (X-shift K = 20, color-coded for 48 clusters). Colors show X-shift clusters and grey boxes show the locations of hand-gated cell populations.
(d) Force-directed layout of populations related to monocyte development. Colors represent expression levels of the indicated markers.
(e) Force-directed layout of populations related to pDC development. Colors represent expression levels of the indicated markers.
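The two scalings mentioned in panel (a) can be written out as a short sketch. The function names and the `scale` parameter are assumptions for illustration; only the arsinh(x/5) transform and the cube-root relationship come from the caption.

```python
import math

COFACTOR = 5.0  # the "5" in arsinh(x/5)


def arsinh_transform(x, cofactor=COFACTOR):
    """Map a raw marker intensity onto the arsinh(x/cofactor) scale."""
    return math.asinh(x / cofactor)


def inverse_arsinh(y, cofactor=COFACTOR):
    """Convert a cutoff on the arsinh scale back to a raw intensity."""
    return cofactor * math.sinh(y)


def node_radius(n_events, scale=1.0):
    """Radius proportional to the cube root of the event count at a node.

    `scale` is a hypothetical display constant.
    """
    return scale * n_events ** (1.0 / 3.0)


print(arsinh_transform(50.0))    # raw intensity 50 on the arsinh(x/5) scale
print(inverse_arsinh(2.0))       # raw intensity behind a cutoff of 2.0
print(node_radius(1000) / node_radius(1))
```

With cube-root scaling, a node holding a thousand times more cell events is drawn with roughly ten times the radius, which keeps large and small populations visible in the same tree.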
Cited by
- Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front.
  Schürch CM, Bhate SS, Barlow GL, Phillips DJ, Noti L, Zlobec I, Chu P, Black S, Demeter J, McIlwain DR, Kinoshita S, Samusik N, Goltsev Y, Nolan GP. Cell. 2020 Sep 3;182(5):1341-1359.e19. doi: 10.1016/j.cell.2020.07.005. Epub 2020 Aug 6. PMID: 32763154.
- Novel multiparameter correlates of Coxiella burnetii infection and vaccination identified by longitudinal deep immune profiling.
  Reeves PM, Raju Paul S, Baeten L, Korek SE, Yi Y, Hess J, Sobell D, Scholzen A, Garritsen A, De Groot AS, Moise L, Brauns T, Bowen R, Sluder AE, Poznansky MC. Sci Rep. 2020 Aug 7;10(1):13311. doi: 10.1038/s41598-020-69327-x. PMID: 32770104.
- A disrupted FOXP3 transcriptional signature underpins systemic regulatory T cell insufficiency in early pregnancy failure.
  Moldenhauer LM, Foyle KL, Wilson JJ, Wong YY, Sharkey DJ, Green ES, Barry SC, Hull ML, Robertson SA. iScience. 2024 Jan 23;27(2):108994. doi: 10.1016/j.isci.2024.108994. eCollection 2024 Feb 16. PMID: 38327801.
- Identification of stem cells from large cell populations with topological scoring.
  Sardiu ME, Box AC, Haug JS, Washburn MP. Mol Omics. 2021 Feb 1;17(1):59-65. doi: 10.1039/d0mo00039f. Epub 2020 Sep 14. PMID: 32924050.
- Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.
  Xia L, Lee C, Li JJ. Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y. PMID: 38409103.
Grants and funding
- T32 AI007290/AI/NIAID NIH HHS/United States
- R01 GM109836/GM/NIGMS NIH HHS/United States
- R33 CA183654/CA/NCI NIH HHS/United States
- U19 AI057229/AI/NIAID NIH HHS/United States
- P01 AI036535/AI/NIAID NIH HHS/United States
- U19 AI100627/AI/NIAID NIH HHS/United States
- R01 HL120724/HL/NHLBI NIH HHS/United States
- R21 CA183660/CA/NCI NIH HHS/United States
- R33 CA183692/CA/NCI NIH HHS/United States
- R01 CA184968/CA/NCI NIH HHS/United States
- UH2 AR067676/AR/NIAMS NIH HHS/United States