Single-cell transcriptional diversity is a hallmark of developmental potential

Science. 2020 Jan 24;367(6476):405-411.

doi: 10.1126/science.aax0249.

Shaheen S Sikandar 1, Daniel J Wesche 1, Anoop Manjunath 1, Anjan Bharadwaj 1, Mark J Berger 2, Francisco Ilagan 1, Angera H Kuo 1, Robert W Hsieh 1, Shang Cai 3, Maider Zabala 1, Ferenc A Scheeren 4, Neethan A Lobo 1, Dalong Qian 1, Feiqiao B Yu 5, Frederick M Dirbas 6, Michael F Clarke 1 7, Aaron M Newman 8 9

Gunsagar S Gulati et al. Science. 2020.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful approach for reconstructing cellular differentiation trajectories. However, inferring both the state and direction of differentiation is challenging. Here, we demonstrate a simple, yet robust, determinant of developmental potential (the number of expressed genes per cell) and leverage this measure of transcriptional diversity to develop a computational framework (CytoTRACE) for predicting differentiation states from scRNA-seq data. When applied to diverse tissue types and organisms, CytoTRACE outperformed previous methods and nearly 19,000 annotated gene sets for resolving 52 experimentally determined developmental trajectories. Additionally, it facilitated the identification of quiescent stem cells and revealed genes that contribute to breast tumorigenesis. This study thus establishes a key RNA-based feature of developmental potential and a platform for delineation of cellular hierarchies.
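The feature named in the abstract, the number of expressed genes per cell, can be illustrated with a minimal sketch. This is only the raw gene-count feature, not the CytoTRACE framework itself. The function name `gene_counts`, the toy data, and the detection threshold of zero are assumptions made for illustration.

```python
from typing import Dict, List


def gene_counts(expression: Dict[str, List[float]], threshold: float = 0.0) -> Dict[str, int]:
    """Count, for each cell, the genes whose expression exceeds `threshold`.

    `expression` maps a cell identifier to that cell's per-gene values.
    The default threshold of 0 (any detected transcript) is an assumption.
    """
    return {
        cell: sum(1 for value in values if value > threshold)
        for cell, values in expression.items()
    }


# Toy data: two hypothetical cells measured over six genes.
cells = {
    "cell_a": [0, 3, 1, 0, 7, 2],
    "cell_b": [0, 0, 5, 0, 0, 1],
}
counts = gene_counts(cells)
```

Under the paper's claim, a cell with more expressed genes (here `cell_a`) would be predicted to be less differentiated than one with fewer.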

Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

Conflict of interest statement

Competing interests: G.S.G., S.S.S., M.F.C., and A.M.N. are inventors on a provisional patent application filed by Stanford University (US 62/852,231) that covers methods described in this work.

Figures

Fig. 1. RNA-based determinants of developmental potential.

(A and B) In silico screen for correlates of cellular differentiation status in scRNA-seq data. (A) Depiction of the scoring scheme. Each phenotype was assigned a rank on the basis of its known differentiation status (less differentiated = lower rank), and the values of each RNA-based feature (fig. S1A) were mean-aggregated by rank for each dataset (higher value = lower rank). Performance was calculated as the mean Spearman correlation between known and predicted ranks across all nine training datasets (table S1). (B) Performance of all evaluated RNA-based features for predicting differentiation states in the training cohort, ordered by mean Spearman correlations (fig. S1 and table S2). (C) The developmental ordering of 30 mouse cell phenotypes across 17 developmental stages shown as a function of single-cell gene counts (table S3). Data are expressed as means ± 95% confidence intervals. The linear regression line and coefficient of determination (R²) are shown. (D) Performance of gene counts for ordering C. elegans embryogenesis. (Left) Radial tree map showing gene counts for each cell type with available scRNA-seq data from a recent study (48). NA, not available. Embryogenesis originates at the center of the plot [P0 (zygote)] and moves outwards towards terminally differentiated cells, with concentric rings representing sequential cell divisions. (Right) Boxplot showing weighted Spearman correlations between single-cell gene counts and developmental lineages with available transcriptomic data (n = 456). (E) Association between single-cell gene counts and chromatin accessibility in cells from an in vitro differentiation series of purified phenotypes from the human paraxial mesoderm lineage [Mesoderm (C1) dataset; table S1]. (Top) Association of single-cell gene counts with differentiation. Each point represents a cell colored by known phenotype (below). (Bottom) Heat map showing chromatin accessibility profiles for the same phenotypes as above. Peaks are centered by their summit, defined as the base with maximum coverage, shown within a window of 1 kb (±0.5 kb), and ordered top to bottom within each phenotype by decreasing total signal per peak. Cell type abbreviations are defined in materials and methods.
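The scoring scheme in panel A of Fig. 1 can be sketched for a single dataset as follows. This is an illustrative reading of the caption, not the authors' code: cells are grouped by the known rank of their phenotype, the feature is averaged within each rank, the averages are ranked so that the highest value gets the lowest rank, and the Spearman correlation between known and predicted ranks is returned. All function names and the toy values are assumptions, and the sketch does not handle a feature that is constant across ranks.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; applied to ranks it gives the Spearman correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def score_feature(known_ranks: Sequence[int], feature_values: Sequence[float]) -> float:
    """Spearman correlation between known ranks and ranks predicted from a feature."""
    by_rank = defaultdict(list)
    for rank, value in zip(known_ranks, feature_values):
        by_rank[rank].append(value)
    ranks = sorted(by_rank)
    means = [mean(by_rank[rank]) for rank in ranks]
    # Higher mean value = lower predicted rank, so rank the negated means.
    predicted = average_ranks([-m for m in means])
    return pearson(average_ranks(ranks), predicted)


# Toy dataset: six cells, three phenotypes, feature = gene counts per cell.
known = [1, 1, 2, 2, 3, 3]
feature = [9000, 8000, 6000, 5500, 3000, 2500]
score = score_feature(known, feature)
```

The caption reports performance as the mean of this per-dataset correlation across the nine training datasets, which would correspond to averaging `score_feature` over datasets.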

Fig. 2. Development and validation of CytoTRACE.

(A) Schematic overview of the CytoTRACE framework applied to the hESC in vitro differentiation (C1) dataset (materials and methods and table S1). (B) Scatterplot comparing the average performance of 18,706 annotated gene sets, four stemness inference methods, gene counts, GCS, and CytoTRACE in the training and validation cohorts (table S2). (C) Boxplots showing the single-cell performance of CytoTRACE against RNA-based features and methods in the validation cohort (n = 33 datasets; table S2). Each point represents the Spearman correlation, weighted by number of cells per phenotype, between predicted and known differentiation states for a given dataset, calculated as described in materials and methods. Statistical significance was assessed by a one-sided paired Wilcoxon signed-rank test against CytoTRACE (table S4).

Fig. 3. Characterization of developmental hierarchies and quiescent stem cells using CytoTRACE.

(A) Impact of batch correction (materials and methods) on two datasets of mouse bone marrow differentiation: Bone Marrow (10x) and Bone Marrow (Smart-seq2) (table S1). diff, differentiated. (B) Combined application of CytoTRACE and Monocle 2 to mouse bone marrow differentiation [Bone marrow (Smart-seq2) dataset] (table S1). (Left) Multi-lineage tree inferred by Monocle 2 showing all 23 possible pseudotimes when the root is unknown. (Right) Automatic selection of the correct root by CytoTRACE. (C and D) Prioritization of quiescent and cycling HSCs in index-sorted scRNA-seq data of mouse hematopoiesis [Bone Marrow (Smart-seq2) dataset] (table S1). All plots are identically ordered by CytoTRACE. (C) Boxplots showing CytoTRACE values for candidate cycling HSCs (n = 31), long-term or quiescent HSCs (n = 30), early immature B cells (n = 285), late immature B cells (n = 863), and mature B cells (n = 700). HSCs, long-term or quiescent HSCs, and proliferating cells were defined on the basis of expression of Fgd5 (49), Hoxb5 (35), and Mki67, respectively. Although boxplots represent all analyzed cells, a maximum of 50 cells per phenotype are displayed as points for clarity. Statistical significance was assessed by a two-sided Wilcoxon signed-rank test. **P = 0.003. (D) Top: RNA content per cell, shown as a function of CytoTRACE and displayed as the moving average of 200 cells. Bottom: Expression of Fgd5 and Hoxb5 displayed as a smoothing spline over the moving average of 200 cells. Data from monocytic and granulocytic lineages are consistent with the above results.
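The smoothing used in panel D of Fig. 3 (a per-cell quantity shown as a moving average over cells ordered by CytoTRACE) can be sketched as follows. The caption uses a window of 200 cells; the window of 2, the descending sort direction, the omission of partial windows at the ends, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of every full run of `window` consecutive values (no edge padding)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


def smooth_by_score(scores: Sequence[float], values: Sequence[float], window: int) -> List[float]:
    """Order cells by descending score, then smooth `values` along that order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return moving_average([values[i] for i in order], window)


# Toy data: four cells with a hypothetical score and a per-cell measurement.
scores = [0.9, 0.1, 0.5, 0.7]
rna_content = [10, 2, 6, 8]
smoothed = smooth_by_score(scores, rna_content, window=2)
```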

Fig. 4. Identification of immature cell markers in normal and malignant human breast LPs using CytoTRACE.

(A) Principal component analysis of scRNA-seq profiles from 1902 human breast epithelial cells, colored according to subpopulations (top) and patient (bottom). (B) Heat map showing genes from adjacent normal LPs rank-ordered by their Pearson correlation with CytoTRACE and colored according to a clonogenicity index, defined as the log2 fold change in expression between highly and lowly clonogenic LPs from normal human breast (39) (materials and methods). The clonogenicity index is displayed as a moving average of 200 genes. Key genes associated with less (ALDH1A3, MFGE8) and more (GATA3, FOXA1, AR) differentiated normal LPs are indicated. (C) Enrichment of genes associated with human breast tumorigenesis [RNAi dropout viability screen (41)] within a ranked list of genes expressed by malignant LPs, rank-ordered by their Pearson correlation with CytoTRACE. Enrichment was calculated with preranked gene set enrichment analysis. NES, normalized enrichment score; ES, enrichment score. (D) Identification of candidate tumorigenic genes associated with immature malignant human LPs. (Top) Genes rank-ordered by the difference in their Pearson correlations with CytoTRACE in malignant LPs versus malignant mature luminal cells. The top 15 genes that are predicted to be specifically associated with less differentiated LPs are indicated on the left. (Bottom) Schema for the identification of genes that are ranked as above, but that are also more highly expressed in malignant LPs than MLs (log2 fold change > 0; Benjamini-Hochberg adjusted P < 0.05, unpaired two-sided t-test) and that are expressed by a subpopulation of LPs (<20% of cells). The top 5 filtered genes are shown (right). (E) Schema for shRNA knockdown of GULP1 in a human breast cancer xenograft model. (F) Growth of human breast cancer xenografts from two patients, one with TNBC (left) and one with ER+ luminal-type cancer (right), after lentiviral transduction with empty vector or shRNA targeting GULP1. Tumor volumes after knockdown with shGULP1 #1 (orange) and shGULP1 #2 (red) were indistinguishable in COH69 xenografts (right). Data are expressed as means ± SD (n = 3 mice). Statistical significance was assessed by a two-way ANOVA. **** P < 0.0001.
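The three filters listed for panel D of Fig. 4 (log2 fold change > 0, Benjamini-Hochberg adjusted P < 0.05, expressed in fewer than 20% of cells) can be sketched as follows. The gene names, input values, and function names are hypothetical, the raw P values are taken as given rather than computed from a t-test, and the preceding ranking by difference in Pearson correlations is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def filter_genes(stats: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Keep genes passing all three filters.

    `stats` maps gene -> (log2 fold change, raw P value, fraction of cells expressing).
    """
    genes = list(stats)
    adjusted = benjamini_hochberg([stats[g][1] for g in genes])
    return [
        gene
        for gene, adj_p in zip(genes, adjusted)
        if stats[gene][0] > 0 and adj_p < 0.05 and stats[gene][2] < 0.20
    ]


# Hypothetical per-gene statistics for illustration only.
example = {
    "gene_a": (1.2, 0.01, 0.10),
    "gene_b": (0.8, 0.04, 0.15),
    "gene_c": (-0.5, 0.03, 0.05),
    "gene_d": (2.0, 0.50, 0.10),
}
selected = filter_genes(example)
```

In this toy input only `gene_a` survives: `gene_b` has a raw P value below 0.05 but fails after adjustment, which is the reason for applying the correction before thresholding.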

References

    1. Visvader JE, Clevers H, Tissue-specific designs of stem cell hierarchies. Nat Cell Biol 18, 349–355 (2016).
    2. Kretzschmar K, Watt FM, Lineage tracing. Cell 148, 33–45 (2012).
    3. Seita J, Weissman IL, Hematopoietic stem cell: self-renewal versus differentiation. Wiley Interdiscip Rev Syst Biol Med 2, 640–653 (2010).
    4. Sulston JE, Schierenberg E, White JG, Thomson JN, The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev Biol 100, 64–119 (1983).
    5. Kester L, van Oudenaarden A, Single-Cell Transcriptomics Meets Lineage Tracing. Cell Stem Cell 23, 166–179 (2018).