Integrating single-cell transcriptomic data across different conditions, technologies, and species

Nat Biotechnol. 2018 Jun;36(5):411-420.

doi: 10.1038/nbt.4096. Epub 2018 Apr 2.

Andrew Butler et al. Nat Biotechnol. 2018 Jun.

Abstract

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Figure 1. Overview of Seurat alignment of single-cell RNA-seq datasets

(A) Toy example of heterogeneous populations profiled in a case/control study after drug treatment. Cells across four types are plotted with different symbols, while stimulation condition is encoded by color. In a standard workflow, cells often cluster both by cell type and stimulation condition, creating challenges for downstream comparative analysis. (B) The Seurat alignment procedure uses canonical correlation analysis to identify shared correlation structures across datasets, and aligns these dimensions using dynamic time warping. After alignment, cells are embedded in a shared low-dimensional space (visualized here in 2D with tSNE). (C) After alignment, a single integrated clustering can identify conserved cell types across conditions, allowing for comparative analysis to identify shifts in cell type proportion, as well as cell-type specific transcriptional responses to drug treatment.
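The caption names dynamic time warping as the step that aligns the shared dimensions. As a minimal sketch of that general technique only, the function below runs classic dynamic time warping on two one-dimensional sequences. The name `dtw_path` and the absolute-difference cost are assumptions made for illustration; this is not Seurat's implementation, which operates on canonical correlation vectors.

```python
def dtw_path(a, b):
    """Classic dynamic time warping between two 1-D numeric sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching a[i] to b[j]. Illustrative sketch, not Seurat code.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_path([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0])` stretches the first element of the shorter sequence across the repeated zero in the longer one, so the two sequences align at zero total cost.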

Figure 2. Integrated analysis of resting and stimulated PBMCs

(A-C) tSNE plots of 14,039 human PBMCs split between control and IFN-β-stimulated conditions, prior to (A) and post (B) alignment. After alignment, cells across stimulation conditions group together based on shared cell type, allowing for a single joint clustering (C) to detect 13 immune populations. (D) Integrated analysis reveals markers of cell types (conserved across stimulation conditions), uniform markers of IFN-β response (independent of cell type), and components of the IFN-β response that vary across cell types. The size of each circle reflects the percentage of cells in a cluster where the gene is detected, and the color reflects the average expression level within each cluster. (E) The fraction of cells (median across 8 donors) falling in each cluster (n = 13 clusters) for stimulated and unstimulated cells. (F) Examples of heterogeneous responses to IFN-β between conventional and plasmacytoid dendritic cells (global analysis shown in Supplementary Figure 4B). Each column represents the average expression of single cells within a single patient. Only patient/cluster combinations with at least five cells are shown. (G) Correlation heatmap (n = 430 genes with difference > ln(2) between resting and stimulated) of cell-type specific responses to IFN-β (individual correlations for T and DC subsets shown in Supplementary Figure 4A–B). Cells from myeloid and lymphoid lineages show highly correlated responses, but plasmacytoid dendritic cells exhibit a unique IFN-β response.
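Panel D encodes two per-cluster summaries for each gene: the percentage of cells in which the gene is detected (circle size) and the average expression (color). A minimal sketch of how those two numbers could be computed for one gene follows. The function name `dot_plot_stats`, the input layout, and the rule that "detected" means a value greater than zero are assumptions for illustration.

```python
from collections import defaultdict


def dot_plot_stats(expression, clusters):
    """Summarize one gene per cluster.

    expression: per-cell expression values for a single gene.
    clusters:   per-cell cluster labels, same length as expression.
    Returns {cluster: (percent_detected, mean_expression)}.
    Assumption: a gene counts as detected in a cell when its value is > 0.
    """
    grouped = defaultdict(list)
    for value, label in zip(expression, clusters):
        grouped[label].append(value)

    stats = {}
    for label, values in grouped.items():
        detected = sum(1 for v in values if v > 0)
        pct = 100.0 * detected / len(values)
        mean = sum(values) / len(values)
        stats[label] = (pct, mean)
    return stats
```

Calling this once per gene gives the grid of (size, color) pairs that a dot plot of this kind draws.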

Figure 3. Comparative analysis of mouse hematopoietic progenitors across scRNA-seq technologies

(A-C) tSNE plots of 3,451 hematopoietic progenitor cells from murine bone marrow sequenced using MARS-Seq (2,686) and SMART-Seq2 (765), prior to (A) and post (B-C) alignment. After alignment, cells group together based on shared progenitor type irrespective of sequencing technology. (C-D) Cells from the SMART-Seq2 dataset were mapped onto the closest MARS-Seq cluster and associated lineage (from Paul et al.). (C) tSNE plot of cells colored by assigned lineage. (D) Mapping correspondence between SMART-Seq2 lineage assignments (from Nestorowa et al.) and MARS-Seq clusters. (E-F) Heatmaps showing lineage-specific gene expression patterns in MARS-Seq and SMART-Seq2 datasets. Each column represents average expression after cells are grouped either by the original MARS-Seq cluster assignments (E), or the MARS-Seq cluster they map to (F). (G-H) Integrated diffusion maps of erythroid-committed cells in both datasets reveal an aligned developmental trajectory (G), with conserved ‘pseudo-temporal’ dynamics (H). (I) Scatter plot comparing the range in expression (absolute value) over the developmental trajectory, for each gene, across both datasets.
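Panels C-D rely on mapping each SMART-Seq2 cell onto its closest MARS-Seq cluster. The caption does not say how "closest" is measured, so the sketch below assumes Euclidean distance to cluster centroids in the aligned low-dimensional space. The function names and the toy coordinates are hypothetical.

```python
import math


def cluster_centroids(points, labels):
    """Mean coordinate of each reference cluster."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]]
            for label in sums}


def map_to_closest_cluster(query_points, centroids):
    """Assign each query cell to the reference cluster with the nearest centroid.

    Assumption: Euclidean distance in the shared (aligned) space.
    """
    return [min(centroids, key=lambda label: math.dist(q, centroids[label]))
            for q in query_points]
```

With reference cells labeled by cluster and query cells embedded in the same aligned space, `map_to_closest_cluster(query, cluster_centroids(ref, ref_labels))` returns one reference label per query cell, which can then be cross-tabulated against the query dataset's own lineage assignments.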

Figure 4. Joint identification of cell types across human and mouse islet scRNA-seq atlases

(A-C) tSNE plots of 10,191 pancreatic islet cells from human (n = 8,424 cells) and mouse (n = 1,767 cells) donors, prior to (A) and post (B) alignment. After alignment, cells group across species based on shared cell type, allowing for a joint clustering (C) to detect 10 cell populations. (D-E) Unsupervised identification of shared cell-type markers between human and mouse. Single cell expression heatmap for genes identified with joint DE testing across species. (F) Violin plots showing the distribution of gene expression of select genes in the beta cell cluster for human (n = 2,431 cells) and mouse (n = 762 cells) and the stressed beta cell clusters for human (n = 126 cells) and mouse (n = 10 cells). (G) Top n=100 genes up-regulated in the ‘ER-stress’ subpopulation of beta cells in both species are strongly enriched for components of the ER unfolded protein stress response. GO enrichment is visualized using the GOplot R package.
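Panel G reports that the top 100 up-regulated genes are enriched for a GO category. The caption does not state which enrichment test was used; a common choice for this kind of gene-list question is a one-sided hypergeometric test, sketched below under that assumption. The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe:  total number of genes considered.
    annotated: genes in the universe carrying the GO term.
    selected:  size of the gene list being tested (e.g. a top-100 list).
    overlap:   genes in the list that carry the GO term.
    Assumption: genes are drawn without replacement and are exchangeable.
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(comb(annotated, k) * comb(universe - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

A small p-value means the observed overlap is larger than expected if the list had been drawn at random from the universe.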

Figure 5. Benchmarking alignment and batch correction methods

(A, D, G, J, M) tSNE plots for the PBMC dataset (n = 14,039 cells) (A), hematopoietic progenitor cell dataset (n = 3,451 cells) (D), pancreatic islet cell dataset (n = 10,306 cells) (G), multiple human pancreatic islet cell datasets (n = 6,224 cells) (J), and multiple PBMC datasets (n = 16,653 cells) (M) after correction with ComBat, and (B, E, H, K, N) the same datasets after correction with limma. (C, F, I, L, O) Bar plots of the alignment score after correction using the Seurat alignment procedure, ComBat, limma, and after no correction. Seurat alignment outperforms the other methods in all five examples. Additional examples of ‘negative controls’ where Seurat fails to align datasets from different tissues are shown in Supplementary Figure 15.
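The caption does not define the alignment score. One way to quantify how well datasets are mixed is a neighborhood-based score: for each cell, count how many of its k nearest neighbors come from the same dataset, average that count, and rescale it against the count expected under perfect mixing. The sketch below implements that idea as an assumption about the kind of metric being plotted, using brute-force neighbor search and datasets assumed to be of equal size. The function name is hypothetical.

```python
import math


def alignment_score(points, dataset_ids, k):
    """Neighborhood-mixing score (illustrative, assumed definition).

    score = 1 - (mean_same - k/N) / (k - k/N)
    where mean_same is the average number of a cell's k nearest
    neighbors that share its dataset, and N is the number of datasets.
    Completely separated datasets give 0; evenly mixed datasets give
    values near 1. Assumes datasets contain equal numbers of cells.
    """
    n_datasets = len(set(dataset_ids))
    same_counts = []
    for i, p in enumerate(points):
        # Brute-force distances to every other cell.
        others = sorted((math.dist(p, q), j)
                        for j, q in enumerate(points) if j != i)
        neighbors = [j for _, j in others[:k]]
        same = sum(1 for j in neighbors if dataset_ids[j] == dataset_ids[i])
        same_counts.append(same)
    mean_same = sum(same_counts) / len(same_counts)
    expected = k / n_datasets
    return 1.0 - (mean_same - expected) / (k - expected)
```

Brute-force search is quadratic in the number of cells, so a real analysis at the scale of these datasets would need an approximate or tree-based neighbor search instead.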
