Integrating single-cell transcriptomic data across different conditions, technologies, and species
Nat Biotechnol. 2018 Jun;36(5):411-420.
doi: 10.1038/nbt.4096. Epub 2018 Apr 2.
- PMID: 29608179
- PMCID: PMC6700744
- DOI: 10.1038/nbt.4096
Andrew Butler et al. Nat Biotechnol. 2018 Jun.
Abstract
Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
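Per Figure 1B, the "common sources of variation" used for integration are found with canonical correlation analysis (CCA). The sketch below is a minimal, pure-Python illustration of one simplified form of CCA, assuming both data sets share the same genes (rows) and that the within-dataset covariance is treated as the identity after per-gene scaling. Under that assumption the first canonical pair reduces to the top left and right singular vectors of XᵀY, found here by power iteration. Every function name is hypothetical; this is not Seurat's implementation, and a real analysis would compute many dimensions with a linear-algebra library.

```python
import math
import random


def standardize_rows(mat):
    """Center and scale each gene (row) to mean 0, variance 1 (an assumed preprocessing step)."""
    out = []
    for row in mat:
        n = len(row)
        mu = sum(row) / n
        sd = math.sqrt(sum((x - mu) ** 2 for x in row) / n) or 1.0  # guard constant genes
        out.append([(x - mu) / sd for x in row])
    return out


def cross_product(X, Y):
    """M = X^T Y: entry (i, j) is the dot product of cell i (in X) and cell j (in Y) over shared genes."""
    genes = range(len(X))
    return [[sum(X[g][i] * Y[g][j] for g in genes) for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero vector: no shared correlation structure found")
    return [x / norm for x in v]


def first_canonical_pair(X, Y, iters=200, seed=0):
    """Top singular vectors (u, v) of X^T Y via power iteration.

    u holds one score per cell of dataset X and v one score per cell of
    dataset Y along the first shared dimension (diagonal-covariance assumption).
    """
    M = cross_product(X, Y)
    rng = random.Random(seed)
    v = _normalize([rng.random() + 0.1 for _ in range(len(M[0]))])
    for _ in range(iters):
        u = _normalize([sum(a * b for a, b in zip(row, v)) for row in M])  # u ∝ M v
        v = _normalize([sum(M[i][j] * u[i] for i in range(len(M)))       # v ∝ Mᵀ u
                        for j in range(len(M[0]))])
    return u, v
```

Treating the within-dataset covariance as the identity avoids inverting gene-by-gene covariance matrices, which are singular when there are far more genes than cells; whether and how Seurat regularizes this is described in the paper's Methods, not in this record.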
Conflict of interest statement
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Figures
Figure 1. Overview of Seurat alignment of single-cell RNA-seq datasets
(A) Toy example of heterogeneous populations profiled in a case/control study after drug treatment. Cells from four types are plotted with different symbols, while stimulation condition is encoded by color. In a standard workflow, cells often cluster by both cell type and stimulation condition, creating challenges for downstream comparative analysis. (B) The Seurat alignment procedure uses canonical correlation analysis to identify shared correlation structures across datasets, and aligns these dimensions using dynamic time warping. After alignment, cells are embedded in a shared low-dimensional space (visualized here in 2D with tSNE). (C) After alignment, a single integrated clustering can identify conserved cell types across conditions, allowing comparative analysis to identify shifts in cell-type proportions as well as cell-type-specific transcriptional responses to drug treatment.
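Panel B says the shared dimensions are aligned with dynamic time warping (DTW). As a hedged illustration only, the classic DTW recurrence below matches two 1-D sequences, which one could think of as per-cell scores on one shared dimension from each dataset, sorted in increasing order. The name `dtw_align` and the absolute-difference local cost are assumptions; Seurat's actual procedure adds steps (how scores are summarized before matching and how cells are shifted afterwards) that this sketch does not reproduce.

```python
def dtw_align(a, b):
    """Classic dynamic time warping between two 1-D sequences.

    Returns (total_cost, path), where path lists (i, j) pairs matching a[i]
    to b[j]. The local cost is the absolute difference |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_align([0, 1, 2, 3], [0, 0, 1, 2, 3])` has cost 0 and matches both leading zeros of the second sequence to the first element of the first, which is the kind of monotone, many-to-one matching that lets two datasets with different cell-density profiles along a dimension be lined up.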
Figure 2. Integrated analysis of resting and stimulated PBMC
(A-C) tSNE plots of 14,039 human PBMCs split between control and IFN-β-stimulated conditions, before (A) and after (B) alignment. After alignment, cells across stimulation conditions group together based on shared cell type, allowing a single joint clustering (C) to detect 13 immune populations. (D) Integrated analysis reveals markers of cell types (conserved across stimulation conditions), uniform markers of IFN-β response (independent of cell type), and components of the IFN-β response that vary across cell types. The size of each circle reflects the percentage of cells in a cluster in which the gene is detected, and the color reflects the average expression level within each cluster. (E) The fraction of cells (median across 8 donors) falling in each cluster (n = 13 clusters) for stimulated and unstimulated cells. (F) Examples of heterogeneous responses to IFN-β between conventional and plasmacytoid dendritic cells (global analysis shown in Supplementary Figure 4B). Each column represents the average expression of single cells within a single patient. Only patient/cluster combinations with at least five cells are shown. (G) Correlation heatmap (n = 430 genes with difference > ln(2) between resting and stimulated cells) of cell-type-specific responses to IFN-β (individual correlations for T and DC subsets shown in Supplementary Figure 4A–B). Cells from myeloid and lymphoid lineages show highly correlated responses, but plasmacytoid dendritic cells exhibit a unique IFN-β response.
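The quantities behind panels D and G can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the figure: "detected" is assumed to mean expression above zero, "average expression" is assumed to be a plain arithmetic mean of (log-normalized) values, and the ln(2) filter is assumed to apply to the absolute difference of per-gene means. All function names are hypothetical.

```python
import math


def dot_plot_stats(expr, clusters, detect_threshold=0.0):
    """Per-cluster dot-plot summaries for one gene.

    expr: per-cell expression values; clusters: per-cell cluster labels.
    Returns {cluster: (percent_detected, mean_expression)}.
    """
    groups = {}
    for value, label in zip(expr, clusters):
        groups.setdefault(label, []).append(value)
    stats = {}
    for label, values in groups.items():
        detected = sum(1 for v in values if v > detect_threshold)
        stats[label] = (100.0 * detected / len(values), sum(values) / len(values))
    return stats


def responsive_genes(mean_ctrl, mean_stim, min_diff=math.log(2)):
    """Genes whose mean (natural-log-scale) expression changes by more than min_diff."""
    return [g for g in mean_ctrl if abs(mean_stim[g] - mean_ctrl[g]) > min_diff]


def pearson(x, y):
    """Pearson correlation, e.g. between two cell types' response vectors (stim minus ctrl)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A correlation heatmap like panel G would then take, for each cell type, the vector of stimulated-minus-control means over the retained genes and call `pearson` on every pair of cell types.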
Figure 3. Comparative analysis of mouse hematopoietic progenitors across scRNA-seq technologies
(A-C) tSNE plots of 3,451 hematopoietic progenitor cells from murine bone marrow sequenced using MARS-seq (2,686 cells) and SMART-Seq2 (765 cells), before (A) and after (B-C) alignment. After alignment, cells group together based on shared progenitor type irrespective of sequencing technology. (C-D) Cells from the SMART-Seq2 dataset were mapped onto the closest MARS-seq cluster and its associated lineage (from Paul et al.). (C) tSNE plot of cells colored by assigned lineage. (D) Mapping correspondence between SMART-Seq2 lineage assignments (from Nestorowa et al.) and MARS-seq clusters. (E-F) Heatmaps showing lineage-specific gene expression patterns in the MARS-seq and SMART-Seq2 datasets. Each column represents average expression after cells are grouped either by the original MARS-seq cluster assignments (E) or by the MARS-seq cluster they map to (F). (G-H) Integrated diffusion maps of erythroid-committed cells in both datasets reveal an aligned developmental trajectory (G) with conserved ‘pseudo-temporal’ dynamics (H). (I) Scatter plot comparing, for each gene, the range in expression (absolute value) over the developmental trajectory in the two datasets.
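The caption does not say how "closest" is measured when SMART-Seq2 cells are mapped onto MARS-seq clusters. The sketch below assumes one simple rule, nearest cluster centroid by Euclidean distance in the aligned space, and then tallies a correspondence table like panel D. The rule and all function names are assumptions for illustration; the paper's Methods may use a different criterion (for example, nearest neighbors rather than centroids).

```python
import math
from collections import Counter


def centroids(embedding, labels):
    """Mean position of each reference (e.g. MARS-seq) cluster in the aligned space."""
    sums, counts = {}, {}
    for point, label in zip(embedding, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], point)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def map_to_nearest_cluster(query, reference_centroids):
    """Assign each query (e.g. SMART-Seq2) cell to the reference cluster with the closest centroid."""
    return [min(reference_centroids, key=lambda lab: math.dist(point, reference_centroids[lab]))
            for point in query]


def correspondence_table(query_lineages, assigned_clusters):
    """Count (original query lineage, assigned reference cluster) pairs, as in a mapping heatmap."""
    return Counter(zip(query_lineages, assigned_clusters))
```

Nearest-centroid assignment is the simplest choice and only makes sense after alignment, because before alignment the technology-driven shift between datasets would dominate the distances.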
Figure 4. Joint identification of cell types across human and mouse islet scRNA-seq atlases
(A-C) tSNE plots of 10,191 pancreatic islet cells from human (n = 8,424 cells) and mouse (n = 1,767 cells) donors, before (A) and after (B) alignment. After alignment, cells group across species based on shared cell type, allowing a joint clustering (C) to detect 10 cell populations. (D-E) Unsupervised identification of shared cell-type markers between human and mouse: single-cell expression heatmap for genes identified by joint DE testing across species. (F) Violin plots showing the distribution of expression of selected genes in the beta cell cluster for human (n = 2,431 cells) and mouse (n = 762 cells) and in the stressed beta cell cluster for human (n = 126 cells) and mouse (n = 10 cells). (G) The top 100 genes up-regulated in the ‘ER-stress’ subpopulation of beta cells in both species are strongly enriched for components of the ER unfolded protein stress response. GO enrichment is visualized using the GOplot R package.
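The caption names GOplot for visualization but not the statistical test behind the enrichment. A common choice, assumed here, is a one-sided hypergeometric test: given `total_genes` in the background, `term_genes` annotated to a GO term, and a selected list of `selected` genes (for example the top 100 up-regulated genes), it asks how likely an overlap at least as large as the one observed would be by chance. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, term_genes, selected, overlap):
    """Upper-tail P(X >= overlap) for X ~ Hypergeometric(total_genes, term_genes, selected).

    Models drawing `selected` genes without replacement from a background of
    `total_genes`, of which `term_genes` belong to the GO term of interest.
    """
    denom = comb(total_genes, selected)
    upper = min(term_genes, selected)  # the overlap can never exceed either set size
    numer = sum(comb(term_genes, k) * comb(total_genes - term_genes, selected - k)
                for k in range(overlap, upper + 1))
    return numer / denom
```

Exact integer binomial coefficients keep the tail sum precise even for large backgrounds; in practice one p-value is computed per GO term and then corrected for multiple testing.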
Figure 5. Benchmarking alignment and batch correction methods
(A, D, G, J, M) tSNE plots after correction with ComBat and (B, E, H, K, N) after correction with limma, for the PBMC dataset (n = 14,039 cells; A, B), the hematopoietic progenitor cell dataset (n = 3,451 cells; D, E), the pancreatic islet cell dataset (n = 10,306 cells; G, H), multiple human pancreatic islet cell datasets (n = 6,224 cells; J, K), and multiple PBMC datasets (n = 16,653 cells; M, N). (C, F, I, L, O) Bar plots of the alignment score after correction with the Seurat alignment procedure, ComBat, or limma, and with no correction. Seurat alignment achieves the highest alignment score in all five examples. Additional ‘negative control’ examples, in which Seurat does not align datasets from different tissues, are shown in Supplementary Figure 15.
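The caption does not define the alignment score. A minimal sketch of one neighborhood-mixing formulation follows, assuming the form 1 − (x̄ − k/N)/(k − k/N), where x̄ is the average number of a cell's k nearest neighbors (excluding itself) that come from the same dataset and N is the number of datasets. The exact definition and choice of k used for the bar plots are in the paper's Methods; the function name and brute-force neighbor search here are illustrative assumptions.

```python
import math


def alignment_score(embedding, dataset_labels, k):
    """Neighborhood-mixing score in an aligned embedding.

    Equals 0 when every cell's k nearest neighbors come from its own dataset,
    and 1 when the same-dataset count matches the random expectation k/N.
    """
    n_cells = len(embedding)
    n_sets = len(set(dataset_labels))
    if n_sets < 2:
        raise ValueError("need at least two datasets")
    same_counts = []
    for i, point in enumerate(embedding):
        # Brute-force k-nearest-neighbor search (fine for a sketch, O(n^2) overall).
        others = sorted((math.dist(point, embedding[j]), j)
                        for j in range(n_cells) if j != i)
        neighbors = [j for _, j in others[:k]]
        same_counts.append(sum(1 for j in neighbors
                               if dataset_labels[j] == dataset_labels[i]))
    xbar = sum(same_counts) / n_cells
    expected = k / n_sets
    return 1 - (xbar - expected) / (k - expected)
```

Normalizing by the random expectation k/N makes scores comparable across benchmarks with different numbers of datasets; a score near 0 means the datasets remain segregated, as expected for uncorrected data or for the negative controls.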
Similar articles
- Data Analysis in Single-Cell Transcriptome Sequencing. Gao S. Methods Mol Biol. 2018;1754:311-326. doi: 10.1007/978-1-4939-7717-8_18. PMID: 29536451.
- A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies. Sun Z, Chen L, Xin H, Jiang Y, Huang Q, Cillo AR, Tabib T, Kolls JK, Bruno TC, Lafyatis R, Vignali DAA, Chen K, Ding Y, Hu M, Chen W. Nat Commun. 2019 Apr 9;10(1):1649. doi: 10.1038/s41467-019-09639-3. PMID: 30967541. Free PMC article.
- Visualization of Single Cell RNA-Seq Data Using t-SNE in R. Zhou B, Jin W. Methods Mol Biol. 2020;2117:159-167. doi: 10.1007/978-1-0716-0301-7_8. PMID: 31960377.
- The promise of single-cell RNA sequencing for kidney disease investigation. Wu H, Humphreys BD. Kidney Int. 2017 Dec;92(6):1334-1342. doi: 10.1016/j.kint.2017.06.033. Epub 2017 Oct 12. PMID: 28893418. Free PMC article. Review.
- Single-cell and spatial transcriptomics approaches of cardiovascular development and disease. Roth R, Kim S, Kim J, Rhee S. BMB Rep. 2020 Aug;53(8):393-399. doi: 10.5483/BMBRep.2020.53.8.130. PMID: 32684243. Free PMC article. Review.
Cited by
- Circadian rhythms of macrophages are altered by the acidic tumor microenvironment. Knudsen-Clark AM, Mwangi D, Cazarin J, Morris K, Baker C, Hablitz LM, McCall MN, Kim M, Altman BJ. EMBO Rep. 2024 Oct 16. doi: 10.1038/s44319-024-00288-2. Online ahead of print. PMID: 39415049.
- TGF-β-mediated crosstalk between TIGIT+ Tregs and CD226+CD8+ T cells in the progression and remission of type 1 diabetes. Zhong T, Li X, Lei K, Tang R, Deng Q, Love PE, Zhou Z, Zhao B, Li X. Nat Commun. 2024 Oct 15;15(1):8894. doi: 10.1038/s41467-024-53264-8. PMID: 39406740. Free PMC article.
- Targeting IRE1α reprograms the tumor microenvironment and enhances anti-tumor immunity in prostate cancer. Unal B, Kuzu OF, Jin Y, Osorio D, Kildal W, Pradhan M, Kung SHY, Oo HZ, Daugaard M, Vendelbo M, Patterson JB, Thomsen MK, Kuijjer ML, Saatcioglu F. Nat Commun. 2024 Oct 15;15(1):8895. doi: 10.1038/s41467-024-53039-1. PMID: 39406723. Free PMC article.
- BNIP3+ fibroblasts associated with hypoxia and inflammation predict prognosis and immunotherapy response in pancreatic ductal adenocarcinoma. Gao B, Hu G, Sun B, Li W, Yang H. J Transl Med. 2024 Oct 14;22(1):937. doi: 10.1186/s12967-024-05674-x. PMID: 39402590. Free PMC article.
- Identification of the metabolic protein ATP5MF as a potential therapeutic target of TNBC. Chen K, Wu Y, Xu L, Wang C, Xue J. J Transl Med. 2024 Oct 14;22(1):932. doi: 10.1186/s12967-024-05692-9. PMID: 39402579. Free PMC article.