Bayesian approach to single-cell differential expression analysis
Peter V Kharchenko et al. Nat Methods. 2014 Jul.
Abstract
Single-cell data provide a means to dissect the composition of complex tissues and specialized cellular environments. However, the analysis of such measurements is complicated by high levels of technical noise and intrinsic biological variability. We describe a probabilistic model of expression-magnitude distortions typical of single-cell RNA-sequencing measurements, which enables detection of differential expression signatures and identification of subpopulations of cells in a way that is more tolerant of noise.
Figures
Figure 1. Modeling single-cell RNA-seq measurement as a mixture of two processes
a. Types of cell-to-cell variability observed in single-cell RNA-seq measurements. A smoothed scatter plot compares gene expression estimates from two cells of the same type (MEF cells), illustrating the prevalence of drop-out events, over-dispersion, and high-magnitude outliers. b. Single-cell variability throws off standard RNA-seq analysis methods, with the top differentially expressed genes influenced by differences in drop-out (Rnaseh2a) or outlier (Bmp4) events. The examples are taken from a CuffDiff2 comparison of 10 ESC and 10 MEF cells, with triangles showing expression magnitudes observed in different cells, and whiskers spanning the range of observed expression magnitudes. c. To identify a reliable set of genes for fitting model parameters, our approach initially uses cross-comparison of single-cell measurements (using cells of the same type, e.g. MEF), determining whether the transcript is likely to have been successfully amplified in both experiments (correlated component). The true expression magnitude of such genes is estimated as the median expression level across cells in which the gene appears in a correlated component. d. Each single-cell measurement is modeled as a mixture of drop-out and successful amplification processes. The parameters of the distributions and the magnitude-dependent mixing of the two processes are determined based on the expected population expression averages of genes appearing in many correlated components (panel c). e. Drop-out rates vary between different cell types. The rate of transcript detection failures (drop-out events) depends on the average expression magnitude of a gene in the cell population, and varies among the cells. In the Islam et al. dataset, higher drop-out frequencies are observed for mouse ES cells compared to MEF cells. f. Drop-out rates for 4-, 8- and 16-cell embryo samples examined by Deng et al. using a recently developed protocol also show systematic differences.
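The mixture described in panel d can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The caption only states that drop-out and successful amplification are mixed in a magnitude-dependent way; the specific choices below (a low-mean Poisson for the drop-out component, a negative binomial for the amplified component, a logistic drop-out curve in log expression) and all parameter values and function names are assumptions made for the example.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def neg_binom_pmf(k, mean, size):
    """Negative binomial probability of count k.

    Parameterised by mean and dispersion `size`, so that
    variance = mean + mean**2 / size.
    """
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )
    return math.exp(log_p)


def dropout_prob(log_expr, intercept, slope):
    """Assumed logistic drop-out curve; a negative slope makes
    drop-out less likely for highly expressed genes."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * log_expr)))


def mixture_likelihood(count, expr, *, intercept=2.0, slope=-1.5,
                       background=0.1, scale=1.0, size=5.0):
    """P(observed count | expected expression `expr` > 0) under the
    two-component mixture. All default values are illustrative only."""
    d = dropout_prob(math.log(expr), intercept, slope)
    failed = poisson_pmf(count, background)             # drop-out component
    amplified = neg_binom_pmf(count, scale * expr, size)  # amplified component
    return d * failed + (1.0 - d) * amplified


if __name__ == "__main__":
    # A zero count for a highly expressed gene is far better explained
    # by the mixture than by the amplified component alone.
    print(mixture_likelihood(0, 1000.0), neg_binom_pmf(0, 1000.0, 5.0))
```

In this sketch the drop-out component absorbs zero or near-zero counts that the amplified component alone would treat as extremely unlikely, which is the behaviour the caption attributes to the two-process model.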
Figure 2. Applying single-cell models for differential expression and subpopulation analyses
a. The model fitted for each single cell is used to estimate the likelihood that a gene is expressed at any particular level (i.e. posterior distribution) given the observed data (colored curves). The approach estimates the joint posterior distribution for the overall level within each cell type (black curves), and the expression fold difference between the cell types (middle plot). The example demonstrates expression differences of Sox2 between all ES and MEF cells measured by Islam et al. The plots show posterior probability of expression magnitudes in proximal (top) and distal (bottom) cells. The posterior probability of the fold-expression difference magnitude is shown in the middle plot with the associated raw P-value of differential expression. b. Differential expression of Dazl between cells of 8-cell and 16-cell mouse embryo stages, as determined by the SCDE method. A regulatory factor expressed in mammalian embryos, Dazl is expressed at earlier stages, and shows a drop-off between the 8- and 16-cell stages. c. The ability of different analysis methods to detect differentially expressed genes is shown using the false/true positive rate relationship (ROC curve), using traditional bulk expression measurements as a benchmark. The SCDE method shows higher sensitivity in the low false-positive range, as well as higher overall performance, as measured by area under the curve (AUC) scores. d. Performance of error-model-based transcriptional similarity measures in distinguishing ES and MEF cell types. The plot shows the fraction of correctly classified cells, assessed for an increasingly difficult classification problem by iteratively excluding up to 7000 of the most informative genes (i.e. genes differentially expressed between ES and MEF, x-axis). The 95% confidence bands are shown in light shading. Transcriptional similarity measures that take into account direct or reciprocal drop-out event probability show consistently better classification performance than Pearson linear correlation or the Bray-Curtis similarity measure.
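The step from per-cell curves to a fold-difference posterior (panel a) can be illustrated with a small Python sketch. It is a simplified reading of the caption, not the published procedure: it assumes a shared, evenly spaced log-expression grid, a flat prior, independence between cells and between cell types, and a plain product of per-cell curves. The function names and the toy numbers are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def joint_posterior(per_cell_curves):
    """Combine per-cell likelihood curves defined on the same grid.

    Assumes a flat prior and independent cells, so the joint posterior
    is the normalised pointwise product of the curves.
    """
    joint = [1.0] * len(per_cell_curves[0])
    for curve in per_cell_curves:
        joint = [j * c for j, c in zip(joint, curve)]
    return normalize(joint)


def fold_difference_posterior(post_a, post_b):
    """Posterior of (level in A) - (level in B), measured in grid steps.

    On a log-spaced grid a difference in steps is a log fold change.
    Returns a dict mapping step difference -> probability.
    """
    n = len(post_a)
    diff = {d: 0.0 for d in range(-(n - 1), n)}
    for i, pa in enumerate(post_a):
        for j, pb in enumerate(post_b):
            diff[i - j] += pa * pb
    return diff


if __name__ == "__main__":
    # Toy curves on a 4-point grid (illustrative values only).
    type_a = joint_posterior([[0.0, 0.1, 0.8, 0.1], [0.0, 0.2, 0.7, 0.1]])
    type_b = joint_posterior([[0.7, 0.2, 0.1, 0.0]])
    diff = fold_difference_posterior(type_a, type_b)
    print(max(diff, key=diff.get))  # most probable difference in grid steps
```

The probability mass that the difference posterior places on either side of zero is the kind of quantity from which a significance value for differential expression could be derived.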
Similar articles
- Beyond comparisons of means: understanding changes in gene expression at the single-cell level.
Vallejos CA, Richardson S, Marioni JC. Genome Biol. 2016 Apr 15;17:70. doi: 10.1186/s13059-016-0930-3. PMID: 27083558 Free PMC article.
- Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data.
Jia C, Hu Y, Kelly D, Kim J, Li M, Zhang NR. Nucleic Acids Res. 2017 Nov 2;45(19):10978-10988. doi: 10.1093/nar/gkx754. PMID: 29036714 Free PMC article.
- BASiCS: Bayesian Analysis of Single-Cell Sequencing Data.
Vallejos CA, Marioni JC, Richardson S. PLoS Comput Biol. 2015 Jun 24;11(6):e1004333. doi: 10.1371/journal.pcbi.1004333. eCollection 2015 Jun. PMID: 26107944 Free PMC article.
- Single-cell RNA-seq: advances and future challenges.
Saliba AE, Westermann AJ, Gorski SA, Vogel J. Nucleic Acids Res. 2014 Aug;42(14):8845-60. doi: 10.1093/nar/gku555. Epub 2014 Jul 22. PMID: 25053837 Free PMC article. Review.
- Normalization for Single-Cell RNA-Seq Data Analysis.
Bacher R. Methods Mol Biol. 2019;1935:11-23. doi: 10.1007/978-1-4939-9057-3_2. PMID: 30758817 Review.
Cited by
- Biological Sequence Classification: A Review on Data and General Methods.
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Research (Wash D C). 2022 Dec 19;2022:0011. doi: 10.34133/research.0011. eCollection 2022. PMID: 39285948 Free PMC article.
- Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells.
Mellis IA, Melzer ME, Bodkin N, Goyal Y. Genome Biol. 2024 Aug 12;25(1):217. doi: 10.1186/s13059-024-03351-2. PMID: 39135102 Free PMC article.
- MOCHA's advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts.
Rachid Zaim S, Pebworth MP, McGrath I, Okada L, Weiss M, Reading J, Czartoski JL, Torgerson TR, McElrath MJ, Bumol TF, Skene PJ, Li XJ. Nat Commun. 2024 Aug 9;15(1):6828. doi: 10.1038/s41467-024-50612-6. PMID: 39122670 Free PMC article.
- Deciphering deep-sea chemosynthetic symbiosis by single-nucleus RNA-sequencing.
Wang H, He K, Zhang H, Zhang Q, Cao L, Li J, Zhong Z, Chen H, Zhou L, Lian C, Wang M, Chen K, Qian PY, Li C. Elife. 2024 Aug 5;12:RP88294. doi: 10.7554/eLife.88294. PMID: 39102287 Free PMC article.
- Single-cell omics: experimental workflow, data analyses and applications.
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JJ, Liu Q, Fan X, Li C, Wang C, Shi T. Sci China Life Sci. 2024 Jul 23. doi: 10.1007/s11427-023-2561-0. Online ahead of print. PMID: 39060615 Review.
Grants and funding
- K25 AG037596/AG/NIA NIH HHS/United States
- U01 HL100402/HL/NHLBI NIH HHS/United States
- R01DK050234-15A1/DK/NIDDK NIH HHS/United States
- R01 HL097794/HL/NHLBI NIH HHS/United States
- K25AG037596/AG/NIA NIH HHS/United States
- R01 DK050234/DK/NIDDK NIH HHS/United States
- R01HL097794-03/HL/NHLBI NIH HHS/United States