scShapes: A statistical framework for identifying distribution shapes in single-cell RNA-sequencing data
New Results
doi: https://doi.org/10.1101/2022.02.13.480299
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) methods have proven valuable for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. In scRNA-seq data, variability in gene expression reflects how much a gene's expression differs from one cell to another. Analyses that focus on cell-to-cell variability are therefore useful for going beyond changes in average expression and instead identifying genes with homogeneous expression versus those whose expression varies widely from cell to cell.
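As a toy illustration of why the mean alone can hide cell-to-cell variability, the Python sketch below summarizes two hypothetical genes that share the same mean but differ in variance and in the fraction of cells with zero counts. The counts and the `summarize` helper are made up for illustration and are not taken from the paper.

```python
from statistics import mean, pvariance

def summarize(counts):
    """Return (mean, population variance, fraction of zero counts) for one gene."""
    m = mean(counts)
    v = pvariance(counts, mu=m)
    zero_frac = sum(1 for c in counts if c == 0) / len(counts)
    return m, v, zero_frac

# Hypothetical counts for two genes across 8 cells: same mean, different spread
gene_a = [2, 2, 2, 2, 2, 2, 2, 2]   # homogeneous expression
gene_b = [0, 0, 0, 0, 0, 0, 8, 8]   # on/off, highly variable expression

for name, counts in [("gene_a", gene_a), ("gene_b", gene_b)]:
    m, v, z = summarize(counts)
    print(f"{name}: mean={m:.1f} variance={v:.1f} zero_fraction={z:.2f}")
```

A mean-based test would treat these two genes as equivalent, while their distributions are clearly different.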
Results: We present scShapes, a novel statistical framework for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, because single-cell data are characterized by overdispersion and dropouts, it is critical to move beyond means and use distributions that can handle excess zeros. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution, while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches.
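One quick way to see the excess zeros mentioned above is to compare a gene's observed fraction of zero counts with the fraction predicted by a Poisson model and by a negative binomial (NB) with the same mean and variance. The sketch below assumes a simple method-of-moments NB fit; it is a diagnostic only, not part of the scShapes GLM framework, and `zero_fractions` and the counts are hypothetical.

```python
import math
from statistics import mean, pvariance

def zero_fractions(counts):
    """Return (observed, Poisson-predicted, NB-predicted) zero fractions.

    The NB is matched to the sample mean and variance (method of moments);
    this is an illustrative diagnostic, not how scShapes fits its models.
    """
    mu = mean(counts)
    var = pvariance(counts, mu=mu)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    poisson = math.exp(-mu)                    # P(X = 0) under Poisson(mu)
    if var > mu:
        size = mu * mu / (var - mu)            # NB size (inverse dispersion)
        nb = (size / (size + mu)) ** size      # P(X = 0) under NB(mean=mu, size)
    else:
        nb = poisson                           # no overdispersion: fall back to Poisson
    return observed, poisson, nb

# Hypothetical gene: half the cells have zero counts, the rest cluster around 6
counts = [0] * 10 + [4, 5, 5, 6, 6, 6, 7, 7, 8, 6]
observed, poisson, nb = zero_fractions(counts)
print(f"observed={observed:.2f} poisson={poisson:.3f} nb={nb:.3f}")
```

For these counts the observed zero fraction exceeds both model predictions, which is the pattern that motivates zero-inflated models such as ZIP and ZINB.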
Conclusions: This analysis also draws attention to genes that switch distribution shape from a unimodal distribution to a zero-inflated distribution, and raises open questions about the plausible biological mechanisms, such as transcriptional bursting, that may give rise to this. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://github.com/Malindrie/scShapes).
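The shape switch described above can be sketched, under strong simplifying assumptions, by choosing between a Poisson model (unimodal) and a zero-inflated Poisson (ZIP) model separately in each condition and flagging genes whose best-fitting model changes. This is not the scShapes procedure: it uses no covariates, omits the NB and ZINB candidates, and uses BIC (one of the criteria in the abbreviation list) purely as an assumed selection rule. All function names and counts are hypothetical.

```python
import math

# Hypothetical sketch, not the scShapes implementation: no covariates,
# only Poisson vs zero-inflated Poisson (ZIP), selected by BIC.

def poisson_loglik(counts, lam):
    """Poisson log-likelihood; the c * log(lam) term is skipped when c == 0."""
    return sum((c * math.log(lam) if c else 0.0) - lam - math.lgamma(c + 1)
               for c in counts)

def zip_loglik(counts, pi, lam):
    """ZIP log-likelihood with zero-inflation probability pi and Poisson rate lam."""
    p0 = pi + (1 - pi) * math.exp(-lam)
    ll = 0.0
    for c in counts:
        if c == 0:
            ll += math.log(p0)
        else:
            ll += math.log(1 - pi) + c * math.log(lam) - lam - math.lgamma(c + 1)
    return ll

def fit_zip(counts):
    """Maximum-likelihood ZIP fit as (pi, lam), or None when ZIP reduces to Poisson.

    The ZIP likelihood factorizes into a zero/non-zero Bernoulli part and a
    zero-truncated Poisson part, so lam solves lam / (1 - exp(-lam)) = m,
    where m is the mean of the non-zero counts. Degenerate cases (no zeros,
    all zeros, or m <= 1) are skipped for brevity.
    """
    n = len(counts)
    positives = [c for c in counts if c > 0]
    n0 = n - len(positives)
    if n0 == 0 or not positives:
        return None
    m = sum(positives) / len(positives)
    if m <= 1.0:
        return None
    lo, hi = 1e-9, m  # left-hand side is increasing: ~1 near 0 and > m at lam = m
    for _ in range(200):  # bisection on lam
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    pi = (n0 / n - math.exp(-lam)) / -math.expm1(-lam)
    if pi <= 0:
        return None  # no excess zeros: the boundary solution is the Poisson fit
    return pi, lam

def best_shape(counts):
    """Return "Poisson" or "ZIP", whichever has the lower BIC."""
    n = len(counts)
    lam_p = sum(counts) / n  # Poisson MLE is the sample mean
    bic_pois = 1 * math.log(n) - 2 * poisson_loglik(counts, lam_p)
    fit = fit_zip(counts)
    if fit is None:
        return "Poisson"
    pi, lam = fit
    bic_zip = 2 * math.log(n) - 2 * zip_loglik(counts, pi, lam)
    return "ZIP" if bic_zip < bic_pois else "Poisson"

# Hypothetical counts for one gene in two conditions
condition_a = [3, 4, 5, 5, 6, 4, 5, 7, 6, 5, 4, 6]   # unimodal, no zeros
condition_b = [0] * 8 + [4, 5, 6, 5, 7, 5]          # many zeros plus a mode near 5

shape_a, shape_b = best_shape(condition_a), best_shape(condition_b)
if shape_a != shape_b:
    print(f"shape switch: {shape_a} -> {shape_b}")
```

A real analysis would also need covariate adjustment and multiple-testing correction across genes, which this sketch leaves out.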
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
ANOVA: Analysis of variance
BH: Benjamini & Hochberg
BIC: Bayesian Information Criterion
BL: Blood
BM: Bone marrow
DB: Both differential modality and different component means
DC: Dendritic cells
DD: Differential distribution
DE: Differential expression
DM: Differential modality
DP: Differential proportion
DZ: Differential zeroes
ELPD: Expected log predictive density
GLM: Generalized linear model
KS: Kolmogorov-Smirnov
LG: Lungs
LN: Lymph node
LRT: Likelihood ratio test
NB: Negative binomial
NK: Natural killer
PBMC: Peripheral blood mononuclear cells
QLF: Quasi-likelihood approach
scRNA-seq: Single-cell RNA sequencing
ZINB: Zero-inflated negative binomial
ZIP: Zero-inflated Poisson
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.