scds: single cell doublet scoring: In-silico doublet annotation for single cell RNA sequencing data
Introduction
In this vignette, we provide an overview of the basic functionality and usage of the scds package, which interfaces with SingleCellExperiment objects.
Installation
Install the scds package using Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scds", version = "3.9")
Or from GitHub:
library(devtools)
devtools::install_github('kostkalab/scds')
Quick start
scds takes as input a SingleCellExperiment object (see the SingleCellExperiment package), where raw counts are stored in a counts assay, i.e. assay(sce,"counts"). An example dataset created by sub-sampling the cell-hashing cell-lines data set (see https://satijalab.org/seurat/hashing_vignette.html) is included with the package and accessible via data("sce_chcl"). Note that scds is designed to work with larger datasets, but for the purposes of this vignette we work with a smaller example dataset. We apply scds to this data and compare and visualize the results.
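To make the expected input concrete, the following minimal sketch builds a SingleCellExperiment with a counts assay from a plain matrix. The matrix is simulated and the names counts_mat and toy_sce are illustrative only; they are not part of the package or its example data.

```r
library(SingleCellExperiment)

# Simulated raw counts: 50 genes (rows) x 20 cells (columns); toy data only
set.seed(1)
counts_mat <- matrix(rpois(50 * 20, lambda = 2), nrow = 50,
                     dimnames = list(paste0("gene", 1:50),
                                     paste0("cell", 1:20)))

# Store the raw counts in an assay named "counts", as described above
toy_sce <- SingleCellExperiment(assays = list(counts = counts_mat))

# The counts assay can then be retrieved by name
dim(assay(toy_sce, "counts"))
```

An object built this way has the same basic layout as the example data used in the rest of this vignette, with genes in rows and cell barcodes in columns.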
Example data set
Get example data set provided with the package.
library(scds)
library(scater)
library(rsvd)
library(Rtsne)
library(cowplot)
set.seed(30519)
data("sce_chcl")
sce = sce_chcl #- less typing
dim(sce)
## [1] 2000 2000
We see it contains 2,000 genes and 2,000 cells, 216 of which are identified as doublets:
table(sce$hto_classification_global)
##
##  Doublet Negative  Singlet
##      216       83     1701
We can visualize cells/doublets after projecting into two dimensions:
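#- log-transform raw counts and compute per-gene variances
logcounts(sce) = log1p(counts(sce))
vrs = apply(logcounts(sce),1,var)
#- randomized PCA on the 100 most variable genes, then t-SNE on the first 10 PCs
pc = rpca(t(logcounts(sce)[order(vrs,decreasing=TRUE)[1:100],]))
ts = Rtsne(pc$x[,1:10],verbose=FALSE)
reducedDim(sce,"tsne") = ts$Y
rm(ts,vrs,pc)
plotReducedDim(sce,"tsne",color_by="hto_classification_global")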
Computational doublet annotation
We now run the scds doublet annotation approaches. Briefly, we identify doublets in two complementary ways: cxds is based on co-expression of gene pairs and works with absence/presence calls only, while bcds uses the full count information and a binary classification approach using artificially generated doublets. cxds_bcds_hybrid combines both approaches; for more details please consult the scds manuscript. Each of the three methods returns a doublet score, with higher scores indicating more “doublet-like” barcodes.
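As a rough illustration of the two kinds of input described above, the sketch below derives absence/presence calls from a count matrix and builds artificial doublets for a binary classification task. It is a toy example under stated assumptions and not the package's implementation: the counts are simulated, summing the counts of two randomly chosen cells is an assumed way to create an artificial doublet, and all object names are hypothetical.

```r
set.seed(42)

# Simulated raw counts: 30 genes x 100 cells (toy data, not from the package)
cnts <- matrix(rpois(30 * 100, lambda = 1), nrow = 30)

# Absence/presence calls: TRUE where a gene is detected in a cell
present <- cnts > 0

# Artificial doublets (assumption): add the counts of two randomly chosen cells
n_art <- 50
i1 <- sample(ncol(cnts), n_art, replace = TRUE)
i2 <- sample(ncol(cnts), n_art, replace = TRUE)
art_doublets <- cnts[, i1] + cnts[, i2]

# Training set for a binary classifier: 0 = observed barcode, 1 = artificial doublet
train_x <- cbind(cnts, art_doublets)
train_y <- c(rep(0, ncol(cnts)), rep(1, n_art))
dim(train_x)
```

A classifier trained to separate the two classes in train_y can then assign each observed barcode a score, where higher values mean the barcode looks more like an artificial doublet.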
#- Annotate doublets using co-expression based doublet scoring:
sce = cxds(sce,retRes = TRUE)
#- Annotate doublets using binary classification based doublet scoring:
sce = bcds(sce,retRes = TRUE,verb=TRUE)
#- Combine both annotations into a hybrid score:
sce = cxds_bcds_hybrid(sce)
#- Compare doublet scores between true doublet labels:
par(mfcol=c(1,3))
boxplot(sce$cxds_score ~ sce$doublet_true_labels, main="cxds")
boxplot(sce$bcds_score ~ sce$doublet_true_labels, main="bcds")
boxplot(sce$hybrid_score ~ sce$doublet_true_labels, main="hybrid")
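Boxplots show the separation visually; one way to summarize it in a single number is the area under the ROC curve. The helper below is a minimal base-R sketch using the rank-sum formulation. The function name auc_rank and the toy vectors are hypothetical and not part of scds.

```r
# Rank-based AUC: probability that a randomly chosen doublet scores higher
# than a randomly chosen non-doublet (ties count as one half)
auc_rank <- function(score, is_doublet) {
  r  <- rank(score)          # average ranks handle ties
  n1 <- sum(is_doublet)      # number of doublets
  n0 <- sum(!is_doublet)     # number of non-doublets
  (sum(r[is_doublet]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: three doublets, two non-doublets
score <- c(0.1, 0.4, 0.35, 0.8, 0.9)
label <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
auc_rank(score, label)
```

In this toy example five of the six doublet/non-doublet pairs are ordered correctly, so the result is 5/6. For the vignette data, is_doublet would be a logical vector derived from sce$doublet_true_labels, whose level names are not shown here.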
Visualizing gene pairs
For cxds we can identify and visualize gene pairs driving doublet annotations, with the expectation that the two genes in a pair might mark different types of cells (see manuscript). In the following we look at the top three pairs; each gene pair is a row in the plot below:
#- top three gene pairs reported by cxds
top3 = metadata(sce)$cxds$topPairs[1:3,]
rs = rownames(sce)
hb = rowData(sce)$cxds_hvg_bool
ho = rowData(sce)$cxds_hvg_ordr[hb]
hgs = rs[ho]
l1 = ggdraw() + draw_text("Pair 1", x = 0.5, y = 0.5)
p1 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,1]])
p2 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,2]])
l2 = ggdraw() + draw_text("Pair 2", x = 0.5, y = 0.5)
p3 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,1]])
p4 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,2]])
l3 = ggdraw() + draw_text("Pair 3", x = 0.5, y = 0.5)
p5 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,1]])
p6 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,2]])
plot_grid(l1,p1,p2,l2,p3,p4,l3,p5,p6,ncol=3, rel_widths = c(1,2,2))
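The intuition behind such gene pairs can be sketched with simulated absence/presence calls: if two genes mark different cell types, singlets rarely express both, so the barcodes that do are candidate doublets. This is an illustration only and not the cxds scoring procedure; the data and all names are made up.

```r
set.seed(7)
n_cells <- 200

# Simulated cell types; each marker gene is present only in its own type
type  <- sample(c("A", "B"), n_cells, replace = TRUE)
geneA <- type == "A"
geneB <- type == "B"

# Let the first 10 barcodes express both markers, as an A/B doublet might
geneA[1:10] <- TRUE
geneB[1:10] <- TRUE

# Observed co-expression versus the count expected if the genes were independent
both     <- geneA & geneB
observed <- sum(both)
expected <- mean(geneA) * mean(geneB) * n_cells
c(observed = observed, expected = expected)

# Barcodes expressing both markers
which(both)
```

Here only the ten constructed barcodes express both genes, far fewer than independence would predict, which is what makes co-expression of such a pair informative.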
Session Info
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.3 Rtsne_0.17
## [3] rsvd_1.0.5 scater_1.37.0
## [5] ggplot2_3.5.2 scuttle_1.19.0
## [7] SingleCellExperiment_1.31.0 SummarizedExperiment_1.39.0
## [9] Biobase_2.69.0 GenomicRanges_1.61.0
## [11] GenomeInfoDb_1.45.0 IRanges_2.43.0
## [13] S4Vectors_0.47.0 BiocGenerics_0.55.0
## [15] generics_0.1.3 MatrixGenerics_1.21.0
## [17] matrixStats_1.5.0 scds_1.25.0
## [19] BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2 dplyr_1.1.4
## [4] vipor_0.4.7 farver_2.1.2 viridis_0.6.5
## [7] fastmap_1.2.0 pROC_1.18.5 digest_0.6.37
## [10] lifecycle_1.0.4 magrittr_2.0.3 compiler_4.5.0
## [13] rlang_1.1.6 sass_0.4.10 tools_4.5.0
## [16] yaml_2.3.10 data.table_1.17.0 knitr_1.50
## [19] S4Arrays_1.9.0 labeling_0.4.3 xgboost_1.7.9.1
## [22] DelayedArray_0.35.0 plyr_1.8.9 abind_1.4-8
## [25] BiocParallel_1.43.0 withr_3.0.2 grid_4.5.0
## [28] beachmat_2.25.0 colorspace_2.1-1 scales_1.3.0
## [31] tinytex_0.57 cli_3.6.4 rmarkdown_2.29
## [34] crayon_1.5.3 httr_1.4.7 ggbeeswarm_0.7.2
## [37] cachem_1.1.0 parallel_4.5.0 BiocManager_1.30.25
## [40] XVector_0.49.0 vctrs_0.6.5 Matrix_1.7-3
## [43] jsonlite_2.0.0 bookdown_0.43 BiocSingular_1.25.0
## [46] BiocNeighbors_2.3.0 ggrepel_0.9.6 irlba_2.3.5.1
## [49] beeswarm_0.4.0 magick_2.8.6 jquerylib_0.1.4
## [52] glue_1.8.0 codetools_0.2-20 gtable_0.3.6
## [55] UCSC.utils_1.5.0 ScaledMatrix_1.17.0 munsell_0.5.1
## [58] tibble_3.2.1 pillar_1.10.2 htmltools_0.5.8.1
## [61] GenomeInfoDbData_1.2.14 R6_2.6.1 evaluate_1.0.3
## [64] lattice_0.22-7 bslib_0.9.0 Rcpp_1.0.14
## [67] gridExtra_2.3 SparseArray_1.9.0 xfun_0.52
## [70] pkgconfig_2.0.3