scDesign3 Quickstart

library(scDesign3)
library(SingleCellExperiment)
library(ggplot2)
theme_set(theme_bw())

Introduction

scDesign3 is a unified probabilistic framework that generates realistic in silico high-dimensional single-cell omics data of various cell states, including discrete cell types, continuous trajectories, and spatial locations, by learning from real datasets. Because scDesign3 offers a wide range of functionality, this quickstart covers only how it simulates an scRNA-seq dataset with one continuous developmental trajectory. For more information, see the Articles on our website: https://songdongyuan1994.github.io/scDesign3/docs/index.html.

Read in the reference data

The raw data comes from scvelo and describes pancreatic endocrinogenesis. We pre-selected the top 1000 highly variable genes and filtered out some cell types to ensure a single trajectory.

example_sce <- readRDS(url("https://figshare.com/ndownloader/files/40581992"))
print(example_sce)
#> class: SingleCellExperiment 
#> dim: 1000 2087 
#> metadata(5): clusters_coarse_colors clusters_colors day_colors
#>   neighbors pca
#> assays(6): X spliced ... cpm logcounts
#> rownames(1000): Pyy Iapp ... Eya2 Kif21a
#> rowData names(1): highly_variable_genes
#> colnames(2087): AAACCTGAGAGGGATA AAACCTGGTAAGTGGC ... TTTGTCAAGTGACATA
#>   TTTGTCAAGTGTGGCA
#> colData names(7): clusters_coarse clusters ... sizeFactor pseudotime
#> reducedDimNames(4): X_pca X_umap PCA UMAP
#> mainExpName: NULL
#> altExpNames(0):

To save computational time, we only use the top 100 genes.

example_sce <- example_sce[1:100, ]

Simulation

The function scdesign3() takes a SingleCellExperiment object as input, with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in the colData of that object.
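
To illustrate the expected input layout, the following is a minimal sketch that builds a toy SingleCellExperiment with a counts assay and the covariates in colData. The object toy_sce, its dimensions, and the simulated Poisson counts are hypothetical and serve only to show the structure; the column names cell_type and pseudotime are chosen to match the arguments used in the call below.

```r
library(SingleCellExperiment)

set.seed(1)
n_genes <- 5
n_cells <- 20

# Toy count matrix: genes in rows, cells in columns (hypothetical data).
toy_counts <- matrix(
  rpois(n_genes * n_cells, lambda = 3),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("cell", seq_len(n_cells))
  )
)

# Cell covariates live in colData, one row per cell.
toy_sce <- SingleCellExperiment(
  assays = list(counts = toy_counts),
  colData = S4Vectors::DataFrame(
    cell_type = rep(c("A", "B"), each = n_cells / 2),
    pseudotime = seq(0, 1, length.out = n_cells),
    row.names = colnames(toy_counts)
  )
)

# Quick checks before simulating: the assay and covariates must exist.
stopifnot("counts" %in% assayNames(toy_sce))
stopifnot(all(c("cell_type", "pseudotime") %in% colnames(colData(toy_sce))))
```

Running the same two checks on example_sce before simulating is a cheap way to catch a misnamed assay or covariate column.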

set.seed(123)
example_simu <- scdesign3(
    sce = example_sce,
    assay_use = "counts",
    celltype = "cell_type",
    pseudotime = "pseudotime",
    spatial = NULL,
    other_covariates = NULL,
    mu_formula = "s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula = "1", # To let the dispersion also vary along pseudotime, use "s(pseudotime, k = 5, bs = 'cr')"
    family_use = "nb",
    n_cores = 2,
    usebam = FALSE,
    corr_formula = "1",
    copula = "gaussian",
    DT = TRUE,
    pseudo_obs = FALSE,
    return_model = FALSE,
    nonzerovar = FALSE
  )
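
To make the mu_formula, sigma_formula, and family_use arguments more concrete, here is a minimal sketch of a comparable single-gene marginal model: a negative binomial GAM with a cubic regression spline on pseudotime and a constant dispersion, fitted directly with mgcv. The toy data, the variable names, and the direct use of mgcv::gam() are assumptions for illustration. This is not scDesign3's internal code, and it leaves out the copula step that models gene-gene correlation.

```r
library(mgcv)

set.seed(123)
n_cells <- 300

# Hypothetical toy gene: its mean expression changes smoothly along pseudotime.
pseudotime <- sort(runif(n_cells))
true_mean <- exp(1 + 2 * sin(2 * pi * pseudotime))
y <- rnbinom(n_cells, mu = true_mean, size = 5)
toy <- data.frame(y = y, pseudotime = pseudotime)

# Mean model: same smooth term as mu_formula in the call above.
# nb() estimates one dispersion parameter for all cells, which is
# analogous to sigma_formula = "1".
fit <- gam(y ~ s(pseudotime, k = 10, bs = "cr"), family = nb(), data = toy)

# Fitted mean per cell and the estimated dispersion parameter.
fitted_mean <- as.numeric(predict(fit, type = "response"))
theta_hat <- fit$family$getTheta(TRUE)

# How closely the fitted curve tracks the true mean of the toy gene.
cor(fitted_mean, true_mean)
```

Raising k allows a more flexible curve at the cost of more parameters per gene, which matters when the model is fitted to many genes.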

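The output of scdesign3() is a list. The elements used in this quickstart are new_count, the synthetic count matrix, and new_covariate, which is passed as the colData of the synthetic object below.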

In this example, since we did not change the parameter ncell, the synthetic count matrix has the same dimensions as the input count matrix.

dim(example_simu$new_count)
#> [1]  100 2087

Then, we can create a SingleCellExperiment object from the synthetic count matrix and store the logcounts in both the input and synthetic SingleCellExperiment objects.

logcounts(example_sce) <- log1p(counts(example_sce))
simu_sce <- SingleCellExperiment(list(counts = example_simu$new_count), colData = example_simu$new_covariate)
logcounts(simu_sce) <- log1p(counts(simu_sce))
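
As a small aside on the transform used here, log1p(x) computes log(1 + x), so zero counts stay at zero and no pseudocount needs to be added by hand. The toy matrix below is hypothetical and only demonstrates that behaviour in base R.

```r
# Hypothetical 2 x 2 count matrix with one zero entry.
toy_counts <- matrix(c(0, 1, 9, 99), nrow = 2)
toy_log <- log1p(toy_counts)

# log1p(x) agrees with log(x + 1), and zeros map to zero.
stopifnot(isTRUE(all.equal(toy_log, log(toy_counts + 1))))
stopifnot(toy_log[1, 1] == 0)
```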

Visualization

set.seed(123)
compare_figure <- plot_reduceddim(ref_sce = example_sce, 
                                  sce_list = list(simu_sce), 
                                  name_vec = c("Reference", "scDesign3"),
                                  assay_use = "logcounts", 
                                  if_plot = TRUE, 
                                  color_by = "pseudotime", 
                                  n_pc = 20)
plot(compare_figure$p_umap)

Session information

sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ggplot2_3.5.2               SingleCellExperiment_1.30.0
#>  [3] SummarizedExperiment_1.38.0 Biobase_2.68.0             
#>  [5] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
#>  [7] IRanges_2.42.0              S4Vectors_0.46.0           
#>  [9] BiocGenerics_0.54.0         generics_0.1.3             
#> [11] MatrixGenerics_1.20.0       matrixStats_1.5.0          
#> [13] scDesign3_1.6.0             BiocStyle_2.36.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6            xfun_0.52               bslib_0.9.0            
#>  [4] lattice_0.22-7          gamlss_5.4-22           vctrs_0.6.5            
#>  [7] tools_4.5.0             parallel_4.5.0          tibble_3.2.1           
#> [10] pkgconfig_2.0.3         Matrix_1.7-3            lifecycle_1.0.4        
#> [13] GenomeInfoDbData_1.2.14 farver_2.1.2            compiler_4.5.0         
#> [16] munsell_0.5.1           htmltools_0.5.8.1       sass_0.4.10            
#> [19] yaml_2.3.10             pillar_1.10.2           crayon_1.5.3           
#> [22] jquerylib_0.1.4         MASS_7.3-65             openssl_2.3.2          
#> [25] cachem_1.1.0            DelayedArray_0.34.0     viridis_0.6.5          
#> [28] abind_1.4-8             mclust_6.1.1            nlme_3.1-168           
#> [31] RSpectra_0.16-2         tidyselect_1.2.1        digest_0.6.37          
#> [34] mvtnorm_1.3-3           dplyr_1.1.4             bookdown_0.43          
#> [37] labeling_0.4.3          splines_4.5.0           gamlss.dist_6.1-1      
#> [40] fastmap_1.2.0           grid_4.5.0              gamlss.data_6.0-6      
#> [43] colorspace_2.1-1        cli_3.6.4               SparseArray_1.8.0      
#> [46] magrittr_2.0.3          S4Arrays_1.8.0          survival_3.8-3         
#> [49] withr_3.0.2             scales_1.3.0            UCSC.utils_1.4.0       
#> [52] rmarkdown_2.29          XVector_0.48.0          httr_1.4.7             
#> [55] umap_0.2.10.0           gridExtra_2.3           reticulate_1.42.0      
#> [58] png_0.1-8               askpass_1.2.1           evaluate_1.0.3         
#> [61] knitr_1.50              viridisLite_0.4.2       irlba_2.3.5.1          
#> [64] mgcv_1.9-3              rlang_1.1.6             Rcpp_1.0.14            
#> [67] glue_1.8.0              BiocManager_1.30.25     jsonlite_2.0.0         
#> [70] R6_2.6.1