A unified framework of realistic in silico data generation and statistical model inference for single-cell and spatial omics

scDesign3


The R package scDesign3 is an all-in-one single-cell data simulation tool that uses reference datasets with different cell states (cell types, trajectories, or spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc.), and complex experimental designs. Its transparent parameters let users alter models as needed, and its model evaluation metrics (AIC, BIC) and convenient visualization functions help users select models. Detailed tutorials that illustrate various functionalities of scDesign3 are available at this website. The following figure summarizes the usage of scDesign3:

For more details about scDesign3, see our manuscript in Nature Biotechnology:

Song, D., Wang, Q., Yan, G. et al. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 42, 247–252 (2024).

Please note that parallel computing in scDesign3 is mainly designed for UNIX-like operating systems; be careful when you set n_cores.
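Assuming the `parallelization = "mcmapply"` option wraps R's fork-based `parallel::mcmapply()`, which cannot use more than one core on Windows, one cautious way to pick `n_cores` is to fall back to a single core outside UNIX-like systems and to leave one core free elsewhere. A minimal sketch; `choose_n_cores()` is a hypothetical helper, not part of scDesign3:

```r
library(parallel)

# Hypothetical helper (not part of scDesign3): pick a conservative n_cores value
choose_n_cores <- function(max_cores = 2L) {
  # Fork-based parallelism is unavailable on Windows, so use a single core there
  if (.Platform$OS.type != "unix") {
    return(1L)
  }
  detected <- parallel::detectCores()
  if (is.na(detected)) detected <- 1L
  # Leave one core free, never go below 1 or above max_cores
  as.integer(max(1L, min(max_cores, detected - 1L)))
}

n_cores <- choose_n_cores()
```

The resulting value can then be passed as `n_cores = n_cores` in the scdesign3() call.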

  1. Installation
  2. Quick Start
  3. Tutorials
  4. Contact
  5. Related Manuscripts

Installation

To install the development version from GitHub, please run:
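A minimal sketch, assuming the development version is hosted in the `SONGDONGYUAN1994/scDesign3` GitHub repository and installed with `devtools` (`remotes::install_github()` works the same way):

```r
# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Assumed repository path; check it against the package website
devtools::install_github("SONGDONGYUAN1994/scDesign3")
```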

We are now working on submitting it to Bioconductor and will provide the link once online.

Quick Start

The following code is a quick example of running our simulator. The function [scdesign3()](reference/scdesign3.html) takes a SingleCellExperiment object with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in its colData. For more details on the SingleCellExperiment object, please see its Bioconductor page.

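```r
library(scDesign3)
library(SingleCellExperiment)

# example_sce: a SingleCellExperiment whose colData contains the columns
# "cell_type" and "pseudotime" referenced below
set.seed(123)  # for a reproducible simulation
example_simu <- scdesign3(
    sce = example_sce,
    assay_use = "counts",          # assay holding the reference counts
    celltype = "cell_type",        # colData column with cell types
    pseudotime = "pseudotime",     # colData column with pseudotime
    spatial = NULL,                # no spatial coordinates in this example
    other_covariates = NULL,
    mu_formula = "s(pseudotime, k = 10, bs = 'cr')",   # mgcv formula for the mean
    sigma_formula = "s(pseudotime, k = 5, bs = 'cr')", # mgcv formula for the dispersion
    family_use = "nb",             # negative binomial marginal distributions
    n_cores = 2,                   # see the note on UNIX-like systems above
    usebam = FALSE,
    corr_formula = "1",
    copula = "gaussian",
    fastmvn = FALSE,
    DT = TRUE,
    pseudo_obs = FALSE,
    family_set = c("gauss", "indep"),
    important_feature = "all",
    nonnegative = TRUE,
    return_model = FALSE,
    nonzerovar = FALSE,
    parallelization = "mcmapply",
    BPPARAM = NULL,
    trace = FALSE
  )
```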
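The quick-start call assumes `example_sce` is already loaded. For a self-contained trial run, a small stand-in SingleCellExperiment can be simulated. The gene and cell counts, the `early`/`late` labels, and the negative binomial means below are hypothetical; they only supply the `counts` assay and the `cell_type` and `pseudotime` colData columns that the call refers to:

```r
library(SingleCellExperiment)  # Bioconductor container class
library(S4Vectors)             # provides DataFrame()

set.seed(123)
n_genes <- 50
n_cells <- 200

# Hypothetical covariates: a pseudotime in [0, 1] and two cell types derived from it
pseudotime <- runif(n_cells)
cell_type <- ifelse(pseudotime < 0.5, "early", "late")

# Gene-by-cell means that increase along pseudotime, then negative binomial counts
mu <- outer(rgamma(n_genes, shape = 2, rate = 1), exp(2 * pseudotime))
counts <- matrix(
  rnbinom(n_genes * n_cells, mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

toy_sce <- SingleCellExperiment(
  assays = list(counts = counts),
  colData = DataFrame(cell_type = cell_type, pseudotime = pseudotime,
                      row.names = colnames(counts))
)
```

Assigning `example_sce <- toy_sce` before the quick-start call should give scdesign3() the inputs it expects; a toy dataset this small is for checking the workflow, not for judging simulation quality.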

The parameters of [scdesign3()](reference/scdesign3.html) are:

The output of [scdesign3()](reference/scdesign3.html) is a list which includes:

For details on specifying mu_formula and sigma_formula, please see the documentation of the mgcv package. In principle, any formula supported by mgcv can be used.
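One way to check that a candidate formula is valid mgcv syntax, and to compare candidates by AIC and BIC, is to fit it directly with `mgcv::gam()` on a single gene. This sketch assumes, as the quick-start strings suggest, that `mu_formula` is the right-hand side of the model for one gene's counts. The simulated gene and the intercept-only alternative are hypothetical, and `AIC()`/`BIC()` here are computed on standalone gam fits, not taken from scDesign3's output:

```r
library(mgcv)  # recommended package shipped with R

set.seed(1)
n_cells <- 300
pseudotime <- runif(n_cells)

# Hypothetical gene: negative binomial counts with a smooth trend in pseudotime
gene_mu <- exp(1 + sin(2 * pi * pseudotime))
toy_gene <- data.frame(
  count = rnbinom(n_cells, mu = gene_mu, size = 2),
  pseudotime = pseudotime
)

# Right-hand sides written as they would be passed to mu_formula
candidates <- c(smooth = "s(pseudotime, k = 10, bs = 'cr')", intercept = "1")

# Fit each candidate with a negative binomial family (theta estimated by mgcv)
fits <- lapply(candidates, function(rhs) {
  gam(as.formula(paste("count ~", rhs)), family = nb(), data = toy_gene)
})

# Lower values indicate the preferred formula for this gene
sapply(fits, AIC)
sapply(fits, BIC)
```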

Tutorials

All detailed tutorials are available on the website. They demonstrate the applications of scDesign3 from four perspectives: data simulation, model parameters, model selection, and model alteration.

Changelog