scDesign3 marginal distribution for genes

Introduction

This tutorial explains the functional forms that can be used when fitting the marginal distribution of each gene.

Notation

The following notation is used:

For each feature \(j=1,\ldots,m\) in every cell \(i=1,\ldots,n\), the measurement \(Y_{ij}\), conditional on cell \(i\)’s state covariates \(\mathbf{x}_i\) and design covariates \(\mathbf{z}_i = (b_i, c_i)^T\) (batch \(b_i\) and condition \(c_i\)), is assumed to follow a distribution \(F_{j}( \cdot~|~\mathbf{x}_i, \mathbf{z}_i~;~\mu_{ij}, \sigma_{ij}, p_{ij})\), which is specified by a generalized additive model for location, scale and shape (GAMLSS):

\[
\begin{cases}
Y_{ij}~|~\mathbf{x}_i, \mathbf{z}_i &\overset{\mathrm{ind}}{\sim} F_{j}( \cdot~|~\mathbf{x}_i, \mathbf{z}_i~;~\mu_{ij}, \sigma_{ij}, p_{ij})\\
\theta_{j}(\mu_{ij}) &= \alpha_{j0} + \alpha_{jb_i} + \alpha_{jc_i} + f_{jc_i}(\mathbf{x}_i) \\
\log(\sigma_{ij}) &= \beta_{j0}+ \beta_{jb_i} + \beta_{jc_i} + g_{jc_i}(\mathbf{x}_i) \\
\operatorname{logit}(p_{ij}) &= \gamma_{j0} + \gamma_{jb_i}+ \gamma_{jc_i}+ h_{jc_i}(\mathbf{x}_i)
\end{cases}
\]

Here \(\theta_{j}(\cdot)\) is the link function applied to \(\mu_{ij}\). The various specifications of \(f_{jc_i}(\cdot)\), \(g_{jc_i}(\cdot)\), and \(h_{jc_i}(\cdot)\) are summarized in the next section.
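To make the three linear predictors concrete, the following is a minimal base-R sketch that simulates the measurements of a single gene under this model. Everything numeric in it is hypothetical: the coefficient values, the smooth functions `f`, `g`, and `h`, the choice of a log link for \(\theta_j\), and the choice of a zero-inflated negative binomial for \(F_j\) with \(\sigma_{ij}\) treated as the dispersion (so that `size = 1 / sigma`). For brevity the same smooth functions are used for every condition, whereas the model allows them to differ by \(c_i\).

```r
set.seed(1)
n <- 500

# Hypothetical covariates for one gene j
batch     <- sample(1:2, n, replace = TRUE)  # b_i
condition <- sample(1:2, n, replace = TRUE)  # c_i
x         <- runif(n)                        # x_i, e.g. a rescaled pseudotime

# Hypothetical coefficients; level 1 is the reference level with effect 0
alpha0 <-  1.0; alpha_b <- c(0, 0.3); alpha_c <- c(0, -0.5)
beta0  <- -1.0; beta_b  <- c(0, 0.1); beta_c  <- c(0,  0.2)
gamma0 <- -2.0; gamma_b <- c(0, 0.0); gamma_c <- c(0,  0.5)

# Hypothetical smooth effects of the state covariate
f <- function(x) sin(2 * pi * x)
g <- function(x) 0.5 * x
h <- function(x) -x

# The three linear predictors, mapped back through their inverse links
mu    <- exp(alpha0 + alpha_b[batch] + alpha_c[condition] + f(x))    # assumes theta_j = log
sigma <- exp(beta0  + beta_b[batch]  + beta_c[condition]  + g(x))
p     <- plogis(gamma0 + gamma_b[batch] + gamma_c[condition] + h(x))

# Zero-inflated negative binomial draw (assumed form of F_j):
# with probability p the value is a structural zero, otherwise an NB count
is_zero <- rbinom(n, size = 1, prob = p)
y <- ifelse(is_zero == 1, 0L, rnbinom(n, mu = mu, size = 1 / sigma))

summary(y)
```

Changing `f`, `g`, or `h` here corresponds to choosing a different row of the summary table in the next section.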

Summary

| Covariate type | Covariate form | Function form | Explanation | Geometric meaning | Code example |
| --- | --- | --- | --- | --- | --- |
| Discrete cell type | \(x_i \in \left\{1, \ldots, K_C\right\}\) | \(f_{jc_i}(x_i) = \alpha_{jc_ix_i}\) | Cell type \(x_i\) has the effect \(\alpha_{jc_ix_i}\); for identifiability, \(\alpha_{jc_ix_i} = 0\) if \(x_i = 1\) | One intercept for each cell type | `mu_formula = "cell_type"` |
| Continuous pseudotime in one lineage | \(x_i \in [0,\infty)\) | \(f_{jc_i}({x}_i) = \sum_{k = 1}^Kb_{jc_ik}(x_{i})\beta_{jc_ik}\) | \(b_{jc_ik}(\cdot)\) is a cubic spline basis function; \(K\) is the dimension of the basis | A curve along the pseudotime | `mu_formula = "s(pseudotime)"` |
| Continuous pseudotimes in \(p\) lineages | \(\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^T \in [0,\infty)^{p}\) | \(f_{jc_i}(\mathbf{x}_i) = \sum_{l = 1}^p \sum_{k = 1}^Kb_{jc_ilk}(x_{il})\beta_{jc_ilk}\) | \(b_{jc_ilk}(\cdot)\) is a cubic spline basis function; \(K\) is the dimension of the basis (default \(K=10\)) | One curve along each lineage | `mu_formula = "s(pseudotime1, k = 10, by = l1, bs = 'cr') + s(pseudotime2, k = 10, by = l2, bs = 'cr')"` (here \(p = 2\)) |
| Spatial location | \(\mathbf{x}_i = (x_{i1}, x_{i2})^T \in \mathbb{R}^{2}\) | \(f_{jc_i}(\mathbf{x}_i) = f_{jc_i}^{\operatorname{GP}}(x_{i1}, x_{i2}, K)\) | \(f_{jc_i}^{\operatorname{GP}}(\cdot, \cdot, K)\) is a Gaussian process smoother; \(K\) is the dimension of the basis (default \(K=400\)) | A smooth surface | `mu_formula = "s(spatial1, spatial2, bs = 'gp', k = 400)"` |

Note: For simplicity, we only show the form of \(f_{jc_i}(\cdot)\) because \(g_{jc_i}(\cdot)\) and \(h_{jc_i}(\cdot)\) have the same form.
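
The formula strings in the table use mgcv-style smooth terms (`s()`, `bs = 'cr'`, `bs = 'gp'`, `k`). As a rough illustration of what the function forms mean, the sketch below fits three of them directly with `mgcv::gam` on simulated data and counts the fitted coefficients. This is an assumption made for illustration and not necessarily how scDesign3 fits the formulas internally. The data frame `df`, its column values, and the negative binomial family are hypothetical, and the spatial basis is reduced from \(K = 400\) to `k = 50` so that the example runs quickly on 300 simulated cells.

```r
library(mgcv)
set.seed(1)
n <- 300

# Hypothetical per-cell covariates and counts for a single gene
df <- data.frame(
  cell_type  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  pseudotime = runif(n),
  spatial1   = runif(n),
  spatial2   = runif(n)
)
df$y <- rnbinom(n, mu = exp(1 + sin(2 * pi * df$pseudotime)), size = 2)

# Discrete cell type: one effect per cell type, the first level is the reference
fit_ct <- gam(y ~ cell_type, family = nb(), method = "REML", data = df)

# One lineage: cubic regression spline along pseudotime with K = 10
fit_pt <- gam(y ~ s(pseudotime, k = 10, bs = "cr"),
              family = nb(), method = "REML", data = df)

# Spatial location: Gaussian process smoother (k reduced from 400 for speed)
fit_sp <- gam(y ~ s(spatial1, spatial2, bs = "gp", k = 50),
              family = nb(), method = "REML", data = df)

# Number of fitted coefficients per model, intercept included.
# mgcv removes one coefficient from each smooth for identifiability,
# so a basis of dimension k contributes k - 1 coefficients.
length(coef(fit_ct))  # intercept + (K_C - 1) cell-type effects
length(coef(fit_pt))  # intercept + (k - 1) spline coefficients
length(coef(fit_sp))  # intercept + (k - 1) Gaussian process coefficients
```

The same right-hand sides, written as strings, are what the table passes as `mu_formula`; calling `plot(fit_pt)` or `vis.gam(fit_sp)` shows the fitted curve and surface described in the "Geometric meaning" column.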