Testing non-linear effects (original) (raw)

Introduction

Typical analysis using regression models assumes a linear affect of the covariate on the response. Here we consider testing non-linear effects in the case of 1) continuous and 2) ordered categorical variables.

We demonstrate this feature on a lightly modified analysis of PBMCs from 8 individuals stimulated with interferon-β (Kang, et al, 2018, Nature Biotech).

Standard processing

Here is the code from the main vignette:

library(dreamlet)
library(muscat)
library(ExperimentHub)
library(scater)

# Download data, specifying EH2259 for the Kang, et al study
eh <- ExperimentHub()
sce <- eh[["EH2259"]]

# only keep singlet cells with sufficient reads
sce <- sce[rowSums(counts(sce) > 0) > 0, ]
sce <- sce[, colData(sce)$multiplets == "singlet"]

# compute QC metrics
qc <- perCellQCMetrics(sce)

# remove cells with few or many detected genes
ol <- isOutlier(metric = qc$detected, nmads = 2, log = TRUE)
sce <- sce[, !ol]

# set variable indicating stimulated (stim) or control (ctrl)
sce$StimStatus <- sce$stim

sce$id <- paste0(sce$StimStatus, sce$ind)

# Create pseudobulk
pb <- aggregateToPseudoBulk(sce,
  assay = "counts",
  cluster_id = "cell",
  sample_id = "id",
  verbose = FALSE
)

Continuous variable

Consider the continuous variable Age. Typical analysis only considers linear effects using a single regression coefficient, but we also want to consider the non-linear effects of age. We can peform a basis expansion using splines instead use 3 coefficients to model the age effect.

# Simulate age between 18 and 65
pb$Age <- runif(ncol(pb), 18, 65)

# formula included non-linear effects of Age
# by using a natural spline of degree 3
# This corresponds to using 3 coefficients instead of 1
form <- ~ splines::ns(Age, 3)

# Normalize and apply voom/voomWithDreamWeights
res.proc <- processAssays(pb, form, min.count = 5)

# Differential expression analysis within each assay
res.dl <- dreamlet(res.proc, form)

# The spline has degree 3, so there are 3 coefficients
# estimated for Age effects
coefNames(res.dl)
## [1] "(Intercept)"          "splines::ns(Age, 3)1" "splines::ns(Age, 3)2"
## [4] "splines::ns(Age, 3)3"
# Jointly test effects of the 3 spline components
# The test of the 3 coefficients is performed with an F-statistic
topTable(res.dl, coef = coefNames(res.dl)[2:4], number = 3)
## DataFrame with 3 rows and 9 columns
##             assay          ID splines..ns.Age..3.1 splines..ns.Age..3.2
##       <character> <character>            <numeric>            <numeric>
## 1 Dendritic cells         LYZ             -2.70001             6.420836
## 2     CD4 T cells        CCR6              1.15438            -3.375434
## 3        NK cells       ANXA5             -2.42742            -0.981867
##   splines..ns.Age..3.3   AveExpr         F     P.Value adj.P.Val
##              <numeric> <numeric> <numeric>   <numeric> <numeric>
## 1          -1.38529514  11.42359  12.96769 0.000207406  0.999953
## 2          -0.00915551   4.50453   8.81074 0.000952834  0.999953
## 3          -0.72038814   7.44145   8.49223 0.000955826  0.999953

Ordered categorical

We can also test non-linear effects in the case of categorical variables with a natural ordering to the categories. Consider time course data with 4 time points. Each time point is a category and has a natural ordering from first to last.

We have multiple options to model the time course.

ord <- c("time_1", "time_2", "time_3", "time_4")  
ordered(factor(TimePoint), ord)  

Here we simulated 4 time points, and perform differential expression analysis.

# Consider data generated across 4 time points
# While there are no time points in the real data
# we can add some for demonstration purposes
pb$TimePoint <- ordered(paste0("time_", rep(1:4, 4)))

# examine the ordering
pb$TimePoint
##  [1] time_1 time_2 time_3 time_4 time_1 time_2 time_3 time_4 time_1 time_2
## [11] time_3 time_4 time_1 time_2 time_3 time_4
## Levels: time_1 < time_2 < time_3 < time_4
# Use formula including time point
form <- ~TimePoint

# Normalize and apply voom/voomWithDreamWeights
res.proc <- processAssays(pb, form, min.count = 5)

# Differential expression analysis within each assay
res.dl <- dreamlet(res.proc, form)

# Examine the coefficient estimated
# for TimePoint it estimates
# linear (i.e. L)
# quadratic (i.e. Q)
# and cubic (i.e. C) effects
coefNames(res.dl)
## [1] "(Intercept)" "TimePoint.L" "TimePoint.Q" "TimePoint.C"
# Test only linear effect
topTable(res.dl, coef = "TimePoint.L", number = 3)
## DataFrame with 3 rows and 9 columns
##         assay          ID     logFC   AveExpr         t     P.Value adj.P.Val
##   <character> <character> <numeric> <numeric> <numeric>   <numeric> <numeric>
## 1 CD4 T cells        DCXR -0.671645   6.52867  -5.42880 4.78666e-05  0.393058
## 2 CD4 T cells        GGA2 -0.890112   4.98987  -5.26372 6.69370e-05  0.393058
## 3 CD8 T cells        FTH1 -0.759875  14.55380  -4.99199 8.13672e-05  0.393058
##           B     z.std
##   <numeric> <numeric>
## 1  1.686147  -5.42880
## 2  0.936906  -5.26372
## 3  1.627753  -4.99199
# Test linear, quadratic and cubic effcts
coefs <- c("TimePoint.L", "TimePoint.Q", "TimePoint.C")
topTable(res.dl, coef = coefs, number = 3)
## DataFrame with 3 rows and 9 columns
##         assay                   ID TimePoint.L TimePoint.Q TimePoint.C
##   <character>          <character>   <numeric>   <numeric>   <numeric>
## 1 CD8 T cells                 CD52    0.903032   -1.528456   -0.984065
## 2 CD8 T cells CCL5_ENSG00000161570    0.626201   -0.986502   -0.645468
## 3 CD8 T cells                  CD2    0.887578   -1.152415   -0.195616
##     AveExpr         F     P.Value adj.P.Val
##   <numeric> <numeric>   <numeric> <numeric>
## 1   8.69458   16.0946 1.90868e-05 0.0853298
## 2  11.88811   16.0433 1.94982e-05 0.0853298
## 3   9.67181   15.7793 2.17777e-05 0.0853298

Sample filtering

Due to variation in cell and read count for each sample, processAssays() filters out some sample. This filtering is summarized here:

details(res.dl)
##               assay n_retain    formula formDropsTerms n_genes n_errors
## 1           B cells       16 ~TimePoint          FALSE    1961        0
## 2   CD14+ Monocytes       16 ~TimePoint          FALSE    3087        0
## 3       CD4 T cells       16 ~TimePoint          FALSE    5262        0
## 4       CD8 T cells       16 ~TimePoint          FALSE    1030        0
## 5   Dendritic cells       13 ~TimePoint          FALSE     164        0
## 6 FCGR3A+ Monocytes       16 ~TimePoint          FALSE    1160        0
## 7    Megakaryocytes       13 ~TimePoint          FALSE     172        0
## 8          NK cells       16 ~TimePoint          FALSE    1656        0
##   error_initial
## 1         FALSE
## 2         FALSE
## 3         FALSE
## 4         FALSE
## 5         FALSE
## 6         FALSE
## 7         FALSE
## 8         FALSE

Whle all 16 samples are detained in B cells, only 9 are retained for megakaryocytes. This can result in a time point being dropped, and so the polynomial expansion for some cell types can have a lower degree. The combined results will then have NA values for these coefficients. For example, for TIMP1 in Megakaryocytes above there is not enought data to fit the cubic term, so TimePoint.C is NA.

Session Info

## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GO.db_3.20.0                org.Hs.eg.db_3.20.0        
##  [3] AnnotationDbi_1.69.0        zenith_1.9.0               
##  [5] muscData_1.21.0             scater_1.35.0              
##  [7] scuttle_1.17.0              ExperimentHub_2.15.0       
##  [9] AnnotationHub_3.15.0        BiocFileCache_2.15.0       
## [11] dbplyr_2.5.0                muscat_1.21.0              
## [13] dreamlet_1.5.0              SingleCellExperiment_1.29.1
## [15] SummarizedExperiment_1.37.0 Biobase_2.67.0             
## [17] GenomicRanges_1.59.0        GenomeInfoDb_1.43.0        
## [19] IRanges_2.41.0              S4Vectors_0.45.1           
## [21] BiocGenerics_0.53.2         generics_0.1.3             
## [23] MatrixGenerics_1.19.0       matrixStats_1.4.1          
## [25] variancePartition_1.37.1    BiocParallel_1.41.0        
## [27] limma_3.63.2                ggplot2_3.5.1              
## [29] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-9              httr_1.4.7               
##   [3] RColorBrewer_1.1-3        doParallel_1.0.17        
##   [5] Rgraphviz_2.51.0          numDeriv_2016.8-1.1      
##   [7] sctransform_0.4.1         tools_4.5.0              
##   [9] backports_1.5.0           utf8_1.2.4               
##  [11] R6_2.5.1                  metafor_4.6-0            
##  [13] mgcv_1.9-1                GetoptLong_1.0.5         
##  [15] withr_3.0.2               prettyunits_1.2.0        
##  [17] gridExtra_2.3             cli_3.6.3                
##  [19] sandwich_3.1-1            labeling_0.4.3           
##  [21] sass_0.4.9                KEGGgraph_1.67.0         
##  [23] SQUAREM_2021.1            mvtnorm_1.3-2            
##  [25] blme_1.0-6                mixsqp_0.3-54            
##  [27] parallelly_1.39.0         invgamma_1.1             
##  [29] RSQLite_2.3.7             shape_1.4.6.1            
##  [31] gtools_3.9.5              dplyr_1.1.4              
##  [33] Matrix_1.7-1              metadat_1.2-0            
##  [35] ggbeeswarm_0.7.2          fansi_1.0.6              
##  [37] abind_1.4-8               lifecycle_1.0.4          
##  [39] multcomp_1.4-26           yaml_2.3.10              
##  [41] edgeR_4.5.0               mathjaxr_1.6-0           
##  [43] gplots_3.2.0              SparseArray_1.7.1        
##  [45] grid_4.5.0                blob_1.2.4               
##  [47] crayon_1.5.3              lattice_0.22-6           
##  [49] beachmat_2.23.1           msigdbr_7.5.1            
##  [51] annotate_1.85.0           KEGGREST_1.47.0          
##  [53] magick_2.8.5              pillar_1.9.0             
##  [55] knitr_1.49                ComplexHeatmap_2.23.0    
##  [57] rjson_0.2.23              boot_1.3-31              
##  [59] estimability_1.5.1        corpcor_1.6.10           
##  [61] future.apply_1.11.3       codetools_0.2-20         
##  [63] glue_1.8.0                data.table_1.16.2        
##  [65] vctrs_0.6.5               png_0.1-8                
##  [67] Rdpack_2.6.1              gtable_0.3.6             
##  [69] assertthat_0.2.1          cachem_1.1.0             
##  [71] xfun_0.49                 mime_0.12                
##  [73] rbibutils_2.3             S4Arrays_1.7.1           
##  [75] Rfast_2.1.0               coda_0.19-4.1            
##  [77] reformulas_0.4.0          survival_3.7-0           
##  [79] iterators_1.0.14          tinytex_0.54             
##  [81] statmod_1.5.0             TH.data_1.1-2            
##  [83] nlme_3.1-166              pbkrtest_0.5.3           
##  [85] bit64_4.5.2               filelock_1.0.3           
##  [87] progress_1.2.3            EnvStats_3.0.0           
##  [89] bslib_0.8.0               TMB_1.9.15               
##  [91] irlba_2.3.5.1             vipor_0.4.7              
##  [93] KernSmooth_2.23-24        colorspace_2.1-1         
##  [95] rmeta_3.0                 DBI_1.2.3                
##  [97] DESeq2_1.47.0             tidyselect_1.2.1         
##  [99] emmeans_1.10.5            curl_6.0.0               
## [101] bit_4.5.0                 compiler_4.5.0           
## [103] graph_1.85.0              BiocNeighbors_2.1.0      
## [105] DelayedArray_0.33.1       bookdown_0.41            
## [107] scales_1.3.0              caTools_1.18.3           
## [109] remaCor_0.0.18            rappdirs_0.3.3           
## [111] stringr_1.5.1             digest_0.6.37            
## [113] minqa_1.2.8               rmarkdown_2.29           
## [115] aod_1.3.3                 XVector_0.47.0           
## [117] RhpcBLASctl_0.23-42       htmltools_0.5.8.1        
## [119] pkgconfig_2.0.3           lme4_1.1-35.5            
## [121] sparseMatrixStats_1.19.0  mashr_0.2.79             
## [123] fastmap_1.2.0             rlang_1.1.4              
## [125] GlobalOptions_0.1.2       UCSC.utils_1.3.0         
## [127] DelayedMatrixStats_1.29.0 farver_2.1.2             
## [129] jquerylib_0.1.4           zoo_1.8-12               
## [131] jsonlite_1.8.9            BiocSingular_1.23.0      
## [133] RCurl_1.98-1.16           magrittr_2.0.3           
## [135] GenomeInfoDbData_1.2.13   munsell_0.5.1            
## [137] Rcpp_1.0.13-1             babelgene_22.9           
## [139] viridis_0.6.5             EnrichmentBrowser_2.37.0 
## [141] RcppZiggurat_0.1.6        stringi_1.8.4            
## [143] zlibbioc_1.53.0           MASS_7.3-61              
## [145] plyr_1.8.9                listenv_0.9.1            
## [147] parallel_4.5.0            ggrepel_0.9.6            
## [149] Biostrings_2.75.1         splines_4.5.0            
## [151] hms_1.1.3                 circlize_0.4.16          
## [153] locfit_1.5-9.10           reshape2_1.4.4           
## [155] ScaledMatrix_1.15.0       BiocVersion_3.21.1       
## [157] XML_3.99-0.17             evaluate_1.0.1           
## [159] RcppParallel_5.1.9        BiocManager_1.30.25      
## [161] nloptr_2.1.1              foreach_1.5.2            
## [163] tidyr_1.3.1               purrr_1.0.2              
## [165] future_1.34.0             clue_0.3-65              
## [167] scattermore_1.2           ashr_2.2-63              
## [169] rsvd_1.0.5                broom_1.0.7              
## [171] xtable_1.8-4              fANCOVA_0.6-1            
## [173] viridisLite_0.4.2         truncnorm_1.0-9          
## [175] tibble_3.2.1              lmerTest_3.1-3           
## [177] glmmTMB_1.1.10            memoise_2.0.1            
## [179] beeswarm_0.4.0            cluster_2.1.6            
## [181] globals_0.16.3            GSEABase_1.69.0