lute algorithm class definitions and uses
About
This guide describes lute’s generics, methods, and classes for algorithms, including deconvolution and marker selection algorithms. The software and its method for rescaling on cell type-specific sizes are detailed in the manuscript Maden et al. (2024). This guide may be useful to algorithm developers and researchers interested in conducting systematic algorithm benchmarks.
Background
The class structure used by lute is based on the bluster R/Bioconductor package. It expands on that class structure by defining a hierarchy.
Motivation
Many algorithms are maintained and versioned in GitHub or Zenodo rather than a routinely versioned repository such as Bioconductor or CRAN. This can prove an obstacle when tracing package development and attempting comprehensive benchmarks, as software that is not actively maintained can become deprecated over time, and not all software will use compatible dependency versions (Maden et al. (2023)).
lute classes can help to (1) encourage use of common Bioconductor object classes (e.g. SummarizedExperiment, SingleCellExperiment, DelayedArray, etc.) and (2) use more standard inputs and outputs to encourage code reuse, discourage duplicated efforts, and enable more rapid and exhaustive benchmarks.
Classes
In a general sense, the class hierarchy is a wrapper that gives access to many algorithms through a single function and shared methods. Because the parent classes can also take on shared data reformatting and preprocessing tasks, the hierarchy works more like a workflow.
typemarkersParam
Topmost parameter class for cell type gene markers. This is used to manage the marker IDs.
deconvolutionParam
This is the parent class for all deconvolution algorithm param objects. The deconvolutionParam class is minimal, and simply defines slots for bulkExpression, a matrix of bulk expression data, and returnInfo, a logical value indicating whether the default algorithm output will be stored and returned with the standard output from running the deconvolution() method on a valid algorithm param object.
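The two slots and the generic described above can be sketched with base R S4 classes. This is a minimal illustration, not lute's source code: the names toyDeconvolutionParam and toyDeconvolution are hypothetical stand-ins, and the default of FALSE for returnInfo is an assumption.

```r
library(methods)

# Hypothetical stand-in for the parent class (not lute's source code).
setClass(
  "toyDeconvolutionParam",
  slots = c(bulkExpression = "matrix", returnInfo = "logical"),
  prototype = list(returnInfo = FALSE)  # assumed default
)

# Generic that each algorithm-specific subclass would implement.
setGeneric("toyDeconvolution",
           function(object, ...) standardGeneric("toyDeconvolution"))

# show() method summarizing the param contents.
setMethod("show", "toyDeconvolutionParam", function(object) {
  cat("class:", class(object), "\n")
  cat("bulk genes x samples:",
      paste(dim(object@bulkExpression), collapse = " x "), "\n")
  cat("returnInfo:", object@returnInfo, "\n")
})

# Small made-up bulk expression matrix (genes in rows, samples in columns).
bulk <- matrix(
  c(10, 20, 30, 40), nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("sample1", "sample2"))
)
param <- new("toyDeconvolutionParam", bulkExpression = bulk)
print(param)
```

Keeping the parent class this small means that every algorithm-specific subclass inherits the same input slot and the same output switch.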
referencebasedParam
As shown in the class hierarchy diagram (above), referencebasedParam is a parent class that itself inherits attributes from deconvolutionParam. It is meant to contain and manage all tasks shared by reference-based deconvolution algorithms, that is, algorithms that utilize a cell type summary dataset. These are to be distinguished from reference-free algorithms.
This param class adds slots for referenceExpression, the cell type reference data, and cellScaleFactors, an optional vector of cell type size factors used to transform the reference.
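As a rough sketch of how these two slots could fit together, the example below defines toy S4 classes and a helper that rescales the reference. The class names, the helper scaleReference(), and the choice to multiply each cell type column by its size factor are assumptions made for illustration; they are not taken from lute's source code.

```r
library(methods)

# Hypothetical stand-ins for the parent and reference-based classes.
setClass("toyDeconvolutionParam",
         slots = c(bulkExpression = "matrix", returnInfo = "logical"))
setClass("toyReferencebasedParam",
         contains = "toyDeconvolutionParam",
         slots = c(referenceExpression = "matrix",
                   cellScaleFactors = "numeric"))

# Assumed transformation: multiply each cell type column of the
# reference by the size factor that has the same name.
scaleReference <- function(param) {
  z <- param@referenceExpression
  s <- param@cellScaleFactors
  if (length(s) == 0) return(z)  # the factors are optional
  stopifnot(!is.null(names(s)), all(colnames(z) %in% names(s)))
  sweep(z, 2, s[colnames(z)], "*")
}

# Made-up reference (genes x cell types) and size factors.
reference <- matrix(
  c(1, 2, 3, 4), nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("neuron", "glial"))
)
param <- new("toyReferencebasedParam",
             bulkExpression = matrix(0, nrow = 2, ncol = 1),
             returnInfo = FALSE,
             referenceExpression = reference,
             cellScaleFactors = c(glial = 3, neuron = 10))
scaledReference <- scaleReference(param)
```

Matching the factors to columns by name, rather than by position, avoids silently scaling the wrong cell type when the vector is supplied in a different order.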
independentbulkParam
This class covers the subset of referencebasedParam algorithms that use explicitly separate sets of samples, such as for discrete training and test stages.
This param class adds a slot called bulkExpressionIndependent, which is for a dataset of bulk samples independent from samples specified in the bulkExpression slot.
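The extra slot can be sketched as one more level of toy S4 inheritance. The class names are hypothetical, and the validity check that rejects sample IDs shared between the two bulk datasets is an assumption about what "independent" should mean here, not a documented lute behavior.

```r
library(methods)

# Hypothetical stand-ins for the three levels of the hierarchy.
setClass("toyDeconvolutionParam",
         slots = c(bulkExpression = "matrix", returnInfo = "logical"))
setClass("toyReferencebasedParam",
         contains = "toyDeconvolutionParam",
         slots = c(referenceExpression = "matrix",
                   cellScaleFactors = "numeric"))
setClass("toyIndependentbulkParam",
         contains = "toyReferencebasedParam",
         slots = c(bulkExpressionIndependent = "matrix"))

# Assumed check: the two bulk datasets should not share sample IDs.
setValidity("toyIndependentbulkParam", function(object) {
  shared <- intersect(colnames(object@bulkExpression),
                      colnames(object@bulkExpressionIndependent))
  if (length(shared) > 0) {
    paste("samples found in both bulk datasets:",
          paste(shared, collapse = ", "))
  } else {
    TRUE
  }
})

# Helper building a made-up genes x samples matrix.
makeBulk <- function(sampleIds) {
  matrix(1, nrow = 2, ncol = length(sampleIds),
         dimnames = list(c("gene1", "gene2"), sampleIds))
}

# Distinct sample IDs pass the check.
param <- new("toyIndependentbulkParam",
             bulkExpression = makeBulk(c("s1", "s2")),
             bulkExpressionIndependent = makeBulk(c("s3", "s4")))

# Overlapping sample IDs are rejected when the object is created.
overlapResult <- tryCatch(
  new("toyIndependentbulkParam",
      bulkExpression = makeBulk(c("s1", "s2")),
      bulkExpressionIndependent = makeBulk(c("s2", "s3"))),
  error = function(e) "invalid"
)
```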
Helper functions
lute provides a number of helper functions used to make the algorithm classes work. These include the parent classes and subclasses, and several functions to convert between object classes. These helper functions may be useful to developers. The following table indicates the functions and a short summary of what they do.
Algorithms
findMarkers
The param class findmarkersParam is defined for the function findMarkers() from scran (see ?findmarkersParam). This function identifies cell type marker genes from a single-cell or single-nucleus expression dataset.
The findmarkersParam class is organized under its parent classes as typemarkersParam->findmarkersParam. It includes the typemarkers() method for the identification of marker genes, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the typemarkers() generic defined for the findmarkersParam class.
NNLS
The param class nnlsParam is defined for the function nnls() from the nnls R/CRAN package (see ?nnlsParam). Non-negative least squares (NNLS) is commonly used for deconvolution.
The nnlsParam class is organized under its parent classes as deconvolutionParam->referencebasedParam->nnlsParam. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the deconvolution() generic defined for the nnlsParam class.
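To make the NNLS idea concrete, the sketch below estimates cell type proportions by minimizing the squared error between bulk expression and a weighted sum of reference columns, with the weights constrained to be non-negative. It deliberately swaps in a bounded optimizer, stats::optim() with method "L-BFGS-B", so that it runs with base R alone; it is not the solver used by the nnls package, and the function names nnlsSketch() and deconvolveBulk() are hypothetical.

```r
# Estimate non-negative weights p that minimize sum((y - z %*% p)^2).
# z: reference matrix (genes x cell types); y: bulk expression of one sample.
nnlsSketch <- function(z, y) {
  objective <- function(p) sum((y - z %*% p)^2)
  gradient <- function(p) as.vector(-2 * t(z) %*% (y - z %*% p))
  fit <- optim(par = rep(1 / ncol(z), ncol(z)),
               fn = objective, gr = gradient,
               method = "L-BFGS-B", lower = 0)  # lower bound gives p >= 0
  estimates <- setNames(fit$par, colnames(z))
  estimates / sum(estimates)  # rescale the weights to proportions
}

# Apply the sketch to every column (sample) of a bulk matrix.
deconvolveBulk <- function(z, bulk) {
  t(apply(bulk, 2, nnlsSketch, z = z))
}

# Made-up reference and a bulk sample mixed at known proportions.
reference <- matrix(
  c(10, 1, 5,
    2, 8, 5),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("neuron", "glial"))
)
trueProportions <- c(neuron = 0.3, glial = 0.7)
bulk <- matrix(reference %*% trueProportions, ncol = 1,
               dimnames = list(rownames(reference), "sample1"))

proportions <- deconvolveBulk(reference, bulk)
```

In this noise-free toy case the estimates should land close to the proportions used to build the bulk sample. Real data would add noise and usually a marker gene filter beforehand.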
Bisque
The param class bisqueParam is defined for the function ReferenceBasedDecomposition() from the BisqueRNA R/CRAN package (see ?bisqueParam). The Bisque algorithm adjusts for assay-specific biases arising between the bulk and single-cell or single-nucleus platforms used to generate expression datasets for deconvolution.
The bisqueParam class is organized under its parent classes as deconvolutionParam->referencebasedParam->independentbulkParam->bisqueParam. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The following images annotate the constructor function and the deconvolution() generic defined for the bisqueParam class.
Extensions
We demonstrated the extensibility and flexibility of lute’s generic, method, and class system by extending support for additional algorithms beyond the three described above.
These algorithms can be used by sourcing the provided R/GitHub packages which pair the classes and functions with YML files for easier dependency management.
meanRatios
The param class meanratiosParam is defined for the function get_mean_ratio2() from the DeconvoBuddies R/GitHub package at LieberInstitute/DeconvoBuddies. This function uses the mean of cell type summary ratios to rank and select top marker genes.
The meanratiosParam class is organized under its parent classes as typemarkersParam->meanratiosParam. It includes the typemarkers() generic for the identification of marker genes, and the show() method for inspecting the param contents.
meanratiosParam is available from GitHub at metamaden/meanratiosParam.
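A ratio-based marker ranking can be sketched in base R as follows. The sketch assumes the ratio for a gene and a target cell type is the mean expression in the target type divided by the highest mean expression among the other types; this definition, and the function names meanRatioSketch() and topMarkers(), are assumptions for illustration and do not reproduce the DeconvoBuddies implementation.

```r
# expr: expression matrix (genes x cells); cellTypes: one label per cell.
# Returns a genes x cell types matrix of ratios (assumes two or more genes).
meanRatioSketch <- function(expr, cellTypes) {
  types <- sort(unique(cellTypes))
  # Mean expression of every gene within each cell type.
  typeMeans <- sapply(types, function(type) {
    rowMeans(expr[, cellTypes == type, drop = FALSE])
  })
  # Target mean over the highest mean among the non-target types.
  sapply(types, function(type) {
    others <- typeMeans[, colnames(typeMeans) != type, drop = FALSE]
    typeMeans[, type] / apply(others, 1, max)
  })
}

# Pick the n genes with the highest ratio for each cell type.
topMarkers <- function(ratios, n = 1) {
  types <- setNames(colnames(ratios), colnames(ratios))
  lapply(types, function(type) {
    names(sort(ratios[, type], decreasing = TRUE))[seq_len(n)]
  })
}

# Made-up expression for three genes across four cells.
expr <- rbind(
  geneA = c(1, 1, 9, 11),   # higher in neurons
  geneB = c(8, 12, 2, 2),   # higher in glial cells
  geneC = c(5, 5, 5, 5)     # uninformative
)
cellTypes <- c("glial", "glial", "neuron", "neuron")

ratios <- meanRatioSketch(expr, cellTypes)
markers <- topMarkers(ratios, n = 1)
```

A gene with zero mean expression in every non-target type would give an infinite ratio, so production code would need to handle that case explicitly.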
DeconRNASeq
The param class deconrnaseqParam is defined for the function DeconRNASeq() (see ?deconrnaseqParam) from the DeconRNASeq R/Bioconductor package. The DeconRNASeq algorithm uses weighted averaged expression between types to predict cell type amounts more accurately for heterogeneous tissues (Gong and Szustakowski (2013)).
The deconrnaseqParam class is organized under its parent classes as deconvolutionParam->referencebasedParam->deconrnaseqParam. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The deconrnaseqParam class is available from GitHub at metamaden/deconrnaseqParam.
EPIC
The param class epicParam is defined for the function EPIC() from the EPIC R/GitHub package (see ?epicParam). The EPIC algorithm was developed in blood samples and incorporates mRNA abundance (i.e. cell size) and variance normalizations (Racle and Gfeller (2020)).
The epicParam class is organized under its parent classes as deconvolutionParam->referencebasedParam->epicParam. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The epicParam class is available from GitHub at metamaden/epicParam.
MuSiC
The param class musicParam is defined for the function music_prop() from the MuSiC R/GitHub package (see ?musicParam). The MuSiC algorithm adjusts for between-source variances for reference data from multiple sources (Wang et al. (2019)).
The musicParam class is organized under its parent classes as deconvolutionParam->referencebasedParam->musicParam. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The musicParam class is available from GitHub at metamaden/musicParam.
MuSiC2
The param class music2Param is defined for two implementations of the MuSiC2 algorithm from the MuSiC and MuSiC2 R/GitHub packages, respectively (see ?music2Param). The MuSiC2 algorithm pairs the features of the MuSiC algorithm with an additional filter for marker genes differentially expressed between cases and controls in the bulk and expression datasets (Fan et al. (2022)).
The music2Param class is organized under its parent classes as deconvolutionParam->referencebasedParam->independentbulkParam->music2Param. It includes the deconvolution() generic for cell type deconvolution, and the show() method for inspecting the param contents.
The music2Param class is available from GitHub at metamaden/music2Param.
Conclusions
This vignette showed how lute’s classes and methods are extensible and modular, and can encourage further development with standard algorithm I/O and object class management. First, we described lute’s algorithm class hierarchy, including how its parent classes and subclasses manage common tasks shared among algorithms, and a table of functions developers may find useful. Further, we showed the annotated class and generic functions for algorithms supported by lute out of the box. Finally, we detailed additional algorithms supported by R/GitHub packages that may be individually installed.
Session info
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] lute_1.4.0 SingleCellExperiment_1.30.0
## [3] SummarizedExperiment_1.38.0 Biobase_2.68.0
## [5] GenomicRanges_1.60.0 GenomeInfoDb_1.44.0
## [7] IRanges_2.42.0 S4Vectors_0.46.0
## [9] BiocGenerics_0.54.0 generics_0.1.3
## [11] MatrixGenerics_1.20.0 matrixStats_1.5.0
## [13] BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] xfun_0.52 bslib_0.9.0 lattice_0.22-7
## [4] vctrs_0.6.5 tools_4.5.0 parallel_4.5.0
## [7] tibble_3.2.1 cluster_2.1.8.1 pkgconfig_2.0.3
## [10] BiocNeighbors_2.2.0 Matrix_1.7-3 dqrng_0.4.1
## [13] lifecycle_1.0.4 GenomeInfoDbData_1.2.14 compiler_4.5.0
## [16] statmod_1.5.0 bluster_1.18.0 codetools_0.2-20
## [19] htmltools_0.5.8.1 sass_0.4.10 yaml_2.3.10
## [22] pillar_1.10.2 crayon_1.5.3 jquerylib_0.1.4
## [25] BiocParallel_1.42.0 limma_3.64.0 DelayedArray_0.34.0
## [28] cachem_1.1.0 abind_1.4-8 metapod_1.16.0
## [31] locfit_1.5-9.12 tidyselect_1.2.1 rsvd_1.0.5
## [34] digest_0.6.37 BiocSingular_1.24.0 dplyr_1.1.4
## [37] bookdown_0.43 fastmap_1.2.0 grid_4.5.0
## [40] cli_3.6.4 SparseArray_1.8.0 magrittr_2.0.3
## [43] S4Arrays_1.8.0 edgeR_4.6.0 UCSC.utils_1.4.0
## [46] rmarkdown_2.29 XVector_0.48.0 httr_1.4.7
## [49] igraph_2.1.4 scran_1.36.0 ScaledMatrix_1.16.0
## [52] beachmat_2.24.0 evaluate_1.0.3 knitr_1.50
## [55] irlba_2.3.5.1 rlang_1.1.6 Rcpp_1.0.14
## [58] scuttle_1.18.0 glue_1.8.0 BiocManager_1.30.25
## [61] jsonlite_2.0.0 R6_2.6.1
Works cited
Fan, Jiaxin, Yafei Lyu, Qihuang Zhang, Xuran Wang, Mingyao Li, and Rui Xiao. 2022. “MuSiC2: Cell-Type Deconvolution for Multi-Condition Bulk RNA-Seq Data.” Briefings in Bioinformatics, October, bbac430. https://doi.org/10.1093/bib/bbac430.
Gong, Ting, and Joseph D. Szustakowski. 2013. “DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mRNA-Seq Data.” Bioinformatics 29 (8): 1083–5. https://doi.org/10.1093/bioinformatics/btt090.
Maden, Sean K., Louise A. Huuki-Myers, Sang Ho Kwon, Leonardo Collado-Torres, Kristen R. Maynard, and Stephanie C. Hicks. 2024. “Lute: Estimating the Cell Composition of Heterogeneous Tissue with Varying Cell Sizes Using Gene Expression.” bioRxiv. https://doi.org/10.1101/2024.04.04.588105.
Maden, Sean K., Sang Ho Kwon, Louise A. Huuki-Myers, Leonardo Collado-Torres, Stephanie C. Hicks, and Kristen R. Maynard. 2023. “Challenges and Opportunities to Computationally Deconvolve Heterogeneous Tissue with Varying Cell Sizes Using Single Cell RNA-Sequencing Datasets.” arXiv. https://doi.org/10.48550/arXiv.2305.06501.
Racle, Julien, and David Gfeller. 2020. “EPIC: A Tool to Estimate the Proportions of Different Cell Types from Bulk Gene Expression Data.” Edited by Sebastian Boegel. Methods in Molecular Biology, 233–48. https://doi.org/10.1007/978-1-0716-0327-7_17.
Wang, Xuran, Jihwan Park, Katalin Susztak, Nancy R. Zhang, and Mingyao Li. 2019. “Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference.” Nature Communications 10 (1): 380. https://doi.org/10.1038/s41467-018-08023-x.