miaSim: Microbiome Data Simulation (original) (raw)
Contents
Introduction
miaSim
implements tools for microbiome data simulation based on varying ecological modeling assumptions. These can be used to simulate species abundance matrices, including time series. Detailed function documentation is available at the function reference
The miaSim package supports the R/Bioconductor framework based on the TreeSummarizedExperiment data container. For more information on operating with this data format in microbial ecology, see the online tutorial.
Installation
The stable Bioconductor release version can be installed as follows.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if (!requireNamespace("miaSim", quietly = TRUE))
BiocManager::install("miaSim")
The experimental Bioconductor devel version can be installed as follows.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("miaSim")
Load the library
library(miaSim)
Examples
Generating species interaction matrices
Some of the models rely on interaction matrices that represents interaction heterogeneity between species. The interaction matrix can be generated with different distributional assumptions.
Generate interactions from normal distribution:
A_normal <- powerlawA(n_species = 4, alpha = 3)
Generate interactions from uniform distribution:
A_uniform <- randomA(n_species = 10,
diagonal = -0.4,
connectance = 0.5,
interactions = runif(n = 10^2, min = -0.8, max = 0.8))
Ricker model
Ricker model is a discrete version of the gLV:
tse_ricker <- simulateRicker(n_species=4, A = A_normal, t_end=100, norm = FALSE)
The number of species specified in the interaction matrix must be the same as the species used in the models.
Hubbell model
Hubbell Neutral simulation model characterizes diversity and relative abundance of species in ecological communities assuming migration, births and deaths but no interactions. Losses become replaced by migration or birth.
tse_hubbell <- simulateHubbell(n_species = 8,
M = 10,
carrying_capacity = 1000,
k_events = 50,
migration_p = 0.02,
t_end = 100)
One can also simulate parameters for the Hubbell model.
params_hubbell <- simulateHubbellRates(x0 = c(0,5,10),
migration_p = 0.1, metacommunity_probability = NULL, k_events = 1,
growth_rates = NULL, norm = FALSE, t_end=1000)
Self-Organised Instability (SOI)
The Self-Organised Instability (SOI) model generates time series for communities and accelerates stochastic simulation.
tse_soi <- simulateSOI(n_species = 4, carrying_capacity = 1000,
A = A_normal, k_events=5,
x0 = NULL,t_end = 150, norm = TRUE)
Stochastic logistic model
Stochastic logistic model is used to determine dead and alive counts in community.
tse_logistic <- simulateStochasticLogistic(n_species = 5)
Consumer-resource model
The consumer resource model requires the randomE
function. This returns a matrix containing the production rates and consumption rates of each species. The resulting matrix is used as a determination of resource consumption efficiency.
# Consumer-resource model as a TreeSE object
tse_crm <- simulateConsumerResource(n_species = 2,
n_resources = 4,
E = randomE(n_species = 2, n_resources = 4))
You could visualize the simulated dynamics using tools from the miaTime package.
Generalized Lotka-Volterra (gLV)
The generalized Lotka-Volterra simulation model generates time-series assuming microbial population dynamics and interaction.
tse_glv <- simulateGLV(n_species = 4,
A = A_normal,
t_start = 0,
t_store = 1000,
stochastic = FALSE,
norm = FALSE)
Data containers
The simulated data sets are returned as TreeSummarizedExperiment
objects. This provides access to a broad range of tools for microbiome analysis that support this format (seemicrobiome.github.io). More examples on the object manipulation and analysis can be found at OMA Online Manual.
For instance, to plot population density we can use the miaViz
package:
library(miaViz)
p1 <- plotAbundanceDensity(tse_hubbell, assay.type = "counts")
p2 <- plotSeries(tse_hubbell, x = "time")
print(p1+p2)
Case studies
Source code for replicating the published case studies using the miaSim package (Gao et al. 2023) is available in the phyloseq folder (based on the phyloseq data container). Some of the original case studies have now been replicated also with TreeSummarized, see the TreeSummarizedExperiment folder.
Session info
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] miaViz_1.16.0 mia_1.16.0
## [3] MultiAssayExperiment_1.34.0 ggraph_2.2.1
## [5] ggplot2_3.5.2 miaSim_1.14.0
## [7] TreeSummarizedExperiment_2.16.0 Biostrings_2.76.0
## [9] XVector_0.48.0 SingleCellExperiment_1.30.0
## [11] SummarizedExperiment_1.38.0 Biobase_2.68.0
## [13] GenomicRanges_1.60.0 GenomeInfoDb_1.44.0
## [15] IRanges_2.42.0 S4Vectors_0.46.0
## [17] BiocGenerics_0.54.0 generics_0.1.3
## [19] MatrixGenerics_1.20.0 matrixStats_1.5.0
## [21] BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_2.0.0 magrittr_2.0.3
## [3] magick_2.8.6 TH.data_1.1-3
## [5] estimability_1.5.1 ggbeeswarm_0.7.2
## [7] farver_2.1.2 rmarkdown_2.29
## [9] ragg_1.4.0 fs_1.6.6
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] DelayedMatrixStats_1.30.0 ggtree_3.16.0
## [15] tinytex_0.57 htmltools_0.5.8.1
## [17] S4Arrays_1.8.0 BiocBaseUtils_1.10.0
## [19] BiocNeighbors_2.2.0 deSolve_1.40
## [21] janeaustenr_1.0.0 cellranger_1.1.0
## [23] gridGraphics_0.5-1 SparseArray_1.8.0
## [25] parallelly_1.43.0 sass_0.4.10
## [27] bslib_0.9.0 tokenizers_0.3.0
## [29] plyr_1.8.9 DECIPHER_3.4.0
## [31] sandwich_3.1-1 emmeans_1.11.0
## [33] zoo_1.8-14 cachem_1.1.0
## [35] igraph_2.1.4 lifecycle_1.0.4
## [37] pkgconfig_2.0.3 rsvd_1.0.5
## [39] Matrix_1.7-3 R6_2.6.1
## [41] fastmap_1.2.0 GenomeInfoDbData_1.2.14
## [43] tidytext_0.4.2 aplot_0.2.5
## [45] digest_0.6.37 colorspace_2.1-1
## [47] ggnewscale_0.5.1 patchwork_1.3.0
## [49] scater_1.36.0 irlba_2.3.5.1
## [51] SnowballC_0.7.1 textshaping_1.0.0
## [53] vegan_2.6-10 beachmat_2.24.0
## [55] labeling_0.4.3 mgcv_1.9-3
## [57] httr_1.4.7 polyclip_1.10-7
## [59] abind_1.4-8 compiler_4.5.0
## [61] withr_3.0.2 BiocParallel_1.42.0
## [63] viridis_0.6.5 DBI_1.2.3
## [65] ggforce_0.4.2 MASS_7.3-65
## [67] poweRlaw_1.0.0 DelayedArray_0.34.0
## [69] bluster_1.18.0 permute_0.9-7
## [71] tools_4.5.0 vipor_0.4.7
## [73] beeswarm_0.4.0 ape_5.8-1
## [75] glue_1.8.0 nlme_3.1-168
## [77] gridtext_0.1.5 grid_4.5.0
## [79] cluster_2.1.8.1 reshape2_1.4.4
## [81] gtable_0.3.6 tzdb_0.5.0
## [83] fillpattern_1.0.2 tidyr_1.3.1
## [85] hms_1.1.3 BiocSingular_1.24.0
## [87] tidygraph_1.3.1 ScaledMatrix_1.16.0
## [89] xml2_1.3.8 ggrepel_0.9.6
## [91] pillar_1.10.2 stringr_1.5.1
## [93] yulab.utils_0.2.0 splines_4.5.0
## [95] dplyr_1.1.4 ggtext_0.1.2
## [97] tweenr_2.0.3 treeio_1.32.0
## [99] lattice_0.22-7 survival_3.8-3
## [101] tidyselect_1.2.1 DirichletMultinomial_1.50.0
## [103] scuttle_1.18.0 knitr_1.50
## [105] gridExtra_2.3 bookdown_0.43
## [107] xfun_0.52 graphlayouts_1.2.2
## [109] rbiom_2.2.0 stringi_1.8.7
## [111] UCSC.utils_1.4.0 ggfun_0.1.8
## [113] lazyeval_0.2.2 yaml_2.3.10
## [115] evaluate_1.0.3 codetools_0.2-20
## [117] tibble_3.2.1 BiocManager_1.30.25
## [119] ggplotify_0.1.2 cli_3.6.4
## [121] systemfonts_1.2.2 xtable_1.8-4
## [123] munsell_0.5.1 jquerylib_0.1.4
## [125] readxl_1.4.5 Rcpp_1.0.14
## [127] coda_0.19-4.1 parallel_4.5.0
## [129] readr_2.1.5 sparseMatrixStats_1.20.0
## [131] slam_0.1-55 decontam_1.28.0
## [133] viridisLite_0.4.2 mvtnorm_1.3-3
## [135] tidytree_0.4.6 scales_1.3.0
## [137] purrr_1.0.4 crayon_1.5.3
## [139] rlang_1.1.6 multcomp_1.4-28