Pre-compiled Binary Files in Simulation Studies (original) (raw)
Reusing pre-compiled binaries in simulation studies
The following example demonstrate the logic of how to reuse pre-compiled binary files in the context of Monte Carlo simulations. The goal in this case is to:
- Pre-compile the necessary binary files before running
runSimulation()
, - Include tractable extraction information in the
Design
/fixed_objects
inputs to point to and extract the necessary binary object associated with the respective simulation conditions under investigation, respectively, and - Sit around for a while, as no doubt the use of pre-compiled binaries suggests the computational problem is already intensive!
The following code example is an adaption of Mark Lai’s example using cmdstanr
alongside SimDesign
. See https://mc-stan.org/cmdstanr/articles/cmdstanr.html for further information on using the cmdstanr
package.
Example using Stan
Step 1: Define and compile binary files
# if you haven't done so, install the cmdstanr package using the following:
# install.packages("cmdstanr",
# repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
#
# and follow all the necessary setup instructions to get the package working
The first step in the process would be to generate all the necessary Stan objects and store them into a meaningfully named list. Below two objects are generated to perform the same estimation of a population mean, however the first uses a weak normal prior (\(\mu \sim N(0, 10)\)) while the latter uses a much stronger prior (\(\mu \sim N(0,2\))).
library(cmdstanr)
bmean_stan_weak <- "
data {
int<lower=0> N;
array[N] real x;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
target += normal_lpdf(mu | 0, 10); // weakly informative prior
target += normal_lpdf(x | mu, sigma);
}
"
bmean_stan_strong <- "
data {
int<lower=0> N;
array[N] real x;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
target += normal_lpdf(mu | 0, 2); // more informative prior
target += normal_lpdf(x | mu, sigma);
}
"
# Save files and compile
stan_path_weak <- write_stan_file(bmean_stan_weak)
stan_path_strong <- write_stan_file(bmean_stan_strong)
# compile and store within fixed objects for use in runSimulation()
fo <- list(weak=cmdstan_model(stan_path_weak),
strong=cmdstan_model(stan_path_strong))
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
159 | inline math::var log_prob(std::vector<math::var>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
154 | inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
96 | inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
91 | inline double log_prob(Eige
n::VectorXd& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
159 | inline math::var log_prob(std::vector<math::var>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
154 | inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
96 | inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostrea
m*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
91 | inline double log_prob(Eigen::VectorXd& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
Step 2: Define suitable simulation flow
Elements in the above fo
object (passed to runSimulation()
via the fixed_objects
input) are correctly pointed to when needed.
# function to run MCMC and extract associated mean estimate
stan_fit <- function(mod, dat){
# Stan is noisy, so tell it to be more quiet()
mcmc <- quiet(mod$sample(list(x = dat, N = length(dat)),
refresh = 0, chains = 1,
show_messages = FALSE))
if (mcmc$summary("mu")[1, "rhat"] > 1.01) {
# Gives an error when the convergence criterion is not met
stop("MCMC did not converge.")
}
MB <- mcmc$summary("mu", mean)$mean[1]
MB
}
# SimDesign::SimFunctions()
library(SimDesign)
Design <- createDesign(sample_size = c(30, 60, 120, 240),
distribution = c('norm', 'chi'))
Design
# A tibble: 8 × 2
sample_size distribution
<dbl> <chr>
1 30 norm
2 60 norm
3 120 norm
4 240 norm
5 30 chi
6 60 chi
7 120 chi
8 240 chi
Generate <- function(condition, fixed_objects = NULL) {
Attach(condition)
if(distribution == 'norm'){
dat <- rnorm(sample_size, mean = 3)
} else if(distribution == 'chi'){
dat <- rchisq(sample_size, df = 3)
}
dat
}
Analyse <- function(condition, dat, fixed_objects = NULL) {
Attach(condition)
M0 <- mean(dat)
M1 <- mean(dat, trim = .1)
M2 <- mean(dat, trim = .2)
med <- median(dat)
# extract suitable pre-compiled Stan object
MB_weak <- stan_fit(fixed_objects$weak, dat)
MB_strong <- stan_fit(fixed_objects$strong, dat)
ret <- c(mean_no_trim = M0, mean_trim.1 = M1,
mean_trim.2 = M2, median = med,
mean_weak_prior = MB_weak,
mean_strong_prior = MB_strong)
ret
}
Summarise <- function(condition, results, fixed_objects = NULL) {
obs_bias <- bias(results, parameter = 3)
obs_RMSE <- RMSE(results, parameter = 3)
ret <- c(bias = obs_bias, RMSE = obs_RMSE, RE = RE(obs_RMSE))
ret
}
Step 3: Run the simulation in parallel
res <- runSimulation(Design, replications = 1000, generate = Generate,
analyse = Analyse, summarise = Summarise,
fixed_objects = fo, parallel=TRUE,
packages = "cmdstanr")
Number of parallel clusters in use: 27
Design: 1/8; Replications: 1000; RAM Used: 65.8 Mb; Total Time: 0.00s
Conditions: sample_size=30, distribution=norm
Design: 2/8; Replications: 1000; RAM Used: 67.3 Mb; Total Time: 01m 52.66s
Conditions: sample_size=60, distribution=norm
Design: 3/8; Replications: 1000; RAM Used: 67.3 Mb; Total Time: 03m 45.57s
Conditions: sample_size=120, distribution=norm
Design: 4/8; Replications: 1000; RAM Used: 67.4 Mb; Total Time: 05m 34.54s
Conditions: sample_size=240, distribution=norm
Design: 5/8; Replications: 1000; RAM Used: 67.4 Mb; Total Time: 07m 27.41s
Conditions: sample_size=30, distribution=chi
Design: 6/8; Replications: 1000; RAM Used: 67.5 Mb; Total Time: 09m 19.83s
Conditions: sample_size=60, distribution=chi
Design: 7/8; Replications: 1000; RAM Used: 67.5 Mb; Total Time: 11m 13.40s
Conditions: sample_size=120, distribution=chi
Design: 8/8; Replications: 1000; RAM Used: 67.6 Mb; Total Time: 13m 9.46s
Conditions: sample_size=240, distribution=chi
Simulation complete. Total execution time: 15m 6.33s
Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
running command '"quarto"
TMPDIR=C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/file1154453c42e6 -V' had
status 1
# A tibble: 8 × 26
sample_size distribution bias.mean_no_trim bias.mean_trim.1 bias.mean_trim.2
<dbl> <chr> <dbl> <dbl> <dbl>
1 30 norm -0.012206 -0.012205 -0.010733
2 60 norm -0.00066471 -0.00058635 -0.00049449
3 120 norm -0.00045496 -0.0013078 -0.0018907
4 240 norm -0.0033323 -0.0030002 -0.0034330
5 30 chi 0.0084255 -0.30383 -0.44208
6 60 chi -0.0032308 -0.33978 -0.48129
7 120 chi -0.0061516 -0.35092 -0.49200
8 240 chi -0.0055454 -0.35250 -0.49620
# ℹ 21 more variables: bias.median <dbl>, bias.mean_weak_prior <dbl>,
# bias.mean_strong_prior <dbl>, RMSE.mean_no_trim <dbl>,
# RMSE.mean_trim.1 <dbl>, RMSE.mean_trim.2 <dbl>, RMSE.median <dbl>,
# RMSE.mean_weak_prior <dbl>, RMSE.mean_strong_prior <dbl>,
# RE.mean_no_trim <dbl>, RE.mean_trim.1 <dbl>, RE.mean_trim.2 <dbl>,
# RE.median <dbl>, RE.mean_weak_prior <dbl>, RE.mean_strong_prior <dbl>,
# REPLICATIONS <dbl>, SIM_TIME <chr>, RAM_USED <chr>, SEED <int>, …
# render output better
knitr::kable(res)
sample_size | distribution | bias.mean_no_trim | bias.mean_trim.1 | bias.mean_trim.2 | bias.median | bias.mean_weak_prior | bias.mean_strong_prior | RMSE.mean_no_trim | RMSE.mean_trim.1 | RMSE.mean_trim.2 | RMSE.median | RMSE.mean_weak_prior | RMSE.mean_strong_prior | RE.mean_no_trim | RE.mean_trim.1 | RE.mean_trim.2 | RE.median | RE.mean_weak_prior | RE.mean_strong_prior | REPLICATIONS | SIM_TIME | RAM_USED | SEED | COMPLETED | ERRORS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | norm | -0.012 | -0.012 | -0.011 | -0.013 | -0.014 | -0.040 | 0.177 | 0.182 | 0.189 | 0.218 | 0.177 | 0.180 | 1 | 1.05 | 1.13 | 1.52 | 1.001 | 1.031 | 1000 | 113 | 67.3 Mb | 1.94e+09 | Thu Apr 17 16:24:52 2025 | 54 |
60 | norm | -0.001 | -0.001 | 0.000 | -0.001 | -0.001 | -0.014 | 0.127 | 0.131 | 0.136 | 0.160 | 0.127 | 0.128 | 1 | 1.05 | 1.14 | 1.57 | 0.999 | 1.005 | 1000 | 113 | 67.3 Mb | 1.34e+09 | Thu Apr 17 16:26:45 2025 | 45 |
120 | norm | 0.000 | -0.001 | -0.002 | -0.002 | -0.001 | -0.007 | 0.095 | 0.097 | 0.101 | 0.116 | 0.095 | 0.095 | 1 | 1.05 | 1.13 | 1.50 | 1.000 | 1.002 | 1000 | 109 | 67.4 Mb | 6.70e+08 | Thu Apr 17 16:28:34 2025 | 43 |
240 | norm | -0.003 | -0.003 | -0.003 | -0.005 | -0.003 | -0.006 | 0.068 | 0.069 | 0.071 | 0.081 | 0.068 | 0.068 | 1 | 1.03 | 1.09 | 1.42 | 1.000 | 1.006 | 1000 | 113 | 67.4 Mb | 1.23e+09 | Thu Apr 17 16:30:27 2025 | 38 |
30 | chi | 0.008 | -0.304 | -0.442 | -0.587 | 0.002 | -0.152 | 0.433 | 0.516 | 0.612 | 0.746 | 0.430 | 0.401 | 1 | 1.42 | 2.00 | 2.97 | 0.987 | 0.860 | 1000 | 112 | 67.5 Mb | 1.52e+09 | Thu Apr 17 16:32:20 2025 | 42 |
60 | chi | -0.003 | -0.340 | -0.481 | -0.617 | -0.007 | -0.082 | 0.316 | 0.452 | 0.567 | 0.707 | 0.315 | 0.304 | 1 | 2.05 | 3.23 | 5.01 | 0.992 | 0.925 | 1000 | 114 | 67.5 Mb | 1.13e+09 | Thu Apr 17 16:34:13 2025 | 36 |
120 | chi | -0.006 | -0.351 | -0.492 | -0.628 | -0.008 | -0.044 | 0.219 | 0.406 | 0.533 | 0.670 | 0.219 | 0.215 | 1 | 3.45 | 5.95 | 9.39 | 1.004 | 0.965 | 1000 | 116 | 67.6 Mb | 1.89e+09 | Thu Apr 17 16:36:09 2025 | 42 |
240 | chi | -0.006 | -0.353 | -0.496 | -0.638 | -0.006 | -0.024 | 0.152 | 0.381 | 0.518 | 0.659 | 0.152 | 0.151 | 1 | 6.27 | 11.57 | 18.74 | 1.002 | 0.989 | 1000 | 117 | 67.6 Mb | 1.30e+09 | Thu Apr 17 16:38:06 2025 | 34 |
Notes about parallel processing
Much like the philosophy of constructing simulation code that can be executed using multiple cores in the package, it is generally better to let SimDesign
perform any necessary parallel processing distribution of the workload. Hence, while Stan supports parallel computations and the execution of multiple chains in parallel, it is generally advisable to execute each instance of Stan code on a single core so that within a given simulation replication there will not be any competition for available processor resources.
Notes on convergence of MCMC
The above example uses Stan for Markov Chain Monte Carlo (MCMC) sampling via the Hamiltonian method. While convergence is likely not an issue for this particular example, it is possible that convergence issues will appear in more complex models. One way to ensure the resulting replications have converged is to give an error when some convergence diagnostic statistic, such as \(\hat R\), is below a recommended threshold. In the stan_fit()
function above, an error is thrown whenever \(\hat R\) > 1.01 (see this paper). Another possibility is to call Stan again with more iterations if convergence failed.