Pre-compiled Binary Files in Simulation Studies (original) (raw)

Reusing pre-compiled binaries in simulation studies

The following example demonstrate the logic of how to reuse pre-compiled binary files in the context of Monte Carlo simulations. The goal in this case is to:

  1. Pre-compile the necessary binary files before running runSimulation(),
  2. Include tractable extraction information in the Design/fixed_objects inputs to point to and extract the necessary binary object associated with the respective simulation conditions under investigation, respectively, and
  3. Sit around for a while, as no doubt the use of pre-compiled binaries suggests the computational problem is already intensive!

The following code example is an adaption of Mark Lai’s example using cmdstanr alongside SimDesign. See https://mc-stan.org/cmdstanr/articles/cmdstanr.html for further information on using the cmdstanr package.

Example using Stan

Step 1: Define and compile binary files

# if you haven't done so, install the cmdstanr package using the following:
# install.packages("cmdstanr", 
#   repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
#
# and follow all the necessary setup instructions to get the package working

The first step in the process would be to generate all the necessary Stan objects and store them into a meaningfully named list. Below two objects are generated to perform the same estimation of a population mean, however the first uses a weak normal prior (\(\mu \sim N(0, 10)\)) while the latter uses a much stronger prior (\(\mu \sim N(0,2\))).

library(cmdstanr)

bmean_stan_weak <- "
    data {
        int<lower=0> N;
        array[N] real x;
    }
    parameters {
        real mu;
        real<lower=0> sigma;
    }
    model {
        target += normal_lpdf(mu | 0, 10);  // weakly informative prior
        target += normal_lpdf(x | mu, sigma);
    }
"

bmean_stan_strong <- "
    data {
        int<lower=0> N;
        array[N] real x;
    }
    parameters {
        real mu;
        real<lower=0> sigma;
    }
    model {
        target += normal_lpdf(mu | 0, 2);  // more informative prior
        target += normal_lpdf(x | mu, sigma);
    }
"

# Save files and compile
stan_path_weak <- write_stan_file(bmean_stan_weak)
stan_path_strong <- write_stan_file(bmean_stan_strong)

# compile and store within fixed objects for use in runSimulation()
fo <- list(weak=cmdstan_model(stan_path_weak), 
           strong=cmdstan_model(stan_path_strong))
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  159 |   inline math::var log_prob(std::vector<math::var>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  154 |   inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   96 |   inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   91 |   inline double log_prob(Eige
n::VectorXd& theta,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  159 |   inline math::var log_prob(std::vector<math::var>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  154 |   inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   96 |   inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostrea
m*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   91 |   inline double log_prob(Eigen::VectorXd& theta,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 

Step 2: Define suitable simulation flow

Elements in the above fo object (passed to runSimulation() via the fixed_objects input) are correctly pointed to when needed.

# function to run MCMC and extract associated mean estimate
stan_fit <- function(mod, dat){
    
    # Stan is noisy, so tell it to be more quiet()
    mcmc <- quiet(mod$sample(list(x = dat, N = length(dat)),
                             refresh = 0, chains = 1, 
                             show_messages = FALSE))
    if (mcmc$summary("mu")[1, "rhat"] > 1.01) {
        # Gives an error when the convergence criterion is not met
        stop("MCMC did not converge.")
    }
    MB <- mcmc$summary("mu", mean)$mean[1]
    MB
}
# SimDesign::SimFunctions()
library(SimDesign)

Design <- createDesign(sample_size = c(30, 60, 120, 240), 
                       distribution = c('norm', 'chi'))
Design
# A tibble: 8 × 2
  sample_size distribution
        <dbl> <chr>       
1          30 norm        
2          60 norm        
3         120 norm        
4         240 norm        
5          30 chi         
6          60 chi         
7         120 chi         
8         240 chi         
Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition)
    if(distribution == 'norm'){
        dat <- rnorm(sample_size, mean = 3)
    } else if(distribution == 'chi'){
        dat <- rchisq(sample_size, df = 3)
    }
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    Attach(condition)
    
    M0 <- mean(dat)
    M1 <- mean(dat, trim = .1)
    M2 <- mean(dat, trim = .2)
    med <- median(dat)
    
    # extract suitable pre-compiled Stan object
    MB_weak <- stan_fit(fixed_objects$weak, dat)
    MB_strong <- stan_fit(fixed_objects$strong, dat)
    
    ret <- c(mean_no_trim = M0, mean_trim.1 = M1, 
             mean_trim.2 = M2, median = med, 
             mean_weak_prior = MB_weak, 
             mean_strong_prior = MB_strong)
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    obs_bias <- bias(results, parameter = 3)
    obs_RMSE <- RMSE(results, parameter = 3)
    ret <- c(bias = obs_bias, RMSE = obs_RMSE, RE = RE(obs_RMSE))
    ret
}

Step 3: Run the simulation in parallel

res <- runSimulation(Design, replications = 1000, generate = Generate, 
                     analyse = Analyse, summarise = Summarise, 
                     fixed_objects = fo, parallel=TRUE, 
                     packages = "cmdstanr")

Number of parallel clusters in use: 27


Design: 1/8;   Replications: 1000;   RAM Used: 65.8 Mb;   Total Time: 0.00s 
 Conditions: sample_size=30, distribution=norm


Design: 2/8;   Replications: 1000;   RAM Used: 67.3 Mb;   Total Time: 01m 52.66s 
 Conditions: sample_size=60, distribution=norm


Design: 3/8;   Replications: 1000;   RAM Used: 67.3 Mb;   Total Time: 03m 45.57s 
 Conditions: sample_size=120, distribution=norm


Design: 4/8;   Replications: 1000;   RAM Used: 67.4 Mb;   Total Time: 05m 34.54s 
 Conditions: sample_size=240, distribution=norm


Design: 5/8;   Replications: 1000;   RAM Used: 67.4 Mb;   Total Time: 07m 27.41s 
 Conditions: sample_size=30, distribution=chi


Design: 6/8;   Replications: 1000;   RAM Used: 67.5 Mb;   Total Time: 09m 19.83s 
 Conditions: sample_size=60, distribution=chi


Design: 7/8;   Replications: 1000;   RAM Used: 67.5 Mb;   Total Time: 11m 13.40s 
 Conditions: sample_size=120, distribution=chi


Design: 8/8;   Replications: 1000;   RAM Used: 67.6 Mb;   Total Time: 13m 9.46s 
 Conditions: sample_size=240, distribution=chi

Simulation complete. Total execution time: 15m 6.33s
Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
running command '"quarto"
TMPDIR=C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/file1154453c42e6 -V' had
status 1
# A tibble: 8 × 26
  sample_size distribution bias.mean_no_trim bias.mean_trim.1 bias.mean_trim.2
        <dbl> <chr>                    <dbl>            <dbl>            <dbl>
1          30 norm               -0.012206        -0.012205        -0.010733  
2          60 norm               -0.00066471      -0.00058635      -0.00049449
3         120 norm               -0.00045496      -0.0013078       -0.0018907 
4         240 norm               -0.0033323       -0.0030002       -0.0034330 
5          30 chi                 0.0084255       -0.30383         -0.44208   
6          60 chi                -0.0032308       -0.33978         -0.48129   
7         120 chi                -0.0061516       -0.35092         -0.49200   
8         240 chi                -0.0055454       -0.35250         -0.49620   
# ℹ 21 more variables: bias.median <dbl>, bias.mean_weak_prior <dbl>,
#   bias.mean_strong_prior <dbl>, RMSE.mean_no_trim <dbl>,
#   RMSE.mean_trim.1 <dbl>, RMSE.mean_trim.2 <dbl>, RMSE.median <dbl>,
#   RMSE.mean_weak_prior <dbl>, RMSE.mean_strong_prior <dbl>,
#   RE.mean_no_trim <dbl>, RE.mean_trim.1 <dbl>, RE.mean_trim.2 <dbl>,
#   RE.median <dbl>, RE.mean_weak_prior <dbl>, RE.mean_strong_prior <dbl>,
#   REPLICATIONS <dbl>, SIM_TIME <chr>, RAM_USED <chr>, SEED <int>, …
# render output better
knitr::kable(res)
sample_size distribution bias.mean_no_trim bias.mean_trim.1 bias.mean_trim.2 bias.median bias.mean_weak_prior bias.mean_strong_prior RMSE.mean_no_trim RMSE.mean_trim.1 RMSE.mean_trim.2 RMSE.median RMSE.mean_weak_prior RMSE.mean_strong_prior RE.mean_no_trim RE.mean_trim.1 RE.mean_trim.2 RE.median RE.mean_weak_prior RE.mean_strong_prior REPLICATIONS SIM_TIME RAM_USED SEED COMPLETED ERRORS
30 norm -0.012 -0.012 -0.011 -0.013 -0.014 -0.040 0.177 0.182 0.189 0.218 0.177 0.180 1 1.05 1.13 1.52 1.001 1.031 1000 113 67.3 Mb 1.94e+09 Thu Apr 17 16:24:52 2025 54
60 norm -0.001 -0.001 0.000 -0.001 -0.001 -0.014 0.127 0.131 0.136 0.160 0.127 0.128 1 1.05 1.14 1.57 0.999 1.005 1000 113 67.3 Mb 1.34e+09 Thu Apr 17 16:26:45 2025 45
120 norm 0.000 -0.001 -0.002 -0.002 -0.001 -0.007 0.095 0.097 0.101 0.116 0.095 0.095 1 1.05 1.13 1.50 1.000 1.002 1000 109 67.4 Mb 6.70e+08 Thu Apr 17 16:28:34 2025 43
240 norm -0.003 -0.003 -0.003 -0.005 -0.003 -0.006 0.068 0.069 0.071 0.081 0.068 0.068 1 1.03 1.09 1.42 1.000 1.006 1000 113 67.4 Mb 1.23e+09 Thu Apr 17 16:30:27 2025 38
30 chi 0.008 -0.304 -0.442 -0.587 0.002 -0.152 0.433 0.516 0.612 0.746 0.430 0.401 1 1.42 2.00 2.97 0.987 0.860 1000 112 67.5 Mb 1.52e+09 Thu Apr 17 16:32:20 2025 42
60 chi -0.003 -0.340 -0.481 -0.617 -0.007 -0.082 0.316 0.452 0.567 0.707 0.315 0.304 1 2.05 3.23 5.01 0.992 0.925 1000 114 67.5 Mb 1.13e+09 Thu Apr 17 16:34:13 2025 36
120 chi -0.006 -0.351 -0.492 -0.628 -0.008 -0.044 0.219 0.406 0.533 0.670 0.219 0.215 1 3.45 5.95 9.39 1.004 0.965 1000 116 67.6 Mb 1.89e+09 Thu Apr 17 16:36:09 2025 42
240 chi -0.006 -0.353 -0.496 -0.638 -0.006 -0.024 0.152 0.381 0.518 0.659 0.152 0.151 1 6.27 11.57 18.74 1.002 0.989 1000 117 67.6 Mb 1.30e+09 Thu Apr 17 16:38:06 2025 34

Notes about parallel processing

Much like the philosophy of constructing simulation code that can be executed using multiple cores in the package, it is generally better to let SimDesign perform any necessary parallel processing distribution of the workload. Hence, while Stan supports parallel computations and the execution of multiple chains in parallel, it is generally advisable to execute each instance of Stan code on a single core so that within a given simulation replication there will not be any competition for available processor resources.

Notes on convergence of MCMC

The above example uses Stan for Markov Chain Monte Carlo (MCMC) sampling via the Hamiltonian method. While convergence is likely not an issue for this particular example, it is possible that convergence issues will appear in more complex models. One way to ensure the resulting replications have converged is to give an error when some convergence diagnostic statistic, such as \(\hat R\), is below a recommended threshold. In the stan_fit() function above, an error is thrown whenever \(\hat R\) > 1.01 (see this paper). Another possibility is to call Stan again with more iterations if convergence failed.