Holdout validation and K-fold cross-validation of Stan programs with the loo package (original) (raw)

We now loop over the 10 folds. In each fold we do the following. First, we fit the model to all the observations except the ones belonging to the left-out fold. Second, we compute the log pointwise predictive densities for the left-out fold. Last, we store the predictive density for the observations of the left-out fold in a matrix. The output of this loop is a matrix of the log pointwise predictive densities of all the observations.

# Prepare a matrix with the number of post-warmup iterations by number of observations:
log_pd_kfold <- matrix(nrow = 4000, ncol = nrow(roaches))
# Loop over the folds
for(k in 1:10){
  data_train <- list(y = roaches$y[roaches$fold != k],
                   x = as.matrix(roaches[roaches$fold != k,
                                         c("roach1", "treatment", "senior")]),
                   N = nrow(roaches[roaches$fold != k,]),
                   K = 3,
                   offset = roaches$offset[roaches$fold != k],
                   beta_prior_scale = 2.5,
                   alpha_prior_scale = 5.0
                   )
  data_test <- list(y = roaches$y[roaches$fold == k],
                   x = as.matrix(roaches[roaches$fold == k,
                                         c("roach1", "treatment", "senior")]),
                   N = nrow(roaches[roaches$fold == k,]),
                   K = 3,
                   offset = roaches$offset[roaches$fold == k],
                   beta_prior_scale = 2.5,
                   alpha_prior_scale = 5.0
                   )
  fit <- sampling(stanmodel, data = data_train, seed = seed, refresh = 0)
  gen_test <- gqs(stanmodel, draws = as.matrix(fit), data= data_test)
  log_pd_kfold[, roaches$fold == k] <- extract_log_lik(gen_test)
}