Coverage estimates (original) (raw)
Coverage relates to how well a proposed confidence interval captures the population parameters. For instance, given some desired nominal error rate, say, \(\alpha = .05\), does the confidence interval mean based on a t-test capture the population mean across 95% of the time across random samples? Intervals that are too narrow (i.e., less than the nominal coverage rate) are generally termed ‘liberal’, while those that are too wide (i.e., greater than the nominal coverage rate) are considered ‘conservative’. In the example below we evaluate whether the z or t-distribution is a suitable candidate to capture the population mean across different sample sizes.
Define the functions
library(SimDesign)
#SimFunctions(comments = FALSE)
# z-test confidence intervals function
z.CI <- function(dat, alpha = .05){
xbar <- mean(dat)
SE <- sd(dat) / sqrt(length(dat))
z <- abs(qnorm(alpha/2))
CI <- c(xbar - z * SE, xbar + z * SE)
CI
}
### Define design conditions. Here we want a finer range of values for plotting
Design <- createDesign(mu = c(0, 5, 10),
sample_size = c(10, 20, 30))
#---------------------------------------------------------------------
Generate <- function(condition, fixed_objects = NULL) {
ret <- with(condition, rnorm(sample_size, mu))
ret
}
Analyse <- function(condition, dat, fixed_objects = NULL) {
t_test <- t.test(dat)
t_CI <- t_test$conf.int
z_CI <- z.CI(dat)
ret <- c(t_CI = ECR(t_CI, parameter = condition$mu),
z_CI = ECR(z_CI, parameter = condition$mu))
ret
}
Summarise <- function(condition, results, fixed_objects = NULL) {
ret <- colMeans(results)
ret
}
#---------------------------------------------------------------------
### Run the simulation
res <- runSimulation(Design, replications=10000, verbose=FALSE, parallel = TRUE,
generate=Generate, analyse=Analyse, summarise=Summarise)
# A tibble: 9 × 8
mu sample_size t_CI z_CI REPLICATIONS SIM_TIME SEED COMPLETED
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 0 10 0.9531 0.9221 10000 0.52s 488349520 Thu Apr 17 2…
2 5 10 0.9466 0.9169 10000 0.34s 469344150 Thu Apr 17 2…
3 10 10 0.9503 0.9173 10000 0.37s 1550099436 Thu Apr 17 2…
4 0 20 0.9481 0.9325 10000 0.45s 40778633 Thu Apr 17 2…
5 5 20 0.9527 0.9373 10000 0.37s 713327526 Thu Apr 17 2…
6 10 20 0.9523 0.9375 10000 0.40s 831490948 Thu Apr 17 2…
7 0 30 0.9494 0.9408 10000 0.47s 1214377063 Thu Apr 17 2…
8 5 30 0.9476 0.9381 10000 0.36s 1255528026 Thu Apr 17 2…
9 10 30 0.9488 0.9392 10000 0.39s 1229276239 Thu Apr 17 2…
From these results it’s clear to see that the t-distribution gives very good coverage rates across all conditions. However, the z-distribution CI is too liberal in the smaller sample sizes, and gives an interval that does not contain the population mean as often as desired. Around a sample size of 30 we can see that the two distributions behave reasonably well.
Note that the ECR()
function was used within the Analyse()
function rather than in Summarise()
. When only the three values are supplied (CIs + population value) then ECR()
will return either a 1 if the population value was within the CI or 0 if it was not. Hence, in the Summarise()
function all that is required to obtain the coverage estimate is to find the mean of these 1’s and 0’s. The next section demonstrates how to use ECR()
in the Summarise()
instead, and it’s really a matter of preference which style you prefer.
An Alternative Form
The above simulation called the ECR()
function within analyse
, however it is entirely possible to use this function within the summarise
block by simply supplying matrices of confidence intervals instead.
library(SimDesign)
#SimFunctions(comments = FALSE)
### Define design conditions. Here we want a finer range of values for plotting
Design <- createDesign(mu = c(0, 5, 10),
sample_size = c(10, 20, 30))
z.CI <- function(dat, alpha = .05){
xbar <- mean(dat)
SE <- sd(dat) / sqrt(length(dat))
z <- abs(qnorm(alpha/2))
CI <- c(xbar - z * SE, xbar + z * SE)
CI
}
#-----------------------------------------------------------------
Generate <- function(condition, fixed_objects = NULL) {
ret <- with(condition, rnorm(sample_size, mu))
ret
}
Analyse <- function(condition, dat, fixed_objects = NULL) {
t_test <- t.test(dat)
t_CI <- t_test$conf.int
z_CI <- z.CI(dat)
ret <- c(t_CI = t_CI, z_CI = z_CI)
ret
}
Summarise <- function(condition, results, fixed_objects = NULL) {
ret <- ECR(results, parameter = condition$mu,
names = c('t_CI', 'z_CI'))
ret
}
#-----------------------------------------------------------------
### Run the simulation
res <- runSimulation(Design, replications=10000, verbose=FALSE, parallel = FALSE,
generate=Generate, analyse=Analyse, summarise=Summarise)
Many secondary function in SimDesign
have this type of flexibility, therefore it’s generally useful to refer to the documentation to help think about easier ways to summarise your results.