pwrEWAS: a user-friendly tool for comprehensive power estimation for epigenome wide association studies (EWAS)
Stefan Graw et al. BMC Bioinformatics. 2019.
Abstract
Background: When designing an epigenome-wide association study (EWAS) to investigate the relationship between DNA methylation (DNAm) and some exposure(s) or phenotype(s), it is critically important to assess the sample size needed to detect a hypothesized difference with adequate statistical power. However, the complex and nuanced nature of DNAm data makes direct assessment of statistical power challenging. To circumvent these challenges and to address the outstanding need for a user-friendly interface for EWAS power evaluation, we have developed pwrEWAS.
Results: The current implementation of pwrEWAS accommodates power estimation for two-group comparisons of DNAm (e.g. case vs control, exposed vs non-exposed, etc.), where methylation assessment is carried out using the Illumina Human Methylation BeadChip technology. Power is calculated using a semi-parametric simulation-based approach in which DNAm data is randomly generated from beta-distributions using CpG-specific means and variances estimated from one of several different existing DNAm data sets, chosen to cover the most common tissue-types used in EWAS. In addition to specifying the tissue type to be used for DNAm profiling, users are required to specify the sample size, number of differentially methylated CpGs, effect size(s) (Δβ), target false discovery rate (FDR) and the number of simulated data sets, and have the option of selecting from several different statistical methods to perform differential methylation analyses. pwrEWAS reports the marginal power, marginal type I error rate, marginal FDR, and false discovery cost (FDC). Here, we demonstrate how pwrEWAS can be applied in practice using a hypothetical EWAS. In addition, we report its computational efficiency across a variety of user settings.
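The simulation-based approach described above can be illustrated with a minimal Python sketch. It is not the pwrEWAS implementation or its API: all function names are hypothetical, the CpG-specific means (uniform on 0.2 to 0.8) and the fixed variance are placeholders for the tissue-specific estimates, every differentially methylated CpG receives the same fixed shift, and a Welch-type test with a normal approximation stands in for the statistical methods the tool offers. FDR control here uses the Benjamini-Hochberg adjustment, which is also an assumption.

```python
import math
import random

def beta_shape_params(mean, var):
    """Method-of-moments shape parameters (a, b) of a beta distribution.

    Requires 0 < mean < 1 and var < mean * (1 - mean).
    """
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def welch_p_value(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def estimate_marginal_power(n_per_group, n_cpgs, n_dm, delta,
                            fdr=0.05, n_sims=10, seed=1):
    """Average, over simulated data sets, of the share of truly
    differentially methylated CpGs that are detected at the target FDR."""
    rng = random.Random(seed)
    powers = []
    for _ in range(n_sims):
        p_values = []
        for j in range(n_cpgs):
            mean = rng.uniform(0.2, 0.8)  # placeholder CpG-specific mean
            var = 0.002                   # placeholder CpG-specific variance
            shift = delta if j < n_dm else 0.0  # first n_dm CpGs are shifted
            a0, b0 = beta_shape_params(mean, var)
            a1, b1 = beta_shape_params(min(mean + shift, 0.95), var)
            group0 = [rng.betavariate(a0, b0) for _ in range(n_per_group)]
            group1 = [rng.betavariate(a1, b1) for _ in range(n_per_group)]
            p_values.append(welch_p_value(group0, group1))
        adjusted = benjamini_hochberg(p_values)
        true_pos = sum(1 for j in range(n_dm) if adjusted[j] < fdr)
        powers.append(true_pos / n_dm)
    return sum(powers) / n_sims

# Example: estimated power for a shift of 0.05 at several group sizes.
for n in (10, 20, 40):
    print(n, estimate_marginal_power(n, n_cpgs=100, n_dm=10, delta=0.05))
```

Varying `n_per_group` and `delta` in such a loop gives the kind of power-by-sample-size table that the tool reports, although the numbers from this toy setup should not be read as estimates for any real tissue type.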
Conclusion: Both under- and overpowered studies unnecessarily deplete resources and even risk failure of a study. With pwrEWAS, we provide a user-friendly tool to help researchers circumvent these risks and to assist in the design and planning of EWAS.
Availability: The web interface is written in the R statistical programming language using Shiny (RStudio Inc., 2016) and is available at https://biostats-shinyr.kumc.edu/pwrEWAS/ . The R package for pwrEWAS is publicly available at GitHub ( https://github.com/stefangraw/pwrEWAS ).
Keywords: Bioconductor package; DNA methylation; Illumina human methylation BeadChip; Microarray data analysis; Sample size calculation; Statistical power.
Conflict of interest statement
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Figures
Fig. 1
Workflow for pwrEWAS. From an existing tissue-type-specific data set, J CpG-specific means and variances are estimated. Next, P CpGs are sampled with replacement from the collection of CpGs. For two groups, the mean of one group is shifted by Δβ, while the mean of the other group remains unchanged. Δβ is drawn from a truncated normal distribution N(0, τ²). These parameters are then used to simulate β-values for the two groups. A CpG with an absolute difference in mean methylation greater than a predefined detection limit (default: 0.01) is considered truly differentially methylated. Next, the simulated data set is used to test for differential methylation, comparing the mean methylation signatures between the two groups. A CpG is defined as “detected” if its corresponding FDR is smaller than a predefined threshold (default: 0.05). Each CpG can fall into one of six categories described in Table 1. The marginal power is calculated as the proportion of true positives among all truly differentially methylated CpGs.
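The caption does not state how a CpG-specific mean and variance are turned into beta-distribution parameters. One standard choice, assumed here rather than confirmed by the source, is moment matching. The shape parameters are written a_j and b_j to avoid confusion with β-values, and the symbols μ_j, σ_j², TP and D are notation introduced for this illustration.

```latex
% Moment matching for CpG j with mean \mu_j and variance \sigma_j^2,
% valid when \sigma_j^2 < \mu_j (1 - \mu_j):
a_j = \mu_j \left( \frac{\mu_j (1 - \mu_j)}{\sigma_j^2} - 1 \right),
\qquad
b_j = (1 - \mu_j) \left( \frac{\mu_j (1 - \mu_j)}{\sigma_j^2} - 1 \right)

% Truly differentially methylated CpGs (detection limit 0.01 by default):
D = \{\, j : |\Delta\beta_j| > 0.01 \,\}

% Marginal power, with TP the number of detected CpGs that belong to D:
\text{marginal power} = \frac{\mathrm{TP}}{|D|}
```

For example, a mean of 0.5 with a variance of 0.05 gives a_j = b_j = 2.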
Fig. 2
pwrEWAS Shiny user interface. (1) User-specific inputs; (2) Advanced input settings to optimize run time; (3) Link to the vignette, which gives a detailed description of inputs and outputs, instructions, and an example with interpretations of the example results; (4) Power curve as a function of sample size by effect size (Δβ); (5) Estimated power averaged over simulations by sample size and effect size (Δβ); (6) Probability of detecting at least one true positive; (7) Distribution of simulated differences in DNAm (Δβ) for different target Δβ values; (8) Log of input parameters and run time
Fig. 3
Empirical assessment of the number of simulations. To assess the number of simulated data sets (number of simulations) required to obtain consistent power estimates, pwrEWAS was run for a range of simulation counts (5–100 simulations), 100 times for each count, with all other input parameters held fixed. Panel a shows the distribution of power estimates across the 100 runs for each number of simulations. Panel b shows the variance of the power estimates for each number of simulations. Given the relative stability of the variance estimates beyond 50 simulations, 50 was selected as the default value for the number of simulations in pwrEWAS
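The idea behind this assessment can be illustrated with a toy Python sketch. It does not run pwrEWAS: each truly differentially methylated CpG is assumed to be detected independently with a fixed probability, which ignores correlation between CpGs, and the names `power_estimate` and `variance_by_n_sims` and every default value are hypothetical.

```python
import random
import statistics

def power_estimate(n_sims, true_power, n_dm, rng):
    """Average per-data-set power over n_sims simulated data sets (toy model)."""
    per_dataset = []
    for _ in range(n_sims):
        # Each of the n_dm truly differential CpGs is detected independently.
        detected = sum(1 for _ in range(n_dm) if rng.random() < true_power)
        per_dataset.append(detected / n_dm)
    return sum(per_dataset) / n_sims

def variance_by_n_sims(sim_counts, n_runs=100, true_power=0.6, n_dm=50, seed=7):
    """Variance of the power estimate across n_runs repeated runs,
    computed separately for each number of simulations."""
    rng = random.Random(seed)
    result = {}
    for n_sims in sim_counts:
        estimates = [power_estimate(n_sims, true_power, n_dm, rng)
                     for _ in range(n_runs)]
        result[n_sims] = statistics.variance(estimates)
    return result

variances = variance_by_n_sims([5, 25, 50, 100])
for n_sims, var in variances.items():
    print(n_sims, f"{var:.2e}")
```

Under this toy model the variance falls roughly in proportion to 1 / (number of simulations), so the absolute gain from adding simulations becomes small at higher counts.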
Grants and funding
- P20 GM103418/GM/NIGMS NIH HHS/United States
- P20 GM130423/GM/NIGMS NIH HHS/United States
- R01 GM103428/GM/NIGMS NIH HHS/United States
- P20GM103428/GM/NIGMS NIH HHS/United States