Statistical power and significance testing in large-scale genetic studies
Fisher, R. A. Statistical Methods for Research Workers (Oliver and Boyd, 1925).
Neyman, J. & Pearson, E. S. On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. R. Soc. Lond. A 231, 289–337 (1933).
Nickerson, R. S. Null hypothesis significance testing: a review of an old and continuing controversy. Psychol. Methods 5, 241–301 (2000).
Balding, D. J. A tutorial on statistical methods for population association studies. Nature Rev. Genet. 7, 781–791 (2006).
Stephens, M. & Balding, D. J. Bayesian statistical methods for genetic association studies. Nature Rev. Genet. 10, 681–690 (2009). This is a highly readable account of Bayesian approaches for the analysis of genetic association studies.
Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet. Med. 4, 45–61 (2002).
Ioannidis, J. P. A. Genetic associations: false or true? Trends Mol. Med. 9, 135–138 (2003).
McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 9, 356–369 (2008).
The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
Hirschhorn, J. N. & Daly, M. J. Genome-wide association studies for common diseases and complex traits. Nature Rev. Genet. 6, 95–108 (2005).
Wang, W. Y. S., Barratt, B. J., Clayton, D. G. & Todd, J. A. Genome-wide association studies: theoretical and practical concerns. Nature Rev. Genet. 6, 109–118 (2005).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).
Pe'er, I., Yelensky, R., Altshuler, D. & Daly, M. J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 32, 381–385 (2008).
Dudbridge, F. & Gusnanto, A. Estimation of significance thresholds for genomewide association scans. Genet. Epidemiol. 32, 227–234 (2008).
Hoggart, C. J., Clark, T. G., De Iorio, M., Whittaker, J. C. & Balding, D. J. Genome-wide significance for dense SNP and resequencing data. Genet. Epidemiol. 32, 179–185 (2008).
Voight, B. F. et al. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 8, e1002793 (2012).
Juran, B. D. et al. Immunochip analyses identify a novel risk locus for primary biliary cirrhosis at 13q14, multiple independent associations at four established risk loci and epistasis between 1p31 and 7q32 risk variants. Hum. Mol. Genet. 21, 5209–5221 (2012).
Duggal, P., Gillanders, E. M., Holmes, T. N. & Bailey-Wilson, J. E. Establishing an adjusted _p_-value threshold to control the family-wide type 1 error in genome wide association studies. BMC Genomics 9, 516 (2008).
Nyholt, D. R. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am. J. Hum. Genet. 74, 765–769 (2004).
Galwey, N. W. A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet. Epidemiol. 33, 559–568 (2009).
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005).
Moskvina, V. & Schmidt, K. M. On multiple-testing correction in genome-wide association studies. Genet. Epidemiol. 32, 567–573 (2008).
Li, M. X., Yeung, J. M. Y., Cherny, S. S. & Sham, P. C. Evaluating the effective number of independent tests and significant _p_-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet. 131, 747–756 (2012).
North, B. V., Curtis, D. & Sham, P. C. A note on the calculation of empirical P values from Monte Carlo procedures. Am. J. Hum. Genet. 71, 439–441 (2002).
North, B. V., Curtis, D. & Sham, P. C. A note on calculation of empirical P values from Monte Carlo procedure. Am. J. Hum. Genet. 72, 498–499 (2003).
Dudbridge, F. & Koeleman, B. P. C. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am. J. Hum. Genet. 75, 424–435 (2004).
Seaman, S. R. & Müller-Myhsok, B. Rapid simulation of P values for product methods and multiple-testing adjustment in association studies. Am. J. Hum. Genet. 76, 399–408 (2005).
Wacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L. & Rothman, N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst. 96, 434–442 (2004).
Panagiotou, O. A., Ioannidis, J. P. & Genome-Wide Significance Project. What should the genome-wide significance threshold be? Empirical replication of borderline genetic associations. Int. J. Epidemiol. 41, 273–286 (2011).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
Wakefield, J. Bayes factors for genome-wide association studies: comparison with _P_-values. Genet. Epidemiol. 33, 79–86 (2009).
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012). This paper summarizes and interprets GWAS findings on common diseases and quantitative traits.
Purcell, S., Cherny, S. S. & Sham, P. C. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150 (2003).
Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).
Zhong, H. & Prentice, R. L. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics 9, 621–634 (2008).
Ghosh, A., Zou, F. & Wright, F. A. Estimating odds ratios in genome scans: an approximate conditional likelihood approach. Am. J. Hum. Genet. 82, 1064–1074 (2008).
Zollner, S. & Pritchard, J. K. Overcoming the winner's curse: estimating penetrance parameters from case–control data. Am. J. Hum. Genet. 80, 605–615 (2007).
Sham, P. C., Cherny, S. S., Purcell, S. & Hewitt, J. K. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am. J. Hum. Genet. 66, 1616–1630 (2000).
Pirinen, M., Donnelly, P. & Spencer, C. C. A. Including known covariates can reduce power to detect genetic effects in case–control studies. Nature Genet. 44, 848–851 (2012).
Li, Q., Zheng, G., Li, Z. & Yu, K. Efficient approximation of _P_-value of the maximum of correlated tests, with applications to genome-wide association studies. Ann. Hum. Genet. 72, 397–406 (2008).
González, J. R. et al. Maximizing association statistics over genetic models. Genet. Epidemiol. 32, 246–254 (2008).
So, H.-C. & Sham, P. C. Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates. Behav. Genet. 41, 768–775 (2011).
Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Rev. Genet. 12, 745–755 (2011).
Kiezun, A. et al. Exome sequencing and the genetic basis of complex traits. Nature Genet. 44, 623–630 (2012).
Kryukov, G. V., Pennacchio, L. A. & Sunyaev, S. R. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am. J. Hum. Genet. 80, 727–739 (2007).
Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104 (2012).
Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
Li, B. & Leal, S. M. Discovery of rare variants via sequencing: implications for the design of complex trait association studies. PLoS Genet. 5, e1000481 (2009).
Liu, D. J. & Leal, S. M. Replication strategies for rare variant complex trait association studies via next-generation sequencing. Am. J. Hum. Genet. 87, 790–801 (2010).
Li, M. X., Gui, H. S., Kwan, J. S. H., Bao, S. Y. & Sham, P. C. A comprehensive framework for prioritizing variants in exome sequencing studies of Mendelian diseases. Nucleic Acids Res. 40, e53 (2012).
Ng, S. B. et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genet. 42, 790–793 (2010).
Zhi, D. & Chen, R. Statistical guidance for experimental design and data analysis of mutation detection in rare monogenic mendelian diseases by exome sequencing. PLoS ONE 7, e31358 (2012).
Feng, B.-J., Tavtigian, S. V., Southey, M. C. & Goldgar, D. E. Design considerations for massively parallel sequencing studies of complex human disease. PLoS ONE 6, e23221 (2011).
Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008). This is one of the first association tests for rare variants.
Madsen, B. E. & Browning, S. R. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5, e1000384 (2009).
Price, A. L. et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 982 (2010).
Lin, D.-Y. & Tang, Z.-Z. A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–367 (2011).
Bansal, V., Libiger, O., Torkamani, A. & Schork, N. J. Statistical analysis strategies for association studies involving rare variants. Nature Rev. Genet. 11, 773–785 (2010).
Stitziel, N. O., Kiezun, A. & Sunyaev, S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol. 12, 227 (2011).
Basu, S. & Pan, W. Comparison of statistical tests for disease association with rare variants. Genet. Epidemiol. 35, 606–619 (2011).
Ladouceur, M., Dastani, Z., Aulchenko, Y. S., Greenwood, C. M. T. & Richards, J. B. The empirical power of rare variant association methods: results from Sanger sequencing in 1,998 individuals. PLoS Genet. 8, e1002496 (2012).
Ladouceur, M., Zheng, H.-F., Greenwood, C. M. T. & Richards, J. B. Empirical power of very rare variants for common traits and disease: results from Sanger sequencing 1998 individuals. Eur. J. Hum. Genet. 21, 1027–1030 (2013).
Saad, M., Pierre, A. S., Bohossian, N., Macé, M. & Martinez, M. Comparative study of statistical methods for detecting association with rare variants in exome-resequencing data. BMC Proc. 5, S33 (2011).
Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011). This is the original paper that describes the SKAT for rare-variant association.
Liu, L. et al. Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls. PLoS Genet. 9, e1003443 (2013).
Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2013). This paper presents a framework for power calculation and ways to improve power for rare-variant studies.
Li, D., Lewinger, J. P., Gauderman, W. J., Murcray, C. E. & Conti, D. Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies. Genet. Epidemiol. 35, 790–799 (2011).
Nejentsev, S., Walker, N., Riches, D., Egholm, M. & Todd, J. A. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science 324, 387–389 (2009).
Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum, J. D. & Lin, X. Family-based association tests for sequence data, and comparisons with population-based association tests. Eur. J. Hum. Genet. 21, 1158–1162 (2013).
Lim, E. T. et al. Rare complete knockouts in humans: population distribution and significant role in autism spectrum disorders. Neuron 77, 235–242 (2013).
Longmate, J. A., Larson, G. P., Krontiris, T. G. & Sommer, S. S. Three ways of combining genotyping and resequencing in case–control association studies. PLoS ONE 5, e14318 (2010).
Aschard, H. et al. Combining effects from rare and common genetic variants in an exome-wide association study of sequence data. BMC Proc. 5, S44 (2011).
He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).
Ye, K. Q. & Engelman, C. D. Detecting multiple causal rare variants in exome sequence data. Genet. Epidemiol. 35, S18–S21 (2011).
Li, B., Wang, G. & Leal, S. M. SimRare: a program to generate and analyze sequence-based data for association studies of quantitative and qualitative traits. Bioinformatics 28, 2703–2704 (2012).
Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nature Genet. 44, 243–246 (2012).
Lee, S., Teslovich, T. M., Boehnke, M. & Lin, X. General framework for meta-analysis of rare variants in sequencing association studies. Am. J. Hum. Genet. 93, 42–53 (2013).
Hu, Y.-J. et al. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. Am. J. Hum. Genet. 93, 236–248 (2013). References 83 and 84 propose powerful and convenient score tests for meta-analyses of rare-variant association studies.
Lee, S., Wu, M. C. & Lin, X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775 (2012). This paper describes the SKAT power calculation tool.
Rees, E. et al. Analysis of copy number variations at 15 schizophrenia-associated loci. Br. J. Psychiatry 204, 108–114 (2013).
Patnaik, P. B. The power function of the test for the difference between two proportions in a 2 × 2 table. Biometrika 35, 157 (1948).
Sidak, Z. Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Statist. Assoc. 62, 626 (1967).
Davison, A. C. & Hinkley, D. V. Bootstrap Methods and Their Application (Cambridge Univ. Press, 1997).
Patnaik, P. B. The non-central χ²- and F-distribution and their applications. Biometrika 36, 202 (1949).
Whittaker, J. C. & Lewis, C. M. Power comparisons of the transmission/disequilibrium test and sib–transmission/disequilibrium-test statistics. Am. J. Hum. Genet. 65, 578–580 (1999).
Fulker, D. W., Cherny, S. S., Sham, P. C. & Hewitt, J. K. Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64, 259–267 (1999).
Kwan, J. S. H., Cherny, S. S., Kung, A. W. C. & Sham, P. C. Novel sib pair selection strategy increases power in quantitative association analysis. Behav. Genet. 39, 571–579 (2009).
Luan, J. Sample size determination for studies of gene–environment interaction. Int. J. Epidemiol. 30, 1035–1040 (2001).
Gauderman, W. J. Sample size requirements for association studies of gene–gene interaction. Am. J. Epidemiol. 155, 478–484 (2002).
Gauderman, W. J. Sample size requirements for matched case–control studies of gene–environment interaction. Statist. Med. 21, 35–50 (2002).