Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set
Masahiro Kanai et al. J Hum Genet. 2016 Oct.
Abstract
To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most studies in the current literature set a genome-wide significance threshold of P = 5.0 × 10⁻⁸, the adequacy of this value for individual populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were P_sig = 3.24 × 10⁻⁸ (AFR), 9.26 × 10⁻⁸ (EUR), 1.83 × 10⁻⁷ (AMR), 1.61 × 10⁻⁷ (EAS) and 9.46 × 10⁻⁸ (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and all populations except for AFR (ΔAFR), which yielded P_sig = 3.25 × 10⁻⁸ (ALL) and 4.20 × 10⁻⁸ (ΔAFR). Our results indicate that the current threshold (P = 5.0 × 10⁻⁸) is overly stringent for all ancestral populations except for Africans; however, we should employ a more stringent threshold when conducting a meta-analysis, regardless of the presence of African samples.
Figures
Figure 1
The −log₁₀ P_min distributions for five ancestral populations and meta-analysis results. We conducted GWAS simulations using the 1000 Genomes Phase 3 data set and measured the minimum P-value of the variants (P_min). Each panel represents a population/meta-analysis result. Each vertical bar in the panel represents the top 5th percentile of −log₁₀ P_min (that is, the estimated empirical genome-wide significance −log₁₀ P_sig). The dotted vertical bar represents the common genome-wide significance threshold of 5.0 × 10⁻⁸. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian; ALL, meta-analysis across all ancestral populations; ΔAFR, meta-analysis including all ancestral populations except for AFR (that is, EUR, AMR, EAS and SAS).
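The percentile rule described in the Figure 1 caption can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the function name `empirical_threshold`, the replicate count and the variant count are hypothetical, and the null P_min values are drawn as if all variants were independent (the minimum of m independent uniform P-values follows a Beta(1, m) distribution), whereas the study simulated GWAS on real 1000 Genomes genotypes with LD.

```python
import math
import numpy as np

def empirical_threshold(p_min, alpha=0.05):
    """Return the alpha-quantile of the per-replicate minimum P-values.

    On the -log10 scale this is the top (alpha * 100)th percentile.
    """
    p_min = np.asarray(p_min, dtype=float)
    return float(np.quantile(p_min, alpha))

# Hypothetical settings, chosen only for illustration.
n_sim = 10_000        # number of null GWAS replicates
m = 1_000_000         # number of independent variants (an assumption)

rng = np.random.default_rng(0)
# Minimum of m independent Uniform(0, 1) P-values ~ Beta(1, m).
p_min = rng.beta(1.0, m, size=n_sim)

p_sig = empirical_threshold(p_min)

# Analytic reference under independence (Sidak correction).
sidak = -math.expm1(math.log1p(-0.05) / m)

print(f"empirical P_sig = {p_sig:.3g}, Sidak reference = {sidak:.3g}")
```

Under independence the estimate should land close to the Šidák reference. With correlated variants, as in real genotype data, the empirical threshold would generally be less stringent than the value for the same number of independent tests.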
Figure 2
The relationship between −log₁₀ P_LD and −log₁₀ P_sig. We calculated the LD-based genome-wide significance threshold P_LD from the effective number of independent variants, which was estimated by applying LD pruning with a maximum r² threshold of 0.5. Whereas −log₁₀ P_sig was approximately positively correlated with −log₁₀ P_LD for AFR, EUR, EAS and SAS (blue), AMR (red) is an outlier. The error bars represent the 95% CI for −log₁₀ P_sig. The dotted lines represent the common genome-wide significance threshold of P = 5.0 × 10⁻⁸. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian.
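As a rough illustration of how an LD-based threshold relates to the effective number of independent variants, the sketch below assumes a Bonferroni-style correction at α = 0.05. The caption does not state the exact formula, so this is an assumption; the function names and the variant count are hypothetical as well.

```python
import math

def ld_based_threshold(n_independent, alpha=0.05):
    """Bonferroni-style threshold: alpha divided by the number of
    independent variants left after LD pruning (assumed formula)."""
    if n_independent <= 0:
        raise ValueError("n_independent must be positive")
    return alpha / n_independent

def neg_log10(p):
    """Convert a P-value to the -log10 scale used in the figures."""
    return -math.log10(p)

# Hypothetical count: one million independent variants reproduces
# the conventional threshold of about 5.0e-8.
p_ld = ld_based_threshold(1_000_000)
print(f"P_LD = {p_ld:.3g}, -log10 P_LD = {neg_log10(p_ld):.2f}")
```

Under this assumed formula, a population with more independent variants after pruning gets a smaller P_LD, that is, a larger −log₁₀ P_LD.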