Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data
Nat Biotechnol. 2013 Dec;31(12):1102-10.
doi: 10.1038/nbt.2749.
Joshua C Denny, Lisa Bastarache, Marylyn D Ritchie, Robert J Carroll, Raquel Zink, Jonathan D Mosley, Julie R Field, Jill M Pulley, Andrea H Ramirez, Erica Bowton, Melissa A Basford, David S Carrell, Peggy L Peissig, Abel N Kho, Jennifer A Pacheco, Luke V Rasmussen, David R Crosslin, Paul K Crane, Jyotishman Pathak, Suzette J Bielinski, Sarah A Pendergrass, Hua Xu, Lucia A Hindorff, Rongling Li, Teri A Manolio, Christopher G Chute, Rex L Chisholm, Eric B Larson, Gail P Jarvik, Murray H Brilliant, Catherine A McCarty, Iftikhar J Kullo, Jonathan L Haines, Dana C Crawford, Daniel R Masys, Dan M Roden
- PMID: 24270849
- PMCID: PMC3969265
Joshua C Denny et al. Nat Biotechnol. 2013 Dec.
Abstract
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
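The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below also shows a Benjamini-Hochberg procedure as one common way to control the false discovery rate. The abstract does not say which FDR method was used, so that choice, the function name `benjamini_hochberg` and the toy P-values are assumptions for illustration only.

```python
# Figures quoted in the abstract.
n_snps = 3144
n_phenotypes = 1358
total_tests = n_snps * n_phenotypes   # SNP-phenotype pairs scanned
replication_rate = 51 / 77            # replicated / sufficiently powered


def benjamini_hochberg(p_values, q=0.1):
    """Return a list of booleans: True where the test is declared
    significant at false discovery rate q (Benjamini-Hochberg step-up).
    Assumption: the paper's FDR method is not stated in the abstract."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose P-value is at or below (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P-values, not taken from the study.
toy_p = [0.001, 0.04, 0.03, 0.5, 0.9]
print(total_tests, round(replication_rate * 100))
print(benjamini_hochberg(toy_p, q=0.1))
```

With 3,144 SNPs and 1,358 phenotypes the scan covers several million SNP-phenotype pairs, which is why a multiple-testing threshold far below 0.05 is needed for novel associations.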
Conflict of interest statement
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Figures
Figure 1
PheWAS replication of NHGRI Catalog SNP-phenotype associations. (a) Each point represents the −log₁₀(P) of a single SNP-phenotype association tested with PheWAS. This study is restricted to SNP-phenotype associations that achieved genome-wide significance (P ≤ 5 × 10⁻⁸) in at least one prior GWAS that included individuals of European ancestry. Numbers in parentheses beside each phenotype represent the sample size within the PheWAS data set. The vertical blue line represents P = 0.05. Binary traits refer to all adequately powered, binary traits in the NHGRI Catalog with exact matches to a PheWAS phenotype. For example, 5/5 catalog SNPs associated with rheumatoid arthritis were replicated at P < 0.05 in PheWAS, and 9/15 SNPs associated with type 2 diabetes were replicated. Continuous traits are those numerically defined traits in the NHGRI Catalog that are related to PheWAS diseases (e.g., “iron deficiency anemia” was the PheWAS trait paired with the “serum iron level” catalog trait). (b) Replication rates of SNP-phenotype associations at different bins of statistical power. Association count refers to the number of SNP-phenotype associations replicated or not replicated at each bin of statistical power (e.g., all tested associations with power <0.1, power 0.1–0.2). The black line represents a linear regression weighted using the number of associations in each bin (y = 0.64x, r² = 0.96). (c) Replication rate of NHGRI Catalog associations by number of unique publications citing the original SNP-phenotype association. Association count refers to the number of SNP-phenotype associations (among either adequately powered binary or continuous traits) with the corresponding number of publications. (d) Replication rate of NHGRI Catalog associations by discovery P-value. The dashed line indicates P = 5 × 10⁻⁸.
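Two calculations in this caption lend themselves to a short sketch: converting P-values to the −log₁₀ scale used in panel (a), and fitting a weighted line to binned replication rates as in panel (b). The caption reports the fit as y = 0.64x with no intercept term, so the sketch assumes a regression through the origin; that reading, the function names and the toy numbers are assumptions, not details confirmed by the source.

```python
import math


def neg_log10(p):
    """Convert a P-value to the -log10 scale used on the plot axis."""
    return -math.log10(p)


def weighted_slope_through_origin(x, y, w):
    """Weighted least-squares slope for y = b * x (no intercept).
    Assumption: the caption's 'y = 0.64x' implies a fit through the origin."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den


# Reference lines named in the caption.
print(neg_log10(0.05))   # nominal replication line
print(neg_log10(5e-8))   # genome-wide significance line

# Hypothetical bins: power midpoint, replication rate, association count.
power = [1.0, 2.0, 3.0]
rate = [2.0, 4.0, 6.0]
count = [1, 1, 2]
print(weighted_slope_through_origin(power, rate, count))
```

Weighting by association count keeps sparsely populated power bins from dominating the slope.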
Figure 2
GWAS and PheWAS associations in the genome. Each diamond represents a unique phenotype association at each SNP. Red diamonds represent associations in the NHGRI Catalog only (including phenotypes not present in the PheWAS catalog), green diamonds represent NHGRI Catalog associations replicated by PheWAS (P < 0.05), and blue diamonds represent new phenotype associations identified by PheWAS (P < 4.6 × 10⁻⁶, or an FDR < 0.1). Numbers to the right and left indicate chromosomes.
Figure 3
PheWAS plots for four SNPs. Each panel represents 1,358 phenotypes tested for association with a particular SNP, using logistic regression assuming an additive genetic model adjusted for age, sex, study site and the first three principal components. Phenotypes are grouped along the x axis by categorization within the PheWAS code hierarchy. The upper red lines indicate P = 4.6 × 10⁻⁶ (FDR = 0.1 for entire PheWAS); lower blue lines indicate P = 0.05; dashed lines are a single-SNP Bonferroni correction (P = 0.05/1,358). Diamonds encircling phenotype circles represent known NHGRI Catalog associations. (a) PheWAS associations for rs12203592 in IRF4, previously associated with hair and eye color, freckling and progressive supranuclear palsy. (b) PheWAS associations for rs2853676 in TERT, previously associated with glioma. (c) PheWAS associations for rs4977574 near CDKN2BAS at chr9p21, previously associated with myocardial infarction, and in LD with carotid stenosis. (d) PheWAS associations for rs660895 near HLA-DRB1, previously associated with rheumatoid arthritis. Results and plots for all SNPs included in the present study are available at
.
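As a rough illustration of two ideas in the Figure 3 caption, the sketch below codes a genotype under an additive model (0, 1 or 2 copies of the coded allele) and computes the three significance lines drawn on each panel. The helper name `additive_dosage` and the two-letter genotype strings are hypothetical; the covariate-adjusted logistic regression itself is not reproduced here.

```python
import math


def additive_dosage(genotype, coded_allele):
    """Count copies (0, 1 or 2) of the coded allele in a two-letter
    genotype string such as 'AG'. Hypothetical helper for illustration."""
    return sum(1 for allele in genotype if allele == coded_allele)


N_PHENOTYPES = 1358

fdr_line = 4.6e-6                       # upper red line (FDR = 0.1)
nominal_line = 0.05                     # lower blue line
bonferroni_line = 0.05 / N_PHENOTYPES   # dashed single-SNP Bonferroni line

for label, p in [("FDR", fdr_line),
                 ("Bonferroni", bonferroni_line),
                 ("nominal", nominal_line)]:
    # Print each threshold and its height on a -log10(P) axis.
    print(label, p, -math.log10(p))

print([additive_dosage(g, "G") for g in ["AA", "AG", "GG"]])
```

Under the additive model the dosage enters the regression as a single numeric predictor, so the fitted odds ratio is the per-allele effect.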
Figure 4
Risk variants for skin phenotypes have different pleiotropy patterns. Association odds ratios are graphed on the x axis and P-values (numbers next to the bars) are from the PheWAS analysis for that SNP. All SNPs use the minor allele as the coded allele, except rs2853676 (TERT). Darker colored bars represent significant associations, calculated as P = 0.05 divided by the number of associations displayed, or 0.05/(6 phenotypes × 6 SNPs) = 1.4 × 10⁻³. Tests for heterogeneity revealed significant heterogeneity among the six phenotypes (I² = 59–94%, all P < 0.05) and among the six SNPs (I² = 23–83%, all P < 0.05). Bars oriented leftward toward “protect” represent SNPs in which the coded allele favors decreased prevalence of disease, and bars oriented rightward toward “risk” represent coded alleles favoring increased prevalence of disease.
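The significance cutoff in this caption is a plain Bonferroni division, and I² is conventionally derived from Cochran's Q. The sketch below works through both. The caption does not state how I² was computed, so the Higgins formula used here is an assumption, and the Q values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def i_squared(q_stat, df):
    """Higgins I^2 (percent) from Cochran's Q and its degrees of freedom.
    Assumption: the standard definition, truncated at zero."""
    if q_stat <= 0:
        return 0.0
    return max(0.0, (q_stat - df) / q_stat) * 100.0


# 6 phenotypes x 6 SNPs displayed in the figure.
print(bonferroni_threshold(0.05, 6 * 6))

# Hypothetical Q statistics with 5 degrees of freedom (6 estimates).
print(i_squared(10.0, 5))
print(i_squared(3.0, 5))
```

An I² near zero means the estimates agree about as well as chance allows, while larger values indicate that more of the variation reflects real differences between SNPs or phenotypes.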
Comment in
- Mining the ultimate phenome repository. Shah NH. Nat Biotechnol. 2013 Dec;31(12):1095-7. doi: 10.1038/nbt.2757. PMID: 24316646. Free PMC article.
- Disease genetics: phenome-wide association studies go large. Flintoft L. Nat Rev Genet. 2014 Jan;15(1):2. doi: 10.1038/nrg3637. Epub 2013 Dec 10. PMID: 24322724. No abstract available.
- Opportunities for drug repositioning from phenome-wide association studies. Rastegar-Mojarad M, Ye Z, Kolesar JM, Hebbring SJ, Lin SM. Nat Biotechnol. 2015 Apr;33(4):342-5. doi: 10.1038/nbt.3183. PMID: 25850054. No abstract available.
Grants and funding
- U01 HG004424/HG/NHGRI NIH HHS/United States
- U01 HG004610/HG/NHGRI NIH HHS/United States
- U01 HG006389/HG/NHGRI NIH HHS/United States
- U01 HG006385/HG/NHGRI NIH HHS/United States
- U01 HG004603/HG/NHGRI NIH HHS/United States
- RC2 GM092318/GM/NIGMS NIH HHS/United States
- U01 HG004608/HG/NHGRI NIH HHS/United States
- U01 HG004599/HG/NHGRI NIH HHS/United States
- UL1 TR000445/TR/NCATS NIH HHS/United States
- U01 HG006379/HG/NHGRI NIH HHS/United States
- U01 HG006375/HG/NHGRI NIH HHS/United States
- U01 HG004438/HG/NHGRI NIH HHS/United States
- UL1 TR000150/TR/NCATS NIH HHS/United States
- U01 HG006388/HG/NHGRI NIH HHS/United States
- UL1 TR000427/TR/NCATS NIH HHS/United States
- T32 GM007569/GM/NIGMS NIH HHS/United States
- T15 LM007450/LM/NLM NIH HHS/United States
- R01 LM010685/LM/NLM NIH HHS/United States
- U01 HG004609/HG/NHGRI NIH HHS/United States
- 16FTF30130005/AHA/American Heart Association-American Stroke Association/United States
- P30 CA060553/CA/NCI NIH HHS/United States
- U01 HG006378/HG/NHGRI NIH HHS/United States
- R01 GM105688/GM/NIGMS NIH HHS/United States
- UL1 RR024975/RR/NCRR NIH HHS/United States