Identification of 64 Novel Genetic Loci Provides... : Circulation Research

Coronary artery disease (CAD) is the predominant cause of ischemic heart disease often leading to myocardial infarction and a leading cause of death. Globally, deaths because of ischemic heart disease increased by 16.6% from 2005 to 2015 to 8.9 million deaths. However, the age-standardized mortality rates are decreasing (fell by 12.8%) because of preventive and treatment strategies established on evolving knowledge of the underlying pathophysiology of CAD.

Editorial, see p 391

In This Issue, see p 385

CAD is a complex disease, resulting from numerous additive and interacting contributions in an individual’s environment and lifestyle in combination with their underlying genetic architecture. Since the first genome-wide association studies (GWAS) for CAD in 2007, multiple additional studies with progressively larger sample sizes identified 97 genome-wide significant genetic loci associated with CAD at the time of analysis. The continuous effort to identify additional loci associated with CAD and share these early with the scientific community is important, especially to enhance our understanding of the biological underpinnings of CAD and to catalyze the development of drugs. A comprehensive understanding of the genetic architecture of CAD is also essential to enable precision medicine approaches by identifying subgroups of patients at increased risk of CAD or its complications and might identify those with a specific driving pathophysiology in whom a particular therapeutic or preventive approach would be most useful.

To further our knowledge of the genetic architecture of CAD, we performed a de novo GWAS of the UK Biobank resource and meta-analyses with CARDIoGRAMplusC4D data. Our approach led to the identification of 64 novel loci associated with CAD, expanding the grand total to 161. These loci were interrogated using bioinformatic approaches to catalog and interpret the potential biological relevance of our findings. We also performed network and gene-set analyses and proposed the omnigenic model to explain our findings. This expanding resource is now available for other investigators to help to further elucidate the underlying biology and relevance.

Methods

The data that support the findings of this study are available from the corresponding author on reasonable request. The de novo GWAS analysis and meta-analysis have been posted on Mendeley (doi:10.17632/2zdd47c94h.1; doi:10.17632/gbbsrpx6bs.1). A summary of the methods is provided below, and a more detailed description of the experimental procedures is provided in the Online Data Supplement.

Study Design and Samples

The study design consisted of a reciprocal 2-stage sequential discovery and replication approach (Online Figure I), which provides the most robust statistical evidence, followed by an overall meta-analysis of all available data, for which no replication data were available in this study. First, using the UK Biobank resource, we conducted a GWAS to discover single-nucleotide polymorphisms (SNPs) associated with CAD. In stage 2, we took forward all promising SNPs reaching nominal significance (P<0.0001) for replication in CARDIoGRAMplusC4D data. Replicating SNPs (P<0.05 after Bonferroni adjustment) were meta-analyzed and considered true when surpassing the genome-wide significance threshold (P<5×10−8). The reciprocal stage 1 entailed the identification of all promising SNPs (P<0.0001) in CARDIoGRAMplusC4D and replication in UK Biobank (P<0.05 after Bonferroni adjustment) followed by meta-analysis. Again, SNPs replicating and surpassing the genome-wide significance threshold were considered true. A sentinel SNP in a locus was defined as the most significant variant in a 1-Mb region that was independent from other sentinel SNPs (r2<0.1). A locus was defined as a region of 1 Mb at either side of the sentinel SNP. A locus was considered novel if the sentinel SNP was not within a 1-Mb window (at either side) of earlier reported genome-wide significant SNPs (Online Table I). Finally, we performed a genome-wide meta-analysis of the UK Biobank resource and CARDIoGRAMplusC4D to identify additional CAD-associated loci (P<5×10−8 in meta-analysis). A potential sample overlap between the UK Biobank and cohorts of CARDIoGRAMplusC4D was estimated to be <0.1%; no evidence was found that this biased the test statistics (Online Data Supplement).
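The staged filter and the novelty rule described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the `SnpResult` record, the function names, and the choice to apply the Bonferroni correction over the number of SNPs taken forward are assumptions made for the example.

```python
from dataclasses import dataclass

SUGGESTIVE_P = 1e-4      # discovery threshold quoted in the text
GENOME_WIDE_P = 5e-8     # genome-wide significance threshold
WINDOW_BP = 1_000_000    # 1 Mb on either side of a sentinel SNP


@dataclass
class SnpResult:
    """Hypothetical container for one SNP's summary statistics."""
    rsid: str
    chrom: str
    pos: int             # base-pair position
    p_discovery: float
    p_replication: float
    p_meta: float


def passes_two_stage(snp: SnpResult, n_tested: int) -> bool:
    """Apply the discovery, replication, and meta-analysis filters in order."""
    if snp.p_discovery >= SUGGESTIVE_P:
        return False
    # Bonferroni: alpha of 0.05 divided by the number of SNPs taken forward
    # (assumed here; the correction unit is detailed in the supplement).
    if snp.p_replication >= 0.05 / n_tested:
        return False
    return snp.p_meta < GENOME_WIDE_P


def is_novel(snp: SnpResult, known: list[tuple[str, int]]) -> bool:
    """True if no previously reported SNP lies within 1 Mb on the same chromosome."""
    return all(
        chrom != snp.chrom or abs(pos - snp.pos) > WINDOW_BP
        for chrom, pos in known
    )


# Made-up values, for illustration only.
known_hits = [("1", 55_500_000)]
candidate = SnpResult("rs_example", "1", 57_000_000, 2e-5, 1e-4, 3e-9)
print(passes_two_stage(candidate, n_tested=100), is_novel(candidate, known_hits))
```

Keeping the three thresholds as named constants makes it easy to see that a SNP must clear all of them, in sequence, before it counts as a replicated genome-wide significant association.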

Candidate Genes and Insights in Biology

Candidate causal genes at each of the loci were prioritized based on proximity, expression quantitative trait locus (eQTL) data, DEPICT analyses (Data-Driven Expression-Prioritized Integration for Complex Traits), and long-range chromatin interactions of variants with gene promoters (Online Data Supplement). Summary information of genes was obtained via queries in GeneCards, EntrezGene, UniProt, and Tocris. The Mouse Genome Informatics database was used for obtaining insights into mammalian phenotypes associated with disruption of candidate genes. DEPICT was also used to test for enrichment of gene sets and identify relevant tissues and cell types. Ingenuity pathway analysis (June 2017 release) was performed to strengthen the biological relevance of the novel loci.

Insights in Loci by Associations With Other Phenotypes

The GWAS catalog was queried and a phenome scan was performed by intersecting the identified loci with the GWAS catalog and by testing the association of the newly identified SNPs with a wide range of phenotypes using linear or logistic regression analysis in UK Biobank (Online Data Supplement). Genetic risk scores (GRS) were constructed using effect estimates obtained from the CARDIoGRAMplusC4D data as described previously. Multivariable Cox proportional hazards models were fitted for quintiles of the GRS in the UK Biobank resource, to assess the extent to which the GRS could predict new-onset atrial fibrillation/flutter and heart failure.

Regulatory DNA and Fine Mapping of Probable Causal Variants

To systematically characterize the functional, cellular, and regulatory contribution of genetic variation, we used GARFIELD, analyzing the enrichment of genome-wide association summary statistics in tissue-specific functional elements at given significance thresholds. Probabilistic Annotation Integrator was used to fine-map loci by integrating genetic association signal strength with genomic functional annotation data. We explored the potential target genes of these candidate causal variants by determining their direct effects on protein function (missense variants) and evidence connecting a causal variant in a 3′ untranslated region (3′ UTR) to gene expression (eQTL) or physical interactions (Hi-C) with the promoter of an eQTL gene. Potential causal mechanisms of the candidate causal variants were determined based on (1) missense variation, (2) chromatin interaction between the causal variant and the promoter of a gene for which the causal variant was also significantly associated with gene expression by eQTL analyses, or (3) 3′ UTR-overlapping variants that were also significantly associated with expression of the gene corresponding to the 3′ UTR position. In addition, for genes/mechanisms to be prioritized by eQTL analyses and chromatin interactions or 3′ UTR overlap, the respective causal variant was required to be in an enhancer region.

Results

Genome-Wide Analyses of 34 541 Cases and 261 984 Controls

The stage 1 GWAS analysis in UK Biobank (34 541 cases and 261 984 controls; Online Table II) of 7 947 838 SNPs revealed 630 suggestive SNPs (P<0.0001) in 442 loci (Online Table III). Eighty-six independent SNPs in 75 loci both replicated (P<0.05 Bonferroni adjusted) in stage 2 in ≤88 192 cases and 162 544 controls of CARDIoGRAMplusC4D, and achieved genome-wide significance (P<5×10−8) with no evidence of heterogeneity of effects (Phet≥0.10). Thirteen of the 75 loci are not established CAD-associated loci (Table 1).
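To make the combination of discovery and replication estimates concrete, the sketch below uses a fixed-effect inverse-variance meta-analysis together with Cochran's Q as a heterogeneity statistic. The text here does not state which estimator was used, so both choices are assumptions, and the effect sizes and standard errors are made-up numbers rather than results from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


def q_pvalue_two_studies(q):
    """P value for Q on 1 degree of freedom (valid for exactly 2 studies)."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical log-odds ratios and standard errors for one SNP in two studies.
beta, se, p, q = fixed_effect_meta([0.05, 0.06], [0.010, 0.012])
print(f"beta={beta:.4f} se={se:.4f} P={p:.2e} Phet={q_pvalue_two_studies(q):.2f}")
```

With only two studies, Q has a single degree of freedom, which is why the heterogeneity P value can be obtained from the complementary error function without any external statistics library.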

Next, we reanalyzed the data from the MetaboChip meta-analysis of CARDIoGRAMplusC4D, the CARDIoGRAMplusC4D 1000 Genomes meta-analysis, and the CARDIoGRAM Exome array data to identify the promising SNPs (P<0.0001). We identified 568 promising SNPs located in 375 loci (Online Table IV). One hundred and thirteen independent SNPs in 96 loci both replicated (P<0.05 Bonferroni adjusted) in stage 2, UK Biobank, and achieved genome-wide significance in meta-analysis (P<5×10−8), including 21 additional novel loci (Table 1; Online Table V).

Finally, we performed a meta-analysis of CARDIoGRAMplusC4D and the CARDIoGRAMplusC4D 1000 Genomes meta-analysis with UK Biobank and identified 30 additional loci for which no replication test was available (Table 1; Online Table VI), increasing the total number of genome-wide significant CAD loci to 161 (Online Figure II). The novel variants were common (>5%, except for 1, rs112635299 near SERPINA1). Online Figure III shows the regional association plot of each novel locus. For some variants, a dominant or recessive linkage model appears to be a better fit compared with an additive model (Online Table VII). Complete summary statistics of all SNPs in UK Biobank and the UK Biobank CARDIoGRAMplusC4D meta-analysis are available for download at www.cardiomics.net.

Candidate Genes and Deeper Insights Into Biology

To disentangle whether associations were driven more by acute myocardial infarction as opposed to stable CAD, we performed multinomial logistic regression analyses for all genome-wide significant (P<5×10−8) loci in UK Biobank. In total, 17 666 of 34 541 CAD individuals were diagnosed with myocardial infarction. None of the novel loci and only 2 previously identified variants (rs9349379 and rs10947789) appear to be mainly driven by their association with myocardial infarction rather than stable CAD (false discovery rate [FDR], P<0.05; Online Table VIII).
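The FDR criterion used above can be illustrated with the Benjamini-Hochberg step-up adjustment. Which FDR procedure was actually applied is not stated in this section, so Benjamini-Hochberg is an assumption here, and the P values in the example are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-locus P values for a test of MI versus stable CAD.
example = [0.01, 0.04, 0.03, 0.005]
flagged = [q < 0.05 for q in benjamini_hochberg(example)]
print(flagged)
```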

We further explored the potential biology of the 64 novel CAD-associated loci by prioritizing 155 candidate causal genes in these loci: 69 genes were in proximity (the nearest gene and any additional gene within 10 kb) of the lead variant, 9 genes contained coding genetic variation in linkage disequilibrium (r2>0.8) with the lead variant (Online Table IX), 50 genes were selected based on eQTL analyses (Online Table X), 64 genes showed significant chromatin interactions (Hi-C) between the genetic variant and promoter of the gene (Online Table XI), and 60 genes were prioritized based on DEPICT analyses (Online Table XII). Of the 155 candidate genes, 63 were prioritized by multiple methods of identification, which may be used to prioritize candidate causal genes. A summary of the current function annotation of each novel candidate gene is provided in Online Table XIII, and knowledge on pharmacological compounds and nutrients influencing these genes is provided in Online Table XIV. Next, we performed a systematic search in the Mouse Genome Informatics database to identify the effect of mutations in orthologous genes for these candidate causal genes (details in Online Table XV). 
In brief, we identified 34 genes that expressed at least 1 cardiovascular system phenotype (AGT, ARHGAP42, BACH1, CALCRL, CASQ2, CCM2, CDC123, CDKN1A, FIGN, FOXC1, GIT1, GNPAT, HCRT, HSD17B12, MAP1S, MAP3K1, MSANTD1, NGF, NPHP3, PCIF1, PDS5B, PLCG1, PLEKHA1, PPP2R3A, PRDM16, PRKCE, RAC1, SEMA5A, SH3PXD2A, TFPI, TIPARP, TMEM106B, VEGFA, and ZFPM2) and 34 genes that affected other potentially plausible traits linked to CAD, including metabolic/lipid/adipose/weight abnormalities (AGT, CORO6, FIGN, GIT1, KAT2A, NGF, PPP2R3A, NPHP3, SH3PXD2A, TMEM106B, VEGFA, ZHX3, OPTN, FAM213A, DNAJC7, and COPRS), abnormalities in inflammation or white blood cells (DHX58, FHL3, HNRNPD, PLCG2, PRDM16, TFPI, VEGFA, ZNF335, PRKCE, MYO1G, RAC1, and ARID4A), and abnormalities in platelets or coagulation (FHL3, PLCG2, TFPI, VEGFA, DST, and KLF4).
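The per-method counts reported above (69, 9, 50, 64, and 60) add up to 252, well above the 155 unique candidate genes, so many genes are nominated by more than one line of evidence. A minimal sketch of how such overlap might be tallied follows; the gene labels and the evidence dictionary are placeholders, not data from the study.

```python
from collections import Counter


def genes_with_multiple_support(evidence):
    """Return genes nominated by two or more evidence sources, sorted by name."""
    counts = Counter(
        gene for genes in evidence.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= 2)


# Placeholder gene labels per evidence source (hypothetical).
evidence = {
    "proximity": {"GENE_A", "GENE_B"},
    "coding": {"GENE_A"},
    "eqtl": {"GENE_C"},
    "hi_c": {"GENE_B", "GENE_C", "GENE_D"},
    "depict": {"GENE_E"},
}
print(genes_with_multiple_support(evidence))
```

Counting the sources per gene, rather than intersecting every pair of lists, scales to any number of evidence types and makes the support threshold a single adjustable number.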

Novel Insights From Pathway Analyses

Ingenuity pathway analysis restricted to the 155 candidate causal genes confirmed that these are enriched for effects on the cardiovascular system and cell cycle functions (Online Table XVI). Pathway insights provided by the DEPICT framework identified 1525 reconstituted gene sets that could be captured in 156 meta gene sets (Online Table XVII). The 4 most significant meta gene sets were complete embryonic lethality during organogenesis, blood vessel development, anemia, and SRC PPI subnetwork. The platelet α-granule lumen, SRC PPI subnetwork, blood vessel development, and hemostasis had the largest betweenness centrality, an indicator of a node’s centrality in the network. The tissue enrichment analyses by DEPICT indicated blood vessels as the most relevant tissue (P=4×10−7); 41 additional tissues or cell types were significantly enriched at FDR<0.05 (Online Table XVIII). We compared the contribution of novel information with previous work. The previous CARDIoGRAMplusC4D analysis led to 457 reconstituted gene sets (at FDR<0.05); the addition of the intermediate UK Biobank data set of 150 k individuals identified a total of 889 significant gene sets, substantially fewer than the current 1525 gene sets (Figure 1; Online Table XVII). Considering all 10 968 possible gene sets, this study represents an increase from 4.16% to 13.90% of all gene sets involved in CAD since the 1000 Genomes analysis of CARDIoGRAMplusC4D in 2015. The number of genes implicated by DEPICT at FDR<0.05 increased from 94 in the previous data to 540.
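The proportions quoted above follow directly from the gene-set counts. As a quick check, the short sketch below recomputes them to one decimal place from the numbers given in the text; only the function name and labels are introduced here for illustration.

```python
TOTAL_GENE_SETS = 10_968  # all possible reconstituted gene sets, as stated above


def percent_of_gene_sets(n_significant, total=TOTAL_GENE_SETS):
    """Share of all gene sets that are significant, in percent."""
    return 100.0 * n_significant / total


for label, n in [
    ("previous CARDIoGRAMplusC4D analysis", 457),
    ("with intermediate UK Biobank data", 889),
    ("current study", 1525),
]:
    print(f"{label}: {percent_of_gene_sets(n):.1f}%")
```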

Figure 1

Network analyses of reconstituted gene sets. The proportion of significant gene sets involved in coronary artery disease (CAD) increased to 13.90% since the 1000 Genomes genome-wide association study of CARDIoGRAMplusC4D, considering all possible gene sets. Clustering by modularity using Gephi software indicated that pathways specific for cardiovascular/heart development, inflammation, lipids, kidney, and coagulation clustered together. PPI networks & Other indicates a remaining bin predominantly populated by protein–protein interaction networks.

Insights in Loci by Associations With Other Phenotypes

To increase our understanding of potentially mediating mechanisms at the genetic variant level, we searched the GWAS catalog for previously reported variants. Of the 64 novel loci, 23 loci were in linkage disequilibrium (r2>0.6) with genetic variants previously reported to be associated with other traits at the genome-wide significance threshold (P<5×10−8; Online Table XIX). We found associations with anthropometric measurements (rs6905288, rs1591805, rs3936511, and rs840616), antineutrophil antibody-associated vasculitis (rs112635299), angiotensinogen measurements (rs699), coffee consumption (rs13723), C-reactive protein (rs667920), pulmonary function (rs61848342, rs13723, and rs112635299), fibrinogen levels (rs67920, rs16844401, and rs2074158), glomerular filtration rate (rs12500824), high-density lipoprotein cholesterol (rs667920, rs10512861, and rs6905288), low-density lipoprotein cholesterol (rs10512861), total cholesterol (rs6997340), triglycerides (rs667920, rs3936511, rs6905288, and rs6997340), diabetes mellitus (rs1591805 and rs3936511), blood pressure indices (rs260020, rs17080091, rs61776719, rs7696431, and rs1317507), transferrin levels (rs6997340), QRS amplitude (rs13723), abdominal aortic aneurysm (rs885150 and rs3827066), adiponectin measurements (rs6905288), and age at menarche (rs1591805); full details can be found in Online Table XIX. We also explored the association of the 64 lead SNPs with a range of traits in the UK Biobank resource. Consistent with the GWAS-catalog search and in keeping with earlier observations in established CAD loci, several of our novel loci were associated with hyperlipidemia, blood pressure traits, diabetes mellitus, and anthropometric traits (Figure 2). For example, rs6905288 (VEGFA) was also associated with waist-to-hip ratio and hyperlipidemia, and rs61776719 (FHL3 and UTP11L) was also closely associated with pulse pressure in UK Biobank.
Interestingly, we observed that 15 of 64 loci were associated with platelet counts.

Figure 2

Heatmap of associations in UK Biobank with novel loci. Heatmap of z scores for different diseases and phenotypes in UK Biobank, aligned to increased risk of coronary artery disease. Only significant associations (false discovery rate<0.01) are shown. The genetic risk score constructed with the known and novel loci, weighted using coefficients of CARDIoGRAMplusC4D, is highlighted by the red rectangle. BMI indicates body mass index; COPD, chronic obstructive pulmonary disease; RBC, red blood cell; and TIA, transient ischemic attack.

Genetic Risk for CAD, and Association With CAD Risk Factors and Outcome

To explore potential clinical relevance, we constructed a GRS in which each variant was weighted by its effect in CARDIoGRAMplusC4D, multiplying the effect size by the number of effect alleles each individual carried for that variant, and divided this GRS into quintiles. The associations with many different traits and diseases from the UK Biobank are visualized in Figure 2. The risk of a future diagnosis of atrial fibrillation and heart failure in UK Biobank participants was higher in quintile 5 individuals as compared with quintile 1 (hazard ratio, 1.18 [95% confidence interval, 1.10–1.27; P=1.2×10−6] and 1.59 [95% confidence interval, 1.43–1.77; P=3.3×10−18], respectively; Online Figure IV). In addition, all-cause mortality and especially cardiovascular mortality were higher in individuals of quintile 5 compared with quintile 1 (hazard ratio, 1.12 [95% confidence interval, 1.06–1.19; P=4×10−4] and 1.94 [95% confidence interval, 1.70–2.21; P=2×10−23], respectively; Online Figure IV).
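The score construction described above amounts to a weighted sum of effect-allele counts followed by a rank-based split into five groups. The sketch below shows one way this might look; the weights and dosages are invented, the function names are hypothetical, and the Cox regression step is not reproduced.

```python
def genetic_risk_score(dosages, weights):
    """Sum of effect-allele counts (0 to 2) times per-variant effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))


def assign_quintiles(scores):
    """Map each score to a quintile 1..5 by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    quintile = [0] * n
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // n + 1
    return quintile


# Invented log-odds weights for 3 variants and dosages for 5 individuals.
weights = [0.10, 0.05, 0.20]
cohort = [[2, 1, 0], [0, 0, 0], [1, 1, 1], [2, 2, 2], [0, 1, 0]]
scores = [genetic_risk_score(person, weights) for person in cohort]
print(assign_quintiles(scores))
```

Ranking before binning gives groups of near-equal size even when the score distribution is skewed, which is the usual reason for comparing the top and bottom quintiles.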

Role of Regulatory DNA and Fine Mapping of Candidate Causal Variants

Across the genome, virtually all tissues showed significant enrichment of DNase I hypersensitivity sites, providing limited indications for involved biology (Figure 3A and 3B). Minimal differential enrichment of functional elements for the identified genetic loci was observed in blood vessels and liver. To facilitate future functional studies directed at causal variants and molecular mechanisms, we prioritized variants via the probabilistic framework of Probabilistic Annotation Integrator. Because no clear differential enrichment was observed for tissue-specific functional elements, we focused on DNA annotations from the study of Finucane et al that are not specific for tissue or cell types. Probabilistic Annotation Integrator determined the significance of each annotation to be causal (Figure 3C and 3D), and a model was constructed using linkage disequilibrium information, P value distribution, and information on coding variation, conservation, and H3K4me1 sites to prioritize potential causal SNPs of all 161 (known and novel) loci. This analysis yielded 28 variants at the ≥95% confidence level for which we prioritized candidate genes (Online Table XX; Table 2).
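Given per-variant posterior probabilities from a fine-mapping model, a 95% credible set can be formed by adding variants in order of decreasing posterior until the cumulative probability reaches 0.95. The sketch below assumes posteriors that sum to 1 within a locus (a single-causal-variant simplification that the fine-mapping framework itself does not require), and the variant labels and probabilities are invented.

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= level."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= level:
            break
    return chosen


# Invented posteriors for two loci.
locus_a = {"var1": 0.97, "var2": 0.02, "var3": 0.01}
locus_b = {"var1": 0.60, "var2": 0.30, "var3": 0.10}
print(credible_set(locus_a), credible_set(locus_b))
```

A locus like the first, where one variant carries nearly all of the posterior mass, resolves to a single candidate; a locus like the second remains ambiguous and keeps several variants in the set.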

Figure 3

The role of regulatory DNA underlying coronary artery disease (CAD)-associated single-nucleotide polymorphisms (SNPs). Enrichment of genome-wide association analysis P values in DNase I hypersensitive sites (DHS). CAD SNPs at different genome-wide association study (GWAS) thresholds were significantly enriched in DHS footprints (A) and hot spots (B) across many different tissues and cell types. The fold enrichment was highly significant for most tissues and cell types (P<1×10−8), as indicated by the 4 colored circles next to the labels; 3 colored circles indicate P<1×10−7. Label sizes of tissue types were downsized because of space limitations; tissue types may be represented by multiple samples, indicated by hash marks of the same color. C, Subsequent prioritization of potential causal annotations underlying the 161 CAD loci also suggested that regions of DHS may be underlying the associations, but coding variants, conservation, 5′ untranslated region (UTR), and H3K4me1 annotations were more likely to be causal. D, Posterior probabilities for causality for each variant in the 161 CAD loci were calculated by an empirical Bayes approach implemented in the Probabilistic Annotation Integrator framework, taking into account linkage disequilibrium (LD), association statistics, and the potentially causal annotations, and are summarized in Table 2 and Online Table XX. CTCF indicates transcriptional repressor CTCF; DGF, digital genomic footprint by DNase I hypersensitivity; FANTOM5, functional annotation of the mammalian genome V5; TFBS, transcription factor binding site; and TSS, transcription start site.

For example, rs974819 was prioritized as causal variant and could be linked to PDGFD by Hi-C evidence and eQTL data in relevant tissues (Online Figure V). In total, 15 of the 28 fine-mapped loci could be pinpointed to 1 single potential causal mechanism implicating a single gene. For 2 loci, there were 2 potential causal mechanisms (TRPC4AP/PROCR and MRPS6/SLC5A3) with equal evidence.

Discussion

The present study is the largest genetic association study of CAD performed to date. We report on the primary results and downstream bioinformatic analyses of the meta-analysis of de novo GWAS data derived from the UK Biobank combined with existing data from CARDIoGRAMplusC4D, leading to the inclusion of ≤122 733 cases and 424 528 controls. This study contributes to the existing literature by reporting 64 novel genetic loci, representing approximately 40% of all 161 GWAS-identified CAD loci to date. For the novel loci, a detailed catalog of 155 candidate genes (based on proximity, gene-expression data, coding variation, and physical chromatin interaction) is provided. We demonstrate that the increase in significantly associated CAD loci results in a large expansion of implicated reconstituted gene networks, from 4% to almost 14%. Finally, by integrating genetic association strength, linkage disequilibrium, and functional annotation data, we performed fine mapping of all 161 CAD loci, providing a novel credible list of causal variants and plausible genes to be prioritized for functional validation.

The number of novel genetic loci reported in this single article (64) is exceptionally large compared with previous articles, including those of CARDIoGRAMplusC4D and others reporting on 10 to 15 novel loci each. Thirty-four of the 64 loci are significant in a robust reciprocal replication strategy between CARDIoGRAMplusC4D and the UK Biobank, and another 30 are genome-wide significant in the overall meta-analysis, which is commonly considered sufficient evidence. The obvious reason for the large number of novel loci is the considerable number of novel CAD cases and non-CAD controls compared with these earlier efforts, combined with less heterogeneity in samples, collection, and definitions used. By increasing the sample size, more loci can be identified, more genes can be implicated, and more gene networks or pathways can be constructed. Not only is the increase of associated loci in the past decade rapidly outpacing functional validation, but even our understanding of biological networks seems insufficient to accommodate the increased number of GWAS hits under the conceptual polygenic model. This can be illustrated by the large increase of reconstituted gene networks observed in our study. For the first time, we show that almost 14% of all existing gene networks are involved in the complex CAD trait (Figure 1), and this will only increase when further samples are added to the GWAS, making it increasingly difficult to consider all of these to be key pathways. In our data, we also observed genetic association signals to be spread across most of the genome, and many of the 155 novel candidate genes do not have an obvious connection to CAD. In addition, virtually all cell types showed significant enrichment of DNase I hypersensitivity and other functional elements. These notions are all supportive of the omnigenic model, which has recently been proposed by the Pritchard team, suggesting that prevailing conceptual models for complex diseases are incomplete.
The omnigenic model hypothesizes that all gene-regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells can influence the function of core disease-related genes and a major proportion of heritability can be explained by effects of genes outside key pathways. To further our knowledge, it is questionable whether further increasing the GWAS sample size will resolve the outstanding issues concerning our incomplete understanding of cellular regulatory networks and our ability to differentiate core genes from peripheral genes. If the omnigenic model is indeed correct, detailed mapping of cell-specific regulatory networks will be essential to understand CAD.

To facilitate functional research based on our findings, we not only provided extensive bioinformatic analyses of coding variation, gene expression, and chromatin interactions for the 64 novel loci but also performed novel fine mapping and presented statistically convincing arguments for causal genetic variants at 28 loci, linking 19 genes in the 161 CAD loci. In the known loci, these genes included APOE, PCSK9, ANGPTL4, and SORT1, all implicated as core genes in lipid metabolism. Recently, PCSK9 has been validated in clinical trials, and functional studies are also supporting a key role for SORT1. More recently, EDN1 has been identified as the likely causal gene in the pathogenesis of CAD instead of the nearby PHACTR1. In the novel loci, we found evidence for causal variants linked to FNDC3B (Fibronectin Type III Domain Containing 3B), CCM2 (CCM2 Scaffolding Protein), and TRIM5 (Tripartite Motif Containing 5). The functional link between these genes and CAD is not obvious and remains to be determined. FNDC3B has been suggested to function as a positive regulator of adipogenesis. CCM2 has been implicated in abnormal vascular morphogenesis in the brain, leading to cerebral cavernous malformations, but is also expressed in the heart. Although its effect in the coronary arteries has not been investigated, Ccm2 knockdown in mouse brain endothelial cells leads to increased monolayer permeability, decreased tubule formation, and reduced cell migration after wound healing. TRIM5 has been suggested to promote innate immune signaling, and its activity is amplified by retroviral infections. All SNP-gene mechanisms proposed in this article should be validated experimentally. Also, the analyses were restricted to variants available in the Haplotype Reference Consortium imputation panel.
Although this is the largest imputation panel to date, it only comprised SNPs; future fine-mapping efforts are necessary that include non-SNPs as well, such as indels, to cover the additional aspects of the human variation landscape. However, a 95% credible set that contains just 1 potential causal variant per locus provides a first starting point for generating new hypotheses and scientific explorations.

In our current work, we validated our previous finding that these genetic variants of CAD also predict the risk of atrial fibrillation and heart failure, and we extended it to all-cause death. We also aimed to differentiate between stable CAD and acute myocardial infarction by performing multinomial logistic regression analyses. Most loci were not driven by 1 clinical presentation specifically. However, for 2 previously identified loci (rs9349379 [EDN1] and rs10947789 [KCNK5]), we found statistical evidence that these loci may be driven by acute myocardial infarction and not stable CAD. For this observation as well, functional hypotheses are to be developed and tested. Our variants might be driven mainly by nonfatal CAD, and different variants might exist for fatal heart disease.

Some limitations of the current work are to be acknowledged. This work is based on statistical evidence and does not provide functional experimental validation. The genetic variants identified and the genes prioritized require further direct investigations in future studies to elucidate their role and function in the development and progression of CAD. However, in the short term, these data open up new possibilities to improve quantitative measures of genetic risk prediction. Recent data suggest that instead of operating in a deterministic fashion, high genetic risk is indeed modifiable by lifestyle, pharmacotherapy, and also by incorporation of genetic risk into shared decision-making sessions with patients.

In conclusion, our GWAS, meta-analyses, and bioinformatic analyses provide several novel insights into the biology of CAD. We report 64 novel loci, link 155 candidate genes, and perform fine mapping of all old and novel loci, providing a credible list of causal genetic variants. However, with the ever-increasing sample size, our work is the first to indicate that an omnigenic model may be more appropriate to accommodate the complex genetic architecture of CAD, compared with a polygenic model. In addition to an expanded view, it also suggests that new methods and tools are required to further our understanding of CAD biology through genetics.

Acknowledgments

This research has been conducted using the UK Biobank resource under application numbers 12006 and 15031. We thank the CARDIoGRAMplusC4D investigators for making their data publicly available. We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high-performance computing cluster.

Sources of Funding

N. Verweij is supported by Marie Sklodowska-Curie GF (call: H2020-MSCA-IF-2014; project identifier: 661395) and an NWO VENI grant (016.186.125). We acknowledge the support from the Netherlands Cardiovascular Research Initiative—an initiative with support of the Dutch Heart Foundation, CVON2015-17 EARLY-SYNERGY.

Disclosures

None.

CAD: coronary artery disease

CCM2: CCM2 scaffolding protein

DEPICT: data-driven expression-prioritized integration for complex traits

eQTL: expression quantitative trait locus

FNDC3B: fibronectin type III domain containing 3B

GWAS: genome-wide association study

SNP: single-nucleotide polymorphism

TRIM5: tripartite motif containing 5

References

1. Wang H, Naghavi M, Allen C, Barber R, Bhutta ZA, Carter C, Casey C, Charlson F, Chen C, Coates M, Dandona H. Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet. 2016;388:1459–1544
2. Samani NJ, Erdmann J, Hall AS, et alWTCCC and the Cardiogenics Consortium. Genomewide association analysis of coronary artery disease. N Engl J Med. 2007;357:443–453 doi: 10.1056/NEJMoa072366.
3. Helgadottir A, Thorleifsson G, Manolescu A, et al A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science. 2007;316:1491–1493 doi: 10.1126/science.1142842.
4. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC. A common allele on chromosome 9 associated with coronary heart disease. Science. 2007;316:1488–1491 doi: 10.1126/science.1142447.
5. Schunkert H, König IR, Kathiresan S, et alCardiogenics; CARDIoGRAM Consortium. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011;43:333–338 doi: 10.1038/ng.784.
6. Deloukas P, Kanoni S, Willenborg C, et alCARDIoGRAMplusC4D Consortium. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet. 2013;45:25–33 doi: 10.1038/ng.2480.
7. Nikpay M, Goel A, Won HH, et al A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015;47:1121–1130 doi: 10.1038/ng.3396.
8. Verweij N, Eppinga RN, Hagemeijer Y, van der Harst P. Identification of 15 novel risk loci for coronary artery disease and genetic risk of recurrent events, atrial fibrillation and heart failure. Sci Rep. 2017;7:2761. doi: 10.1038/s41598-017-03062-8.
9. Howson JMM, Zhao W, Barnes DR, et al; CARDIoGRAMplusC4D; EPIC-CVD. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat Genet. 2017;49:1113–1119. doi: 10.1038/ng.3874.
10. Nelson CP, Goel A, Butterworth AS, et al; EPIC-CVD Consortium; CARDIoGRAMplusC4D; UK Biobank CardioMetabolic Consortium CHD Working Group. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet. 2017;49:1385–1391. doi: 10.1038/ng.3913.
11. Khera AV, Kathiresan S. Genetics of coronary artery disease: discovery, biology and clinical translation. Nat Rev Genet. 2017;18:331–344. doi: 10.1038/nrg.2016.160.
12. Pers TH, Karjalainen JM, Chan Y, et al; Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun. 2015;6:5890. doi: 10.1038/ncomms6890.
13. van der Harst P, van Setten J, Verweij N, et al. 52 genetic loci influencing myocardial mass. J Am Coll Cardiol. 2016;68:1435–1448. doi: 10.1016/j.jacc.2016.07.729.
14. Iotchkova V, Huang J, Morris JA, et al; UK10K Consortium. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat Genet. 2016;48:1303–1312. doi: 10.1038/ng.3668.
15. Kichaev G, Yang WY, Lindstrom S, Hormozdiari F, Eskin E, Price AL, Kraft P, Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10:e1004722. doi: 10.1371/journal.pgen.1004722.
16. Stitziel NO, Stirrups KE, Masca NG, et al; Myocardial Infarction Genetics and CARDIoGRAM Exome Consortia Investigators. Coding variation in ANGPTL4, LPL, and SVEP1 and the risk of coronary disease. N Engl J Med. 2016;374:1134–1144. doi: 10.1056/NEJMoa1507652.
17. Finucane HK, Bulik-Sullivan B, Gusev A, et al; ReproGen Consortium; Schizophrenia Working Group of the Psychiatric Genomics Consortium; RACI Consortium. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
18. Klarin D, Zhu QM, Emdin CA, et al; CARDIoGRAMplusC4D Consortium. Genetic analysis in UK biobank links insulin resistance and transendothelial migration pathways to coronary artery disease. Nat Genet. 2017;49:1392–1397. doi: 10.1038/ng.3914.
19. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038.
20. Ridker PM, Revkin J, Amarenco P, et al; SPIRE Cardiovascular Outcome Investigators. Cardiovascular efficacy and safety of bococizumab in high-risk patients. N Engl J Med. 2017;376:1527–1539. doi: 10.1056/NEJMoa1701488.
21. Musunuru K, Strong A, Frank-Kamenetsky M, et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature. 2010;466:714–719. doi: 10.1038/nature09266.
22. Gupta RM, Hadaya J, Trehan A, et al. A genetic variant associated with five vascular diseases is a distal regulator of endothelin-1 gene expression. Cell. 2017;170:522.e15–533.e15. doi: 10.1016/j.cell.2017.06.049.
23. Kishimoto K, Kato A, Osada S, Nishizuka M, Imagawa M. Fad104, a positive regulator of adipogenesis, negatively regulates osteoblast differentiation. Biochem Biophys Res Commun. 2010;397:187–191. doi: 10.1016/j.bbrc.2010.05.077.
24. Liquori CL, Berg MJ, Siegel AM, et al. Mutations in a gene encoding a novel protein containing a phosphotyrosine-binding domain cause type 2 cerebral cavernous malformations. Am J Hum Genet. 2003;73:1459–1464. doi: 10.1086/380314.
25. Crose LE, Hilder TL, Sciaky N, Johnson GL. Cerebral cavernous malformation 2 protein promotes smad ubiquitin regulatory factor 1-mediated RhoA degradation in endothelial cells. J Biol Chem. 2009;284:13301–13305. doi: 10.1074/jbc.C900009200.
26. Pertel T, Hausmann S, Morger D, et al. TRIM5 is an innate immune sensor for the retrovirus capsid lattice. Nature. 2011;472:361–365. doi: 10.1038/nature09976.
27. Khera AV, Emdin CA, Drake I, et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med. 2016;375:2349–2358. doi: 10.1056/NEJMoa1605086.
28. Mega JL, Stitziel NO, Smith JG, et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet. 2015;385:2264–2271. doi: 10.1016/S0140-6736(14)61730-X.
29. Kullo IJ, Jouni H, Austin EE, Brown SA, Kruisselbrink TM, Isseh IN, Haddad RA, Marroush TS, Shameer K, Olson JE, Broeckel U, Green RC, Schaid DJ, Montori VM, Bailey KR. Incorporating a genetic risk score into coronary heart disease risk estimates: effect on low-density lipoprotein cholesterol levels (the MI-GENES clinical trial). Circulation. 2016;133:1181–1188. doi: 10.1161/CIRCULATIONAHA.115.020109.

Keywords:

computational biology; coronary artery disease; genetics; genome-wide association study; sample size