Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis

Author manuscript; available in PMC: 2011 Apr 21.

Published in final edited form as: Nat Genet. 2010 Jul;42(7):579–589. doi: 10.1038/ng.609

Abstract

By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.


Type 2 diabetes (T2D) is characterized by insulin resistance and deficient beta-cell function¹. The escalating prevalence of T2D and the limitations of currently available preventative and therapeutic options highlight the need for a more complete understanding of T2D pathogenesis. To date, approximately 25 genome-wide significant common variant associations with T2D have been described, mostly through genome-wide association (GWA) analyses²⁻¹³. The identities of the variants and genes mediating the susceptibility effects at most of these signals have yet to be established, and the known variants account for less than 10% of the overall estimated genetic contribution to T2D predisposition. Although some of the unexplained heritability will reflect variants poorly captured by existing GWA platforms, we reasoned that an expanded meta-analysis of existing GWA data would offer augmented power to detect additional common variant signals of modest effect.

RESULTS

GWA meta-analysis and replication

We conducted a meta-analysis of eight T2D GWA studies comprising 8,130 T2D cases and 38,987 controls of European descent. We combined case-control data from the Wellcome Trust Case Control Consortium (WTCCC), Diabetes Genetics Initiative (DGI) and Finland-US Investigation of NIDDM genetics (FUSION) scans (the subjects of a previous joint analysis⁷), with those from scans performed by deCODE genetics⁶, the Diabetes Gene Discovery Group², the Cooperative Health Research in the Region of Augsburg group (KORAgen), the Rotterdam study and the European Special Population Research Network (EUROSPAN). The effective sample size (n = 22,044) of stage 1 of the current (hereafter designated ‘DIAGRAM+’) meta-analysis was more than twice that of the earlier DIAGRAM (DIAbetes Genetics Replication and Meta-analysis) study⁷. After genomic control correction of each component study, we combined association data for 2,426,886 imputed and genotyped autosomal SNPs into a fixed-effects, additive-model meta-analysis using the inverse-variance method (Online Methods, Fig. 1, Supplementary Tables 1 and 2 and Supplementary Note). We observed only modest genomic control inflation (λGC = 1.07), suggesting that population stratification was unlikely to account for the observed results. After removing SNPs within established T2D loci (Supplementary Table 3), the resulting quantile-quantile plot was consistent with a modest excess of disease associations of relatively small effect (Supplementary Note). Weak evidence for association at HLA variants strongly associated with autoimmune forms of diabetes (Supplementary Table 3 and Supplementary Note) suggested some case admixture involving subjects with type 1 diabetes or latent autoimmune diabetes of adulthood; however, failure to detect T2D associations at other non-HLA type 1 diabetes susceptibility loci (for example, INS, PTPN22 and IL2RA) indicated that any such misclassification was too modest to drive stage 1 associations outside the HLA.

The stage 1 meta-analysis also provided further confirmation of many previously reported signals and, at some of these, refinement of the peak association signal (Fig. 1, Supplementary Table 3 and Supplementary Note).
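The two analytic steps described above, genomic control within each study followed by fixed-effects inverse-variance pooling, can be sketched as below. This is a minimal illustration, assuming each study supplies a per-SNP log odds ratio and its standard error; the function names are hypothetical, and λGC is estimated in the conventional way (median 1-df association χ² divided by its null median, about 0.4549). It is not the authors' pipeline, which is described in their Online Methods.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def estimate_lambda_gc(chi2_stats):
    """Genomic control inflation factor: observed median chi2 / null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def gc_correct_se(se, lambda_gc):
    """Inflate a standard error so that (beta / se)^2 is divided by lambda.
    Correction is applied only when lambda > 1 (a common convention, assumed)."""
    return se * math.sqrt(max(lambda_gc, 1.0))

def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of log odds ratios.
    Returns the pooled beta, its SE, the z score and a two-sided P value."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p

# Two hypothetical studies: (log OR, SE, study-level lambda_GC).
studies = [(math.log(1.10), 0.03, 1.05), (math.log(1.07), 0.02, 1.02)]
betas = [b for b, _, _ in studies]
ses = [gc_correct_se(se, lam) for _, se, lam in studies]
beta, se, z, p = ivw_fixed_effects(betas, ses)
print(f"pooled OR = {math.exp(beta):.3f}, P = {p:.2e}")
```

Dividing each study's χ² by its own λGC before pooling keeps a single inflated study from dominating the combined statistic.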

Figure 1.

Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. The top panel summarizes the results of the unconditional meta-analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of the equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10⁻⁵), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10⁻⁴).

We selected for stage 2 follow-up the most strongly associated SNP from each of the 23 new autosomal regions showing the most compelling evidence for association (all P < 10⁻⁵ in stage 1; Supplementary Table 3). We combined exclusively in silico data from three GWA samples (Atherosclerosis Risk in Communities (ARIC) study, Nurses’ Health Study and Framingham Heart Study) not included in the primary meta-analysis (2,832 cases and 15,843 controls) with additional (predominantly de novo) genotyping in up to 31,580 cases and 44,082 controls, for a maximum possible stage 2 sample size of 34,412 cases and 59,925 controls (effective sample size of 79,246), all of European descent (Supplementary Tables 1 and 2).
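The effective sample sizes quoted in this section are not defined in the text shown here. A common definition for a case-control study, assumed in the sketch below, is N_eff = 4 / (1/N_cases + 1/N_controls), summed over studies. Under that assumption, summing per-study values gives a smaller total than applying the formula to pooled case and control counts whenever case:control ratios differ between studies, which would be consistent with the stage 1 figure (22,044) being smaller than the roughly 26,900 obtained from the pooled 8,130 cases and 38,987 controls. The study counts below are hypothetical.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective size of one case-control study: 4 / (1/cases + 1/controls).
    A balanced study with N cases and N controls has N_eff = 2N."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def total_effective_size(studies):
    """Sum of per-study effective sizes for a list of (cases, controls) pairs."""
    return sum(effective_sample_size(ca, co) for ca, co in studies)

# Hypothetical pair of studies with very different case:control ratios.
studies = [(1000, 9000), (3000, 3000)]
print(round(total_effective_size(studies)))        # 9600
print(round(effective_sample_size(4000, 12000)))   # 12000 (same totals, pooled)
```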

Stage 2 analyses indicated that the set of 23 signals was enriched for true association signals. In all, 21 showed directional consistency of effect between stage 1 and 2 (binomial test, P ≈ 3.3 × 10⁻⁵), and for 15, the stage 2 P value was <0.05 (Supplementary Note). In joint analysis of stage 1 and 2 data (up to 42,542 cases and 98,912 controls), 13 autosomal loci exceeded the threshold for genome-wide significance (P ranging from 2.8 × 10⁻⁸ to 1.4 × 10⁻²²) with allele-specific odds ratios (ORs) between 1.06 and 1.14 (Table 1 and Fig. 2). All signals remained close to or beyond genome-wide significance thresholds (the least significant P value was 5.2 × 10⁻⁸) when we repeated analyses after implementing a second (post meta-analysis) round of genomic control adjustment within stage 1 data (Supplementary Note).
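The directional-consistency P value can be checked with a one-sided binomial (sign) test, assuming that under the null each of the 23 signals is equally likely to point either way in stage 2. The one-sided form is our assumption, but it reproduces the quoted P ≈ 3.3 × 10⁻⁵. The function name is hypothetical.

```python
from math import comb

def sign_test_p(n_consistent, n_total):
    """One-sided binomial P value for observing at least n_consistent
    direction-consistent signals out of n_total, each 50:50 under the null."""
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total

print(f"{sign_test_p(21, 23):.2e}")  # 3.30e-05
```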

Table 1.

Association results for stage 1 + 2 that exceed a genome-wide threshold (overall P value < 5 × 10⁻⁸)

Sample sizes: stage 1ᵈ, up to 8,130 cases and 38,987 controls; stage 2ᵈ, up to 34,412 cases and 59,925 controlsᶠ; stage 1 + 2ᵈ, up to 42,542 cases and 98,912 controls.

| SNP | Chr. | Position B36 (bp) | Risk alleleᵇ | Nonrisk alleleᵇ | Risk allele frequency (HapMap CEU) | Nearby geneᶜ | Stage 1ᵈ OR (95% CI) | Stage 1ᵈ P value | Stage 2ᵈ OR (95% CI) | Stage 2ᵈ P value | Stage 1 + 2ᵈ OR (95% CI) | Stage 1 + 2ᵈ P value | Powerᵉ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| New T2D susceptibility loci | | | | | | | | | | | | | |
| rs243021 | 2 | 60,438,323 | A | G | 0.46 | BCL11A | 1.09 (1.05–1.13) | 8.1 × 10⁻⁶ | 1.08 (1.06–1.10) | 6.2 × 10⁻¹¹ | 1.08 (1.06–1.10) | 2.9 × 10⁻¹⁵ | 0.60 |
| rs4457053 | 5 | 76,460,705 | G | A | 0.26 | ZBED3 | 1.16 (1.10–1.23) | 4.2 × 10⁻⁸ | 1.07 (1.04–1.10) | 2.7 × 10⁻⁷ | 1.08 (1.06–1.11) | 2.8 × 10⁻¹² | 0.25 |
| rs972283 | 7 | 130,117,394 | G | A | 0.55 | KLF14 | 1.10 (1.06–1.15) | 1.8 × 10⁻⁶ | 1.06 (1.03–1.09) | 6.4 × 10⁻⁶ | 1.07 (1.05–1.10) | 2.2 × 10⁻¹⁰ | 0.19 |
| rs896854 | 8 | 96,029,687 | T | C | 0.48 | TP53INP1 | 1.10 (1.06–1.15) | 1.2 × 10⁻⁶ | 1.05 (1.03–1.08) | 2.2 × 10⁻⁵ | 1.06 (1.04–1.09) | 9.9 × 10⁻¹⁰ | 0.08 |
| rs13292136 | 9 | 81,141,948 | C | T | 0.93 | CHCHD9 | 1.20 (1.11–1.29) | 1.5 × 10⁻⁶ | 1.08 (1.04–1.13) | 2.4 × 10⁻⁴ | 1.11 (1.07–1.15) | 2.8 × 10⁻⁸ | 0.02 |
| rs231362 | 11 | 2,648,047 | G | A | 0.52 | KCNQ1 | 1.11 (1.06–1.16) | 6.4 × 10⁻⁶ | 1.07 (1.05–1.09) | 3.2 × 10⁻⁹ | 1.08 (1.06–1.10) | 2.8 × 10⁻¹³ | 0.38 |
| rs1552224 | 11 | 72,110,746 | A | C | 0.88 | CENTD2 | 1.13 (1.07–1.19) | 7.0 × 10⁻⁶ | 1.14 (1.11–1.18) | 3.2 × 10⁻¹⁸ | 1.14 (1.11–1.17) | 1.4 × 10⁻²² | 0.58 |
| rs1531343ᵃ | 12 | 64,461,161 | C | G | 0.10 | HMGA2 | 1.20 (1.12–1.29) | 1.7 × 10⁻⁷ | 1.08 (1.04–1.12) | 1.1 × 10⁻⁴ | 1.10 (1.07–1.14) | 3.6 × 10⁻⁹ | 0.07 |
| rs7957197 | 12 | 119,945,069 | T | A | 0.85 | HNF1A | 1.14 (1.08–1.19) | 4.6 × 10⁻⁷ | 1.05 (1.02–1.09) | 4.6 × 10⁻⁴ | 1.07 (1.05–1.10) | 2.4 × 10⁻⁸ | 0.01 |
| rs11634397 | 15 | 78,219,277 | G | A | 0.60 | ZFAND6 | 1.11 (1.06–1.16) | 5.1 × 10⁻⁶ | 1.05 (1.03–1.08) | 1.2 × 10⁻⁵ | 1.06 (1.04–1.08) | 2.4 × 10⁻⁹ | 0.07 |
| rs8042680 | 15 | 89,322,341 | A | C | 0.22 | PRC1 | 1.10 (1.06–1.15) | 8.2 × 10⁻⁶ | 1.06 (1.03–1.08) | 1.6 × 10⁻⁶ | 1.07 (1.05–1.09) | 2.4 × 10⁻¹⁰ | 0.09 |
| rs5945326 | X | 152,553,116 | A | G | 0.79 | DUSP9 | 1.25 (1.14–1.37) | 2.3 × 10⁻⁶ | 1.32 (1.16–1.49)ᶠ | 2.3 × 10⁻⁵ | 1.27 (1.18–1.37) | 3.0 × 10⁻¹⁰ | 0.99ᶠ |
| Previously known | | | | | | | | | | | | | |
| rs7578326 | 2 | 226,728,897 | A | G | 0.64 | IRS1 | 1.12 (1.07–1.17) | 8.7 × 10⁻⁷ | 1.10 (1.08–1.13) | 2.2 × 10⁻¹⁵ | 1.11 (1.08–1.13) | 5.4 × 10⁻²⁰ | 0.83 |
| rs1387153 | 11 | 92,313,476 | T | C | 0.28 | MTNR1B | 1.12 (1.07–1.17) | 1.0 × 10⁻⁶ | 1.08 (1.05–1.10) | 4.4 × 10⁻¹⁰ | 1.09 (1.06–1.11) | 7.8 × 10⁻¹⁵ | 0.46 |

Figure 2.

Regional plots of the 12 newly discovered T2D loci. Genotyped and imputed SNPs passing quality control measures across all stage 1 studies are plotted with their meta-analysis P values (as −log10 values) as a function of genomic position (NCBI Build 36). In each panel, the index association SNP is represented by a diamond, with stage 1 meta-analysis results denoted by a red diamond and the combined stage 1 and stage 2 meta-analysis results denoted with a clear symbol. Estimated recombination rates (taken from HapMap CEU) are plotted to reflect the local LD structure. The color of the remaining SNPs (circles) indicates LD with the index SNP according to a scale from r² = 0 to r² = 1 based on pairwise r² values from HapMap CEU (red, r² = 0.8–1.0; orange, r² = 0.6–0.8; green, r² = 0.4–0.6; blue, r² = 0.2–0.4; black, r² < 0.2; gray, no r² value available). Gene annotations were taken from the University of California Santa Cruz genome browser.

We extended our search for susceptibility signals to the X chromosome, identifying one further signal in the stage 1 discovery samples meeting our criteria for follow-up (represented by rs5945326, near DUSP9, P = 2.3 × 10⁻⁶). This SNP showed strong evidence for replication in 8,535 cases and 12,326 controls (OR (allowing for X-inactivation) 1.32 (95% CI 1.16–1.49), P = 2.3 × 10⁻⁵), for a combined association P value of 3.0 × 10⁻¹⁰ (OR 1.27 (95% CI 1.18–1.37)) (Table 1 and Fig. 2).
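The exact X-chromosome coding is not given in this section. One widely used way to "allow for X-inactivation", assumed in the sketch below, treats a hemizygous male carrying the risk allele as equivalent to a homozygous female, so male allele counts (0 or 1) are doubled before entering the same additive model used for autosomal SNPs. The function name is hypothetical.

```python
def x_dosage(risk_allele_count, sex):
    """Code an X-chromosome genotype as a 0/1/2 dosage under an assumed
    X-inactivation model: a male carrying the risk allele is scored like a
    homozygous female, so male counts (0 or 1) are doubled."""
    if sex == "male":
        if risk_allele_count not in (0, 1):
            raise ValueError("males carry 0 or 1 X-linked alleles")
        return 2 * risk_allele_count
    if sex == "female":
        if risk_allele_count not in (0, 1, 2):
            raise ValueError("females carry 0, 1 or 2 X-linked alleles")
        return risk_allele_count
    raise ValueError("sex must be 'male' or 'female'")

print([x_dosage(c, "male") for c in (0, 1)], [x_dosage(c, "female") for c in (0, 1, 2)])
```

In practice sex would usually also be included as a covariate, since male and female baseline risks and allele-count distributions differ; that too is an assumption here.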

Fourteen signals reaching genome-wide significance

Two of the 14 signals reaching genome-wide significance on joint analysis (those near MTNR1B and IRS1) represent loci for which T2D associations have been recently reported in samples that partially overlap with those studied here¹⁰,¹⁴⁻¹⁶ (Table 1).

A third signal (rs231362) on 11p15 overlaps both intron 11 of KCNQ1 and the KCNQ1OT1 transcript that controls regional imprinting¹⁷ and influences expression of nearby genes including CDKN1C, a known regulator of beta-cell development¹⁸. This signal maps ~150 kb from T2D-associated SNPs in the 3′ end of KCNQ1 first identified in East Asian GWA scans⁸,⁹. SNPs within the 3′ signal were also detected in the current DIAGRAM+ meta-analysis (for example, rs163184, P = 6.8 × 10⁻⁵), but they failed to meet the threshold for initiating replication. A SNP in the 3′ region (rs2237895) that was reported to reach genome-wide significance in Danish samples⁹ was neither typed nor imputed in the DIAGRAM+ studies. In our European-descent samples, rs231362 and SNPs in the 3′ signal were not correlated (r² < 0.05), and conditional analyses (see below) establish these SNPs as independent (Fig. 2 and Supplementary Table 4). Further analysis in Icelandic samples has shown that both associations are restricted to the maternally transmitted allele¹¹. Both T2D loci are independent of the common variant associations with electrocardiographic QT intervals that map at the 5′ end of KCNQ1 (r² < 0.02, D′ < 0.35 in HapMap European CEU data)¹⁹,²⁰ (Supplementary Table 5).
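The LD measures used throughout this section (r² and D′) follow standard definitions from two-SNP haplotype frequencies. The sketch below computes both; the haplotype frequencies in the example are hypothetical (the paper took its values from HapMap CEU). The example shows how D′ can equal 1 while r² stays low, when a rare allele always sits on the background of a common one.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.
    p_ab: frequency of the haplotype carrying allele A (SNP 1) and B (SNP 2);
    p_a, p_b: marginal frequencies of A and B."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical: allele A (freq 0.1) always occurs on haplotypes carrying B (freq 0.5).
_, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.1, p_b=0.5)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 1.00, r^2 = 0.11
```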

Of the remaining loci, two (near BCL11A and HNF1A) have been highlighted in previous studies⁷,²¹⁻²³ but are now shown to reach genome-wide significance. Rare mutations in HNF1A account for a substantial proportion of cases of maturity-onset diabetes of the young, and a population-specific variant (G319S) influences T2D risk in Oji-Cree Indians²⁴. Confirmation of a common variant association at HNF1A brings to five the number of loci known to harbor both rare mutations causal for monogenic forms of diabetes and common variants predisposing to multifactorial diabetes, the others being PPARG, KCNJ11, WFS1 and HNF1B. A T2D association in the BCL11A region was suggested by the earlier DIAGRAM meta-analysis (rs10490072, P = 3 × 10⁻⁵), but replication was inconclusive⁷; there is only modest linkage disequilibrium (LD) between rs10490072 and the lead SNP from the present analysis (rs243021, r² = 0.22, D′ = 0.73 in HapMap CEU).

The remaining nine signals map near the genes HMGA2, CENTD2, KLF14, PRC1, TP53INP1, ZBED3, ZFAND6, CHCHD9 and DUSP9 (Table 1 and Figs. 1 and 2) and represent new T2D risk loci uncovered by the DIAGRAM+ meta-analysis.

Understanding the genetic architecture of type 2 diabetes

Combining newly identified and previously reported loci and assuming a multiplicative model, the sibling relative risk attributable to the 32 T2D susceptibility variants described in this paper is ~1.14. With addition of the five T2D loci recently identified by the Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC) investigators¹²,¹³ and incorporation of estimates of parent-of-origin–specific effect sizes observed at the KCNQ1 and KLF14 signals and at a recently described locus on chromosome 11p15 (which confers substantial risk when paternally inherited but is protective when maternally transmitted¹¹), this figure rises to ~1.16. Given estimates of sibling relative risk for T2D in Europeans of ~3 (ref. 25), variant discovery efforts to date have therefore explained only ~10% of observed familial clustering. We used available data to evaluate several mechanisms that might be contributing to that proportion of familiality which reflects residual, unexplained heritability²⁶.
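The section does not spell out how the ~1.14 and ~1.16 figures were computed. One standard approach, assumed in the sketch below, applies Risch's variance-component expression λs = 1 + (V_A/2 + V_D/4)/K² to each locus under a multiplicative per-allele model (genotype relative risks 1, r, r²), assumes Hardy-Weinberg proportions, approximates relative risks by odds ratios, and multiplies the per-locus values across independent loci. The parent-of-origin adjustments mentioned above are not modeled, and the function names are hypothetical.

```python
import math

def sibling_relative_risk(p, odds_ratio):
    """lambda_s for one biallelic locus with risk allele frequency p and
    genotype relative risks 1, r, r^2 (multiplicative per-allele model)."""
    q = 1.0 - p
    f0, f1, f2 = 1.0, odds_ratio, odds_ratio ** 2   # relative penetrances
    genotypes = ((q * q, f0), (2 * p * q, f1), (p * p, f2))
    k = sum(freq * f for freq, f in genotypes)       # mean relative penetrance
    a = (f2 - f0) / 2.0                              # homozygote effect
    d = f1 - (f0 + f2) / 2.0                         # dominance deviation
    alpha = a + d * (q - p)                          # average allelic effect
    v_a = 2 * p * q * alpha ** 2                     # additive variance
    v_d = (2 * p * q * d) ** 2                       # dominance variance
    return 1.0 + (v_a / 2.0 + v_d / 4.0) / k ** 2

def combined_lambda_s(loci):
    """Independent loci combined multiplicatively: product of per-locus lambda_s."""
    return math.prod(sibling_relative_risk(p, r) for p, r in loci)

# Illustration only: two rows of Table 1 (HapMap CEU frequency, stage 1 + 2 OR).
print(combined_lambda_s([(0.46, 1.08), (0.88, 1.14)]))
```

Because λs is invariant to scaling all penetrances by a constant, relative penetrances suffice; each modest-effect locus contributes a factor only slightly above 1, which is why dozens of loci together reach only ~1.14.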

Copy number variants (CNVs)

We re-examined stage 1 data looking for associations with SNPs known to provide robust, high-LD tags for common CNVs in European populations. After combining four inventories of CNV-tagging SNPs that together survey at least 40% of common CNVs larger than 1 kb genome-wide, we found no convincing evidence that this class of variants contributes substantially to T2D risk (Supplementary Note).

Secondary signals revealed by conditional analysis

If there are additional independent susceptibility variants at the loci identified, total genetic variance attributable to these regions will be underestimated when based on the lead common variants alone. To explore the potential for independent secondary alleles, we repeated the stage 1 meta-analysis after simultaneously conditioning on 30 known and newly discovered autosomal loci (Supplementary Note). Using a cutoff of P < 1 × 10⁻⁴ (to reflect approximate adjustment for the number of independent SNPs in a ~2 Mb interval), we found preliminary evidence for secondary signals at five loci (TP53INP1, CDKN2A, HHEX-IDE and KCNJ11, in addition to that at KCNQ1; Fig. 1, Supplementary Fig. 1 and Supplementary Table 4). At CDKN2A, the secondary signal is consistent with evidence that haplotype-based analyses generate considerably stronger evidence for association than either signal alone³,²⁷. Further fine-mapping efforts will be required to confirm the secondary signals at TP53INP1, HHEX-IDE and KCNJ11.
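The idea behind the conditional analysis can be illustrated within a single study: a SNP's association is re-estimated with the dosages of already-established SNPs entered as covariates, so a signal that merely tags a known variant should shrink toward zero. The sketch below fits an additive logistic model by Newton-Raphson on hypothetical simulated genotypes; it is not the authors' code, and in the paper each study's conditional results were then meta-analyzed.

```python
import numpy as np

def logistic_fit(y, X, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson. X must include an intercept
    column. Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def snp_effect(y, snp, conditioning_snps=()):
    """Additive log-odds effect (and SE) of `snp`, optionally adjusted
    for other SNP dosages entered as covariates."""
    cols = [np.ones(len(snp)), np.asarray(snp, dtype=float)]
    cols += [np.asarray(c, dtype=float) for c in conditioning_snps]
    beta, se = logistic_fit(np.asarray(y, dtype=float), np.column_stack(cols))
    return beta[1], se[1]

# Hypothetical simulated study: g1 is causal, g2 is a correlated tag SNP.
rng = np.random.default_rng(1)
n = 20000
g1 = rng.binomial(2, 0.3, n)
resample = rng.random(n) < 0.2
g2 = np.where(resample, rng.binomial(2, 0.3, n), g1)
risk = 1.0 / (1.0 + np.exp(-(-1.0 + 0.4 * g1)))
y = (rng.random(n) < risk).astype(float)

marg_b, marg_se = snp_effect(y, g2)          # unconditional test of g2
cond_b, cond_se = snp_effect(y, g2, [g1])    # g2 conditional on g1
print(f"marginal z = {marg_b / marg_se:.1f}, conditional z = {cond_b / cond_se:.1f}")
```

A genuinely independent secondary signal would keep a sizable conditional effect; here g2 only tags g1, so its conditional estimate collapses while its standard error grows because of the collinearity.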

The conditional analysis also provided preliminary (P < 10⁻⁵, our stage 1 threshold) evidence for 19 signals outside known loci (Fig. 1 and Supplementary Table 6). The most notable signal (rs1481279, conditional P = 8.4 × 10⁻⁹) maps near NHEDC1 and corresponds to one of the signals of interest following stage 1 (rs7674212, P = 1.7 × 10⁻⁷ unconditioned). Failure to replicate that signal (P = 0.3 in 21,889 stage 2 cases and 39,568 controls) suggests this was a false positive (Supplementary Note). Several regions showed substantial incremental evidence for association in the conditional analysis as compared to unadjusted analyses and represent potential targets for large-scale replication and gene-gene interaction analyses. Indeed, one of these regions (at rs11708067 near ADCY5, unadjusted P = 1.7 × 10⁻⁴, conditional P = 2.2 × 10⁻⁶) has recently been shown, following initial identification through GWA analysis of continuous glucose measures, to have genome-wide significant associations with T2D in large-scale case-control analyses that involved several DIAGRAM+ samples¹²,¹³.

Etiological heterogeneity

To determine whether etiological heterogeneity might have compromised power to detect genuine T2D susceptibility signals, we performed BMI- and age-of-diagnosis (AOD)-stratified analyses within stage 1 data. We compared effect size estimates for all known T2D risk variants in 2,877 obese (defined as BMI > 30 kg m⁻²) and 4,048 nonobese (BMI ≤ 30 kg m⁻²) T2D cases when compared to similarly stratified controls (Supplementary Note). Although risk estimates for 23 of the 30 autosomal loci were numerically greater in the nonobese comparison than in the obese comparison (binomial test, P = 0.0018), only TCF7L2 (P < 0.001) and BCL11A (P = 0.02) showed significant (P < 0.05) evidence for effect-size heterogeneity. For AOD, we compared risk-locus genotypes for 1,317 cases with AOD <45 years of age and 4,283 cases with AOD >45 years of age, and also analyzed AOD as a continuous trait within all cases (n = 7,104; Supplementary Note); we found no strong evidence of differential effects. Although we recognize that BMI at examination and AOD are imperfect proxies for BMI and age at disease onset, we conclude that a focus on more homogeneous subsamples would not have provided more efficient identification of known T2D susceptibility variants. Furthermore, these data argue against the potential for these common variant signals to afford clinically useful subclassification of individuals with T2D.
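The section does not name the test used for effect-size heterogeneity between the obese and nonobese comparisons. A simple and common choice, assumed here, is a z test on the difference between two independent log odds ratios, which for two strata is equivalent to Cochran's Q on 1 degree of freedom. The function name and inputs are hypothetical.

```python
import math

def effect_heterogeneity_p(beta1, se1, beta2, se2):
    """Two-sided P value for a difference between two independent log odds
    ratios (for example, obese vs nonobese strata); equivalent to Cochran's Q
    with 1 df when there are exactly two strata."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical stratum estimates: OR 1.20 (nonobese) vs OR 1.05 (obese).
print(effect_heterogeneity_p(math.log(1.20), 0.04, math.log(1.05), 0.04))
```

The test assumes the two strata are independent, which holds when both cases and controls are split into non-overlapping groups, as in the stratified design described above.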

Overlap with GWA signals for other diseases

We noted that seven of the newly discovered autosomal loci (near BCL11A, ZBED3, KLF14, CHCHD9, HMGA2, HNF1A and PRC1) are characterized by strong (P < 10⁻⁶) associations with phenotypes other than T2D (Supplementary Table 5). In each case, these appear to be distinct and independent signals. For example, variants at the 3′ end of HMGA2 (~180 kb distant from the T2D signal) have widely replicated effects on adult height²⁸ but are weakly correlated with the T2D-associated SNP rs1531343 (r² < 0.01, D′ < 0.15 in HapMap CEU). The KLF14 region harbors distinct signals for both T2D and basal cell carcinoma²⁹. At HNF1A, previous studies have reported a cluster of associations, with phenotypes including low-density lipoprotein (LDL) cholesterol³⁰ and circulating C-reactive protein levels³¹⁻³³, mapping ~18–72 kb from the peak T2D signal. Although these two sets of HNF1A signals maintain appreciable LD in European samples (r² ≈ 0.1, D′ ≈ 1), they are likely to be independent: the T2D association at the lead SNP for lipids (rs2650000) is far weaker than the association at rs7957197 (P = 0.003 compared to P = 4.6 × 10⁻⁷ in stage 1 samples), whereas LDL cholesterol shows a reciprocal pattern of association (P = 7 × 10⁻⁹ at rs2650000 compared to P = 0.73 at rs7957197 in the same lipid meta-analysis data³⁰).

If we include the KCNQ1 associations described above, previous reports at JAZF1, CDKN2A and CDKAL1 (refs. 34-40) and other signals identified by systematic analysis of the National Human Genome Research Institute (NHGRI) GWA catalog⁴¹ (Supplementary Table 5 and Supplementary Note), at least 13 of 30 autosomal T2D loci show this pattern of closely approximated (within 500 kb) but distinct associations with traits other than T2D or related anthropometric and glycemic phenotypes. This is in addition to what appear to be coincident signals involving T2D susceptibility variants at IRS1 (associated with coronary disease), JAZF1 (associated with height) and HNF1B (associated with prostate cancer) (Supplementary Table 5). Simulations conducted using the NHGRI catalog as a reference set indicate that the number of non-T2D signals observed at T2D loci significantly exceeds expectation (P ≈ 1.6 × 10⁻³ for non-T2D signals within 500 kb of T2D loci; P ≈ 7.0 × 10⁻⁵ (n = 8) for non-T2D signals within 100 kb of T2D loci). Many of these instances of colocalization may represent variants within different regulatory domains that result in tissue- and disease-specific effects mediated through the same genes and pathways.
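The details of the colocalization simulations are in the Supplementary Note and are not reproduced here. The sketch below shows one generic resampling scheme, assumed for illustration: count how many T2D loci have another trait's signal within a window, repeat the count for random locus sets of the same size drawn from a background list of candidate positions, and report an empirical one-sided P with the usual +1 correction. All positions are hypothetical, and how the background should be matched (for example, on allele frequency or gene density) is left open.

```python
import random

def count_loci_with_nearby_signal(loci, other_signals, window):
    """Number of loci (chrom, pos) with at least one other-trait signal
    on the same chromosome within `window` base pairs."""
    count = 0
    for chrom, pos in loci:
        if any(c == chrom and abs(p - pos) <= window for c, p in other_signals):
            count += 1
    return count

def empirical_enrichment_p(loci, other_signals, candidate_positions, window,
                           n_sim=10000, seed=0):
    """Compare the observed count with random locus sets of the same size drawn
    from candidate_positions (a hypothetical background set)."""
    rng = random.Random(seed)
    observed = count_loci_with_nearby_signal(loci, other_signals, window)
    hits = 0
    for _ in range(n_sim):
        sampled = rng.sample(candidate_positions, len(loci))
        if count_loci_with_nearby_signal(sampled, other_signals, window) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

# Hypothetical toy data on a single chromosome, positions in base pairs.
candidates = [("1", pos) for pos in range(0, 10_000_000, 100_000)]
other = [("1", 1_000_000), ("1", 5_000_000)]
loci = [("1", 1_050_000), ("1", 4_990_000), ("1", 8_000_000)]
print(empirical_enrichment_p(loci, other, candidates, window=100_000, n_sim=1000))
```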

Understanding the biology of T2D-susceptibility loci

This analysis takes the number of independent loci showing genome-wide significant associations with T2D beyond 35. For some, such as those at KCNJ11 and SLC30A8, the molecular mechanisms responsible for the susceptibility effect can be assigned with some confidence⁴². At others, the identities of the causal variants, the genes through which they act and the pathophysiological processes which they influence remain obscure. We used several approaches designed to link DIAGRAM+ and previously reported T2D association signals to biological insights relevant to T2D pathogenesis.

Physiological analyses

Variants at FTO are known to influence T2D predisposition through an effect on BMI. In ~21,000 individuals from population-based samples in the GWA meta-analysis of adult BMI completed by the Genetic Investigation of ANthropometric Traits (GIANT) consortium⁴³, no other autosomal T2D susceptibility locus had the property that the T2D risk allele was significantly associated with higher BMI (Supplementary Note). FTO is therefore the only one of the known T2D signals driven by a strong primary causal association with obesity.

We also examined the effect of T2D susceptibility alleles on continuous glycemic measures in up to 46,186 nondiabetic subjects from the MAGIC meta-analysis¹²,¹³. Coefficients for association between the T2D risk allele and higher fasting glucose were positive for 28 of the 31 loci, and 17 of these T2D loci showed significant (P < 0.05) directionally consistent associations with fasting glucose (Fig. 3 and Supplementary Note). However, the magnitudes of effect sizes for fasting glucose and T2D were only weakly correlated (Supplementary Fig. 2 and Supplementary Note), indicating that the mechanisms influencing normal glucose homeostasis and those responsible for the development of T2D are not entirely congruent. T2D risk alleles at four loci (at PPARG, FTO, IRS1 and KLF14) were associated (P < 0.05) with higher fasting insulin, consistent with a primary effect on insulin action, whereas at three other loci (at TCF7L2, CENTD2 and CDKAL1), the association with reduced fasting insulin indicates beta-cell dysfunction (Fig. 3). Indices of beta-cell function (HOMA-B) and insulin sensitivity (HOMA-IR) derived from paired fasting glucose and insulin measures from ~37,000 individuals supported these mechanistic inferences (Fig. 3). In all, risk alleles at ten loci (the previously reported loci at MTNR1B, SLC30A8, THADA, TCF7L2, KCNQ1, CAMK1D, CDKAL1, IGF2BP2 and HNF1B and the newly discovered locus at CENTD2) were associated (P < 0.05) with reduced beta-cell function, and three loci (previously reported loci at PPARG and FTO and the newly discovered locus at KLF14) were associated with reduced insulin sensitivity. The associations with improved insulin sensitivity evident for risk alleles at TCF7L2, IGF2BP2 and CDKAL1 probably reflect truncated ascertainment, as the MAGIC analyses were restricted to nondiabetic individuals.
For the previously reported loci, these findings are broadly consistent with those from more detailed physiological studies⁶,⁸,⁴⁴ and suggest that, of the newly discovered loci, the risk alleles at CENTD2 modify T2D susceptibility through a detrimental effect on beta-cell function. In contrast, the risk alleles at KLF14 and possibly HMGA2 (ref. 45), along with those at PPARG, IRS1 (ref. 10) and ADAMTS9 (ref. 46), appear to have a primary effect on insulin action that is not driven by obesity, unlike the alleles at FTO. The MAGIC meta-analysis did not extend to the X chromosome, but analysis of rs5945326, near DUSP9, in a subset of MAGIC samples (n = 14,644–21,118) revealed no significant (P < 0.05) associations with any fasting glycemic trait. For this signal, as with the other newly identified loci, more detailed phenotypic analyses will be required to determine how these variants affect T2D risk. Overall, these data are consistent with the impression that common T2D risk alleles more often act through beta-cell dysfunction¹²,⁴⁴, but they provide further examples of T2D risk variants that exert their primary effects on insulin action.
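HOMA-B and HOMA-IR are derived from a single pair of fasting glucose and insulin values. The text does not say which HOMA variant MAGIC used; the sketch below assumes the classic approximations of Matthews et al. (1985), with glucose in mmol/L and insulin in μU/mL. Higher HOMA-IR means lower insulin sensitivity, and higher HOMA-B means greater estimated beta-cell function.

```python
def homa_ir(glucose_mmol_l, insulin_uu_ml):
    """HOMA-IR = fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return glucose_mmol_l * insulin_uu_ml / 22.5

def homa_b(glucose_mmol_l, insulin_uu_ml):
    """HOMA-B (%) = 20 x fasting insulin (uU/mL) / (fasting glucose (mmol/L) - 3.5).
    The approximation is undefined for glucose <= 3.5 mmol/L."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-B approximation is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (glucose_mmol_l - 3.5)

print(round(homa_ir(5.0, 10.0), 2), round(homa_b(5.0, 10.0), 1))  # 2.22 133.3
```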

Figure 3.

Plots of fasting blood glucose, insulin and derived indices for the established and new T2D loci. (a,b) Plots of fasting glucose (x axis) and fasting insulin (y axis). (c,d) Plots of HOMA-B (an index of beta-cell function; x axis) and HOMA-IR (an index of insulin sensitivity; y axis). Each point refers to a single T2D association signal, with colors denoting the strength of the association with either the x-axis variable (left-hand plot of each pair) or the y-axis variable (right-hand plot of each pair) (red, P < 10⁻³; orange, 10⁻³ < P < 10⁻²; yellow, 0.01 < P < 0.05; green, 0.05 < P < 0.20; blue, P > 0.20). The two KCNQ1 associations are distinguished by the notation KCNQ1 for rs163184 and KCNQ1* for rs231362. The gene names associated with each signal have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality.

Expression analyses

We used expression data to seek clues to the genes mediating the T2D susceptibility effects we had detected. First, we examined expression-QTL (eQTL) data (in 23,720 transcripts) for subcutaneous adipose tissue (n = 603 with GWA data) and blood (n = 745) samples typed with the Illumina 300K chip⁴⁷ (Table 2 and Supplementary Note). Among the newly identified loci, the most compelling signal was at rs972283, strongly associated with expression of KLF14 in adipose tissue and correlated (r² = 0.3 in HapMap CEU) with the SNP (rs738134) with the strongest KLF14 cis expression signal. Both the T2D and cis-eQTL associations at this locus showed strong parent-of-origin effects¹¹. At the TP53INP1 locus, the cis-eQTL data suggest the T2D susceptibility effect is mediated via altered CCNE2 expression. In contrast, the significant cis-eQTL associations at the ZBED3, CENTD2, HNF1A and PRC1 T2D susceptibility signals are likely to be misleading, as the patterns of conditional association indicate that the T2D association and cis-eQTL signals are not coincident. At previously reported T2D association signals, we found strong overlap with cis-eQTL effects for IRS1 (consistent with data on IRS1 protein expression and function in skeletal muscle¹⁰), JAZF1 and CAMK1D⁷.

Table 2.

Expression QTL results for T2D-associated variants in blood and adipose tissue

| SNP | Chr. | Position B36 (bp) | Nearby geneᵃ | Risk alleleᵇ | Gene (transcript) | Tissue | Effect (s.e.m.)ᶜ | P value | P adjᵈ | SNP with strongest correlation with traitᵉ (r²)ᶠ | P value (strongest SNP) | P adjᵍ (strongest SNP) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Novel loci reported in this study | | | | | | | | | | | | |
| rs4457053 | 5 | 76,460,705 | ZBED3 | G | PDE8B (NM_003719) | Adipose | 0.302 (0.070) | 2.8 × 10⁻⁵ | 0.80 | rs6864250 (0.18) | 3.1 × 10⁻¹⁷ | 5.8 × 10⁻¹³ |
| | | | | | ZBED3 (NM_032367) | Adipose | 0.429 (0.068) | 1.0 × 10⁻⁹ | 0.011 | rs4704389 (0.20) | 3.9 × 10⁻¹⁶ | 6.0 × 10⁻⁹ |
| rs972283 | 7 | 130,117,394 | KLF14 | G | KLF14 (NM_138693) | Adipose | −0.387 (0.058) | 8.1 × 10⁻¹¹ | 0.058 | rs738134 (0.30) | 2.2 × 10⁻¹² | 0.0014 |
| rs896854 | 8 | 96,029,687 | TP53INP1 | T | CCNE2 (NM_057749) | Blood | −0.225 (0.053) | 3.8 × 10⁻⁵ | 0.78 | rs4735339 (0.61) | 5.8 × 10⁻⁷ | 0.0051 |
| rs1552224 | 11 | 72,110,746 | CENTD2 | A | STARD10 (NM_006645) | Blood | 0.337 (0.066) | 8.6 × 10⁻⁷ | 0.026 | rs519790 (0.04) | 2.7 × 10⁻²⁴ | 1.6 × 10⁻¹⁹ |
| rs7957197 | 12 | 119,945,069 | HNF1A | T | ACADS (NM_000017) | Adipose | −0.248 (0.067) | 3.7 × 10⁻⁴ | 0.29 | rs9204 (0.02) | 1.3 × 10⁻⁵³ | 5.9 × 10⁻⁵⁰ |
| | | | | | PSMD9 (NM_002813) | Blood | 0.240 (0.065) | 3.9 × 10⁻⁴ | 0.0088 | rs3741593 (0.00) | 8.3 × 10⁻⁸ | 1.7 × 10⁻⁶ |
| | | | | | OASL (NM_003733) | Adipose | 0.318 (0.068) | 6.4 × 10⁻⁶ | 0.13 | rs2259883 (0.19) | 1.1 × 10⁻⁷ | 0.0018 |
| | | | | | OASL (NM_003733) | Blood | 0.319 (0.064) | 1.3 × 10⁻⁶ | 0.37 | rs4556628 (0.21) | 4.4 × 10⁻²² | 1.4 × 10⁻¹⁶ |
| | | | | | COQ5 (NM_032314) | Blood | 0.248 (0.065) | 2.1 × 10⁻⁴ | 0.92 | rs10774561 (0.02) | 8.7 × 10⁻³⁹ | 4.9 × 10⁻³⁵ |
| | | | | | UNC119B (NM_032661) | Blood | −0.254 (0.064) | 1.4 × 10⁻⁴ | 0.048 | rs11065202 (0.09) | 7.8 × 10⁻¹² | 2.3 × 10⁻⁹ |
| | | | | | CAMKK2 (NM_172215) | Adipose | −0.497 (0.068) | 1.2 × 10⁻¹² | 0.18 | rs11065504 (0.08) | 2.7 × 10⁻¹¹⁷ | 3.8 × 10⁻⁹⁸ |
| | | | | | CAMKK2 (NM_172215) | Blood | −0.360 (0.063) | 3.4 × 10⁻⁸ | 0.68 | rs11065504 (0.08) | 7.0 × 10⁻¹⁰⁵ | 5.7 × 10⁻⁹⁴ |
| | | | | | P2RX4 (NM_175568) | Blood | 0.312 (0.065) | 3.4 × 10⁻⁶ | 2.0 × 10⁻⁶ | rs25644 (0.03) | 3.4 × 10⁻¹⁷ | 1.9 × 10⁻¹⁷ |
| rs8042680 | 15 | 89,322,341 | PRC1 | A | VPS33B (NM_018668) | Blood | −0.371 (0.057) | 2.9 × 10⁻¹⁰ | 0.50 | rs12595616 (0.57) | 2.3 × 10⁻²¹ | 4.5 × 10⁻¹² |
| Previously reported loci | | | | | | | | | | | | |
| rs7578326 | 2 | 226,728,897 | IRS1 | A | IRS1 (Contig50189_RC) | Adipose | −0.251 (0.059) | 3.7 × 10⁻⁵ | 0.89 | rs2943653 (0.93) | 3.4 × 10⁻⁵ | 0.69 |
| | | | | | IRS1 (NM_005544) | Adipose | −0.331 (0.059) | 5.7 × 10⁻⁸ | 0.58 | rs2176040 (0.74) | 7.8 × 10⁻¹⁰ | 0.0042 |
| rs13081389 | 3 | 12,264,800 | PPARG | A | IQSEC1 (NM_014869) | Adipose | −0.630 (0.131) | 2.9 × 10⁻⁶ | 1.4 × 10⁻⁴ | rs9211 (0.01) | 1.1 × 10⁻⁹⁶ | 7.4 × 10⁻⁹⁴ |
| rs6795735 | 3 | 64,680,405 | ADAMTS9 | C | BC040632 (AK022320) | Adipose | −0.229 (0.056) | 7.6 × 10⁻⁵ | 0.28 | rs4521216 (0.02) | 3.0 × 10⁻¹³ | 8.7 × 10⁻¹⁰ |
| | | | | | ADAMTS9 (NM_182920) | Adipose | −0.274 (0.055) | 1.5 × 10⁻⁶ | 0.036 | rs7372321 (0.11) | 1.1 × 10⁻⁹ | 2.3 × 10⁻⁵ |
| rs849134 | 7 | 28,162,747 | JAZF1 | A | JAZF1 (NM_175061) | Adipose | −0.346 (0.055) | 1.4 × 10⁻⁹ | 0.70 | rs1635852 (1.00) | 5.2 × 10⁻¹⁰ | 0.17 |
| rs12779790 | 10 | 12,368,016 | CAMK1D | G | CAMK1D (AL137430) | Blood | 0.668 (0.062) | 1.6 × 10⁻²⁵ | 0.052 | rs11257655 (0.83) | 1.2 × 10⁻²⁵ | 0.04 |
| | | | | | CDC123 (NM_006023) | Blood | 0.498 (0.064) | 3.9 × 10⁻¹⁴ | 0.012 | rs11257600 (0.57) | 1.4 × 10⁻¹⁶ | 4.9 × 10⁻⁵ |
| | | | | | CAMK1D (NM_020397) | Blood | 0.384 (0.064) | 7.6 × 10⁻⁹ | 0.44 | rs11257655 (0.83) | 3.3 × 10⁻⁹ | 0.13 |
| | | | | | CAMK1D (NM_153498) | Blood | 0.643 (0.062) | 2.6 × 10⁻²³ | n/a | rs12779790 (1.00) | 2.6 × 10⁻²³ | n/a |
| rs5015480 | 10 | 94,455,539 | HHEX/IDE | C | KIF11 (NM_004523) | Adipose | 0.202 (0.059) | 9.4 × 10⁻⁴ | 0.54 | rs10882095 (0.53) | 1.6 × 10⁻⁴ | 0.059 |
| | | | | | KIF11 (NM_004523) | Blood | 0.185 (0.054) | 9.7 × 10⁻⁴ | 0.62 | rs6583826 (0.43) | 7.9 × 10⁻⁶ | 0.0025 |
| | | | | | MARCH5 (NM_017824) | Adipose | −0.262 (0.060) | 2.5 × 10⁻⁵ | 0.31 | rs10748579 (0.04) | 7.4 × 10⁻³⁶ | 3.2 × 10⁻³¹ |

We also explored the tissue expression profiles of 27 autosomal genes mapping to the newly discovered regions of association and performed quantitative RT-PCR analyses across a panel of human tissues relevant to T2D pathogenesis (Supplementary Note). The broad expression of many of the transcripts, including 24 transcripts with evidence of beta-cell transcription (Supplementary Note), limited our ability to prioritize among candidate transcripts on the basis of static patterns of transcript expression.

Pathway and protein-protein interaction analyses

Reasoning that the additional T2D susceptibility loci would amplify our ability to identify over-represented molecular processes⁴⁸, we deployed several complementary approaches to detect evidence of pathway or network enrichment (Supplementary Note). Using GRAIL⁴⁹, we found that genes within T2D-associated regions showed evidence of increased connectivity within PubMed abstracts, though this largely reflects shared roles in monogenic or syndromic diabetes (involving HNF1A, HNF1B and WFS1). We also showed that the extent of protein-protein interaction between the products of genes mapping to the association signals substantially exceeded expectation (Supplementary Note). Pathway enrichment analyses using the PANTHER database⁵⁰ uncovered some evidence of over-representation of signal transduction and protein metabolism and modification, and Reactome⁵¹ highlighted a separate set of pathways including metabolism of lipids and lipoproteins, endothelins and beta-arrestins (for details, see Supplementary Note).

The only consistent signal to emerge across multiple analyses involved cell-cycle regulation. Network analyses based on protein-protein interaction data detected an 18-member subnetwork (unadjusted P ~0.004) characterized by enhanced protein-protein interaction connectivity and highly enriched for genes implicated in cell-cycle regulation (P = 2.8 × 10−7). A smaller cell-cycle network of five genes, only partly overlapping with the first, independently emerged from the Reactome analyses, and gene-set enrichment analysis of selected candidate pathways52 also detected over-representation of association signals (P ~0.006) among cell-cycle genes (Supplementary Note). Because many genes within these networks are expressed in pancreatic islets, and T2D-association effects at several of these loci are mediated primarily through beta-cell dysfunction44, these findings highlight the contribution of beta-cell mass regulation to the long-term maintenance of normal glucose homeostasis.
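The enrichment P values above were obtained with the network and gene-set methods detailed in the Supplementary Note. As a generic illustration only, and not a reproduction of those procedures, over-representation of a functional category (such as cell-cycle genes) within a gene set is often assessed with a one-sided hypergeometric test. The Python sketch below assumes a hypothetical background of N genes, K of them annotated to the category, and asks how likely it is to see at least k annotated genes in a set of n; the function name `hypergeom_sf` and all inputs are illustrative assumptions.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    Illustrative sketch, not the enrichment method used in the paper.
    N: genes in a hypothetical background universe
    K: background genes annotated to the category (e.g. cell cycle)
    n: genes in the tested set (e.g. a subnetwork)
    k: annotated genes observed in the tested set
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)  # number of ways to draw n genes from the background
    upper = min(K, n)   # largest possible number of annotated genes in the set
    # Sum the exact (integer) counts for every outcome at least as extreme as k;
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


# Hypothetical example: 6 category genes in an 18-gene set,
# against a 20,000-gene background containing 500 category genes.
print(hypergeom_sf(6, 20000, 500, 18))
```

Keeping the tail sum in exact integer arithmetic until the final division avoids accumulating rounding error across many small terms, at the cost of speed for very large backgrounds; a library routine such as a hypergeometric survival function would be the practical choice for production use.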

In addition, these analyses highlighted notable biological connections between sets of genes within confirmed T2D-association regions. For example, HMGA2 emerges as a key transcriptional regulator of IGF2BP2 (refs. 53,54). However, because Hmga2 (Hmgic) knockout mice are deficient in adipocyte differentiation45, whereas the IGF2BP2 risk allele is associated with reduced beta-cell function55, further work is required to establish the relevance of this regulatory interaction to T2D pathogenesis. Our analyses also showed that TLE4 (at the CHCHD9 locus) encodes a Groucho homolog that forms complexes with TCF proteins, including TCF7L2, to modulate transcription at target sites56. Finally, FURIN, one of the genes mapping to the newly identified PRC1 locus, encodes a paired basic amino acid cleaving enzyme; both NOTCH2 and ADAMTS9 (ref. 7) are known targets of FURIN cleavage57-59.

Notably, these global approaches failed to provide any consistent support for many other mechanisms previously promoted on the basis of biochemical or physiological evidence as likely contributors to T2D pathogenesis1 (Supplementary Note). Overall, the relative paucity of signals from these analyses, particularly when contrasted with the compelling patterns of enrichment seen for other complex traits48, indicates either that T2D pathogenesis is characterized by substantial etiological heterogeneity or that the processes critical to T2D development are poorly represented in existing pathway and interaction databases.

DISCUSSION

By increasing the discovery sample size, our study has substantially expanded the number of loci for which there is strong statistical evidence indicating a role in T2D predisposition. When combined with recent reports of additional T2D susceptibility loci arising from studies of continuous glycemic traits12,13 and parent-of-origin effects11, the number of confirmed loci for T2D currently stands at 38.

Although these discoveries represent new opportunities to explore the biology of T2D predisposition, the challenges inherent in translating these common variant association signals into biological mechanisms of disease causation are clear. Nevertheless, the analyses we report have generated several mechanistic hypotheses that can direct future efforts at functional evaluation and genetic refinement. At some loci, particularly those near HNF1A, HMGA2 and KLF14, existing biology, coupled with the phenotypic and expression data presented here, highlights the named genes as prime candidates for mediating the susceptibility effect. For example, the T2D susceptibility effect near KLF14, which maps within an imprinted region on chromosome 7q32 and which, on the basis of the MAGIC meta-analysis data, appears to be driven by reduced insulin action, is restricted to the maternally transmitted allele11. As KLF14 is maternally expressed, and the eQTL association between rs972283 and KLF14 expression (see above) is similarly restricted to the maternal allele, KLF14 (a widely expressed, intronless member of the Krüppel-like family of transcription factors60) emerges as the main regional candidate. At the X-chromosome signal, evidence implicating DUSP9 (mitogen-activated protein kinase phosphatase-4) in the regulation of insulin action in mice gives DUSP9 particular salience as an association candidate61,62. However, as described above, the failure to detect associations with continuous glycemic phenotypes (including fasting insulin and HOMA-IR) means that the functional connection with DUSP9 remains speculative.

In other regions, such as those near PRC1, TP53INP1 and CHCHD9, the functional connections and/or eQTL associations of particular genes mapping within or close to the respective association intervals show FURIN, CCNE2 and TLE4, respectively, to be promising biological candidates. At yet other loci, such as those centered around ZBED3, CENTD2 and ZFAND6, existing data provide little, if any, basis for strong inferences concerning the genes likely to mediate the T2D susceptibility effect. Accumulation of new data—through deep resequencing of the regions, fine-mapping and functional studies in humans and in animal models—will be required to characterize the specific variants responsible and the genes and pathways through which they execute their effect on T2D risk.

One theme emerging from this work is the high frequency with which loci implicated in T2D susceptibility harbor variants that influence other common traits. This colocalization of common risk variants exceeds chance expectation, often connects diseases with little obvious mechanistic overlap and typically involves statistically independent susceptibility signals. Recent evidence that tissue-specific eQTL signals are preferentially located in regulatory sequences some distance from transcriptional start sites63—in common with many complex trait association signals—suggests that further dissection of these regions should improve understanding of the genomic organization of tissue- and/or developmental-stage-specific regulation.

A further conclusion is that common SNP signals are likely to fall short in explaining the observed familial aggregation of T2D, at least in populations of European descent. The limited power of our study (Table 1) to detect several of the genome-wide significant variants we report here (based on the stage 1 sample size and stage 2 odds ratios that minimize ‘winner’s curse’ effects) indicates that there are likely to be many additional common variant signals of similar effect that could be detected by further expansion of the GWA meta-analysis approach. However, it seems unlikely that these will explain a substantial proportion of the unexplained heritability. Based on the data presented, the same is likely to be true for common CNVs and for variation on the sex chromosomes. As a result, the attention of researchers in the field is increasingly directed toward evaluating the contribution of low-frequency and rare variants to complex trait susceptibility. Several lines of evidence (the overlap in loci implicated in monogenic and multifactorial diabetes, the congregation of multiple disease signals at a limited number of loci, and the conditional analyses) point toward the importance of obtaining complete descriptions of causal genetic variation (of all types and frequencies) at the loci uncovered by this and other GWA studies. Such loci are likely to represent hotspots at which the overall contribution to T2D predisposition and biology may be considerably greater than that estimated using the discovered common variants alone.
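The power argument above rests on the stage 1 sample size and stage 2 odds ratios (Table 1). As a rough sketch only, assuming a per-allele model in which the allelic odds ratio shifts the risk-allele frequency from p0 in controls to p1 in cases, and a two-sided allele-count test with a normal approximation, power can be approximated as below. The function name, its parameters and the default genome-wide threshold α = 5 × 10−8 are illustrative assumptions, not the calculation used for Table 1.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p0: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic (2x2 allele-count) test.

    Illustrative sketch: normal approximation to a difference in
    risk-allele frequencies between cases and controls.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Case allele frequency implied by the allelic odds ratio:
    # odds in cases = odds_ratio * p0 / (1 - p0).
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    a_cases, a_controls = 2 * n_cases, 2 * n_controls  # allele counts (diploid)
    delta = abs(p1 - p0)
    # Pooled frequency gives the standard error under the null hypothesis.
    p_bar = (a_cases * p1 + a_controls * p0) / (a_cases + a_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / a_cases + 1 / a_controls))
    # Unpooled standard error under the alternative hypothesis.
    se_alt = sqrt(p1 * (1 - p1) / a_cases + p0 * (1 - p0) / a_controls)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return (std_normal.cdf((delta - z_crit * se_null) / se_alt)
            + std_normal.cdf((-delta - z_crit * se_null) / se_alt))


# Hypothetical variant (control frequency 0.3, odds ratio 1.08)
# at the stage 1 sample size of 8,130 cases and 38,987 controls.
print(allelic_test_power(0.3, 1.08, 8130, 38987))
```

Entering stage 2 rather than stage 1 odds ratios matters here: effect estimates from the discovery stage tend to be inflated by the winner's curse, which would overstate power.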

METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.

Supplementary Material

Supplementary Note, Supplementary Figures 1 and 2 and Supplementary Tables 1-6

Online Methods

Acknowledgments

We acknowledge funding from: the Academy of Finland (no. 124243); Agence Nationale de la Recherche (France); American Diabetes Association (1-05-RA-140, 7-08-MN-OK; 7-06-MN-05); Ardix Medical; Association Diabète Risque Vasculaire; Association de Langue Française pour l’Etude du Diabète et des Maladies Métaboliques; Association Française des Diabétiques; Bayer Diagnostics; British Diabetic Association Research; Becton Dickinson; Broad Institute of Harvard and Massachusetts Institute of Technology; The Burroughs Wellcome Fund; Cardionics; Center for Inherited Disease Research (USA); Centre for Medical Systems Biology (The Netherlands); Centre of Excellence Metabolic Disorders Baden-Wuerttemberg (Germany); Caisse Nationale Assurance Maladie des Travailleurs Salariés (France); Clinical Research Institute HUCH Ltd; Deutsche Forschungsgemeinschaft (DFG GrK 1041, DFG RA459, SFB 518); the Danish Diabetes Association; the Danish Health Research Council; Diabetes UK; Doris Duke Charitable Foundation; Erasmus Medical Center (The Netherlands); the Dutch Diabetes Foundation; European Community (HEALTH-F4-2007-201413, HEALTH-2007-B-223211, LSHG-CT-2006-01947, LSHM-CT-2004-512013, LSHM-CT-2004-005272, LSHM-CT-2006-518153); the European Foundation for the Study of Diabetes; the Federal Ministry of Health (Germany); the Federal Ministry of Education and Research (Germany) (FKZ01GS0823 and DZD e.V.); Fédération Française de Cardiologie; The Finnish Diabetes Research Foundation; The Folkhalsan Research Foundation; The Foundation for Strategic Research (Sweden); The Foundation of Bristol-Myers Squibb; the German National Genome Research Network; Helmholtz Zentrum München-Research Center for Environment and Health; INSERM (France); La Fondation de France; Lilly; The Linnaeus Centre for Bioinformatics (Sweden); the Lundbeck Foundation Centre of Applied Medical Genomics for Personalized Disease Prediction, Prevention and Care; the Medical Research Council UK (G0601261, G0000649; 081696); Munich Center of Health Sciences-LMU Innovativ (Germany); Merck Santé; the Ministry of Health and Department of Educational Assistance, University and Research of the Autonomous Province of Bolzano (Italy); the Ministry of Innovation, Science, Research and Technology of the State of North Rhine-Westphalia (Germany); the Ministry of Science, Education and Sport (Croatia); the National Heart, Lung, and Blood Institute (N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, N01-HC-25195, R01HL087641, R01HL59367, R01HL086694, N02-HL-6-4278); National Human Genome Research Institute (U01HG004402, U01HG004399, U01HG004171, 1 Z01 HG000024); the National Institute of Diabetes, Digestive and Kidney Diseases (DK078616, K24-DK080140, U54 DA021519, DK58845, DK069922, DK062370, DK073490, K23-DK65978 and DK072193); the US National Institutes of Health (HHSN268200625226C, HHSN268200625226C, 1K08AR055688, UL1RR025005, 1K99HL094535-01A1); the Netherlands Foundation for Scientific Research (175.010.2005.011, 047.017.043); Nord-Pas-de-Calais region (France); Novartis Pharma; Novo Nordisk; the Oxford National Institute for Health Research (NIHR) Biomedical Research Centre (UK); Office National Inter-professionnel des Vins; Peninsula Medical School, Exeter UK; Pfizer, Inc; Pierre Fabre laboratory (France); Programme National de Recherche sur le Diabète (France); Richard and Susan Family Foundation/American Diabetes Association Pinnacle Program Project; Roche; the Royal Society (UK); Russian Foundation for Basic Research (047.017.043); Sanofi-Aventis; Sarnoff Cardiovascular Research; Scottish Government Chief Scientist Office; SenterNovem (IOP Genomics grant IGE05012); Sigrid Juselius Foundation; the Skaraborg Institute, Skövde, Sweden; South Tyrolean Sparkasse Foundation; the Swedish Natural Sciences Research Council; The Swedish Research Council (349 2006-237P); the Association Diabète Risque Vasculaire (France); Topcon; the Wallenberg Foundation; and the Wellcome Trust (072960; 076113; 083270; 088885; 079557; 081682; 086596; 077016; 075491). A more complete list of acknowledgments is provided in the Supplementary Note.

Footnotes

Note: Supplementary information is available on the Nature Genetics website.

COMPETING FINANCIAL INTERESTS The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturegenetics/.

References
