Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease

Author manuscript; available in PMC: 2014 Jun 1.

Published in final edited form as: Nat Genet. 2013 Oct 27;45(12):1452–1458. doi: 10.1038/ng.2802

Abstract

Eleven susceptibility loci for late-onset Alzheimer’s disease (LOAD) were identified by previous studies; however, a large portion of the genetic risk for this disease remains unexplained. We conducted a large, two-stage meta-analysis of genome-wide association studies (GWAS) in individuals of European ancestry. In stage 1, we used genotyped and imputed data (7,055,881 SNPs) to perform meta-analysis on 4 previously published GWAS data sets consisting of 17,008 Alzheimer’s disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer’s disease cases and 11,312 controls. In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10−8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer’s disease.


Alzheimer’s disease is a devastating neurological disorder primarily affecting the elderly. The disease manifests with progressive deterioration in cognitive functions, leading to loss of autonomy. The APOE gene (encoding apolipoprotein E) is a major genetic risk factor for Alzheimer’s disease1,2. Previous GWAS in individuals of European ancestry identified nine other genomic regions associated with LOAD3–7. Recently, a rare susceptibility variant in TREM2 was identified8,9. The search for additional genetic risk factors requires large-scale meta-analysis of GWAS to increase statistical power. Under the banner of I-GAP (International Genomics of Alzheimer’s Project), we conducted a meta-analysis of 4 GWAS samples of European ancestry totaling 17,008 cases and 37,154 controls (stage 1) followed up by genotyping of 11,632 SNPs showing moderate evidence of association (P < 1 × 10−3 in stage 1) in an independent sample that included 8,572 cases and 11,312 controls (stage 2).

In the stage 1 meta-analysis, we used data from four consortia: the Alzheimer’s Disease Genetic Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the European Alzheimer’s Disease Initiative (EADI) and the Genetic and Environmental Risk in Alzheimer’s Disease (GERAD) Consortium (Table 1, Online Methods, Supplementary Table 1 and Supplementary Note). We used European population reference (EUR) haplotype data from the 1000 Genomes Project (2010 interim release based on sequence data freeze from 4 August 2010 and phased haplotypes from December 2010) to impute genotypes for up to 11,863,202 SNPs per data set. We excluded SNPs that did not pass quality control in each study (Supplementary Table 2 and Supplementary Note). Our meta-analysis included SNPs either genotyped or successfully imputed in at least 40% of the Alzheimer’s disease cases and 40% of the control samples across all data sets (7,055,881 SNPs; Online Methods). In each data set, genotype dosages were analyzed as described in the Supplementary Note (Supplementary Table 2). We performed meta-analysis of the results after applying genomic control correction to each study. The genomic control inflation factor for the meta-analysis was 1.087 for the full set of SNPs and 1.082 after excluding SNPs within the APOE locus (chr. 19: 45,409,039–45,412,650) and within 500 kb of SNPs associated with Alzheimer’s disease at a prespecified level of genome-wide significance (P < 5 × 10−8) in stage 1 (see Supplementary Fig. 1 for quantile-quantile plots).
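The genomic control step described above can be sketched as follows. This is a minimal illustration, not the consortia's actual code: it assumes the usual definition of the inflation factor (median observed 1-d.f. chi-square statistic divided by its expected value of about 0.455), and the function names and input values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Estimate lambda as median(observed chi-square) / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Scale a standard error by sqrt(lambda).

    Assumption: lambda below 1 is treated as 1, so no deflation is applied.
    """
    return se * max(lam, 1.0) ** 0.5


# Hypothetical per-SNP chi-square statistics (illustrative values only).
chi2_stats = [0.10, 0.30, 0.50, 0.90, 2.40]
lam = genomic_inflation(chi2_stats)

# Hypothetical standard error of one SNP's log odds ratio.
corrected = gc_correct_se(0.020, lam)
print(f"lambda = {lam:.3f}, corrected SE = {corrected:.4f}")
```

A lambda close to 1, such as the 1.087 reported here for the meta-analysis, indicates only modest inflation of the test statistics.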

Table 1.

Description of the consortium data sets used for stage 1 and stage 2

| Stage | Consortium or country | Cases: N | Cases: percent women | Cases: mean AAO (s.d.) | Controls: N | Controls: percent women | Controls: mean AAE (s.d.) |
|---|---|---|---|---|---|---|---|
| Stage 1 | ADGC | 10,273 | 59.4 | 74.7 (7.7) | 10,892 | 58.6 | 76.3 (8.1) |
| Stage 1 | CHARGE | 1,315 | 63.6 | 82.7 (6.8) | 12,968 | 57.8 | 72.8 (8.6) |
| Stage 1 | EADI | 2,243 | 64.9 | 68.5 (8.9) | 6,017 | 60.7 | 74.0 (5.4) |
| Stage 1 | GERAD | 3,177 | 64.0 | 73.0 (8.5) | 7,277 | 51.8 | 51.0 (11.8) |
| Stage 1 | Total N | 17,008 | | | 37,154 | | |
| Stage 2 | Austria | 210 | 61.0 | 72.5 (8.1) | 829 | 43.3 | 65.5 (8.0) |
| Stage 2 | Belgium | 878 | 66.1 | 75.4 (8.6) | 661 | 59.5 | 65.7 (14.3) |
| Stage 2 | Finland | 422 | 68.0 | 71.4 (6.9) | 562 | 59.3 | 69.1 (6.2) |
| Stage 2 | Germany | 972 | 63.9 | 73.0 (8.6) | 2,378 | 53.1 | 69.5 (10.1) |
| Stage 2 | Greece | 256 | 63.3 | 69.2 (8.0) | 229 | 34.1 | 49.3 (16.4) |
| Stage 2 | Hungary | 125 | 68.0 | 74.9 (6.8) | 100 | 69.0 | 74.4 (6.5) |
| Stage 2 | Italy | 1,729 | 66.5 | 71.5 (8.7) | 720 | 55.7 | 70.0 (10.4) |
| Stage 2 | Spain | 2,121 | 66.3 | 75.0 (8.3) | 1,921 | 55.3 | 70.2 (10.8) |
| Stage 2 | Sweden | 797 | 61.7 | 76.8 (8.1) | 1,506 | 62.8 | 70.6 (8.7) |
| Stage 2 | UK | 490 | 57.6 | 74.6 (8.7) | 1,066 | 29.2 | 73.8 (6.5) |
| Stage 2 | United States | 572 | 61.9 | 83.5 (7.6) | 1,340 | 54.0 | 79.3 (6.8) |
| Stage 2 | Total N | 8,572 | | | 11,312 | | |

In addition to the APOE locus, 14 genomic regions had associations that reached the genome-wide significance level (Fig. 1). Nine had been previously identified by GWAS as genetic susceptibility factors3–7, and five (HLA-DRB5–HLA-DRB1, PTK2B, SORL1, SLC24A4-RIN3 and DSG2) represent newly associated loci (Table 2). SORL1 had previously been identified as an Alzheimer’s disease gene through candidate gene approaches and in a GWAS combining ADGC and Asian samples10. Genes attributed to a signal were those closest to the most significantly associated SNP. However, we are aware that these are potentially not the causative genes. Detailed results for each region are given in Supplementary Figures 2–7.

Figure 1.


Manhattan plot of stage 1 for genome-wide association with Alzheimer’s disease (17,008 cases and 37,154 controls). The threshold for genome-wide significance (P < 5 × 10−8) is indicated by the red line. Genes previously identified by GWAS are shown in black, and newly associated genes are shown in red. Red diamonds represent SNPs with the smallest P values in the overall analysis.

Table 2.

Summary of stage 1, stage 2 and overall meta-analyses for SNPs reaching genome-wide significance after stages 1 and 2

| SNP | Chr. | Position | Closest gene | Major/minor alleles | MAF | Stage 1 OR (95% CI) | Stage 1 meta P value | Stage 2 OR (95% CI) | Stage 2 meta P value | Overall OR (95% CI) | Overall meta P value | I² (%), P value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Known GWAS-defined associated genes | | | | | | | | | | | | |
| rs6656401 | 1 | 207692049 | CR1 | G/A | 0.197 | 1.17 (1.12–1.22) | 7.7 × 10−15 | 1.21 (1.14–1.28) | 7.9 × 10−11 | 1.18 (1.14–1.22) | 5.7 × 10−24 | 0, 7.8 × 10−1 |
| rs6733839 | 2 | 127892810 | BIN1 | C/T | 0.409 | 1.21 (1.17–1.25) | 1.7 × 10−26 | 1.24 (1.18–1.29) | 3.4 × 10−19 | 1.22 (1.18–1.25) | 6.9 × 10−44 | 28, 6.1 × 10−2 |
| rs10948363 | 6 | 47487762 | CD2AP | A/G | 0.266 | 1.10 (1.07–1.14) | 3.1 × 10−8 | 1.09 (1.04–1.15) | 4.1 × 10−4 | 1.10 (1.07–1.13) | 5.2 × 10−11 | 0, 9 × 10−1 |
| rs11771145 | 7 | 143110762 | EPHA1 | G/A | 0.338 | 0.90 (0.87–0.93) | 8.8 × 10−10 | 0.90 (0.86–0.95) | 2.8 × 10−5 | 0.90 (0.88–0.93) | 1.1 × 10−13 | 14, 2.4 × 10−1 |
| rs9331896 | 8 | 27467686 | CLU | T/C | 0.379 | 0.86 (0.84–0.89) | 9.6 × 10−17 | 0.86 (0.82–0.90) | 4.5 × 10−10 | 0.86 (0.84–0.89) | 2.8 × 10−25 | 0, 4.9 × 10−1 |
| rs983392 | 11 | 59923508 | MS4A6A | A/G | 0.403 | 0.90 (0.87–0.93) | 2.8 × 10−11 | 0.90 (0.86–0.94) | 4.5 × 10−6 | 0.90 (0.87–0.92) | 6.1 × 10−16 | 1, 4.5 × 10−1 |
| rs10792832 | 11 | 85867875 | PICALM | G/A | 0.358 | 0.88 (0.85–0.91) | 6.5 × 10−16 | 0.85 (0.81–0.89) | 1.1 × 10−11 | 0.87 (0.85–0.89) | 9.3 × 10−26 | 0, 9.8 × 10−1 |
| rs4147929 | 19 | 1063443 | ABCA7 | G/A | 0.190 | 1.14 (1.10–1.20) | 1.7 × 10−9 | 1.17 (1.10–1.24) | 9.9 × 10−8 | 1.15 (1.11–1.19) | 1.1 × 10−15 | 0, 9.4 × 10−1 |
| rs3865444 | 19 | 51727962 | CD33 | C/A | 0.307 | 0.91 (0.88–0.94) | 5.1 × 10−8 | 0.99 (0.94–1.04) | 6.9 × 10−1 | 0.94 (0.91–0.96) | 3.0 × 10−6 | 0, 6.9 × 10−1 |
| New loci reaching genome-wide significance in the discovery analysis | | | | | | | | | | | | |
| rs9271192 | 6 | 32578530 | HLA-DRB5–HLA-DRB1 | A/C | 0.276 | 1.11 (1.07–1.16) | 1.6 × 10−8 | 1.12 (1.06–1.18) | 4.2 × 10−5 | 1.11 (1.08–1.15) | 2.9 × 10−12 | 0, 5.4 × 10−1 |
| rs28834970 | 8 | 27195121 | PTK2B | T/C | 0.366 | 1.10 (1.07–1.14) | 3.3 × 10−9 | 1.11 (1.06–1.17) | 4.3 × 10−6 | 1.10 (1.08–1.13) | 7.4 × 10−14 | 10, 3.0 × 10−1 |
| rs11218343 | 11 | 121435587 | SORL1 | T/C | 0.039 | 0.76 (0.70–0.83) | 5.0 × 10−11 | 0.78 (0.70–0.88) | 4.0 × 10−5 | 0.77 (0.72–0.82) | 9.7 × 10−15 | 0, 8.3 × 10−1 |
| rs10498633 | 14 | 92926952 | SLC24A4-RIN3 | G/T | 0.217 | 0.90 (0.87–0.94) | 1.5 × 10−7 | 0.93 (0.88–0.98) | 7.8 × 10−3 | 0.91 (0.88–0.94) | 5.5 × 10−9 | 0, 6.3 × 10−1 |
| rs8093731 | 18 | 29088958 | DSG2 | C/T | 0.017 | 0.54 (0.43–0.67) | 4.6 × 10−8 | 1.01 (0.80–1.28) | 9.0 × 10−1 | 0.73 (0.62–0.86) | 1.0 × 10−4 | 38, 3.9 × 10−2 |
| New loci reaching genome-wide significance in the combined discovery and replication analysis | | | | | | | | | | | | |
| rs35349669 | 2 | 234068476 | INPP5D | C/T | 0.488 | 1.07 (1.03–1.10) | 9.6 × 10−5 | 1.10 (1.05–1.15) | 5.7 × 10−5 | 1.08 (1.05–1.11) | 3.2 × 10−8 | 0, 8.0 × 10−1 |
| rs190982 | 5 | 88223420 | MEF2C | A/G | 0.408 | 0.92 (0.89–0.95) | 2.5 × 10−6 | 0.93 (0.89–0.98) | 3.4 × 10−3 | 0.93 (0.90–0.95) | 3.2 × 10−8 | 0, 6.4 × 10−1 |
| rs2718058 | 7 | 37841534 | NME8 | A/G | 0.373 | 0.93 (0.90–0.96) | 1.3 × 10−5 | 0.91 (0.87–0.95) | 6.3 × 10−5 | 0.93 (0.90–0.95) | 4.8 × 10−9 | 0, 9.2 × 10−1 |
| rs1476679 | 7 | 100004446 | ZCWPW1 | T/C | 0.287 | 0.92 (0.89–0.96) | 7.4 × 10−6 | 0.89 (0.85–0.94) | 9.7 × 10−6 | 0.91 (0.89–0.94) | 5.6 × 10−10 | 0, 7.0 × 10−1 |
| rs10838725 | 11 | 47557871 | CELF1 | T/C | 0.316 | 1.08 (1.04–1.11) | 6.7 × 10−6 | 1.09 (1.04–1.14) | 4.1 × 10−4 | 1.08 (1.05–1.11) | 1.1 × 10−8 | 0, 7.6 × 10−1 |
| rs17125944 | 14 | 53400629 | FERMT2 | T/C | 0.092 | 1.13 (1.07–1.19) | 1.0 × 10−5 | 1.17 (1.08–1.26) | 1.6 × 10−4 | 1.14 (1.09–1.19) | 7.9 × 10−9 | 10, 3.0 × 10−1 |
| rs7274581 | 20 | 55018260 | CASS4 | T/C | 0.083 | 0.87 (0.82–0.92) | 1.6 × 10−6 | 0.89 (0.82–0.96) | 4.1 × 10−3 | 0.88 (0.84–0.92) | 2.5 × 10−8 | 0, 9.9 × 10−1 |

In stage 2, we selected for genotyping all stage 1 SNPs with a P value less than 1 × 10−3, excluding SNPs flanking APOE (chr. 19: 45,409,039–45,412,650) (n = 19,532; see URLs for database access). From the initial set of SNPs, 14,445 could be genotyped using Illumina iSelect technology. After quality control procedures (Online Methods), we considered 11,632 SNPs for association analysis. The stage 2 sample included 8,572 cases and 11,312 controls of European ancestry originating from Austria, Belgium, Finland, Germany, Greece, Hungary, Italy, Spain, Sweden, the UK and the United States (Table 1 and Supplementary Note). We observed 116 SNPs showing the same risk allele and direction of association in stages 1 and 2 that were significantly associated with Alzheimer’s disease risk in stage 2 after a strict Bonferroni correction for multiple testing (P < 4.3 × 10−6). Of these 116 SNPs, 80 had been associated at genome-wide significance with Alzheimer’s disease risk in stage 1. Additionally, in analyses in stage 2, 2,562 SNPs were associated with Alzheimer’s disease at a nominal level of significance (P < 0.05), having the same risk allele and direction of association as in stage 1.
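The Bonferroni threshold quoted above follows directly from the number of SNPs tested in stage 2. As a quick check, assuming the conventional family-wise error rate of 0.05 (the text does not state it explicitly):

```python
n_tests = 11_632   # SNPs analyzed in stage 2 (from the text)
alpha = 0.05       # assumed family-wise error rate

# Bonferroni correction: divide alpha by the number of tests.
threshold = alpha / n_tests
print(f"{threshold:.1e}")
```

This agrees with the reported cutoff of P < 4.3 × 10−6.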

The results from stages 1 and 2 and from the combined stage 1 and stage 2 data sets, which represent a secondary discovery effort, are shown in Table 2. With the exception of CD33 and DSG2, we nominally replicated all loci that surpassed the genome-wide significance level in stage 1. Inability to replicate DSG2 is not surprising, as evidence of association for this locus was based on data for a single SNP and was not supported by data from surrounding SNPs in linkage disequilibrium (LD, r² > 0.8; Supplementary Fig. 7b). Moreover, seven new loci reached the genome-wide significance level in the combined analysis (Table 2). More detailed results for the seven newly identified LOAD loci are provided in Supplementary Figures 8–11. There was no significant heterogeneity across studies at any of the loci, except at DSG2 (Table 2 and Supplementary Figs. 12–16). To identify potential causative genes, we also examined all SNPs with association P < 5 × 10−8 that were within 500 kb of the top SNP at each locus to identify cis expression quantitative trait locus (cis-eQTL) associations (Online Methods and Supplementary Table 3).

The results from the combined stage 1 and stage 2 data sets also identified 13 loci with suggestive evidence of association (P < 1 × 10−6) (Supplementary Table 4). Among these, we detected a signal for rs9381040 (P = 6.3 × 10−7), which is located approximately 5.5 kb away from the 3′ end of TREML2 and 24 kb away from the 5′ end of TREM2. TREM2 was recently reported to carry a rare variant (encoding p.Arg47His) associated with three- to fourfold increased risk of developing Alzheimer’s disease8,9. This region also reached genome-wide significance in a study of cerebrospinal fluid levels of phosphorylated tau, a biomarker for Alzheimer’s disease11.

Beyond the already known, GWAS-defined genes (ABCA7, BIN1, CD33, CLU, CR1, CD2AP, EPHA1, MS4A6A–MS4A4E and PICALM), the most significant new association was in the HLA-DRB5–DRB1 region (encoding major histocompatibility complex, class II, DRβ5 and DRβ1, respectively). This region is associated with immunocompetence and histocompatibility and, interestingly, with risk of both multiple sclerosis and Parkinson disease12,13. Owing to the complex genetic organization of the human leukocyte antigen (HLA) region on chromosome 6, we were unable to define which gene(s) are responsible for this signal (Supplementary Fig. 6a).

The second strongest signal was within the SORL1 gene (encoding sortilin-related receptor, L(DLR class) 1). Our data clearly demonstrated that this gene was associated at genome-wide significance in European samples. SORL1 is noteworthy, as it is associated with increased risk of both autosomal dominant and sporadic forms of Alzheimer’s disease14,15 and represents the first LOAD gene that directly connects aberrant trafficking and metabolism of the amyloid precursor protein (APP) to LOAD14.

The third locus, PTK2B (encoding protein tyrosine kinase 2β), is only approximately 130 kb away from CLU, but we believe the two signals are independent because (i) the two most strongly associated SNPs within each of these two genes are not in LD (D′ = 0.06 and r² = 0.003 as computed using 1000 Genomes Project data); (ii) a recombination peak exists between the two loci (Fig. 2); and (iii) conditional analysis in the stage 2 data confirmed the independence of the PTK2B association (Supplementary Fig. 17 and Supplementary Table 5). The protein encoded by PTK2B may be an intermediate between neuropeptide-activated receptors or neurotransmitters that increase calcium flux and the downstream signals regulating neuronal activity such as mitogen-activated protein kinase (MAPK) signaling16. PTK2B is involved in the induction of long-term potentiation in the hippocampal CA1 (cornu ammonis 1) region, a central process in the formation of memory17. We cannot, however, exclude the possibility that there are multiple signals in the PTK2B–CLU region that are functionally connected to a single gene. For instance, two SNPs associated with genome-wide significance in the PTK2B–CLU region are eQTLs for the gene DPYSL2 that has been implicated in Alzheimer’s disease18 (Supplementary Table 3).
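For readers unfamiliar with the two LD measures cited for the PTK2B and CLU SNPs, the sketch below shows the standard definitions of D′ and r² for two biallelic sites. It is illustrative only: the function name is hypothetical, the inputs are population haplotype and allele frequencies (not values from this study), and both sites are assumed to be polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Two hypothetical extremes: complete LD and complete independence.
print(ld_stats(0.50, 0.5, 0.5))  # alleles always travel together
print(ld_stats(0.25, 0.5, 0.5))  # haplotype frequency equals product of allele frequencies
```

Values near zero for both measures, as reported for the two top SNPs here, are what independent signals would be expected to show.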

Figure 2.


Regional plot for the PTK2B-CLU locus (17,008 cases and 37,154 controls).

The fourth locus was SLC24A4 (encoding solute carrier family 24 (sodium/potassium/calcium exchanger), member 4). The SLC24A4 gene encodes a protein involved in iris development and hair and skin color variation in humans in addition to being associated with the risk of developing hypertension19,20. SLC24A4 is also expressed in the brain and may be involved in neural development21. Of note, in the vicinity of the most strongly associated SNP is another gene called RIN3 (encoding Ras and Rab interactor 3), and its gene product directly interacts with the BIN1 gene product22, a protein that may be connected to tau-mediated pathology23.

In addition to these four loci reaching genome-wide significance in stage 1, seven new loci reached genome-wide significance in the combined analysis.

The strongest association at one of these new loci was intronic in the ZCWPW1 gene (encoding zinc finger, CW type with PWWP domain 1), whose corresponding protein modulates epigenetic regulation24. However, the region defined by all the SNPs associated with Alzheimer’s disease risk in our data is large and contains about ten genes (Supplementary Fig. 9b). Another interesting possible candidate gene in the ZCWPW1 region is NYAP1, as disruption of the corresponding gene in mice affects brain size, neurite elongation and, more generally, neuronal morphogenesis25. Our data do not resolve which gene in this region may be causal.

A second locus was within the CELF1 gene (encoding CUGBP, Elav-like family member 1), whose gene product is a member of the protein family that regulates pre-mRNA alternative splicing26. As with the ZCWPW1 locus, the region of interest is large and contains about ten genes (Supplementary Fig. 10a). Among these genes is MADD (encoding MAP kinase–activating death domain), the reduced expression of which may affect long-term neuronal viability in Alzheimer’s disease27.

A discrete signal was observed adjacent to NME8 (encoding NME/NM23 family member 8), which is responsible for primary ciliary dyskinesia type 6 (ref. 28).

The FERMT2 gene (encoding fermitin family member 2) is expressed in the brain. Its corresponding protein localizes to cell matrix adhesion structures, activates integrins, is involved in the orchestration of actin assembly and cell shape modulation, and is an important mediator of angiogenesis29. An association between the Drosophila melanogaster ortholog of FERMT2 (fit1/fit2) and tau-mediated toxicity was recently described30.

We identified a fifth signal on chromosome 20 at CASS4 (encoding Cas scaffolding protein family member 4). Little is known about the function of the encoded protein. However, the Drosophila CASS family ortholog (p130CAS) binds to CMS, the Drosophila ortholog of CD2AP, a known Alzheimer’s disease susceptibility gene (Table 2) that is involved in actin dynamics31.

Another locus was identified at INPP5D (encoding inositol polyphosphate-5-phosphatase, 145 kDa) on chromosome 2. INPP5D is expressed at low levels in the brain, but the encoded protein has been shown to interact with CD2AP, whose corresponding gene is one of the Alzheimer’s disease genes previously identified by GWAS32, and to modulate, along with GRB2, metabolism of APP33.

We identified a seventh signal adjacent to MEF2C (encoding myocyte enhancer factor 2C). Mutations at this locus are associated with severe mental retardation, stereotypic movements, epilepsy and cerebral malformation34. The MEF2C protein limits excessive synapse formation during activity-dependent refinement of synaptic connectivity and thus may facilitate hippocampal-dependent learning and memory35.

In summary, our Alzheimer’s disease GWAS meta-analysis has identified 11 new susceptibility loci in addition to the already known ABCA7, APOE, BIN1, CLU, CR1, CD2AP, EPHA1, MS4A6A–MS4A4E and PICALM genes. However, we were not able to replicate association of CD33 in our stage 2 analysis (P = 0.61). We did not detect any biases in terms of imputation in our discovery data sets or genotyping in our replication data sets (data not shown), suggesting a potential statistical fluctuation across our populations as an explanation for the lack of replication. However, recent data suggest that genetically determined decreased CD33 expression might reduce Alzheimer’s disease risk and interfere with amyloid β peptide clearance36, a dysfunction thought to be central in late-onset forms of Alzheimer’s disease37. Further investigations in independent case-control studies will thus be required to confirm or refute the association of CD33 with Alzheimer’s disease.

The newly associated loci reinforce the importance of some previously suspected pathways such as APP (SORL1 and CASS4) and tau (CASS4 and FERMT2) in pathology. Several candidate genes at these loci are involved in pathways already shown to be enriched for association signal in Alzheimer’s disease GWAS38,39, such as immune response and inflammation (HLA-DRB5–DRB1, INPP5D and MEF2C), which is also supported by the described association of Alzheimer’s disease with CR1 (ref. 3) and TREM2 (refs. 8,9), cell migration (PTK2B) and lipid transport and endocytosis (SORL1). Our results also suggest the existence of new pathways underlying Alzheimer’s disease. These pathways could include hippocampal synaptic function (MEF2C and PTK2B), cytoskeletal function and axonal transport (CELF1, NME8 and CASS4), regulation of gene expression and post-translational modification of proteins, and microglial and myeloid cell function (INPP5D).

Examining the genetic effect attributable to all the associated loci, we demonstrated that the most strongly associated SNPs at each locus other than APOE had population-attributable fractions (PAFs) or preventive fractions between 1.0% and 8.0% in the stage 2 sample (Supplementary Table 6). Strong efforts in sequencing and post-GWAS analyses will now be required to fully characterize the candidate genes and functional variants responsible for the association of these GWAS-identified loci with Alzheimer’s disease risk and to understand their exact roles in the pathophysiology of Alzheimer’s disease40,41.

URLs. Database access, http://www.pasteur-lille.fr/en/recherche/u744/Igap_stage1.zip; IMPUTE2, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html; MaCH, http://www.sph.umich.edu/csg/abecasis/MACH/; ProbABEL, http://www.genabel.org/packages/ProbABEL; SMARTPCA, http://www.hsph.harvard.edu/alkes-price/software/; GWAMA, http://www.well.ox.ac.uk/gwama/; LocusZoom, http://csg.sph.umich.edu/locuszoom/; PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/; SNPTEST, https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html; Aberrant, http://www.well.ox.ac.uk/software; Metal, http://www.sph.umich.edu/csg/abecasis/metal/; R, http://www.r-project.org/; R meta, http://cran.r-project.org/web/packages/rmeta/index.html; eQTL analyses (accessed 18 February 2013), http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl.

ONLINE METHODS

All case-control studies are described in Table 1, in the Supplementary Note (see full description of the I-GAP data sets) and in Supplementary Tables 1, 7 and 8. Written informed consent was obtained from study participants or, for those with substantial cognitive impairment, from a caregiver, legal guardian or other proxy, and the study protocols for all populations were reviewed and approved by the appropriate institutional review boards.

Imputation and SNP selection for stage 1 analysis

After quality control criteria were finalized for each individual and each sample collection (SNPs with call rates of <95% were excluded; Supplementary Note), IMPUTE2 (ref. 42) or MaCH/Minimac43 software (Supplementary Table 2) was used to impute the genotypes of all participants with haplotypes derived from samples of European ancestry in the 1000 Genomes Project (2010 interim release based on the sequence data freeze from 4 August 2010 and phased haplotypes from December 2010). In each data set, SNPs with R² or info score quality estimates of less than 0.3, as indicated by MaCH or IMPUTE2, respectively (with these two quality estimates described to be equivalent), were excluded from analyses. Similarly, SNPs with a MAF of <1% were also excluded. After these procedures, a maximum of 8,133,148 SNPs were retained that were present in at least 1 data set.

In each case-control data set, the association of LOAD with genotype dosage was analyzed by a logistic regression model including covariates for age, sex and principal components to adjust for possible population stratification (Supplementary Table 2). For the three CHARGE cohorts with incident Alzheimer’s disease data, Cox proportional hazards models were used. The four consortia used different but analogous software for these analyses (PLINK44, SNPTEST45, ProbABEL46 or R; Supplementary Table 2). Three of these tools were applied to the EADI data set for quality control, and very similar results were observed. After the exclusion of SNPs showing logistic regression coefficient |β| > 5 or P value equal to 0 or 1, the maximum number of SNPs in any data set was 8,131,643. Each consortium uploaded summarized results for each SNP to an internal I-GAP website for access by members of each consortium.

SNPs genotyped or imputed in at least 40% of Alzheimer’s disease cases and 40% of control samples were included in the meta-analysis. This threshold represented the best compromise between maximizing the total number of SNPs and maximizing the number of samples in which the given SNP was present. Indeed, analyzing all SNPs available in at least one study could have greatly increased the risk of false positives. On the other hand, studying SNPs only present in all studies could have led to the removal of SNPs of potential interest, even if those SNPs could have reached adequate statistical power in a more limited number of data sets (false negatives). This approach allowed us to increase homogeneity between studies for some SNPs by excluding poor quality data present only in a limited number of data sets of small size. This last selection step led to a final number of 7,055,881 SNPs in stage 1 analysis.
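One way to read the 40% inclusion rule is sketched below, using the stage 1 sample sizes from Table 1. This is an interpretation rather than the authors' code: it assumes a SNP is either available for every individual in a consortium data set or absent from it entirely, and the function and variable names are hypothetical.

```python
# Case and control counts per stage 1 consortium, taken from Table 1.
STAGE1 = {
    "ADGC": (10_273, 10_892),
    "CHARGE": (1_315, 12_968),
    "EADI": (2_243, 6_017),
    "GERAD": (3_177, 7_277),
}
TOTAL_CASES = sum(cases for cases, _ in STAGE1.values())
TOTAL_CONTROLS = sum(controls for _, controls in STAGE1.values())


def passes_coverage(datasets_with_snp, min_fraction=0.40):
    """True if the SNP is available in >= 40% of all cases AND >= 40% of all controls."""
    cases = sum(STAGE1[name][0] for name in datasets_with_snp)
    controls = sum(STAGE1[name][1] for name in datasets_with_snp)
    return (cases / TOTAL_CASES >= min_fraction
            and controls / TOTAL_CONTROLS >= min_fraction)


# A SNP seen only in ADGC covers enough cases but too few controls.
print(passes_coverage(["ADGC"]))
# Adding CHARGE brings control coverage above the threshold.
print(passes_coverage(["ADGC", "CHARGE"]))
```

Under this reading, a SNP missing from the largest case collection (ADGC) cannot reach 40% of cases, which illustrates how the rule trades SNP count against sample coverage.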

iSelect microarray design and stage 2 SNP quality control

SNPs associated with Alzheimer’s disease risk with a P value < 1 × 10−3 in stage 1 analysis were selected for replication. A list of 19,532 SNPs was submitted to a dedicated Illumina website to develop an iSelect microarray. A total of 16,732 SNPs with an Illumina score greater than or equal to 0.4 were selected for microarray production. During the Illumina production process, 2,287 SNPs failed oligonucleotide synthesis, leading to a final number of 14,445 SNPs for which genotyping was attempted. Genotyping failure led to the exclusion of an additional 1,999 SNPs as a result of the SNPs (i) having no intensity signal (n = 559), (ii) not being polymorphic (n = 1,176), (iii) only being found in a heterozygous state (n = 248) or (iv) having mismatched alleles compared to 1000 Genomes Project data (n = 16). Finally, several quality control measures were applied to the remaining 12,446 SNPs to detect potential biases in genotyping. We first tested for discrepancies in allelic frequency between the 1000 Genomes Project EUR reference panel and stage 2 data. Allele frequencies for stage 2 data were estimated on 10,750 controls (see “Stage 2 sample quality control”) after exclusion of Finnish individuals. The allelic test was performed with PLINK, and P values were computed by performing 4,500,000 permutations to avoid an assumption of Hardy-Weinberg equilibrium. In total, 798 SNPs showed a highly significant difference in terms of allele frequency between the 1000 Genomes Project EUR reference panel and stage 2 data (P < 1 × 10−5; Supplementary Fig. 18) and were excluded from the analysis.

Other SNP quality control steps were performed separately in data for each country. A SNP was considered of low genotyping quality in a country data set if it had missing genotype data for more than 10% of the individuals, if the P value for the Hardy-Weinberg test in controls was lower than 1 × 10−6 or if the P value for the test for differences in missingness between cases and controls was lower than 1 × 10−6 (see Supplementary Table 9 for differences in missingness assessed for suggestive and significant hits across European populations). These quality control steps led to the removal of 16 SNPs with low genotyping quality in data from all countries.

After SNP quality control, 11,632 SNPs were considered to be of high genotyping quality in at least 1 country and were analyzed in stage 2. For imputed data sets, SNPs were considered to be of low imputation quality if their info score was <0.3.
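The SNP counts reported in this section can be reconciled step by step. The tally below uses only numbers stated in the text; the first exclusion count (2,800) is derived here as 19,532 minus 16,732 and is not quoted directly in the source.

```python
submitted = 19_532  # SNPs with stage 1 P < 1e-3 submitted for array design

# (reason, number of SNPs removed), in the order described in the text.
exclusions = [
    ("Illumina design score below 0.4", submitted - 16_732),
    ("failed oligonucleotide synthesis", 2_287),
    ("no intensity signal", 559),
    ("not polymorphic", 1_176),
    ("only found in heterozygous state", 248),
    ("alleles mismatched vs. 1000 Genomes", 16),
    ("allele frequency differs from EUR panel", 798),
    ("low genotyping quality in all countries", 16),
]

remaining = submitted
for reason, removed in exclusions:
    remaining -= removed
    print(f"{reason:<42} -{removed:>6,}  -> {remaining:>6,}")
```

The running total passes through the intermediate figures given in the text (14,445 and 12,446) and ends at the 11,632 SNPs analyzed in stage 2.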

Of note, of the 7,086 SNPs that we were unable to successfully genotype, only 471 were not tagged by another successfully genotyped variant (±100 kb) and associated with a P value at least 10 times higher than that of the missing SNP. Because the vast majority of the untagged SNPs exhibited stage 1 P values between 1 × 10−3 and 1 × 10−4 (92%), the likelihood of missing a true association was considered to be low.

Stage 2 sample quality control

The iSelect microarray contained 33,368 SNPs, of which 11,632 were devoted to stage 2. These supplementary SNPs included various genetic data that allowed us to further refine our quality control processes. On the basis of data for all of these SNPs, we excluded individuals who had more than 3% missing genotypes, showed a discrepancy between reported sex and sex estimated on the basis of genetic data (genetic sex) or showed evidence of non-European ancestry. Duplicated and related individuals were identified (Supplementary Table 10). Briefly, discrepancies in sex were examined using genetic sex as estimated by PLINK on 40 SNPs on chromosome X. We also removed 93 individuals from a single plate for whom an abnormal number of discrepancies in sex were observed, suggesting that sample mixing had occurred. Using a panel of 261 ancestry-informative markers (AIMs), we performed a principal-component analysis (PCA) on HapMap 2 data with the function SMARTPCA from EIGENSOFT 4.2 software47. For each country, individuals were projected onto the first two PCA axes to define their genetic ancestry. Individuals with evidence of non-European ancestry were then identified by applying a Bayesian clustering approach48 to their coordinates on the first two axes. Identity by descent (IBD) was computed for all pairs of individuals using PLINK, and individuals in a pair with IBD greater than 0.98 were considered to be duplicates. If clinical data for duplicated individuals were discordant, both individuals were excluded. Otherwise, the individual with the greater proportion of missing genotype was excluded. Similarly, IBD was computed for all pairs of individuals in data from each country separately, using 6,764 autosomal SNPs with MAF of >1% and selected to minimize LD. Individuals in pairs with IBD greater than 0.2 were considered to be related and were iteratively removed so as to obtain a sample of unrelated individuals within each country data set.

Finally, individuals with missing clinical data and controls less than 25 years of age were excluded from the analysis. After sample quality control (Supplementary Table 10), 19,884 individuals (8,572 cases and 11,312 controls) were available for analysis in stage 2.

Statistical analysis

For the stage 1 meta-analysis, we undertook fixed-effects inverse variance-weighted meta-analysis with the standard errors of the β-coefficient scaled by the square roots of study-specific genomic inflation factors estimated before combining the summary statistics across data sets. Each consortium performed an independent stage 1 meta-analysis after downloading the data files available on the I-GAP website. Two software packages were used for meta-analysis: METAL49 and GWAMA50. Very similar results were generated independently of the software used and, as expected, perfect matching was observed between the analyses undertaken by each of the 4 consortia.
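A minimal sketch of fixed-effects inverse variance-weighted meta-analysis with genomic-control scaling of the standard errors is given below. It is not the METAL or GWAMA implementation. For illustration it pools the rounded stage 1 and stage 2 odds ratios for rs6656401 (CR1) from Table 2, recovering standard errors from the 95% confidence intervals; the study itself pooled per-consortium and per-country estimates, so the output only approximates the published overall result. Function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the SE of a log odds ratio from its 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ivw_meta(betas, ses, lambdas=None):
    """Fixed-effects inverse variance-weighted pooling.

    lambdas: optional per-study genomic inflation factors; each SE is
    multiplied by sqrt(lambda) before weighting (lambda < 1 treated as 1).
    """
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    scaled = [se * math.sqrt(max(lam, 1.0)) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / s ** 2 for s in scaled]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Two-sided P value from the normal approximation.
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Rounded values for rs6656401 from Table 2 (stage 1, stage 2).
betas = [math.log(1.17), math.log(1.21)]
ses = [se_from_ci(1.12, 1.22), se_from_ci(1.14, 1.28)]

beta, se, p = ivw_meta(betas, ses)
odds_ratio = math.exp(beta)
print(f"pooled OR = {odds_ratio:.2f}")
```

The pooled odds ratio comes out close to the overall value of 1.18 in Table 2; the P value differs from the published one because the inputs are rounded and the pooling is done at a coarser level.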

For stage 2, association tests were performed for each country for all high-quality genotyped SNPs under an additive model, using logistic regression as implemented in PLINK. Analysis was adjusted for age, sex and principal components, when necessary. Using SMARTPCA, PCA was performed on individuals from each country separately. Differences in PCA coordinates between cases and controls were tested for the first four principal components, and analysis was further adjusted on principal components if the P value of this test was lower than 0.05. PCA for Bonn stage 2 samples was based on GWAS data. For imputed data sets, association tests were performed using likelihood score tests for missing data as implemented in SNPTEST. Genotyped and imputed German samples were analyzed separately, and results were then combined by fixed-effects meta-analysis using the inverse variance approach as implemented in METAL. Using this approach, a fixed-effects meta-analysis was then performed to combine stage 2 results from the different countries. We also performed the analysis separately for each center in stage 2 and combined the results by fixed-effects meta-analysis. Results were similar to those obtained when analysis was performed by country (data not shown).

We finally generated fixed-effects inverse variance–weighted meta-analyses by combining summary statistics across ADGC, CHARGE, EADI, GERAD and stage 2 data by country. At this point, we performed Cochran’s Q test for heterogeneity and generated I² estimates with METAL to evaluate the possible effect of study heterogeneity on the results.
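For reference, the two heterogeneity measures named above can be computed from per-study summary statistics as in the following sketch. It uses the standard definitions, with Q as the weighted sum of squared deviations from the pooled estimate and I² = max(0, (Q − df)/Q) × 100. It is not METAL's code, the function name is hypothetical, and the P value for Q (from a chi-square distribution with df degrees of freedom) is omitted.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q statistic, its degrees of freedom and I^2 (in percent).

    betas: per-study effect estimates
    ses:   per-study standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 is truncated at zero when Q is smaller than its expectation (df).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

An I² near zero indicates that the variation between studies is no larger than expected from sampling error alone.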

A graphic representation of the association signal in the stage 1 data was generated with LocusZoom software51 for all the loci of interest reaching a genome-wide significant level after combined stage 1 and stage 2 analyses.

PAF was calculated using the Levin equation52.
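A minimal sketch of the Levin formula follows, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the frequency of the exposure in the population and RR is the relative risk. Using the odds ratio as an approximation of RR and the risk allele frequency as p are assumptions of this example, as is the function name.

```python
def levin_paf(exposure_freq, relative_risk):
    """Population attributable fraction by the Levin formula.

    exposure_freq: proportion of the population exposed (between 0 and 1)
    relative_risk: relative risk; an odds ratio is often substituted as an
                   approximation (an assumption, not stated in the text)
    """
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical example: exposure frequency 0.40, odds ratio 1.20.
paf = levin_paf(0.40, 1.20)
```

A relative risk below 1 yields a negative value, so protective effects need separate handling.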

Annotation of I-GAP top SNPs for eQTLs

To gain further biological insights, we explored reported associations between SNPs in the top I-GAP loci and gene expression. We first selected all SNPs that reached genome-wide significance (P value ≤ 5 × 10⁻⁸) in the combined stage 1 and stage 2 analysis and were located in a 500-kb window upstream or downstream of the top SNP at each locus (Table 2). We then searched for published data on gene expression associated with each of these SNPs in the eQTL database from the Pritchard laboratory (see URLs). For each reported eQTL gene and each type of eQTL association as defined in this database, we then counted the number of reported eQTL SNPs and selected the one with the lowest P value.
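The selection and counting steps can be sketched as below. The record layout (dictionaries with the keys `id`, `chrom`, `pos`, `p`, `snp`, `gene` and `eqtl_type`) and the function names are hypothetical and do not reflect the format of the eQTL database. The sketch shows only the filtering and grouping logic.

```python
from collections import defaultdict

GENOME_WIDE_P = 5e-8   # genome-wide significance threshold
WINDOW_BP = 500_000    # 500 kb on either side of the top SNP

def snps_near_top(snps, top_snp):
    """Genome-wide significant SNPs within 500 kb of the top SNP."""
    return [s for s in snps
            if s["chrom"] == top_snp["chrom"]
            and abs(s["pos"] - top_snp["pos"]) <= WINDOW_BP
            and s["p"] <= GENOME_WIDE_P]

def summarize_eqtls(eqtl_records, selected_ids):
    """Group eQTL records by (gene, eQTL type) for the selected SNPs.

    Returns {(gene, eqtl_type): (number of distinct SNPs, best record)},
    where the best record is the one with the lowest P value.
    """
    groups = defaultdict(list)
    for record in eqtl_records:
        if record["snp"] in selected_ids:
            groups[(record["gene"], record["eqtl_type"])].append(record)
    return {key: (len({r["snp"] for r in records}),
                  min(records, key=lambda r: r["p"]))
            for key, records in groups.items()}
```

In use, the identifiers returned by `snps_near_top` for each locus would be passed as `selected_ids` to `summarize_eqtls`.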

Supplementary Material

Supplementary note, tables and figures

ACKNOWLEDGMENTS

This work was made possible by the generous participation of the control subjects, the patients and their families. iSelect chips were funded by the French National Foundation on Alzheimer’s Disease and Related Disorders. Data management involved the Centre National de Génotypage and was supported by the Institut Pasteur de Lille, INSERM, FRC (Fondation pour la Recherche sur le Cerveau) and Rotary. This work has been developed and supported by the LABEX (Laboratory of Excellence Program Investment for the Future) DISTALZ grant (Development of Innovative Strategies for a Transdisciplinary Approach to Alzheimer’s Disease). The French National Foundation on Alzheimer’s Disease and Related Disorders and the Alzheimer’s Association (Chicago, Illinois) grant supported in-person meetings and communication for I-GAP, and the Alzheimer’s Association (Chicago, Illinois) grant provided some funds to each consortium for analyses.

GERAD was supported by the Wellcome Trust, the MRC, Alzheimer’s Research UK (ARUK) and the Welsh government. ADGC and CHARGE were supported by the US National Institutes of Health, National Institute on Aging (NIH-NIA), including grants U01 AG032984 and R01 AG033193 (additional US National Institutes of Health grant numbers are listed in the Supplementary Note). CHARGE was also supported by Erasmus Medical Center and Erasmus University.

Complete acknowledgments are detailed in the Supplementary Note.

Footnotes

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

AUTHOR CONTRIBUTIONS

Study concept and design: J.-C.L., C.A.I.-V., D. Harold, A.C.N., A.L.D., J.C.B., A.V.S., M.A.I., H. Schmidt, A.L.F., V.G., O.L.L., D.W.T., D. Blacker, T.H.M., T.B.H., J.I.R., W.A.K., M. Boada, R. Schmidt, R.M., A.H., B.M.P., J.L.H., P.A.H., M.L., M.A.P.-V., L.J.L., L.A.F., C.M.v.D., V.M., S. Seshadri, J.W., G.D.S. and P.A. Acquisition of data: J.-C.L., C.A.I.-V., D. Harold, C. Bellenguez, R. Sims, G.J., B.G.-B., G.R., N.J., V.C., C. Thomas, D.Z., Y.K., A.G., H. Schmidt, M.L.D., M.-T.B., S.-H.C., P.H., V.G., C. Baldwin, C.C., C. Berr, O.L.L., P.L.D.J., D.E., L. Letenneur, G.E., K.S., A.M.G., N.F., M.J.H., M.I.K., E.B.L., A.J.M., C.D., S.T., S. Love, E.R., P.S.G.-H., L.Y., M.M.C., D. Beekly, F.Z., O.V., S.G.Y., W.G., M.J.O., K.M.F., P.V.J., M.C.O., L.B.C., D.A.B., T.B.H., R.F.A.G.d.B., T.J.M., J.I.R., K.M., T.M.F., W.A.K., J.F.P., M.A.N., K.R., J.S.K.K., E.B., M.R., M. Boada, L.-S.W., J.-F.D., C. Tzourio, M.M.N., B.M.P., L.J., J.L.H., M.L., L.J.L., L.A.F., A.H., C.M.v.D., S. Seshadri, J.W., G.D.S. and P.A. Sample contribution: A. Ruiz, F. Pasquier, A. Ramirez, O.H., J.D.B., D. Campion, P.K.C., C. Baldwin, T.B., C.C., D. Craig, V.D., J.A.J., S. Lovestone, F.J.M., D.C.R., K.S., A.M.G., N.F., M.G., K. Brown, M.I.K., L.K., P.B.-G., B.M., R.G., A.J.M., D.W., E.R., J.G., P.S.G.-H., J.C., A.L., A. Bayer, M.T., P. Bossù, G.S., P. Proitsi, J.C., S. Sorbi, F.S.-G., N.C.F., J.H., M.C.D.N., P. Bosco, R.C., C. Brayne, D.G., M. Mancuso, F.M., S. Moebus, P.M., M.D.Z., W.M., H. Hampel, A.P., M. Bullido, F. Panza, P.C., B.N., M. Mayhaus, L. Lannfelt, H. Hakonarson, S.P., M.M.C., M.I., V.A., S.G.Y., E.C., C. Razquin, P. Pastor, I.M., O.C., H. Soininen, S. Mead, D.A.B., L.F., C.H., P. Passmore, T.J.M., K. Bettens, A. Brice, D. Hannequin, K.R., M.R., M.H., D.R., C.G. and C.V.B. Data analysis: C.A.I.-V., D. Harold, A.C.N., R. Sims, C. Bellenguez, G.J., A.L.D., J.C.B., G.W.B., B.G.-B., G.R., T.A.T.-W., N.J., A.V.S., V.C., M.A.I., D.Z., Y.K., B.N.V., C.-F.L., A.G., B.K., C. Reitz, J.R.G., O.V., W.A.K., K.L.L., K.L.H.-N., E.R.M., L.-S.W., B.M.P., M.L., V.M. and J.W. Statistical analysis and interpretation: J.-C.L., C.A.I.-V., D. Harold, A.C.N., C. Bellenguez, G.J., A.L.D., J.C.B., G.W.B., T.A.T.-W., A.V.S., V.C., M.A.I., B.N.V., Y.K., C.-F.L., B.K., C. Reitz, A.L.F., N.A., J.R.G., R.F.A.G.d.B., W.A.K., K.L.L., E.R.M., L.-S.W., B.M.P., L.J., J.L.H., P.A.H., M.A.P.-V., L.J.L., L.A.F., C.M.v.D., V.M., S. Seshadri, J.W., G.D.S. and P.A. Drafting of the manuscript: J.-C.L., C.A.I.-V., D. Harold, A.C.N., C. Bellenguez, A.L.D., J.C.B., A.V.S., R.M., B.M.P., J.L.H., M.A.P.-V., L.J.L., L.A.F., C.M.v.D., C.V.B., S. Seshadri, J.W., G.D.S. and P.A.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

References
