Genome-wide association study confirms SNPs in SNCA and the MAPT region as common risk factors for Parkinson disease

Author manuscript; available in PMC: 2011 Mar 1.

SUMMARY

Parkinson disease (PD) is a chronic neurodegenerative disorder with a cumulative prevalence of greater than one per thousand. To date, three independent genome-wide association studies (GWAS) have investigated genetic susceptibility to PD. These studies implicated several genes as PD risk loci with strong, but not genome-wide significant, associations.

In this study, we combined data from two previously published GWAS of Caucasian subjects with our GWAS of 604 cases and 619 controls for a joint analysis with a combined sample size of 1752 cases and 1745 controls. SNPs in SNCA (rs2736990, p-value = 6.7×10−8; genome-wide adjusted p = 0.0109, odds ratio (OR) = 1.29 [95% CI: 1.17–1.42] G vs. A allele, population attributable risk percent (PAR%) = 12%) and the MAPT region (rs11012, p-value = 5.6×10−8; genome-wide adjusted p = 0.0079, OR = 0.70 [95% CI: 0.62–0.79] T vs. C allele, PAR% = 8%) were genome-wide significant. No other SNPs were genome-wide significant in this analysis. This study confirms that common variants in SNCA and the MAPT region are major influences on the risk of PD.

Keywords: Parkinson disease, Association study, Alpha-synuclein, Microtubule associated protein tau

INTRODUCTION

Parkinson disease [PD (OMIM 168600)] is a chronic neurodegenerative disorder with a cumulative prevalence of greater than one per thousand (Kuopio et al. 1999), with at least 1.5 million cases in the United States and 6 million worldwide (Thomas & Beal 2007). Some genetic contributions to PD are well recognized. High-penetrance mutations were the first to be identified, because their effects are easily detected in relatively rare, early-onset, Mendelian forms of PD (Bonifati et al. 2003; Kitada et al. 1998; Paisan-Ruiz et al. 2004; Polymeropoulos et al. 1997; Zimprich et al. 2004). These known mutations explain less than 10% of PD cases (Lesage & Brice 2009). Over the last several years, considerable effort has focused on the contributions of common variants to PD risk and age at onset. Candidate gene association studies have been used to follow up family-based genome-wide linkage studies that identified genomic regions containing risk loci (reviewed in Lesage & Brice 2009). This focused approach limits the number of association tests performed, but it can identify only loci detectable by linkage analysis (Risch & Merikangas 1996). More recently, three genome-wide association studies (GWAS) have been conducted in PD, albeit with results that did not reach genome-wide significance (Fung et al. 2006a; Maraganore et al. 2005; Pankratz et al. 2009). While genes that achieve genome-wide significance in a GWAS of a complex disease are important, genetic heterogeneity, unobserved environmental interactions and subphenotypes with distinct genetic etiologies can all reduce the apparent contribution of important genes to below the threshold for genome-wide significance, so large samples must be assembled for adequate power.

Associations with PD have been replicated in both the candidate gene and GWAS contexts, including those described early in PD association studies, such as alpha-synuclein (SNCA, Entrez Gene ID (EGI):6622) (Farrer et al. 2001; Kruger et al. 1999; Maraganore et al. 2006; McCulloch et al. 2008; Mueller et al. 2005; Myhre et al. 2008; Sutherland et al. 2009) and the microtubule-associated protein tau (MAPT, EGI:4137) inversion region on chromosome 17 (Fidani et al. 2006; Fung et al. 2006b; Goris et al. 2007; Healy et al. 2004; Kwok et al. 2004; Levecque et al. 2004; Mamah et al. 2005; Martin et al. 2001; Scott et al. 2001; Skipper et al. 2004; Vandrovcova et al. 2007; Winkler et al. 2007; Zabetian et al. 2007; Zappia et al. 2003), as well as ubiquitin-specific protease 24 (USP24, EGI:23358) (Haugarvoll et al. 2009; Li et al. 2006; Oliveira et al. 2005), ELAV-like 4 (ELAVL4, EGI:1996) (DeStefano et al. 2008; Haugarvoll et al. 2007; Noureddine et al. 2005), monoamine oxidase B (MAOB, EGI:4129) (Kurth et al. 1993), apolipoprotein E (APOE, EGI:348) (Rubinsztein et al. 1994), and the mitochondrial haplogroups (Autere et al. 2004; Gaweda-Walerych et al. 2008; Ghezzi et al. 2005; Huerta et al. 2005; Kosel et al. 1998; Pyle et al. 2005; Ross et al. 2003; van der Walt et al. 2003). The consistency of these results, particularly for SNCA and MAPT, suggests that the failure to reach genome-wide significance in previous studies is due to the relatively small size of the GWAS datasets.

We have conducted a GWAS of PD at the Miami Institute for Human Genomics (MIHG) in a sample of 604 unrelated cases and 619 unrelated controls using 491,376 autosomal and sex chromosome SNPs. To increase the power of the analysis, we included data from two of the three previous PD GWAS: the National Institute of Neurological Disorders and Stroke (NINDS) study (Fung et al. 2006a) and the joint dataset from the Progeni/GenePD studies that was genotyped at the Center for Inherited Disease Research (CIDR) (Pankratz et al. 2009). We excluded the Mayo Clinic GWAS, which used discordant sibling pairs as subjects for analysis (Maraganore et al. 2005). This provided a joint analysis dataset with a combined sample size of 1,752 cases and 1,745 controls with genotypes at 422,322 SNPs after imputation and sample and SNP quality control procedures.

We demonstrate in our Caucasian-based population that the SNCA and MAPT regions are the strongest genetic contributors to PD risk, reaching genome-wide significance and establishing these factors without controversy. In addition, several other genes showed associations that replicated in all three datasets at less stringent significance levels. Although these did not achieve genome-wide significance in the joint analysis, the consistency of their effects makes them strong candidates that may provide additional insight into the pathological mechanisms of PD.

MATERIALS AND METHODS

Samples

Samples in the MIHG GWAS include individuals with PD collected by one of 13 ascertainment centers in the PD Genetics Collaboration (Scott et al. 2001) or by the Morris K. Udall Parkinson Disease Center of Excellence (J.M. Vance, PI) ascertainment core. These participants were recruited by participating movement disorder and neurology clinics, referrals, and advertisements. Unaffected spouse and friend controls were recruited when available and willing to participate. All participants provided written informed consent, in accord with protocols established by institutional review boards at each center.

All individuals with PD were examined by a board-certified neurologist, and a neurological examination and standard clinical evaluation were performed on all participants with PD. Affected individuals exhibited at least two cardinal signs of PD (e.g., bradykinesia, resting tremor and rigidity) and had no other causes of parkinsonism or atypical clinical features. Unaffected individuals had no symptoms of PD on physical examination and on a self-reported symptom questionnaire (Rocca et al. 1998). Individuals were excluded if they had a history of encephalitis, neuroleptic therapy within one year before diagnosis, evidence of normal pressure hydrocephalus, or a clinical course with unusual features suggesting atypical or secondary parkinsonism. Additionally, a blood sample, family history, medical history and a standard cognitive test (the Blessed Orientation Memory Concentration (BOMC) test (Katzman et al. 1983) or the Modified Mini-Mental State Examination (3MS) (Folstein et al. 1975)) were obtained for each individual. To ensure diagnostic consistency across sites, clinical data for all participants were reviewed by a panel consisting of a board-certified neurologist with fellowship training in movement disorders, a board-certified neurologist and medical geneticist, and a certified physician assistant.

Genotyping

Genotypes for 635 PD cases and 255 controls were generated using the Illumina Infinium 610-quad BeadChip (Illumina, San Diego, CA, USA) and the Illumina Infinium II assay protocol (Gunderson et al. 2005). We also included 223 cognitively normal controls with no PD symptoms on a self-reported symptom questionnaire (Rocca et al. 1998) from a previous GWAS of late-onset Alzheimer disease (LOAD) (Beecham et al. 2009), genotyped using the Illumina HumanHap 550 BeadChip, and another 164 cognitively normal controls with no self-reported PD symptoms on the same questionnaire (Rocca et al. 1998) from a second LOAD study, genotyped using the 1M-Duo Infinium HD BeadChip. Genotypes were called using the Illumina BeadStudio Genotyping Module version 3.2.33; following the manufacturer's recommendation, samples with genotyping efficiency of at least 99% were used to redefine genotype clusters. Concordance of genotype calls for two CEPH samples with six replicates each was 99.98%.

Quality Control

Samples with genotyping efficiency greater than 98% were included in subsequent QC and statistical analysis steps; one case and seven control samples were removed for low efficiency. Population stratification was assessed using Structure (Falush et al. 2003; Pritchard et al. 2000) and Eigenstrat (Price et al. 2006). For the Structure analysis, 5,000 independent autosomal SNPs with minor allele frequency (MAF) > 0.25 were selected using PLINK (Purcell et al. 2007) with an r2 pruning threshold of 0.2, and Structure was run with 10,000 burn-in iterations followed by 15,000 estimation iterations. This analysis indicated that no stratification was present in our sample (Supplemental Figure 1a). For the Eigenstrat analysis, 30,000 independent autosomal SNPs with MAF > 0.25 were used to plot the principal component loadings of samples and to remove outliers (6 cases, 7 controls), defined as samples lying more than six standard deviations from the mean on any of the top ten principal components, over five iterations (Supplemental Figure 1b).
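The Eigenstrat outlier rule described above (top ten principal components, a six-standard-deviation threshold, up to five iterations) can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the Eigenstrat/smartpca code: the function name `remove_pca_outliers` is hypothetical, genotypes are taken as a samples × SNPs matrix of 0/1/2 allele counts, and each SNP is centred and scaled by its binomial standard deviation, a common but assumed normalization.

```python
import numpy as np


def remove_pca_outliers(genotypes, n_pcs=10, n_sd=6.0, n_iter=5):
    """Return a boolean mask of samples kept after iterative PCA outlier removal.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts (hypothetical input).
    A sample is flagged if its score on any of the top `n_pcs` principal
    components lies more than `n_sd` standard deviations from the mean;
    flagged samples are dropped and the PCA is recomputed, up to `n_iter` times.
    """
    keep = np.ones(genotypes.shape[0], dtype=bool)
    for _ in range(n_iter):
        g = genotypes[keep].astype(float)
        p = g.mean(axis=0) / 2.0                  # allele frequency per SNP
        sd = np.sqrt(2.0 * p * (1.0 - p))         # binomial SD of allele counts
        poly = sd > 0                             # skip monomorphic SNPs
        x = (g[:, poly] - 2.0 * p[poly]) / sd[poly]
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        k = min(n_pcs, s.size)
        scores = u[:, :k] * s[:k]                 # sample scores on the top PCs
        spread = scores.std(axis=0)
        spread[spread == 0] = 1.0                 # guard against degenerate PCs
        z = (scores - scores.mean(axis=0)) / spread
        flagged = (np.abs(z) > n_sd).any(axis=1)
        if not flagged.any():
            break
        keep[np.flatnonzero(keep)[flagged]] = False
    return keep
```

Recomputing the components after each removal matters because a single extreme sample can dominate a component and mask other, milder outliers.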

Further quality control steps for the MIHG samples included checks for duplicated and related samples using mean identity by state (22 cases were removed as duplicates). Two cases and nine controls with excess or insufficient heterozygosity on the sex chromosomes were also removed. Finally, SNPs were removed for MAF < 0.01 in the combined case-control dataset (21,976 SNPs), for significant deviation from Hardy-Weinberg equilibrium (HWE) in controls (p < 1×10−7; 78 SNPs), and for differential missing rates between cases and controls (p < 1×10−5; 29 SNPs).
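The HWE filter above needs a per-SNP p-value; Table 2 reports exact Hardy-Weinberg test results, but the text does not name the specific algorithm. The following is a minimal sketch, assuming the exact test of Wigginton, Cutler and Abecasis (2005), which enumerates every possible heterozygote count for the observed allele counts. The function name `hwe_exact_p` is hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Two-sided exact test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Probabilities of each possible heterozygote count (given the allele counts)
    are built by a recurrence from a central starting value, then normalized.
    """
    n_rare_hom = min(n_hom1, n_hom2)
    n_common_hom = max(n_hom1, n_hom2)
    n = n_het + n_rare_hom + n_common_hom           # genotyped individuals
    rare = 2 * n_rare_hom + n_het                   # copies of the rarer allele
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # start near the most likely het count, with the same parity as `rare`
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    # walk downwards: het -> het - 2 (one more homozygote of each type)
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # walk upwards: het -> het + 2 (one fewer homozygote of each type)
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    total = sum(probs)
    observed = probs[n_het]
    # p-value: total probability of configurations no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= observed) / total)
```

A SNP would then be dropped from the MIHG data when `hwe_exact_p(...)` computed on control genotype counts falls below 1×10−7.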

Because the controls from the LOAD GWAS datasets were genotyped on different Illumina chips, we tested all controls for homogeneity of genotype frequencies across chips. This filter detects SNPs at which genotyping error rates differ across experiments, and it is a critical QC step when multiple genotyping experiments are combined into a single dataset for analysis. At every SNP, genotype frequencies were compared across the three genotyping chips using the 3 × 3 table of genotype counts (three genotypes by three chips), giving a test with (3 − 1) × (3 − 1) = 4 degrees of freedom (df); significance was assessed with a Fisher's exact chi-square test with 10,000 permutations. SNPs with p < 0.001 for this test were removed (546 SNPs). After all QC procedures, 604 cases and 619 controls with 491,376 SNPs were available for association analysis in the MIHG GWAS dataset. The sex and age distributions for these samples are described in Table 1.
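A minimal sketch of the platform-homogeneity filter, assuming that the "Fisher's exact chi-square test with 10,000 permutations" means a Pearson chi-square statistic on the platform × genotype table whose null distribution is built by shuffling genotypes across samples; that reading, and the function names, are our assumptions.

```python
import numpy as np


def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    table = np.asarray(table, dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    nonzero = expected > 0                      # skip genotype classes never observed
    return float((((table - expected) ** 2)[nonzero] / expected[nonzero]).sum())


def genotype_by_platform(genotypes, platforms, n_platforms):
    """Count genotypes (0/1/2) per platform: rows = platforms, columns = genotypes."""
    table = np.zeros((n_platforms, 3))
    np.add.at(table, (platforms, genotypes), 1)
    return table


def homogeneity_p(genotypes, platforms, n_perm=10_000, seed=0):
    """Permutation p-value for equal genotype frequencies across platforms.

    With k platforms and 3 genotypes the asymptotic test has (k - 1) * 2 df
    (4 df for 3 chips, 8 df for 5 platforms); here the reference distribution
    is instead built by shuffling genotypes across samples.
    """
    genotypes = np.asarray(genotypes)
    platforms = np.asarray(platforms)
    k = int(platforms.max()) + 1
    observed = chi_square_stat(genotype_by_platform(genotypes, platforms, k))
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(genotypes)
        if chi_square_stat(genotype_by_platform(shuffled, platforms, k)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a SNP would be removed when `homogeneity_p` falls below 0.001. Permutation avoids relying on the asymptotic chi-square distribution when some genotype classes are rare.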

Table 1.

Demographic properties of the MIHG, CIDR and NINDS samples.

Study  Group     Male, n (freq)  Female, n (freq)  AAO, mean (SD)  AAE, mean (SD)
MIHG   All       595 (0.49)      631 (0.51)        --              66.97 (11.31)
MIHG   Cases     381 (0.63)      224 (0.37)        56.39 (12.95)   64.56 (12.18)
MIHG   Controls  214 (0.34)      407 (0.66)        --              69.32 (9.86)
CIDR   All       875 (0.50)      869 (0.50)        --              70.99 (9.32)
CIDR   Cases     358 (0.41)      523 (0.59)        62.34 (10.32)   71.21 (8.46)
CIDR   Controls  517 (0.60)      346 (0.40)        --              54.83 (13.08)
NINDS  All       339 (0.54)      289 (0.46)        --              71.73 (7.82)
NINDS  Cases     162 (0.61)      104 (0.39)        65.33 (7.86)    71.99 (6.53)
NINDS  Controls  127 (0.48)      136 (0.52)        --              69.69 (8.77)

AAO, age at onset (cases only); AAE, age at examination; SD, standard deviation; freq, proportion of the study group (all, cases or controls).

Data for the CIDR (Pankratz et al. 2009) and NINDS (Fung et al. 2006a) PD GWAS were downloaded from dbGaP (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap). Samples with genotyping efficiency < 0.98 were removed (CIDR: 9 cases, 4 controls; NINDS: 11 cases, 9 controls). SNPs were removed for genotyping efficiency < 0.98 (CIDR: 7,886 SNPs; NINDS: 16,886 SNPs), MAF < 0.01 (CIDR: 7,676 SNPs; NINDS: 4,596 SNPs), HWE p < 10−7 (CIDR: 790 SNPs; NINDS: 67 SNPs), and differential missingness by disease status at p < 10−5 (CIDR: 29 SNPs). Eigenstrat and Structure were used to confirm that the merged dataset did not contain stratified samples (Supplemental Figures 2a, b). The Fisher's exact test for homogeneity described above was run across all 5 genotyping platforms as an 8 df test ((3 − 1) × (5 − 1) = 8 for three genotypes and five platforms), removing SNPs with p < 0.001. After these QC filters had been applied, imputation was performed independently for each dataset (MIHG, CIDR and NINDS) with the software package Impute (Marchini et al. 2007), using the remaining high-quality samples and genotypes and the HapMap reference panel. Imputed genotypes were retained only if their probability was 90% or greater, and imputed SNPs with more than 5% missing genotypes were excluded from analysis. After the imputed GWAS files were merged in PLINK, the HWE, MAF, SNP genotyping efficiency and differential-missingness filters described above were reapplied. The final dataset for analysis contained 1,752 cases and 1,745 controls with genotypes at 498,571 genotyped and imputed SNPs.
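The two imputation filters above (keep a genotype only if its probability is at least 90%, then drop SNPs with more than 5% missing calls) can be expressed compactly. This is a sketch assuming the imputation output has already been loaded as an array of per-genotype posterior probabilities. The layout and the function name `call_imputed_genotypes` are hypothetical, not the Impute file format or API.

```python
import numpy as np


def call_imputed_genotypes(probs, min_prob=0.90, max_missing=0.05):
    """Hard-call imputed genotypes and drop SNPs with too much missingness.

    probs: (n_samples, n_snps, 3) posterior probabilities of genotypes 0/1/2.
    Returns (calls, kept): calls is an int8 matrix for the retained SNPs, with
    -1 marking genotypes whose best posterior probability is below min_prob;
    kept is a boolean mask over the input SNPs.
    """
    best = probs.argmax(axis=2)                       # most probable genotype
    confident = probs.max(axis=2) >= min_prob         # "90% or greater"
    calls = np.where(confident, best, -1).astype(np.int8)
    missing_rate = (calls == -1).mean(axis=0)         # per-SNP missingness
    kept = missing_rate <= max_missing                # drop SNPs with > 5% missing
    return calls[:, kept], kept
```

Applying the probability threshold before the missingness filter means that poorly imputed SNPs are removed automatically, because their uncertain genotypes become missing calls.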

Association analysis of the genotype data was conducted with PLINK (Purcell et al. 2007). Cochran-Armitage trend tests (Armitage 1955) were calculated at each SNP to assess allelic association. Additional analyses evaluating dominant, recessive and genotypic models were performed using logistic regression in PLINK. Because the Bonferroni correction is conservative and would over-correct for multiple comparisons among correlated SNPs, we used PLINK to generate empirically adjusted p-values based on 10,000 permutations (Purcell et al. 2007). Additional logistic regression analyses including age at onset (age at exam in controls), sex or history of smoking as covariates were conducted at all SNPs to assess confounding by these variables. Individuals who reported any history of smoking were scored as smokers, those who reported not smoking as non-smokers, and all others as missing.
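A minimal NumPy sketch of the two allele-based steps named above: the Cochran-Armitage trend statistic, computed here in its N·r² form, and multiple-testing adjustment by the max-statistic (max(T)) permutation method. We assume, without confirmation from the text, that the genome-wide adjusted p-values correspond to this max(T) style of adjustment. The function names are hypothetical, and this is not PLINK's code.

```python
import math

import numpy as np


def trend_chi2(dosage, status):
    """Cochran-Armitage trend chi-square (1 df) at every SNP.

    dosage: (n_samples, n_snps) array of 0/1/2 allele counts.
    status: (n_samples,) array, 1 = case, 0 = control.
    Uses the identity X^2 = N * r^2, where r is the Pearson correlation
    between allele dosage and case status.
    """
    x = dosage - dosage.mean(axis=0)
    y = status - status.mean()
    num = y @ x
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    r = np.divide(num, den, out=np.zeros(num.shape), where=den > 0)
    return len(status) * r ** 2


def trend_p(chi2):
    """Nominal two-sided p-value for a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def max_t_adjusted_p(dosage, status, n_perm=10_000, seed=0):
    """Family-wise adjusted p-values by the max-statistic permutation method."""
    rng = np.random.default_rng(seed)
    observed = trend_chi2(dosage, status)
    exceed = np.zeros(observed.shape)
    for _ in range(n_perm):
        # shuffle case/control labels but keep genotypes (and hence LD) intact
        null_max = trend_chi2(dosage, rng.permutation(status)).max()
        exceed += null_max >= observed
    return (exceed + 1) / (n_perm + 1)
```

Comparing each SNP with the genome-wide maximum of each permuted scan is what makes the adjustment less conservative than Bonferroni: correlated SNPs in LD do not count as independent tests.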

The population attributable risk percent (PAR%) presented in the Summary was calculated using the formula for retrospective studies, PAR% = 100 × [p × (OR − 1)] / [1 + p × (OR − 1)], where p, the exposure frequency, is the SNP allele frequency in controls, and the odds ratio (OR) is used in place of the relative risk (Woodward 2005).
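A worked version of the PAR% formula above, written as a short Python function. The allele frequency of 0.5 in the example is hypothetical, because the control allele frequencies behind the reported 12% and 8% are not given in this section. For a protective allele (OR < 1), one would presumably apply the formula to the opposite allele with 1/OR; the text does not state this, so treat it as an assumption.

```python
def par_percent(exposure_freq, odds_ratio):
    """Population attributable risk percent, with the OR standing in for the RR.

    exposure_freq: frequency of the risk allele in controls.
    odds_ratio: per-allele odds ratio for that allele.
    """
    excess = exposure_freq * (odds_ratio - 1.0)
    return 100.0 * excess / (1.0 + excess)


# Hypothetical example: risk-allele frequency 0.5 in controls, OR = 1.29
print(round(par_percent(0.5, 1.29), 2))  # 12.66
```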

To determine whether the associated chromosome 17q21.31 alleles lie on the H1 or H2 haplotype clade of MAPT, we used rs1981997 as a haplotype tag SNP, because its major (G) and minor (A) alleles are fixed on the H1 and H2 clades, respectively (Stefansson et al. 2005). Two-locus haplotype association analysis with SNPs in the MAPT region was performed in PLINK to determine which alleles lie on the H1 haplotype, which has previously been associated with PD (Farrer et al. 2002; Golbe et al. 2001; Maraganore et al. 2001; Martin et al. 2001; Refenes et al. 2009; Skipper et al. 2004; Zabetian et al. 2007).
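The D′ and r² values used to place associated alleles on the H1 clade follow from two-locus haplotype and allele frequencies. This sketch assumes those frequencies are already known. In practice, PLINK estimates haplotype frequencies from unphased genotypes by an EM algorithm, which is not reproduced here. The function name `ld_stats` and the example numbers are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """D, |D'| and r^2 for two biallelic SNPs from haplotype/allele frequencies.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For example, if an allele always occurs with the rs1981997 G allele and both have the same frequency, `ld_stats` returns D′ = 1 and r² = 1. This is the pattern expected for an allele tagging the H1 clade.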

In addition to the statistical analysis of the joint sample, and as a final check against cryptic bias arising from heterogeneity of genotyping error rates across studies, we evaluated association with meta-analysis techniques using METAL (Abecasis & Willer 2007).
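METAL offers both sample-size-weighted and inverse-variance-weighted schemes, and the text does not say which was used. The following is therefore a sketch of one of them: a fixed-effect inverse-variance meta-analysis of per-study log odds ratios, plus a helper that recovers a standard error from a reported 95% confidence interval. The function names are hypothetical, and this is not METAL's code.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def ivw_meta(log_ors, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Returns (pooled OR, SE of pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return math.exp(pooled), pooled_se, p
```

Because each study is analysed separately before pooling, differences in allele frequency or error rate between studies cannot masquerade as case-control differences. This is the rationale for using the meta-analysis as a check on the joint analysis.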

RESULTS

In the MIHG sample, no SNPs were significantly associated with PD at the genome-wide level using permutation tests with a multiple-testing-corrected threshold of p < 0.05 (Table 2). SNCA showed the strongest association in the MIHG study (SNCA intron, rs356220 p = 2.7×10−6, OR = 1.48; 95% CI [1.25–1.74]). Other top associations included SNPs in the chromosome 1p22 gene chloride channel accessory 4 (CLCA4, EGI:22802, intron, rs1543467 p = 3.0×10−6, OR = 1.51; 95% CI [1.27–1.80]), the chromosome 3q26 gene neuroligin 1 (NLGN1, EGI:22871, intron, rs976683 p = 1×10−5, OR = 1.49; 95% CI [1.25–1.78]) and the chromosome 20p11 gene solute carrier family 24, member 3 (SLC24A3, EGI:57419, intronic SNPs, rs1406968 p = 1.2×10−5, OR = 1.47; 95% CI [1.24–1.75]; rs4813368 p = 1.2×10−5, OR = 1.48; 95% CI [1.24–1.76]; r2 between rs1406968 and rs4813368 = 0.94). Neither sex nor age at onset (age at exam in controls) was a confounder for the top associations in the MIHG sample.

Table 2.

Trend-test results for association with PD in the MIHG GWAS sample, with exact Hardy-Weinberg equilibrium (HWE) test p-values, for all SNPs associated with PD at p < 5×10−5.

SNP         Trend p     HWE p   Chromosome  Gene
rs356220    2.67×10−6   0.930   4q22.1      SNCA
rs1543467   2.97×10−6   0.117   1p22.3      CLCA4
rs12063142  5.02×10−6   0.468   1p36.13     --
rs9513249   5.92×10−6   0.011   13q32.1     --
rs12870589  9.45×10−6   0.002   13q32.2     --
rs976683    1.04×10−5   0.529   3q26.31     NLGN1
rs1816879   1.16×10−5   0.784   15q22.1     --
rs1406968   1.16×10−5   0.705   20p11.23    SLC24A3
rs4813368   1.22×10−5   0.701   20p11.23    SLC24A3
rs7646773   1.30×10−5   0.865   3p11.2      --
rs1992695   1.34×10−5   0.422   4q28.3      --
rs7322222   1.35×10−5   0.061   13q32.1     --
rs10851073  1.44×10−5   0.001   13q32.2     --
rs4238458   1.52×10−5   0.407   15q24.1     --
rs11625012  1.61×10−5   0.507   14q22.1     --
rs7698161   1.76×10−5   0.738   4q28.3      --
rs7496513   1.85×10−5   0.402   15q24.1     --
rs135066    1.91×10−5   1.000   22q13.2     MPPED1
rs13032621  1.95×10−5   1.000   2q37.2      --
rs9457743   2.03×10−5   0.931   6q25.3      --
rs1159278   2.21×10−5   0.025   13q32.1     RAP2A
rs12142266  2.31×10−5   0.406   1p22.3      --
rs358079    2.99×10−5   0.794   3p14.3      CACNA2D3
rs13411180  3.09×10−5   0.832   2q31.2      ZNF533
rs13009601  3.13×10−5   0.848   2q31.2      ZNF533
rs6930229   3.14×10−5   1.000   6q25.3      --
rs1515274   3.39×10−5   0.850   2q31.2      ZNF533
rs13157     3.52×10−5   1.000   1p36.11     RUNX3
rs929708    3.70×10−5   0.378   3p25.3      ATP2B2
rs10882088  3.89×10−5   0.612   10q23.33    KIF11
rs11833635  4.09×10−5   0.743   12q15       PTPRR
rs6798732   4.51×10−5   0.736   3p11.2      --
rs4777585   4.57×10−5   0.496   15q24.1     NEO1
rs12938031  4.61×10−5   1.000   17q21.31    CRHR1
rs6598020   4.63×10−5   0.270   11p15.5     TMEM16J
rs4927602   4.79×10−5   0.936   2p25.3      SNTG2

Two SNPs were statistically significant after empirical adjustment for multiple comparisons (permutation p < 0.05) in the joint analysis of the MIHG, CIDR and NINDS samples (Table 3, Supplemental Table 1); notably, these SNPs were genotyped, not imputed, in all three datasets. They lie in the genes pleckstrin homology domain-containing protein, family M, member 1 (PLEKHM1, EGI:9842, promoter, rs11012 p = 5.6×10−8, OR = 0.71; 95% CI [0.62–0.79]) and alpha-synuclein (SNCA intron, rs2736990 p = 6.7×10−8, OR = 1.30; 95% CI [1.18–1.43]). Of the 15 SNCA SNPs analysed, in addition to rs2736990, two had p-values between 1×10−5 and 1×10−4 (rs11931074, rs356220), two between 1×10−4 and 1×10−3 (rs365188, rs1866995), two between 1×10−3 and 1×10−2 (rs2583985, rs3775439), and three more had p < 0.05 (rs3857059, rs12502363, rs3822095). The SNP rs356220, the most strongly associated SNP in the MIHG study, had a p-value of 9.7×10−5 in the joint sample. Linkage disequilibrium (LD) in SNCA is weaker than at chromosome 17q21.31, where a large inversion suppresses recombination. The 3' terminus of PLEKHM1 is 400 kilobases from the 3' terminus of MAPT on chromosome 17, and rs11012 is in LD (mean r2 = 0.75; Supplemental Figure 3) with SNPs in MAPT. Strong association (p < 1×10−5) with PD spans the entire region around MAPT, including SNPs in intramembrane protease 5 (IMP5 exonic SNPs, rs12373139 p = 2.9×10−6, OR = 0.75; 95% CI [0.67–0.85]; rs12185268 p = 3.6×10−6, OR = 0.76; 95% CI [0.67–0.85]), a MAPT intron (rs8070723 p = 4.5×10−6, OR = 0.76; 95% CI [0.67–0.85]) and chromosome 17 open reading frame 69 (C17orf69 exon:Y132C, rs393152 p = 3.5×10−6, OR = 0.76; 95% CI [0.67–0.85]).
The ranks of these SNPs' p-values, ordered from smallest to largest, varied greatly among the three datasets (rs11012 ranked 290th in MIHG, 11th in CIDR and 78,027th in NINDS; rs2736990 ranked 38th in MIHG, 440th in CIDR and 41,827th in NINDS). This illustrates that real associations can lie deep within the ranked results of underpowered GWAS, and that increasing the sample size helps true associations rise to the top. Neither sex nor age at onset (age at exam in controls) was a confounder for the top associations in the joint analysis. Additionally, no SNP was significant after correction for multiple tests in the dominant, recessive or genotypic analyses.

Table 3.

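Results for SNPs with p < 1×10−5 from the joint analysis, with imputation, of the MIHG, CIDR and NINDS GWAS data. PM, PC and PN: p-values in the MIHG, CIDR and NINDS datasets; P3: joint-analysis p-value; Empirical p: permutation-based p-value adjusted for multiple testing; Type: genotyped (G) or imputed (I) status in the MIHG, CIDR and NINDS datasets, respectively.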

SNP         Chromosome  Gene      Type  PM         PC         PN     P3         Empirical p
rs11012     17q21.31    PLEKHM1   GGG   6.58×10−4  4.40×10−5  0.188  5.65×10−8  0.011
rs2736990   4q22.1      SNCA      GGG   5.08×10−5  8.67×10−4  0.102  6.74×10−8  0.014
rs12063142  1p36.13     TAS1R2    GGG   6.41×10−6  0.032      0.033  4.83×10−7  0.159
rs4837628   9q33.1      DBC1      GGG   4.83×10−3  5.66×10−5  0.984  1.03×10−6  0.281
rs11248060  4p16.3      DGKQ      GGG   0.244      2.05×10−5  0.005  2.26×10−6  0.509
rs1981997   17q21.31    MAPT      GGG   4.96×10−3  1.37×10−4  0.733  2.45×10−6  0.539
rs12373139  17q21.31    IMP5      GGI   4.96×10−3  1.92×10−4  0.349  2.85×10−6  0.602
rs10464059  5q35.3      --        III   1.86×10−4  4.78×10−4  0.939  2.91×10−6  0.610
rs13355682  5q35.3      --        III   1.86×10−4  4.78×10−4  0.888  3.16×10−6  0.640
rs393152    17q21.31    C17orf69  GGG   5.86×10−3  2.58×10−4  0.281  3.46×10−6  0.674
rs12185268  17q21.31    IMP5      GGG   4.96×10−3  2.52×10−4  0.349  3.62×10−6  0.689
rs8070723   17q21.31    MAPT      GGG   5.86×10−3  2.30×10−4  0.392  4.49×10−6  0.723
rs611199    9q33.1      DBC1      GGG   5.43×10−3  9.61×10−4  0.212  7.65×10−6  0.908
rs17080196  5q35.3      GFPT2     III   8.35×10−4  1.52×10−3  0.633  7.73×10−6  0.911
rs1635291   17q21.31    --        GGG   7.27×10−3  1.89×10−4  0.517  7.76×10−6  0.912
rs7703402   5q35.3      GFPT2     III   1.57×10−4  1.06×10−3  0.192  8.41×10−6  0.936
rs6864729   5q35.3      GFPT2     III   8.35×10−4  1.52×10−3  0.696  8.87×10−6  0.945
rs2303012   5q35.3      GFPT2     III   9.09×10−4  1.52×10−3  0.696  9.49×10−6  0.951
rs974002    14q13.1     NPAS3     GGG   6.29×10−3  7.37×10−3  0.017  9.85×10−6  0.954

Haplotype analysis of the associated chromosome 17q21.31 SNPs with the H1–H2 clade tag SNP rs1981997 revealed high D' and r2 values for all SNPs in the region, and the alleles with positive effect sizes (risk-increasing alleles) at those SNPs were in strong LD with the G allele of rs1981997, indicating that they lie on the H1 haplotype clade (Supplemental Table 3, Supplemental Figure 3).

Additional strong association signals in the joint data were observed at SNPs in the chromosome 9q33 gene deleted in bladder cancer 1 (DBC1, EGI:1620, intron, rs4837628 p = 1.07×10−6, OR = 0.79; 95% CI [0.72–0.87]), the chromosome 14q13 gene neuronal PAS domain protein 3 (NPAS3, EGI:64067, intron, rs974002 p = 9.9×10−6, OR = 1.32; 95% CI [1.17–1.49]), and at imputed SNPs in the chromosome 5q35 gene glutamine-fructose-6-phosphate transaminase 2 (GFPT2, EGI:9945; intronic SNPs rs17080196 p = 7.7×10−6, OR = 0.77; 95% CI [0.69–0.87]; rs6864729 p = 8.9×10−6, OR = 0.77; 95% CI [0.69–0.87]; rs2303012 p = 9.5×10−6, OR = 0.77; 95% CI [0.69–0.87]; and promoter SNP rs7703402 p = 8.4×10−6, OR = 0.76; 95% CI [0.66–0.85]) (Table 3, Supplemental Table 1). However, none of these associations survived adjustment for multiple testing.

Weaker associations in the joint data that replicated at an uncorrected p ≤ 0.05, with consistent effect directions in each of the three GWAS, were observed at several loci (Table 4, Supplemental Table 2): rs12063142 in the chromosome 1p36.13 gene taste receptor 1, member 2 (TAS1R2, intron, p = 4.83×10−7, OR = 0.76; 95% CI [0.69–0.85]); rs974002 in NPAS3; rs3851539 in the chromosome 8q21 gene matrix metallopeptidase 16 (MMP16, EGI:4325, intron, p = 3.8×10−5, OR = 1.21; 95% CI [1.11–1.34]); rs7960736 in the chromosome 12q24 gene kinase suppressor of ras 2 (KSR2, EGI:283455, intron, p = 6.3×10−5, OR = 1.22; 95% CI [1.11–1.34]); chromosome 14q24 SNP rs11159221 (p = 1×10−4, OR = 0.79; 95% CI [0.69–0.89]); rs1991601 in the chromosome 2q31 gene WAS/WASL-interacting protein family, member 1 (WIPF1, also known as WASPIP, EGI:7456, intron, p = 3×10−4, OR = 1.23; 95% CI [1.09–1.38]); rs1080074 in the chromosome 11q23 gene family with sequence similarity 55, member A (FAM55A, EGI:120400, intron, p = 4×10−4, OR = 1.29; 95% CI [1.12–1.48]); and rs1863270 in the chromosome 15q22 gene RAR-related orphan receptor A (RORA, EGI:6095, intron, p = 5×10−4, OR = 1.19; 95% CI [1.08–1.33]).

Table 4.

Association results that replicate in the MIHG, CIDR and NINDS studies at p < 0.05 with effects in the same direction. All SNPs were genotyped. PM, PC and PN: p-values in the MIHG, CIDR and NINDS datasets; P3: joint-analysis p-value.

SNP         Chromosome  Gene    PM         PC         PN         P3
rs12063142  1p36.13     TAS1R2  6.41×10−6  0.032      0.033      4.83×10−7
rs974002    14q13.1     NPAS3   6.32×10−3  7.39×10−3  0.017      9.85×10−6
rs3851539   8q21.3      MMP16   7.86×10−3  0.015      0.034      3.81×10−5
rs7960736   12q24.23    KSR2    0.026      0.022      6.94×10−3  6.26×10−5
rs11159221  14q24.3     --      0.024      0.045      5.21×10−3  1.32×10−4
rs1991601   2q31.1      WIPF1   0.019      0.043      0.043      3.26×10−4
rs1080074   11q23.2     FAM55A  0.038      0.036      0.035      3.77×10−4
rs1863270   15q22.2     RORA    0.047      0.044      0.032      5.33×10−4

Table 5 summarizes results from each GWAS and from the joint analysis for SNPs in the top 10 genes in the PDgene database (www.pdgene.org), a free online meta-analysis summary of PD genetic association studies. Apart from SNCA and MAPT, no SNP in these genes was associated at p < 2.00×10−3. The strongest association among the remaining genes was in monoamine oxidase B (MAOB intron, rs209766 p = 0.0026, OR = 0.86; 95% CI [0.76–0.96]). Weaker associations were also detected in leucine-rich repeat kinase 2 (LRRK2, EGI:120892, intron, rs1907632 p = 0.0321, OR = 1.15; 95% CI [1.01–1.30]), USP24 (intron, rs12065953 p = 0.0078, OR = 1.46; 95% CI [1.10–1.94]) and ELAVL4 (intron, rs17105974 p = 0.0418, OR = 0.82; 95% CI [0.65–0.99]).

Table 5.

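Results for the top 10 genes in PDgene (June 2009, listed in PDgene rank order), showing for each gene the SNP with the smallest joint-analysis p-value and its p-value in each GWAS dataset. PM, PC and PN: p-values in the MIHG, CIDR and NINDS datasets; P3: joint-analysis p-value.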

Gene    SNP         PM         PC         PN     P3         SNPs in gene  Gene size (bp)
GBA     rs9628662   0.694      0.165      0.581  0.153      1             10,245
LRRK2   rs1907632   0.423      0.255      0.023  0.032      42            144,273
SNCA    rs2736990   5.08×10−5  8.67×10−4  0.102  6.70×10−8  15            112,876
PINK1   rs2296223   0.546      0.031      0.247  0.212      2             18,056
MAPT    rs1981997   4.96×10−3  1.37×10−4  0.733  2.50×10−6  15            134,002
USP24   rs12065953  8.96×10−3  0.195      0.129  7.81×10−3  2             149,006
CYP2D6  rs769258    0.572      0.848      0.033  0.574      1             4,382
MAOB    rs209766    0.031      0.172      0.056  0.003      31            115,864
ELAVL4  rs17105974  3.39×10−3  0.059      0.142  0.042      10            153,854
APOE    rs439401    0.568      0.411      0.829  0.41       2             3,611

Results from the meta-analysis of the three GWAS were consistent with those of the joint analysis (Pearson's correlation coefficient between p-values from the two analyses = 0.95). The SNPs rs11012 in the chromosome 17q21.31 region (meta-analysis p-value = 7.33×10−8) and rs2736990 in SNCA (meta-analysis p-value = 7.88×10−8) were the most strongly associated SNPs in the meta-analysis.

DISCUSSION

This study unambiguously identifies SNCA and MAPT as risk factors for idiopathic PD. For effects of the size observed here, the power of any single one of these GWAS to declare association at genome-wide significance is low; in aggregate, however, these datasets provide conclusive evidence for major risk genes and highlight other genes of interest. Our study illustrates the utility of freely available online data resources and of large collaborative studies, which are better powered to detect associations with modest effects in GWAS data. Only in the joint analysis of three independent GWAS datasets did SNPs in SNCA and the MAPT region reach genome-wide significance. The exact mechanisms by which variation in SNCA and MAPT influences risk for PD are unknown (Devine & Lewis 2008). Joint analysis with trend tests of allelic association also suggests other candidates with strong, but not genome-wide significant, evidence of association: DBC1, NPAS3 and GFPT2.

The SNCA gene product α-synuclein (AS) is an abundant brain protein that is localized in axon terminals, where it may mediate synaptic processes (Cabin et al. 2002). PD is considered a synucleinopathy: aggregates of precipitated filamentous AS, which form Lewy bodies, are a common finding in PD brains at autopsy (Kosaka et al. 1976; Lewy 1912; Okazaki et al. 1961; Tretiakoff 1919; Spillantini et al. 1997). Mutations in SNCA have been shown to cosegregate with PD with an autosomal dominant mode of inheritance (Chartier-Harlin et al. 2004; Ibanez et al. 2004; Kruger et al. 1998; Polymeropoulos et al. 1997; Singleton et al. 2003; Zarranz et al. 2004). Other variation in SNCA has been reported to increase the risk of PD (Kruger et al. 1999).

The MAPT protein (tau) is critical for the assembly and stabilization of the microtubule network, which is essential for axonal transport in neurons (Garcia & Cleveland 2001). Neurological disorders in which aggregates of tau are observed in brain tissue are known as tauopathies. The most common tauopathy is Alzheimer disease (AD), in which hyperphosphorylated tau accumulates in intraneuronal neurofibrillary tangles (NFTs) (Goedert et al. 1989). Additionally, progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD) are neurodegenerative diseases characterized by parkinsonism and by tau deposition in neurons and glia (Dickson et al. 2002; Litvan et al. 1996). A further set of disorders, frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17), is also characterized by abnormal deposition of tau (Ingram & Spillantini 2002). Haplotypes in the chromosome 17q21.31 region have been associated with PSP, CBD (Pittman et al. 2005) and PD (Farrer et al. 2002; Golbe et al. 2001; Healy et al. 2004; Maraganore et al. 2001; Martin et al. 2001; Refenes et al. 2009; Skipper et al. 2004; Zabetian et al. 2007). In the current study, we report haplotypic associations of alleles on the H1 haplotype clade, which is consistent with previous investigations of PD, PSP and CBD. The H1 clade has also been shown to express higher levels of tau than H2 (Kwok et al. 2004), which may provide some insight into PD pathology, as previous studies have shown that tau mediates and promotes the polymerization of AS (Giasson et al. 2003).

The gene diacylglycerol kinase, theta (DGKQ; rs11248060, joint p = 2.26×10−6, Table 3) was previously observed to be associated with PD in the study of Pankratz et al. (Pankratz et al. 2008). This gene is thought to act in the phosphatidylinositol signaling pathway and is expressed in the brain.

Interestingly, associations between SNPs in NPAS3 and smoking cessation have been observed in two of three GWAS datasets investigating genetic differences in individuals' ability to quit smoking (Uhl et al. 2008). Given the known relationship between smoking and PD (Allam et al. 2004; Dorn 1959), this gene may merit further investigation in PD studies. We saw no evidence of confounding by, or interaction with, history of smoking at NPAS3 SNPs (data not shown).

The MIHG and CIDR data were analyzed for interactions with, and confounding by, history of smoking. Although no SNPs were significant after correction for multiple tests (data not shown), these and other exposures may explain or modify genetic susceptibility to PD. Previous research in PD families has observed interactions between smoking and the genes nitric oxide synthase 2A (NOS2A) (Hancock et al. 2006), glutathione S-transferase omega 1 (GSTO1) (Wahner et al. 2007) and SNCA (McCulloch et al. 2008). This study included only two NOS2A SNPs, neither of which was included in the previous study, and they do not provide sufficient coverage of the gene to evaluate the previously reported interaction. Similarly, the reported interaction between smoking and the Rep1 variant in SNCA could not be replicated with the SNPs in this study, and the interaction between history of smoking and GSTO1 at rs4925 was not replicated.

DBC1 is a gene that is deleted in some bladder cancer cell lines (Habuchi et al. 1998). The DBC1 gene product is detectable in several tissues, including brain, spinal cord, cortex and cerebellum (http://www.genecards.org/). The DBC1 protein inhibits cell proliferation by negatively regulating the G1/S transition (Nishiyama et al. 2001). It also mediates non-apoptotic cell death (Wright et al. 2004) and regulates components of the plasminogen pathway (Louhelainen et al. 2006). The relationship between DBC1 and PD is not obvious from the known biology of this gene; however, its role in cell death in vulnerable brain regions might contribute to PD pathophysiology.

It should also be noted that the GFPT2 results were obtained by imputation in the three GWAS samples used here; imputed genotypes are likely to be assigned with higher error rates than assayed SNPs. GFPT2 encodes the rate-limiting enzyme for the entry of glucose into the hexosamine biosynthesis pathway and is thus a biologically important gene (Oki et al. 1999). Energy production in the central nervous system relies on a mixture of glycolysis and electron transport in mitochondria. As mitochondria are clearly affected in PD (Beal 2007; Schapira 2008), enzymes such as GFPT2 may become increasingly important for energy production and thereby cell survival. Additionally, variants in this gene have been associated with type II diabetes mellitus (T2DM) and diabetic nephropathy in Caucasian and African American samples (Zhang et al. 2004), and T2DM has been suggested as a risk factor for PD (Driver et al. 2008; Hu et al. 2007).
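The imputation software and quality thresholds behind the GFPT2 genotypes are not stated in this section. As a minimal sketch, assuming an imputation tool that reports posterior probabilities for the three genotypes of a biallelic SNP (as IMPUTE-style output does), those probabilities can be summarized either as an expected allele dosage or as a hard genotype call that is set to missing when the most likely genotype falls below a confidence threshold. The 0.9 threshold and the function names are hypothetical choices for illustration.

```python
from typing import Optional, Sequence


def _check(probs: Sequence[float]) -> None:
    """Require three non-negative genotype probabilities that sum to 1."""
    if len(probs) != 3 or any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected three non-negative probabilities summing to 1")


def expected_dosage(probs: Sequence[float]) -> float:
    """Expected number of copies of the coded allele B, given posterior
    probabilities for genotypes (AA, AB, BB)."""
    _check(probs)
    return probs[1] + 2.0 * probs[2]


def hard_call(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Most probable genotype as 0, 1 or 2 copies of B, or None (missing)
    when its posterior probability is below the threshold."""
    _check(probs)
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None
```

For posteriors (0.02, 0.95, 0.03), hard_call returns 1 and the dosage is about 1.01; for (0.30, 0.40, 0.30), the hard call is missing while the dosage is about 1.0. Keeping only high-confidence calls lowers the genotype error rate at the cost of missing data.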

Despite the intriguing results of our study, there are some limitations. The error rate of imputed genotypes is likely to be higher than that of assayed genotypes, although this effect should be mild for imputed genotypes with high posterior probabilities (Marchini et al. 2007). A possible source of heterogeneity is subtle differences in the PD trait definitions used by the three studies. PD may be composed of several subphenotypes, each with a distinct genetic etiology; this analysis would have optimal power only for associations that underlie the features common to the trait definitions of all three GWAS. The power of the joint analysis is also not optimal for the magnitude of effects seen in this study (≥80% power for the combined sample at α = 1×10−7 and MAF 0.25 for odds ratios < 0.59 or > 1.59; ≥80% power for the MIHG sample at α = 1×10−7 and MAF 0.25 for odds ratios < 0.39 or > 2.14); however, as larger PD studies are added, more genes might be conclusively associated. These results support the meta-analysis by Pankratz et al. of the CIDR and NINDS GWAS results insofar as the MAPT region is prominently represented among the top associations (Pankratz et al. 2009); however, overlap among the top hits is otherwise limited. A final limitation is the lack of detailed covariate data across datasets on exposures such as smoking, pesticide exposure and coffee drinking that would allow model adjustment and gene×environment interaction analysis. This is an important point, as the value of these datasets would increase greatly with the inclusion of well-documented environmental and subphenotype data, such as diabetes status, depression and dementia. Future studies of PD should include important environmental factors in the data collection protocol.
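The software and genetic model behind the power figures above are not stated in this section. As a minimal sketch, assuming a two-sided allelic test on a 2×2 table of allele counts, a per-allele odds ratio, and a normal approximation for two proportions, power can be approximated from the numbers of cases and controls, the frequency of the tested allele in controls, and the significance level. Because allelic tests, genotypic tests and different inheritance models give different power at the same odds ratio, this approximation should not be expected to reproduce the quoted thresholds; it only shows how power depends on each input. The function names are hypothetical.

```python
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Frequency of the tested allele in cases implied by a per-allele
    odds ratio, given its frequency p0 in controls."""
    odds_cases = odds_ratio * p0 / (1.0 - p0)
    return odds_cases / (1.0 + odds_cases)


def allelic_test_power(n_cases: int, n_controls: int, p0: float,
                       odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic test (2x2 table of allele
    counts) using the normal approximation for two proportions."""
    norm = NormalDist()
    n1, n0 = 2 * n_cases, 2 * n_controls          # each subject carries two alleles
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (n1 * p1 + n0 * p0) / (n1 + n0)       # pooled frequency under the null
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n0)) ** 0.5
    se_alt = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Only the tail in the direction of the true effect is counted; the
    # opposite tail is negligible at genome-wide alpha levels.
    return norm.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


for odds_ratio in (0.70, 1.29, 1.59):
    power = allelic_test_power(1752, 1745, 0.25, odds_ratio, 1e-7)
    print(f"OR={odds_ratio:.2f}: approximate power {power:.3f}")
```

The loop evaluates the combined sample size (1752 cases, 1745 controls) at α = 1×10−7 with an assumed control allele frequency of 0.25; substituting 604 cases and 619 controls gives the corresponding approximation for the MIHG sample alone.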

In conclusion, SNCA and MAPT are associated with idiopathic PD at genome-wide significance. Several other biologically plausible genes showed associations with PD that did not meet the genome-wide significance threshold. PD is a complex phenotype with substantial variation in age of onset and clinical course. It is possible that more genes are associated with subsets of PD cases or have more modest effects than those detected here. These results suggest that detecting additional PD-gene associations will require pooling datasets into very large samples of thousands of cases and controls, so that loci of more modest effect can reach statistical significance after correction for multiple testing. Collaborative projects with coordinated ascertainment are necessary to advance PD genetic epidemiology.
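Pooling can combine individual-level genotypes or per-study summary statistics. As a minimal sketch of the second option (a swapped-in technique, not necessarily the method used in this study), an inverse-variance fixed-effect meta-analysis pools per-study log odds ratios, and a Bonferroni threshold gives the per-SNP significance level for a given number of tests; for example, 0.05 divided by 500,000 tests gives 1×10−7. The function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effect_meta(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect meta-analysis.

    estimates: per-study (log odds ratio, standard error) pairs.
    Returns (pooled odds ratio, SE of pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    p = 2 * NormalDist().cdf(-abs(pooled / pooled_se))
    return math.exp(pooled), pooled_se, p


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level controlling the family-wise error rate."""
    return family_alpha / n_tests
```

Pooling three hypothetical studies that each estimate a log odds ratio of math.log(1.2) with standard error 0.1 gives a pooled odds ratio of 1.2 with a standard error of 0.1/√3 (about 0.058); the resulting p-value is still far above bonferroni_threshold(500_000), which illustrates why modest effects require much larger pooled samples.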

Supplementary Material

Supp Materials

ACKNOWLEDGEMENTS

We are grateful to the patients and control subjects who participated in this study. We thank the members of the PD Genetics Collaboration: Martha A. Nance, Ray L. Watts, Jean P. Hubble, William C. Koller, Kelly Lyons, Rajesh Pahwa, Matthew B. Stern, Amy Colcher, Bradley C. Hiner, Joseph Jankovic, William G. Ondo, Fred H. Allen, Jr., Christopher G. Goetz, Gary W. Small, Donna Masterman, Frank Mastaglia, and Jonathan L. Haines, who contributed subjects to this study. Some of the samples used in this study were collected while the Udall PDRCE was at Duke University. This work was supported by National Institutes of Health grant NS39764. We also thank the investigators of the Pankratz et al. (accession number: phs000126.v1.p1) and Fung et al. (accession number: phs000089.v1.p1) studies for making their data available on dbGaP.

Footnotes

Conflicts of Interest: None

References
