Inclusion of a priori information in genome-wide association analysis
Related papers
Effects of covariates and interactions on a genome-wide association analysis of rheumatoid arthritis
BMC Proceedings, 2009
While genetic and environmental factors and their interactions influence susceptibility to rheumatoid arthritis (RA), causative genetic variants have not been identified. The purpose of the present study was to assess the effects of covariates and genotype × sex interactions on the genome-wide association analysis (GWAA) of RA using Genetic Analysis Workshop 16 Problem 1 data and a logistic regression approach as implemented in PLINK. After accounting for the effects of population stratification, effects of covariates and genotype × sex interactions on the GWAA of RA were assessed by conducting association and interaction analyses. We found significant allelic associations, covariate, and genotype × sex interaction effects on RA. Several top single-nucleotide polymorphisms (SNPs) (~22 SNPs) showed significant associations with strong p-values (p < 1 × 10⁻⁴ to p < 1 × 10⁻²⁴). Only three SNPs on chromosomes 4, 13, and 20 were significant after Bonferroni correction, and none of these three SNPs showed significant genotype × sex interactions. Of the 30 top SNPs with significant (p < 1 × 10⁻⁴ to p < 1 × 10⁻⁶) interactions, ~23 SNPs showed additive interactions and ~5 SNPs showed only dominance interactions. Those SNPs showing significant associations in the regular logistic regression failed to show significant interactions. In contrast, the SNPs that showed significant interactions failed to show significant associations in models that did not incorporate interactions. It is important to consider interactions of genotype × sex in addition to associations in a GWAA of RA. Furthermore, the association between SNPs and RA susceptibility varies significantly between men and women.
The power of genome-wide association studies can be improved by incorporating information from previous study findings, for example, results of genome-wide linkage analyses. Weighted false discovery rate (FDR) control can incorporate genome-wide linkage scan results into the analysis of genome-wide association data by assigning single-nucleotide polymorphism (SNP) specific weights. Stratified FDR control can also be applied by stratifying the SNPs into high and low linkage strata. We applied these two FDR control methods to the data of the North American Rheumatoid Arthritis Consortium (NARAC) study and the Framingham Heart Study (FHS), combining both association and linkage analysis results. For the NARAC study, we used linkage results from a previous genome scan of the rheumatoid arthritis (RA) phenotype. For the FHS, we obtained genome-wide linkage scores from the same 550 k SNP data used for the association analyses of three lipid phenotypes (HDL, LDL, TG). We confirmed some genes previously reported for association with RA and lipid phenotypes. Stratified and weighted FDR methods appear to give improved ranks to some of the replicated SNPs for the RA data, suggesting linkage scan results could provide useful information to improve genome-wide association studies.
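The weighted FDR idea described above can be illustrated with a minimal sketch. It assumes the common weighted Benjamini-Hochberg formulation, in which each p-value is divided by a SNP-specific weight (normalized to average 1) before the usual step-up rule is applied. The function name `weighted_bh`, the example p-values, and the example weights are hypothetical; the abstract does not say how weights were derived from the linkage scores, so this is not the authors' implementation.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Indices rejected by a weighted Benjamini-Hochberg step-up rule.

    Assumption: weights are non-negative prior weights (for example,
    derived from linkage evidence); they are rescaled to average 1.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equal length")
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("weights must have a positive mean")
    # Rescale so the weights average to 1.
    norm = [w / mean_w for w in weights]
    # Weighted p-values; a zero weight means the SNP can never be rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, norm)]
    # Step-up rule: find the largest rank k with q_(k) <= alpha * k / m.
    order = sorted(range(m), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical association p-values for four SNPs.
pvals = [0.001, 0.03, 0.2, 0.5]
uniform = weighted_bh(pvals, [1, 1, 1, 1])        # ordinary BH
upweighted = weighted_bh(pvals, [1, 3, 0.5, 0.5])  # SNP 1 lies under a linkage peak
print(uniform, upweighted)
```

With uniform weights the procedure reduces to ordinary Benjamini-Hochberg; up-weighting a SNP in a region with linkage evidence lowers its weighted p-value, at the cost of down-weighting SNPs elsewhere.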
A new gene-based association test for genome-wide association studies
BMC Proceedings, 2009
Genome-wide association studies are widely used today to discover genetic factors that modify the risk of complex diseases. Usually, these methods work in a SNP-by-SNP fashion. We present a gene-based test that can be applied in the context of genome-wide association studies. We compare both strategies, SNP-based and gene-based, in a sample of cases and controls for rheumatoid arthritis. We
BMC Proceedings, 2009
Genome-wide association studies often involve testing hundreds of thousands of single-nucleotide polymorphisms (SNPs). These tests may be highly correlated because of linkage disequilibrium among SNPs. Multiple testing correction ignoring the correlation among markers, as is done in the Bonferroni procedure, can cause loss of power. Several multiple testing adjustment methods accounting for correlations among tests have been developed and have shown improved power compared to the Bonferroni procedure. These methods include a Monte Carlo (MC) method and a method of computing p-values adjusted for correlated tests. The objective of this study is to apply these two multiple testing methods to genome-wide association study of the Genetic Analysis Workshop 16 rheumatoid arthritis data from the North American Rheumatoid Arthritis Consortium, to compare the performance of these two methods to the Bonferroni procedure in identifying susceptibility loci underlying rheumatoid arthritis, and t...
Causal graph-based analysis of genome-wide association data in rheumatoid arthritis
Biology Direct, 2011
Background: GWAS owe their popularity to the expectation that they will make a major impact on diagnosis, prognosis and management of disease by uncovering genetics underlying clinical phenotypes. The dominant paradigm in GWAS data analysis so far consists of extensive reliance on methods that emphasize contribution of individual SNPs to statistical association with phenotypes. Multivariate methods, however, can extract more information by considering associations of multiple SNPs simultaneously. Recent advances in other genomics domains pinpoint multivariate causal graph-based inference as a promising principled analysis framework for high-throughput data. Designed to discover biomarkers in the local causal pathway of the phenotype, these methods lead to accurate and highly parsimonious multivariate predictive models. In this paper, we investigate the applicability of causal graph-based method TIE* to analysis of GWAS data. To test the utility of TIE*, we focus on anti-CCP positive rheumatoid arthritis (RA) GWAS datasets, where there is a general consensus in the community about the major genetic determinants of the disease. Results: Application of TIE* to the North American Rheumatoid Arthritis Cohort (NARAC) GWAS data results in six SNPs, mostly from the MHC locus. Using these SNPs we develop two predictive models that can classify cases and disease-free controls with an accuracy of 0.81 area under the ROC curve, as verified in independent testing data from the same cohort. The predictive performance of these models generalizes reasonably well to Swedish subjects from the closely related but not identical Epidemiological Investigation of Rheumatoid Arthritis (EIRA) cohort with 0.71-0.78 area under the ROC curve. Moreover, the SNPs identified by the TIE* method render many other previously known SNP associations conditionally independent of the phenotype.
Conclusions: Our experiments demonstrate that application of TIE* captures the maximum amount of genetic information about RA in the data and recapitulates the major consensus findings about the genetic factors of this disease. In addition, TIE* yields reproducible markers and signatures of RA. This suggests that a principled multivariate causal and predictive framework for GWAS analysis empowers the community with a new tool for high-quality and more efficient discovery.
BMC Proceedings, 2009
The availability of a very large number of markers through modern technology has made genome-wide association studies very popular. The usual approach is to test single-nucleotide polymorphisms (SNPs) one at a time for association with disease status. However, it may not be possible to detect marginally significant effects by single-SNP analysis. Simultaneous analysis of SNPs enables detection of even those SNPs with small effect by evaluating the collective impact of several neighboring SNPs. Also, false-positive signals may be weakened by the presence of other neighboring SNPs included in the analysis. We analyzed the North American Rheumatoid Arthritis Consortium data of Genetic Analysis Workshop 16 using HLasso, a new method for simultaneous analysis of SNPs. The simultaneous analysis approach has excellent control of type I error, and many of the previously reported results of single-SNP analyses were confirmed by this approach.
Genome-wide association analysis of rheumatoid arthritis data via haplotype sharing
BMC Proceedings, 2009
We present computationally simple association tests based on haplotype sharing that can be easily applied to genome-wide association studies, while allowing use of fast (but not likelihood-based) haplotyping algorithms, and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype sharing analyses that adjust for population stratification. We apply our methods to a genome-wide association study of rheumatoid arthritis available as Problem 1 of Genetic Analysis Workshop 16. In addition to the HLA region on chromosome 6, we find genome-wide significant signals at 7q33 and 13q31.3. These regions contain genes with interesting potential connections with rheumatoid arthritis and are not identified using single-SNP (single-nucleotide polymorphism) methods.
Data for Genetic Analysis Workshop 16 Problem 1, association analysis of rheumatoid arthritis data
BMC Proceedings, 2009
For Genetic Analysis Workshop 16 Problem 1, we provided data for genome-wide association analysis of rheumatoid arthritis. Single-nucleotide polymorphism (SNP) genotype data were provided for 868 cases and 1194 controls that had been assayed using an Illumina 550 k platform. In addition, phenotypic data were provided from genotyping DRB1 alleles, which were classified according to the rheumatoid arthritis shared epitope, levels of anti-cyclic citrullinated peptide, and levels of rheumatoid factor IgM. Several questions could be addressed using the data, including analysis of genetic associations using single SNPs or haplotypes, as well as gene-gene and genetic analysis of SNPs for qualitative and quantitative factors.
PLoS ONE, 2013
Genome-wide association studies (GWAS) led to the identification of numerous novel loci for a number of complex diseases. Pathway-based approaches using genotypic data provide tangible leads which cannot be identified by single marker approaches as implemented in GWAS. The available pathway analysis approaches mainly differ in the employed databases and in the applied statistics for determining the significance of the associated disease markers. So far, pathway-based approaches using GWAS data failed to consider the overlapping of genes among different pathways or the influence of protein-interactions. We performed a multistage integrative pathway (MIP) analysis on three common diseases - Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) - incorporating genotypic, pathway, protein- and domain-interaction data to identify novel associations between these diseases and pathways. Additionally, we assessed the sensitivity of our method by studying the influence of the most significant SNPs on the pathway analysis by removing those and comparing the corresponding pathway analysis results. Apart from confirming many previously published associations between pathways and RA, CD and T1D, our MIP approach was able to identify three new associations between disease phenotypes and pathways. This includes a relation between the influenza-A pathway and RA, as well as a relation between T1D and the phagosome and toxoplasmosis pathways. These results provide new leads to understand the molecular underpinnings of these diseases. The developed software herein used is available at