Detecting genome-wide epistases based on the clustering of relatively frequent items

Minzhu Xie et al. Bioinformatics. 2012.

Abstract

Motivation: In genome-wide association studies (GWAS), up to millions of single nucleotide polymorphisms (SNPs) are genotyped for thousands of individuals. However, conventional single locus-based approaches are usually unable to detect gene-gene interactions underlying complex diseases. Due to the huge search space for complicated high order interactions, many existing multi-locus approaches are slow and may suffer from low detection power for GWAS.

Results: In this article, we develop a simple, fast and effective algorithm to detect genome-wide multi-locus epistatic interactions based on the clustering of relatively frequent items. Extensive experiments on simulated data show that our algorithm is fast and more powerful in general than some recently proposed methods. On a real genome-wide case-control dataset for age-related macular degeneration (AMD), the algorithm has identified genotype combinations that are significantly enriched in the cases.
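The notion of a genotype combination being "significantly enriched in the cases" can be illustrated with a small example. The sketch below is not the EDCF algorithm, whose details are not given in this abstract; it only assumes a plain Pearson chi-square test on a 2x2 table (carrier vs. non-carrier, case vs. control). The function names `combo_counts` and `chi2_2x2`, the 0/1/2 genotype coding and the toy data are all hypothetical.

```python
import math

def combo_counts(genotypes, labels, combo):
    """Count carriers of a genotype combination among cases and controls.

    genotypes: list of per-individual tuples, one genotype (0/1/2) per SNP
    labels:    list of 1 (case) or 0 (control), same order as genotypes
    combo:     dict mapping SNP index -> required genotype (assumed format)
    Returns (case_with, case_without, ctrl_with, ctrl_without).
    """
    case_with = ctrl_with = 0
    for g, y in zip(genotypes, labels):
        if all(g[i] == v for i, v in combo.items()):
            if y == 1:
                case_with += 1
            else:
                ctrl_with += 1
    n_cases = sum(labels)
    n_ctrls = len(labels) - n_cases
    return case_with, n_cases - case_with, ctrl_with, n_ctrls - ctrl_with

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Toy data (made up): 8 cases, 8 controls, 3 SNPs each.
genotypes = [(2, 1, 0)] * 6 + [(0, 1, 0)] * 2 + [(2, 1, 0)] * 1 + [(0, 0, 1)] * 7
labels = [1] * 8 + [0] * 8
combo = {0: 2, 1: 1}  # SNP 0 has genotype 2 AND SNP 1 has genotype 1

counts = combo_counts(genotypes, labels, combo)
chi2, p = chi2_2x2(*counts)
print(counts, round(chi2, 3), round(p, 4))
```

In a genome-wide setting, a test like this would be applied to a very large number of candidate combinations, so any resulting p-value would need correction for multiple testing before being called significant.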

Availability: http://www.cs.ucr.edu/~minzhux/EDCF.zip

Contact: minzhux@cs.ucr.edu; jingli@cwru.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

False positive rates under the null model. The plot in (a) shows the false positive rates of EDCF using different values of α_0 for different values of d, and the plots in (b) and (c) show the false positive rates of EDCF and BOOST when the sample size (b) and the number of SNPs (c) vary.

Fig. 2.

Performance comparison of EDCF and MB-MDR on four disease models for different allele frequencies. The sample consists of 800 individuals (400 cases and 400 controls), and the LD level is r^2 = 1. The black, red and green bars show the power of EDCF when α_s is set to 0.01, 0.05 and 0.3, respectively. The blue bars show the power of MB-MDR.

Fig. 3.

Performance comparison of EDCF, BOOST, SNPRuler, epiMODE and ChiSQ on four disease models for different allele frequencies, sample sizes and LD levels. The black, red, green, blue and cyan bars show the powers of EDCF, BOOST, SNPRuler, epiMODE and ChiSQ, respectively. The absence of a bar indicates no power. (a) Model 1; (b) Model 2; (c) Model 3; (d) Model 4.

Fig. 4.

Performance comparison on two 3-locus epistasis models. (a) Model 5 with some marginal effects. (b) Model 6 without marginal effects. The black, red and green bars show the powers of EDCF, SNPRuler and epiMODE, respectively. The absence of a bar indicates no power.
