Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions - PubMed (original) (raw)

Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions

Casey S Greene et al. BioData Min. 2009.

Abstract

Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult however, and methods to exhaustively analyze sets of these variants for interactions are combinatorial in nature thus making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects we introduce Spatially Uniform ReliefF (SURF).

Results: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly SURF, in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this success rate increase does not require an increase in algorithmic complexity and allows for increased success rate, even with the removal of a nuisance parameter from the algorithm.

Conclusion: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.

PubMed Disclaimer

Figures

Figure 1

Figure 1

How Relief, ReliefF and SURF select neighbors. Each panel in this figure shows the genotypes at two markers for a dataset of cases and controls. For the purpose of this example only these two markers will be considered and both are continuous. When analyzing real data, the process of selecting neighbors is the same, however, but there will be thousands of discrete valued markers (SNPs) each of which would be represented by one of thousands of dimensions. The individual for whom neighbors are being found is shown by the filled red circle. The neighbors that each approach uses for weighting are highlighted in blue. Parts A, B, and C represent how Relief, ReliefF and SURF would select neighbors to be used in weighting. Relief selects the nearest individual of the same class (blue circle) and the nearest individual of the other class (blue cross). ReliefF selects some user specified number of individuals (two in this example) to be used for weighting. SURF, instead of using a fixed number of neighbors, uses all individuals within a distance threshold. The dotted line shows a hypothetical distance threshold.

Figure 2

Figure 2

Example Success Rate and Significance of Differences. Part A shows the detailed success rate analysis results for a single heritability (0.1) and sample size (1600). The success rate to filter both relevant SNPs into percentiles from the 99_th_ to 50_th_ is shown. The 99 th_percentile corresponds to the top 10 SNPs by the assigned weights in these datasets which contain 1000 SNPs. In part B pairwise comparisons are made between each pair of methods at the 99_th, 95_th_, and 75_th_ percentiles. ReliefF, SURF, TuRF, and SURF&TuRF are labeled R, S, T, and ST respectively. Significance is illustrated with levels of grey (i.e. light grey indicates 0.01 <p ≤ 0.05, dark grey indicates 0.001 <p ≤ 0.01, and black indicates p ≤ 0.001). As an example, at the 99_th_ percentile the blank square at the intersection of R and S indicates that the difference between ReliefF and SURF was not significant. On the other hand the black square at the intersection of S and ST indicates that the difference between the success rates of SURF and SURF&TuRF at that percentile was highly significant.

Figure 3

Figure 3

Success Rate Analysis. This is a summary of success rate as shown in figure 2 across a wide range of sample sizes and heritabilities. Within each heritability the success rates for all five genetic models for that heritability are averaged. The x_-axis for each plot corresponds to the percentiles as in figure 2. Across these situations, SuRF alone performs as well as TuRF when filtering to the 75_th percentile of SNPs. SURF outperforms ReliefF, the tuned approaches outperform the non-tuned approaches when using a more stringent filter (i.e. 99_th_ and 95_th_ percentiles), and SURF & TuRF outperforms TuRF with ReliefF.

Similar articles

Cited by

References

    1. Iles MM. What Can Genome-Wide Association Studies Tell Us about the Genetics of Common Disease? PLoS Genet. 2008;4:e33. doi: 10.1371/journal.pgen.0040033. - DOI - PMC - PubMed
    1. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9:356–369. doi: 10.1038/nrg2344. - DOI - PubMed
    1. Hardy J, Singleton A. Genomewide Association Studies and Human Disease. N Engl J Med. 2009;360:1759–1768. doi: 10.1056/NEJMra0808700. - DOI - PMC - PubMed
    1. Kraft P, Hunter DJ. Genetic Risk Prediction - Are We There Yet? N Engl J Med. 2009;360:1701–1703. doi: 10.1056/NEJMp0810107. - DOI - PubMed
    1. Jakobsdottir J, Gorin MB, Conley YP, Ferrell RE, Weeks DE. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genet. 2009;5:e1000337. doi: 10.1371/journal.pgen.1000337. - DOI - PMC - PubMed

Grants and funding

LinkOut - more resources