On the analysis of sequence data: testing for disease susceptibility loci using patterns of linkage disequilibrium

2011, Genetic Epidemiology

Despite the numerous and successful applications of genome-wide association studies (GWASs), discovering disease susceptibility loci (DSLs) has remained difficult. This is because the GWAS approach is an indirect mapping technique that often identifies markers rather than the DSLs themselves. Identifying DSLs, which is required for understanding the genetic pathways of complex diseases, calls for sequencing data that examine every genetic locus directly. Yet methodology targeted at identifying DSLs in sequencing data is currently lacking: existing methods localize the causal variant to a region but not to a single variant, and therefore cannot identify the unique loci that cause the phenotype association. Here, we develop a method to determine whether there is evidence that an individual locus affects case/control status in sequencing data. This methodology differs from other rare-variant approaches: rather than testing an entire region comprising many loci for association with the phenotype, it identifies the individual genetic locus that causes the association between the phenotype and the genetic region. For each variant, the test determines whether the pattern of linkage disequilibrium (LD) across the other variants coincides with the pattern expected if that variant were a DSL. Power simulations show that the method successfully detects the causal variant, distinguishing it from other nearby variants (in high LD with the causal variant), and outperforms the standard tests. The efficiency of the method is especially apparent with small samples, which are currently realistic for studies given the cost of sequencing data. The practical relevance of the approach is illustrated by an application to a sequencing dataset for nonsyndromic cleft lip with or without cleft palate. The proposed method implicated one variant (P = 0.002, 0.062 after Bonferroni correction), which was not found by standard analyses.
Code for implementation is available. Genet. Epidemiol. 35:880-886, 2011.
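
The relationship between the two reported P values can be illustrated with a minimal sketch of the Bonferroni adjustment, which multiplies a raw P value by the number of tests and caps the result at 1. The abstract does not state how many variants were tested; the value of 31 used below is an assumption, chosen only because 0.002 × 31 = 0.062 matches the rounded figures quoted above. The function name `bonferroni` and the constant `N_TESTS_ASSUMED` are hypothetical and are not taken from the authors' code.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Return the Bonferroni-adjusted P value: p * m, capped at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return min(1.0, p_value * n_tests)


# Assumption: 31 tests. This count is inferred from the rounded values
# in the abstract (0.002 raw, 0.062 adjusted); it is not stated there.
N_TESTS_ASSUMED = 31

adjusted = bonferroni(0.002, N_TESTS_ASSUMED)
print(f"adjusted P = {adjusted:.3f}")
```

Under this assumed test count the adjusted value sits just above the conventional 0.05 threshold, which is consistent with the abstract's wording that the variant was "implicated" rather than declared significant after correction.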