Association mapping in structured populations

J K Pritchard et al. Am J Hum Genet. 2000 Jul.

Abstract

The use, in association studies, of the forthcoming dense genomewide collection of single-nucleotide polymorphisms (SNPs) has been heralded as a potential breakthrough in the study of the genetic basis of common complex disorders. A serious problem with association mapping is that population structure can lead to spurious associations between a candidate marker and a phenotype. One common solution has been to abandon case-control studies in favor of family-based tests of association, such as the transmission/disequilibrium test (TDT), but this comes at a considerable cost in the need to collect DNA from close relatives of affected individuals. In this article we describe a novel, statistically valid, method for case-control association studies in structured populations. Our method uses a set of unlinked genetic markers to infer details of population structure, and to estimate the ancestry of sampled individuals, before using this information to test for associations within subpopulations. It provides power comparable with the TDT in many settings and may substantially outperform it if there are conflicting associations in different subpopulations.
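The spurious-association problem described above can be illustrated with a small worked example. The sketch below is not the method of the article; it only shows, using hypothetical allele counts chosen for illustration, how pooling two subpopulations that differ in both allele frequency and case-control sampling can produce a large test statistic even though the marker is unrelated to disease within each subpopulation. The function names `chi_square_2x2` and `p_value_1df` are assumptions introduced here.

```python
import math

# Illustrative sketch only: every count below is hypothetical, not taken from the paper.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margin_product = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / margin_product

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Allele counts as (case A, case other, control A, control other).
# Subpopulation 1: allele A at frequency 0.8; 150 cases and 50 controls sampled.
subpop_1 = (240, 60, 80, 20)
# Subpopulation 2: allele A at frequency 0.2; 50 cases and 150 controls sampled.
subpop_2 = (20, 80, 60, 240)
# Ignoring structure means adding the two tables cell by cell.
pooled = tuple(x + y for x, y in zip(subpop_1, subpop_2))

stat_1 = chi_square_2x2(*subpop_1)
stat_2 = chi_square_2x2(*subpop_2)
stat_pooled = chi_square_2x2(*pooled)

print("within subpopulation 1:", stat_1, p_value_1df(stat_1))
print("within subpopulation 2:", stat_2, p_value_1df(stat_2))
print("pooled, structure ignored:", stat_pooled, p_value_1df(stat_pooled))
```

In this toy example both within-subpopulation statistics are 0, because allele frequencies are identical in cases and controls, while the pooled statistic is 72 on 1 degree of freedom. Testing within subpopulations, which is the general idea the abstract describes, removes the artifact here; inferring subpopulation membership from unlinked markers is the part this sketch leaves out.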

Figures

Figure 1

Summary of estimates of the ancestry of individuals in data set A (discrete populations). The dashed and solid lines show histograms of estimated values of $q_1^{(i)}$ (the proportion of ancestry of individual i in subpopulation 1) for individuals from subpopulations 1 and 2, respectively. Using this data set, the classification of individuals to subpopulations is essentially perfect.

Figure 2

Summary of estimates of the ancestry of individuals in the population with recent admixture (data set B). The peaks, from left to right, represent histograms of estimated values of $q_1^{(i)}$ (the proportion of ancestry of individual i in subpopulation 1) for individuals with 0, 1, 2, 3, and 4 grandparents, respectively, in subpopulation 1. The solid lines show the results for the entire sample, while the dotted lines show the results for those individuals who are affected. These data were simulated under a model in which an individual’s risk of disease increases with their amount of ancestry in subpopulation 1, with the result that affected individuals tend to have a greater proportion of ancestry in subpopulation 1 than do controls. This effect can be seen here, as the proportion of affected individuals in each peak increases from left to right. Our test for association uses the estimated $q^{(i)}$ to control for the presence of biased sampling.

Figure 3

Plot of estimated q against actual ancestry for one realization of data set C. This data set is intended to be a rough model of an African American population, with an average of 20% European admixture. Controls are marked by a plus sign (+) and cases by an unblackened circle (○). It is assumed that the disease of interest is more common among individuals with substantial European ancestry, and, hence, the distribution of cases is shifted toward the right, relative to controls.

Figure 4

Cumulative plot of estimated P values across the 120 loci for three realizations of data set C. The three solid lines show the distribution of P values obtained using a $\chi^2$ test which ignores population structure. These indicate an excess of small P values. The three dashed lines show the distributions of P values obtained using STRAT. These fall close to the diagonal (which is the ideal distribution) or (in one case) appear slightly conservative.
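The reading of such a cumulative plot can be sketched in a few lines. Under the null hypothesis a valid test gives approximately uniform P values, so the fraction at or below any threshold t is close to t (the diagonal); a curve above the diagonal signals an excess of small P values. The two lists below are synthetic stand-ins built for illustration, not values from the figure, and `cumulative_fraction` is a name assumed here.

```python
# Illustrative sketch only: both P-value lists are synthetic, not results from the paper.

def cumulative_fraction(p_values, threshold):
    """Fraction of P values at or below the threshold (one point on the cumulative plot)."""
    return sum(p <= threshold for p in p_values) / len(p_values)

n_loci = 120  # number of loci, as in the figure
# Evenly spread P values: the shape expected from a well-calibrated test.
calibrated = [(i - 0.5) / n_loci for i in range(1, n_loci + 1)]
# Squaring pushes values toward zero, mimicking an excess of small P values.
inflated = [p ** 2 for p in calibrated]

for threshold in (0.01, 0.05, 0.25):
    print(
        threshold,
        cumulative_fraction(calibrated, threshold),
        cumulative_fraction(inflated, threshold),
    )
```

At a threshold of 0.05, the evenly spread list yields a cumulative fraction of 0.05, on the diagonal, whereas the squared list yields 0.225, well above it.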
