Association mapping in structured populations
J K Pritchard et al. Am J Hum Genet. 2000 Jul.
Abstract
The use, in association studies, of the forthcoming dense genomewide collection of single-nucleotide polymorphisms (SNPs) has been heralded as a potential breakthrough in the study of the genetic basis of common complex disorders. A serious problem with association mapping is that population structure can lead to spurious associations between a candidate marker and a phenotype. One common solution has been to abandon case-control studies in favor of family-based tests of association, such as the transmission/disequilibrium test (TDT), but this comes at a considerable cost in the need to collect DNA from close relatives of affected individuals. In this article we describe a novel, statistically valid, method for case-control association studies in structured populations. Our method uses a set of unlinked genetic markers to infer details of population structure, and to estimate the ancestry of sampled individuals, before using this information to test for associations within subpopulations. It provides power comparable with the TDT in many settings and may substantially outperform it if there are conflicting associations in different subpopulations.
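The confounding described in the abstract can be illustrated with a small worked example. The sketch below is not the authors' method: it assumes subpopulation membership is already known, whereas the article infers it from unlinked markers, and the allele counts are hypothetical numbers chosen so that the marker has no association with disease inside either subpopulation. The function names `chi2_2x2` and `p_value_1df` are illustrative only.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def p_value_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical allele counts: (case A, case a, control A, control a).
# Subpopulation 1 has allele A at frequency 0.8 and supplies most cases;
# subpopulation 2 has allele A at frequency 0.2 and supplies most controls.
# Within each subpopulation, cases and controls share the same allele frequency.
strata = {
    "subpop1": (640, 160, 160, 40),
    "subpop2": (40, 160, 160, 640),
}

# Test that ignores structure: pool the counts across subpopulations.
pooled = tuple(sum(table[i] for table in strata.values()) for i in range(4))
pooled_stat = chi2_2x2(*pooled)

# Test within subpopulations: one statistic per stratum.
per_stratum = {name: chi2_2x2(*table) for name, table in strata.items()}

print("pooled chi-square:", pooled_stat, "P =", p_value_1df(pooled_stat))
for name, stat in per_stratum.items():
    print(name, "chi-square:", stat, "P =", p_value_1df(stat))
```

The pooled test reports a strong association that is produced entirely by the different sampling proportions and allele frequencies of the two subpopulations, while the within-subpopulation statistics are zero.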
Figures
Figure 1
Summary of estimates of the ancestry of individuals in data set A (discrete populations). The dashed and solid lines show histograms of estimated values of $q_1^{(i)}$ (the proportion of ancestry of individual $i$ in subpopulation 1) for individuals from subpopulations 1 and 2, respectively. Using this data set, the classification of individuals to subpopulations is essentially perfect.
Figure 2
Summary of estimates of the ancestry of individuals in the population with recent admixture (data set B). The peaks, from left to right, represent histograms of estimated values of $q_1^{(i)}$ (the proportion of ancestry of individual $i$ in subpopulation 1) for individuals with 0, 1, 2, 3, and 4 grandparents, respectively, in subpopulation 1. The solid lines show the results for the entire sample, while the dotted lines show the results for those individuals who are affected. These data were simulated under a model in which an individual’s risk of disease increases with their amount of ancestry in subpopulation 1, with the result that affected individuals tend to have a greater proportion of ancestry in subpopulation 1 than do controls. This effect can be seen here, as the proportion of affected individuals in each peak increases from left to right. Our test for association uses the estimated $q^{(i)}$ to control for the presence of biased sampling.
Figure 3
Plot of estimated q against actual ancestry for one realization of data set C. This data set is intended to be a rough model of an African American population, with an average of 20% European admixture. Controls are marked by a plus sign (+) and cases by an unblackened circle (○). It is assumed that the disease of interest is more common among individuals with substantial European ancestry, and, hence, the distribution of cases is shifted toward the right, relative to controls.
Figure 4
Cumulative plot of estimated P values across the 120 loci for three realizations of data set C. The three solid lines show the distribution of P values obtained using a χ² test which ignores population structure. These indicate an excess of small P values. The three dashed lines show the distributions of P values obtained using STRAT. These fall close to the diagonal (which is the ideal distribution) or (in one case) appear slightly conservative.
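The diagnostic behind a cumulative P-value plot can be sketched as follows. This is a minimal illustration, not the article's analysis: both P-value lists are synthetic stand-ins (120 values each, matching the number of loci in the caption), and `cumulative_fraction` is a hypothetical helper name. A well-calibrated test gives a cumulative fraction close to the threshold itself (the diagonal); an excess of small P values lifts the curve above it.

```python
def cumulative_fraction(p_values, thresholds):
    """Fraction of P values at or below each threshold (empirical CDF)."""
    n = len(p_values)
    return [sum(p <= t for p in p_values) / n for t in thresholds]

thresholds = [0.05, 0.25, 0.5, 1.0]

# Synthetic, evenly spaced P values: a stand-in for a well-calibrated test.
calibrated = [(k + 0.5) / 120 for k in range(120)]

# Synthetic P values pushed toward zero: a stand-in for a test that
# produces an excess of small P values.
inflated = [p ** 3 for p in calibrated]

for label, values in (("calibrated", calibrated), ("inflated", inflated)):
    fractions = cumulative_fraction(values, thresholds)
    print(label, [round(f, 3) for f in fractions])
```

Comparing each printed fraction with its threshold shows whether a curve sits on the diagonal or above it.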