Linkage Disequilibrium: Ancient History Drives the New Genetics
Related papers
Linkage disequilibrium patterns vary substantially among populations
European Journal of Human Genetics, 2005
A major initiative to create a global human haplotype map has recently been launched as a tool to improve the efficiency of disease gene mapping. The 'HapMap' project will study common variants in depth in four (and to a lesser degree in up to 12) populations to catalogue haplotypes that are expected to be common to all populations. A hope of the 'HapMap' project is that much of the genome occurs in regions of limited diversity such that only a few of the SNPs in each region will capture the diversity and be relevant around the world. In order to explore the implications of studying only a limited number of populations, we have analyzed linkage disequilibrium (LD) patterns of three 175-320 kb genomic regions in 16 diverse populations with an emphasis on African and European populations. Analyses of these three genomic regions provide empiric demonstration of marked differences in frequencies of the same few haplotypes, resulting in differences in the amount of LD and very different sets of haplotype frequencies. These results highlight the distinction between the statistical concept of LD and the biological reality of haplotypes and their frequencies. The significant quantitative and qualitative variation in LD among populations, even for populations within a geographic region, emphasizes the importance of studying diverse populations in the HapMap project to assure broad applicability of the results.
Allelic association with SNPs: metrics, populations, and the linkage disequilibrium map
Human Mutation, 2001
Comparison of different metrics, using three large samples of haplotypes from different populations, demonstrates that ρ is the most efficient measure of association between pairs of single nucleotide polymorphisms (SNPs). Pairwise data can be modeled, using composite likelihood, to describe the decline in linkage disequilibrium with distance (the Malecot model). The evidence from more isolated populations (Finland, Sardinia) suggests that linkage disequilibrium extends to 427–893 kb but, even in samples representative of large heterogeneous populations, such as CEPH, the extent is 385 kb or greater. This suggests that isolated populations are not essential for linkage disequilibrium mapping of common diseases with SNPs. The ε parameter of the Malecot model (recombination and time), evaluated at each SNP, indicates regions of the genome with extensive and less extensive disequilibrium (low and high values of ε respectively). When plotted against the physical map, the regions with extensive and less extensive linkage disequilibrium may correspond to recombination cold and hot spots. This is discussed in relation to the Xq25 cytogenetic band and the HFE gene region. Hum Mutat 17:255–262, 2001. © 2001 Wiley-Liss, Inc.
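As an illustration of the decline in association with distance described in this abstract, the Malecot model is commonly written as ρ(d) = (1 − L)·M·exp(−ε·d) + L, where M is the intercept (association among founders), ε is the recombination-and-time parameter, and L is the residual association at large distance. The sketch below assumes that standard form; the function names and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def malecot_rho(d_kb, M, epsilon, L):
    """Expected association rho at distance d_kb under the Malecot model.

    Assumed form: rho(d) = (1 - L) * M * exp(-epsilon * d) + L
    M       : intercept (association at zero distance, before adding L)
    epsilon : rate of exponential decline per kb (recombination and time)
    L       : asymptote, the association remaining at large distance
    """
    return (1.0 - L) * M * math.exp(-epsilon * d_kb) + L


def swept_radius(epsilon):
    """Distance (kb) over which the declining term falls to 1/e: 1 / epsilon."""
    return 1.0 / epsilon


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the shape of the curve.
    M, epsilon, L = 0.9, 0.005, 0.1
    for d in (0, 50, 200, 500, 1000):
        print(f"d = {d:5d} kb  rho = {malecot_rho(d, M, epsilon, L):.3f}")
    print(f"swept radius = {swept_radius(epsilon):.0f} kb")
```

A low ε gives a slow decline (extensive disequilibrium) and a high ε a fast one, which matches the abstract's reading of low and high values of the parameter.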
Linkage disequilibrium in human populations
Proceedings of the National Academy of Sciences, 2003
Whereas the human linkage map appears on limited evidence to be constant over populations, maps of linkage disequilibrium (LD) vary among populations that differ in gene history. The greatest difference is between populations of sub-Saharan origin and populations remotely derived from Africa after a major bottleneck that reduced their heterozygosity and altered their Malecot parameters, increasing the intercept M that reflects association in founders and decreasing the exponential decline. Variation among populations within this ethnic dichotomy is much smaller. These observations validate use of a cosmopolitan LD map based on a sizeable sample representing a large population reliably typed for markers at high density. Then an LD map for a region or isolate within an ethnic group may be created by fitting the sample LD to the cosmopolitan map, estimating Malecot parameters simultaneously. The cosmopolitan map scaled by recovers 95% of the information that a local map at the same density gives and therefore more than the information in a low-resolution local map. Relative to a Eurasian cosmopolitan map the scaling factors are estimated to be 0.82 for isolates of European descent, 1.53 for Yorubans, and 1.74 for African Americans. These observations are consistent with a common bottleneck (perhaps but not necessarily speciation) ≈173,500 years ago, if the bottleneck associated with migration out of Africa was 100,000 years ago. Eurasian populations (especially isolates with numerous cases) are efficient for genome scans, and populations of recent African origin (such as African Americans) are efficient for identification of causal polymorphisms within a candidate sequence.
The Extent of Linkage Disequilibrium in Four Populations with Distinct Demographic Histories
The American Journal of Human Genetics, 2000
The design and feasibility of whole-genome-association studies are critically dependent on the extent of linkage disequilibrium (LD) between markers. Although there has been extensive theoretical discussion of this, few empirical data exist. The authors have determined the extent of LD among 38 biallelic markers with minor allele frequencies >.1, since these are most comparable to the common disease-susceptibility polymorphisms that association studies aim to detect. The markers come from three chromosomal regions (1,335 kb on chromosome 13q12-13, 380 kb on chromosome 19q13.2, and 120 kb on chromosome 22q13.3), which have been extensively mapped. These markers were examined in ∼1,600 individuals from four populations, all of European origin but with different demographic histories: Afrikaners, Ashkenazim, Finns, and East Anglian British. There are few differences, either in allele frequencies or in LD, among the populations studied. A similar inverse relationship was found between LD and distance in each genomic region and in each population. Mean D′ is .68 for marker pairs <5 kb apart and is .24 for pairs separated by 10-20 kb, and the level of LD is not different from that seen in unlinked marker pairs separated by >500 kb. However, only 50% of marker pairs at distances <5 kb display sufficient LD (D > .3) to be useful in association studies. Results of the present study, if representative of the whole genome, suggest that a whole-genome scan searching for common disease-susceptibility alleles would require markers spaced ≤5 kb apart.
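For readers unfamiliar with the pairwise measures quoted in this abstract, the following is a minimal sketch of how D, Lewontin's normalized D′, and r² are conventionally computed for two biallelic markers from haplotype and allele frequencies. It assumes the textbook definitions, not necessarily the exact estimators the authors used; the function name and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab : frequency of the haplotype carrying allele A at marker 1
           and allele B at marker 2
    p_a  : frequency of allele A at marker 1
    p_b  : frequency of allele B at marker 2
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two markers were independent.
    d = p_ab - p_a * p_b

    # Largest |D| possible given the allele frequencies (depends on sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies for illustration only.
    d, d_prime, r2 = pairwise_ld(p_ab=0.3, p_a=0.3, p_b=0.5)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

D′ rescales D by its maximum attainable value, so it can equal 1 even when r² is well below 1; this is one reason different measures can give different impressions of how far useful LD extends.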
Molecular Biology and Evolution, 2003
At present there is tremendous interest in characterizing the magnitude and distribution of linkage disequilibrium (LD) throughout the human genome, which will provide the necessary foundation for genome-wide LD analyses and facilitate detailed evolutionary studies. To this end, a human high-density single-nucleotide polymorphism (SNP) marker map has been constructed. Many of the SNPs on this map, however, were identified by sampling a small number of chromosomes from a single population, and inferences drawn from studies using such SNPs may be influenced by ascertainment bias (AB). Through extensive simulations, we have found that AB is a potentially significant problem in estimating and comparing LD within and between populations. Specifically, the magnitude of AB is a function of the SNP discovery strategy, number of chromosomes used for SNP discovery, population genetic characteristics of the particular genomic region considered, amount of gene flow between populations, and demographic history of the populations. We demonstrate that a balanced SNP discovery strategy (where equal numbers of chromosomes are sampled from multiple subpopulations) is the optimal study design for generating broadly applicable SNP resources. Finally, we validate our theoretical predictions by comparing our results to publicly available data from ten genes sequenced in 24 African American and 23 European American individuals.
Linkage disequilibrium in young genetically isolated Dutch population
European Journal of Human Genetics, 2004
The design and feasibility of genetic studies of complex diseases are critically dependent on the extent and distribution of linkage disequilibrium (LD) across the genome and between different populations. We have examined genome-wide and region-specific LD in a young genetically isolated population identified in the Netherlands by genotyping approximately 800 Short Tandem Repeat markers distributed genome-wide across 58 individuals. Several regions were analyzed further using a denser marker map. The permutation-corrected measure of LD was used for analysis. A significant (P<0.0004) relation between LD and genetic distance on a genome-wide scale was found. Distance explained 4% of the total LD variation. For fine-mapping data, distance accounted for a larger proportion of LD variation (up to 39%). A notable similarity in the genome-wide distribution of LD was revealed between this population and other young genetically isolated populations from Micronesia and Costa Rica. Our study population and experiment were simulated in silico to confirm our knowledge of the history of the population. High agreement was observed between results of analysis of simulated and empirical data. We conclude that our population shows a high level of LD similar to that demonstrated previously in other young genetic isolates. In Europe, there may be a large number of young genetically isolated populations that are similar in history to ours. In these populations, a similar degree of LD is expected and thus they may be effectively used for linkage or LD mapping.
Human Heredity, 2006
Objective: Analyze the information contained in homozygous haplotypes detected with high density genotyping. Methods: We analyze the genotypes of ~2,500 markers on chr 22 in 12 population samples, each including 200 individuals. We develop a measure of disequilibrium based on haplotype homozygosity and an algorithm to identify genomic segments characterized by non-random homozygosity (NRH), taking into account allele frequencies, missing data, genotyping error, and linkage disequilibrium. Results: We show how our measure of linkage disequilibrium based on homozygosity leads to results comparable to those of R², as well as the importance of correcting for small sample variation when evaluating D′. We observe that the regions that harbor NRH segments tend to be consistent across populations, are gene rich, and are characterized by lower recombination. Conclusions: It is crucial to take into account LD patterns when interpreting long stretches of homozygous markers.