Sequence features in regions of weak and strong linkage disequilibrium
- Albert V. Smith1,2,5,
- Daryl J. Thomas3,5,6,
- Heather M. Munro4, and
- Gonçalo R. Abecasis4
- 1 Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- 2 Genthor ehf., 101 Reykjavik, Iceland
- 3 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- 4 Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, USA
Abstract
We use genotype data generated by the International HapMap Project to dissect the relationship between sequence features and the degree of linkage disequilibrium in the genome. We show that variation in linkage disequilibrium is broadly similar across populations and examine sequence landscape in regions of strong and weak disequilibrium. Linkage disequilibrium is generally low within ∼15 Mb of the telomeres of each chromosome and noticeably elevated in large, duplicated regions of the genome as well as within ∼5 Mb of centromeres and other heterochromatic regions. At a broad scale (100–1000 kb resolution), our results show that regions of strong linkage disequilibrium are typically GC poor and have reduced polymorphism. In addition, these regions are enriched for LINE repeats, but have fewer SINE, DNA, and simple repeats than the rest of the genome. At a fine scale, we examine the sequence composition of “hotspots” for the rapid breakdown of linkage disequilibrium and show that they are enriched in SINEs, in simple repeats, and in sequences that are conserved between species. Regions of high and low linkage disequilibrium (the top and bottom quartiles of the genome) have a higher density of genes and coding bases than the rest of the genome. Closer examination of the data shows that whereas some types of genes (including genes involved in immune response and sensory perception) are typically located in regions of low linkage disequilibrium, other genes (including those involved in DNA and RNA metabolism, response to DNA damage, and the cell cycle) are preferentially located in regions of strong linkage disequilibrium. Our results provide a detailed analysis of the relationship between sequence features and linkage disequilibrium and suggest an evolutionary justification for the heterogeneity in linkage disequilibrium in the genome.
Footnotes
[Supplemental material is available at www.genome.org. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: J. Mullikin, G. McVean, and C. Freeman.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4421405. Freely available online through the Genome Research Immediate Open Access option.
5 These two authors contributed equally to this work.
6 Corresponding author. E-mail daryl{at}soe.ucsc.edu; fax (831) 459-1809.
- Accepted September 7, 2005.
- Received July 12, 2005.
Cold Spring Harbor Laboratory Press