A high-resolution linkage-disequilibrium map of the human major histocompatibility complex and first generation of tag single-nucleotide polymorphisms

doi: 10.1086/429393. Epub 2005 Mar 1.

Emily C Walsh, Xiayi Ke, Marcos Delgado, Mark Griffiths, Sarah Hunt, Jonathan Morrison, Pamela Whittaker, Eric S Lander, Lon R Cardon, David R Bentley, John D Rioux, Stephan Beck, Panos Deloukas

Marcos M Miretti et al. Am J Hum Genet. 2005 Apr.

Abstract

Autoimmune, inflammatory, and infectious diseases present a major burden to human health and are frequently associated with loci in the human major histocompatibility complex (MHC). Here, we report a high-resolution (1.9 kb) linkage-disequilibrium (LD) map of a 4.46-Mb fragment containing the MHC in U.S. pedigrees with northern and western European ancestry collected by the Centre d'Etude du Polymorphisme Humain (CEPH) and the first generation of haplotype tag single-nucleotide polymorphisms (tagSNPs) that provide up to a fivefold increase in genotyping efficiency for all future MHC-linked disease-association studies. The data confirm previously identified recombination hotspots in the class II region and allow the prediction of numerous novel hotspots in the class I and class III regions. The region of longest LD maps outside the classic MHC to the extended class I region spanning the MHC-linked olfactory-receptor gene cluster. The extended haplotype homozygosity analysis for recent positive selection shows that all 14 outlying haplotype variants map to a single extended haplotype, which most commonly bears HLA-DRB1*1501. The SNP data, haplotype blocks, and tagSNPs analysis reported here have been entered into a multidimensional Web-based database (GLOVAR), where they can be accessed and viewed in the context of relevant genome annotation. This LD map allowed us to give coordinates for the extremely variable LD structure underlying the MHC.

Figures

Figure 1

Low-heterozygosity regions in the MHC. The observed heterozygosity (“Ho,” red line) averaged in 50-kb windows is plotted with the number of loci with MAF <5% (green area). SNPs showing MAF <5% are not randomly distributed across the analyzed 4.459-Mb region. Specifically, three regions (black arrows) show loss of heterozygosity; most SNPs in them are monomorphic in this population. These fragments contain OR2H1 and MAS1L; DHX16, NRM, MDC1, TUBB, and FLOT1; and BAT5, LY6GD, C6orf25, DDAH2, C6orf26, and VARS2, respectively. The uneven distribution of the 770 SNPs that failed to generate genotyping data is represented by the blue line (“failed,” number of SNPs in 50-kb windows). Typing failures are concentrated mainly in three genomic regions (blue peaks), including HLA-A, HLA-B, HLA-C, HLA-DQB1, and HLA-DRB1, which suggests that the highly polymorphic nature of these genes might be responsible for the failure.
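The windowed summary plotted in figure 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `window_summary`, the input layout (position, observed heterozygosity, MAF per SNP), and the use of non-overlapping windows are assumptions made for illustration only.

```python
from collections import defaultdict

def window_summary(snps, window_bp=50_000, maf_cutoff=0.05):
    """Summarise SNPs in fixed, non-overlapping windows.

    snps: iterable of (position_bp, observed_heterozygosity, maf) tuples
          (hypothetical input layout).
    Returns {window_start_bp: (mean_heterozygosity, n_low_maf_snps)}.
    """
    het_sum = defaultdict(float)
    n_snps = defaultdict(int)
    n_low_maf = defaultdict(int)
    for pos, het, maf in snps:
        w = pos // window_bp          # index of the window containing this SNP
        het_sum[w] += het
        n_snps[w] += 1
        if maf < maf_cutoff:          # count SNPs with MAF below the cutoff
            n_low_maf[w] += 1
    return {
        w * window_bp: (het_sum[w] / n_snps[w], n_low_maf[w])
        for w in sorted(n_snps)
    }

example = [(10_000, 0.5, 0.30), (20_000, 0.25, 0.01), (60_000, 0.0, 0.0)]
summary = window_summary(example)
```

Windows with a low mean heterozygosity and a high low-MAF count would correspond to the kind of regions marked by black arrows in the figure.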

Figure 2

Sliding-window (SW) plot of average r² across the MHC. Average r² was calculated from 25 kb to 250 kb in 500-kb SWs, with 50-kb increments between windows. The SW plot captures trends of LD (r²) by averaging r² values between a given marker and all the SNPs up to 250 kb away. Excluding pairwise comparisons with surrounding markers (25 kb on each side) avoids the inflating effect of closely linked loci on LD. The MHC extended class I, class I, class III, and class II regions (blue, yellow, orange, and green, respectively) present comparatively distinct variation patterns of long-range LD, which is reflected in the haplotype-block analysis and interferes with the SNP-tagging process.
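One possible reading of the averaging described in this caption is sketched below in Python. It assumes phased haplotypes coded 0/1 per SNP, computes r² from haplotype frequencies, averages r² per marker over partners 25 to 250 kb away, and then averages those per-marker values in 500-kb windows stepped by 50 kb. The function names and the input layout are hypothetical, and the authors' exact procedure may differ.

```python
def r_squared(a, b):
    """r^2 between two biallelic loci from phased 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return None                   # monomorphic locus: r^2 is undefined
    d = pab - pa * pb                 # coefficient of disequilibrium D
    return d * d / denom

def marker_mean_r2(positions, haps, min_bp=25_000, max_bp=250_000):
    """Per marker, mean r^2 against SNPs between min_bp and max_bp away."""
    means = []
    for i, pos_i in enumerate(positions):
        vals = []
        for j, pos_j in enumerate(positions):
            if min_bp <= abs(pos_i - pos_j) <= max_bp:
                r2 = r_squared(haps[i], haps[j])
                if r2 is not None:
                    vals.append(r2)
        means.append(sum(vals) / len(vals) if vals else None)
    return means

def sliding_mean(positions, values, window_bp=500_000, step_bp=50_000):
    """Average per-marker values in overlapping windows along the region."""
    out = []
    start = min(positions)
    while start <= max(positions):
        inside = [v for p, v in zip(positions, values)
                  if start <= p < start + window_bp and v is not None]
        if inside:
            out.append((start, sum(inside) / len(inside)))
        start += step_bp
    return out

positions = [0, 30_000, 60_000]
haps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]   # one row per SNP
per_marker = marker_mean_r2(positions, haps)
profile = sliding_mean(positions, per_marker)
```

The quadratic pair loop is kept for readability; a real data set of this density would need a position-sorted scan or a vectorised implementation.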

Figure 3

Decay of LD as a function of distance. The decay rates are represented by average r² (A) and D′ (B) for all marker pairs separated by distance S (S = 10 kb, 20 kb, 30 kb, … 500 kb). Line colors represent different MHC subregions and chromosome 20 genotyping data from CEPH families (Ke et al. 2004b). Whereas LD decay values averaged across the whole MHC region are comparable with the chromosome 20 LD decay, the extended class I and classical class II regions show distinct slopes consistent with the underlying LD structure observed in figure 2.
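A decay curve of this kind can be sketched by binning marker pairs by physical distance and averaging the LD statistic in each bin. The Python sketch below assumes that pairwise values (r² or D′) have already been computed and that each pair is assigned to the 10-kb bin whose upper edge is the smallest multiple of 10 kb at or above its distance; both the binning rule and the name `ld_decay` are illustrative assumptions.

```python
from collections import defaultdict

def ld_decay(pairs, bin_bp=10_000, max_bp=500_000):
    """Average an LD statistic over marker pairs grouped by distance.

    pairs: iterable of (pos_i_bp, pos_j_bp, ld_value) tuples
           (hypothetical input layout; ld_value may be r^2 or D').
    Returns {bin_upper_edge_bp: mean_ld}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos_i, pos_j, ld in pairs:
        dist = abs(pos_i - pos_j)
        if dist == 0 or dist > max_bp:
            continue                          # skip self-pairs and distant pairs
        upper = -(-dist // bin_bp) * bin_bp   # round distance up to the bin edge
        sums[upper] += ld
        counts[upper] += 1
    return {s: sums[s] / counts[s] for s in sorted(counts)}

pairs = [(0, 5_000, 0.75), (0, 10_000, 0.25), (0, 15_000, 0.5), (0, 600_000, 1.0)]
curve = ld_decay(pairs)
```

Running the same function separately on pairs from each MHC subregion would give one curve per region, comparable to the colored lines in the figure.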

Figure 4

LD structure across the MHC. A, Distribution of haplotype blocks across the MHC region, as viewed in the GLOVAR genome browser. Haplotype blocks, according to the criteria of Gabriel et al. (2002) implemented by Haploview 2.05 (Haploview Web site), are represented by red bars. Each bar corresponds to an individual haplotype block comprising a number of SNPs (red marks), which are located according to their map position. This enables an accurate interpretation of the LD-block distribution, size, and gaps in the context of additional genomic features, such as gene annotation, SNP density, and physical distance. The distribution of tagSNPs selected in this work is generated in the GLOVAR genome browser and is indicated by a green track under the haplotype blocks. B, High-resolution view of 720 kb of the extended MHC class I region, as represented by GOLDsurfer 3D view of D′ values (Pettersson et al. 2004). This region contains a large cluster of olfactory-receptor genes in high LD (540 kb), interrupted by a single recombination hotspot between OR12D3 and OR12D2. This long-range LD region includes 13 contiguous haplotype blocks, according to the criteria of Gabriel et al. (2002) (see corresponding inset in panel A). C, View of the LD structure (D′ values) within the MHC class II region, in which experimental evidence for recombination hotspots has been described elsewhere (Cullen et al. ; Jeffreys et al. 2001). High-LD areas (red blocks) are separated by recombination hotspots. The first three LD breaks correspond to recombination hotspots mapped at TAP2 and HLA-DMB and between BRD2 and HLA-DOA (Cullen et al. ; Jeffreys et al. 2001). Another LD break is visible between HLA-DOA and HLA-DPA1.

Figure 5

Recombination-rate variation across the MHC. Recombination rates estimated from population-genetic data are far from uniform; they fluctuate considerably both in scale (cM/Mb) and by map position. Recombination hotspots are represented by peaks 10 times higher than the local background level of recombination. Peaks enclosed in the inset correspond to hotspots identified by sperm typing (Jeffreys et al. 2001) located at or near TAP2, HLA-DMB, BRD2, and HLA-DOA, observable as LD breaks in figure 4C. The recombination hotspot between OR12D3 and OR12D2 in the olfactory-receptor gene cluster (“ORs”), inferred from population-genetic data, correlates perfectly with the LD break visible in figure 4B. Note the presence of two coldspots showing recombination rates 10 times lower than the local background level of recombination.
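The tenfold criterion in this caption can be illustrated with a short Python sketch that labels each interval by comparing its estimated rate with a local background. The caption does not say how the local background was defined, so the median over a fixed number of flanking intervals used here is purely an assumption, as are the function name `classify_rates` and its parameters.

```python
from statistics import median

def classify_rates(rates, flank=5, fold=10.0):
    """Label each interval as 'hotspot', 'coldspot', or 'background'.

    rates: recombination-rate estimates (e.g., cM/Mb) in map order.
    flank: number of intervals on each side used as local background
           (assumed definition; not taken from the source).
    """
    labels = []
    for i, rate in enumerate(rates):
        lo = max(0, i - flank)
        hi = min(len(rates), i + flank + 1)
        background = median(rates[lo:hi])     # robust to a single peak
        if background > 0 and rate >= fold * background:
            labels.append("hotspot")
        elif background > 0 and rate <= background / fold:
            labels.append("coldspot")
        else:
            labels.append("background")
    return labels

rates = [1.0] * 5 + [20.0] + [1.0] * 5 + [0.05] + [1.0] * 5
labels = classify_rates(rates)
```

A median is used rather than a mean so that the peak being tested does not inflate its own background estimate.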

Figure 6

EHH outliers at 0.3 cM and 0.25 cM. A, EHH by frequency plot at 0.3 cM, indicating the eight outliers at that distance. B, EHH by frequency plot at 0.25 cM, indicating six outliers at that distance. C, Physical mapping of outlying variants. A subset of genes in the region is shown for reference. Haplotype variants with extended LD are indicated in orange and with numbers that correlate to their position in the EHH by frequency plots in panels A and B.

Figure 7

Representative EHH outlier. A, EHH by frequency plot at 0.3 cM distance, indicating one of the eight outlying variants at that distance, which is >4.5 SD in its frequency bin and per its frequency × EHH statistic. This variant is indicated in figure 6 (panel 9) and is part of a block slightly centromeric to the DQB1 gene. B, Haplotype structure of the block containing this haplotype variant. The outlier is the 37% allele (“GGGT”) indicated in orange in all remaining plots (arrow). C, EHH by distance (cM) plot of all variants in the block. The arrow indicates the region of extended LD for the orange haplotype. D, Haplotype bifurcation plots of all variants in the block. The arrow indicates the region of interest for the orange haplotype.
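For readers unfamiliar with the statistic, extended haplotype homozygosity (EHH) is commonly defined as the probability that two randomly chosen chromosomes carrying a given core haplotype are identical over the interval from the core out to a given marker. The Python sketch below computes it under that standard definition. The haplotype strings, the core allele, and the function names are made up for illustration, and distances are expressed in marker indices rather than cM.

```python
from collections import Counter
from math import comb

def ehh(extended_haplotypes):
    """EHH for chromosomes that all carry the same core haplotype.

    extended_haplotypes: list of strings spanning the core out to marker x.
    Returns sum_i C(n_i, 2) / C(n, 2), or None with fewer than 2 chromosomes.
    """
    n = len(extended_haplotypes)
    if n < 2:
        return None
    counts = Counter(extended_haplotypes)     # copies of each distinct haplotype
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

def ehh_by_marker(haplotypes, core_start, core_end, core_allele):
    """EHH of one core allele as the haplotype is extended to the right."""
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_allele]
    return [
        ehh([h[core_start:end] for h in carriers])
        for end in range(core_end, len(haplotypes[0]) + 1)
    ]

# Toy data: four chromosomes carry the hypothetical core "GGGT", one does not.
haplotypes = ["GGGTAA", "GGGTAA", "GGGTAC", "GGGTCC", "ACGTAA"]
decay = ehh_by_marker(haplotypes, 0, 4, "GGGT")
```

EHH starts at 1 at the core and falls as recombination and mutation break up the extended haplotype. A core allele that keeps a high EHH at a distance where alleles of similar frequency have decayed is the kind of outlier highlighted in the figure.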

References

Electronic-Database Information

    1. Center for Statistical Genetics, http://www.sph.umich.edu/csg/abecasis/PedStats/ (for PEDSTATS)
    2. dbSNP Home Page, http://www.ncbi.nlm.nih.gov/SNP/index.html
    3. GLOVAR Genome Browser, http://www.glovar.org/Homo_sapiens/
    4. Haploview, http://www.broad.mit.edu/mpg/haploview/index.php
    5. Human Chromosome 6 Project Overview, http://www.sanger.ac.uk/HGP/Chr6/

References

    1. Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) MERLIN—rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97–101
    2. Abecasis GR, Cookson WOC (2000) GOLD—graphical overview of linkage disequilibrium. Bioinformatics 16:182–183
    3. Ahmad T, Neville M, Marshall SE, Armuzzi A, Mulcahy-Hawes K, Crawshaw J, Sato H, Ling K-L, Barnardo M, Goldthorpe S, Walton R, Bunce M, Jewell DP, Welsh KI (2003) Haplotype-specific linkage disequilibrium patterns define the genetic topography of the human MHC. Hum Mol Genet 12:647–656
    4. Barcellos LF, Oksenberg JR, Begovich AB, Martin ER, Schmidt S, Vittinghoff E, Goodin DS, Pelletier D, Lincoln RR, Bucher P, Swerdlin A, Pericak-Vance MA, Haines JL, Hauser SL, for the Multiple Sclerosis Genetics Group (2003) HLA-DR2 dose effect on susceptibility to multiple sclerosis and influence on disease course. Am J Hum Genet 72:710–716
    5. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74:106–120
