Genome-wide comparisons of variation in linkage disequilibrium

Comparative Study

Genome Res. 2009 Oct;19(10):1849-60.

doi: 10.1101/gr.092189.109. Epub 2009 Jun 18.

Yik Y Teo et al. Genome Res. 2009 Oct.

Abstract

Current genome-wide surveys of common diseases and complex traits fundamentally aim to detect indirect associations where the single nucleotide polymorphisms (SNPs) carrying the association signals are not biologically active but are in linkage disequilibrium (LD) with some unknown functional polymorphisms. Reproducing any novel discoveries from these genome-wide scans in independent studies is now a prerequisite for the putative findings to be accepted. Significant differences in patterns of LD between populations can affect the portability of phenotypic associations when the replication effort or meta-analyses are attempted in populations that are distinct from the original population in which the genome-wide study was performed. Here, we introduce a novel method for genome-wide analyses of LD variations between populations that allows the identification of candidate regions with different patterns of LD. The evidence of LD variation provided by the introduced method correlated with the degree of differences in the frequencies of the most common haplotype across the populations. Identified regions also resulted in greater variation in the success of replication attempts compared with random regions in the genome. A separate permutation strategy introduced for assessing LD variation in the absence of genome-wide data also correctly identified the expected variation in LD patterns in two well-established regions undergoing strong population-specific evolutionary pressure. Importantly, this method addresses whether a failure to reproduce a disease association in a disparate population is due to underlying differences in LD structure with an unknown functional polymorphism, which is vital in the current climate of replicating and fine-mapping established findings from genome-wide association studies.

Figures

Figure 1.

Comparisons across different window sizes L. Comparisons of the standardized scores for regions identified in our analysis of LD differences between HapMap CEU vs. WTCCC 58C with different numbers of SNPs in each window. Four separate analyses were run with L = 25, 50, 100, and 200 SNPs, respectively, where comparisons were made against the regions identified with L = 50. For each of the regions identified for L = 50, we noted the maximum standardized varLD scores in this region in the analyses with L = 25 (A), 100 (B), and 200 (C). Each point in the figures represents a region identified in the original analysis with L = 50. The size and shade of each point indicate the relative size of the region, with larger circles and darker shades of gray indicating larger regions. (Black shading) Regions with sizes >500 kb.

Figure 2.

LD variation at the NRG1 gene on chromosome 8. (Upper panel) Standardized varLD scores across the region encapsulating the NRG1 gene. (Red points) LD comparisons between HapMap Europeans (CEU) and HapMap Asians (CHB and JPT); (purple points) LD comparisons between HapMap Europeans (CEU) and HapMap Africans (YRI); (cyan points) LD comparisons between HapMap Africans (YRI) and HapMap Asians (CHB and JPT). (Dotted lines) Values of the corresponding thresholds. (Middle panel) Fine-scale recombination rates in the region from the combined HapMap samples. Positions of genes in the region shown in the bottom panel were obtained from Ensembl. All coordinates shown are in NCBI Build 35 (dbSNP build 125).

Figure 3.

Differences in statistical evidence at the associated SNP in CEU and CHB+JPT. Comparison of the −log10 _P_-values from a test of association between 2000 simulated cases and 2000 simulated controls at an associated SNP in each of the HapMap CEU and CHB+JPT populations. For each SNP, the larger −log10 _P_-value is set as the baseline and is mapped to zero, and we only plot the difference of the −log10 _P_-values. The regions are then ranked from left to right by increasing degree of difference in statistical evidence between CEU and CHB+JPT. (A) Three hundred randomly selected regions that have been identified by varLD to be in the top fifth percentile of the genome-wide distribution. (B) Three hundred regions that have been randomly selected across the genome, where each region spans an identical physical distance to one of the 300 varLD-identified regions from A. (Green circles) Differential statistical evidence observed in the CEU; (red circles) differential statistical evidence observed in the CHB+JPT.

Figure 4.

Heatmap representations of LD in two genomic regions between pairs of populations in HapMap. The upper left and lower right triangles of each plot correspond to the LD in a region for each of two populations, respectively, as measured by the pairwise _r_² metric, with the plots in the first column comparing HapMap Europeans with HapMap Asians, the second column comparing HapMap Europeans with HapMap Africans, and the last column comparing HapMap Africans with HapMap Asians. The plots in the first row depict the same genomic region on chromosome 2 of 136.26 Mb–136.38 Mb spanning the LCT gene, while the plots in the second row depict the genomic region on chromosome 1 of 155.9 Mb–156.0 Mb spanning the DARC gene.
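
As a rough illustration of how such a two-population plot can be assembled, the sketch below computes the standard pairwise r² statistic from phased 0/1 haplotypes and stores one population above the diagonal and the other below it. The function names, the input layout (one list of alleles per SNP), and the choice of which population goes in which triangle are assumptions made for this example, not details taken from the paper.

```python
def r_squared(snp_a, snp_b):
    """Pairwise r^2 between two SNPs given phased 0/1 alleles per haplotype."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for x, y in zip(snp_a, snp_b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # r^2 is undefined for a monomorphic SNP
    d = p_ab - p_a * p_b
    return d * d / denom

def combined_ld_matrix(snps_pop1, snps_pop2):
    """Population 1 above the diagonal, population 2 below it.

    Both inputs must list the same SNPs in the same order.
    """
    m = len(snps_pop1)
    matrix = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            matrix[i][j] = r_squared(snps_pop1[i], snps_pop1[j])
            matrix[j][i] = r_squared(snps_pop2[i], snps_pop2[j])
    return matrix

# Toy data: two SNPs, four haplotypes per population.
pop1 = [[0, 0, 1, 1], [0, 0, 1, 1]]  # the two SNPs are perfectly correlated
pop2 = [[0, 0, 1, 1], [0, 1, 0, 1]]  # the two SNPs are independent
ld = combined_ld_matrix(pop1, pop2)
```

In the toy data, the entry above the diagonal reflects complete LD in the first population and the entry below it reflects no LD in the second, which is the kind of contrast the heatmaps are designed to expose.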

Figure 5.

Standardized varLD scores across different population pairs in established regions undergoing positive natural selection or containing high haplotype diversity. The standardized varLD signals for each population pair are shown, and only scores above their respective 95th quantiles are illustrated in a nongray color. (Red points) LD comparisons between HapMap Europeans (CEU) and HapMap Asians (CHB and JPT); (purple points) LD comparisons between HapMap Europeans (CEU) and HapMap Africans (YRI); (cyan points) LD comparisons between HapMap Africans (YRI) and HapMap Asians (CHB and JPT); (green points) LD comparisons between two European populations (HapMap CEU vs. WTCCC 58C); (blue points) LD comparisons between two African populations (HapMap YRI vs. the Gambian Jola). The four regions considered contain the LCT gene in chromosome 2 undergoing selection in European populations (A), the SLC24A5 gene in chromosome 15 reported for association with skin pigmentation in Europeans (B), the HBB gene in chromosome 11 with well-documented haplotypic differences between the two populations considered (C), and the highly polymorphic MHC region in chromosome 6 (D). (Dotted lines) Approximate start and end positions of the gene/region in each panel.

Figure 6.

Imputation diagnostics and standardized varLD scores. Comparison of the standardized varLD score against imputation diagnostics generated by IMPUTE when the HapMap YRI is used as a reference panel against Gambian Jola data. The imputation algorithm calculates a measure of information and a confidence score based on the average maximum posterior probability, which we used as surrogates of imputation accuracy. A composite measure of imputation accuracy as measured by the product of call rate and genotype concordance is calculated for the 10 deciles of varLD scores found in the top 20th percentile of the genome-wide distribution of varLD scores. As concordance is measured as the proportion of agreement between the imputed and observed genotypes for the Gambian Jola samples, we only consider autosomal SNPs on the Affymetrix array that are found in the regions identified by varLD.
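
The composite measure named in this caption, the product of call rate and genotype concordance, can be sketched as follows. This is an illustration under stated assumptions: the function name `composite_accuracy` is hypothetical, uncalled imputed genotypes are represented as `None`, and concordance is computed over called genotypes only, none of which is specified in the caption.

```python
def composite_accuracy(imputed, observed):
    """Product of call rate and genotype concordance.

    imputed:  imputed genotypes, with None where no call was made (assumption).
    observed: directly genotyped values for the same samples and SNPs.
    """
    called = [(imp, obs) for imp, obs in zip(imputed, observed) if imp is not None]
    if not imputed or not called:
        return 0.0
    call_rate = len(called) / len(imputed)
    # Concordance over called genotypes only (an assumption of this sketch).
    concordance = sum(1 for imp, obs in called if imp == obs) / len(called)
    return call_rate * concordance

# Made-up genotypes, for illustration only.
imputed = ["AA", "AB", None, "BB"]
observed = ["AA", "AA", "AB", "BB"]
score = composite_accuracy(imputed, observed)
```

With three of four genotypes called and two of those three matching, the call rate is 0.75, the concordance is 2/3, and the composite score is 0.5. Applying the same function to the SNPs within each decile of varLD scores would give one value per decile.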

Figure 7.

Genotype assignment and hybridization intensity profiles of a SNP in a region containing deletions. The two axes represent the fluorescence intensities that indicate the extent of hybridization to the two possible alleles of a biallelic SNP, which have been generically defined as alleles A and B. Solid circles in red, green, blue, and gray indicate samples whose genotypes have been assigned as AA, AB, BB, and NULL (missing), respectively. (Dashed ellipses) Intensity profiles that correspond to homozygous deletion (gray), hemizygous A deletion (light green), hemizygous B deletion (purple), genotype AA (red), genotype AB (dark green), and genotype BB (blue). The figure illustrates that samples with hemizygous deletions have been erroneously assigned to homozygous genotypes, while samples with homozygous deletions have been classified as missing.
