Ascertainment bias in studies of human genome-wide polymorphism
Comparative Study
Genome Res. 2005 Nov;15(11):1496-502.
doi: 10.1101/gr.4107905.
- PMID: 16251459
- PMCID: PMC1310637
- DOI: 10.1101/gr.4107905
Andrew G Clark et al. Genome Res. 2005 Nov.
Abstract
Large-scale SNP genotyping studies rely on an initial assessment of nucleotide variation to identify sites in the DNA sequence that harbor variation among individuals. This "SNP discovery" sample may be quite variable in size and composition, and it has been well established that properties of the SNPs that are found are influenced by the discovery sampling effort. The International HapMap project relied on nearly any piece of information available to identify SNPs (including BAC end sequences, shotgun reads, and differences between public and private sequences) and even made use of chimpanzee data to confirm human sequence differences. In addition, the ascertainment criteria shifted from using only SNPs that had been validated in population samples, to double-hit SNPs, to finally accepting SNPs that were singletons in small discovery samples. In contrast, Perlegen's primary discovery was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and F(ST), as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude of disparity between these samples in these measures of variability indicates that population genetic analysis on the raw genotype data is ill-advised. Given the knowledge of the discovery samples, we perform an ascertainment correction and show how the post-correction data are more consistent across these studies. However, discrepancies persist, suggesting that the heterogeneity in the SNP discovery process of the HapMap project resulted in a data set resistant to complete ascertainment correction. Ascertainment bias will likely erode the power of tests of association between SNPs and complex disorders, but the effect will likely be small, and perhaps more importantly, it is unlikely that the bias will introduce false-positive inferences.
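As an illustration of the two summary statistics named in the abstract, the following is a minimal sketch of how within-population heterozygosity (H_S), total heterozygosity (H_T), and F(ST) might be computed for one window from per-SNP allele frequencies in two populations. It assumes biallelic SNPs, equal weighting of the two populations, and no small-sample correction; the function names and the ratio-of-averages form of F(ST) are illustrative choices, not necessarily the estimators used in the paper.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def window_stats(freqs_pop1, freqs_pop2):
    """Return (H_S, H_T, F_ST) for one window.

    freqs_pop1 and freqs_pop2 are equal-length lists holding the frequency
    of the same allele at each SNP in the two populations (hypothetical input
    format). Populations are weighted equally, which is an assumption.
    """
    if len(freqs_pop1) != len(freqs_pop2) or not freqs_pop1:
        raise ValueError("need two non-empty frequency lists of equal length")

    hs_sum = 0.0
    ht_sum = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Within-population heterozygosity, averaged over the two populations.
        hs_sum += 0.5 * (heterozygosity(p1) + heterozygosity(p2))
        # Total heterozygosity, from the pooled (mean) allele frequency.
        ht_sum += heterozygosity(0.5 * (p1 + p2))

    n_snps = len(freqs_pop1)
    hs = hs_sum / n_snps
    ht = ht_sum / n_snps
    # Ratio of window averages; undefined if the window has no variation.
    fst = (ht - hs) / ht if ht > 0 else float("nan")
    return hs, ht, fst
```

Because rare SNPs contribute little to 2p(1-p), a SNP panel that underrepresents them will tend to show a higher average heterozygosity per SNP, which is the direction of the bias described for the uncorrected HapMap data.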
Figures
Figure 1.
Site frequency spectra for the fully resequenced NIEHS gene set, for the Perlegen sequencing-by-hybridization SNP ascertainment set, and for the set of SNPs that the International HapMap consortium genotyped, all contrasted to the neutral expectation (given estimates of the sample θ). Note the marked absence of rare SNPs and oversampling of SNPs of intermediate frequency in the HapMap sample.
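The neutral expectation referred to in this caption can be sketched as follows, assuming the standard neutral model with constant population size, under which the expected number of sites carrying i copies of the derived allele in a sample of n chromosomes is θ/i. The function names are hypothetical, and the sketch uses the unfolded spectrum, which may differ from the form plotted in the figure.

```python
def neutral_sfs(n, theta=1.0):
    """Expected unfolded site frequency spectrum under the standard neutral model.

    Entry i-1 is the expected number of segregating sites with i derived
    copies in a sample of n chromosomes, for i = 1 .. n-1.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    return [theta / i for i in range(1, n)]


def neutral_sfs_proportions(n):
    """The same spectrum as proportions of segregating sites (theta cancels)."""
    counts = neutral_sfs(n)
    total = sum(counts)
    return [c / total for c in counts]
```

Under this expectation singletons are the most common class, so a marked deficit of rare SNPs relative to θ/i is a visible signature of ascertainment.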
Figure 2.
(Top) Distributions of uncorrected HS (within-population heterozygosity) for the HapMap and the Perlegen data across 5682 windows of 500 kb spanning the entire human genome. Commensurate with the upward skew to the site frequency spectrum, the HapMap data have higher heterozygosity. (Bottom) After correction for ascertainment bias, the distributions of heterozygosity are more comparable; however, the ascertainment correction appears to have inflated the variance among windows in HS.
Figure 3.
Scatterplot of uncorrected HT for the HapMap data (x-axis) and the Perlegen data (y-axis). Each circle represents a 500-kb window, and the plot depicts the entire HapMap and Perlegen genome-wide samples.
Figure 4.
Distributions of FST between European and Chinese samples for ascertainment-corrected 500-kb windows of the HapMap data (top) and the Perlegen data (bottom).
Figure 5.
Scatterplot of FST between European and Chinese samples for ascertainment-corrected 500-kb windows of the HapMap data vs. the Perlegen data.
Figure 6.
Uncorrected (top) and ascertainment-corrected site frequency spectra (bottom) for the HapMap data (red) and the Perlegen data (blue dashed line). The HapMap data seriously underrepresented the rare SNPs compared with Perlegen, and the ascertainment correction produced frequency spectra that were more similar (bottom).
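One simple form of ascertainment correction for a site frequency spectrum can be sketched as below. This is not presented as the procedure used in the paper: it assumes that SNP discovery amounts to finding a site variable in a random subsample of d chromosomes drawn from the n genotyped chromosomes, which is a simplification, particularly for the heterogeneous HapMap discovery process described in the abstract. Under that assumption, a SNP with i derived copies is discovered with probability 1 - [C(i,d) + C(n-i,d)] / C(n,d), and each observed count is reweighted by the inverse of that probability. The function names are hypothetical.

```python
from math import comb


def discovery_probability(i, n, d):
    """Probability that a SNP with i derived copies among n chromosomes is
    variable in a random subsample of d chromosomes (hypergeometric sampling).
    """
    if not 0 < i < n:
        raise ValueError("i must satisfy 0 < i < n")
    if not 2 <= d <= n:
        raise ValueError("d must satisfy 2 <= d <= n")
    # Subsamples that contain only derived or only ancestral alleles miss the SNP.
    monomorphic = comb(i, d) + comb(n - i, d)
    return 1.0 - monomorphic / comb(n, d)


def correct_sfs(observed, d):
    """Reweight an observed unfolded SFS by inverse discovery probability.

    observed[i-1] is the number of SNPs with i derived copies, i = 1 .. n-1.
    Returns the corrected spectrum as proportions.
    """
    n = len(observed) + 1
    weighted = [
        count / discovery_probability(i, n, d)
        for i, count in enumerate(observed, start=1)
    ]
    total = sum(weighted)
    if total == 0:
        raise ValueError("observed spectrum contains no SNPs")
    return [w / total for w in weighted]
```

For example, with n = 4 and d = 2 a singleton is discovered with probability 0.5 and a doubleton with probability 2/3, so the correction gives rare classes more weight than intermediate-frequency classes. A discovery scheme with mixed criteria, such as the shift from validated SNPs to double-hit SNPs to singletons, would need a different probability for each class of SNP, which is one reason a single correction of this kind may not fully succeed.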
Grants and funding
- R01 HL072904/HL/NHLBI NIH HHS/United States
- GM65509/GM/NIGMS NIH HHS/United States
- R01 HG003229-01/HG/NHGRI NIH HHS/United States
- R01 HG003229/HG/NHGRI NIH HHS/United States
- P50 GM065509/GM/NIGMS NIH HHS/United States
- HL072904/HL/NHLBI NIH HHS/United States
- HG03229/HG/NHGRI NIH HHS/United States