Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process
Anna Ramírez-Soriano et al. Genetics. 2009 Feb.
Abstract
Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter θ = 4N_e μ (N_e, effective population size; μ, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
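For orientation, the quantities named in the abstract can be sketched in their standard, uncorrected textbook form: Watterson's estimator (from the number of segregating sites), Tajima's estimator (from the average number of pairwise differences), and Tajima's D. This is not the corrected version derived in the paper. The function name `tajima_summary`, the `sfs` input layout, and the toy spectra are illustrative assumptions, and zeroing the singleton class is only a crude stand-in for ascertainment, not the discovery model used by the authors.

```python
from math import sqrt


def tajima_summary(sfs):
    """Standard (uncorrected) Watterson's theta, Tajima's theta, and Tajima's D.

    sfs[i - 1] is the number of sites at which the derived allele is seen
    i times in a sample of n sequences, for i = 1 .. n - 1 (hypothetical layout).
    """
    n = len(sfs) + 1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    # Watterson's estimator: segregating sites scaled by the harmonic number.
    seg_sites = sum(sfs)
    theta_w = seg_sites / a1

    # Tajima's estimator: average number of pairwise differences.
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / pairs

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (theta_pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return theta_w, theta_pi, d


# Toy input: the expected neutral spectrum theta / i for n = 50, theta = 150.
neutral = [150.0 / i for i in range(1, 50)]
# Crude illustration of losing rare alleles: drop the singleton class.
# This is an assumption for illustration, not the paper's ascertainment scheme.
no_singletons = [0.0] + neutral[1:]

print(tajima_summary(neutral))
print(tajima_summary(no_singletons))
```

On the expected neutral spectrum the two estimators agree and D is zero up to rounding. Removing rare alleles lowers Watterson's estimator more than Tajima's, which pushes D upward; this is the direction of the bias that the abstract describes for uncorrected genotyped data.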
Figures
Figure 1.—
The distribution of the estimates of θ assuming nonascertained data (no asc), ascertained data with correction (asc | c), and ascertained data without correction (asc | nc). The mean and the variance of each set of data are shown in the insets. Simulations were performed for n = 50, d = 5, θ = 150, and 1,000,000 replicates. (A) Watterson's estimator. (B) Tajima's estimator.
Figure 2.—
The variance of Watterson's estimator of θ, the variance of Tajima's estimator of θ, and the covariance between them as a function of d, calculated using estimated values of θ and θ² for a sample of size n = 100. We performed 10,000 replicates. (A) θ = 150. (B) θ = 22.33.
Figure 3.—
The distribution of Tajima's D for data without ascertainment bias and without correction (no asc), for ascertained data with correction (asc | c), and for ascertained data without correction (asc | nc). The mean and the variance among estimates are shown in the inset. Simulations were performed for n = 50, d = 5, θ = 150, and 1,000,000 replicates.
Figure 4.—
The distribution of the ascertainment bias corrected Tajima's D on chromosome 1 in the human genome based on the Perlegen data. The genes with the most extreme D values are also indicated.
Figure 5.—
Correlation of Tajima's D results from Perlegen data with and without correction for all chromosomes.
Similar articles
- Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium. Nielsen R, Signorovitch J. Theor Popul Biol. 2003 May;63(3):245-55. doi: 10.1016/s0040-5809(03)00005-4. PMID: 12689795.
- Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data. Korneliussen TS, Moltke I, Albrechtsen A, Nielsen R. BMC Bioinformatics. 2013 Oct 2;14:289. doi: 10.1186/1471-2105-14-289. PMID: 24088262. Free PMC article.
- Ascertainment bias and the pattern of nucleotide diversity at the human ALDH2 locus in a Japanese population. Brown BT, Woerner A, Wilder JA. J Mol Evol. 2007 Mar;64(3):375-85. doi: 10.1007/s00239-006-0149-0. Epub 2007 Jan 16. PMID: 17225965.
- SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. Lachance J, Tishkoff SA. Bioessays. 2013 Sep;35(9):780-6. doi: 10.1002/bies.201300014. Epub 2013 Jul 9. PMID: 23836388. Free PMC article. Review.
- Population genetic analysis of ascertained SNP data. Nielsen R. Hum Genomics. 2004 Mar;1(3):218-24. doi: 10.1186/1479-7364-1-3-218. PMID: 15588481. Free PMC article. Review.
Cited by
- Molecular marker development and genetic diversity exploration in Medicago polymorpha. Ren H, Wei Z, Zhou B, Chen X, Gao Q, Zhang Z. PeerJ. 2023 Jan 16;11:e14698. doi: 10.7717/peerj.14698. eCollection 2023. PMID: 36684677. Free PMC article.
- A Large Panel of Drosophila simulans Reveals an Abundance of Common Variants. Signor SA, New FN, Nuzhdin S. Genome Biol Evol. 2018 Jan 1;10(1):189-206. doi: 10.1093/gbe/evx262. PMID: 29228179. Free PMC article.
- Population genomics of the eastern cottonwood (Populus deltoides). Fahrenkrog AM, Neves LG, Resende MFR Jr, Dervinis C, Davenport R, Barbazuk WB, Kirst M. Ecol Evol. 2017 Oct 10;7(22):9426-9440. doi: 10.1002/ece3.3466. eCollection 2017 Nov. PMID: 29187979. Free PMC article.
- Environmental versus geographical effects on genomic variation in wild soybean (Glycine soja) across its native range in northeast Asia. Leamy LJ, Lee CR, Song Q, Mujacic I, Luo Y, Chen CY, Li C, Kjemtrup S, Song BH. Ecol Evol. 2016 Aug 14;6(17):6332-44. doi: 10.1002/ece3.2351. eCollection 2016 Sep. PMID: 27648247. Free PMC article.
- Inferring positive selection in humans from genomic data. Wollstein A, Stephan W. Investig Genet. 2015 Apr 1;6:5. doi: 10.1186/s13323-015-0023-1. eCollection 2015. PMID: 25834723. Free PMC article.
Grants and funding
- R01 HG003229/HG/NHGRI NIH HHS/United States
- R01 HG003229-05/HG/NHGRI NIH HHS/United States
- U01 HL084706/HL/NHLBI NIH HHS/United States
- U01HL084706/HL/NHLBI NIH HHS/United States