Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process

Anna Ramírez-Soriano et al. Genetics. 2009 Feb.

Abstract

Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter θ = 4N_e μ (N_e, effective population size; μ, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
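
To make the bias described above concrete, the following minimal Python sketch computes the standard, uncorrected statistics: Watterson's estimator (from the number of segregating sites), Tajima's estimator (from the average number of pairwise differences), and Tajima's D. It does not reproduce the corrected estimators derived in the paper. Everything about the ascertainment step is an assumption made for illustration: the hypothetical `ascertain` function keeps a SNP only if it is variable in a random discovery subsample of `d` chromosomes drawn from the same `n` chromosomes, and the input is the expected neutral site frequency spectrum (θ/i sites with i derived copies) rather than simulated data. The values n = 50, d = 5, and θ = 150 are borrowed from the Figure 1 caption, and reading d as a discovery-panel size is likewise an assumption.

```python
from math import comb, sqrt


def harmonic_sums(n):
    """Return a1 = sum 1/i and a2 = sum 1/i^2 for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    return a1, a2


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by a1.

    sfs[i-1] is the number of sites with i derived copies, i = 1..n-1.
    """
    n = len(sfs) + 1
    a1, _ = harmonic_sums(n)
    return sum(sfs) / a1


def tajima_theta(sfs):
    """Tajima's estimator: average number of pairwise differences."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(c * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs


def tajimas_d(sfs):
    """Standard (uncorrected) Tajima's D with Tajima's 1989 constants."""
    n = len(sfs) + 1
    s = sum(sfs)
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    diff = tajima_theta(sfs) - watterson_theta(sfs)
    return diff / sqrt(e1 * s + e2 * s * (s - 1))


def expected_neutral_sfs(theta, n):
    """Expected neutral spectrum: theta / i sites with i derived copies."""
    return [theta / i for i in range(1, n)]


def ascertain(sfs, d):
    """Hypothetical ascertainment scheme (an assumption, not the paper's).

    Each site is weighted by the probability that a random subsample of
    d chromosomes (without replacement) contains both alleles.
    """
    n = len(sfs) + 1
    total = comb(n, d)
    kept = []
    for i, c in enumerate(sfs, start=1):
        p_variable = 1 - (comb(i, d) + comb(n - i, d)) / total
        kept.append(c * p_variable)
    return kept


full = expected_neutral_sfs(theta=150, n=50)
biased = ascertain(full, d=5)
for label, sfs in (("no asc", full), ("asc | nc", biased)):
    print(label, watterson_theta(sfs), tajima_theta(sfs), tajimas_d(sfs))
```

Under these assumptions both uncorrected estimators fall below the true θ once the data are ascertained, and Watterson's estimator falls further than Tajima's because rare alleles are lost preferentially, so the uncorrected D is pushed toward positive values.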

Figures

Figure 1.—

The distribution of the estimates of θ assuming nonascertained data (no asc), ascertained data with correction (asc | c), and ascertained data without correction (asc | nc). The mean and the variance of each set of data are shown in the insets. Simulations were performed for n = 50, d = 5, θ = 150, and 1,000,000 replicates. (A) Watterson's estimator. (B) Tajima's estimator.

Figure 2.—

The variance of Watterson's estimator of θ and of Tajima's estimator of θ, and the covariance between them, as a function of d, calculated using estimated values of θ and θ² for a sample of size n = 100. We performed 10,000 replicates. (A) θ = 150. (B) θ = 22.33.

Figure 3.—

The distribution of Tajima's D for data without ascertainment bias and without correction (no asc), for ascertained data with correction (asc | c), and for ascertained data without correction (asc | nc). The mean and the variance among estimates are shown in the inset. Simulations were performed for n = 50, d = 5, θ = 150, and 1,000,000 replicates.

Figure 4.—

The distribution of the ascertainment bias corrected Tajima's D on chromosome 1 in the human genome based on the Perlegen data. The genes with the most extreme D values are also indicated.

Figure 5.—

Correlation of Tajima's D results from Perlegen data with and without correction for all chromosomes.

Cited by

Molecular marker development and genetic diversity exploration in Medicago polymorpha.
Ren H, Wei Z, Zhou B, Chen X, Gao Q, Zhang Z. PeerJ. 2023 Jan 16;11:e14698. doi: 10.7717/peerj.14698. eCollection 2023. PMID: 36684677. Free PMC article.
Diversity and evolution of 11 innate immune genes in Bos taurus taurus and Bos taurus indicus cattle.
Seabury CM, Seabury PM, Decker JE, Schnabel RD, Taylor JF, Womack JE. Proc Natl Acad Sci U S A. 2010 Jan 5;107(1):151-6. doi: 10.1073/pnas.0913006107. Epub 2009 Dec 14. PMID: 20018671. Free PMC article.
Sequencing and analysis of an Irish human genome.
Tong P, Prendergast JG, Lohan AJ, Farrington SM, Cronin S, Friel N, Bradley DG, Hardiman O, Evans A, Wilson JF, Loftus B. Genome Biol. 2010;11(9):R91. doi: 10.1186/gb-2010-11-9-r91. Epub 2010 Sep 7. PMID: 20822512. Free PMC article.
Widespread genomic signatures of natural selection in hominid evolution.
McVicker G, Gordon D, Davis C, Green P. PLoS Genet. 2009 May;5(5):e1000471. doi: 10.1371/journal.pgen.1000471. Epub 2009 May 8. PMID: 19424416. Free PMC article.
A bioinformatics workflow for detecting signatures of selection in genomic data.
Cadzow M, Boocock J, Nguyen HT, Wilcox P, Merriman TR, Black MA. Front Genet. 2014 Aug 26;5:293. doi: 10.3389/fgene.2014.00293. eCollection 2014. PMID: 25206364. Free PMC article.

