Correcting estimators of theta and Tajima's D for ascertainment biases caused by the single-nucleotide polymorphism discovery process


Anna Ramírez-Soriano et al. Genetics. 2009 Feb.

Abstract

Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. Here we derive corrected estimators of the fundamental population genetic parameter θ = 4N_eμ (N_e, effective population size; μ, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
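For orientation, the standard (uncorrected) quantities that the abstract refers to can be sketched as follows. This is a minimal illustration of Watterson's estimator (from the number of segregating sites), Tajima's estimator (from the average number of pairwise differences), and Tajima's D as defined by Tajima (1989); it is not the corrected version derived in the paper. The function names and the 0/1 haplotype-string input format are assumptions made for this example.

```python
from math import sqrt


def theta_estimates(haplotypes):
    """Return (S, theta_W, theta_pi) for a list of equal-length '0'/'1' strings.

    Each string is one sampled chromosome; '1' marks the derived allele.
    The input format is an assumption of this sketch.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site, keeping only segregating sites.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(n_sites)]
    seg = [c for c in counts if 0 < c < n]
    S = len(seg)
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = S / a1  # Watterson's estimator
    # Average number of pairwise differences, summed over sites.
    theta_pi = sum(2.0 * c * (n - c) for c in seg) / (n * (n - 1))
    return S, theta_w, theta_pi


def tajimas_d(n, S, theta_pi):
    """Standard (uncorrected) Tajima's D; returns NaN when S == 0."""
    if S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (theta_pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy sample: four chromosomes, four sites, every variant a singleton.
sample = ["1000", "0100", "0010", "0001"]
S, theta_w, theta_pi = theta_estimates(sample)
D = tajimas_d(len(sample), S, theta_pi)
```

In this toy sample every variant is rare, so the pairwise-difference estimate falls below Watterson's estimate and D is negative. Ascertained SNP data tend to show the opposite pattern, because rare variants are underrepresented.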


Figures

Figure 1.

The distribution of the estimates of θ assuming nonascertained data (no asc), ascertained data with correction (asc | c), and ascertained data without correction (asc | nc). The mean and the variance of each set of data are shown in the insets. Simulations were performed for n = 50, d = 5, θ = 150, and 1,000,000 replicates. (A) Watterson's estimator. (B) Tajima's estimator.
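To see why ascertainment shifts data toward common alleles, the sketch below computes the chance that a variant is discovered at all. It assumes, purely for illustration, that the discovery panel is a random subsample of d chromosomes drawn from the same n genotyped chromosomes and that a SNP is ascertained only if it is polymorphic in that panel; the paper's exact scheme may differ. The neutral expectation that the number of sites with i derived copies is proportional to 1/i is also an assumption of the example, as are the function names.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """P(a site with i derived copies among n is polymorphic in a random
    subsample of d chromosomes). Hypergeometric; math.comb(k, d) is 0
    whenever d > k, so the edge cases need no special handling."""
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)


def singleton_fraction(n, d=None):
    """Expected share of singletons among segregating sites under the
    neutral 1/i spectrum, optionally weighted by ascertainment."""
    weights = []
    for i in range(1, n):
        w = 1.0 / i
        if d is not None:
            w *= ascertainment_prob(i, n, d)
        weights.append(w)
    return weights[0] / sum(weights)


n, d = 50, 5  # sample size and discovery depth used in Figure 1
p_rare = ascertainment_prob(1, n, d)     # singleton
p_common = ascertainment_prob(25, n, d)  # intermediate-frequency variant
frac_before = singleton_fraction(n)
frac_after = singleton_fraction(n, d)
```

Under these assumptions a singleton is discovered with probability d/n, while an intermediate-frequency variant is almost always discovered, so the share of singletons among ascertained sites is lower than in directly sequenced data.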

Figure 2.

The variance of Watterson's estimator of θ, the variance of Tajima's estimator of θ, and their covariance as a function of d, calculated using estimated values of θ and θ² for a sample of size n = 100 and 10,000 replicates. (A) θ = 150. (B) θ = 22.33.

Figure 3.

The distribution of Tajima's D for data without ascertainment bias and without correction (no asc), for ascertained data with correction (asc | c), and for ascertained data without correction (asc | nc). The mean and the variance among estimates are shown in the inset. Simulations were performed for θ = 150, n = 50, d = 5, and 1,000,000 replicates.

Figure 4.

The distribution of the ascertainment bias corrected Tajima's D on chromosome 1 in the human genome based on the Perlegen data. The genes with the most extreme D values are also indicated.

Figure 5.

Correlation of Tajima's D results from Perlegen data with and without correction for all chromosomes.
