PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals

Robert Kofler et al. PLoS One. 2011.

Abstract

Recent statistical analyses suggest that sequencing of pooled samples provides a cost-effective approach to determine genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θ(Watterson), θ(π), and Tajima's D that account for the bias introduced by pooling and sequencing errors, and it also estimates divergence between species. Results of genome-wide analyses can be displayed graphically as sliding-window plots. PoPoolation is written in Perl and R and builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data.
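For orientation, the classical, unpooled forms of the three statistics named above can be computed from per-site allele counts as in the minimal Python sketch below. It assumes biallelic sites, a known number n of sampled chromosomes, and a window length given in sites; the function name `classical_diversity` is hypothetical. It deliberately does not reproduce PoPoolation's corrections for pool size, coverage, minimum allele count, or sequencing error, which this abstract does not spell out.

```python
import math
from typing import Dict, Sequence


def classical_diversity(
    allele_counts: Sequence[int], n: int, window_length: int
) -> Dict[str, float]:
    """Classical (unpooled) Watterson's theta, theta-pi and Tajima's D.

    allele_counts: count of one allele at each site out of n sampled
    chromosomes; counts of 0 or n are monomorphic and ignored.
    window_length: number of sites in the window (for per-site values).
    NOTE: no pooling or sequencing-error correction is applied here.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)  # number of segregating sites

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    theta_w = s / a1
    # Mean number of pairwise differences, summed over segregating sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Tajima (1989) constants for the variance of (pi - theta_w)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    # D is undefined when the variance is zero (e.g. no segregating sites)
    tajima_d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")

    return {
        "theta_w": theta_w / window_length,
        "theta_pi": pi / window_length,
        "tajima_d": tajima_d,
    }
```

For example, `classical_diversity([1, 2, 1], n=4, window_length=100)` gives θ(Watterson) ≈ 0.0164 and θ(π) ≈ 0.0167 per site, with a small positive D.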


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Outline of a population genetic analysis from pooled sequence data.

Sequencer figure from http://www.illumina.com/.

Figure 2. Graphical output of polymorphism and divergence estimates using PoPoolation.

Sliding-window analysis of θ(π) for a Portuguese D. melanogaster population on chromosome 3R (black line). The red line shows divergence (d_xy) between D. melanogaster and D. simulans, computed with the same window size and step size as θ(π). Note that d_xy is scaled by 1/10. Both lines are based on non-overlapping 50 kb windows.
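A sliding-window summary like the one plotted here can be sketched as follows, assuming per-site values (for example a per-site θ(π) or a per-site divergence value) are already available for every covered position, with zeros for covered monomorphic sites. Setting `step` equal to `window` gives non-overlapping windows as in this figure. The function `sliding_window_means` and its input layout are hypothetical and not PoPoolation's interface.

```python
import bisect
from typing import Iterable, List, Tuple


def sliding_window_means(
    site_values: Iterable[Tuple[int, float]],
    chrom_length: int,
    window: int,
    step: int,
) -> List[Tuple[int, int, float]]:
    """Mean per-site value in windows along one chromosome.

    site_values: (1-based position, per-site value) pairs for covered sites.
    Returns (start, end, mean) with 1-based inclusive coordinates; windows
    without covered sites get NaN. step == window gives non-overlapping windows.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    values = sorted(site_values)
    positions = [pos for pos, _ in values]
    out = []
    start = 1
    while start <= chrom_length:
        end = min(start + window - 1, chrom_length)
        # Binary search for the covered sites inside [start, end]
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        in_window = [v for _, v in values[lo:hi]]
        mean = sum(in_window) / len(in_window) if in_window else float("nan")
        out.append((start, end, mean))
        start += step
    return out
```

With `window=50_000, step=50_000` this yields non-overlapping 50 kb windows; a display scaling such as the 1/10 applied to d_xy would simply multiply the returned means before plotting.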

Figure 3. Sequencing errors in relation to coverage, minor allele count, and sequence quality.

PhiX reads (74 bp) generated with an Illumina GAIIx sequencer were analyzed for sequencing error rate (the number of mismatched bases after quality filtering). The gray bar marks a polymorphic site in the PhiX sequence, which sets a lower bound on the measured sequencing error rate.
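The error measure described in this caption can be sketched as below: count mismatches between reads and the known PhiX reference among bases that pass a quality threshold. This is a minimal sketch assuming ungapped alignments, Phred qualities that are already decoded to integers, and a hypothetical default threshold of 20; the function name `error_rate` is hypothetical. The optional `masked_positions` argument shows how a known polymorphic site, such as the one marked by the gray bar, could be excluded.

```python
from typing import AbstractSet, Iterable, Sequence, Tuple


def error_rate(
    reference: str,
    alignments: Iterable[Tuple[int, str, Sequence[int]]],
    min_quality: int = 20,
    masked_positions: AbstractSet[int] = frozenset(),
) -> Tuple[int, int, float]:
    """Mismatches per counted base for ungapped reads on a known reference.

    alignments: (0-based reference start, read bases, decoded Phred qualities).
    Bases below min_quality, ambiguous 'N' calls, masked positions and bases
    running past the reference end are not counted.
    Returns (mismatches, bases_counted, rate).
    """
    mismatches = 0
    counted = 0
    for start, bases, quals in alignments:
        for offset, (base, qual) in enumerate(zip(bases, quals)):
            pos = start + offset
            if qual < min_quality or base.upper() == "N":
                continue
            if pos >= len(reference) or pos in masked_positions:
                continue
            counted += 1
            if base.upper() != reference[pos].upper():
                mismatches += 1
    rate = mismatches / counted if counted else float("nan")
    return mismatches, counted, rate
```

Leaving `masked_positions` empty mirrors the caption's setup, where a true polymorphism in the control sequence is counted as error and so produces the floor in the measured rate.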

Figure 4. Improvement of the alignment for diverged regions using the PE-SW remap algorithm.

IGV screenshot of the mapping of pooled sequence reads in a highly divergent region of D. melanogaster. The upper panel shows the alignment of the paired-end (PE) reads without the PE-SW remap; the lower panel shows the same region with the PE-SW remap.
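The caption does not describe the PE-SW remap itself. Assuming, as the name suggests, that it re-aligns reads with the Smith-Waterman local alignment algorithm (for example, a mate searched for within the region implied by its paired read), the core scoring step looks like the minimal sketch below. The function name and the scoring values (match 2, mismatch -3, linear gap -5) are arbitrary illustrative choices, not the parameters used in the paper.

```python
from typing import Tuple


def smith_waterman(
    read: str, ref: str, match: int = 2, mismatch: int = -3, gap: int = -5
) -> Tuple[int, int, int]:
    """Best local alignment score with a linear gap penalty.

    Returns (score, read_end, ref_end), where the ends are exclusive 0-based
    indices of the cell at which the best local alignment ends.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    cols = len(ref) + 1
    prev = [0] * cols
    best = (0, 0, 0)
    for i in range(1, len(read) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = prev[j] + gap        # read base aligned to a gap in ref
            left = cur[j - 1] + gap   # ref base aligned to a gap in read
            # Local alignment: scores never drop below zero
            cur[j] = max(0, diag, up, left)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best
```

Because the score is reset to zero rather than going negative, a read from a highly divergent region can still be placed by its best-matching local segment instead of being penalized along its full length.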

Figure 5. The influence of coverage and window size on the accuracy of the estimated θ π.

The accuracy was measured as the mean standardized difference between θ(π) estimated for a given window size and its expectation.
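The accuracy measure can be expressed in a few lines. The caption does not define the standardization, so this minimal sketch assumes it means dividing each window's deviation by the expected value, i.e. (estimate - expected) / expected, averaged over windows; `absolute=True` averages the magnitudes instead. The function name `mean_standardized_difference` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def mean_standardized_difference(
    estimates: Sequence[float], expected: float, absolute: bool = False
) -> float:
    """Mean of (estimate - expected) / expected over windows.

    ASSUMPTION: "standardized" is taken to mean relative to the expectation.
    With absolute=True the mean of |estimate - expected| / expected is
    returned instead, so over- and under-estimates do not cancel.
    """
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    diffs = [(e - expected) / expected for e in estimates]
    if absolute:
        diffs = [abs(d) for d in diffs]
    return fmean(diffs)
```

The signed version captures systematic bias, while the absolute version reflects the window-to-window scatter that shrinks with higher coverage and larger windows.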
