PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets

Ameet J Pinto et al. PLoS One. 2012.

Abstract

As 16S rRNA gene targeted massively parallel sequencing has become a common tool for microbial diversity investigations, numerous advances have been made to minimize the influence of sequencing and chimeric PCR artifacts through rigorous quality control measures. However, there has been little effort towards understanding the effect of multi-template PCR biases on microbial community structure. In this study, we used three bacterial and three archaeal mock communities consisting of, respectively, 33 bacterial and 24 archaeal 16S rRNA gene sequences combined in different proportions to compare the influences of (1) sequencing depth, (2) sequencing artifacts (sequencing errors and chimeric PCR artifacts), and (3) biases in multi-template PCR, towards the interpretation of community structure in pyrosequencing datasets. We also assessed the influence of each of these three variables on α- and β-diversity metrics that rely on the number of OTUs alone (richness) and those that include both membership and the relative abundance of detected OTUs (diversity). As part of this study, we redesigned bacterial and archaeal primer sets that target the V3-V5 region of the 16S rRNA gene, along with multiplexing barcodes, to permit simultaneous sequencing of PCR products from the two domains. We conclude that the benefits of deeper sequencing efforts extend beyond greater OTU detection and result in higher precision in β-diversity analyses by reducing the variability between replicate libraries, despite the presence of more sequencing artifacts. Additionally, spurious OTUs resulting from sequencing errors have a significant impact on richness or shared-richness based α- and β-diversity metrics, whereas metrics that utilize community structure (including both richness and relative abundance of OTUs) are minimally affected by spurious OTUs. 
However, the greatest obstacle to accurately evaluating community structure is the error in the estimated mean relative abundance of each detected OTU, which arises from biases associated with multi-template PCR.
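
The contrast drawn above between richness-based metrics and metrics that use community structure can be illustrated with a minimal sketch. The read counts below are hypothetical and are not data from the study; the Shannon index is used here only as one common example of a structure-based metric, and the function names are illustrative.

```python
import math

def richness(counts):
    """Number of OTUs observed with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon diversity index (natural log) computed from OTU read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical library: 10 true OTUs with 1000 reads each (assumed values).
true_counts = [1000] * 10
# The same library with 20 spurious singleton OTUs added (assumed values).
with_spurious = true_counts + [1] * 20

r_true, r_spur = richness(true_counts), richness(with_spurious)
h_true, h_spur = shannon(true_counts), shannon(with_spurious)

print(f"richness: {r_true} -> {r_spur}")
print(f"Shannon:  {h_true:.4f} -> {h_spur:.4f}")
```

In this toy case the singletons triple the observed richness while the Shannon index changes by less than 2%, because each spurious OTU carries a negligible share of the reads.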

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The mean percent GC content of the three bacterial and three archaeal mock communities (A) and the resulting reads attributed to mock community replicates in the final 454-sequencing output expressed as percent reads in sequencing library versus the GC content of the amplicon pool (B).

Error bars in panel A represent variation in GC content between replicates of each community resulting from differences in the GC content of the barcoded reverse primers. Black bars: bacteria, white bars: archaea. The red dotted line in panel B shows the 95% confidence band for the regression line. Black symbols: bacteria, white symbols: archaea. Diamonds (◊) and upward triangles (Δ): large library, squares (□) and downward triangles (∇): small library.

Figure 2. The taxa detection frequency for each of the replicate mock communities at different sequencing depths is compared to the detection frequency at different theoretical sampling depths.

Open circles: sub-samples of in-silico mock communities with varying numbers of sequences, red circles: large library, green circles: small library, solid lines: 95% confidence interval band for the in-silico sub-sampling efforts. A–C: bacteria, D–F: archaea, A/D: mock 1, B/E: mock 2, C/F: mock 3. Theoretical taxa detection frequencies for mock community 1 (bacteria and archaea) are 1.0 for most in-silico sub-sampling efforts and hence are not shown in panels A and D.
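
The idea of in-silico sub-sampling can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that detection frequency means the fraction of taxa seen at least once in a random draw of reads, and the community compositions, depths, and function names are all hypothetical.

```python
import random

def detection_frequency(rel_abund, depth, rng):
    """Fraction of taxa seen at least once when `depth` reads are drawn
    at random according to the theoretical relative abundances."""
    taxa = range(len(rel_abund))
    reads = rng.choices(taxa, weights=rel_abund, k=depth)
    return len(set(reads)) / len(rel_abund)

def mean_detection(rel_abund, depth, trials=200, seed=0):
    """Average detection frequency over repeated in-silico sub-samples."""
    rng = random.Random(seed)
    total = sum(detection_frequency(rel_abund, depth, rng) for _ in range(trials))
    return total / trials

# Hypothetical even community of 33 taxa (assumed composition).
even = [1 / 33] * 33
# Hypothetical skewed community: each taxon half as abundant as the previous one.
skewed = [2.0 ** -i for i in range(33)]

for depth in (100, 1000, 10000):
    print(depth, mean_detection(even, depth), mean_detection(skewed, depth))
```

With an even community, every taxon is detected once the depth is well above the number of taxa, whereas a strongly skewed community still has undetected rare taxa at much greater depths.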

Figure 3. Relative abundance of sequences used to generate bacterial mock communities.

A: mock 1, B: mock 2, C: mock 3. Dashed line: theoretical relative abundance. The experimental mean relative abundances for small libraries (green circles) and large libraries (red circles) are shown, and error bars indicate standard deviations for triplicate samples. The grey box indicates a sequence that was not detected in any community; the black box indicates an OTU that consisted of two sequences at a similarity cutoff of 3%.

Figure 4. Relative abundance of sequences used to generate archaeal mock communities.

A: mock 1, B: mock 2, C: mock 3. Dashed line: theoretical relative abundance. The experimental mean relative abundances for small libraries (green circles) and large libraries (red circles) are shown, and error bars indicate standard deviations for triplicate samples. The black box indicates an OTU that consisted of two sequences at a similarity cutoff of 3%.

Figure 5. Rank abundance profiles for the bacterial and archaeal mock communities.

A–C: bacteria, D–F: archaea. Black lines: theoretical, green lines: small libraries, red lines: large libraries. The Kolmogorov-Smirnov statistics at the bottom left of each panel are for comparisons between large and small libraries. The Kolmogorov-Smirnov statistics to the right of each panel are for comparisons between the large libraries of m1/m2 and m1/m3.

Figure 6. Diversity metrics calculated for the bacterial and archaeal mock communities.

A–D: bacteria, E–H: archaea. Black bars: theoretical, red bars: large libraries, red-hashed bars: large libraries with spurious sequences removed, green bars: small libraries, green-hashed bars: small libraries with spurious sequences removed. The error bars indicate standard deviations for triplicate samples.

Figure 7. Principal coordinate axes plot for bacterial and archaeal communities constructed using the Morisita-Horn distance (DMH).

A–C: bacteria, D–F: archaea. A/D: m1, B/E: m2, C/F: m3. Black squares indicate the theoretical mock community and the small open circles denote the in-silico sequencing efforts at sampling depths varying from 1 to 90%. The red filled squares and red open squares represent the large libraries with and without spurious OTUs, respectively, while green filled triangles and green open triangles indicate the small libraries with and without spurious OTUs, respectively. The blue circle is the centroid of the experimental libraries.
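
For readers unfamiliar with the Morisita-Horn distance, the sketch below uses the standard definition, one minus the Morisita-Horn similarity computed from relative abundances. It assumes both libraries are given as read counts over the same ordered list of OTUs; the example counts and the function name are hypothetical and are not taken from the study.

```python
def morisita_horn_distance(x, y):
    """Morisita-Horn distance between two libraries given as read counts
    over the same ordered set of OTUs (use 0 for an absent OTU)."""
    total_x, total_y = sum(x), sum(y)
    p = [v / total_x for v in x]  # relative abundances in library x
    q = [v / total_y for v in y]  # relative abundances in library y
    shared = 2 * sum(a * b for a, b in zip(p, q))
    dominance = sum(a * a for a in p) + sum(b * b for b in q)
    return 1 - shared / dominance

# Hypothetical library with 10 OTUs at 1000 reads each, padded with zeros
# for 20 OTUs that appear only in the second library (assumed values).
clean = [1000] * 10 + [0] * 20
# Same library plus 20 spurious singleton OTUs (assumed values).
noisy = [1000] * 10 + [1] * 20

print(morisita_horn_distance(clean, noisy))
```

Because every term is weighted by relative abundance, the singleton OTUs in this toy case move the distance only slightly away from zero, which matches the general observation that abundance-weighted metrics are less sensitive to rare spurious OTUs.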

Figure 8. Comparisons of pairwise Morisita-Horn distance between bacterial and archaeal mock communities to the theoretical pairwise distances.

A–C: bacteria, D–F: archaea. A/D: m1–m2, B/E: m1–m3, C/F: m2–m3. Nine pairwise comparisons generated between three replicate libraries for each community were used to construct each box.

Grants and funding

This research was partially supported by United States National Science Foundation grants BES-0412618, CBET-0967707, and CBET-1133793, and Water Research Foundation Tailored Collaboration project no. 4346. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.