Effects of experimental choices and analysis noise on surveys of the "rare biosphere"

Comparative Study

Appl Environ Microbiol. 2009 May;75(10):3263-70.

doi: 10.1128/AEM.01931-08. Epub 2009 Mar 6.

Timothy J Hamp et al. Appl Environ Microbiol. 2009 May.

Abstract

When planning a survey of 16S rRNA genes from a complex environment, investigators face many choices including which primers to use and how to taxonomically classify sequences. In this study, we explored how these choices affected a survey of microbial diversity in a sample taken from the aerobic basin of the activated sludge of a North Carolina wastewater treatment plant. We performed pyrosequencing reactions on PCR products generated from primers targeting the V1-V2, V6, and V6-V7 variable regions of the 16S rRNA gene. We compared these sequences to 16S rRNA gene sequences found in a whole-genome shotgun pyrosequencing run performed on the same sample. We found that sequences generated from primers targeting the V1-V2 variable region had the best match to the whole-genome shotgun reaction across a range of taxonomic classifications from phylum to family. Pronounced differences between primer sets, however, occurred in the "rare biosphere" involving taxa that we observed in fewer than 11 sequences. We also examined the results of analysis strategies comparing a classification scheme using a nearest-neighbor approach to directly classifying sequences with a naïve Bayesian algorithm. Again, we observed pronounced differences between these analysis schemes in infrequently observed taxa. We conclude that if a study is meant to probe the rare biosphere, both the experimental conditions and analysis choices will have a profound impact on the observed results.

Figures

FIG. 1.

Sequence conservation as a function of alignment position for the 489,840 sequences in version 9.59 of the RDP. The x axis shows the position in the alignment as numbered by the E. coli 16S rRNA gene. The y axis shows the Shannon sequence entropy (see Materials and Methods), a widely used measure of conservation in multiple sequence alignments (23). Highly conserved positions within the alignment have a sequence entropy close to zero and hence are shown toward the top of the y axis. Positions of the hypervariable regions V1-V3 and V6-V7 are derived from Chakravorty et al. (3).
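
The per-position entropy plotted in Fig. 1 can be illustrated with a minimal sketch. The paper's Materials and Methods define the exact formula; the version below assumes the standard Shannon entropy with a base-2 logarithm and treats every character in a column, including any gap symbol, as its own state. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of one alignment column (a string of residues)."""
    counts = Counter(column)
    total = sum(counts.values())
    # H = sum over residues of p * log2(1 / p); a fully conserved column gives 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy at each position of a list of equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]


# Hypothetical four-sequence alignment: two conserved positions, two variable ones.
toy_alignment = ["ACGT", "ACGA", "ACCA", "ACTA"]
entropies = alignment_entropy(toy_alignment)
```

Positions where every sequence carries the same base score zero, which is why conserved stretches sit at one extreme of the y axis and the hypervariable regions at the other.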

FIG. 2.

Number of sequences that the RDP classification algorithm assigned to each family for the V1-V2, V6, and V6-V7 primer sets, plotted against the number of 16S sequences assigned to the same family in the whole-genome shotgun sequence set. One has been added to each count to allow the data to be shown on a log-log plot.

FIG. 3.

Across taxonomic levels, the results of a linear regression on log-transformed data between sequences generated by PCR targeting the 16S rRNA gene and 16S sequences culled from our whole-genome shotgun sequence set. Assignments are by the RDP classification algorithm as in Fig. 2. For each sequence set, two separate regressions were constructed, one for the rare biosphere (circles) with taxa seen 10 or fewer times in that sequence set's PCRs and one for a common biosphere (squares) with taxa seen 11 or more times in the PCRs (Fig. 2, gray lines). The top panels show the −log10 of the P value of the null hypothesis that the slope of the regression equals zero. The middle panels show Pearson's R values, while the bottom panels show the number of taxa for which classifications are made. Note that a significant P value (top panels) can be produced by either a negative or positive correlation.
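
The split regression described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes one is added to each count before taking log10 (as in Fig. 2), that PCR counts are regressed on whole-genome shotgun counts, and that a taxon absent from one set has a count of zero there. All names and counts are hypothetical, and the P value for the slope is omitted because it would need a t distribution.

```python
import math

RARE_MAX = 10  # from the caption: rare taxa are seen 10 or fewer times in the PCR set


def fit_line(xs, ys):
    """Return (pearson_r, slope) of y on x, or None if a fit is not defined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # one variable is constant, so r and slope are undefined
    return sxy / math.sqrt(sxx * syy), sxy / sxx


def rare_and_common_fits(pcr_counts, wgs_counts):
    """Fit separate regressions for rare and common taxa (dicts of taxon -> count)."""
    points = {"rare": [], "common": []}
    for taxon in sorted(set(pcr_counts) | set(wgs_counts)):
        pcr = pcr_counts.get(taxon, 0)
        wgs = wgs_counts.get(taxon, 0)
        group = "rare" if pcr <= RARE_MAX else "common"
        # Assumed transform: log10(count + 1), x = shotgun, y = PCR.
        points[group].append((math.log10(wgs + 1), math.log10(pcr + 1)))
    results = {}
    for group, pts in points.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        results[group] = {"n_taxa": len(pts), "fit": fit_line(xs, ys)}
    return results


# Hypothetical family-level counts, chosen only to exercise both groups.
pcr = {"famA": 99, "famB": 999, "famC": 9999, "famD": 1, "famE": 3}
wgs = {"famA": 9, "famB": 99, "famC": 999, "famD": 0, "famE": 1, "famF": 5}
fits = rare_and_common_fits(pcr, wgs)
```

Each entry of `fits` holds the number of taxa in that group and, when defined, the Pearson correlation and slope, which correspond to the bottom and middle panels of the figure.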

FIG. 4.

Comparison of classifications at the family level made by JGast and the RDP classification algorithm.

FIG. 5.

Regressions across classification levels on log-transformed data showing the comparison between the RDP classification algorithm and JGast. Two regressions were constructed for each comparison: one for a common biosphere (squares), in which a taxon was observed 11 or more times under either JGast or RDP, and one for a rare biosphere, in which a taxon was observed fewer than 11 times under both classification schemes.
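
The either/both rule in this caption is easy to misread, so here is a small sketch of it. It assumes per-taxon counts are available as dictionaries and that a taxon missing from one scheme has a count of zero there; the function names and the counts are hypothetical.

```python
THRESHOLD = 11  # from the caption: 11 or more observations marks a common taxon


def biosphere(jgast_count, rdp_count, threshold=THRESHOLD):
    """Common if either scheme reaches the threshold; rare only if both fall below it."""
    if jgast_count >= threshold or rdp_count >= threshold:
        return "common"
    return "rare"


def partition_taxa(jgast_counts, rdp_counts):
    """Group taxon names into common and rare sets across two classification schemes."""
    groups = {"common": [], "rare": []}
    for taxon in sorted(set(jgast_counts) | set(rdp_counts)):
        label = biosphere(jgast_counts.get(taxon, 0), rdp_counts.get(taxon, 0))
        groups[label].append(taxon)
    return groups


# Hypothetical family-level counts under the two schemes.
jgast = {"famA": 20, "famB": 3, "famC": 0}
rdp = {"famA": 5, "famB": 10, "famC": 11, "famD": 1}
groups = partition_taxa(jgast, rdp)
```

Under this rule a taxon that one scheme sees often and the other sees rarely still counts as common, so the rare group contains only taxa that both schemes agree are infrequent.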

References

    1. Amann, R., W. Ludwig, and K. Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143-169.
    2. Bond, P. L., P. Hugenholtz, J. Keller, and L. L. Blackall. 1995. Bacterial community structures of phosphate-removing and non-phosphate-removing activated sludges from sequencing batch reactors. Appl. Environ. Microbiol. 61:1910-1916.
    3. Chakravorty, S., D. Helb, M. Burday, N. Connell, and D. Alland. 2007. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J. Microbiol. Methods 69:330-339.
    4. Crump, B. C., G. W. Kling, M. Bahr, and J. E. Hobbie. 2003. Bacterioplankton community shifts in an Arctic lake correlate with seasonal changes in organic matter source. Appl. Environ. Microbiol. 69:2253-2268.
    5. DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu, and G. L. Andersen. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72:5069-5072.
