PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data

Thomas J Sharpton et al. PLoS Comput Biol. 2011.

Abstract

Microbial diversity is typically characterized by clustering small-subunit ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. PhylOTU Workflow.

Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details.

Figure 2. Relationship between false conjunction rate and true conjunction rate.

Each read data set was clustered into OTUs at various thresholds and compared to the corresponding full-length data set, which was clustered at several fixed PD thresholds (shown here are full-length sequence cutoffs of 0.01, 0.03, 0.05 and 0.1). For each full-length sequence threshold, the true conjunction and false conjunction rates of the read OTUs were calculated as a function of the read threshold. Solid lines represent the median value of the true and false conjunction rates across simulations. Dashed lines represent the median value of the true and false conjunction rates derived from comparisons of randomly permuted clusters relative to the source sequence clusters.
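The comparison described in this caption can be illustrated with a small pairwise calculation. The sketch below assumes one plausible definition that the caption itself does not spell out: the true conjunction rate is the fraction of sequence pairs sharing a full-length (reference) OTU that also share a read OTU, and the false conjunction rate is the fraction of pairs in different reference OTUs that are nonetheless placed in the same read OTU. The function name `conjunction_rates` and the dictionary inputs are hypothetical and are not part of PhylOTU.

```python
from itertools import combinations


def conjunction_rates(read_otus, reference_otus):
    """Return (true_rate, false_rate) for two clusterings.

    Both arguments map a sequence id to a cluster label. The pairwise
    definitions used here are an assumption, not taken from the paper.
    """
    ids = sorted(set(read_otus) & set(reference_otus))
    true_hits = false_hits = same_ref = diff_ref = 0
    for a, b in combinations(ids, 2):
        together_ref = reference_otus[a] == reference_otus[b]
        together_read = read_otus[a] == read_otus[b]
        if together_ref:
            same_ref += 1
            true_hits += together_read   # pair correctly kept together
        else:
            diff_ref += 1
            false_hits += together_read  # pair wrongly merged
    true_rate = true_hits / same_ref if same_ref else float("nan")
    false_rate = false_hits / diff_ref if diff_ref else float("nan")
    return true_rate, false_rate


# Toy example: two reference OTUs, with read r3 wrongly merged into the first.
reference = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
reads = {"r1": "x", "r2": "x", "r3": "x", "r4": "y"}
rates = conjunction_rates(reads, reference)
```

Repeating this calculation across a range of read thresholds, once per full-length cutoff, would give one curve of the kind the caption describes.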

Figure 3. Rarefaction analysis of OTUs identified from PCR and metagenomic sequencing.

Rarefaction curves are shown for OTUs from PCR (blue) and metagenomic (red) sequencing libraries. Two different sequence similarity cutoffs are used (solid = 0.03, dashed = 0.15). Curves represent the average number of OTUs per sequence from 100 random draws of subsets of sequences from each data set.
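As a rough illustration of the resampling behind such curves, the sketch below draws random subsets of sequences without replacement and averages the number of distinct OTUs seen at each sampling depth. It assumes each sequence has already been assigned an OTU label; the function name `rarefaction_curve`, the fixed seed, and the toy labels are hypothetical, and the default of 100 draws simply mirrors the caption.

```python
import random


def rarefaction_curve(otu_labels, depths, n_draws=100, seed=0):
    """Mean number of distinct OTUs observed at each sampling depth.

    otu_labels: list with one OTU label per sequence.
    depths: sampling depths, each no larger than len(otu_labels).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    curve = []
    for depth in depths:
        total = 0
        for _ in range(n_draws):
            subset = rng.sample(otu_labels, depth)  # without replacement
            total += len(set(subset))
        curve.append(total / n_draws)
    return curve


# Toy library: 10 sequences assigned to 3 OTUs.
labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
curve = rarefaction_curve(labels, depths=[1, 5, 10])
```

A curve that is still climbing at the largest depth suggests that further sequencing would keep revealing new OTUs.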

Figure 4. Overlap between GOS OTUs revealed by PCR sequencing and metagenomic sequencing.

PhylOTU was used to identify OTUs from a data set composed of both PCR and shotgun SSU-rRNA sequences obtained from six Global Ocean Survey (GOS) samples. The 1,254 OTUs that contained only PCR sequences at a clustering threshold of 0.03 were designated as OTUs unique to PCR, while the 80 OTUs that contained only metagenomic sequences at the corresponding clustering threshold of 0.15 (see Results) were designated as OTUs unique to metagenomic sequencing. The 309 OTUs identified by both PCR and shotgun sequencing were determined using a clustering threshold of 0.03 (162 shared OTUs are identified when a threshold of 0.15 is used). The total number of OTUs and the total number of SSU-rRNA sequences are shown on the left and right of the Venn diagram for the PCR (threshold = 0.03) and metagenomic data (threshold = 0.15), respectively. The taxonomic distribution of each set of OTUs is shown beneath the Venn diagram. Here, every sequence from each OTU was taxonomically classified into major clades of Bacteria (approximately phylum-level designations) using the Ribosomal Database Project classification software. The relative abundance of each taxonomic group is plotted along the x-axis (specific values can be found in Table S5). Clades exhibiting less than 1% relative abundance across all sets of OTUs are not shown.
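The three regions of such a Venn diagram can be tallied from OTU membership alone. The sketch below is a simplification: it assumes a single clustering in which every sequence carries a source tag, whereas the caption applies different thresholds to the PCR and metagenomic sets. The function name `partition_otus`, the tag strings, and the toy data are all hypothetical.

```python
def partition_otus(otu_members):
    """Count OTUs holding only PCR, only metagenomic, or both kinds of sequence.

    otu_members: dict mapping an OTU id to the source tags ("pcr" or
    "metagenomic") of its member sequences.
    """
    counts = {"pcr_only": 0, "metagenomic_only": 0, "shared": 0}
    for sources in otu_members.values():
        kinds = set(sources)
        if not kinds:
            continue  # ignore empty OTUs
        if kinds == {"pcr"}:
            counts["pcr_only"] += 1
        elif kinds == {"metagenomic"}:
            counts["metagenomic_only"] += 1
        else:
            counts["shared"] += 1
    return counts


# Toy data: four OTUs with mixed membership.
otus = {
    "otu1": ["pcr", "pcr"],
    "otu2": ["pcr", "metagenomic"],
    "otu3": ["metagenomic"],
    "otu4": ["pcr"],
}
counts = partition_otus(otus)
```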
