Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

doi: 10.1038/nbt.2676. Epub 2013 Aug 25.

Jesse Zaneveld, J Gregory Caporaso, Daniel McDonald, Dan Knights, Joshua A Reyes, Jose C Clemente, Deron E Burkepile, Rebecca L Vega Thurber, Rob Knight, Robert G Beiko, Curtis Huttenhower

Morgan G I Langille et al. Nat Biotechnol. 2013 Sep.

Abstract

Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community's functional capabilities. Here we describe PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.

Figures

Figure 1

The PICRUSt workflow. PICRUSt is composed of two high-level workflows: gene content inference (top box) and metagenome inference (bottom box). Beginning with a reference OTU tree and a gene content table (i.e., counts of genes for reference OTUs with known gene content), the gene content inference workflow predicts gene content for each OTU with unknown gene content, including predictions of marker gene copy number. This information is precomputed for 16S based on Greengenes and IMG, but all functionality is accessible in PICRUSt for use with other marker genes and reference genomes. The metagenome inference workflow takes an OTU table (i.e., counts of OTUs on a per sample basis), where OTU identifiers correspond to tips in the reference OTU tree, as well as the copy number of the marker gene in each OTU and the gene content of each OTU (as generated by the gene content inference workflow) and outputs a metagenome table (i.e. counts of gene families on a per-sample basis).
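
The metagenome inference step described in this caption can be sketched in a few lines. This is a minimal illustration, not the PICRUSt implementation: it assumes that copy number normalization means dividing each OTU count by that OTU's marker gene copy number, and that gene family counts are then summed across OTUs weighted by the normalized abundance. The function name, sample name, OTU identifiers, and gene family identifiers are all hypothetical.

```python
def predict_metagenome(otu_table, copy_number, gene_content):
    """Combine an OTU table with per-OTU gene content into a metagenome table.

    otu_table:    {sample: {otu_id: read_count}}
    copy_number:  {otu_id: marker gene copies per genome}
    gene_content: {otu_id: {gene_family: copies per genome}}
    Returns       {sample: {gene_family: estimated count}}
    """
    metagenome = {}
    for sample, otu_counts in otu_table.items():
        family_totals = {}
        for otu, count in otu_counts.items():
            # Assumed normalization: reads / marker copies ~ organism abundance.
            abundance = count / copy_number[otu]
            for family, copies in gene_content[otu].items():
                family_totals[family] = (
                    family_totals.get(family, 0.0) + abundance * copies
                )
        metagenome[sample] = family_totals
    return metagenome


# Toy inputs (hypothetical values, for illustration only).
otu_table = {"sample_1": {"otu_a": 40, "otu_b": 10}}
copy_number = {"otu_a": 4, "otu_b": 2}
gene_content = {
    "otu_a": {"family_x": 1, "family_y": 2},
    "otu_b": {"family_x": 3},
}

result = predict_metagenome(otu_table, copy_number, gene_content)
print(result)
```

In the toy data, otu_a contributes 40 / 4 = 10 organisms and otu_b contributes 10 / 2 = 5, so family_x is estimated at 10 × 1 + 5 × 3 = 25 and family_y at 10 × 2 = 20.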

Figure 2

PICRUSt recapitulates biological findings from the Human Microbiome Project. A) PCA plot comparing KEGG Module predictions from 16S data with PICRUSt (lighter colored triangles) and from sequenced shotgun metagenomes (darker colored circles). B-F) Relative abundances of five KEGG Modules involved in glycosaminoglycan degradation (KEGG pathway ko00531): B) M00061: Uronic acid metabolism, C) M00076: Dermatan sulfate degradation, D) M00077: Chondroitin sulfate degradation, E) M00078: Heparan sulfate degradation, and F) M00079: Keratan sulfate degradation. Abundances are shown for 16S with PICRUSt (P, lighter colored) and WGS (W, darker colored) across human body sites: nasal (blue), gastrointestinal tract (brown), oral (green), skin (red), and vaginal (yellow).

Figure 3

PICRUSt accuracy across various environmental microbiomes. Prediction accuracy for paired 16S rRNA marker gene surveys and shotgun metagenomes (y-axis) is plotted against the availability of reference genomes as summarized by the Nearest Sequenced Taxon Index (NSTI; x-axis). Accuracy is summarized using the Spearman correlation between the relative abundance of gene copy number predicted from 16S data using PICRUSt and the relative abundance observed in the sequenced shotgun metagenome. In the absence of large differences in metagenomic sequencing depth (see text), relatively well-characterized environments, such as the human gut, have low NSTI values and can be predicted accurately from 16S surveys. Conversely, environments containing much unexplored diversity (e.g., phyla with few or no sequenced genomes), such as the Guerrero Negro hypersaline microbial mats, tended to have high NSTI values.
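
The accuracy measure named in this caption, a Spearman correlation between predicted and observed gene family abundances, can be illustrated with a small dependency-free sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks, which is the standard definition when ties are present. The function names and the abundance values are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x ** 0.5 * var_y ** 0.5)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gene family abundances for one sample (same family order).
predicted = [25.0, 20.0, 5.0, 0.0]
observed = [30.0, 18.0, 9.0, 1.0]
print(spearman(predicted, observed))
```

Because the coefficient depends only on ranks, it is unchanged by converting counts to relative abundances within a sample.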

Figure 4

Accuracy of PICRUSt prediction compared with shotgun metagenomic sequencing at shallow sequencing depths. Spearman correlation (y-axis) for either PICRUSt-predicted metagenomes (blue lines) or shotgun metagenomes (dashed red lines), using 14 soil microbial communities subsampled to the specified number of annotated sequences (x-axis). This rarefaction reflects random subsets of either the full 16S OTU table (blue) or the corresponding gene table for the sequenced metagenome (red). Ten randomly chosen rarefactions were performed at each depth to indicate the expected correlation obtained when assessing an underlying true metagenome using either shallow 16S rRNA gene sequencing with PICRUSt prediction or shallow shotgun metagenomic sequencing. The data label describes the number of annotated reads below which PICRUSt-prediction accuracy exceeds metagenome sequencing accuracy. Note that the plotted rarefaction depth reflects the number of 16S or metagenomic sequences remaining after standard quality control, dereplication, and annotation (or OTU picking in the case of 16S sequences), not the raw number returned from the sequencing facility. The number of total metagenomic reads below which PICRUSt outperforms metagenomic sequencing (72,650) for this dataset was calculated by adjusting the crossover point in annotated reads (above) using annotation rates for the soil dataset (17.3%) and closed-reference OTU picking rates for the 16S rRNA dataset (68.9%). The inset figure illustrates rapid convergence of PICRUSt predictions given low numbers of annotated reads (blue line).
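
The rarefaction described in this caption, random subsets of a count table drawn to a fixed depth and repeated ten times, might look like the sketch below. It assumes subsampling without replacement from the pooled observations of a single sample, which is a common way to rarefy but is not confirmed by the caption. The function names, the fixed seed, and the counts are hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Randomly subsample one sample's counts to `depth` observations.

    counts: {feature_id: integer count}, e.g. OTUs or gene families.
    Sampling is without replacement (an assumption of this sketch).
    """
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the total number of observations")
    subsampled = {feature: 0 for feature in counts}
    for feature in rng.sample(pool, depth):
        subsampled[feature] += 1
    return subsampled


def repeated_rarefactions(counts, depth, repeats=10, seed=0):
    """Draw `repeats` independent rarefactions at one depth."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rarefy(counts, depth, rng) for _ in range(repeats)]


# Hypothetical OTU counts for one sample.
counts = {"otu_a": 50, "otu_b": 30, "otu_c": 20}
draws = repeated_rarefactions(counts, depth=25)
for draw in draws:
    print(draw)
```

Each rarefied table could then be passed through prediction and compared with a reference metagenome to trace how accuracy changes with depth.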

Figure 5

PICRUSt prediction accuracy across the tree of bacterial and archaeal genomes. Phylogenetic tree produced by pruning the Greengenes 16S reference tree down to those tips representing sequenced genomes. Height of the bars in the outermost circle indicates the accuracy of PICRUSt for each genome (accuracy: 0.5-1.0), colored by phylum, with text labels for each genus with at least 15 strains. PICRUSt predictions were as accurate for archaeal genomes (mean=0.94 +/- 0.04 s.d., n=103) as for bacterial genomes (mean=0.95 +/- 0.05 s.d., n=2487).

Figure 6

Variation in inference accuracy across functional modules within single genomes. Results are colored by functional category, and sorted in decreasing order of accuracy within each category (indicated by triangular bars, right margin). Note that all accuracies were >0.80, and therefore the region 0.80-1.0 is displayed for clearer visualization of differences between modules.
