PICRUSt2 for prediction of metagenome functions - PubMed

PICRUSt2 for prediction of metagenome functions

Gavin M Douglas et al. Nat Biotechnol. 2020 Jun.

No abstract available

Figures

Figure 1: PICRUSt2 algorithm.

(a) The PICRUSt2 method consists of phylogenetic placement, hidden-state prediction, and sample-wise gene and pathway abundance tabulation. ASV sequences and abundances are taken as input, and gene family and pathway abundances are output. All necessary reference tree and trait databases for the default workflow are included in the PICRUSt2 implementation. (b) The default PICRUSt1 pipeline restricted predictions to reference operational taxonomic units (Ref. OTUs) in the Greengenes database. This requirement resulted in the exclusion of many study sequences across four representative 16S rRNA gene sequencing datasets. PICRUSt2 relaxes this requirement and is agnostic to whether the input sequences are within a reference or not, so almost all of the input amplicon sequence variants (ASVs) are retained in the final output. (c) The default PICRUSt2 database shows greater taxonomic diversity than the PICRUSt1 database.
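The final tabulation step in panel a can be illustrated with a minimal sketch. It assumes that a sample's gene family abundance is the sum, over ASVs, of the ASV abundance multiplied by that ASV's predicted gene copy number. The function name, the dictionary layout, and the toy values are hypothetical, and any normalization the real tool applies is omitted.

```python
from collections import defaultdict


def tabulate_gene_abundances(asv_abundances, predicted_copies):
    """Return {sample: {gene_family: abundance}}.

    asv_abundances:   {sample: {asv_id: count}}
    predicted_copies: {asv_id: {gene_family: predicted copy number}}
    Both layouts are assumptions made for this illustration.
    """
    result = {}
    for sample, counts in asv_abundances.items():
        totals = defaultdict(float)
        for asv, count in counts.items():
            # ASVs without a prediction contribute nothing in this sketch.
            for gene, copies in predicted_copies.get(asv, {}).items():
                totals[gene] += count * copies
        result[sample] = dict(totals)
    return result


# Toy, made-up values for illustration only.
asv_abundances = {
    "sample1": {"ASV1": 10, "ASV2": 5},
    "sample2": {"ASV1": 0, "ASV2": 20},
}
predicted_copies = {
    "ASV1": {"K00001": 2, "K00002": 1},
    "ASV2": {"K00001": 1},
}

table = tabulate_gene_abundances(asv_abundances, predicted_copies)
for sample, genes in table.items():
    print(sample, genes)
```

In this toy input, sample1 receives K00001 contributions from both ASVs (10 × 2 + 5 × 1), while K00002 comes only from ASV1.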

Figure 2: PICRUSt2 performance characteristics.

Validation of PICRUSt2 KEGG ortholog (KO) predictions comparing metagenome prediction performance against gold-standard shotgun metagenomic sequencing (MGS). (a) Boxplots of Spearman correlation coefficients observed in stool samples from Cameroonian individuals (n=57), the Human Microbiome Project (HMP, n=137), stool samples from Indian individuals (n=91), non-human primate stool samples (n=77), mammalian stool (n=8), ocean water (n=6), and blueberry soil (n=22) datasets. The significance of paired-sample, two-tailed Wilcoxon tests is indicated above each tested grouping (*, **, and ns correspond to P < 0.05, P < 0.001, and not significant, respectively). (b) Comparison of significantly differentially abundant KOs between predicted metagenomes and MGS. Precision, recall, and F1 score are reported for each category compared to the MGS data. Precision corresponds to the proportion of significant KOs for that category that are also significant in the MGS data. Recall corresponds to the proportion of significant KOs in the MGS data that are also significant for that category. The F1 score is the harmonic mean of these two metrics. The subsets of the four datasets compared are indicated above each panel (the Cameroonian parasite is Entamoeba). Wilcoxon tests were performed on the KO relative abundances after normalizing by the median number of universal single-copy genes per sample. Significance was defined at a false discovery rate < 0.05. The “Shuffled ASVs” category corresponds to PICRUSt2 predictions with ASV labels shuffled per dataset. The “Alt. MGS” category corresponds to an alternative MGS processing pipeline with reads aligned to the KEGG database rather than the default HUMAnN2 pipeline.
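The panel b metrics follow directly from the definitions in the caption. The sketch below computes them from two sets of significant KOs; the function name and the KO identifiers are hypothetical placeholders, and returning 0.0 for an empty set is an assumption of this sketch rather than something stated in the caption.

```python
def precision_recall_f1(predicted_significant, mgs_significant):
    """Compare KOs called significant in a prediction category against MGS.

    precision: fraction of predicted-significant KOs also significant in MGS
    recall:    fraction of MGS-significant KOs also significant in the prediction
    f1:        harmonic mean of precision and recall
    """
    predicted = set(predicted_significant)
    reference = set(mgs_significant)
    shared = predicted & reference

    # Returning 0.0 for empty sets is a convention chosen for this sketch.
    precision = len(shared) / len(predicted) if predicted else 0.0
    recall = len(shared) / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up KO identifiers for illustration only.
predicted = {"K00001", "K00002", "K00003", "K00004"}
mgs = {"K00002", "K00003", "K00005"}

precision, recall, f1 = precision_recall_f1(predicted, mgs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the four predicted KOs are shared with the three MGS KOs, so precision is 2/4 and recall is 2/3.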

Figure 3: PICRUSt2 accurately predicts MetaCyc pathways and phenotypes for characterizing overall environments.

(a) Spearman correlation coefficients between PICRUSt2 predicted pathway abundances and gold-standard metagenomic sequencing (MGS). Results are shown for each validation dataset: stool from Cameroonian individuals, the Human Microbiome Project (HMP), stool from Indian individuals, mammalian stool, ocean water, non-human primate stool, and blueberry soil. These results are limited to the 575 pathways that could potentially be identified by PICRUSt2 and HUMAnN2. (b) Performance of binary phenotype predictions based on three metrics: F1 score, precision, and recall. Each point corresponds to one of the 41 phenotypes tested. Predictions assessed here are based on holding out each genome individually, predicting the phenotypes for that holdout genome, and comparing the predicted and observed values. The null distribution in this case is based on randomizing the phenotypes across the reference genomes and comparing to the actual values, which results in the same output for all three metrics. The significance of paired-sample, two-tailed Wilcoxon tests is indicated above each tested grouping (* and ** correspond to P < 0.05 and P < 0.001, respectively). Note that in panel a the y-axis is truncated below 0.5 rather than 0 to better visualize small differences between categories. The sample sizes in panel a are 57 (Cameroonian), 137 (HMP), 91 (Indian), 8 (mammal), 6 (ocean), 77 (primate), and 22 (soil).
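The per-sample Spearman coefficient reported in panel a can be sketched as a Pearson correlation computed on average ranks. This is a generic, dependency-free illustration and not the authors' code; the function names and the toy abundance vectors are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy pathway abundances for one sample (made-up values).
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]
observed = [1.0, 4.0, 9.0, 16.0, 20.0]
rho = spearman(predicted, observed)
print(rho)
```

Because the toy vectors are strictly increasing together, their ranks are identical and the coefficient is at its maximum; real predicted and MGS profiles would generally give lower values.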

