Full pipeline script (original)

The entire PICRUSt2 pipeline can be run using a single script, picrust2_pipeline.py. This script runs each of the four key steps outlined on this wiki: (1) sequence placement, (2) hidden-state prediction of genomes, (3) metagenome prediction, and (4) pathway-level prediction.

The options of this program are largely the same as those of the individual scripts.

The standard pipeline generates metagenome predictions for 16S rRNA gene data. The input files should be a FASTA file of amplicon sequence variants (ASVs), i.e. your representative sequences rather than your raw reads (study_seqs.fna below), and a BIOM table of the abundance of each ASV in each sample (study_seqs.biom below). Note that a tab-delimited table with ASV ids in the first column and sample abundances in all subsequent columns will also work.
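As an illustration of the two input formats described above, the following is a minimal pre-flight sketch that compares the ASV ids in a FASTA file with those in the first column of a tab-delimited table. It is not part of PICRUSt2. The function names, the toy ids (ASV1 to ASV3), and the assumption that the table has a single header row are all hypothetical.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Return sequence ids: the first word of each '>' header line."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def table_ids(table_text):
    """Return ASV ids from the first column of a tab-delimited table.

    Assumes (hypothetically) that the first row is a header row.
    """
    rows = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(rows, None)  # skip the assumed header row
    return [row[0] for row in rows if row]


def missing_from_fasta(fasta_text, table_text):
    """List table ASV ids that have no matching FASTA record."""
    known = set(fasta_ids(fasta_text))
    return [asv for asv in table_ids(table_text) if asv not in known]


# Toy data for illustration only.
example_fasta = ">ASV1\nACGT\n>ASV2\nGGCC\n"
example_table = "asv_id\tsampleA\tsampleB\nASV1\t10\t0\nASV3\t2\t5\n"
print(missing_from_fasta(example_fasta, example_table))
```

With real files, the text of study_seqs.fna and of the tab-delimited table would be read from disk and passed in the same way. A BIOM-format table would need a BIOM reader instead.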

The command below runs the full default pipeline on the two input files. EC number and KO metagenomes are predicted, as are MetaCyc pathway abundances and coverages, which are based on the predicted EC number abundances. The nearest-sequenced taxon index (NSTI) is calculated for each input ASV, and by default any ASVs with NSTI > 2 are excluded from the output. Stratified output is only calculated when the --stratified option is set, which can greatly increase run-time.
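To make the default NSTI cut-off concrete, here is a small sketch of the filtering rule as described above: ASVs with NSTI > 2 are dropped and the rest are kept. It only illustrates the rule and is not PICRUSt2's own code. The function name and the NSTI values are made up.

```python
def filter_by_nsti(nsti_by_asv, max_nsti=2.0):
    """Split ASVs into those kept (NSTI <= max_nsti) and those excluded.

    nsti_by_asv maps an ASV id to its NSTI value. The default cut-off of 2
    mirrors the default described in the text.
    """
    kept = {asv: nsti for asv, nsti in nsti_by_asv.items() if nsti <= max_nsti}
    excluded = sorted(asv for asv in nsti_by_asv if asv not in kept)
    return kept, excluded


# Hypothetical NSTI values for illustration only.
example_nsti = {"ASV1": 0.05, "ASV2": 2.0, "ASV3": 2.7}
kept, excluded = filter_by_nsti(example_nsti)
print(sorted(kept), excluded)
```

An ASV with NSTI exactly equal to 2 is kept here, because the rule as stated excludes only values greater than 2.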

picrust2_pipeline.py -s study_seqs.fna -i study_seqs.biom -o picrust2_out_pipeline -p 1

Note on using SEPP: if you use SEPP for sequence placement, you may need to first run export LC_ALL=C
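If the pipeline is launched from a Python script rather than typed into a shell, the command and the SEPP note above could be combined as in the sketch below. It uses only the flags shown on this page (-s, -i, -o, -p and --stratified). The helper names are hypothetical, and running the command assumes picrust2_pipeline.py is on your PATH.

```python
import os
import subprocess


def build_pipeline_command(seqs, table, out_dir, processes=1, stratified=False):
    """Assemble the argument list for picrust2_pipeline.py."""
    cmd = [
        "picrust2_pipeline.py",
        "-s", seqs,
        "-i", table,
        "-o", out_dir,
        "-p", str(processes),
    ]
    if stratified:
        # Stratified output can greatly increase run-time.
        cmd.append("--stratified")
    return cmd


def sepp_environment(base_env=None):
    """Return a copy of the environment with LC_ALL=C set.

    Equivalent to running `export LC_ALL=C` before the command.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"
    return env


def run_pipeline(cmd, env=None):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True, env=env)


cmd = build_pipeline_command(
    "study_seqs.fna", "study_seqs.biom", "picrust2_out_pipeline"
)
print(" ".join(cmd))
```

Calling run_pipeline(cmd, env=sepp_environment()) would then start the run with LC_ALL=C. Passing the arguments as a list avoids shell quoting problems when file names contain spaces.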

Please note that before PICRUSt2-v2.6.0 this command used the PICRUSt2-oldIMG database by default. As of PICRUSt2-v2.6.0 the default database is the PICRUSt2-SC database. See here for further details on this new database. If you are using v2.6.0 you can still use the PICRUSt2-oldIMG database, but you will need to use the picrust2_pipeline_oldIMG.py command. The options and output will then be the same as for v2.5.3 below.
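The version rule above can be summarised in a short sketch. The script and database names come from the text. The helper functions and the way version strings are parsed are assumptions made for illustration.

```python
def parse_version(version):
    """Turn a string such as 'v2.6.0' or '2.5.3' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def default_database(version):
    """Database used by picrust2_pipeline.py for the given version."""
    if parse_version(version) < (2, 6, 0):
        return "PICRUSt2-oldIMG"
    return "PICRUSt2-SC"


def old_img_script(version):
    """Script that runs the pipeline against the PICRUSt2-oldIMG database."""
    if parse_version(version) < (2, 6, 0):
        # Before v2.6.0 the main script already used oldIMG by default.
        return "picrust2_pipeline.py"
    return "picrust2_pipeline_oldIMG.py"


for v in ("2.5.3", "2.6.0"):
    print(v, default_database(v), old_img_script(v))
```

Comparing tuples of integers rather than raw strings keeps the ordering correct for versions such as 2.10.0.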

All of the output files produced by the pipeline (including intermediate files, which can be useful for troubleshooting) are written to picrust2_out_pipeline. Note that these are the default outputs; if you specify different functional databases or custom reference databases (e.g. non-16S amplicon reference data), the output will differ. See About the output for full details on all files produced.

v2.6.0 onwards

The key output files are:

Additional output files, which can be useful for advanced users, are:

See About the output for full details on all files produced.

Options:

v2.5.3 and earlier

Note that the behaviour of v2.5.3 and earlier can be replicated using picrust2_pipeline_oldIMG.py.

The key output files are:

Additional output files, which can be useful for advanced users, are:

See About the output for full details on all files produced.

Options: