Generating shuffled predictions

It can be helpful to compare the PICRUSt2 output tables with tables generated by shuffling the predictions across all amplicon sequence variants (ASVs). The script shuffle_predictions.py was added in v2.4.0 to make this task easier. It randomizes the ASV labels of the predicted genomes, so the set of predicted genomes is unchanged; each genome is simply linked to a different ASV, and therefore to different abundances across samples.
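To make the idea of label shuffling concrete, here is a minimal Python sketch on a toy table. It is not the implementation used by shuffle_predictions.py; the function name `shuffle_asv_labels`, the dictionary layout, and the numbers are all made up for illustration.

```python
import random


def shuffle_asv_labels(table, seed):
    """Return a copy of `table` with ASV labels randomly reassigned to rows.

    `table` maps an ASV id to its list of predicted gene-family counts.
    Hypothetical helper for illustration only.
    """
    asv_ids = sorted(table)  # fixed order, so the seed fully determines the result
    shuffled_ids = asv_ids[:]
    random.Random(seed).shuffle(shuffled_ids)
    # Each predicted genome (row) is kept intact but relabelled with another ASV id.
    return {new_id: table[old_id] for old_id, new_id in zip(asv_ids, shuffled_ids)}


# Toy predictions: three ASVs, two gene families (made-up numbers).
predictions = {
    "ASV1": [1, 0],
    "ASV2": [0, 2],
    "ASV3": [3, 3],
}

rep_a = shuffle_asv_labels(predictions, seed=131)
rep_b = shuffle_asv_labels(predictions, seed=131)

# Same seed gives the same shuffle, and no predicted genome is altered or lost.
print(rep_a == rep_b)
print(sorted(rep_a.values()) == sorted(predictions.values()))
```

Because only the labels move, any difference between the real and shuffled downstream tables comes from which genome is paired with which ASV abundance profile, not from the genome content itself.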

This is how you could run the command with the tutorial data:

shuffle_predictions.py -i EC_predicted.tsv.gz \
                           -o EC_predicted_shuffled \
                           -r 5 \
                           -s 131

Here, -r specifies how many random replicates to make and -s sets a random seed (131 in this example), so that rerunning with the same seed reproduces the same shuffled tables.

The gene family and pathway-level prediction tables can then be generated from these shuffled tables by running the standard PICRUSt2 commands. Below is an example of how to quickly run metagenome_pipeline.py and pathway_pipeline.py on all shuffled tables with a bash loop.

# Make folders for shuffled output
mkdir EC_metagenome_out_shuffled
mkdir pathways_out_shuffled

for i in {1..5}; do

    # Define in and out file paths.
    EC_SHUFFLED="EC_predicted_shuffled/EC_predicted_shuf${i}.tsv.gz"
    OUT_META="EC_metagenome_out_shuffled/rep${i}"
    OUT_PATHWAYS="pathways_out_shuffled/rep${i}"

    # PICRUSt2 scripts to get prediction abundance tables for gene and pathway levels, respectively.
    metagenome_pipeline.py -i ../table.biom \
                           -m marker_predicted_and_nsti.tsv.gz \
                           -f "$EC_SHUFFLED" \
                           -o "$OUT_META" \
                           --strat_out

    pathway_pipeline.py -i "$OUT_META/pred_metagenome_contrib.tsv.gz" \
                        -o "$OUT_PATHWAYS" \
                        -p 1
done
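After the loop finishes, it can be worth confirming that every replicate produced the stratified table that pathway_pipeline.py reads. The Python sketch below only rebuilds the paths used in the loop above and reports whether each one exists. The helper name `expected_outputs` is hypothetical, and the folder names are assumed to match the example.

```python
from pathlib import Path


def expected_outputs(n_reps,
                     meta_dir="EC_metagenome_out_shuffled",
                     path_dir="pathways_out_shuffled"):
    """List the per-replicate paths used in the bash loop (hypothetical helper)."""
    outputs = []
    for i in range(1, n_reps + 1):
        outputs.append({
            "rep": i,
            # Stratified table passed to pathway_pipeline.py in the loop.
            "contrib": Path(meta_dir) / f"rep{i}" / "pred_metagenome_contrib.tsv.gz",
            # Output folder given to pathway_pipeline.py via -o.
            "pathways": Path(path_dir) / f"rep{i}",
        })
    return outputs


for out in expected_outputs(5):
    status = "ok" if out["contrib"].exists() else "missing"
    print(f"rep{out['rep']}: {out['contrib']} [{status}]")
```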

These shuffled tables are especially helpful as a baseline: they show how well the predicted functional data differentiate samples (e.g. based on ordination or differential abundance testing) when the predicted genomes are assigned to ASVs at random.
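One simple way to use such a baseline is to compute the same summary statistic on the real table and on each shuffled replicate, then ask how often the shuffled value is at least as large. The sketch below assumes you have already computed those statistics yourself (for example, an effect size from an ordination-based test). The function name and all numbers are hypothetical, and this is not a procedure prescribed by PICRUSt2.

```python
def empirical_p_value(observed, shuffled):
    """Share of shuffled replicates with a statistic >= observed.

    Adding 1 to the numerator and denominator counts the observed value
    itself, so the result is never exactly zero.
    """
    n_extreme = sum(1 for s in shuffled if s >= observed)
    return (n_extreme + 1) / (len(shuffled) + 1)


# Hypothetical statistics: one from the real predictions, five from shuffled replicates.
observed_stat = 0.42
shuffled_stats = [0.10, 0.15, 0.08, 0.21, 0.12]

p = empirical_p_value(observed_stat, shuffled_stats)
print(f"empirical p = {p:.3f}")
```

With only five replicates the smallest value this can return is 1/6, so a larger -r is needed if a finer-grained comparison is required.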