Infer pathway abundances (original) (raw)

Pathway abundances are calculated using the same approach as HUMAnN2, based on the abundances of gene families that can be linked to reactions within pathways (E.C. numbers regrouped to MetaCyc reactions by default). By default, pathways are first identified as present or absent with MinPath.

Either a structured or an unstructured pathway mapfile can be input (the default mapfile is structured). The mapfile is used to identify which pathways are likely present, based on the presence of the requisite gene families.

Input gene family abundances can be stratified or unstratified by contributing organisms; however, stratified pathway abundances will only be written if the input gene families are in stratified format. Note that stratified abundances refer to how much each predicted genome is contributing to the community pathway abundances (not the predicted level of that pathway within that organism alone!). To get pathway abundances broken down by contributing sequence you need to use the --per_sequence_contrib option (see below).
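To make the stratified versus unstratified distinction concrete, here is a minimal Python sketch using a toy table. The long-format layout, the sequence and sample names, and the function name `collapse_to_unstratified` are assumptions made for illustration only; they are not the file format or code used by the script. The sketch shows only that a stratified gene family table records each sequence's contribution to the community total, so summing over contributing sequences gives the unstratified value.

```python
from collections import defaultdict

# Hypothetical long-format stratified table (not the real file layout):
# (function, contributing sequence, sample, abundance contributed).
stratified = [
    ("EC:1.1.1.1", "ASV_1", "sample_A", 4.0),
    ("EC:1.1.1.1", "ASV_2", "sample_A", 6.0),
    ("EC:2.7.1.1", "ASV_1", "sample_A", 3.0),
    ("EC:1.1.1.1", "ASV_2", "sample_B", 5.0),
]


def collapse_to_unstratified(rows):
    """Sum contributions over sequences to get per-function, per-sample totals."""
    totals = defaultdict(float)
    for function, _sequence, sample, abundance in rows:
        totals[(function, sample)] += abundance
    return dict(totals)


unstratified = collapse_to_unstratified(stratified)
for (function, sample), abundance in sorted(unstratified.items()):
    print(function, sample, abundance)
```

Going in the other direction is not possible: an unstratified table has already lost the per-sequence breakdown, which is why stratified pathway output requires stratified input.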

There are two default mapfiles used by this script. These files are used automatically, so you do not need to specify them yourself. However, it is useful to understand what this script does by default. First, E.C. numbers are regrouped to MetaCyc RXNs using this mapfile: default_files/pathway_mapfiles/ec_level4_to_metacyc_rxn.tsv. These MetaCyc RXNs are then used to infer MetaCyc pathway abundances using this mapfile: default_files/pathway_mapfiles/metacyc_path2rxn_struc_filt_pro.txt. This second mapfile maps reactions to pathways for the subset of MetaCyc pathways found in prokaryotes.
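The regrouping step can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the script's implementation: the mapping is a toy in-memory dictionary rather than the real mapfile, the reaction identifiers (`RXN-A`, `RXN-B`) and the function name `regroup` are made up, and the sketch assumes that an E.C. number's abundance is added in full to every reaction it maps to and that unmapped E.C. numbers are dropped.

```python
from collections import defaultdict

# Toy E.C.-to-reaction mapping; the identifiers are hypothetical and the
# real mapfile is a file on disk, not a dictionary.
ec_to_rxn = {
    "EC:1.1.1.1": ["RXN-A", "RXN-B"],
    "EC:2.7.1.1": ["RXN-B"],
}

# Toy E.C. abundances for a single sample.
ec_abundance = {"EC:1.1.1.1": 10.0, "EC:2.7.1.1": 3.0, "EC:9.9.9.9": 2.0}


def regroup(abundances, mapping):
    """Regroup abundances onto reactions (assumption: full value to each mapped reaction)."""
    regrouped = defaultdict(float)
    for ec, value in abundances.items():
        # E.C. numbers missing from the mapping contribute nothing.
        for rxn in mapping.get(ec, []):
            regrouped[rxn] += value
    return dict(regrouped)


rxn_abundance = regroup(ec_abundance, ec_to_rxn)
print(rxn_abundance)
```

In this toy case `RXN-B` receives contributions from two E.C. numbers, while `EC:9.9.9.9` has no mapped reaction and does not appear in the regrouped table.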

Please note that before PICRUSt2-v2.6.0 this command used the PICRUSt2-oldIMG database by default. As of PICRUSt2-v2.6.0 the default database is the PICRUSt2-SC database. See here for further details on this new database. To use the PICRUSt2-oldIMG database with PICRUSt2-v2.6.0, see the details for the -db option.

Use this command to run MinPath on the predicted gene families and get unstratified pathway abundances (for pathways found in prokaryotes):

pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
                    -o pathways_out \
                    --intermediate minpath_working \
                    -p 1

The input arguments and options to this command are: