Sequence placement (original) (raw)

PICRUSt2 wraps HMMER to place study sequences into a reference multiple-sequence alignment, and then places those sequences into the reference phylogeny with EPA-NG or SEPP. Under the typical workflow, the "study sequences" are your representative OTUs and/or ASVs. GAPPA is then used to convert the resulting .jplace file into Newick format.

Please note that before PICRUSt2-v2.6.0 this command used the PICRUSt2-oldIMG database by default. As of PICRUSt2-v2.6.0 the default database is PICRUSt2-SC. See here for further details on this new database. To use the PICRUSt2-oldIMG database with PICRUSt2-v2.6.0, see the details for the --ref_dir option.

Note that your input study sequences need to be on the positive strand!
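If you already know that an entire FASTA file is on the negative strand, one way to flip it is to reverse-complement every record before placement. The following is a minimal sketch only: revcomp_fasta is a hypothetical helper (not part of PICRUSt2), it assumes DNA sequences, and it flips every record in the file without checking orientation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PICRUSt2): reverse-complement every
# record in a DNA FASTA file and write the result to stdout.
# Multi-line records are joined into a single sequence line.
revcomp_fasta() {
    awk '
    function revcomp(seq,    i, out, c) {
        out = ""
        for (i = length(seq); i >= 1; i--) {
            c = substr(seq, i, 1)
            # Characters without a mapping (e.g. N, S, W) are kept as-is.
            if (c in comp) c = comp[c]
            out = out c
        }
        return out
    }
    BEGIN {
        # Bases and IUPAC ambiguity codes with their complements.
        n = split("A C G T R Y K M B D H V", from, " ")
        split("T G C A Y R M K V H D B", to, " ")
        for (k = 1; k <= n; k++) {
            comp[from[k]] = to[k]
            comp[tolower(from[k])] = tolower(to[k])
        }
    }
    /^>/ {
        # Flush the previous record before starting a new one.
        if (header != "") { print header; print revcomp(seq) }
        header = $0
        seq = ""
        next
    }
    { seq = seq $0 }
    END {
        if (header != "") { print header; print revcomp(seq) }
    }' "$1"
}
```

For example, revcomp_fasta study_seqs_rev.fna > study_seqs.fna (the input filename here is illustrative) would produce a file that can be passed to place_seqs.py with -s.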

Default placement in bacterial reference tree:

place_seqs.py -s study_seqs.fna -o bac_placed_seqs.tre -p 1 --intermediate bac_placement_working

Placement in archaeal reference tree:

place_seqs.py -s study_seqs.fna --ref_dir arc -o arc_placed_seqs.tre -p 1 --intermediate arc_placement_working

Placement in PICRUSt2-oldIMG combined bacterial and archaeal tree:

place_seqs.py -s study_seqs.fna --ref_dir oldIMG -o placed_seqs.tre -p 1 --intermediate placement_working

Note on using SEPP: if you use SEPP for sequence placement, you may need to first run export LC_ALL=C

The script takes these arguments/options:

Using Custom Reference Files

Custom reference files make sense if you want to create an entirely new database, but the easiest option may be to add your trait to the existing PICRUSt2 database. Directions on how to do this are provided here.

To use custom reference files you need to specify a directory with --ref_dir that contains:

  1. A multiple-sequence alignment (extension .fna or .fasta; can optionally be gzipped)
  2. A tree in Newick format (extension .tre)
  3. A hidden Markov model of the multiple-sequence alignment (extension .hmm)
  4. A model file output by RAxML specifying the best parameters for the tree (extension .model)

Note that the prefix of these files needs to be the same as the specified folder name. For instance, the default reference files (prokaryotic 16S rRNA gene alignment) are in picrust2/default_files/prokaryotic/pro_ref and they all have the prefix "pro_ref":

pro_ref.fna.gz
pro_ref.hmm
pro_ref.model
pro_ref.tre
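
Before running place_seqs.py with a custom directory, it can help to confirm that the four files exist and share the folder name as their prefix. The following is a minimal sketch of such a check, based only on the requirements listed above: check_ref_dir is a hypothetical helper, not part of PICRUSt2, and it does not reproduce PICRUSt2's own validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PICRUSt2): check that a custom
# reference directory contains the four expected files, all named
# with the directory name as their prefix.
# Returns 0 if everything is present, 1 otherwise.
check_ref_dir() {
    local dir="${1%/}"            # strip a trailing slash, if any
    local prefix
    prefix="$(basename "$dir")"   # files must share the folder name
    local missing=0
    local ext

    # Alignment: .fna or .fasta, optionally gzipped.
    local found_aln=0
    for ext in fna fasta fna.gz fasta.gz; do
        if [ -f "$dir/$prefix.$ext" ]; then
            found_aln=1
        fi
    done
    if [ "$found_aln" -eq 0 ]; then
        echo "missing alignment: $dir/$prefix.(fna|fasta)[.gz]" >&2
        missing=1
    fi

    # Tree, HMM, and RAxML model file.
    for ext in tre hmm model; do
        if [ ! -f "$dir/$prefix.$ext" ]; then
            echo "missing file: $dir/$prefix.$ext" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

With a custom directory named my_ref (an illustrative name), check_ref_dir my_ref prints any missing files to stderr and returns a non-zero status, so it can be chained in front of the placement command with &&.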

If you do not have a model file you can create one by following these instructions. You can create an HMM of your alignment with hmmbuild.

Further details on creating these files can be found in the wiki describing how the updated database was built here.