Building Galaxy Tools Using Planemo (Planemo 0.75.31.dev0 documentation)

This tutorial is a gentle introduction to writing Galaxy tools using Planemo. Please read the installation instructions for Planemo if you have not already installed it.

The Basics

This guide demonstrates building tools for commands from Heng Li’s Seqtk package, a package for processing sequence data in FASTA and FASTQ files.

To get started, let’s install Seqtk. Here we use conda, but any installation method should work.

$ conda install --force --yes -c conda-forge -c bioconda seqtk=1.2
    ... seqtk installation ...
$ seqtk seq
Usage:   seqtk seq [options] <in.fq>|<in.fa>
Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'

Next we will download an example FASTQ file and try out a simple Seqtk command, seq, which (with the -A option) converts FASTQ files into FASTA.

$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
$ seqtk seq -A 2.fastq > 2.fasta
$ cat 2.fasta

>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
>EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
>EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
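To make the conversion concrete, here is a minimal shell sketch of what turning FASTQ into FASTA involves. It is a plain-awk stand-in, not how Seqtk is implemented, and it assumes strict four-line FASTQ records (header, sequence, '+', qualities). The function name fastq_to_fasta and the file demo.fastq are hypothetical names used only for illustration.

```shell
# Illustrative stand-in for `seqtk seq -A`, assuming every record occupies
# exactly four lines. Line 1 of each record is the "@name" header and
# line 2 is the sequence; the '+' and quality lines are dropped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Hypothetical one-record FASTQ file created only for this demonstration.
printf '@read1\nACGT\n+\nIIII\n' > demo.fastq
fastq_to_fasta demo.fastq
```

Real FASTQ files can deviate from the four-line layout, which is one reason to rely on a dedicated tool such as Seqtk rather than a one-liner like this.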

For fully featured Seqtk wrappers, check out Helena Rasche’s wrappers on GitHub.

Galaxy tool files are just XML files, so at this point one could open a text editor and start writing the tool. Planemo has a command, tool_init, to quickly generate some of the boilerplate XML, so let’s start by doing that.

$ planemo tool_init --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'

The tool_init command can take various more complex arguments, but the two most basic ones, --id and --name, are shown above. Every Galaxy tool needs an id (a short identifier used by Galaxy itself to identify the tool) and a name (displayed to the Galaxy user; it should be a short description of the tool). A tool’s name can contain whitespace but its id must not.
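The whitespace rule for ids can be checked before calling tool_init. The following is a small sketch of such a check; is_valid_tool_id is a hypothetical helper, not part of Planemo, and it tests only the one constraint stated above (Galaxy may impose others).

```shell
# Hypothetical helper: succeed only if the candidate id is non-empty and
# contains no whitespace. Other rules Galaxy may enforce are not checked.
is_valid_tool_id() {
    case "$1" in
        '' | *[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try one id without whitespace and one with a space in it.
for candidate in 'seqtk_seq' 'seqtk seq'; do
    if is_valid_tool_id "$candidate"; then
        echo "ok:  '$candidate'"
    else
        echo "bad: '$candidate'"
    fi
done
```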

The above command will generate the file seqtk_seq.xml - which looks like this.

This tool file has the common sections required for a Galaxy tool but you will still need to open up the editor and fill out the command template, describe input parameters, tool outputs, write a help section, etc.

The tool_init command can do a little better than this. We can use the test command we tried above, seqtk seq -A 2.fastq > 2.fasta, as an example to generate a command block by specifying the inputs and the outputs as follows.

$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta

This will generate the following XML file - which now has correct definitions for the input and output as well as an actual command template.

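As a rough picture of how the example command maps onto a tool definition, the shell snippet below writes a hand-written approximation of such a file. It is a sketch and not verbatim tool_init output: the file name seqtk_seq_sketch.xml, the version attribute, the parameter names input1 and output1, and the placeholder help text are all assumptions chosen for illustration.

```shell
# Write a hand-written approximation of a Galaxy tool file for the example
# command. NOT verbatim tool_init output; names and values are illustrative.
# The quoted 'EOF' keeps the shell from expanding $input1 and $output1.
cat > seqtk_seq_sketch.xml <<'EOF'
<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        seqtk seq -A '$input1' > '$output1'
    ]]></command>
    <inputs>
        <param type="data" name="input1" format="fastq" />
    </inputs>
    <outputs>
        <data name="output1" format="fasta" />
    </outputs>
    <help><![CDATA[
        TODO: Fill in help.
    ]]></help>
</tool>
EOF

# Show the command template: the example file names 2.fastq and 2.fasta
# have been replaced by template variables.
grep -F 'seqtk seq -A' seqtk_seq_sketch.xml
```

The point to notice is that the concrete file names from the example command become references to the declared input and output, which is what lets Galaxy substitute real dataset paths at run time.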

As shown at the beginning of this section, the command seqtk seq generates a help message for the seq command. tool_init can take that help message and stick it right in the generated tool file using the help_from_command option.

Generally, command help messages aren’t entirely appropriate for tools, since they mention argument names and similar details that are abstracted away by the tool, but they can be an excellent place to start.

The following tool_init call has been extended to use --help_from_command.

$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta \
                    --test_case \
                    --cite_url 'https://github.com/lh3/seqtk' \
                    --help_from_command 'seqtk seq'

In addition to demonstrating --help_from_command, this demonstrates generating a test case from our example with --test_case and adding a citation for the underlying tool. The resulting tool XML file is:

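Conceptually, a test case built from the example pairs an input file with the output it is expected to produce, so the tool can later be re-run and its result compared. The sketch below illustrates that idea in plain shell. It is not how Planemo or Galaxy execute tests. The helper check_case and the case_*.txt files are hypothetical, and sort stands in for the real command so the snippet runs without Seqtk installed.

```shell
# Hypothetical helper: run a command on an input file and compare its
# standard output byte-for-byte with an expected file.
#   $1 = input file, $2 = expected output file, remaining args = command
check_case() {
    input=$1
    expected=$2
    shift 2
    if "$@" "$input" | cmp -s - "$expected"; then
        echo "PASS"
    else
        echo "FAIL"
    fi
}

# Stand-in data and command. With Seqtk available, the analogous call
# would be: check_case 2.fastq 2.fasta seqtk seq -A
printf 'b\na\n' > case_input.txt
printf 'a\nb\n' > case_expected.txt
check_case case_input.txt case_expected.txt sort
```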