Building Galaxy Tools Using Planemo - Planemo 0.75.31.dev0 documentation
This tutorial is a gentle introduction to writing Galaxy tools using Planemo. Please read the installation instructions for Planemo if you have not already installed it.
The Basics
This guide demonstrates building up tools for commands from Heng Li’s Seqtk package, a package for processing sequence data in FASTA and FASTQ files.
To get started, let’s install Seqtk. Here we are going to use conda to install Seqtk, but however you obtain it should be fine.
$ conda install --force --yes -c conda-forge -c bioconda seqtk=1.2
    ... seqtk installation ...
$ seqtk seq

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
Next we will download an example FASTQ file and try out a simple Seqtk command, seq, which with the -A option converts FASTQ files into FASTA.
$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
$ seqtk seq -A 2.fastq > 2.fasta
$ cat 2.fasta
>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
>EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
>EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
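To make the conversion concrete, here is a minimal shell sketch of what the FASTQ-to-FASTA step amounts to. It is not how Seqtk is implemented. The helper name fastq_to_fasta is hypothetical, and the sketch assumes every FASTQ record occupies exactly four lines (header, sequence, "+", qualities), which real files do not always guarantee.

```shell
# Hypothetical helper (not part of Seqtk): read four-line FASTQ records
# on stdin and write FASTA records on stdout.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # header line: swap leading "@" for ">"
        NR % 4 == 2 { print }                    # sequence line: copied unchanged
        # lines 3 ("+") and 4 (qualities) of each record are dropped
    '
}

# Usage with a tiny made-up record:
printf '@read1\nACGT\n+\nIIII\n' | fastq_to_fasta
# prints:
# >read1
# ACGT
```

Running fastq_to_fasta < 2.fastq should give output comparable to the Seqtk command above for simple inputs, but Seqtk remains the tool to use in practice since it handles the cases this sketch ignores.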
For fully featured Seqtk wrappers, check out Helena Rasche’s wrappers on GitHub.
Galaxy tool files are just XML files, so at this point one could open a text editor and start writing the tool. Planemo has a command, tool_init, to quickly generate some of the boilerplate XML, so let’s start by doing that.
$ planemo tool_init --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'
The tool_init command can take various complex arguments, but the two most basic ones, --id and --name, are shown above. Every Galaxy tool needs an id (a short identifier used by Galaxy itself to identify the tool) and a name (displayed to the Galaxy user; it should be a short description of the tool). A tool’s name can have whitespace but its id must not.
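The whitespace rule is easy to check before calling tool_init. The following is a small sketch only: valid_tool_id is a hypothetical helper, not a Planemo command, and it tests nothing beyond the rule stated above (Galaxy may impose further constraints on ids that are not covered here).

```shell
# Hypothetical helper (not part of Planemo): succeed only if the
# candidate id is non-empty and contains no whitespace.
valid_tool_id() {
    case "$1" in
        ''|*[[:space:]]*) return 1 ;;  # empty, or has a space/tab/newline
        *) return 0 ;;
    esac
}

# The id from the command above passes; the name would not work as an id.
valid_tool_id 'seqtk_seq' && echo "ok: seqtk_seq"
valid_tool_id 'Convert to FASTA (seqtk)' || echo "rejected: contains whitespace"
```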
The above command will generate the file seqtk_seq.xml.
This tool file has the common sections required for a Galaxy tool but you will still need to open up the editor and fill out the command template, describe input parameters, tool outputs, write a help section, etc.
The tool_init command can do a little better than this as well. We can use the test command we tried above, seqtk seq -A 2.fastq > 2.fasta, as an example to generate a command block by specifying the inputs and the outputs as follows.
$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta
This will generate an XML file that now has correct definitions for the input and output as well as an actual command template.
As shown at the beginning of this section, the command seqtk seq generates a help message for the seq command. tool_init can take that help message and stick it right in the generated tool file using the help_from_command option.
Generally, command help messages aren’t exactly appropriate for tools since they mention argument names and similar details that are abstracted away by the tool, but they can be an excellent place to start.
The following tool_init call has been enhanced to use --help_from_command.
$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -A 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta \
                    --test_case \
                    --cite_url 'https://github.com/lh3/seqtk' \
                    --help_from_command 'seqtk seq'
In addition to demonstrating --help_from_command, this demonstrates generating a test case from our example with --test_case and adding a citation for the underlying tool with --cite_url. The resulting tool XML file is: