Advanced Tool Development Topics - Planemo 0.75.31.dev0 documentation

This tutorial covers some more advanced tool development topics. It assumes some basic knowledge about developing CWL tools and that you have an environment with Planemo available. If you have never developed a CWL tool, check out the CWL User Guide and the Planemo CWL intro tutorial first.

Dependencies and Conda

Specifying and Using Software Requirements

Note

Planemo's Conda-related commands require a Conda installation to target. A properly configured Conda installation can be initialized with the conda_init command, which should only need to be run once per development machine.

$ planemo conda_init
galaxy.tools.deps.conda_util INFO: Installing conda, this may take several minutes.
wget -q --recursive -O /var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLW5zn1.sh https://repo.continuum.io/miniconda/Miniconda3-4.3.31-MacOSX-x86_64.sh
bash /var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLW5zn1.sh -b -p /Users/john/miniconda3
PREFIX=/Users/john/miniconda3
installing: python-3.6.3-h47c878a_7 ...
Python 3.6.3 :: Anaconda, Inc.
installing: ca-certificates-2017.08.26-ha1e5d58_0 ...
installing: conda-env-2.6.0-h36134e3_0 ...
installing: libcxxabi-4.0.1-hebd6815_0 ...
installing: tk-8.6.7-h35a86e2_3 ...
installing: xz-5.2.3-h0278029_2 ...
installing: yaml-0.1.7-hc338f04_2 ...
installing: zlib-1.2.11-hf3cbc9b_2 ...
installing: libcxx-4.0.1-h579ed51_0 ...
installing: openssl-1.0.2n-hdbc3d79_0 ...
installing: libffi-3.2.1-h475c297_4 ...
installing: ncurses-6.0-hd04f020_2 ...
installing: libedit-3.1-hb4e282d_0 ...
installing: readline-7.0-hc1231fa_4 ...
installing: sqlite-3.20.1-h7e4c145_2 ...
installing: asn1crypto-0.23.0-py36h782d450_0 ...
installing: certifi-2017.11.5-py36ha569be9_0 ...
installing: chardet-3.0.4-py36h96c241c_1 ...
installing: idna-2.6-py36h8628d0a_1 ...
installing: pycosat-0.6.3-py36hee92d8f_0 ...
installing: pycparser-2.18-py36h724b2fc_1 ...
installing: pysocks-1.6.7-py36hfa33cec_1 ...
installing: python.app-2-py36h54569d5_7 ...
installing: ruamel_yaml-0.11.14-py36h9d7ade0_2 ...
installing: six-1.11.0-py36h0e22d5e_1 ...
installing: cffi-1.11.2-py36hd3e6348_0 ...
installing: setuptools-36.5.0-py36h2134326_0 ...
installing: cryptography-2.1.4-py36h842514c_0 ...
installing: wheel-0.30.0-py36h5eb2c71_1 ...
installing: pip-9.0.1-py36h1555ced_4 ...
installing: pyopenssl-17.5.0-py36h51e4350_0 ...
installing: urllib3-1.22-py36h68b9469_0 ...
installing: requests-2.18.4-py36h4516966_1 ...
installing: conda-4.3.31-py36_0 ...
installation finished.
/Users/john/miniconda3/bin/conda install -y --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults conda=4.3.33 conda-build=2.1.18
Fetching package metadata ...................
Solving package specifications: .

Package plan for installation in environment /Users/john/miniconda3:

The following NEW packages will be INSTALLED:

beautifulsoup4: 4.6.0-py36_0  conda-forge
conda-build:    2.1.18-py36_0 conda-forge
conda-verify:   2.0.0-py36_0  conda-forge
filelock:       3.0.4-py36_0  conda-forge
jinja2:         2.10-py36_0   conda-forge
markupsafe:     1.0-py36_0    conda-forge
pkginfo:        1.4.2-py36_0  conda-forge
pycrypto:       2.6.1-py36_1  conda-forge
pyyaml:         3.12-py36_1   conda-forge

The following packages will be UPDATED:

conda:          4.3.31-py36_0             --> 4.3.33-py36_0 conda-forge

beautifulsoup4 100% |###################################################################| Time: 0:00:00 782.08 kB/s
filelock-3.0.4 100% |###################################################################| Time: 0:00:00   7.95 MB/s
markupsafe-1.0 100% |###################################################################| Time: 0:00:00   5.82 MB/s
pkginfo-1.4.2- 100% |###################################################################| Time: 0:00:00   1.18 MB/s
pycrypto-2.6.1 100% |###################################################################| Time: 0:00:00   1.69 MB/s
pyyaml-3.12-py 100% |###################################################################| Time: 0:00:00   3.31 MB/s
conda-verify-2 100% |###################################################################| Time: 0:00:00   6.91 MB/s
jinja2-2.10-py 100% |###################################################################| Time: 0:00:00   2.81 MB/s
conda-4.3.33-p 100% |###################################################################| Time: 0:00:00 621.27 kB/s
conda-build-2. 100% |###################################################################| Time: 0:00:00   2.16 MB/s
Conda installation succeeded - Conda is available at '/Users/john/miniconda3/bin/conda'

Note

Why not just use containers?

Containers are great; use containers (be it Docker, Singularity, etc.) whenever possible to increase the reproducibility and portability of your tools and workflows. However, building ad hoc containers to support CWL tools (e.g. custom Dockerfile definitions) has serious limitations. In the next tutorial on containers we will argue that using BioContainers built or discovered from your tool’s Software Requirements is a superior approach.

Besides leading to better containers, there are other reasons to describe Software Requirements: doing so allows your tool to be used in environments without a container runtime, and it provides valuable, actionable metadata about the computation the tool describes.

Read more about this whole dependency stack in our preprint Practical computational reproducibility in the life sciences.

The Common Workflow Language specification loosely describes Software Requirements, a way to map CWL hints to packages, environment modules, or any other mechanism for describing the dependencies needed to run a tool outside of a container. The large and active Galaxy tool development community has built an open source library and a set of best practices for describing dependencies for Galaxy that should work just as well for CWL. The library has been integrated with cwltool and Toil so that CWL tool authors and users can leverage Galaxy's dependency management and best practices.

Software Requirements can be configured to resolve dependencies in various ways, but Planemo ships with opinionated defaults geared toward making it as easy as possible to build CWL tools that target Conda, with requirements compatible with cwltool and Toil when running outside containers.

During the introductory tool development tutorial, we called planemo tool_init with the argument --requirement seqtk@1.2, and the resulting tool contained a SoftwareRequirement in the form of the following YAML fragment:

SoftwareRequirement:
  packages:
  - package: seqtk
    version:
    - "1.2"

Planemo (and cwltool and Toil) can interpret these SoftwareRequirement annotations in various ways, including as Conda packages. When interpreting them as Conda packages, these runtimes can set up isolated, reproducible Conda environments for tool execution with the correct packages installed (e.g. seqtk in the above example).
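To make the package interpretation concrete, here is a minimal Python sketch, not Planemo's actual implementation, that takes the seqtk@1.2 requirement written as the Python dict a YAML parser would produce for a map-style hint, and turns each package into a Conda match spec such as seqtk=1.2. The helper names conda_targets and conda_spec are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def conda_targets(hints: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (package, version) pairs from a map-style SoftwareRequirement hint."""
    requirement = hints.get("SoftwareRequirement", {})
    targets: List[Tuple[str, str]] = []
    for pkg in requirement.get("packages", []):
        # CWL allows several acceptable versions; this sketch keeps only the first.
        # A missing or empty version list becomes "" (meaning: any version).
        versions = pkg.get("version") or [""]
        targets.append((pkg["package"], str(versions[0])))
    return targets


def conda_spec(name: str, version: str) -> str:
    """Render a Conda match spec: 'name=version', or just 'name' if unpinned."""
    return f"{name}={version}" if version else name


# The seqtk@1.2 hint, as a YAML parser would return it.
hints = {"SoftwareRequirement": {"packages": [{"package": "seqtk", "version": ["1.2"]}]}}

for name, version in conda_targets(hints):
    print(conda_spec(name, version))  # seqtk=1.2
```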

Note

Why Conda?

Many different package managers could potentially be targeted here, but we focus on Conda for a few key reasons.

Note

Conda Terminology

Diagram describing the relationship between Conda, Miniconda, and Anaconda.

Conda recipes build packages that are published to channels.

By default, Planemo is set up to target a few channels: iuc, conda-forge, bioconda, and defaults. The whole dependency management scheme outlined here works much better if packages can be found in one of these “best practice” channels.
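These channels end up as conda command-line arguments. The sketch below uses a hypothetical helper, channel_args, to rebuild the --override-channels/--channel arguments in the order they appear in the conda commands Planemo logs in this tutorial; conda generally gives channels listed earlier a higher priority.

```python
from typing import List, Sequence

# Channel order as it appears in the conda commands Planemo logs in this tutorial.
DEFAULT_CHANNELS = ("iuc", "conda-forge", "bioconda", "defaults")


def channel_args(channels: Sequence[str] = DEFAULT_CHANNELS) -> List[str]:
    """Build conda arguments restricting resolution to exactly these channels."""
    # --override-channels tells conda not to search the default or .condarc channels.
    args = ["--override-channels"]
    for channel in channels:
        args += ["--channel", channel]
    return args


print(" ".join(channel_args()))
# --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults
```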

We can check whether the requirements of a tool are available in best practice Conda channels using an extended form of the planemo lint command (planemo lint was introduced in the introductory tutorial). Passing the --conda_requirements flag checks that every listed requirement can be found.

$ planemo lint --conda_requirements seqtk_seq.cwl
Linting tool /Users/john/workspace/planemo/docs/writing/seqtk_seq.cwl
...
Applying linter requirements_in_conda... CHECK
.. INFO: Requirement [seqtk@1.2] matches target in best practice Conda channel [https://conda.anaconda.org/bioconda/osx-64].
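Conceptually, this lint check asks whether the exact package name and version exist in one of the best practice channels. Here is a minimal Python sketch of that question. It assumes a hypothetical in-memory index in place of the real channel metadata Planemo queries; the bioconda entries are taken from the conda_search output later in this tutorial.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical stand-in for channel metadata: channel -> {(name, version), ...}.
Index = Dict[str, Set[Tuple[str, str]]]

BEST_PRACTICE_CHANNELS = ("iuc", "conda-forge", "bioconda", "defaults")


def find_channel(name: str, version: str, index: Index,
                 channels: Sequence[str] = BEST_PRACTICE_CHANNELS) -> Optional[str]:
    """Return the first channel (in priority order) offering name=version."""
    for channel in channels:
        if (name, version) in index.get(channel, set()):
            return channel
    return None  # the lint check would flag this requirement


# A few seqtk versions from the bioconda listing shown by conda_search.
index: Index = {"bioconda": {("seqtk", "r93"), ("seqtk", "1.2")}}

print(find_channel("seqtk", "1.2", index))  # bioconda
print(find_channel("seqtk", "9.9", index))  # None
```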

Note

You can download a more complete version of the CWL seqtk seq tool from the Planemo tutorial using the commands:

$ planemo project_init --template=seqtk_complete_cwl seqtk_example
$ cd seqtk_example

We can verify these tool requirements install with the conda_install command. With its default parameters, conda_install processes tools and creates isolated environments for their declared Software Requirements (mirroring what can be done in production with cwltool and Toil).

$ planemo conda_install seqtk_seq.cwl
Install conda target CondaTarget[seqtk,version=1.2]
/home/john/miniconda3/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
Fetching package metadata ...............
Solving package specifications: ..........

Package plan for installation in environment /home/john/miniconda2/envs/__seqtk@1.2:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
seqtk-1.2                  |                0          29 KB  bioconda

The following NEW packages will be INSTALLED:

seqtk: 1.2-0   bioconda
zlib:  1.2.8-3

Fetching packages ...
seqtk-1.2-0.ta 100% |#############################################################| Time: 0:00:00 444.71 kB/s
Extracting packages ...
[      COMPLETE      ]|################################################################################| 100%
Linking packages ...
[      COMPLETE      ]|################################################################################| 100%
#

To activate this environment, use:

> source activate __seqtk@1.2

To deactivate this environment, use:

> source deactivate __seqtk@1.2

$ which seqtk
seqtk not found
$

The above install worked properly, but seqtk is not on your PATH because this merely created an environment for the seqtk installation within the Conda directory. Planemo will configure cwltool during testing to reuse this environment. If you wish to explore the resulting environment interactively, for instance to try the installed tool or produce test data, you can source the output of the conda_env command.

$ . <(planemo conda_env seqtk_seq.cwl)
Deactivate environment with conda_env_deactivate
(seqtk_seq) $ which seqtk
/home/planemo/miniconda3/envs/jobdepsiJClEUfecc6d406196737781ff4456ec60975c137e04884e4f4b05dc68192f7cec4656/bin/seqtk
(seqtk_seq) $ seqtk seq

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
         -Q INT    quality shift: ASCII-INT gives base quality [33]
         -s INT    random seed (effective with -f) [11]
         -f FLOAT  sample FLOAT fraction of sequences [1]
         -M FILE   mask regions in BED or name list FILE [null]
         -L INT    drop sequences with length shorter than INT [0]
         -c        mask complement region (effective with -M)
         -r        reverse complement
         -A        force FASTA output (discard quality)
         -C        drop comments at the header lines
         -N        drop sequences containing ambiguous bases
         -1        output the 2n-1 reads only
         -2        output the 2n reads only
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences
(seqtk_seq) $ conda_env_deactivate
$

As shown above, sourcing this output also defines a conda_env_deactivate command in your shell, which can be used to restore your initial shell configuration.
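The environment conda_install created earlier came from a single conda create call, visible in its log. As a sketch only (the helper name conda_create_argv is hypothetical, and Planemo builds this command internally), the same argument list can be assembled like this, giving the requirement its own environment named __<package>@<version>:

```python
from typing import List


def conda_create_argv(conda: str, name: str, version: str) -> List[str]:
    """Argument list for an isolated, version-pinned environment for one package."""
    env_name = f"__{name}@{version}"  # naming pattern seen in the log: __seqtk@1.2
    spec = f"{name}={version}"        # Conda match spec pinning the version
    return [conda, "create", "-y", "--name", env_name, spec]


argv = conda_create_argv("conda", "seqtk", "1.2")
print(" ".join(argv))  # conda create -y --name __seqtk@1.2 seqtk=1.2
```

Passing argv to subprocess.run(argv, check=True) would actually run it, assuming conda is on your PATH; the sketch deliberately stops short of that.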

Here is a portion of the output from the testing command planemo test --no-container seqtk_seq.cwl, demonstrating the use of this environment.

$ planemo test --no-container seqtk_seq.cwl
Enable beta testing mode for testing.
cwltool INFO: /Users/john/workspace/planemo/.venv/bin/planemo 1.0.20170828135420
cwltool INFO: Resolved '/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
cwltool INFO: [job seqtk_seq.cwl] /private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpaDQ1nK$ seqtk
seq
-a
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpJtPKCr/stg24cf7e67-5ca6-44a4-a46b-26cbe104e1d4/2.fastq > /private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpaDQ1nK/out
cwltool INFO: [job seqtk_seq.cwl] completed success
cwltool INFO: Final process status is success
galaxy.tools.parser.factory INFO: Loading CWL tool - this is experimental - tool likely will not function in future at least in same way.
All 1 test(s) executed passed.
seqtk_seq_0: passed

Since seqtk isn’t on the PATH and we did not use a container, the passing test shows that SoftwareRequirement resolution succeeded and found the environment we previously installed with conda_install.

This also works outside of Planemo testing. The following invocation runs a job with cwltool, which resolves the requirement to a Conda environment like the one created above:

$ cwltool --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
/Users/john/workspace/planemo/.venv/bin/cwltool 1.0.20180508202931
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
No handlers could be found for logger "rdflib.term"
[job seqtk_seq.cwl] /private/tmp/docker_tmpDQYeqC$ seqtk
seq
-a
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpQwBqPo/stg8cf2282a-d807-4f90-b94d-feeda004cacd/2.fastq > /private/tmp/docker_tmpDQYeqC/out
PREFIX=/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda
installing: python-3.6.3-h47c878a_7 ...
Python 3.6.3 :: Anaconda, Inc.
installing: ca-certificates-2017.08.26-ha1e5d58_0 ...
installing: conda-env-2.6.0-h36134e3_0 ...
installing: libcxxabi-4.0.1-hebd6815_0 ...
installing: tk-8.6.7-h35a86e2_3 ...
installing: xz-5.2.3-h0278029_2 ...
installing: yaml-0.1.7-hc338f04_2 ...
installing: zlib-1.2.11-hf3cbc9b_2 ...
installing: libcxx-4.0.1-h579ed51_0 ...
installing: openssl-1.0.2n-hdbc3d79_0 ...
installing: libffi-3.2.1-h475c297_4 ...
installing: ncurses-6.0-hd04f020_2 ...
installing: libedit-3.1-hb4e282d_0 ...
installing: readline-7.0-hc1231fa_4 ...
installing: sqlite-3.20.1-h7e4c145_2 ...
installing: asn1crypto-0.23.0-py36h782d450_0 ...
installing: certifi-2017.11.5-py36ha569be9_0 ...
installing: chardet-3.0.4-py36h96c241c_1 ...
installing: idna-2.6-py36h8628d0a_1 ...
installing: pycosat-0.6.3-py36hee92d8f_0 ...
installing: pycparser-2.18-py36h724b2fc_1 ...
installing: pysocks-1.6.7-py36hfa33cec_1 ...
installing: python.app-2-py36h54569d5_7 ...
installing: ruamel_yaml-0.11.14-py36h9d7ade0_2 ...
installing: six-1.11.0-py36h0e22d5e_1 ...
installing: cffi-1.11.2-py36hd3e6348_0 ...
installing: setuptools-36.5.0-py36h2134326_0 ...
installing: cryptography-2.1.4-py36h842514c_0 ...
installing: wheel-0.30.0-py36h5eb2c71_1 ...
installing: pip-9.0.1-py36h1555ced_4 ...
installing: pyopenssl-17.5.0-py36h51e4350_0 ...
installing: urllib3-1.22-py36h68b9469_0 ...
installing: requests-2.18.4-py36h4516966_1 ...
installing: conda-4.3.31-py36_0 ...
installation finished.
Fetching package metadata .................
Solving package specifications: .

Package plan for installation in environment /Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda:

The following packages will be UPDATED:

conda: 4.3.31-py36_0 --> 4.3.33-py36_0 conda-forge

conda-4.3.33-p 100% |#################################################################| Time: 0:00:00 1.13 MB/s

Package plan for installation in environment /Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/cwltool_deps/_conda/envs/__seqtk@1.2:

The following NEW packages will be INSTALLED:

seqtk: 1.2-1    bioconda
zlib:  1.2.11-0 conda-forge

[job seqtk_seq.cwl] completed success
{
    "output1": {
        "checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
        "basename": "out",
        "location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
        "path": "/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
        "class": "File",
        "size": 150
    }
}
Final process status is success

This demonstrates that cwltool installs the needed packages on the first run; if we rerun cwltool, it reuses that environment.

$ cwltool --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
/Users/john/workspace/planemo/.venv/bin/cwltool 1.0.20180508202931
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
No handlers could be found for logger "rdflib.term"
[job seqtk_seq.cwl] /private/tmp/docker_tmp4vvE_i$ seqtk
seq
-a
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpcvQ3Ph/stg2ef3a21c-9fb0-4099-88c2-36e24719901d/2.fastq > /private/tmp/docker_tmp4vvE_i/out
[job seqtk_seq.cwl] completed success
{
    "output1": {
        "checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
        "basename": "out",
        "location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
        "path": "/Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
        "class": "File",
        "size": 150
    }
}
Final process status is success
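The reuse behaviour across these two runs amounts to a get-or-create check on the environment directory. This Python sketch assumes the <prefix>/envs/__<package>@<version> layout visible in the first cwltool log (cwltool_deps/_conda/envs/__seqtk@1.2); ensure_env and fake_install are hypothetical stand-ins, not cwltool's real installer.

```python
import os
import tempfile
from typing import Callable, List


def ensure_env(prefix: str, name: str, version: str,
               install: Callable[[str], None]) -> str:
    """Return the environment path, installing it only if it is missing."""
    env_path = os.path.join(prefix, "envs", f"__{name}@{version}")
    if not os.path.isdir(env_path):
        install(env_path)  # first run: build the environment
    return env_path        # later runs: reuse whatever is already there


calls: List[str] = []


def fake_install(path: str) -> None:
    """Hypothetical installer that records the call and creates the directory."""
    calls.append(path)
    os.makedirs(path)


with tempfile.TemporaryDirectory() as prefix:
    first = ensure_env(prefix, "seqtk", "1.2", fake_install)
    second = ensure_env(prefix, "seqtk", "1.2", fake_install)

print(first == second, len(calls))  # True 1
```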

And the same thing is possible with Toil.

$ cwltoil --no-container --beta-conda-dependencies seqtk_seq.cwl seqtk_seq_job.yml
jlaptop17.local 2018-05-23 15:27:25,754 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
jlaptop17.local 2018-05-23 15:27:25,785 MainThread INFO toil.jobStores.abstractJobStore: The workflow ID is: '92328fb2-33b7-44cd-879f-41d8cbf94555'
Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
jlaptop17.local 2018-05-23 15:27:25,787 MainThread INFO cwltool: Resolved 'seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl'
jlaptop17.local 2018-05-23 15:27:27,002 MainThread WARNING rdflib.term: http://schema.org/docs/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
jlaptop17.local 2018-05-23 15:27:27,396 MainThread INFO rdflib.plugins.parsers.pyRdfa: Current options:
    preserve space : True
    output processor graph : True
    output default graph : True
    host language : RDFa Core
    accept embedded RDF : False
    check rdfa lite : False
    cache vocabulary graphs : False

jlaptop17.local 2018-05-23 15:27:29,797 MainThread INFO toil.common: Using the single machine batch system
jlaptop17.local 2018-05-23 15:27:29,798 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxCores to CPU count of system (8).
jlaptop17.local 2018-05-23 15:27:29,798 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (17179869184).
jlaptop17.local 2018-05-23 15:27:29,808 MainThread INFO toil.common: Created the workflow directory at /var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/toil-92328fb2-33b7-44cd-879f-41d8cbf94555-132281828025877
jlaptop17.local 2018-05-23 15:27:29,808 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxDisk to physically available disk (202669449216).
jlaptop17.local 2018-05-23 15:27:29,815 MainThread INFO toil.common: User script ModuleDescriptor(dirPath='/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages', name='toil.cwl.cwltoil', fromVirtualEnv=True) belongs to Toil. No need to auto-deploy it.
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: No user script to auto-deploy.
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: Written the environment for the jobs to the environment file
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: Caching all jobs in job store
jlaptop17.local 2018-05-23 15:27:29,816 MainThread INFO toil.common: 0 jobs downloaded.
jlaptop17.local 2018-05-23 15:27:29,911 MainThread INFO toil: Running Toil version 3.15.0-0e3a87e738f5e0e7cff64bfdad337d592bd92704.
jlaptop17.local 2018-05-23 15:27:29,911 MainThread INFO toil.realtimeLogger: Real-time logging disabled
jlaptop17.local 2018-05-23 15:27:29,937 MainThread INFO toil.toilState: (Re)building internal scheduler state
2018-05-23 15:27:29,937 - toil.toilState - INFO - (Re)building internal scheduler state
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Found 1 jobs to start and 0 jobs with successors to run
2018-05-23 15:27:29,938 - toil.leader - INFO - Found 1 jobs to start and 0 jobs with successors to run
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Checked batch system has no running jobs and no updated jobs
2018-05-23 15:27:29,938 - toil.leader - INFO - Checked batch system has no running jobs and no updated jobs
jlaptop17.local 2018-05-23 15:27:29,938 MainThread INFO toil.leader: Starting the main loop
2018-05-23 15:27:29,938 - toil.leader - INFO - Starting the main loop
jlaptop17.local 2018-05-23 15:27:29,939 MainThread INFO toil.leader: Issued job 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU with job batch system ID: 0 and cores: 1, disk: 3.0 G, and memory: 2.0 G
2018-05-23 15:27:29,939 - toil.leader - INFO - Issued job 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU with job batch system ID: 0 and cores: 1, disk: 3.0 G, and memory: 2.0 G
jlaptop17.local 2018-05-23 15:27:31,409 MainThread INFO toil.leader: Job ended successfully: 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU
2018-05-23 15:27:31,409 - toil.leader - INFO - Job ended successfully: 'file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/seqtk_seq.cwl' seqtk seq e/V/jobsxUpTU
jlaptop17.local 2018-05-23 15:27:31,411 MainThread INFO toil.leader: Finished the main loop: no jobs left to run
2018-05-23 15:27:31,411 - toil.leader - INFO - Finished the main loop: no jobs left to run
jlaptop17.local 2018-05-23 15:27:31,411 MainThread INFO toil.serviceManager: Waiting for service manager thread to finish ...
2018-05-23 15:27:31,411 - toil.serviceManager - INFO - Waiting for service manager thread to finish ...
jlaptop17.local 2018-05-23 15:27:31,946 MainThread INFO toil.serviceManager: ... finished shutting down the service manager. Took 0.535056114197 seconds
2018-05-23 15:27:31,946 - toil.serviceManager - INFO - ... finished shutting down the service manager. Took 0.535056114197 seconds
jlaptop17.local 2018-05-23 15:27:31,947 MainThread INFO toil.statsAndLogging: Waiting for stats and logging collator thread to finish ...
2018-05-23 15:27:31,947 - toil.statsAndLogging - INFO - Waiting for stats and logging collator thread to finish ...
jlaptop17.local 2018-05-23 15:27:31,960 MainThread INFO toil.statsAndLogging: ... finished collating stats and logs. Took 0.0131621360779 seconds
2018-05-23 15:27:31,960 - toil.statsAndLogging - INFO - ... finished collating stats and logs. Took 0.0131621360779 seconds
jlaptop17.local 2018-05-23 15:27:31,961 MainThread INFO toil.leader: Finished toil run successfully
2018-05-23 15:27:31,961 - toil.leader - INFO - Finished toil run successfully
{
    "output1": {
        "checksum": "sha1$322e001e5a99f19abdce9f02ad0f02a17b5066c2",
        "basename": "out",
        "nameext": "",
        "nameroot": "out",
        "http://commonwl.org/cwltool#generation": 0,
        "location": "file:///Users/john/workspace/planemo/project_templates/seqtk_complete_cwl/out",
        "class": "File",
        "size": 150
    }
jlaptop17.local 2018-05-23 15:27:31,972 MainThread INFO toil.common: Successfully deleted the job store: <toil.jobStores.fileJobStore.FileJobStore object at 0x10554d490>
}
2018-05-23 15:27:31,972 - toil.common - INFO - Successfully deleted the job store: <toil.jobStores.fileJobStore.FileJobStore object at 0x10554d490>

Finding Existing Conda Packages

How did we know what software name and version to use? We found the existing packages available for Conda and referenced them. To do this yourself, use the planemo command conda_search. Searching for seqt shows all the software and versions available matching that search term, including seqtk.

$ planemo conda_search seqt
/Users/john/miniconda3/bin/conda search --override-channels --channel iuc --channel conda-forge --channel bioconda --channel defaults '*seqt*'
Loading channels: done

Name                       Version   Build      Channel
bioconductor-htseqtools    1.26.0    r3.4.1_0   bioconda
bioconductor-seqtools      1.10.0    r3.3.2_0   bioconda
bioconductor-seqtools      1.10.0    r3.4.1_0   bioconda
bioconductor-seqtools      1.12.0    r3.4.1_0   bioconda
seqtk                      r75       0          bioconda
seqtk                      r82       0          bioconda
seqtk                      r82       1          bioconda
seqtk                      r93       0          bioconda
seqtk                      1.2       0          bioconda
seqtk                      1.2       1          bioconda
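The search above is a substring match on package names. The following Python sketch reproduces it over the rows of that listing, hard-coded here rather than queried from conda; search and versions are hypothetical helper names. Versions keep their listing order because strings such as r93 and 1.2 do not sort meaningfully as plain text.

```python
from typing import List, Tuple

# Rows (name, version, build, channel) copied from the conda_search listing.
ROWS: List[Tuple[str, str, str, str]] = [
    ("bioconductor-htseqtools", "1.26.0", "r3.4.1_0", "bioconda"),
    ("bioconductor-seqtools", "1.10.0", "r3.3.2_0", "bioconda"),
    ("bioconductor-seqtools", "1.10.0", "r3.4.1_0", "bioconda"),
    ("bioconductor-seqtools", "1.12.0", "r3.4.1_0", "bioconda"),
    ("seqtk", "r75", "0", "bioconda"),
    ("seqtk", "r82", "0", "bioconda"),
    ("seqtk", "r82", "1", "bioconda"),
    ("seqtk", "r93", "0", "bioconda"),
    ("seqtk", "1.2", "0", "bioconda"),
    ("seqtk", "1.2", "1", "bioconda"),
]


def search(term: str) -> List[str]:
    """Distinct package names containing `term`, sorted alphabetically."""
    return sorted({name for name, *_ in ROWS if term in name})


def versions(name: str) -> List[str]:
    """Distinct versions of one exact package name, in listing order."""
    seen: List[str] = []
    for row_name, version, _, _ in ROWS:
        if row_name == name and version not in seen:
            seen.append(version)
    return seen


print(search("seqt"))     # ['bioconductor-htseqtools', 'bioconductor-seqtools', 'seqtk']
print(versions("seqtk"))  # ['r75', 'r82', 'r93', '1.2']
```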

Note

The Planemo command conda_search is a light wrapper around the underlying conda search command, configured to use the same channels and other options as Planemo and Galaxy. The following Conda command would also work to search:

$ $HOME/miniconda3/bin/conda search -c iuc -c conda-forge -c bioconda '*seqt*'

For Conda versions 4.3.X or less, the search invocation is slightly different:

$ $HOME/miniconda3/bin/conda search -c iuc -c conda-forge -c bioconda seqt

Alternatively, the Anaconda website can be used to search for packages. Typing seqtk into the search form on that page and clicking the top result brings you to a page with information about the Bioconda package.

When searching on the website, though, you need to be aware of which channel a package comes from. By default, Planemo and Galaxy search a few different Conda channels. While it is possible to configure a local Planemo or Galaxy to target different channels, the current best practice is to add tools to the existing channels.

The existing channels include:

Exercise - Leveraging Bioconda

Use the project_init command to download this exercise.

$ planemo project_init --template conda_exercises_cwl conda_exercises
$ cd conda_exercises/exercise_1
$ ls
pear.cwl  test-data

This project template contains a few exercises. The first uses a CWL tool for PEAR (Paired-End reAd mergeR). This tool, however, has no SoftwareRequirement or container annotations, so it will not work properly without modification.

  1. Run planemo test pear.cwl to verify the tool does not function without dependencies defined.
  2. Use the --conda_requirements flag with planemo lint to verify it does indeed lack requirements.
  3. Use planemo conda_search or the Anaconda website to search for the correct package and version in a best practice channel.
  4. Update pear.cwl with the correct SoftwareRequirement hints.
  5. Re-run the lint command from above to verify the tool now has the correct dependency definition.
  6. Re-run the test command from above to verify the tool test now works properly.

Building New Conda Packages

Frequently, packages your tool requires are not yet available in Bioconda or conda-forge. In these cases, it is usually best to contribute your package to one of these projects. Unless the tool is exceedingly general, Bioconda is usually the correct starting point.

Note

Many things that are not strictly or even remotely “bio” have been accepted into Bioconda - including tools for image analysis, natural language processing, and cheminformatics.

To quickly learn how to write Conda recipes for typical Galaxy tools, please read the following pieces of external documentation.

These guidelines in particular can be skimmed according to your recipe type; for instance, that document provides specific advice for:

To go a little deeper, you may want to read:

And finally, to debug problems, the Bioconda troubleshooting documentation may prove useful.

Exercise - Build a Recipe

If you have just completed the exercise above, this exercise can be found in a sibling folder; get there with cd ../exercise_2. If not, the exercise can be downloaded with:

$ planemo project_init --template conda_exercises_cwl conda_exercises
$ cd conda_exercises/exercise_2
$ ls
fleeqtk_seq.cwl  fleeqtk_seq_tests.yml  test-data

This is the skeleton of a tool wrapping the parody bioinformatics software package fleeqtk. fleeqtk is a fork of seqtk, the project many Planemo tutorials are built around, so the example tool should be fairly familiar. fleeqtk version 1.3 can be downloaded from here and built using make. The result of make includes a single executable, fleeqtk.

  1. Clone and branch Bioconda.
  2. Build a recipe for fleeqtk version 1.3. You may wish to start from scratch (conda skeleton is not available for C programs like fleeqtk), or copy the recipe of seqtk and modify it for fleeqtk.
  3. Use conda build or Bioconda tooling to build the recipe.
  4. Run planemo test --conda_use_local fleeqtk_seq.cwl to verify the resulting package works as expected.

Congratulations on writing a Conda recipe and building a package! Upon successfully building and testing such a Bioconda package, you would normally push your branch to GitHub and open a pull request. This step is skipped here so as not to pollute Bioconda with unneeded software packages.

Dependencies and Containers

Note

This section is a continuation of Dependencies and Conda; please review that section for background information on resolving Software Requirements with Conda.

Common Workflow Language tools can be annotated with arbitrary Docker requirements; see the CWL User Guide for a discussion of how to do this in general.

This document discusses some techniques for finding containers automatically from SoftwareRequirement annotations when using Planemo, cwltool, or Toil. You will ultimately want to annotate your tools explicitly with the containers described here so that other CWL implementations can find them too; either way, there are real advantages to using these containers instead of ad hoc images you might build with a Dockerfile.

Read more about this reproducibility stack in our preprint Practical computational reproducibility in the life sciences.

BioContainers


Finding BioContainers

If a tool contains Software Requirements in best practice Conda channels, a BioContainers-style container can be found or built for it.

As a reminder, planemo lint --conda_requirements <tool.cwl> can be used to check whether a tool contains only best-practice requirement tags. The lint command can also be given the --biocontainers flag to check whether a BioContainers container compatible with that tool has been registered.

The biocontainer_registered linter in the output below indicates that a container compatible with this tool has indeed been registered: quay.io/biocontainers/seqtk:1.2--1. We didn’t do any extra work to build this container for this tool; all Bioconda recipes are packaged into containers and registered on quay.io as part of the BioContainers project.
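The image name quay.io/biocontainers/seqtk:1.2--1 follows a visible pattern: the registry namespace, the package name, then a tag joining the Conda version and build number with "--", so seqtk 1.2 build 1 (compare the conda_search listing) becomes seqtk:1.2--1. The sketch below captures only that single-package case, using a hypothetical helper name; tools with several requirements get a combined "mulled" container with a different, hash-based name, which it does not attempt.

```python
def biocontainer_image(name: str, version: str, build: str,
                       namespace: str = "quay.io/biocontainers") -> str:
    """Image URI for a single Bioconda package: <namespace>/<name>:<version>--<build>."""
    return f"{namespace}/{name}:{version}--{build}"


print(biocontainer_image("seqtk", "1.2", "1"))  # quay.io/biocontainers/seqtk:1.2--1
```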

This tool can be tested in its BioContainer Docker container by passing the --biocontainers flag to planemo test, as shown below.

The Conda exercises project template has an example tool (exercise_3) that we can use to demonstrate --biocontainers. If you are continuing from the Conda tutorial, simply move to ../exercise_3; otherwise, use planemo project_init to grab the exercise as shown below.

$ planemo project_init --template conda_exercises_cwl conda_exercises
$ cd conda_exercises/exercise_3
$ planemo lint --biocontainers seqtk_seq.cwl
Linting tool /home/planemo/conda_exercises_cwl/exercise_3/seqtk_seq.cwl
Applying linter general... CHECK
.. CHECK: Tool defines a version [0.0.1].
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
.. CHECK: Tool defines an id [seqtk_seq].
.. CHECK: Tool specifies profile version [16.04].
Applying linter cwl_validation... CHECK
.. INFO: CWL appears to be valid.
Applying linter docker_image... WARNING
.. WARNING: Tool does not specify a DockerPull source.
Applying linter new_draft... CHECK
.. INFO: Modern CWL version [v1.0]
Applying linter biocontainer_registered... CHECK
.. INFO: BioContainer best-practice container found [quay.io/biocontainers/seqtk:1.2--1].
Failed linting

$ planemo test --biocontainers seqtk_seq.cwl
Enable beta testing mode for testing.
cwltool INFO: /Users/john/workspace/planemo/.venv/bin/planemo 1.0.20180508202931
cwltool INFO: Resolved '/Users/john/workspace/planemo/project_templates/conda_exercises_cwl/exercise_3/seqtk_seq.cwl' to 'file:///Users/john/workspace/planemo/project_templates/conda_exercises_cwl/exercise_3/seqtk_seq.cwl'
galaxy.tools.deps.containers INFO: Checking with container resolver [ExplicitContainerResolver[]] found description [None]
galaxy.tools.deps.containers INFO: Checking with container resolver [CachedMulledDockerContainerResolver[namespace=biocontainers]] found description [None]
galaxy.tools.deps.containers INFO: Checking with container resolver [MulledDockerContainerResolver[namespace=biocontainers]] found description [ContainerDescription[identifier=quay.io/biocontainers/seqtk:1.2--1,type=docker]]
cwltool INFO: [job seqtk_seq.cwl] /private/tmp/docker_tmpMEipaU$ docker
run
-i
--volume=/private/tmp/docker_tmpMEipaU:/private/tmp/docker_tmpMEipaU:rw
--volume=/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpxkm9dp:/tmp:rw
--volume=/Users/john/workspace/planemo/project_templates/conda_exercises_cwl/exercise_3/test-data/2.fastq:/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpjAVM_1/stgddf6fc2a-dd13-4322-9b88-68571a1697dd/2.fastq:ro
--workdir=/private/tmp/docker_tmpMEipaU
--read-only=true
--log-driver=none
--user=502:20
--rm
--env=TMPDIR=/tmp
--env=HOME=/private/tmp/docker_tmpMEipaU
quay.io/biocontainers/seqtk:1.2--1
seqtk
seq
-a
/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpjAVM_1/stgddf6fc2a-dd13-4322-9b88-68571a1697dd/2.fastq > /private/tmp/docker_tmpMEipaU/out
cwltool INFO: [job seqtk_seq.cwl] completed success
cwltool INFO: Final process status is success
All 1 test(s) executed passed.
seqtk_seq_0: passed

Exercise - Leveraging Bioconda

  1. Try the above command without the --biocontainers argument. Verify the tool does not run in a container by default.
  2. Add a DockerRequirement based on the lint output above to annotate this tool with a BioContainers Docker container, and rerun planemo test to verify that the tool now works.

Building BioContainers

In the seqtk example above, the relevant BioContainer already existed on quay.io, but this won’t always be the case. For tools that contain multiple Software Requirements, a suitable container likely won’t exist yet. The mulled toolkit (distributed with Planemo or available standalone) can be used to build containers for such tools. If cwltool or Toil is configured to use BioContainers, it will by default attempt to build these containers on the fly (though this behavior can be disabled).

You can try this directly using the mull command in Planemo. The conda_testing_cwl Planemo project template has a toy example tool with two requirements for demonstrating this: bwa_and_samtools.cwl.

$ planemo project_init --template=conda_testing_cwl conda_testing
$ cd conda_testing/
$ planemo mull bwa_and_samtools.cwl
/Users/john/.planemo/involucro -v=3 -f /Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua -set CHANNELS='iuc,bioconda,r,defaults,conda-forge' -set TEST='true' -set TARGETS='samtools=1.3.1,bwa=0.7.15' -set REPO='quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820' -set BINDS='build/dist:/usr/local/' -set PREINSTALL='conda install --quiet --yes conda=4.3' build
/Users/john/.planemo/involucro -v=3 -f /Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua -set CHANNELS='iuc,bioconda,r,defaults,conda-forge' -set TEST='true' -set TARGETS='samtools=1.3.1,bwa=0.7.15' -set REPO='quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820' -set BINDS='build/dist:/usr/local/' -set PREINSTALL='conda install --quiet --yes conda=4.3' build
[Jun 19 11:28:35] DEBU Run file [/Users/john/workspace/planemo/.venv/lib/python2.7/site-packages/galaxy_lib-17.9.0-py2.7.egg/galaxy/tools/deps/mulled/invfile.lua]
[Jun 19 11:28:35] STEP Run image [continuumio/miniconda:latest] with command [[rm -rf /data/dist]]
[Jun 19 11:28:35] DEBU Creating container [step-730a02d79e]
[Jun 19 11:28:35] DEBU Created container [5e4b5f83c455 step-730a02d79e], starting it
[Jun 19 11:28:35] DEBU Container [5e4b5f83c455 step-730a02d79e] started, waiting for completion
[Jun 19 11:28:36] DEBU Container [5e4b5f83c455 step-730a02d79e] completed with exit code [0] as expected
[Jun 19 11:28:36] DEBU Container [5e4b5f83c455 step-730a02d79e] removed
[Jun 19 11:28:36] STEP Run image [continuumio/miniconda:latest] with command [[/bin/sh -c conda install --quiet --yes conda=4.3 && conda install -c iuc -c bioconda -c r -c defaults -c conda-forge samtools=1.3.1 bwa=0.7.15 -p /usr/local --copy --yes --quiet]]
[Jun 19 11:28:36] DEBU Creating container [step-e95bf001c8]
[Jun 19 11:28:36] DEBU Created container [72b9ca0e56f8 step-e95bf001c8], starting it
[Jun 19 11:28:37] DEBU Container [72b9ca0e56f8 step-e95bf001c8] started, waiting for completion
[Jun 19 11:28:46] SOUT Fetching package metadata .........
[Jun 19 11:28:47] SOUT Solving package specifications: .
[Jun 19 11:28:50] SOUT
[Jun 19 11:28:50] SOUT Package plan for installation in environment /opt/conda:
[Jun 19 11:28:50] SOUT
[Jun 19 11:28:50] SOUT The following packages will be UPDATED:
[Jun 19 11:28:50] SOUT
[Jun 19 11:28:50] SOUT conda: 4.3.11-py27_0 --> 4.3.22-py27_0
[Jun 19 11:28:50] SOUT
[Jun 19 11:29:04] SOUT Fetching package metadata .................
[Jun 19 11:29:06] SOUT Solving package specifications: .
[Jun 19 11:29:56] SOUT
[Jun 19 11:29:56] SOUT Package plan for installation in environment /usr/local:
[Jun 19 11:29:56] SOUT
[Jun 19 11:29:56] SOUT The following NEW packages will be INSTALLED:
[Jun 19 11:29:56] SOUT
[Jun 19 11:29:56] SOUT bwa: 0.7.15-1 bioconda
[Jun 19 11:29:56] SOUT curl: 7.52.1-0
[Jun 19 11:29:56] SOUT libgcc: 5.2.0-0
[Jun 19 11:29:56] SOUT openssl: 1.0.2l-0
[Jun 19 11:29:56] SOUT pip: 9.0.1-py27_1
[Jun 19 11:29:56] SOUT python: 2.7.13-0
[Jun 19 11:29:56] SOUT readline: 6.2-2
[Jun 19 11:29:56] SOUT samtools: 1.3.1-5 bioconda
[Jun 19 11:29:56] SOUT setuptools: 27.2.0-py27_0
[Jun 19 11:29:56] SOUT sqlite: 3.13.0-0
[Jun 19 11:29:56] SOUT tk: 8.5.18-0
[Jun 19 11:29:56] SOUT wheel: 0.29.0-py27_0
[Jun 19 11:29:56] SOUT zlib: 1.2.8-3
[Jun 19 11:29:56] SOUT
[Jun 19 11:29:57] DEBU Container [72b9ca0e56f8 step-e95bf001c8] completed with exit code [0] as expected
[Jun 19 11:29:57] DEBU Container [72b9ca0e56f8 step-e95bf001c8] removed
[Jun 19 11:29:57] STEP Wrap [build/dist] as [quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0]
[Jun 19 11:29:57] DEBU Creating container [step-6f1c176372]
[Jun 19 11:29:58] DEBU Packing succeeded

As the output indicates, this command built the container named quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0. This is the same namespace and URL that the BioContainers project would use if and when it publishes this combination of packages.

Note

The first part of this mulled-v2 name is a hash of the package names that went into it; the second part (the tag) is a hash of the package versions, followed by the image build number (-0 here). Check out the Multi-package Containers web application to explore best-practice channels and build such hashes.
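For multi-package combinations, the name can be sketched in Python as follows (single-package tools keep the plain <package>:<version>--<build> name). This assumes the mulled toolkit takes the SHA-1 of the alphabetically sorted package names joined by newlines, then the SHA-1 of their versions in the same order, and appends -<build>. The function mulled_v2_name is hypothetical and has not been checked to reproduce the exact name above; the Multi-package Containers web application remains the authoritative way to compute it.

```python
import hashlib
from typing import Mapping


def mulled_v2_name(targets: Mapping[str, str], build: int = 0) -> str:
    """Sketch of a mulled-v2 image name for several pinned Conda packages.

    Assumption (not verified against the mulled toolkit): the repository
    part hashes the sorted package names and the tag hashes their versions.
    """
    names = sorted(targets)  # sort so the order of requirements does not matter
    # SHA-1 of the newline-joined package names, e.g. "bwa\nsamtools".
    package_hash = hashlib.sha1("\n".join(names).encode("utf-8")).hexdigest()
    # SHA-1 of the newline-joined versions, listed in the same package order.
    versions = "\n".join(targets[name] for name in names)
    version_hash = hashlib.sha1(versions.encode("utf-8")).hexdigest()
    # The image build number is appended to the version hash as "-<build>".
    return f"mulled-v2-{package_hash}:{version_hash}-{build}"


# The two requirements of bwa_and_samtools.cwl (TARGETS in the mull output).
print(mulled_v2_name({"samtools": "1.3.1", "bwa": "0.7.15"}))
```

If the assumption holds, this should print the image name built by planemo mull above. Because package names are sorted before hashing, changing a version alters only the tag, while adding or removing a package alters the repository part.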

We can see this new container by running docker images, and explore it interactively with docker run.

$ docker images
REPOSITORY                                                                 TAG                                          IMAGE ID       CREATED        SIZE
quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40   03dc1d2818d9de56938078b8b78b82d967c1f820-0   a740fe1e6a9e   16 hours ago   104 MB
quay.io/biocontainers/seqtk                                                1.2--0                                       10bc359ebd30   2 days ago     7.34 MB
continuumio/miniconda                                                      latest                                       6965a4889098   3 weeks ago    437 MB
bgruening/busybox-bash                                                     0.1                                          3d974f51245c   9 months ago   6.73 MB
$ docker run -i -t quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0 /bin/bash
bash-4.2# which samtools
/usr/local/bin/samtools
bash-4.2# which bwa
/usr/local/bin/bwa

As before, we can test running the tool inside its container in cwltool using the --biocontainers flag.

$ planemo test --biocontainers bwa_and_samtools.cwl
Enable beta testing mode for testing.
cwltool INFO: /Users/john/workspace/planemo/.venv/bin/planemo 1.0.20180508202931
cwltool INFO: Resolved '/Users/john/workspace/planemo/project_templates/conda_testing_cwl/bwa_and_samtools.cwl' to 'file:///Users/john/workspace/planemo/project_templates/conda_testing_cwl/bwa_and_samtools.cwl'
galaxy.tools.deps.containers INFO: Checking with container resolver [ExplicitContainerResolver[]] found description [None]
galaxy.tools.deps.containers INFO: Checking with container resolver [CachedMulledDockerContainerResolver[namespace=biocontainers]] found description [ContainerDescription[identifier=quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0,type=docker]]
cwltool INFO: [job bwa_and_samtools.cwl] /private/tmp/docker_tmpYJnmO4$ docker
run
-i
--volume=/private/tmp/docker_tmpYJnmO4:/private/tmp/docker_tmpYJnmO4:rw
--volume=/private/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/tmpVI06me:/tmp:rw
--workdir=/private/tmp/docker_tmpYJnmO4
--read-only=true
--user=502:20
--rm
--env=TMPDIR=/tmp
--env=HOME=/private/tmp/docker_tmpYJnmO4
quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0
sh
-c
'bwa > bwa_help.txt 2>&1; samtools > samtools_help.txt 2>&1'
cwltool INFO: [job bwa_and_samtools.cwl] completed success
cwltool INFO: Final process status is success
All 1 test(s) executed passed.
bwa_and_samtools_0: passed

In particular, take note of this line:

2017-03-01 10:20:59,142 INFO [galaxy.tools.deps.containers] Checking with container resolver [CachedMulledDockerContainerResolver[namespace=biocontainers]] found description [ContainerDescription[identifier=quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0,type=docker]]

Here we can see that the container ID (quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:03dc1d2818d9de56938078b8b78b82d967c1f820-0) built earlier has been cached on our Docker host and is picked up by cwltool. It is used to run the simple tool tests, and indeed they pass.

In our initial seqtk example, the container resolver that matched was of type MulledDockerContainerResolver, indicating that the Docker image would be downloaded from the BioContainers repository. This time, the resolver that matched was of type CachedMulledDockerContainerResolver, meaning that cwltool simply uses the locally cached image on the Docker host (i.e. the one we built with planemo mull above).
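The log lines show the resolvers being consulted in order until one of them returns a container description. That fall-through can be modelled as an ordered list where the first resolver to return an identifier wins. This is a simplified sketch, not the galaxy.tools.deps.containers API: resolve_container, index_resolver and the dictionary-backed stand-ins are hypothetical, whereas the real resolvers derive image names from the requirements and query the Docker host or quay.io.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical, simplified model: a requirement is a (package, version) pair.
Requirement = Tuple[str, str]
# A resolver maps a tool's requirements to an image identifier, or None.
Resolver = Callable[[Sequence[Requirement]], Optional[str]]


def index_resolver(index: Dict[FrozenSet[Requirement], str]) -> Resolver:
    """Build a resolver that looks requirements up in a fixed index of images."""
    def resolve(requirements: Sequence[Requirement]) -> Optional[str]:
        return index.get(frozenset(requirements))
    return resolve


def resolve_container(
    requirements: Sequence[Requirement],
    resolvers: List[Tuple[str, Resolver]],
) -> Optional[Tuple[str, str]]:
    """Try resolvers in order; return (resolver name, image) for the first hit."""
    for name, resolver in resolvers:
        identifier = resolver(requirements)
        if identifier is not None:
            return name, identifier
    return None  # no resolver matched


SEQTK = [("seqtk", "1.2")]
BWA_SAMTOOLS = [("samtools", "1.3.1"), ("bwa", "0.7.15")]
MULLED = (
    "quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40"
    ":03dc1d2818d9de56938078b8b78b82d967c1f820-0"
)

RESOLVERS: List[Tuple[str, Resolver]] = [
    # Stand-in for ExplicitContainerResolver: neither tool declares an image.
    ("explicit", lambda requirements: None),
    # Stand-in for CachedMulledDockerContainerResolver: images on the Docker host.
    ("cached", index_resolver({frozenset(BWA_SAMTOOLS): MULLED})),
    # Stand-in for MulledDockerContainerResolver: images published on quay.io.
    ("mulled", index_resolver({frozenset(SEQTK): "quay.io/biocontainers/seqtk:1.2--1"})),
]

print(resolve_container(SEQTK, RESOLVERS))
# ('mulled', 'quay.io/biocontainers/seqtk:1.2--1')
print(resolve_container(BWA_SAMTOOLS, RESOLVERS)[0])
# cached
```

Placing the cached resolver ahead of the remote one is what lets an image built locally with planemo mull be used without any download.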

Note

Planemo doesn’t yet expose options that make it possible to build mulled containers for local packages that have yet to be published to anaconda.org, but the mulled toolkit allows this. See the mulled documentation for more information. However, once a container for a local package has been built with mulled-build-tool, the --biocontainers flag should work to test it.

Publishing BioContainers

Building unpublished BioContainers on the fly is great for testing, but for production use and to increase reproducibility, such containers should ideally be published as well.

BioContainers maintains a registry of package combinations to be published under these long mulled hashes. This registry is a GitHub repository named multi-package-containers. The Planemo command container_register will inspect a tool and open a GitHub pull request adding the tool’s combination of packages to the registry. Once merged, this pull request results in the corresponding BioContainers image being published (with the correct mulled hash as its name); such images can subsequently be picked up by Galaxy.

Various GitHub-related settings need to be configured for Planemo to be able to open pull requests on your behalf as part of the container_register command. To simplify all of this, the Planemo community maintains a list of GitHub repositories containing Galaxy and/or CWL tools that are scanned daily by Travis. For each such repository, the Travis job runs container_register across all of the repository’s tools, resulting in new registry pull requests for all new package combinations. This list is maintained in a script named monitor.sh in the planemo-monitor repository. The easiest way to ensure new containers are built for your tools is simply to open a pull request adding your tool repositories to this list.