Creating, Editing and Importing Galaxy Workflows

Workflows are a powerful feature in Galaxy that allow you to link multiple steps of complex analysis. In this tutorial we will demonstrate how to use the Workflow Editor to construct multiple variants of a simple workflow. Note that these workflows are meant to illustrate different concepts. Not all workflows require using all of the features described below, but we hope this tutorial will inspire you to make your analysis tasks more efficient.

Read about extracting workflows from histories in this tutorial.

Agenda

In this tutorial, we will cover:

  1. Workflow steps
  2. Creating a new workflow
  3. Editing our simple workflow
  4. Embedding a workflow within a workflow
  5. Conclusion

Workflow steps

Workflows logically connect a collection of steps. Possible step types are currently workflow inputs, tools, and workflows.
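
One way to picture "steps logically connected" is as a small directed graph. The sketch below is only an illustration in plain Python and is not Galaxy's internal representation: the dictionary layout, the `STEP_TYPES` names and the `run_order` helper are all hypothetical, and the step labels are borrowed from later parts of this tutorial.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical names for the three step types mentioned above.
STEP_TYPES = {"input", "tool", "workflow"}

# Each step has a type and a list of upstream step labels it is connected to.
steps = {
    "A simple text input dataset": {"type": "input", "inputs": []},
    "Reverse dataset": {"type": "tool", "inputs": ["A simple text input dataset"]},
    "Select first lines": {"type": "tool", "inputs": ["Reverse dataset"]},
}


def run_order(steps):
    """Validate the steps and return labels in an order that respects connections."""
    for label, step in steps.items():
        if step["type"] not in STEP_TYPES:
            raise ValueError(f"unknown step type for {label!r}: {step['type']}")
        for upstream in step["inputs"]:
            if upstream not in steps:
                raise ValueError(f"{label!r} is connected to missing step {upstream!r}")
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {label: step["inputs"] for label, step in steps.items()}
    return list(TopologicalSorter(graph).static_order())


order = run_order(steps)
print(order)
```

A step can only run once everything it is connected to has produced output, which is why the input comes first in the computed order.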

Creating a new workflow

Hands-on: Create a new workflow

Figure 1: A new empty workflow

On the left-hand side of the Editor you see the available tools in the tool panel. The center panel (or “canvas”) holds the workflow layout, and steps will appear there as you add them. On the right you see the attributes of the workflow, such as name, version, annotation and tags. The contents of the right panel change depending on the context, but you can always return to these attributes by clicking on the Edit Attributes button (the Pencil icon on the upper right). If there is no Pencil icon, you can find the Edit Attributes button under the Workflow options button (a wheel icon) on the top right of the editor.

We will start by creating a very simple workflow with just 2 tools, and then add more advanced features.

Hands-on: Insert a dataset input

  1. Expand the “Inputs” section in the tool panel and click on “Input dataset” to create a new dataset input
  2. Click on the new input dataset in the center panel. Set the following parameter on the right side:

We’re now ready to add a first tool and connect it to our input dataset.

Hands-on: Add tac reverse a file (reverse cat) to your workflow

  1. Find the tac reverse a file (reverse cat) tool in the tool panel and click on it
  2. A new box labeled tac tool will appear in the center panel
  3. Click on tac in the center panel and see the tool parameters on the right side
  4. We will keep the default tool settings and only give the step a label
  5. Click on the round blue symbol of the input dataset and drag the connection to the highlighted round green tool input

Figure 2: Connecting outputs and inputs

This is great, but while a single tool in a workflow might be handy (for instance if there are many parameters to be set), let’s add another tool that works on the output of the tac reverse a file (reverse cat) tool for an authentic workflow experience. From now on we’ll condense steps 1 to 4 and just mention the tool and parameters to insert, since the procedure is always the same.

Hands-on: Add Select first lines from a dataset to your workflow

  1. Add the Select first lines from a dataset tool
  2. Connect the output of the Reverse dataset step to the input
  3. Save your workflow using the save button on the top right

We now have a very simple workflow that will reverse the contents of a file and then output the first line of the resulting dataset. Now we’re ready to upload a test dataset and run our workflow.
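
To make the data flow concrete, here is a minimal plain-Python sketch of what the two steps do to a text file. It is not how Galaxy runs the tools: `tac`, `select_first`, `simple_workflow` and the sample text are hypothetical stand-ins chosen for illustration.

```python
def tac(lines):
    """Return the lines in reverse order (last line first)."""
    return list(reversed(lines))


def select_first(lines, count=1):
    """Keep only the first `count` lines."""
    return lines[:count]


def simple_workflow(text):
    """Chain the two steps: reverse the file, then keep the first line."""
    lines = text.splitlines()
    reversed_lines = tac(lines)
    return "\n".join(select_first(reversed_lines, count=1))


# Hypothetical sample input, not the dataset used in the tutorial.
sample = "first\nsecond\nthird"
print(simple_workflow(sample))  # prints "third"
```

Because the file is reversed before the first line is selected, the workflow effectively returns the last line of its input.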

Hands-on: Running the workflow

  1. Return to the analysis page by clicking the Home button (or Analyze Data on older versions of Galaxy) on the top
  2. Upload a dataset using “Paste/Fetch data” with the contents
  3. Run your workflow
    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

The outputs of the workflow will now appear in your history. In addition to our input file we will see 2 new datasets: 2: tac on data 1, which contains the reversed dataset, and 3: Select first on data 2, which contains just the line F.

This is fine, but if we want to process many datasets at once the naming of input datasets in the history will be difficult to follow. Luckily we can use dataset collections as inputs, which will maintain element identifiers across all steps of an analysis. We can also add colorful tags that can help us identify groups of datasets and we can label and rename outputs.
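
The idea of element identifiers surviving each step can be sketched in plain Python by treating a collection as a mapping from identifier to content. This is an analogy only, not Galaxy's implementation; `map_over_collection`, `reverse_then_first` and the sample identifiers are hypothetical.

```python
def reverse_then_first(text):
    """Reverse the lines of a text and return the first line of the result."""
    lines = text.splitlines()
    return list(reversed(lines))[0] if lines else ""


def map_over_collection(collection, step):
    """Apply `step` to every element, keeping each element identifier unchanged."""
    return {identifier: step(content) for identifier, content in collection.items()}


# Hypothetical collection: element identifier -> file contents.
collection = {
    "sample_1": "A\nB\nC",
    "sample_2": "x\ny",
}

result = map_over_collection(collection, reverse_then_first)
print(result)  # prints {'sample_1': 'C', 'sample_2': 'y'}
```

Each output is still keyed by the identifier of the element it came from, so results remain traceable no matter how many datasets are processed at once.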

Editing our simple workflow

We will now add tags to step outputs and label one of the 2 output datasets.

Hands-on: Editing our simple workflow

  1. Open our simple workflow in the Workflow Editor
  2. Remove the input dataset called A simple text input dataset using the white cross icon
  3. Add an input dataset collection and label it
  4. Disconnect the existing connections and reconnect
  5. Select the Reverse dataset step and under Configure Output: outfile set
  6. Select the Select first lines step and under Configure Output: outfile set
  7. Save your workflow using the save button on the top right

Hands-on: Running the workflow

  1. Return to the analysis page by clicking the Home button (or Analyze Data on older versions of Galaxy) on the top
  2. Create a dataset collection from the first 2 files in your history
    • Click on Select Items at the top of the history panel
    • Check all the datasets in your history you would like to include
    • Click n of N selected and choose Build Dataset List
    • Enter a name for your collection
    • Click Create collection to build your collection
    • Click on the checkmark icon at the top of your history again
  3. Run your workflow using the newly created collection input
    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

You will now see only 1 new dataset collection, Renamed datasets, in your history. This is because we have labeled only the last step in the workflow. This collection has 2 name tags, reverse and first. The other output collection is hidden in the history but can be seen by clicking on hidden in your history.

We will now use this workflow and embed it in a new workflow.

Embedding a workflow within a workflow

Another step type is the subworkflow. We can use this to include a section of a workflow that is repeated within a workflow or a workflow that contains steps that are useful in more than one workflow, so that we don’t have to maintain and update closely related workflows.

Here we will include our workflow twice within a new workflow and then paste the contents of each workflow together.

Hands-on: Embedding a workflow

  1. Create a new, empty workflow
  2. Insert a dataset collection input
  3. On the left side scroll down until you see the Workflows section
  4. Insert the previously created workflow by clicking on the workflow name
  5. Label the new workflow step:
  6. Repeat steps 4 and 5, but change the Label
  7. Insert the Paste two files side by side tool
  8. Connect the 2 workflow outputs to the Paste two files side by side tool input
  9. Save your workflow using the save button on the top right
  10. Run your workflow
    • Click on Workflow on the top menu bar of Galaxy. You will see a list of all your workflows.
    • Click on the Run workflow button next to your workflow
    • Configure the workflow as needed
    • Click the Run Workflow button at the top-right of the screen
    • You may have to refresh your history to see the queued jobs

This is a very contrived example, but this technique can be used to separate reusable steps in real-world scenarios.
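
As a rough analogy, a subworkflow behaves like a function that a larger workflow calls more than once. The plain-Python sketch below assumes both copies of the subworkflow receive the same input and that the paste step joins matching lines with a tab; both are assumptions for illustration, and `subworkflow`, `paste` and `outer_workflow` are hypothetical names rather than Galaxy APIs.

```python
def subworkflow(text):
    """The earlier workflow as a reusable unit: reverse the lines, keep the first."""
    lines = text.splitlines()
    return list(reversed(lines))[0] if lines else ""


def paste(left, right, delimiter="\t"):
    """Join two texts line by line, side by side.

    zip() stops at the shorter input, which is enough here because each
    subworkflow output is a single line.
    """
    pairs = zip(left.splitlines(), right.splitlines())
    return "\n".join(delimiter.join(pair) for pair in pairs)


def outer_workflow(text):
    """Run the subworkflow twice and paste the two outputs together."""
    first_copy = subworkflow(text)
    second_copy = subworkflow(text)
    return paste(first_copy, second_copy)


print(outer_workflow("A\nB\nC"))  # prints "C" and "C" separated by a tab
```

Any change to `subworkflow` is picked up by every place that calls it, which mirrors the maintenance benefit of embedding a workflow instead of copying its steps.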

Conclusion

You now know the ins and outs of Workflows in Galaxy and should be able to make your analyses more efficient and less manual!