Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences - PubMed (original) (raw)

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

Jeremy Goecks et al. Genome Biol. 2010.

Abstract

Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy http://usegalaxy.org, an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

PubMed Disclaimer

Figures

Figure 1

Figure 1

Galaxy analysis workspace. The Galaxy analysis workspace is where users perform genomic analyses. The workspace has four areas: the navigation bar, tool panel (left column), detail panel (middle column), and history panel (right column). The navigation bar provides links to Galaxy's major components, including the analysis workspace, workflows, data libraries, and user repositories (histories, workflows, Pages). The tool panel lists the analysis tools and data sources available to the user. The detail panel displays interfaces for tools selected by the user. The history panel shows data and the results of analyses performed by the user, as well as automatically tracked metadata and user-generated annotations. Every action by the user generates a new history item, which can then be used in subsequent analyses, downloaded, or visualized. Galaxy's history panel helps to facilitate reproducibility by showing provenance of data and by enabling users to extract a workflow from a history, rerun analysis steps, visualize output datasets, tag datasets for searching and grouping, and annotate steps with information about their purpose or importance. Here, step 12 is being rerun.

Figure 2

Figure 2

Galaxy workflow editor. Galaxy's workflow editor provides a graphical user interface for creating and modifying workflows. The editor has four areas: navigation bar, tool bar (left column), editor panel (middle column), and details panel. A user adds tools from the tool panel to the editor panel and configures each step in the workflow using the details panel. The details panel also enables a user to add tags to a workflow and annotate a workflow and workflow steps. Workflows are run in Galaxy's analysis workspace; like all tools executed in Galaxy, Galaxy automatically generates history items and provenance information for each tool executed via a workflow.

Figure 3

Figure 3

Galaxy public repositories and published items. (a) Galaxy's public repository for Pages; there are also public repositories for histories and workflows. Repositories can be searched by name, annotation, owner, and community tags. (b) A published Galaxy workflow. Each shared or published item is displayed in a webpage with its metadata (for example, execution details, user annotations), a link for copying the item into a user's workspace, and links for viewing related items.

Figure 4

Figure 4

Galaxy Pages. Galaxy Page that is an online, interactive supplement for a metagenomic study performed in Galaxy [21]. The Page communicates all facets of the experiment via increasing levels of detail, starting with supplementary text, two embedded histories, and an embedded workflow. Readers can open the embedded items and view details for each step, including provenance information, parameter settings, and annotations. For history steps, readers can view corresponding datasets (red arrow). Readers can also copy histories (green arrow) or the workflow (blue arrow) into their analysis workspace and both reproduce and extend the experiment's analyses without leaving Galaxy or their web browser.

Similar articles

Cited by

References

    1. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, Thiessen N, Griffith OL, He A, Marra M, Snyder M, Jones S. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007;4:651–657. doi: 10.1038/nmeth1068. - DOI - PubMed
    1. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226. - DOI - PubMed
    1. Pepke S, Wold B, Mortazavi A. Computation for ChIP-seq and RNA-seq studies. Nat Methods. 2009;6:S22–S32. doi: 10.1038/nmeth.1371. - DOI - PMC - PubMed
    1. Statistics Using R with Biological Examples. http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf
    1. Introduction to Sequence Analysis using EMBOSS. http://emboss.sourceforge.net/docs/emboss_tutorial/emboss_tutorial.html

Publication types

MeSH terms

Grants and funding

LinkOut - more resources