flowr: Easy, scalable big data pipelines using a Computing Cluster


docs.flowr.space Streamlining Workflows

This framework lets you design and implement complex pipelines and deploy them on your institution’s computing cluster. It was built with the needs of bioinformatics workflows in mind, but it extends easily to any field where a series of steps (shell commands) must be executed as a (work)flow to process big data.

Highlights

Disclaimer: Since we use the same source for HTML and PDF, some plots/tables may not render perfectly in the PDF.

Example

[Figure: ex_fq_bam]

A few lines, to get started

## Latest official release from CRAN (updated every other month)
## visit docs.flowr.space/install for more details
Rscript -e 'install.packages("flowr", repos = c(CRAN="http://cran.rstudio.com"))'

## Latest stable release from DRAT (updated every other week); CRAN for dependencies
Rscript -e 'install.packages("flowr", repos = c(CRAN="http://cran.rstudio.com", DRAT="http://sahilseth.github.io/drat"))'

Rscript -e 'library(flowr);setup()'

# Run an example pipeline

# style 1: sleep_pipe() function creates system cmds
flowr run x=sleep_pipe platform=local execute=TRUE

# style 2: we start with a tsv of system cmds
# get example files
wget --no-check-certificate http://raw.githubusercontent.com/sahilseth/flowr/master/inst/pipelines/sleep_pipe.tsv
wget --no-check-certificate http://raw.githubusercontent.com/sahilseth/flowr/master/inst/pipelines/sleep_pipe.def

# submit to local machine
flowr to_flow x=sleep_pipe.tsv def=sleep_pipe.def platform=local execute=TRUE
# submit to an LSF cluster
flowr to_flow x=sleep_pipe.tsv def=sleep_pipe.def platform=lsf execute=TRUE
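
Style 2 starts from a tab-separated table of shell commands. The sketch below builds a tiny table of that kind by hand, to show the general shape of such a file. It assumes the table has three columns named samplename, jobname and cmd; the file name my_pipe.tsv and the three commands are hypothetical and are not taken from the bundled sleep_pipe.tsv.

```shell
# Hypothetical file name; the three column names are an assumption
# about the layout of the command table.
flowmat=my_pipe.tsv

# One row per shell command: sample name, job (step) name, the command.
printf 'samplename\tjobname\tcmd\n'        >  "$flowmat"
printf 'sample1\tsleep\tsleep 5\n'         >> "$flowmat"
printf 'sample1\tsleep\tsleep 10\n'        >> "$flowmat"
printf 'sample1\tmerge\techo "all done"\n' >> "$flowmat"

# Sanity check before submitting: count the commands in each step.
awk -F'\t' 'NR > 1 { n[$2]++ } END { for (j in n) print j, n[j] }' "$flowmat" | sort
```

A table like this would be passed as x= to flowr to_flow, together with a definition file (def=) that describes each job name, as in the commands above.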

Resources

Acknowledgements