yocra3/NetActivityTrain: Nextflow pipeline to train models for NetActivity

Introduction

NetActivityTrain is a bioinformatics pipeline to encode gene expression measurements into gene set activity scores. NetActivityTrain uses sparsely connected autoencoders to perform the encoding.
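To illustrate what "sparsely connected" means in this context, the sketch below masks the weights of a dense layer so that each gene set unit only receives input from its member genes. It is a minimal illustration under stated assumptions, not the pipeline's implementation: the function name `masked_encode`, the 0/1 mask layout (genes by gene sets) and the toy numbers are all hypothetical.

```python
def masked_encode(expression, weights, mask):
    """Compute gene set scores for one sample (illustrative only).

    expression: list of G gene expression values
    weights:    G x S nested list of layer weights
    mask:       G x S nested list of 0/1 (assumed layout);
                1 means gene g is connected to gene set s
    """
    n_sets = len(mask[0])
    scores = [0.0] * n_sets
    for g, value in enumerate(expression):
        for s in range(n_sets):
            # A weight only contributes where the mask allows the connection.
            scores[s] += value * weights[g][s] * mask[g][s]
    return scores


# Toy example: 3 genes, 2 gene sets.
mask = [[1, 0],
        [1, 1],
        [0, 1]]
weights = [[0.5, 0.9],
           [1.0, -1.0],
           [0.3, 2.0]]
scores = masked_encode([2.0, 1.0, -1.0], weights, mask)
```

Because masked weights never contribute, each score depends only on the genes assigned to that gene set, which is what makes the encoded values interpretable as gene set activities.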

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible.

Functionalities

NetActivityTrain can be used to train a model or to compute gene set activity scores from a pre-trained model (the latter is still under development). Training a model with NetActivityTrain involves the following steps:

  1. Gene expression standardization
  2. Splitting of the input data into training and test datasets
  3. Model training
  4. Model export for use with NetActivity
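The first two steps above can be sketched in plain Python. This is a hedged illustration, not the pipeline's code: it assumes standardization is done per gene (column) across samples, and the function names, the 20% test fraction and the fixed seed are hypothetical choices.

```python
import random
from statistics import mean, pstdev


def standardize_genes(matrix):
    """Scale each gene (column) to mean 0 and standard deviation 1.

    Assumption: rows are samples and columns are genes.
    """
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    # Guard against constant genes, whose standard deviation is zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle samples reproducibly and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy matrix: 5 samples (rows) x 2 genes (columns).
expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
scaled = standardize_genes(expression)
train, test = split_train_test(scaled, test_fraction=0.2)
```

Fixing the seed keeps the split reproducible between runs, which is in the spirit of the pipeline's emphasis on reproducible results.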

Quick Start

  1. Install Nextflow (>=22.0.3)
  2. Install either Docker or Singularity (you can follow this tutorial). See the docs for details.
  3. Download the pipeline and test it on a minimal dataset with a single command:
    nextflow run yocra3/NetActivityTrain -profile test,YOURPROFILE --outdir <OUTDIR>
    Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.
    • The pipeline comes with config profiles called docker and singularity which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
    • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
    • If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
  4. Start running your own analysis!
    nextflow run yocra3/NetActivityTrain --data_prefix SE_h5 --gene_mask gene_mask.txt --network network.py --network_params params.py --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
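If you prefer to launch the pipeline from a script, the command above could be assembled as in the following sketch. It only uses the flags shown in this Quick Start and the `NXF_SINGULARITY_CACHEDIR` variable mentioned earlier; the helper names, the `results` output directory and the cache path are placeholders chosen for illustration.

```python
import os
import shlex


def build_command(outdir, profile="docker"):
    """Assemble the `nextflow run` invocation shown in the Quick Start."""
    return [
        "nextflow", "run", "yocra3/NetActivityTrain",
        "--data_prefix", "SE_h5",
        "--gene_mask", "gene_mask.txt",
        "--network", "network.py",
        "--network_params", "params.py",
        "--outdir", outdir,
        "-profile", profile,
    ]


def build_env(cache_dir=None):
    """Copy the current environment, optionally setting the image cache."""
    env = dict(os.environ)
    if cache_dir is not None:
        # Lets Nextflow re-use Singularity images across pipeline runs.
        env["NXF_SINGULARITY_CACHEDIR"] = cache_dir
    return env


# Placeholder values: adjust the output directory and cache path to your setup.
command = build_command("results", profile="singularity")
env = build_env(cache_dir="/path/to/singularity_cache")
print(shlex.join(command))
# To actually launch the run (requires Nextflow on the PATH):
#   import subprocess
#   subprocess.run(command, env=env, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems if any of the file paths contain spaces.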

Documentation

The yocra3/NetActivityTrain pipeline comes with documentation about the pipeline usage, parameters and output.

Credits

yocra3/NetActivityTrain was originally written by @yocra3.

Support

For further information or help, don't hesitate to contact Carlos Ruiz at cruizarenas@unav.es.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.