GitHub - yocra3/NetActivityTrain: Nextflow pipeline to train models for NetActivity
Introduction
NetActivityTrain is a bioinformatics pipeline to encode gene expression measurements into gene set activity scores. NetActivityTrain uses sparsely connected autoencoders to perform the encoding.
The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It uses Docker or Singularity containers, which makes installation straightforward and results reproducible.
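To illustrate what "sparsely connected" means for the encoding described in this introduction, here is a minimal sketch in plain Python. It is not the pipeline's implementation: the function name `masked_encode`, the toy weights, and the 0/1 mask layout (genes in rows, gene sets in columns) are all assumptions made for illustration. The idea shown is only that each gene set score is computed from the genes the mask assigns to that set.

```python
def masked_encode(sample, weights, mask):
    """Return one score per gene set, using only connections where mask is 1.

    sample:  list of expression values, one per gene
    weights: genes x gene_sets nested list (hypothetical trained weights)
    mask:    genes x gene_sets nested list of 0/1 (assumed layout)
    """
    n_sets = len(mask[0])
    scores = []
    for s in range(n_sets):
        total = 0.0
        for g, value in enumerate(sample):
            # A masked-out connection (mask == 0) contributes nothing.
            total += value * weights[g][s] * mask[g][s]
        scores.append(total)
    return scores


# Toy example: genes 0-1 belong to gene set A, genes 2-3 to gene set B.
mask = [[1, 0], [1, 0], [0, 1], [0, 1]]
weights = [[0.5, 0.3], [-1.0, 0.8], [0.2, 2.0], [0.7, -0.4]]
sample = [1.0, 2.0, 3.0, 4.0]
scores = masked_encode(sample, weights, mask)
```

Because of the mask, changing the expression of a gene outside a gene set leaves that gene set's score unchanged, which is what makes the scores interpretable per gene set.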
Functionalities
NetActivityTrain can be used to train a model or to compute the gene set activity scores from a pre-trained model (under development). The training of a model with NetActivityTrain has the following steps:
- Gene expression standardization
- Splitting of the input data into training and test datasets
- Model training
- Model export for use with NetActivity
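The first two steps above can be sketched in plain Python as follows. This is an illustrative sketch only, under stated assumptions: per-gene standardization to mean 0 and standard deviation 1, a random 80/20 split, and the helper names `standardize` and `train_test_split` are not taken from the pipeline, which may handle these steps differently.

```python
import random
import statistics


def standardize(matrix):
    """Scale each gene (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant genes to avoid dividing by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split the rows into training and test sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


# Toy matrix: 5 samples (rows) x 2 genes (columns).
expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
scaled = standardize(expression)
train, test = train_test_split(scaled, test_fraction=0.2)
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing models trained with different parameters.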
Quick Start
- Install Nextflow (>=22.0.3)
- Install either Docker or Singularity (you can follow this tutorial); see the docs.
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run yocra3/NetActivityTrain -profile test,YOURPROFILE --outdir
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.
  - The pipeline comes with config profiles called docker and singularity, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  - Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  - If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
- Start running your own analysis!
nextflow run yocra3/NetActivityTrain --data_prefix SE_h5 --gene_mask gene_mask.txt --network network.py --network_params params.py --outdir -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
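If you launch the pipeline from a script, the command above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the pipeline: `build_command` is an illustrative name, and the output directory `results` and the `docker` profile are placeholder values chosen for the example. The parameter names are the ones shown in the command above.

```python
import shlex


def build_command(data_prefix, gene_mask, network, network_params, outdir, profile):
    """Assemble the nextflow argument list from the parameters shown in this README."""
    return [
        "nextflow", "run", "yocra3/NetActivityTrain",
        "--data_prefix", data_prefix,
        "--gene_mask", gene_mask,
        "--network", network,
        "--network_params", network_params,
        "--outdir", outdir,       # placeholder value below; choose your own directory
        "-profile", profile,      # e.g. docker or singularity
    ]


command = build_command(
    "SE_h5", "gene_mask.txt", "network.py", "params.py", "results", "docker"
)
# Print a shell-safe version of the command for inspection before running it.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would then execute it without going through a shell, which avoids quoting problems with file paths.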
Documentation
The yocra3/NetActivityTrain pipeline comes with documentation about the pipeline usage, parameters and output.
Credits
yocra3/NetActivityTrain was originally written by @yocra3.
Support
For further information or help, don't hesitate to contact Carlos Ruiz at cruizarenas@unav.es.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.