GitHub - AnimalGenomicsETH/bovine-graphs: Integrate multiple genome assemblies into a pangenome graph

Snakemake License: MIT snakemaker Preprint at bioRxiv

Pangenome Graph Pipeline

Pipeline to integrate multiple genome assemblies into a pangenome graph representation. This pipeline wraps the functionality of Minigraph for genome graph construction. We add a utility to label each node with the assembly it derives from, which is crucial for many pangenome analyses. Analyses that are common in pangenome studies are also performed.

See Here for the scheme of the pipeline and Here for an example of the generated report. A detailed description of the commands run is available Here.

Developed for the analysis of bovine genomes, but it should be applicable to other species as well.

Citation

Please cite our paper below when using the pipeline/scripts in your research:

Danang Crysnanto, Alexander S. Leonard, Zih-Hua Fang, Hubert Pausch. Novel functional sequences uncovered through a bovine multi-assembly graph. PNAS https://doi.org/10.1073/pnas.2101056118

Pipeline usage

Input

Minigraph and Gfatools need to be installed and available in $PATH. Please download them Here to get the same version used in the paper. Required Python packages, R libraries, and bioinformatics software are listed Here. Alternatively, one can use mamba / conda to create an environment with all software installed (Minigraph not included). To generate the PDF report, one needs to install weasyprint.

conda env create -f envs/environment.yml
conda activate pangenome

graph1 UCD,OBV,Angus
graph2 UCD,Angus
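
The two `graph1` / `graph2` lines above appear to pair a graph name with a comma-separated list of the assemblies to include. The text here does not spell out the format, so the following is only a minimal sketch under that assumption. The function name `parse_graph_spec` is hypothetical and is not part of the pipeline.

```python
def parse_graph_spec(text):
    """Map each graph name to its ordered list of assembly names.

    Assumed format (not confirmed by the README text): one graph per line,
    the graph name first, then whitespace, then comma-separated assemblies.
    """
    graphs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once on whitespace: graph name, then the assembly list.
        name, assemblies = line.split(maxsplit=1)
        graphs[name] = [a.strip() for a in assemblies.split(",") if a.strip()]
    return graphs


spec = "graph1 UCD,OBV,Angus\ngraph2 UCD,Angus\n"
graphs = parse_graph_spec(spec)
for name, assemblies in graphs.items():
    print(name, assemblies)
```

Keeping the assemblies as an ordered list rather than a set preserves the order in which they are written, in case that order matters to the graph construction.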

Usage

A small dataset of three bovine assemblies (10 Mb each), located in the test/assembly folder, is available for testing. Once all requirements are fulfilled, the test can be run with the command below. This runs a test of pangenome construction and SV discovery with local execution in single-core mode (< 5 min). When the test succeeds, it generates a PDF report in the test/report folder, similar to the example Here.

snakemake -rp -j 1 -s snake_graph.py 
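
Because Minigraph and Gfatools must be on `$PATH` (and Minigraph is not part of the conda environment), it can help to confirm this before running the test. The sketch below is one possible pre-flight check and is not part of the pipeline. It assumes the executables are named `minigraph` and `gfatools`, and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names for the two required tools.
REQUIRED_TOOLS = ["minigraph", "gfatools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```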

The commands below can be used to run the pipeline on a real dataset:

# local execution
snakemake -s snake_graph.py

# LSF cluster execution
snakemake --profile "snakemake_profile/lsf" -s snake_graph.py
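
The local and LSF invocations differ only in the `--profile` option. As an illustration, the sketch below assembles either command as an argument list and can append Snakemake's `-n` (dry-run) flag to preview the jobs without executing them. The helper `snakemake_command` is hypothetical and is not provided by the repository.

```python
import shlex


def snakemake_command(profile=None, dry_run=False, snakefile="snake_graph.py"):
    """Build the snakemake argument list for local or profile-based execution."""
    cmd = ["snakemake"]
    if profile:
        # e.g. "snakemake_profile/lsf" for LSF cluster execution
        cmd += ["--profile", profile]
    cmd += ["-s", snakefile]
    if dry_run:
        cmd.append("-n")  # dry run: list the jobs without running them
    return cmd


# Local execution
print(shlex.join(snakemake_command()))
# LSF cluster execution, previewed as a dry run
print(shlex.join(snakemake_command(profile="snakemake_profile/lsf", dry_run=True)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the dry run looks as expected.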

Output