genpat-it/vcf2mst: Hamming-distance-based minimum spanning tree from sample VCF files using GrapeTree

vcf2mst builds a Hamming-distance-based minimum spanning tree from sample VCF files using GrapeTree.
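To make the idea concrete, here is a minimal Python sketch of a Hamming-distance minimum spanning tree. It is not the vcf2mst or GrapeTree implementation: it uses classical Prim's algorithm, whereas GrapeTree's own tree-building methods may differ. The sample names and 0/1 profiles are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise Hamming distances.

    Returns a list of (node_in_tree, new_node, distance) edges.
    """
    names = sorted(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that connects the tree to a new sample.
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical presence/absence profiles (1 = variant present).
profiles = {
    "sampleA": [1, 0, 0, 1],
    "sampleB": [1, 0, 1, 1],
    "sampleC": [0, 1, 1, 1],
}
mst_edges = minimum_spanning_tree(profiles)
for u, v, d in mst_edges:
    print(u, v, d)
```

In this toy data, sampleA and sampleB differ at one position and sampleB and sampleC at two, so the tree links A to B and B to C.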

When using vcf2mst, please use the following citation:

Adriano Di Pasquale, Nicolas Radomski, Iolanda Mangone, Paolo Calistri, Alessio Lorusso, Cesare Camma. SARS-CoV-2 surveillance in Italy through phylogenomic inferences based on Hamming distances derived from functional annotations of SNPs, MNPs and InDels. BMC Genomics 22, 782 (2021). https://doi.org/10.1186/s12864-021-08112-0

Synopsis

Basic usage

vcf2mst.pl input_tsv_file out_file type_of_input [options]

type_of_input=[vcf,gisaid,algn2pheno,nextclade,tsv,code]

Examples

Types of input

vcf2mst.pl list_of_vcfiles mst.nwk vcf
vcf2mst.pl gisaid_metadata.tsv mst.nwk gisaid
vcf2mst.pl algn2pheno_metadata.tsv mst.nwk algn2pheno
vcf2mst.pl nextclade_metadata.tsv mst.nwk nextclade
vcf2mst.pl samples_vcfcodes.tsv mst.nwk code

Profile file only: returns a matrix compatible with GrapeTree input

vcf2mst.pl list_of_vcfiles profile.tsv vcf -out profile
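As a rough picture of what such a profile matrix could look like, the Python sketch below builds a tab-separated presence/absence table with one row per sample and one column per variant. The "#Name" header, the 0/1 encoding, and the sample and variant names are assumptions made for illustration; the exact layout written by vcf2mst.pl may differ.

```python
def build_profile(sample_variants):
    """Return (header, rows) for a presence/absence matrix."""
    # One column per distinct variant seen in any sample.
    variants = sorted({v for vs in sample_variants.values() for v in vs})
    header = ["#Name"] + variants  # assumed header label
    rows = [
        [sample] + ["1" if v in vs else "0" for v in variants]
        for sample, vs in sorted(sample_variants.items())
    ]
    return header, rows


def to_tsv(header, rows):
    """Serialize the matrix as tab-separated text."""
    return "\n".join("\t".join(r) for r in [header] + rows) + "\n"


# Hypothetical samples and variants.
sample_variants = {
    "sampleA": {"C241T", "A23403G"},
    "sampleB": {"C241T"},
}
header, rows = build_profile(sample_variants)
profile_tsv = to_tsv(header, rows)
print(profile_tsv, end="")
```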

filter positions

vcf2mst.pl list_of_vcfiles mst.tsv vcf -minmax-include 10-10000

See the examples folder for the different file formats, and the Usage section for further details.

Installation

Prerequisites

Installation Step

Docker based installation

Usage

Basic usage

vcf2mst.pl input_tsv_file out_file type_of_input [options]

type_of_input=[vcf,gisaid,algn2pheno,nextclade,tsv,code]

Options

Example of minmax, debug and tsv options

perl vcf2mst.pl examples/nextclade_example.tsv profile.tsv tsv -out profile -tsv-sample-pos 0 -tsv-mutationslist-find pos -tsv-mutationslist-pos 15 -tsv-mutation-pos-regexp '\w(\d+)\w' -debug 1 -minmax 9000-10000 -minmax-exclude 9534-9534
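The Python sketch below illustrates how the regular expression and the intervals in this command could interact. It is an interpretation, not the tool's code: it assumes that the captured group is the genomic position, that interval bounds are inclusive, and that exclusion takes precedence over inclusion. The mutation strings are invented.

```python
import re

# Same pattern as passed to -tsv-mutation-pos-regexp in the command above.
POSITION_RE = re.compile(r"\w(\d+)\w")


def position(mutation):
    """Extract the numeric position from a mutation such as 'C9534T'."""
    match = POSITION_RE.search(mutation)
    return int(match.group(1)) if match else None


def keep(pos, include, exclude):
    """True if pos lies in an include interval and in no exclude interval."""
    inside = any(lo <= pos <= hi for lo, hi in include)
    removed = any(lo <= pos <= hi for lo, hi in exclude)
    return inside and not removed


# Hypothetical mutation list for one sample.
mutations = ["C241T", "G9053T", "C9534T", "A23403G"]
kept = [
    m for m in mutations
    if keep(position(m), include=[(9000, 10000)], exclude=[(9534, 9534)])
]
print(kept)
```

Under these assumptions only G9053T survives: C241T and A23403G fall outside 9000-10000, and C9534T is removed by the 9534-9534 exclusion.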

Example of file-minmax, debug and tsv options

perl vcf2mst.pl examples/nextclade_example.tsv profile.tsv tsv -out profile -tsv-sample-pos 0 -tsv-mutationslist-find pos -tsv-mutationslist-pos 15 -tsv-mutation-pos-regexp '\w(\d+)\w' -file-minmax-include examples/intervals/interval-example-1.txt -file-minmax-exclude examples/intervals/interval-example-2.txt

For the interval file format, see examples/intervals/interval-example.txt

GrapeTree command

If vcf2mst is installed locally and the Docker version is not used, there are different ways to launch the required GrapeTree tool.

The environment variable GRAPETREE_EXEC selects how GrapeTree is launched.

With a local installation of GrapeTree (this is the default):

export GRAPETREE_EXEC=grapetree

With a Docker installation of GrapeTree (the value must be quoted because it contains spaces):

export GRAPETREE_EXEC="docker run --mount type=bind,source=/tmp,destination=/tmp --rm quay.io/biocontainers/grapetree:2.1--pyh3252c3a_0 grapetree -p"
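For readers wrapping vcf2mst in their own scripts, the following Python sketch shows one way a launcher could turn a GRAPETREE_EXEC value into an argument list, falling back to the documented default of grapetree. The function name grapetree_argv is hypothetical, and how vcf2mst.pl itself consumes the variable is not described here.

```python
import os
import shlex


def grapetree_argv(env=None):
    """Split GRAPETREE_EXEC into argv, defaulting to 'grapetree'."""
    env = os.environ if env is None else env
    return shlex.split(env.get("GRAPETREE_EXEC", "grapetree"))


# Example with the Docker-style value from this section.
docker_env = {
    "GRAPETREE_EXEC": (
        "docker run --mount type=bind,source=/tmp,destination=/tmp --rm "
        "quay.io/biocontainers/grapetree:2.1--pyh3252c3a_0 grapetree -p"
    )
}
argv = grapetree_argv(docker_env)
print(argv)
```

Splitting with shlex keeps each Docker argument as a separate list item, which avoids passing the whole command through a shell.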

The GrapeTree command can also be set with the -grapetree-bin option.

Adapt the examples to your specific installation and Docker image.

Examples

From snippy format vcf files

plain

vcf2mst.pl list_of_vcfiles mst.nwk vcf

docker

docker run -u $UID -v /tmp:/tmp --rm vcf2mst vcf2mst.pl /tmp/list_of_vcfiles /tmp/mst.nwk vcf

See examples/list_of_vcfiles for file format

From snippy format vcf directory

plain

vcf2mst.pl folder_with_vcfiles mst.nwk vcf

docker

docker run -u $UID -v /tmp:/tmp --rm vcf2mst vcf2mst.pl /tmp/folder_with_vcfiles /tmp/mst.nwk vcf

See examples/vcfiles for folder format

From gisaid metadata.tsv

plain

vcf2mst.pl gisaid_metadata.tsv mst.nwk gisaid

docker

docker run -u $UID -v /tmp:/tmp --rm vcf2mst vcf2mst.pl /tmp/gisaid_metadata.tsv /tmp/mst.nwk gisaid

See examples folder for gisaid_metadata.tsv file format (minimum information needed) and gisaid_full_metadata.tsv (downloadable from gisaid) file format.

From vcfcodes tsv file

plain

vcf2mst.pl samples_vcfcodes.tsv mst.nwk code

docker

docker run -u $UID -v /tmp:/tmp --rm vcf2mst vcf2mst.pl /tmp/samples_vcfcodes.tsv /tmp/mst.nwk code

See examples folder for samples_vcfcodes.tsv file format

The VCFCODE in samples_vcfcodes.tsv might be: