GitHub - Finn-Lab/emeraldBGC: SMBGC detection tool. This project has been deprecated and is no longer being maintained. All ongoing and future development has been moved to SanntiS (original) (raw)

Docker Repository on Quay

emeraldBGC (Deprecated)

Notice: This project has been deprecated and is no longer being maintained. All ongoing and future development has been moved to SanntiS.

emeraldBGC - SMBGC detection tool -

How to use emeraldBGC?

Conda

Requires:

Installation

conda create -n emeraldbgc emeraldbgc conda activate emeraldbgc conda install -c bioconda emeraldbgc

Basic tests

$ conda activate emeraldbgc $ emeraldbgc test/files/BGC0001472.fna $ conda deactivate emerald

Run with interproscan file:

$ conda activate emeraldbgc $ emeraldbgc --ip-file test/files/BGC0001472.fna.prodigal.faa.gff3 test/files/BGC0001472.fna.prodigal.faa.gb $ conda deactivate emerald

Docker:

Get InterProsScan data:

The size of download file is ~24G, the final directory is 16G. Be sure to have enough space

Docker ready to use script:

Only works if "data/" and emeraldbgc_container.py are in the same directory

$ emeraldbgc_container.py --help $ emeraldbgc_container.py [OPTIONS] ARGUMENTS

Docker image shell:

$ docker -it --entrypoint bash -v <path to emeraldBGC/docker>/data/:/opt/interproscan quay.io/repository/microbiome-informatics/emerald-bgc $ emeraldbgc --help $ emeraldbgc [OPTIONS] ARGUMENTS

Ouput

GFF3 format file

The fields in this header are as follows:

seqname: SeqID of contig, as in prodigal output.
source: emeraldbgc version.
feature: Feature type name, i.e. CLUSTER, CLUSTER_border, CDS.
start: Start position of feature
end: End position of feature
score: empty
strand: empty
frame: empty
attributes:
  ID: ordinal ID for the cluster, beginning with 1.
  nearest_MiBIG: MiBIG accession of the nearest BGC to the cluster in the MIBIG space, measured in Dice dissimilarity coefficient.
  nearest_MiBIG_class: BGC class of nearest_MiBIG.
  nearest_MiBIG_diceDistance: Dice dissimilarity coefficient between ID and nearest_MiBIG.
  score: Post-processing probability output.
  partial: Indicates if a CLUSTER is at the edge of the contig. First and second digits represent 5' and 3' end, respectively. Same as in prodigal's `partial`. "0" shows the cluster is not at the edge, whereas a "1" indicates is at that edge, (i.e. a partial cluster).

Sample:

##gff-version 3
DS999642	EMERALDv0.9.0	CLUSTER	1	136970	.	.	.	ID=DS999642_emrld_1;nearest_MiBIG=BGC0001397;nearest_MiBIG_class=NRP Polyketide;nearest_MiBIG_diceDistance=0.561;partial=10