GBSX: a toolkit for experimental design and demultiplexing genotyping by sequencing experiments

Overview

Genotyping by sequencing (GBS) is an emerging technology for cost-effective variant discovery and genotyping. However, current analysis tools do not fulfill all experimental design and analysis needs.

GBSX is a package of tools that first aids in experimental design, including the choice of enzymes and barcode design. Second, it provides a first analysis step that demultiplexes samples using in-line barcodes, producing fastq files that can be plugged directly into existing variant analysis pipelines.

Download

The Perl script for in silico digests and the compiled program for all other analyses can be found in the releases directory. The latest directory holds the most recent version; previous versions remain available. The complete source code can be found in the src directory. Example data and results for the tool can be found in the example directory.

Licence

All parts of this tool are licensed under GPLv3.
A copy of this licence is included under LICENSE.

Contact

Genomics Core
Center for Human Genetics
UZ – KU Leuven
Herestraat 49 PO box 602
B-3000 Leuven, Belgium

Mail: koen.herten@kuleuven.be

Citing GBSX

We ask that you cite this paper if you use GBSX in work that leads to publication.

Herten, Koen; Hestand, Matthew S.; Vermeesch, Joris R.; Van Houdt, Jeroen KJ (2015). GBSX: a toolkit for experimental design and demultiplexing genotyping by sequencing experiments. BMC Bioinformatics 16:73. doi:10.1186/s12859-015-0514-3

Help

Genotyping By Sequencing demultipleXing toolkit (GBSX) is a toolkit with an inline barcode demultiplexer for use in the analysis of single-read or paired-end genotyping by sequencing (GBS) data, a barcode generator, a barcode discovery tool, and a restriction enzyme predictor. GBSX can easily be incorporated as a preceding analysis step for already deployed SNP pipelines.

Restriction Enzyme Predictor

mandatory parameters:

* `-f` file of reference fasta file location(s)

optional parameters:

* `-e` enzyme name to use (default: Enzyme)
* `-g` genome name to use in bed file name (default: genome)
* `-n` minimum size fragments to include (default: 100)
* `-m` maximum size fragments to use (default: 1000)
* `-E` second enzyme name to use (default: Enzyme2)
* `-D` digest sequence for a second enzyme (default: not declared)
* `-R` digest sequence for a third enzyme (default: not declared)
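
As a rough illustration of what an in silico digest with a fragment size filter involves, here is a minimal Python sketch. It is not the GBSX Perl script and does not reproduce its behaviour. The function name `digest`, the choice to cut at the start of each recognition site, and the inclusive size bounds are assumptions made for the example. The defaults of 100 and 1000 mirror the `-n` and `-m` defaults listed above.

```python
import re


def digest(sequence, site, min_size=100, max_size=1000):
    """Return (start, end) coordinates of fragments between consecutive sites.

    Hypothetical helper: the cut is placed at the first base of each
    recognition site, and fragments are kept when
    min_size <= length <= max_size (inclusive bounds are an assumption).
    """
    seq = sequence.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [match.start() for match in re.finditer(pattern, seq)]

    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_size <= end - start <= max_size:
            fragments.append((start, end))
    return fragments


if __name__ == "__main__":
    # CTGCAG is the PstI recognition site; the sequence is made up.
    toy = "CTGCAG" + "A" * 150 + "CTGCAG" + "A" * 10 + "CTGCAG"
    for start, end in digest(toy, "CTGCAG"):
        print(start, end, end - start)
```

Only fragments flanked by a site on both sides are reported here; the pieces at the sequence ends are ignored, and degenerate bases in a recognition sequence are not handled.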

Barcode Generator

mandatory parameters:

optional parameters:

Demultiplexer

This program demultiplexes fastq or fastq.gz files obtained from sequencing with inline barcodes, such as those used in GBS, RAD, and similar protocols.
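
To make the idea of inline barcode demultiplexing concrete, the Python sketch below assigns each read to a sample by matching a barcode at the start of the sequence and then trims the barcode from the sequence and quality strings. It is a simplified illustration, not the GBSX demultiplexer: the function name `demultiplex`, the exact-match rule, the longest-barcode-first order, and the `undetermined` bin name are all assumptions, and enzyme cut-site checks and mismatch tolerance are left out.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact barcode match at the read start.

    reads: iterable of (name, sequence, quality) tuples.
    barcode_to_sample: dict mapping barcode sequence to sample name.
    Hypothetical helper for illustration only.
    """
    # Try longer barcodes first so a short barcode that is a prefix of a
    # longer one does not capture the longer barcode's reads.
    by_length = sorted(barcode_to_sample, key=len, reverse=True)
    bins = defaultdict(list)
    for name, seq, qual in reads:
        for barcode in by_length:
            if seq.startswith(barcode):
                n = len(barcode)
                # Trim the barcode from both sequence and quality strings.
                bins[barcode_to_sample[barcode]].append((name, seq[n:], qual[n:]))
                break
        else:
            # No barcode matched: keep the read untouched.
            bins["undetermined"].append((name, seq, qual))
    return dict(bins)


if __name__ == "__main__":
    reads = [("r1", "ACGTTTTT", "IIIIIIII"), ("r2", "GGGGAAAA", "IIIIIIII")]
    print(demultiplex(reads, {"ACGT": "sample1"}))
```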

These parameters are mandatory:

These parameters are optional:

Possible Standard Enzymes for the info file: (NA is no enzyme)

Barcode Discovery

This program searches for possible barcodes and barcode-enzyme combinations.
It is designed to discover sequencing errors or unused barcodes when a large proportion of the reads remains undetermined after demultiplexing.
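
One simple way to look for candidate barcodes is to count the most frequent read prefixes among undetermined reads. The Python sketch below shows that idea only; it is not how GBSX performs barcode discovery. The function name `candidate_barcodes` and the `length` and `top` parameters are hypothetical.

```python
from collections import Counter


def candidate_barcodes(sequences, length=6, top=10):
    """Return the most common read prefixes of a given length.

    Hypothetical helper: a prefix that occurs far more often than the rest
    may point to an unused barcode or a recurring sequencing error.
    """
    counts = Counter(seq[:length] for seq in sequences if len(seq) >= length)
    return counts.most_common(top)


if __name__ == "__main__":
    undetermined = ["ACGTACTT", "ACGTACGG", "TTTTTTAA"]
    for prefix, count in candidate_barcodes(undetermined, length=6, top=2):
        print(prefix, count)
```

Since inline barcodes can differ in length, a run over several values of `length` would be needed in practice.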

Mandatory parameters:

Tutorial

See the Tutorial file and the example folder.

Change Logs

v1.0

v1.0.1

v1.1

v1.1.1

v1.1.2

v1.1.3

v1.1.4

v1.1.5

v1.2

v1.3