GitHub - COMBINE-lab/quark: semi-reference-based short read compression
Quark
semi-reference-based short read compression
Assumption
The read files must be gzipped, i.e. their names should end in 1.fastq.gz and 2.fastq.gz.
The software has been tested on paired-end and single-end data in a bash-compatible shell (redirection might not work in other shells such as fish). Single-end support will be added to the "quark.sh" script soon.
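The gzip assumption can be checked before any Quark step is run. The following is a minimal sketch and not part of Quark: the helper name `check_gzipped` is hypothetical, and it relies only on the standard `gzip -t` integrity test.

```shell
# Hypothetical helper (not part of Quark): verify that every argument
# is a readable gzip file, using gzip's built-in integrity test.
check_gzipped() {
    for f in "$@"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

It could be called as `check_gzipped <mate1> <mate2>` before encoding; a non-zero exit status means at least one file is missing or not gzipped.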
Dependency
Quark depends on plzip for downstream compression. More information about plzip, including an installation guide, can be found here.
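Since plzip is an external dependency, it can help to fail early when it is not installed. This is a small sketch under the assumption that plzip is expected on the PATH; the helper name `require_tool` is hypothetical and not part of Quark.

```shell
# Hypothetical helper (not part of Quark): fail early if a required
# program cannot be found on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing dependency: $1" >&2
        return 1
    fi
}

# Example (commented out so that sourcing this file has no side effects):
# require_tool plzip || exit 1
```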
Compile
$git clone https://github.com/COMBINE-lab/quark.git
$cd quark
$mkdir build
$cd build
$cmake ..
$make
$cd ..
Running Quark
To see the options
To build the index with kmer size k
snakemake -s quark.snake make_index --config out="<output dir>" fasta="<fasta file>" kmer=<#k>
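To make the placeholders concrete, the sketch below assembles the make_index command from three arguments and prints it without running it. The function name `quark_index_cmd` and the validation rule (k must be a positive integer without leading zeros) are assumptions for illustration; Quark itself may accept or reject other values.

```shell
# Hypothetical wrapper (not part of Quark): validate the k-mer size and
# print the make_index command from this README without executing it.
# Arguments: <output dir> <fasta file> <k>
quark_index_cmd() {
    out=$1 fasta=$2 k=$3
    case $k in
        # Reject empty values, anything non-numeric, and leading zeros.
        ''|*[!0-9]*|0*)
            echo "kmer must be a positive integer: '$k'" >&2
            return 1
            ;;
    esac
    printf 'snakemake -s quark.snake make_index --config out="%s" fasta="%s" kmer=%s\n' \
        "$out" "$fasta" "$k"
}
```

For example, `quark_index_cmd idx ref.fa 31` (all three values are placeholders) prints a command line that can be inspected before it is run by hand.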
To Encode
Single End
snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" r="<mate>" p=<#threads> lib="single" quality=0
Paired end
snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" m1="<mate1>" m2="<mate2>" p=<#threads> lib="paired" quality=0
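The two encode commands differ only in how the reads are passed (`r` versus `m1`/`m2`) and in the `lib` value. As a sketch, a wrapper could pick between them from the number of read files given and print the resulting command without running it. The function name `quark_encode_cmd` and its argument order are hypothetical; the printed commands are the ones shown above.

```shell
# Hypothetical wrapper (not part of Quark): print the encode command from
# this README, choosing lib="single" or lib="paired" from the number of
# read files. Arguments: <out dir> <index dir> <threads> <r1> [<r2>]
quark_encode_cmd() {
    if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then
        echo "usage: quark_encode_cmd <out> <index> <threads> <r1> [<r2>]" >&2
        return 1
    fi
    out=$1 index=$2 threads=$3
    if [ "$#" -eq 4 ]; then
        # One read file: single-end form.
        printf 'snakemake -s quark.snake encode --config out="%s" index="%s" r="%s" p=%s lib="single" quality=0\n' \
            "$out" "$index" "$4" "$threads"
    else
        # Two read files: paired-end form.
        printf 'snakemake -s quark.snake encode --config out="%s" index="%s" m1="%s" m2="%s" p=%s lib="paired" quality=0\n' \
            "$out" "$index" "$4" "$5" "$threads"
    fi
}
```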
To Decode
snakemake -s quark.snake decode --config in="<in dir>" out="<out dir>" lib="paired/single" quality=0
To check that the encoded and decoded sequences are the same (the compression is lossless)
$./check_pair.sh <original left end> <original right end> <quark left end> <quark right end>
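What check_pair.sh does internally is not described here. As an independent sketch of what a losslessness check on sequences could look like, the helpers below compare the sequence lines of two FASTQ files. The function names are hypothetical, and the sketch assumes that both files are gzipped, that records use the standard four-line FASTQ layout, and that read order may differ between the files (hence the sort).

```shell
# Hypothetical helpers (not part of Quark, and not the contents of
# check_pair.sh).

# Print the sequence line (line 2 of each 4-line record) of a gzipped
# FASTQ file, sorted so that the result is independent of read order.
fastq_sequences() {
    gzip -dc "$1" | awk 'NR % 4 == 2' | LC_ALL=C sort
}

# Return 0 if both gzipped FASTQ files contain the same multiset of
# sequences, 1 if they differ, 2 if temporary files cannot be created.
same_sequences() {
    seq_a=$(mktemp) && seq_b=$(mktemp) || return 2
    fastq_sequences "$1" > "$seq_a"
    fastq_sequences "$2" > "$seq_b"
    cmp -s "$seq_a" "$seq_b"
    status=$?
    rm -f "$seq_a" "$seq_b"
    return "$status"
}
```

Because it compares each file on its own, this sketch does not verify that mates remain correctly paired.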
Link to the preprint
Quark enables semi-reference-based compression of RNA-seq data, by Hirak Sarkar and Rob Patro