Genozip | Compression for FASTQ, BAM, VCF

Daniel S. T. Hughes MBioch (Hons; Oxford) PhD (Cambridge)
Director of Bioinformatics, Institute of Genomic Medicine
Columbia University

"The Institute of Genomic Medicine's (IGM) Bioinformatics Core, situated within the Columbia University Irving School of Medicine, manages a variant warehouse containing approximately 130,000 whole-genome sequencing (WGS) and whole-exome sequencing (WES) samples. This warehouse serves the dual purpose of gene discovery and diagnostic analysis and has been utilized in numerous published analyses. Additionally, the IGM acts as a long-term repository for original off-machine FASTQ files of internally and externally sequenced samples, which must be preserved in their original form.

After an extensive evaluation of the cost, compute, and compression benefits of multiple options, we decided upon the use of the Genozip Premium package.

We applied the lossless Genozip compression on approximately 172,000 of our most recent internally stored FASTQ pairs. This reduced their data footprint from 537.4 TB to 115.6 TB, resulting in an average space savings of 78.5%. Not only did this significantly reduce storage costs, but it also facilitated the migration of the entire dataset to our cloud infrastructure.

I can highly recommend Genozip to any organization looking to reduce the storage footprint of their FASTQ files."

James Bonfield

Current maintainer of the CRAM specification & co-developer of a CRAM implementation

"Use Genozip if you want a commercial alternative to CRAM" ¹

¹ Personal opinion posted on X
