Genozip | Compression for FASTQ, BAM, VCF
Columbia University, Institute of Genomic Medicine
Daniel S. T. Hughes, Director of Bioinformatics
"The Institute of Genomic Medicine's (IGM) Bioinformatics Core, situated within the Columbia University Irving School of Medicine, manages a variant warehouse containing approximately 130,000 whole-genome sequencing (WGS) and whole-exome sequencing (WES) samples. This warehouse serves the dual purpose of gene discovery and diagnostic analysis and has been utilized in numerous published analyses. Additionally, the IGM acts as a long-term repository for original off-machine FASTQ files of internally and externally sequenced samples, which must be preserved in their original form.
After an extensive evaluation of the cost, compute, and compression benefits of multiple options, we decided to use the Genozip Premium package.
We applied the lossless Genozip compression on approximately 172,000 of our most recent internally stored FASTQ pairs. This reduced their data footprint from 537.4 TB to 115.6 TB, resulting in an average space savings of 78.5%. Not only did this significantly reduce storage costs, but it also facilitated the migration of the entire dataset to our cloud infrastructure.
I can highly recommend Genozip to any organization looking to reduce the storage footprint of their FASTQ files."
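The space-savings figure quoted above can be checked from the two totals with a short calculation. This is only a sketch: the function name `space_savings` is hypothetical, and it computes the aggregate saving over the whole dataset, which may differ slightly from a per-file average.

```python
def space_savings(original_tb: float, compressed_tb: float) -> float:
    """Return the fraction of storage saved, e.g. 0.25 for a 25% reduction."""
    if original_tb <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - compressed_tb / original_tb


# Totals quoted in the testimonial (terabytes).
original_tb = 537.4
compressed_tb = 115.6

savings = space_savings(original_tb, compressed_tb)
ratio = original_tb / compressed_tb  # how many times smaller the data became

print(f"space savings: {savings:.1%}")  # prints "space savings: 78.5%"
print(f"compression ratio: {ratio:.2f}x")
```

The result agrees with the 78.5% stated in the quote.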
Lille University Hospital Center
Bioinformatics Team
"At Lille University Hospital Center, we regularly manage massive volumes of genomic data. These data require significant storage capacities and efficient management for routine operations.
Since we started using Genozip, our way of handling genomic data has radically changed. Its ultra-efficient compression technology has allowed us to significantly reduce the digital footprint of our files, often by more than 60%, while maintaining impeccable data quality. This has already freed up more than 200 TB of storage previously occupied by genomic data.
With Genozip, we have seen a significant reduction in costs associated with data storage. For any organization that deals with large amounts of genomic data, we highly recommend Genozip. It is an essential tool that optimizes storage space and improves the efficiency of genomic data management operations."