GitHub - koszullab/HiC-Box: GUI-based pipeline for Hi-C data processing and visualization

HiC-Box

HiC-Box is a Hi-C data processing pipeline and visualizer, written mostly in Python. As of 2018 it is compatible only with Python 2; with Python 2 deprecation coming up in 2020, work is under way to convert it to Python 3. It uses bowtie2 as a backend to align input paired-end reads onto an input genome, then derives a contact map from the alignment data and the positions of the restriction sites along the genome (the restriction enzyme is also given as input).
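To illustrate how a contact map can be derived from restriction sites and aligned read positions, here is a minimal sketch. It is not HiC-Box's actual code: the function names are hypothetical, it assumes a single chromosome, and it takes the start of each recognition site as the fragment boundary (real enzymes cut at a specific offset within the site).

```python
from bisect import bisect_right
from collections import Counter


def restriction_sites(sequence, site):
    """Return the start position of every occurrence of `site` in `sequence`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_index(position, sites):
    """Map a genomic position to the index of the fragment containing it.

    Fragment 0 runs from the start of the sequence to the first site;
    fragment k (k >= 1) starts at sites[k - 1]. `sites` must be sorted.
    """
    return bisect_right(sites, position)


def contact_counts(read_pairs, sites):
    """Count contacts between fragments from (position_a, position_b) pairs.

    Keys are (low_fragment, high_fragment) tuples, so the map is symmetric.
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        frag_a = fragment_index(pos_a, sites)
        frag_b = fragment_index(pos_b, sites)
        counts[tuple(sorted((frag_a, frag_b)))] += 1
    return counts


# Toy genome with three GATC sites (the DpnII recognition sequence).
genome = "AAGATCTTTTGATCAAAAGATCCC"
sites = restriction_sites(genome, "GATC")
pairs = [(0, 12), (5, 20), (12, 1)]  # hypothetical aligned read-pair positions
contacts = contact_counts(pairs, sites)
```

In this toy example the first and third read pairs both link fragments 0 and 2, so that entry of the contact map receives a count of 2.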

The restriction fragments are then binned (and the corresponding matrix sum-pooled) at different scales, hence building a pyramid. Each level of the pyramid is a contact map at a different resolution. Each such level can be browsed with the box.
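The binning and sum-pooling described above can be sketched as follows. This is an illustration under assumptions, not the box's implementation: the function names, the pooling factor, and the number of levels are all hypothetical, and the matrix is a plain list of lists rather than whatever on-disk format HiC-Box uses.

```python
def sum_pool(matrix, factor):
    """Merge every `factor` consecutive rows and columns of a square matrix.

    Each entry of the result is the sum of the corresponding block of the
    input. The last bin may cover fewer than `factor` fragments.
    """
    n = len(matrix)
    n_bins = (n + factor - 1) // factor  # ceiling division
    pooled = [[0] * n_bins for _ in range(n_bins)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            pooled[i // factor][j // factor] += value
    return pooled


def build_pyramid(matrix, factor=3, n_levels=3):
    """Return a list of contact maps, from finest to coarsest resolution."""
    levels = [matrix]
    for _ in range(n_levels - 1):
        if len(levels[-1]) <= 1:
            break  # cannot pool a 1x1 matrix any further
        levels.append(sum_pool(levels[-1], factor))
    return levels


# Toy 4-fragment contact map, pooled by a factor of 2 at each level.
contact_map = [
    [1, 2, 0, 1],
    [2, 3, 1, 0],
    [0, 1, 4, 2],
    [1, 0, 2, 5],
]
pyramid = build_pyramid(contact_map, factor=2, n_levels=3)
```

Because pooling only sums entries, the total number of contacts is the same at every level of the pyramid; only the resolution changes.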

HiC-Box generates datasets that are compatible with GRAAL for reassembly. Both programs operate on the same data template and their codebases overlap, but because GRAAL has specific requirements that can be difficult to deploy, they are kept in separate repositories. Nevertheless, if you wish to use GRAAL on your own genome and paired-end reads, you will need to use the box to convert them into GRAAL-digestible input data. Here are examples of what such prepared datasets should look like.

Dependencies

Python packages

Other software

How to use

Advanced

Visualizer options

Output templating

The box is designed to go through all steps of a typical Hi-C pipeline, from alignment to visualization or a possible reassembly by GRAAL. However, if you wish to customize the alignment step, you may directly provide a SAM file named 0.sam in the output folder. HiC-Box will then skip the alignment step and proceed to contact matrix generation.
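As a convenience, a custom alignment can be put in place with a few lines of Python. The helper below is a hypothetical user-side sketch and is not part of HiC-Box; the only detail taken from the text above is that the file must be named 0.sam and sit in the output folder.

```python
import os
import shutil


def install_custom_alignment(sam_path, output_dir):
    """Copy a user-provided SAM file into `output_dir` under the name 0.sam.

    Returns the path of the copied file. The output folder is created if it
    does not exist yet (written to work on both Python 2 and Python 3).
    """
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    target = os.path.join(output_dir, "0.sam")
    shutil.copyfile(sam_path, target)
    return target
```

For example, `install_custom_alignment("my_alignment.sam", "my_output")` would leave the alignment at `my_output/0.sam`; both paths here are placeholders.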

Troubleshooting