GitHub - js2264/HiCool: Processing Hi-C raw data within R

HiCool

Please cite:

Serizay J, Matthey-Doret C, Bignaud A, Baudry L, Koszul R (2024). “Orchestrating chromosome conformation capture analysis with Bioconductor.” Nature Communications, 15, 1-9. doi:10.1038/s41467-024-44761-x.


The HiCool R/Bioconductor package provides an end-to-end interface to process and normalize Hi-C paired-end fastq reads into .(m)cool files.

  1. The heavy lifting (fastq mapping, pairs parsing and pairs filtering) is performed by the underlying lightweight hicstuff python library (https://github.com/koszullab/hicstuff).
  2. Pairs filtering is done using the approach described in Cournac et al., 2012 and implemented in hicstuff.
  3. The Cooler (https://github.com/open2c/cooler) library is used to parse pairs into a multi-resolution, balanced .mcool file. .(m)cool is a compact, indexed HDF5 file format specifically tailored for efficiently storing Hi-C data. The .(m)cool file format was developed by Abdennur and Mirny and published in 2019.
  4. Internally, all these external dependencies are automatically installed and managed in R by a basilisk environment.

Processing .fastq paired-end files into a .mcool Hi-C contact matrix

The main processing function offered in this package is HiCool(). One simply needs to specify:

library(HiCool)
x <- HiCool(
    r1 = '<PATH-TO-R1.fq.gz>',
    r2 = '<PATH-TO-R2.fq.gz>',
    restriction = 'DpnII,HinfI',
    genome = 'R64-1-1'
)

HiCool :: Recovering bowtie2 genome index from AWS iGenomes...

HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpARIRQo/DZ28I8]...

HiCool :: Mapping fastq files...

HiCool :: Best-suited minimum resolution automatically inferred: 1000

HiCool :: Remove unwanted chromosomes...

HiCool :: Generating multi-resolution .mcool file...

HiCool :: Balancing .mcool file...

HiCool :: Tidying up everything for you...

HiCool :: .fastq to .mcool processing done!

HiCool :: Check /home/rsg/repos/HiCool/HiCool folder to find the generated files

HiCool :: Generating HiCool report. This might take a while.

HiCool :: Report generated and available @ sample^mapped-R64-1-1^DZ28I8.html

HiCool :: All processing successfully achieved. Congrats!

CoolFile object

.mcool file: sample^mapped-R64-1-1^55IONQ.mcool

resolution: 1000

pairs file: sample^55IONQ.pairs

metadata(3): log args stats

Output files

HiCool/
|-- sample^mapped-R64-1-1^55IONQ.html
|-- logs
|   |-- sample^mapped-R64-1-1^55IONQ.log
|-- matrices
|   |-- sample^mapped-R64-1-1^55IONQ.mcool
|-- pairs
|   |-- sample^mapped-R64-1-1^55IONQ.pairs
`-- plots
    |-- sample^mapped-R64-1-1^55IONQ_event_distance.pdf
    |-- sample^mapped-R64-1-1^55IONQ_event_distribution.pdf
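
Every file written by a run carries the same hash (here 55IONQ), preceded by a `^` separator. As a minimal sketch in base R, the files of one run could be collected from the output folder as follows. `find_hicool_outputs()` is a hypothetical helper written for illustration, not a HiCool function, and it assumes the `^<hash>` naming pattern seen in the listing above.

```r
# Hypothetical helper (NOT part of HiCool): collect all files in a HiCool
# output folder that belong to one run, assuming each file name contains
# the literal string "^<hash>" as in the listing above.
find_hicool_outputs <- function(output = "HiCool", hash) {
    files <- list.files(output, recursive = TRUE, full.names = TRUE)
    # fixed = TRUE matches "^<hash>" literally instead of as a regex anchor
    hits <- files[grepl(paste0("^", hash), basename(files), fixed = TRUE)]
    # Group the matching paths by file extension (html, log, mcool, pairs, pdf)
    split(hits, tools::file_ext(hits))
}

# Example call, assuming a 'HiCool/' folder exists in the working directory:
# outputs <- find_hicool_outputs("HiCool", hash = "55IONQ")
# outputs$mcool
```

The result is a named list with one element per file extension, which makes it easy to pick out the .mcool or .pairs file of a given run without hard-coding the sample name or genome.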

Reporting

On top of processing fastq reads, HiCool provides convenient reports for single/multiple sample(s).

x <- importHiCoolFolder(output = 'HiCool/', hash = '55IONQ')
HiCReport(x)

Installation

As an R/Bioconductor package, HiCool should be very easy to install. The only dependency is R (>= 4.2). In R, one can run:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiCool")
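
Since the stated requirement is R (>= 4.2), it may be worth checking the running R version before installing. The snippet below is a minimal base R sketch; `meets_r_requirement()` is a hypothetical helper introduced for illustration and is not provided by HiCool or BiocManager.

```r
# Hypothetical helper (NOT part of HiCool): compare an R version against
# the minimum version stated in the installation instructions.
meets_r_requirement <- function(minimum = "4.2", current = getRversion()) {
    # getRversion() returns a version object that compares numerically,
    # component by component (so 4.10 is newer than 4.2)
    current >= package_version(minimum)
}

if (!meets_r_requirement()) {
    warning("HiCool requires R >= 4.2; this session runs R ",
            as.character(getRversion()))
}
```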

The first time the HiCool() function is executed, a basilisk environment will be automatically set up. In this environment, a few dependencies will be installed, including the hicstuff and cooler Python libraries described above.

HiCExperiment ecosystem

HiCool is integrated within the HiCExperiment ecosystem in Bioconductor. Read more about the HiCExperiment class and handling Hi-C data in R here.