[![](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)

# HiCool

The `HiCool` R/Bioconductor package provides an **end-to-end interface** to process and normalize Hi-C paired-end fastq reads into `.(m)cool` files.

1. The heavy lifting (fastq mapping, pairs parsing and pairs filtering) is performed by the underlying lightweight `hicstuff` python library ([https://github.com/koszullab/hicstuff](https://github.com/koszullab/hicstuff)).
2. Pairs filtering is done using the approach described in [Cournac et al., 2012](https://doi.org/10.1186/1471-2164-13-436) and implemented in `hicstuff`.
3. The `Cooler` library ([https://github.com/open2c/cooler](https://github.com/open2c/cooler)) is used to parse pairs into a multi-resolution, balanced `.mcool` file. `.(m)cool` is a compact, indexed HDF5 file format specifically tailored for efficiently storing Hi-C data. The `.(m)cool` file format was developed by Abdennur and Mirny and [published in 2019](https://doi.org/10.1093/bioinformatics/btz540).
4. Internally, all these external dependencies are automatically installed and managed in R by a `basilisk` environment.

![](https://raw.githubusercontent.com/js2264/HiCool/master/man/figures/pipeline.png)

## Processing `.fastq` paired-end files into a `.mcool` Hi-C contact matrix

The main processing function offered in this package is `HiCool()`. One simply needs to specify:

- The path to each fastq file;
- The genome reference, as a `.fasta` sequence, a pre-computed `bowtie2` index or a supported ID (`hg38`, `mm10`, `dm6`, `R64-1-1`, `WBcel235`, `GRCz10`, `Galgal4`);
- The restriction enzyme(s) used for Hi-C.

```r
library(HiCool)
x <- HiCool(
    r1 = '<PATH-TO-R1.fq.gz>',
    r2 = '<PATH-TO-R2.fq.gz>',
    restriction = 'DpnII,HinfI',
    genome = 'R64-1-1'
)
```

```sh
## HiCool :: Recovering bowtie2 genome index from AWS iGenomes...
## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpARIRQo/DZ28I8]...
## HiCool :: Mapping fastq files...
## HiCool :: Best-suited minimum resolution automatically inferred: 1000
## HiCool :: Remove unwanted chromosomes...
## HiCool :: Generating multi-resolution .mcool file...
## HiCool :: Balancing .mcool file...
## HiCool :: Tidying up everything for you...
## HiCool :: .fastq to .mcool processing done!
## HiCool :: Check /home/rsg/repos/HiCool/HiCool folder to find the generated files
## HiCool :: Generating HiCool report. This might take a while.
## HiCool :: Report generated and available @ sample^mapped-R64-1-1^DZ28I8.html
## HiCool :: All processing successfully achieved. Congrats!
```

```r
x
```

```sh
## CoolFile object
## .mcool file: sample^mapped-R64-1-1^55IONQ.mcool
## resolution: 1000
## pairs file: sample^55IONQ.pairs
## metadata(3): log args stats
```

## Output files

```sh
## HiCool/
## |-- sample^mapped-R64-1-1^55IONQ.html
## |-- logs
## |   |-- sample^mapped-R64-1-1^55IONQ.log
## |-- matrices
## |   |-- sample^mapped-R64-1-1^55IONQ.mcool
## |-- pairs
## |   |-- sample^mapped-R64-1-1^55IONQ.pairs
## `-- plots
##     |-- sample^mapped-R64-1-1^55IONQ_event_distance.pdf
##     |-- sample^mapped-R64-1-1^55IONQ_event_distribution.pdf
```

## Reporting

On top of processing fastq reads, `HiCool` provides convenient reports for one or several samples.

```r
x <- importHiCoolFolder(output = 'HiCool/', hash = '55IONQ')
HiCReport(x)
```

## Installation

As an R/Bioconductor package, `HiCool` should be very easy to install. The only dependency is R (>= 4.2). In R, one can run:

```r
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiCool")
```

The first time `HiCool()` is executed, a `basilisk` environment will be automatically set up.
In this environment, a few dependencies will be installed:

- python (pinned 3.9.1)
- numpy (pinned 1.23.4)
- bowtie2 (pinned 2.4.5)
- samtools (pinned 1.7)
- **hicstuff** (pinned 3.1.5)
- **cooler** (pinned 0.8.11)

## HiCExperiment ecosystem

`HiCool` is integrated within the `HiCExperiment` ecosystem in Bioconductor. Read more about the `HiCExperiment` class and handling Hi-C data in R [here](https://github.com/js2264/HiCExperiment).

![](https://raw.githubusercontent.com/js2264/HiCExperiment/master/man/figures/HiCExperiment_ecosystem.png)

- [HiCExperiment](https://github.com/js2264/HiCExperiment): Parsing Hi-C files in R
- [HiCool](https://github.com/js2264/HiCool): End-to-end integrated workflow to process fastq files into .cool and .pairs files
- [HiContacts](https://github.com/js2264/HiContacts): Investigating Hi-C results in R
- [HiContactsData](https://github.com/js2264/HiContactsData): Data companion package
- [fourDNData](https://github.com/js2264/fourDNData): Gateway package to 4DN-hosted Hi-C experiments
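
As an illustration of how the packages listed above fit together, the sketch below locates the `.mcool` file written by `HiCool()` and hands it to `HiCExperiment`. It is a minimal example under stated assumptions, not part of the package: `hicool_paths()` is a hypothetical helper, and it assumes the `<prefix>^mapped-<genome>^<hash>` naming pattern visible in the "Output files" tree holds for every run. The import step only runs if `HiCExperiment` is installed and the file exists.

```r
# Hypothetical helper (not provided by HiCool): rebuild output paths from the
# naming pattern shown in the "Output files" tree. The pattern is an assumption.
hicool_paths <- function(output, prefix, genome, hash) {
    stem <- paste0(prefix, "^mapped-", genome, "^", hash)
    list(
        report = file.path(output, paste0(stem, ".html")),
        log    = file.path(output, "logs", paste0(stem, ".log")),
        mcool  = file.path(output, "matrices", paste0(stem, ".mcool"))
    )
}

# Values taken from the example run shown in this README
paths <- hicool_paths("HiCool", "sample", "R64-1-1", "55IONQ")
print(paths$mcool)

# Hand the matrix over to HiCExperiment, only when both the package and the
# file are actually available on this machine
if (requireNamespace("HiCExperiment", quietly = TRUE) && file.exists(paths$mcool)) {
    library(HiCExperiment)
    cf <- CoolFile(paths$mcool)        # pointer to the multi-resolution file
    print(availableResolutions(cf))    # resolutions stored in the .mcool
    hic <- import(cf)                  # load contacts as a HiCExperiment object
    print(hic)
}
```

When the `CoolFile` object returned by `HiCool()` or `importHiCoolFolder()` is still in the session, it can likely be passed to `import()` directly, which avoids rebuilding any path by hand.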