Principles of ChIP-seq Data Analysis Illustrated with Examples

Important biological information uncovered in previously unaligned reads from chromatin immunoprecipitation experiments (ChIP-Seq)

Scientific reports, 2015

Establishing the architecture of gene regulatory networks (GRNs) relies on chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) methods that provide genome-wide transcription factor binding sites (TFBSs). ChIP-Seq furnishes millions of short reads that, after alignment, describe the genome-wide binding sites of a particular TF. However, in all organisms investigated, an average of 40% of reads fail to align to the corresponding genome, and in some datasets as many as 80% of reads fail to align. We describe here the provenance of previously unaligned reads in ChIP-Seq experiments from animals and plants. We show that a substantial portion corresponds to sequences of bacterial and metazoan origin, irrespective of the ChIP-Seq chromatin source. Unexpectedly, 30%-40% of unaligned reads were actually alignable. To validate these observations, we investigated the characteristics of the previously unaligned reads corresponding to TAL1,...

Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

PLoS Computational Biology, 2013

Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data.

A pipeline for the identification and characterization of chromatin modifications derived from ChIP-Seq datasets

Biochimie, 2012

The advent of massively parallel sequencing of immunopurified chromatin and its determinants has provided new avenues for researchers to map epigenome-wide changes, and there is tremendous interest in uncovering regulatory signatures to understand fundamental questions associated with chromatin structure and function. Indeed, the rapid development of large genome annotation projects has seen a resurgence in chromatin immunoprecipitation (ChIP) based protocols, which are used to distinguish protein interactions, coupled with large-scale sequencing (Seq) to precisely map epigenome-wide interactions. Despite some of the great advances in our understanding of chromatin-modifying complexes and their determinants, the development of ChIP-Seq technologies also poses specific demands on the integration of data for visualization, manipulation and analysis. In this article we discuss some of the considerations for experimental design planning, quality control, and bioinformatic analysis.

A high-throughput ChIP-Seq for large-scale chromatin studies

Molecular systems biology, 2015

We present a modified approach of chromatin immuno-precipitation followed by sequencing (ChIP-Seq), which relies on the direct ligation of molecular barcodes to chromatin fragments, thereby permitting experimental scale-up. With Bar-ChIP now enabling the concurrent profiling of multiple DNA-protein interactions, we report the simultaneous generation of 90 ChIP-Seq datasets without any robotic instrumentation. We demonstrate that application of Bar-ChIP to a panel of Saccharomyces cerevisiae chromatin-associated mutants provides a rapid and accurate genome-wide overview of their chromatin status. Additionally, we validate the utility of this technology to derive novel biological insights by identifying a role for the Rpd3S complex in maintaining H3K14 hypo-acetylation in gene bodies. We also report an association between the presence of intragenic H3K4 tri-methylation and the emergence of cryptic transcription in a Set2 mutant. Finally, we uncover a crosstalk between H3K14 acetylatio...

Comparing genome-wide chromatin profiles using ChIP-chip or ChIP-seq

Bioinformatics, 2010

ChIP-chip and ChIP-seq technologies provide genome-wide measurements of various types of chromatin marks at an unprecedented resolution. With ChIP samples collected from different tissue types and/or individuals, we can now begin to characterize stochastic or systematic changes in epigenetic patterns during development (intra-individual) or at the population level (inter-individual). This requires statistical methods that permit a simultaneous comparison of multiple ChIP samples on a global as well as locus-specific scale. Current analytical approaches are mainly geared toward single-sample investigations, and therefore have limited applicability in this comparative setting. This shortcoming presents a bottleneck in biological interpretations of multiple-sample data. To address this limitation, we introduce a parametric classification approach for the simultaneous analysis of two (or more) ChIP samples. We consider several competing models that reflect alternative biological assumptions about the global distribution of the data. Inferences about locus-specific and genome-wide chromatin differences are reached through the estimation of multivariate mixtures. Parameter estimates are obtained using an incremental version of the Expectation-Maximization algorithm (IEM). We demonstrate efficient scalability and application to three very diverse ChIP-chip and ChIP-seq experiments. The proposed approach is evaluated against several published ChIP-chip and ChIP-seq software packages. We recommend its use as a first-pass algorithm to identify candidate regions in the epigenome, possibly followed by some type of second-pass algorithm to fine-tune detected peaks in accordance with biological or technological criteria. R source code is available at http://gbic.biol.rug.nl/supplementary/2009/ChromatinProfiles/. Access to ChIP-seq data: GEO repository GSE17937.
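
The mixture-estimation idea in the abstract above can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: it fits a one-dimensional, two-component Gaussian mixture with ordinary batch EM, whereas the paper describes multivariate mixtures fitted with an incremental EM variant. The Gaussian form, the function names, and the simulated "background" and "enriched" signal values are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """Batch EM for a 1-D two-component Gaussian mixture (illustrative only)."""
    n = len(values)
    lo, hi = min(values), max(values)
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n) or 1.0

    # Crude starting point: equal weights, means at the quartile positions
    # of the observed range, and a shared standard deviation.
    weights = [0.5, 0.5]
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [overall_sd, overall_sd]
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for every value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weight, mean and sd from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its old parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps the density finite

    return weights, means, sds, resp


# Simulated per-locus signal (hypothetical data): 300 background loci
# around 0 and 100 enriched loci around 4.
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(300)]
signal += [random.gauss(4.0, 1.0) for _ in range(100)]

weights, means, sds, resp = fit_two_component_mixture(signal)

# Loci whose posterior probability for the higher-mean component exceeds 0.5.
enriched = [i for i, r in enumerate(resp) if r[1] > 0.5]
print("weights:", weights)
print("means:", means)
print("enriched loci:", len(enriched))
```

Classifying loci by posterior probability, as in the last step, is in the spirit of the first-pass candidate detection the abstract recommends; a comparison of two or more samples would need a joint (multivariate) model rather than this single-sample version.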

Computational methodology for ChIP-seq analysis

Quantitative biology, 2013

Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of DNA binding proteins such as transcription factors or modified histones. As more and more experimental laboratories are adopting ChIP-seq to unravel the transcriptional and epigenetic regulatory mechanisms, computational analyses of ChIP-seq also become increasingly comprehensive and sophisticated. In this article, we review current computational methodology for ChIP-seq analysis, recommend useful algorithms and workflows, and introduce quality control measures at different analytical steps. We also discuss how ChIP-seq could be integrated with other types of genomic assays, such as gene expression profiling and genome-wide association studies, to provide a more comprehensive view of gene regulatory mechanisms in important physiological and pathological processes.

The ChIP-Seq tools and web server: a resource for analyzing ChIP-seq and other types of genomic data

BMC genomics, 2016

ChIP-seq and related high-throughput chromatin profiling assays generate ever-increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data. Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory efficiency and speed, thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources ...

jChIP: a graphical environment for exploratory ChIP-Seq data analysis

BMC Research Notes, 2014

Background: Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-Seq) provides a powerful tool for discovering protein-DNA interactions. Still, the computational analysis of the large amount of ChIP-Seq data generated, which involves mapping raw data to a reference genome, has been a bottleneck for most researchers in the transcriptional and epigenetic fields. Thus, user-friendly ChIP-Seq processing methods are much needed to enable a greater community of computational and bench biologists to exploit the power of ChIP-Seq technology. Findings: jChIP is a graphical tool that was developed to analyze and display ChIP-Seq data. It matches reads to the corresponding loci downloaded from Ensembl Genes or Ensembl Regulation databases. jChIP provides a friendly interface for exploratory analysis of mapped reads as well as peak calling data. The built-in functions for graphical display of read distribution make it possible to evaluate the quality and meaning of ChIP-Seq data. Conclusion: jChIP is user-friendly GUI-based software for the analysis of ChIP-Seq data within the context of known genomic features. Further, jChIP provides tools for discovering new and refining known genome-wide protein binding patterns.

Standardizing chromatin research: a simple and universal method for ChIP-seq

Nucleic acids research, 2015

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a key technique in chromatin research. Although heavily applied, existing ChIP-seq protocols are often highly fine-tuned workflows, optimized for specific experimental requirements. The initial steps of ChIP-seq, particularly chromatin shearing, are deemed to be exceedingly cell-type-specific, thus impeding any protocol standardization efforts. Here we demonstrate that harmonization of ChIP-seq workflows across cell types and conditions is possible when obtaining chromatin from properly isolated nuclei. We established an ultrasound-based nuclei extraction method (NEXSON: Nuclei EXtraction by SONication) that is highly effective across various organisms, cell types and cell numbers. The described method has the potential to replace complex cell-type-specific, but largely ineffective, nuclei isolation protocols. By including NEXSON in ChIP-seq workflows, we completely eliminate the need for e...

ChIP-seq: Using high-throughput sequencing to discover protein–DNA interactions

Methods, 2009

Chromatin immunoprecipitation (ChIP) allows specific protein-DNA interactions to be isolated. Combining ChIP with high-throughput sequencing reveals the DNA sequence involved in these interactions. Here, we describe how to perform ChIP-seq starting with whole tissues or cell lines, and ending with millions of short sequencing tags that can be aligned to the reference genome of the species under investigation. We also outline additional procedures to recover ChIP-chip libraries for ChIP-seq and discuss contemporary issues in data analysis.