Using FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) to isolate active regulatory DNA

Author manuscript; available in PMC: 2013 Sep 26.

Published in final edited form as: Nat Protoc. 2012 Jan 19;7(2):256–267. doi: 10.1038/nprot.2011.444

Abstract

Eviction or destabilization of nucleosomes from chromatin is a hallmark of functional regulatory elements of the eukaryotic genome. Historically identified by nuclease hypersensitivity, these regulatory elements are typically bound by transcription factors or other regulatory proteins. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) is an alternative approach to identify these genomic regions and has proven successful in a multitude of eukaryotic cell and tissue types. Cells or dissociated tissues are crosslinked briefly with formaldehyde, lysed, and sonicated. Sheared chromatin is subjected to phenol-chloroform extraction and the isolated DNA, typically encompassing 1–3% of the human genome, is purified. We provide guidelines for quantitative analysis by PCR, microarrays, or next-generation sequencing. Regulatory elements enriched by FAIRE display high concordance with those identified by nuclease hypersensitivity or ChIP, and the entire procedure can be completed in three days. FAIRE exhibits low technical variability, which allows its use in large-scale studies of chromatin from normal or diseased tissues.

Keywords: FAIRE, open chromatin, nucleosome, next-generation sequencing

Introduction

Understanding the regulation of transcription by sequence-specific regulatory factors and the subsequent remodeling of chromatin is central to studies of health and disease. The activities of regulatory factors at promoters, enhancers, silencers, and insulators typically cause nucleosomes to be evicted from chromatin in eukaryotic cells1. Therefore, one of the most effective means of discovering transcriptional regulatory elements is to identify nucleosome-depleted regions ("open chromatin"). Historically, this was accomplished by exploiting regional hypersensitivity to nucleases such as DNase I2–9. More recently, we demonstrated an alternative method for detecting open chromatin, which we termed FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements)10–13. FAIRE was first characterized in yeast and subsequently applied to human cells and tissues13–16. The technique has proven useful for a wide range of eukaryotes, from Plasmodium17 to maize18. Here, we present recent methodological enhancements that improve the utility and reliability of FAIRE, especially for use on tissues or lipid-laden cells such as adipocytes.

Overview

FAIRE does not rely on antibodies or enzymes; it is based on differences in crosslinking efficiency between DNA and nucleosomes or sequence-specific DNA-binding proteins. DNA in nucleosome-depleted regions of chromatin (for example, regions depleted through the activity of a sequence-specific regulatory factor) is much less efficiently crosslinked to protein12. DNA that is not crosslinked to protein segregates to the aqueous phase during phenol-chloroform extraction. In contrast, DNA covalently linked to proteins does not partition into the aqueous phase and instead becomes trapped at the interface between the organic and aqueous phases. To perform FAIRE (Figure 1), cells or dissociated tissues are crosslinked briefly with formaldehyde, lysed, and sonicated. The sheared chromatin is then subjected to phenol-chloroform extraction, and the DNA in the aqueous phase is purified and assayed. FAIRE-enriched DNA can be detected by one of several quantitative approaches: quantitative PCR (FAIRE-qPCR)13, hybridization to a tiling DNA microarray (FAIRE-chip)11,13, or next-generation sequencing (FAIRE-seq)13,16. Because of the declining cost of sequencing and the higher quality and resolution of sequencing-based data, FAIRE-seq has now largely supplanted FAIRE-chip and FAIRE-qPCR, especially for large genomes but also, through multiplexing, for smaller ones. Analysis of next-generation sequencing data requires alignment of high-quality reads to a reference genome (e.g., with Bowtie19) followed by detection of regions of significant enrichment (we recommend ZINBA20). Bowtie and ZINBA are both freely available.

Figure 1.


Example timeline for FAIRE protocol. Steps are grouped by day according to the typical timeline; using the pause points will extend the total duration.

Applications

Our lab has used FAIRE extensively to characterize active regulatory elements of several human cell lines as part of the ENCODE consortium21, as well as different cell, tissue, and tumor samples from humans, mice, and other eukaryotes. FAIRE has been used to create catalogs of regulatory elements in normal or diseased cells13,14,16, narrow the search space for causal sequence variants in human disease13,22, and understand the interactions between transcription factors and chromatin remodeling23,24. When coupled with high-throughput sequencing, FAIRE can also be used to identify both large- and small-scale structural variations such as copy number variants (CNV)20.

Comparison with other methods

We have previously shown10 that regions of the yeast genome enriched by FAIRE are anti-correlated with occupancy of histones H3 and H4, and that FAIRE regions encompass promoters, enhancers, insulators, and other regulatory elements, most of which are also captured by DNase I hypersensitivity assays10,12–14,16. An in-depth comparison of the regulatory elements captured by FAIRE, DNase I hypersensitivity, and ChIP-seq found that although a large set of elements was identified by all methods, each assay also identified a unique set of features16. FAIRE detected some distal regulatory elements, such as transcriptional enhancers, that DNase-seq did not, whereas DNase-seq identified some promoters that FAIRE did not.

Advantages of FAIRE

Antibody and enzyme independence

In contrast to ChIP, which is highly susceptible to problems of antibody reliability and variability25, FAIRE offers the consistency of a chemical-based isolation. Moreover, FAIRE does not require enzymes such as DNase or MNase, which are commonly used in analogous methods for detecting nucleosome-free regions. Avoiding the optimization and extra steps required for enzymatic digestion or immunoprecipitation eliminates a major source of variation and makes FAIRE a more reliable and robust method.

Enhancer detection

As described in Comparison with other methods and in Song et al16, FAIRE may identify additional transcriptional enhancers and other distal regulatory elements in comparison to other methods such as DNase-seq.

Sequenced input control not required

As discussed in Rashid et al20, a sequenced input control is not required for proper analysis of FAIRE-enriched regions. This reduces next-generation sequencing costs as well as the cost of reagents.

Applicability to tissue samples

Since FAIRE does not require a single-cell suspension or nuclear isolation, it is easily adapted for use on tissue samples. The only additional step needed is pulverization of frozen tissue into a coarse powder prior to fixation.

Limitations

Promoter detection

As described in Comparison with other methods and in Song et al16, other methods, such as DNase-seq, may be better at identifying nucleosome-depleted promoters of highly expressed genes.

Analysis

As noted below in Experimental design, although FAIRE is relatively straightforward experimentally, extensive computational processing and analysis are required for comprehensive interpretation of genome-wide results. Groups without access to bioinformatics specialists, or to computers with sufficient memory, computing power, and storage capacity, may find these results challenging to interpret. Quantifying FAIRE signal by qPCR or microarrays may be more straightforward.

Absence of transcription factor footprinting

Transcription factor motifs can be identified in regions of open chromatin detected by FAIRE. However, the higher resolution and signal-to-noise ratio of DNase-seq permit detection of individual transcription factor footprints in very deeply sequenced data1.

Low signal-to-noise

Relative to ChIP-seq or DNase-seq, FAIRE has a lower signal-to-noise ratio. As a result, sites detected by FAIRE can at times be only marginally enriched above background, which reduces confidence in those sites. This effect can be exacerbated when using detection methods that are not based on sequencing. Consequently, primer and array design, as well as the selection of control regions, are critical. Despite the low signal-to-noise ratio, we note that FAIRE is remarkably reproducible from experiment to experiment.

Fixation variation among tissues

Fixation efficiency can vary drastically for many reasons, including differences in cellularity, permeability, purity, fat content, and surface area. Although dissociation by pulverization seems to make fixation slightly more consistent than mincing or other methods, this variability can still lead to inconsistent results, so optimization is recommended.

Experimental design

Replicates

Studies utilizing FAIRE, like many other genome-wide assays, should include biological replicates. This entails the use of multiple independently grown batches of cells or tissues treated in the same fashion. Several methods have been developed for the assessment of concordance among replicates, such as Irreproducible Discovery Rate (IDR)26, which is currently employed by the ENCODE consortium. Methods like IDR often require a ranked set of statistically enriched regions, which can be obtained by most peak-calling algorithms, including ZINBA20 (see Analysis below).
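IDR models the full ranked list statistically and is the appropriate tool for formal analysis. As a much simpler illustration of why a ranked set of regions is useful, the sketch below (hypothetical helper names; not IDR and not part of ZINBA) asks what fraction of one replicate's top-N peaks overlap any of the other replicate's top-N peaks. Peaks are assumed to be (chromosome, start, end, score) tuples with half-open coordinates, where a higher score means stronger enrichment.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end, score),
# half-open coordinates, higher score = stronger enrichment.
Peak = Tuple[str, int, int, float]


def top_n(peaks: List[Peak], n: int) -> List[Peak]:
    """Return the n highest-scoring peaks."""
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def top_n_concordance(rep1: List[Peak], rep2: List[Peak], n: int) -> float:
    """Fraction of rep1's top-n peaks that overlap any of rep2's top-n peaks.

    Simple quadratic comparison: fine for a sketch, too slow for genome-wide peak sets.
    """
    t1, t2 = top_n(rep1, n), top_n(rep2, n)
    if not t1:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in t2) for p in t1)
    return hits / len(t1)
```

Computing this value for several values of n shows how agreement changes as weaker peaks are included, which is the intuition that IDR formalizes statistically.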

Control sample

For sequencing-based detection of FAIRE enrichment, we have found that a control sample such as genomic or input DNA, although always beneficial, is not strictly necessary for samples sequenced to sufficient depth and coverage20. When detecting enrichment by qPCR or tiling DNA microarrays, a genomic or input DNA sample is required as a reference.

Analysis

Although FAIRE is a relatively straightforward experimental protocol that can be completed in three days, extensive computational processing and analysis are required to interpret the results. This includes quality assessment of the sequencing libraries and of the sequencing reactions themselves, alignment to a reference genome, detection of enrichment, and assessment of concordance between replicates. We recommend a combination of the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and TagDust27 for quality control of the sequencing reactions and libraries, respectively. Although we typically use Bowtie19 for reference genome alignment, similar aligners such as BWA28 are equally suitable.

To detect regions of significant FAIRE enrichment ("peaks"), we found that methods such as MACS29 and Fseq30, though commonly and successfully used for ChIP-seq or DNase-seq data, do not perform well on FAIRE-seq data, likely because of its lower inherent signal-to-noise ratio. We therefore developed a statistical algorithm, ZINBA20. The regions identified by ZINBA can then be used to assess concordance among replicates with algorithms such as IDR26.

If possible, the data should be compared with existing maps of open chromatin, such as DNase-seq and FAIRE-seq data made available by the ENCODE consortium21, or with gene expression data. Because FAIRE enrichment at gene promoters is strongly linked to gene expression, strong FAIRE enrichment is expected around genes known to be highly expressed. A large fraction (~30–50%) of FAIRE-enriched regions lie in intergenic regions of the genome, and typically only ~5–15% of all FAIRE sites are at proximal promoters13,16. To determine whether an experiment was successful, we often examine the pattern at a locus on human chromosome 19 that produces a remarkably consistent level of FAIRE enrichment across cell types (see Anticipated Results).
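As a quick sanity check on the promoter-proximal fraction quoted above, a sketch like the following could be used. It assumes peaks as (chromosome, start, end) tuples and TSS positions as (chromosome, position) tuples, and it calls a peak promoter-proximal when its midpoint lies within a hypothetical ±1 kb window of any TSS. That window, and the function name, are illustrative assumptions, not definitions taken from the protocol.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


def promoter_fraction(peaks: List[Tuple[str, int, int]],
                      tss: List[Tuple[str, int]],
                      window: int = 1000) -> float:
    """Fraction of peaks whose midpoint lies within +/- window bp of any TSS.

    The 1 kb default is a hypothetical cutoff chosen for illustration.
    """
    if not peaks:
        return 0.0
    # Group TSS positions by chromosome and sort them for binary search
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in tss:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    proximal = 0
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        positions = by_chrom.get(chrom, [])
        # First TSS at or after (mid - window); proximal if it is also <= mid + window
        i = bisect.bisect_left(positions, mid - window)
        if i < len(positions) and positions[i] <= mid + window:
            proximal += 1
    return proximal / len(peaks)
```

A result far outside the ~5–15% range mentioned above would be a prompt to look more closely at the data, not a pass/fail criterion in itself.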

Detection method

In cases where a reference genome assembly is available, FAIRE coupled with high-throughput sequencing is likely the most cost-effective option, especially if multiplexing is applicable. In smaller eukaryotes or for very targeted experiments, detection by microarray or quantitative PCR may be preferable, but array and primer design will play a key role in the overall success of the experiment (see FAIRE-chip Microarray design and FAIRE-qPCR Primer design below).

Fixation

The most common reason for a failed FAIRE experiment is under-fixation of the cells. We have found that for most mammalian cells in culture, fixation with formaldehyde for five minutes is both adequate and ideal. The protocol below includes quantification of both input control and FAIRE DNA, and we describe a diagnostic for determining whether a sample has been insufficiently fixed. Tissues must first be pulverized into a coarse powder and then fixed for 7–9 minutes. The adequacy of fixation depends heavily on tissue size and composition and may therefore need to be optimized. Other techniques or adaptations may be required for plants or fungi, such as significantly increased fixation time10 or modified fixation solutions31. For lipid-laden cells, it may be beneficial to perform both fixation and cell lysis (to extract nuclei) before attempting to harvest the cells, as outlined below in step 1B.

Sonication

Sonication parameters must be optimized for each experiment due to variation in cell number, composition, sonicator and probe type, and fixation. In Figure 2, we present a representative agarose gel that provides examples of over-, under-, and sufficiently sonicated chromatin. Ideally, chromatin is sheared to a range of about 150–750 bp with an average fragment length around 300–400 bp. Sonication yielding average fragment sizes smaller than this can result in reduced detection of highly nucleosome-depleted regions. High molecular weight bands may be visible especially when beginning with frozen tissue, but their presence in lieu of a distribution of smaller fragments is indicative of under-sonication or poor cell lysis.
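The shearing criteria above can be expressed as a simple check on a sample of fragment lengths, for example lengths inferred from paired-end reads of the input library. This is a minimal sketch with hypothetical names. Note that an agarose gel or electrophoresis trace reports a mass-weighted distribution, so a mean computed from individual fragment counts will not match a gel estimate exactly.

```python
from statistics import mean
from typing import Dict, Sequence, Union


def assess_shearing(lengths_bp: Sequence[int],
                    size_range=(150, 750),
                    target_mean=(300, 400)) -> Dict[str, Union[float, str]]:
    """Summarize fragment lengths against the shearing targets given in the text."""
    if not lengths_bp:
        raise ValueError("no fragment lengths supplied")
    lo, hi = size_range
    # Fraction of fragments inside the 150-750 bp target window
    in_range = sum(lo <= x <= hi for x in lengths_bp) / len(lengths_bp)
    avg = mean(lengths_bp)
    if avg < target_mean[0]:
        # Over-sonication; under-fixation can also yield very small fragments (Table 1)
        call = "mean below target: possible over-sonication or under-fixation"
    elif avg > target_mean[1]:
        # Large fragments point to under-sonication or poor cell lysis
        call = "mean above target: possible under-sonication or poor lysis"
    else:
        call = "mean within target"
    return {"mean_bp": float(avg), "fraction_in_range": in_range, "call": call}
```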

Figure 2.


Representative gel image showing varying degrees of sonication. NIH3T3 cells were fixed and lysed as described above. Chromatin was then sheared by sonication for 0, 2, 4, 6, 8, and 10 cycles using the parameters outlined in step 2A. After clearing cell debris, crosslinks were reversed, and purified DNA was run on a 1% agarose gel. A 100 bp ladder (lane marked M) is included for reference. The target range for fragment sizes is shown. Six cycles yields an ideal distribution of fragment lengths; fewer than six cycles of sonication is insufficient for solubilization and shearing of chromatin, whereas sonication beyond six cycles leads to oversonication. A high molecular weight band is slightly visible and marked with an asterisk.

FAIRE-chip Microarray design

The two main considerations for microarray design are the resolution (or spacing) of the probes throughout the genome and the set of genomic loci covered by the probes. Resolution is the genomic distance from one probe to the next and must be dense enough to capture the physiologically relevant size of the DNA fragments recovered by FAIRE (~200 bp). Probe spacing should allow a minimum of 3 probes per FAIRE DNA fragment, or ~65 bp resolution. The set of genomic regions represented on the array matters because the results are interpreted relative to it: every measurement is expressed as a ratio of the FAIRE signal to that of a reference sample, and the ratios are normalized by centering on the mean ratio. The majority of probes should therefore span regions that correspond to background (not open) chromatin. A number of published protocols address specific aspects of array design and include recommendations for reliable detection32–41.
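The spacing rule and the mean-centering step described above can be written out as a short sketch. Two assumptions are ours: probes are treated as points, and ratios are centered on the log2 scale, a common convention for two-channel arrays that the protocol itself does not specify. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def max_probe_spacing(fragment_bp: float = 200.0, min_probes: int = 3) -> float:
    """Largest probe-to-probe spacing that still places min_probes probes in a fragment.

    Treats probes as points; 200 bp / 3 probes is about 67 bp, close to the ~65 bp in the text.
    """
    return fragment_bp / min_probes


def centered_log2_ratios(faire: Sequence[float], reference: Sequence[float]) -> List[float]:
    """log2(FAIRE / reference) per probe, centered on the mean across all probes.

    The log2 scale is an assumption; the text specifies centering on the mean ratio.
    """
    if len(faire) != len(reference) or not faire:
        raise ValueError("need equal-length, non-empty intensity lists")
    ratios = [math.log2(f / r) for f, r in zip(faire, reference)]
    center = sum(ratios) / len(ratios)
    return [x - center for x in ratios]
```

Because the center is the mean over every probe, it is dominated by whichever class of probes is most numerous. This is why background probes should be the majority if a centered value near zero is to mean "not enriched".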

FAIRE-qPCR Primer design

When detecting FAIRE enrichment by quantitative PCR, careful experimental design will maximize the chance of success. In addition to the method used to quantify the results, the selection of an appropriate set of control regions and the placement of primers play an important role in calculating relative enrichment. This is often difficult because true FAIRE-positive and -negative sites are rarely known a priori for a given cell or tissue type or growth condition. Data made available by the ENCODE consortium may be helpful in this regard21. We often use a tiling approach to detect open chromatin sites by qPCR, designing primer pairs so that the amplicons directly overlap or are closely spaced across the assayed genomic region. As a control, we recommend primer sets that flank the regions isolated by FAIRE. Because primers spanning or lying near the sonication breakpoints at the edges of FAIRE fragments are unlikely to amplify properly, primer pairs should be designed to amplify 60–100 bp within the center of the region of interest. Primer sets should be validated on a dilution series of input DNA to confirm consistent and proportionate amplification. For these and other reasons, FAIRE-chip and FAIRE-seq are strongly preferred over FAIRE-qPCR.
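The relative-enrichment calculation can be sketched as follows, under assumptions that are ours rather than the protocol's: amplification efficiency is estimated from the least-squares slope of Ct against log10 input amount on the dilution series (amplification factor = 10^(-1/slope), which equals 2 for perfect doubling), and the FAIRE/input signal at a target region is normalized to that at a control region. The tiling helper only lays out fixed-length amplicon windows; it does not design primer sequences. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def amplification_factor(log10_amounts: Sequence[float], cts: Sequence[float]) -> float:
    """Per-cycle amplification factor from a dilution series (2.0 = perfect doubling)."""
    n = len(cts)
    if n < 2 or len(log10_amounts) != n:
        raise ValueError("need at least two matched dilution points")
    mx = sum(log10_amounts) / n
    my = sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, cts))
    slope = sxy / sxx  # least-squares slope of Ct vs log10(amount)
    return 10 ** (-1.0 / slope)


def faire_over_input(ct_faire: float, ct_input: float, amp: float = 2.0) -> float:
    """Relative FAIRE/input signal at one amplicon; a lower FAIRE Ct means more template."""
    return amp ** (ct_input - ct_faire)


def normalized_enrichment(target: Tuple[float, float], control: Tuple[float, float],
                          amp: float = 2.0) -> float:
    """(FAIRE/input at target) / (FAIRE/input at control); tuples are (ct_faire, ct_input).

    A constant difference in the amount of FAIRE vs input DNA loaded cancels in this ratio.
    """
    return faire_over_input(*target, amp) / faire_over_input(*control, amp)


def tile_amplicons(start: int, end: int, amplicon_bp: int = 80,
                   gap_bp: int = 20) -> List[Tuple[int, int]]:
    """Fixed-length amplicon windows across [start, end), spaced gap_bp apart.

    The 60-100 bp bounds follow the text; a negative gap_bp gives overlapping amplicons.
    """
    if not 60 <= amplicon_bp <= 100:
        raise ValueError("amplicons should be 60-100 bp")
    step = amplicon_bp + gap_bp
    if step <= 0:
        raise ValueError("gap_bp is too negative")
    windows = []
    pos = start
    while pos + amplicon_bp <= end:
        windows.append((pos, pos + amplicon_bp))
        pos += step
    return windows
```

Normalizing to a control region rather than reporting raw FAIRE/input values is what makes the loading-amount factor drop out, provided both regions amplify with similar efficiency.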

Materials

Reagents

Equipment

Lab equipment

Computer and software

Reagent Setup

Equipment Setup

Procedure

Formaldehyde crosslinking and cell lysis. TIMING: Day 1, 4–6 hours

Sonication. TIMING: Day 1, 1–2 hours

CRITICAL STEP: Avoid foaming, which likely decreases sonication efficiency. If foaming occurs, let the sample settle on ice until the bubbles have subsided, or centrifuge briefly and gently resuspend all material. Probe positioning strongly influences both sonication efficiency and whether the sample foams. In most cases, the probe should be placed in the center of the tube approximately one-quarter to one-half inch from the bottom.

Preparation of input control DNA. TIMING: Day 1, 1.5 hours and overnight incubation

Purification and assessment of input control DNA. TIMING: Day 2, 3–4 hours

Preparation of FAIRE DNA. TIMING: Day 2, 3–4 hours and overnight incubation

Purification and assessment of FAIRE DNA. TIMING: Day 3, 1 hour

CRITICAL STEP: To test whether the FAIRE yield is within an acceptable range, we recommend dividing the total FAIRE yield (in nanograms) by the volume of cell lysate used for FAIRE (in µL; the number of lysate aliquots multiplied by the aliquot volume). Calculate the equivalent value for the input control (total input yield in nanograms divided by the volume of lysate used for the input control). The volume-normalized ratio of FAIRE DNA to input control DNA should not exceed 5% and will ideally fall in the 1–3% range. A retrieval ratio significantly higher than 5% often indicates under-fixation and may predict experimental failure due to poor signal enrichment.
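The yield check in this step amounts to a simple volume-normalized ratio. The sketch below uses hypothetical function and field names; the thresholds (5% maximum, 1–3% ideal, above 5% suggestive of under-fixation) are the ones given in this step.

```python
from dataclasses import dataclass


@dataclass
class YieldCheck:
    ratio_percent: float  # volume-normalized FAIRE yield as a percentage of input yield
    verdict: str


def faire_yield_check(faire_ng: float, faire_lysate_ul: float,
                      input_ng: float, input_lysate_ul: float) -> YieldCheck:
    """Compare FAIRE and input yields after normalizing each by the lysate volume used."""
    if faire_lysate_ul <= 0 or input_lysate_ul <= 0 or input_ng <= 0:
        raise ValueError("volumes and input yield must be positive")
    faire_ng_per_ul = faire_ng / faire_lysate_ul
    input_ng_per_ul = input_ng / input_lysate_ul
    ratio = 100.0 * faire_ng_per_ul / input_ng_per_ul
    if ratio > 5.0:
        verdict = "above 5%: often indicates under-fixation"
    elif 1.0 <= ratio <= 3.0:
        verdict = "within the ideal 1-3% range"
    else:
        verdict = "at or below 5% but outside the ideal 1-3% range"
    return YieldCheck(ratio, verdict)
```

For example, with hypothetical yields of 120 ng of FAIRE DNA from four 100 µL aliquots (400 µL) and 1,500 ng of input DNA from one 100 µL aliquot, the ratio is (120/400)/(1,500/100) = 0.3/15 = 2%.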

PAUSE POINT: FAIRE DNA can be frozen and stored indefinitely at −80°C.

TROUBLESHOOTING: see Table 1.

Detection of FAIRE enrichment and basic data analysis

Timing

Day 1: Steps 1–8 (approximately 7–8 hours and overnight incubation)

Day 2: Steps 9–31 (approximately 3–4 hours)

Day 2: Steps 32–56 (approximately 3–4 hours and overnight incubation)

Day 3: Steps 57–59 (approximately 1 hour)

Troubleshooting

Troubleshooting advice can be found in Table 1.

Table 1.

Troubleshooting

Step | Problem | Possible reason | Solution
29 | Low input control yield | Low starting cell number | Start the experiment with more cells or a larger tissue sample (step 1)
29 | Low input control yield | Poor cell lysis | Vary dissociation and cell lysis conditions (step 1)
30 | Sheared chromatin has incorrect average fragment length | Solution foamed | Make sure the sonicator tip is centered and located ¼ to ½ inch from the bottom of the tube, and that the sample has been cooled in an ice-water bath (step 2)
30 | Sheared chromatin has incorrect average fragment length | Under-sonication | Increase the number of sonication cycles (step 2)
30 | Sheared chromatin has incorrect average fragment length | Under-fixation | Insufficiently crosslinked chromatin produces very small fragments. Increase fixation time or vary fixation conditions (step 1)
36 | Aqueous layer is cloudy | Phenol may be overloaded because of a high cell number | Start the experiment with fewer cells or a smaller tissue sample (step 1)
58 | High DNA yield | Under-fixation | Insufficiently crosslinked chromatin leads to high DNA yields relative to the input control. Increase fixation time or vary fixation conditions (step 1)
58 | Low DNA yield | Low starting cell number | Start the experiment with more cells or a larger tissue sample (step 1)
58 | Low DNA yield | Over-fixation | Over-crosslinking reduces recovery of nucleosome-depleted regions. Reduce fixation time or vary fixation conditions (step 1)
59 | Poor signal-to-noise | Under-fixation | Insufficiently crosslinked chromatin leads to decreased enrichment by FAIRE. Increase fixation time or vary fixation conditions (step 1)

Anticipated Results

Visualize FAIRE-seq or FAIRE-chip data in a browser such as the UCSC Genome Browser47. For data from human cells or tissues, we expect to see enrichment similar to that presented in Figure 3a. This genomic locus on chromosome 19 contains several genes that each contain a nucleosome-depleted promoter detectable in nearly every cell or tissue type assayed to date, including all Tier-1 and Tier-2 cell types assayed by ENCODE (a total of 19 cell types to date)21. Additionally, there are some cell-type-selective regions of open chromatin, such as the region immediately upstream of CNOT3, which is selective for embryonic stem cells and HepG2. The aggregated FAIRE signal around all transcription start sites (TSS) ranked by their gene expression should be similar to that presented in Figure 3b, showing a strong nucleosome-free region approximately 125 bp upstream of TSS and depletion (representing a well-positioned nucleosome) immediately downstream of TSS. The average signal across all genes is presented in Figure 3c. The number of regions of the genome enriched by FAIRE should be approximately 100,000 in any given cell or tissue type. FAIRE additionally detects distal regulatory regions, such as those marked by CTCF (Figure 3d).
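The aggregate signal around TSS described above can be approximated with a short sketch. It assumes per-base FAIRE signal stored as one list per chromosome (0-based coordinates) and TSS given as (chromosome, position, strand) tuples; minus-strand windows are reversed so that negative offsets are always upstream. The function name is hypothetical, and the sketch averages raw signal without the normalization used for Figure 3.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical TSS representation: (chromosome, 0-based position, strand "+" or "-")
TSS = Tuple[str, int, str]


def tss_profile(coverage: Dict[str, Sequence[float]], tss_list: List[TSS],
                flank: int = 3000) -> List[float]:
    """Mean signal at each offset from -flank to +flank around TSS.

    Index i of the result corresponds to offset (i - flank); negative offsets are upstream.
    TSS whose window runs off a chromosome end, or whose chromosome is absent, are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for chrom, pos, strand in tss_list:
        cov = coverage.get(chrom)
        if cov is None or pos - flank < 0 or pos + flank >= len(cov):
            continue
        window = list(cov[pos - flank: pos + flank + 1])
        if strand == "-":
            # On the minus strand, upstream lies at higher coordinates, so flip the window
            window.reverse()
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no TSS had a complete window")
    return [t / used for t in totals]
```

With real data, the offset of the profile maximum (its index minus flank) can be compared with the nucleosome-free region expected roughly 125 bp upstream of the TSS.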

Figure 3.


Expected results from FAIRE-seq experiments. A. Genomic locus on chromosome 19, visualized with the UCSC Genome Browser47, showing consistent FAIRE enrichment at transcription start sites (TSS) across seven ENCODE cell lines16. Data are presented as the number of aligned, in silico extended reads per base, on a scale of 0 to 50 reads. Pink coloring atop tall peaks marks where the signal exceeded this range. B. Heatmap of normalized GM12878 FAIRE signal ±3 kb around TSS ranked by gene expression in GM12878 cells. Color was assigned on a log2 scale from −6 (background) to −2 (enriched). C. Average GM12878 FAIRE signal ±3 kb around TSS across all genes; enrichment peaks at around −125 bp. D. Average GM12878 FAIRE signal ±3 kb around GM12878 CTCF sites, representing a class of distal regulatory elements.

Acknowledgments

We would like to acknowledge members of the Lieb and Davis labs for their constructive feedback. Support for this work was provided by grants from the NHGRI.

Footnotes

Key references:

Hogan, G.J., Lee, C.-K., and Lieb, J.D., PLoS Genet 2 (9), e158 (2006).

Giresi, P.G., Kim, J., McDaniell, R.M. et al., Genome Res 17 (6), 877 (2007).

Giresi, P.G. and Lieb, J.D., Methods 48 (3), 233 (2009).

Author Contributions

The work presented here was carried out in collaboration between all authors. PG and JMS designed and improved the method. JMS, PG, IJD, and JDL wrote the manuscript. All authors have contributed to, seen, and approved of the manuscript.

Competing Interests

The authors declare that they have no competing financial interests.

Contributor Information

Jeremy M. Simon, Email: jmsimon@unc.edu.

Paul G. Giresi, Email: paulg@email.unc.edu.

Ian J. Davis, Email: ian_davis@med.unc.edu.

Jason D. Lieb, Email: jlieb@bio.unc.edu.

References