Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes - PubMed

doi: 10.1038/nmeth.3205. Epub 2014 Dec 1.

Wenxiu Ma 1, Ferhat Ay 1, Choli Lee 1, Gunhan Gulsoy 1, Xinxian Deng 2, Savannah Cook 3, Jennifer Hesson 3, Christopher Cavanaugh 3, Carol B Ware 3, Anton Krumm 4, Jay Shendure 1, Carl Anthony Blau 5, Christine M Disteche 2, William S Noble 1, Zhijun Duan 5

Wenxiu Ma et al. Nat Methods. 2015 Jan.

Abstract

High-throughput methods based on chromosome conformation capture have greatly advanced our understanding of the three-dimensional (3D) organization of genomes but are limited in resolution by their reliance on restriction enzymes. Here we describe a method called DNase Hi-C for comprehensively mapping global chromatin contacts. DNase Hi-C uses DNase I for chromatin fragmentation, leading to greatly improved efficiency and resolution over that of Hi-C. Coupling this method with DNA-capture technology provides a high-throughput approach for targeted mapping of fine-scale chromatin architecture. We applied targeted DNase Hi-C to characterize the 3D organization of 998 large intergenic noncoding RNA (lincRNA) promoters in two human cell lines. Our results revealed that expression of lincRNAs is tightly controlled by complex mechanisms involving both super-enhancers and the Polycomb repressive complex. Our results provide the first glimpse of the cell type-specific 3D organization of lincRNA genes.

Figures

Figure 1

Figure 1. Validation of DNase Hi-C

(a) Overview of DNase Hi-C and targeted DNase Hi-C. For details see Online Methods. (b) Boxplots showing the comparison of chromatin accessibility (DHSs)-associated biases between DNase Hi-C (dark blue) and RE Hi-C libraries (light blue; for details see Supplementary Note 4). Whisker widths are w = 0.5 and outliers are not shown. Data of the two biological replicates of H1 ESC HindIII Hi-C libraries are from Dixon et al. and the K562 HindIII Hi-C library is from Lieberman-Aiden et al. (c) Boxplots showing the comparison of chromatin accessibility bias at the scale of open/closed chromatin compartment between DNase Hi-C and RE Hi-C libraries. The ratio of observed over expected read coverage (Supplementary Note 4) of each 1 Mb window located in the active (Open) or inactive (Closed) compartments was computed and is shown here for both DNase (dark blue) and HindIII (light blue) Hi-C K562 libraries. Whisker widths are w = 0.5 and outliers are not shown. Both the compartment calls and the RE-based Hi-C data for K562 cells are from Lieberman-Aiden et al. (d) Boxplots showing the comparison of overall bias between DNase Hi-C and RE Hi-C libraries (two biological replicates). The total number of long-range (>20 kb intra- and inter-chromosomal) contacts associated with each bin was computed, divided by the overall mean and plotted for each library at a resolution of 40 kb. Whisker widths are w = 1 and outliers are not shown. (e) Comparison of genome coverage by DNase Hi-C and RE-based Hi-C libraries. The percent of the genome covered with at least one read (long-range (>1 kb), uniquely mapped, nonredundant read pairs) is shown for two DNase Hi-C libraries (H1 ESCs and K562). Each track measures paired-end reads subsampled to 15 M and 30 M (subsampling repeated 20 times for each number, standard deviation is negligible) for each library and the last measurement corresponds to the full library sequencing depth. Dashed line indicates the maximum theoretical coverage of the human genome (hg19) by a Hi-C library generated by using the HindIII enzyme.
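
The per-bin normalization described for panel (d) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (tuples of chromosome and position for each read pair) and the choice to average only over bins that receive at least one contact are all assumptions made here.

```python
from collections import Counter

BIN_SIZE = 40_000       # 40 kb resolution, as stated for panel (d)
MIN_DISTANCE = 20_000   # intra-chromosomal pairs must span more than 20 kb


def relative_bin_coverage(contacts, bin_size=BIN_SIZE, min_distance=MIN_DISTANCE):
    """Return {(chrom, bin_index): long-range contact count / mean count}.

    `contacts` is an iterable of (chrom1, pos1, chrom2, pos2) read pairs
    (hypothetical layout). Bins with no long-range contact are left out of
    the mean, which is an assumption of this sketch.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in contacts:
        is_inter = chrom1 != chrom2
        if not is_inter and abs(pos1 - pos2) <= min_distance:
            continue  # short-range intra-chromosomal pair, not counted
        # A set ensures a pair whose two ends share a bin is counted once.
        for key in {(chrom1, pos1 // bin_size), (chrom2, pos2 // bin_size)}:
            counts[key] += 1
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {key: n / mean for key, n in counts.items()}


pairs = [
    ("chr1", 10_000, "chr1", 15_000),  # 5 kb apart, skipped
    ("chr1", 10_000, "chr1", 90_000),  # 80 kb apart, kept
    ("chr1", 10_000, "chr2", 5_000),   # inter-chromosomal, kept
]
ratios = relative_bin_coverage(pairs)
```

A ratio near 1 for most bins indicates uniform coverage; a wide spread of ratios across bins is what the boxplots in panel (d) would show as bias.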

Figure 2

Figure 2. Validation of targeted DNase Hi-C

(a) A profile of targeted DNase Hi-C contacts within 250 kb of the HS2-HS3 region of the beta-globin LCR in H1 and K562 cells. Red arrows indicate the position of the target in each domainogram. The geometric mean of read coverage in 5 kb sliding windows (computed in overlapping offsets of 1 kb) in each domainogram is indicated on the y-axis. The color scale of each domainogram was set according to the range of geometric means in 12 kb windows (also computed with 1 kb offsets), and the contact frequency corresponding to the color scale is indicated. In the spline fitting plots, the high-confidence bins (red dots) and those with FDR<0.05 (blue dots) and FDR<0.1 (green dots) are indicated, and the positions of merged high-confidence contacts are highlighted with pink bars. Also shown are the high-confidence contacts identified by targeted DNase Hi-C, the DNase hypersensitive sites (DHS) track from the UCSC Genome Browser (using data from the ENCODE Project Consortium), DNase Hi-C read coverage (at 1 bp resolution), topological domains, the virtual 4C of the target region generated from the H1 or K562 DNase Hi-C dataset, and the RefSeq genes. The beta-globin genes are highlighted in bright brown. (b) Reproducibility of the contact profiles of the Nanog promoter. Intra-chromosomal contacts within 250 kb of the target region are shown by domainograms. The contact frequency (geometric mean of the window coverage) corresponding to the color scale is indicated. Red arrows indicate the position of the target. The GDF3-DPPA3-NANOG locus is highlighted in green. The virtual 4C of the target region generated from the H1 or K562 DNase Hi-C dataset and the RefSeq genes are also shown.
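
The windowed geometric mean used for the domainograms can be illustrated with a short sketch. It assumes read coverage has already been summarized as one count per 1 kb bin, so a window of 5 bins with a step of 1 bin corresponds to 5 kb windows at 1 kb offsets. The pseudocount is an assumption added here so that zero-coverage bins do not force a window's mean to zero; the caption does not say how zeros were handled.

```python
import math


def sliding_geometric_mean(coverage, window=5, step=1, pseudocount=1.0):
    """Geometric mean of coverage in overlapping windows.

    `coverage` holds one read count per 1 kb bin (hypothetical input).
    The pseudocount is added before taking logs and subtracted afterwards;
    with pseudocount=0 every count in a window must be positive.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    for start in range(0, len(coverage) - window + 1, step):
        logs = [math.log(c + pseudocount) for c in coverage[start:start + window]]
        means.append(math.exp(sum(logs) / window) - pseudocount)
    return means


# Six 1 kb bins give two overlapping 5 kb windows.
profile = sliding_geometric_mean([1, 1, 1, 1, 1, 1])
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single bin with very high coverage, which is one plausible reason to prefer it for contact profiles.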

Figure 3

Figure 3. The intra-chromosomal contact profile within 500 kb of the HOTAIR promoter in H1 and K562 cells

The color scale of each domainogram is indicated. The spline fitting plots, the high-confidence contacts, the DHS track, the DNase Hi-C read coverage (at 1 bp resolution) and topological domains are also shown.

Figure 4

Figure 4. Identification of lincRNA promoter-associated _cis_-elements

(a) Percentage of the 7-8 chromatin state labels (Supplementary Table 18) that overlap with the lincRNA promoter-associated target partners in H1 and K562 cells. Z-scores and p-values were calculated using the Genome Structure Correlation (GSC). Red indicates enrichment, and blue represents depletion. (b) Percentage of the lincRNA promoter-associated target partners that overlap with the various chromatin state labels in H1 and K562 cells. (c) Percentage of the DHSs and FAIRE peak regions that overlap with the lincRNA promoter-associated target partners in H1 and K562 cells. In (a), (b), and (c), * 3 < |Z-score| < 5, ** |Z-score| ≥ 5. In each cell line, the combined targeted DNase Hi-C library of the two biological replicates was used for these analyses.
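
The significance markers in this legend reduce to two thresholds on the absolute Z-score. The helper below is only an illustration of how those markers and the enrichment/depletion direction follow from a Z-score; the function names are hypothetical and the Z-scores themselves would come from the GSC analysis, which is not reproduced here.

```python
def z_score_label(z):
    """Return the legend marker for a Z-score: '*' for 3 < |Z| < 5, '**' for |Z| >= 5."""
    magnitude = abs(z)
    if magnitude >= 5:
        return "**"
    if magnitude > 3:
        return "*"
    return ""  # |Z| <= 3 carries no marker


def z_score_direction(z):
    """Positive Z-scores indicate enrichment, negative ones depletion."""
    if z > 0:
        return "enrichment"
    if z < 0:
        return "depletion"
    return "none"


labels = [(z, z_score_label(z), z_score_direction(z)) for z in (2.0, -4.2, 6.5)]
```

Note that a Z-score of exactly 3 receives no marker under the legend as written, since the lower bound is strict.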

Figure 5

Figure 5. Characterization of contacts connecting lincRNA promoters to super-enhancers

(a) Enrichment of super-enhancers in all (“overall”), active, or inactive lincRNA promoter-associated target partners. Enrichments are computed with respect to an artificial genome composed of regions <10 Mb away from one of the 998 target loci (Online Methods). * 3 < |Z-score| < 5, ** |Z-score| ≥ 5. (b) Left panel: distribution of the number of associated promoters per super-enhancer in H1 and K562 cells. Right panel: distribution of the number of associated super-enhancers per target lincRNA promoter. (c) Distribution of genomic distances separating the target lincRNA promoters and their associated super-enhancers in H1 and K562 cells. (d) Receiver operating characteristic curves showing that the expression levels of lincRNA genes associated with super-enhancers are higher than those of lincRNA genes not associated with super-enhancers in H1 and K562 cells. Reported p-values are from the Wilcoxon rank sum test.
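
Panel (d) pairs ROC curves with Wilcoxon rank sum p-values; the two are linked because the area under the ROC curve equals the rank sum (Mann-Whitney) U statistic divided by the product of the group sizes. The sketch below computes that area from two lists of expression values. It is an illustration with hypothetical names and toy numbers, not the authors' analysis, and it does not compute a p-value.

```python
def rank_sum_auc(group_a, group_b):
    """AUC for 'group_a values tend to exceed group_b values'.

    Uses AUC = U / (n_a * n_b), where U is the Mann-Whitney U statistic of
    group_a computed from midranks (tied values share their average rank).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a / (n_a * n_b)


# Toy expression values (made up for illustration only).
auc = rank_sum_auc([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

An AUC of 0.5 means the two groups are indistinguishable by rank, while values approaching 1 mean that genes in the first group almost always have higher expression than genes in the second.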
