Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes

Jill M Dowen et al. Cell. 2014.

Abstract

The pluripotent state of embryonic stem cells (ESCs) is produced by active transcription of genes that control cell identity and repression of genes encoding lineage-specifying developmental regulators. Here, we use ESC cohesin ChIA-PET data to identify the local chromosomal structures at both active and repressed genes across the genome. The results produce a map of enhancer-promoter interactions and reveal that super-enhancer-driven genes generally occur within chromosome structures that are formed by the looping of two interacting CTCF sites co-occupied by cohesin. These looped structures form insulated neighborhoods whose integrity is important for proper expression of local genes. We also find that repressed genes encoding lineage-specifying developmental regulators occur within insulated neighborhoods. These results provide insights into the relationship between transcriptional control of cell identity genes and control of local chromosome structure.

Copyright © 2014 Elsevier Inc. All rights reserved.

Figures

Figure 1. DNA interactions involving cohesin

A) Units of chromosome organization. Chromosomes consist of multiple Topologically Associating Domains (TADs). TADs (image adapted from Dixon et al., 2012) contain multiple genes with DNA loops involving interactions between enhancers, promoters and other regulatory elements, which are mediated by cohesin (blue ring) and CTCF (purple balls). Nucleosomes represent the smallest unit of chromosome organization. B) Heatmap representation of ESC ChIP-seq data for SMC1, a merged dataset for the transcription factors OCT4, SOX2 and NANOG (OSN), MED12, RNA polymerase II (Pol2), H3K27me3, and CTCF at SMC1-occupied regions. Read density is displayed within a 10 kb window and color scale intensities are shown in rpm/bp. Cohesin occupies three classes of sites: enhancer-promoter sites, Polycomb-occupied sites, and CTCF-occupied sites. C) ESC cohesin (SMC1) ChIA-PET data analysis at the Mycn locus. The algorithm used to identify paired-end tags (PETs) is described in detail in Extended Experimental Procedures. PETs and interactions involving enhancers and promoters within the window are displayed at each step in the analysis pipeline: unique PETs, PET peaks, interactions between PET peaks, and high-confidence interactions supported by at least 3 independent PETs and with an FDR of 0.01. D) Summary of the major classes of interactions and high-confidence interactions identified in the cohesin ChIA-PET data. Enhancers, promoters, and CTCF sites where interactions occur are displayed as blue circles, and the size of each circle is proportional to the number of regions. The interactions between two sites are displayed as grey lines, and the thickness of each grey line is proportional to the number of interactions. The diagram on the left was generated using the interactions, and the diagram on the right was generated using the high-confidence interactions. See also Figure S1, S2, Table S1, S2.
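The final filtering step named in panel C (at least 3 independent PETs and an FDR of 0.01) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which is described in the Extended Experimental Procedures. The `Interaction` record, its field names, the toy values, and the choice to treat the FDR cutoff as inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one interaction between two PET peaks."""
    anchor_a: str
    anchor_b: str
    pet_count: int   # number of independent PETs supporting the interaction
    fdr: float       # false discovery rate assigned to the interaction


def high_confidence(
    interactions: Iterable[Interaction],
    min_pets: int = 3,
    max_fdr: float = 0.01,
) -> List[Interaction]:
    """Keep interactions with enough PET support and a low enough FDR.

    The inclusive comparison on max_fdr is an assumption of this sketch.
    """
    return [
        i for i in interactions
        if i.pet_count >= min_pets and i.fdr <= max_fdr
    ]


# Toy candidates (invented values, not data from the paper).
candidates = [
    Interaction("enhancer_1", "promoter_1", pet_count=5, fdr=0.001),
    Interaction("enhancer_2", "promoter_2", pet_count=2, fdr=0.001),  # too few PETs
    Interaction("ctcf_1", "ctcf_2", pet_count=4, fdr=0.05),           # FDR too high
    Interaction("ctcf_3", "ctcf_4", pet_count=3, fdr=0.01),
]

kept = high_confidence(candidates)
for interaction in kept:
    print(interaction.anchor_a, interaction.anchor_b, interaction.pet_count)
```

Both criteria must hold at once, so an interaction with many PETs but a poor FDR is dropped just like one with too few PETs.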

Figure 2. DNA interactions frequently occur within Topologically Associating Domains

A) An example Topologically Associating Domain (TAD), shown with normalized Hi-C interaction frequencies displayed as a two-dimensional heat map (Dixon et al., 2012); the TAD is indicated as a grey bar. High-confidence SMC1 ChIA-PET interactions are depicted as blue lines. B) Enrichment of CTCF, cohesin (SMC1), and PET peaks at TAD boundary regions. The metagene representation shows the number of regions per 10 kb window centered on the TAD boundary, and +/−500 kb is displayed. C) Pie chart of high-confidence interactions that either fall within TADs (88%) or cross TAD boundaries (12%). D) High-confidence interactions are displayed as a two-dimensional heat map across a normalized TAD length for the ~2,200 TADs (Dixon et al., 2012). The display is centered on the normalized TAD and extends beyond each boundary to 10% of the size of the domain. See also Table S3A.
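The within-TAD versus boundary-crossing split in panel C can be sketched in Python as follows. The sketch assumes that an interaction counts as "within" a TAD when both of its anchors lie inside the same TAD interval; the paper's exact criterion is not given in this legend. The tuple layout, function names, and coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end); for an interaction, start and end are the
# outer coordinates of its two anchors. Hypothetical layout.
Interval = Tuple[str, int, int]


def is_within_tad(interaction: Interval, tads: List[Interval]) -> bool:
    """True if the whole interaction span lies inside a single TAD."""
    chrom, start, end = interaction
    return any(
        tad_chrom == chrom and tad_start <= start and end <= tad_end
        for tad_chrom, tad_start, tad_end in tads
    )


def fraction_within_tads(interactions: List[Interval], tads: List[Interval]) -> float:
    """Fraction of interactions contained in one TAD (0.0 for an empty list)."""
    if not interactions:
        return 0.0
    contained = sum(is_within_tad(i, tads) for i in interactions)
    return contained / len(interactions)


# Toy coordinates (invented, not data from the paper).
toy_tads = [("chr1", 0, 1000), ("chr1", 1000, 2000)]
toy_interactions = [
    ("chr1", 100, 900),    # inside the first TAD
    ("chr1", 900, 1100),   # spans the boundary at 1000
    ("chr1", 1200, 1900),  # inside the second TAD
    ("chr2", 10, 20),      # no TAD annotated on chr2 in this toy set
]

toy_fraction = fraction_within_tads(toy_interactions, toy_tads)
print(f"{toy_fraction:.0%} of toy interactions fall within a TAD")
```

The linear scan over TADs is adequate for a toy example; for genome-scale data an interval tree or sorted-boundary lookup would be the more usual choice.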

Figure 3. Super-enhancer Domain Structure

A) An example super-enhancer domain (SD) within a TAD. High-confidence SMC1 ChIA-PET interactions are depicted as blue lines. ChIP-Seq binding profiles (reads per million per base pair) for CTCF, cohesin (SMC1), and the master transcription factors OCT4, SOX2, and NANOG (OSN) are shown at the Lefty1 locus in ESCs. The super-enhancer is indicated by a red bar. B) Model of SD structure. The 197 SDs have interactions (blue) between cohesin-occupied CTCF sites that may serve as outer boundaries of the domain structure. SDs also contain interactions between super-enhancers and the promoters of their associated genes. C) Metagene analysis showing the occupancy of various factors at the key elements of TADs and SDs, including CTCF sites, super-enhancers and super-enhancer-associated genes. ChIP-seq profiles are shown in reads per million per base pair. Boundary site metagenes are centered on the CTCF peak, and +/−2 kb is displayed. Super-enhancer metadata is centered on the 195 super-enhancers in SDs and +/−3 kb is displayed. The data for associated genes are centered on the 219 super-enhancer-associated genes in SDs and +/−3 kb is displayed. D) Heat map showing that cohesin ChIA-PET high-confidence interactions occur predominantly within the SDs. The density of high-confidence interactions is shown across a normalized SD length for the 197 SDs. E) Heat map showing that transcriptional proteins are contained within boundary sites of SDs. The occupancy of Mediator (MED12), H3K27ac and RNA polymerase II (Pol2) at super-enhancers and associated genes is shown across a normalized SD length for the 197 SDs. See also Figure S3, Table S4.

Figure 4. Super-enhancer Domains are functionally linked to gene expression

CRISPR-mediated genome editing of CTCF sites at five loci. The top of each panel shows high-confidence interactions depicted as blue lines, and ChIP-Seq binding profiles (reads per million per base pair) for CTCF, cohesin (SMC1), and OCT4, SOX2, and NANOG (OSN) in ESCs at the respective loci. The super-enhancer is indicated as a red bar. The bottom of each panel shows the expression level of the indicated genes in wild-type and CTCF site-deleted cells measured by qRT-PCR. Transcript levels were normalized to GAPDH. Gene expression was assayed in triplicate in at least two biological replicate samples, and is displayed as mean+SD. All P-values were determined using the Student's t-test. A) CRISPR-mediated genome editing of a CTCF site at the miR-290-295 locus. (P-value < 0.001, Pri-miR-290-295 and Nlrp12 in wild-type vs. CTCF site-deleted). B) CRISPR-mediated genome editing of a CTCF site at the Nanog locus. (P-value < 0.05, Nanog in wild-type vs. CTCF site-deleted). C) CRISPR-mediated genome editing of a CTCF site at the Tdgf1 locus. (P-value < 0.001, Gm590; P-value < 0.01, Lrrc2 in wild-type vs. CTCF site-deleted). D) CRISPR-mediated genome editing of a CTCF site at the Pou5f1 locus. (P-value < 0.012, H2Q-10 in wild-type vs. CTCF site-deleted). E) CRISPR-mediated genome editing of CTCF sites at the Prdm14 locus. (P-value < 0.001, Slco5a1 in wild-type vs. CTCF site-deleted). The CTCF-deletion lines at the Pou5f1 and Prdm14 (C1-2) loci are heterozygous, while the CTCF-deletion lines at the Nanog, Tdgf1 and miR-290-295 loci are homozygous for the mutation. See also Figure S4.

Figure 5. Polycomb Domain Structure

A) An example Polycomb Domain (PD) within a TAD. A high-confidence interaction is depicted as the blue line. ChIP-Seq binding profiles (reads per million per base pair) for CTCF, cohesin (SMC1), and H3K27me3 are shown at the Gata2 locus in ESCs. B) Model of PD structure. The 349 PDs have interactions (blue) between CTCF sites that serve as putative boundaries of the domain structure. C) Metagene analysis reveals the occupancy of various factors at the key elements of TADs and PDs: CTCF sites and target genes. ChIP-seq profiles are shown in reads per million per base pair. Boundary site metagenes are centered on the CTCF peak and +/−2 kb is displayed. The metagenes depicting genes are centered on the 380 Polycomb target genes in PDs and +/−3 kb is displayed. D) Heat map showing that high-confidence interactions are largely constrained within PDs. The density of high-confidence interactions is shown across a normalized PD length for the 349 PDs. E) Heat map showing that Polycomb proteins are contained within boundary sites of PDs. The occupancy of CTCF, H3K27me3, SUZ12 and EZH2 within a 10 kb window centered on the left and right CTCF-occupied boundary regions is shown for the 120 PDs with this transition pattern. F) CRISPR-mediated genome editing of a CTCF site at the Tcfap2e locus. Top, high-confidence interactions are depicted by blue lines and ChIP-Seq binding profiles (reads per million per base pair) for CTCF, cohesin (SMC1), and H3K27me3 are shown in ESCs. Bottom, expression level of the indicated genes in wild-type and CTCF site-deleted cells measured by qRT-PCR. Transcript levels were normalized to GAPDH. Gene expression was assayed in triplicate in at least two biological replicate samples and is displayed as mean+SD (P-value < 0.05, Tcfap2e in C1 deletion cells; P-value < 0.001, Tcfap2e in C2 deletion cells; wild-type vs. CTCF site-deleted). P-values were determined using the Student's t-test. See also Figure S5, Table S5.

Figure 6. Insulated Neighborhoods are preserved in multiple cell types

A) Model depicting constitutive domain organization, mediated by interaction of two CTCF sites co-occupied by cohesin, in two cell types. B) An example SD in ESCs and a domain in NPCs. High-confidence interactions from the SMC1 ChIA-PET dataset are depicted by blue lines and 5C interactions from Phillips-Cremins et al. (2013) are depicted by black lines. Super-enhancers are indicated by red bars. ChIP-Seq binding profiles (reads per million per base pair) for CTCF, cohesin (SMC1), OCT4, SOX2, and NANOG (OSN), and SOX2 and BRN2 are shown at the Nanog locus and the Olig1/Olig2 locus in ESCs and NPCs. C) Occupancy of CTCF peaks across 18 cell types. The CTCF peaks used for the analysis are the CTCF peaks found in ESCs. The percentage of these peaks that are observed in the indicated number of cell types is shown for four groups of CTCF sites: all CTCF peaks identified in ESCs, CTCF peaks at SD boundaries in ESCs, CTCF peaks at PD boundaries in ESCs, and CTCF peaks at PET peaks (identified by SMC1 ChIA-PET in ESCs). See also Figure S6, Table S3B.
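The tally described in panel C (for each ESC CTCF peak, the number of cell types in which it is observed, reported as percentages) can be sketched in Python as follows. This is an illustrative assumption, not the authors' method: peaks are matched by a shared identifier, whereas a real analysis would match them by genomic overlap. The function name, peak labels, and the three toy cell-type sets are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def occupancy_distribution(
    esc_peaks: Iterable[str],
    peaks_by_cell_type: Dict[str, Set[str]],
) -> Dict[int, float]:
    """Map 'number of cell types containing the peak' to percent of ESC peaks."""
    esc_peaks = list(esc_peaks)
    if not esc_peaks:
        return {}
    # For each ESC peak, count the cell types whose peak set contains it.
    counts = Counter(
        sum(1 for peaks in peaks_by_cell_type.values() if peak in peaks)
        for peak in esc_peaks
    )
    total = len(esc_peaks)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Toy peak sets for three cell types (invented; the figure uses 18).
toy_cell_types = {
    "ESC": {"p1", "p2", "p3", "p4"},
    "NPC": {"p1", "p2"},
    "MEF": {"p1"},
}

distribution = occupancy_distribution(["p1", "p2", "p3", "p4"], toy_cell_types)
for n_types, percent in distribution.items():
    print(f"observed in {n_types} cell type(s): {percent:.1f}% of ESC peaks")
```

Running the same function on a subset of peaks, such as those at SD or PD boundaries, would give the per-group distributions that the panel compares.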
