Molecular basis for 5-carboxycytosine recognition by RNA polymerase II elongation complex

Nature. 2015 Jul 30;523(7562):621-5.

doi: 10.1038/nature14482. Epub 2015 Jun 29.



Lanfeng Wang et al. Nature. 2015.

Abstract

DNA methylation at selective cytosine residues (5-methylcytosine (5mC)) and their removal by TET-mediated DNA demethylation are critical for setting up pluripotent states in early embryonic development. TET enzymes successively convert 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC), with 5fC and 5caC subject to removal by thymine DNA glycosylase (TDG) in conjunction with base excision repair. Early reports indicate that 5fC and 5caC could be stably detected on enhancers, promoters and gene bodies, with distinct effects on gene expression, but the mechanisms have remained elusive. Here we determined the X-ray crystal structure of yeast elongating RNA polymerase II (Pol II) in complex with a DNA template containing oxidized 5mCs, revealing specific hydrogen bonds between the 5-carboxyl group of 5caC and the conserved epi-DNA recognition loop in the polymerase. This causes a positional shift for incoming nucleoside 5'-triphosphate (NTP), thus compromising nucleotide addition. To test the implication of this structural insight in vivo, we determined the global effect of increased 5fC/5caC levels on transcription, finding that such DNA modifications indeed retarded Pol II elongation on gene bodies. These results demonstrate the functional impact of oxidized 5mCs on gene expression and suggest a novel role for Pol II as a specific and direct epigenetic sensor during transcription elongation.


Figures


Extended Data Figure 1. Electron density maps of Pol II EC-I and EC-II

a, 2Fo-Fc map (blue) of Rpb2 Q531 in epi-DNA recognition loop and the opposite 5caC in Pol II EC-I, contoured at 1.0 sigma. b, Fo-Fc omit map (green) of Pol II EC-I (with 5caC omission), contoured at 3.0 sigma. c, 2Fo-Fc map (blue) of GMPCPP paired with 5caC in Pol II EC-II, contoured at 1.0 sigma. d, Fo-Fc omit map (green) of Pol II EC-II (with GMPCPP and 5caC omission), contoured at 3.0 sigma.


Extended Data Figure 2. Structural comparison between Pol II EC-I, EC-II and Pol II EC containing unmodified C template and a matched GTP

a, Superimposition of Pol II EC-I and EC-II structures. Rpb2 Q531 and 5caC in EC-II are in magenta to differentiate between those counterparts in EC-I. These two structures are aligned using bridge helix region (Rpb1 822–840). b, Superposition of Pol II EC-II containing 5caC template and GMPCPP with Pol II EC with closed trigger loop (containing unmodified C template and GTP, PDB: 2E2H). The two structures are aligned using bridge helix region (Rpb1 822–840).


Extended Data Figure 3. Kinetic study of GTP incorporation opposite 5caC template by purified Pol II proteins

Representative kinetic parameter fitting curves from three independent experiments for GTP incorporation opposite the 5caC template for Pol II wt (a), Pol II Q531H (b), and Pol II Q531A (c), respectively. d, Purified Pol II wt, Pol II Q531H, and Pol II Q531A proteins used in the in vitro transcription experiments.


Extended Data Figure 4. Modeling potential similar interaction for recognition of 5fC and 5caC templates, but not for 5hmC, 5mC and C templates

a, Hydrogen bonds (black dotted lines) between Rpb2 Q531, 5caC, and GMPCPP in EC-II. b, Model of the interaction of Pol II EC with a 5fC template through the same hydrogen-bond network. c, Model of Pol II EC with a 5hmC template, revealing no obvious hydrogen bonding between Q531 and 5hmC. The 5hmC nucleotide structure was based on PDB: 4R2C. d, Model of Pol II EC with a 5mC template. e, Model of Pol II EC with an unmodified C template. The above models were derived from the Pol II EC-II structure.


Extended Data Figure 5. Sequence alignment of Pol II epi-DNA recognition loop across different species

a, The Pol II epi-DNA recognition loop (Rpb2 521–541) is conserved from fungi to human and strictly conserved among several fungal species (highlighted with a magenta dotted rectangle) that contain active TET/JBP enzymes. Key residues in the loop are highlighted with a green box. b, Hydrogen bonds (black dotted lines) between yeast Pol II Rpb2 Q531, 5caC, and GMPCPP in EC-II. c, Model of human Pol II with the functionally equivalent His substitution, based on the EC-II structure. d, Comparison between Q531 and the H531 substitution reveals similar hydrogen-bonding interactions.


Extended Data Figure 6. Human Pol II slows down at 5caC template in comparison with unmodified template in the context of HeLa nuclear extract

The relative transcription elongation rate is normalized by the transcription elongation rate (kobs) from unmodified template. The relative rate from unmodified template and 5caC template are colored in black and gray, respectively. The error bars are standard deviations derived from three independent experiments.
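The normalization described in this caption can be sketched as follows. This is a minimal illustration, assuming each replicate's kobs is divided by the mean kobs of the unmodified template; the caption does not say whether normalization was per replicate or against the mean, and all numbers and function names here are hypothetical.

```python
from statistics import mean, stdev

def relative_rates(k_obs_reference, k_obs_values):
    """Divide each k_obs by the mean k_obs of the reference (unmodified) template."""
    reference = mean(k_obs_reference)
    return [k / reference for k in k_obs_values]

# Hypothetical k_obs values (arbitrary units) from three independent experiments.
unmodified = [1.0, 1.2, 0.8]
modified = [0.5, 0.6, 0.4]

rel_unmodified = relative_rates(unmodified, unmodified)
rel_modified = relative_rates(unmodified, modified)

# Mean relative rate and standard deviation (the error bar) for each template.
print(mean(rel_unmodified), stdev(rel_unmodified))
print(mean(rel_modified), stdev(rel_modified))
```

With this convention the unmodified template has a mean relative rate of 1 by construction, so the 5caC bar reads directly as a fraction of the unmodified rate.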


Extended Data Figure 7. Comparison of purified yeast Pol II (upper panel) and E. coli RNAP (lower panel) transcription on 5caC template in comparison with unmodified template

Time points are 0, 5 s, 15 s, 30 s, 1 min, 5 min, 20 min, and 1 hr (left to right). The upper panel is identical to Fig. 1c and is placed here for direct comparison.


Extended Data Figure 8. Correlation between two replicates of GRO-seq data sets at different assay points

GRO-seq replicates (−1 and −2) were compared pairwise, gene by gene, on the normalized number of reads (rpm: reads per million total reads) for WT (left) and TDG KO (right) samples. The colors show the density of points (genes). The Pearson correlation coefficients were calculated from the points and are shown at the top of each subfigure.
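The two quantities named in this caption, rpm normalization and the Pearson correlation between replicates, can be sketched in a few lines. This is an illustration only: the per-gene counts are hypothetical, and the total read count is taken here as the sum over the listed genes, whereas a real analysis would use the total reads of the library.

```python
from math import sqrt

def reads_per_million(counts):
    """Scale per-gene read counts to reads per million total reads."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-gene read counts for two replicates of one sample.
replicate_1 = [100, 300, 600]
replicate_2 = [200, 600, 1200]

rpm_1 = reads_per_million(replicate_1)
rpm_2 = reads_per_million(replicate_2)
r = pearson(rpm_1, rpm_2)
print(r)
```

Because rpm scaling removes differences in sequencing depth, two replicates with proportional counts give identical rpm values and a correlation close to 1.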


Figure 1

Pol II directly recognizes 5caC during transcription. a, Epigenetic modification cycle of cytosine. Cytosine (C), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). b, The RNA/DNA scaffold used in both structural and biochemical analysis. C* stands for 5caC residue. c, Impeded Pol II elongation on the 5caC-containing template relative to the unmodified C template. Time points are 0, 5 s, 15 s, 30 s, 1 min, 5 min, 20 min, and 1 hr (left to right). d, The overall Pol II EC structure containing a site-specific 5caC (EC-I). Color-coded are template DNA (blue), non-template DNA (green), and RNA (red). The two 5caC conformers are highlighted in yellow and cyan, respectively. Part of bridge helix (BH) (Rpb1 822–840) is highlighted in green and the rest of Pol II subunits are in gray (Rpb2 is omitted). The addition site is represented by a dotted oval. e, The midway 5caC interacts with the Rpb2 Q531 residue via hydrogen bonds (black dotted lines). The epi-DNA recognition loop (fork loop 3) (Rpb2 521–541) is shown in cyan. f, The Q531 side chain rotates 90 degrees to form hydrogen bonds with 5caC. Pol II EC-I is superimposed with the Pol II EC containing an unmodified DNA template in post-translocation state (PDB: 1SFO). The fork loop 3 region of Pol II EC (1SFO) is shown in orange. g–h, Comparison of two 5caC conformers (cyan or yellow) with the corresponding canonical template nucleotide (bluewhite).


Figure 2

Interaction between 5caC and epi-DNA recognition loop compromises GTP incorporation. a, The Pol II EC structure containing a matched GMPCPP opposite 5caC site (EC-II). The color codes are the same as Fig. 1 except for 5caC (yellow) and GMPCPP (orange). b–d, The GMPCPP:5caC base pair is shifted toward the downstream main channel from the canonical GMPCPP:dC position (PDB: 2E2J). The side chain of Rpb2 Q531 rotates 100 degrees to interact with 5caC (b and c). e–g, Comparison of catalytic rate constants (_k_pol) (e), substrate dissociation constants (_K_d,app) (f), and specificity constants (_k_pol/_K_d,app) (g) of GTP incorporation opposite 5caC template by wt, Q531H, and Q531A Pol II, respectively. The mean values are presented and error bars are standard deviations derived from three independent experiments.
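The relationship between the three kinetic quantities in panels e–g can be sketched as follows. The caption does not state the fitting model, so this sketch assumes the hyperbolic dependence of the observed rate on NTP concentration that is commonly used for such fits; the function names, parameter values, and units are hypothetical and are not the paper's measurements.

```python
def k_obs(ntp_conc, k_pol, k_d_app):
    """Observed incorporation rate at a given NTP concentration.

    Assumed model (not stated in the caption):
    k_obs = k_pol * [NTP] / (K_d,app + [NTP])
    """
    return k_pol * ntp_conc / (k_d_app + ntp_conc)

def specificity_constant(k_pol, k_d_app):
    """Specificity constant k_pol / K_d,app, as named in panel g."""
    return k_pol / k_d_app

# Hypothetical parameters: k_pol in 1/min, K_d,app and [NTP] in micromolar.
k_pol_example = 2.0
k_d_app_example = 50.0

# At [NTP] equal to K_d,app, the assumed model gives half of k_pol.
print(k_obs(50.0, k_pol_example, k_d_app_example))
print(specificity_constant(k_pol_example, k_d_app_example))
```

Under this model a mutation can change the specificity constant by altering either term, which is why the caption reports _k_pol and _K_d,app separately as well as their ratio.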


Figure 3

Similar “above-the-bridge-helix” translocation intermediates captured in pausing/arrested Pol II ECs and a common 5caC-recognition mode shared by a variety of 5caC-recognition proteins. a–c, Superimposition of 5caC-paused Pol II EC with CPD-lesion-arrested EC (PDB: 4A93) (a), pyriplatin-lesion-arrested EC (PDB: 3M4O) (b), and α-amanitin-arrested EC (PDB: 2VUM) (c), respectively. The similar “above-the-bridge-helix” translocation-intermediate region accommodating the i+1 5caC (yellow) and the DNA lesion (or the translocation intermediate captured by α-amanitin) (bluewhite) is highlighted by a red dotted oval. The damage-arrested or α-amanitin-arrested Pol II ECs are shown in gray. d–f, The conserved interactions and residues involved in 5caC recognition by Pol II (Rpb2 Q531) (d), Wilms tumor protein 1 (Q369, PDB: 4R2R) (e), and human thymine DNA glycosylase (N157, PDB: 3UO7) (f).


Figure 4

Impact of 5fC/5caC on Pol II transcription elongation in mouse embryonic stem cells (mESCs). a, Scheme of the DRB releasing assay. Wt and TDG-knockout mESCs were treated with DRB followed by washing out DRB to allow transcription for 10, 20, or 30 min. No DRB treatment (NODRB) or 3 hr DRB treatment (DRB3H) were performed as controls. All experiments were performed in duplicate and reproducibility was evident in all pairwise comparisons (Extended Data Fig. 8). b, The GRO-seq data on the representative Myo1e gene. Elevated 5fC/5caC levels in TDG-KO mESCs are derived from the published ChIP-seq data in duplicate. c, Comparative metagene analysis of GRO-seq signals between WT (upper) and KO mESCs (bottom). Dashed and non-dashed lines show the middle points of the ensemble transcription waves in WT and KO mESCs, respectively. d, Pairwise comparisons of the GRO-seq density (reads per million) of individual genes in the +/−10 kb window around different middle points between WT (x-axis) and KO cells (y-axis) in c (10M, 20M, 30M in cyan) with the NODRB data (red) as control. The coefficients are the slopes of the lines from linear regression on the scattered points. The _p_-values were calculated based on one-sided Kolmogorov-Smirnov test of comparing read density ratio (KO/WT) at 30 min. N: number of genes. e, Correlation between increased 5fC/5caC levels and retarded transcription elongation. Genes were divided into two groups according to increased 5fC/5caC levels in the gene bodies (low in group 1 and high in group 2). The numbers correspond to the middle point positions (bp) of the ensemble transcription waves relative to TSS in WT versus KO mESCs.
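The slope coefficient described for panel d can be illustrated with an ordinary least-squares fit of KO density against WT density. This is a sketch under assumptions: the caption does not say whether the regression included an intercept, the density values are hypothetical, and the Kolmogorov-Smirnov test mentioned in the caption is not reproduced here.

```python
def ols_slope(x, y):
    """Slope of the ordinary least-squares line y = a + b*x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / s_xx

# Hypothetical GRO-seq densities (reads per million) for four genes.
wt_density = [1.0, 2.0, 3.0, 4.0]   # x-axis: WT cells
ko_density = [0.5, 1.0, 1.5, 2.0]   # y-axis: KO cells

slope = ols_slope(wt_density, ko_density)
print(slope)
```

With WT on the x-axis and KO on the y-axis, a slope below 1 means lower read density in KO cells than in WT cells within the compared window.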


