Intragenic DNA methylation prevents spurious transcription initiation

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Robertson, K. D. DNA methylation, methyltransferases, and cancer. Oncogene 20, 3139–3155 (2001)
  2. Chen, Z. X. & Riggs, A. D. DNA methylation and demethylation in mammals. J. Biol. Chem. 286, 18347–18353 (2011)
  3. Neri, F. et al. Single-base resolution analysis of 5-formyl and 5-carboxyl cytosine reveals promoter DNA methylation dynamics. Cell Reports 10, 674–683 (2015)
  4. Neri, F. et al. TET1 is a tumour suppressor that inhibits colon cancer growth by derepressing inhibitors of the WNT pathway. Oncogene 34, 4168–4176 (2015)
  5. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–257 (1999)
  6. Bestor, T. H. The DNA methyltransferases of mammals. Hum. Mol. Genet. 9, 2395–2402 (2000)
  7. Schübeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015)
  8. Jeltsch, A. & Jurkowska, R. Z. New concepts in DNA methylation. Trends Biochem. Sci. 39, 310–318 (2014)
  9. Neri, F. et al. Dnmt3L antagonizes DNA methylation at bivalent promoters and favors DNA methylation at gene bodies in ESCs. Cell 155, 121–134 (2013)
  10. Baubec, T. et al. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature 520, 243–247 (2015)
  11. Morselli, M. et al. In vivo targeting of de novo DNA methylation by histone modifications in yeast and mouse. eLife 4, e06205 (2015)
  12. Edmunds, J. W., Mahadevan, L. C. & Clayton, A. L. Dynamic histone H3 methylation during gene induction: HYPB/Setd2 mediates all H3K36 trimethylation. EMBO J. 27, 406–420 (2008)
  13. Yoh, S. M., Lucas, J. S. & Jones, K. A. The Iws1:Spt6:CTD complex controls cotranscriptional mRNA biosynthesis and HYPB/Setd2-mediated histone H3K36 methylation. Genes Dev. 22, 3422–3434 (2008)
  14. Wagner, E. J. & Carpenter, P. B. Understanding the language of Lys36 methylation at histone H3. Nat. Rev. Mol. Cell Biol. 13, 115–126 (2012)
  15. Carrozza, M. J. et al. Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell 123, 581–592 (2005)
  16. Carvalho, S. et al. Histone methyltransferase SETD2 coordinates FACT recruitment with nucleosome dynamics during transcription. Nucleic Acids Res. 41, 2881–2893 (2013)
  17. Maunakea, A. K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010)
  18. Maderious, A. & Chen-Kiang, S. Pausing and premature termination of human RNA polymerase II during transcription of adenovirus in vivo and in vitro. Proc. Natl Acad. Sci. USA 81, 5931–5935 (1984)
  19. Yankulov, K., Yamashita, K., Roy, R., Egly, J. M. & Bentley, D. L. The transcriptional elongation inhibitor 5,6-dichloro-1-β-d-ribofuranosylbenzimidazole inhibits transcription factor IIH-associated protein kinase. J. Biol. Chem. 270, 23922–23925 (1995)
  20. Bochnig, P., Reuter, R., Bringmann, P. & Lührmann, R. A monoclonal antibody against 2,2,7-trimethylguanosine that reacts with intact, class U, small nuclear ribonucleoproteins as well as with 7-methylguanosine-capped RNAs. Eur. J. Biochem. 168, 461–467 (1987)
  21. Deana, A., Celesnik, H. & Belasco, J. G. The bacterial enzyme RppH triggers messenger RNA degradation by 5′ pyrophosphate removal. Nature 451, 355–358 (2008)
  22. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006)
  23. Butler, J. E. F. & Kadonaga, J. T. The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16, 2583–2592 (2002)
  24. Clark, S. J., Harrison, J. & Molloy, P. L. Sp1 binding is inhibited by mCpmCpG methylation. Gene 195, 67–71 (1997)
  25. Douet, V., Heller, M. B. & Le Saux, O. DNA methylation and Sp1 binding determine the tissue-specific transcriptional activity of the mouse Abcc6 promoter. Biochem. Biophys. Res. Commun. 354, 66–71 (2007)
  26. Hogart, A. et al. Genome-wide DNA methylation profiles in hematopoietic stem and progenitor cells reveal overrepresentation of ETS transcription factor binding sites. Genome Res. 22, 1407–1418 (2012)
  27. Uchiumi, F., Miyazaki, S. & Tanuma, S. The possible functions of duplicated ets (GGAA) motifs located near transcription start sites of various human genes. Cell. Mol. Life Sci. 68, 2039–2051 (2011)
  28. Yu, M. et al. GA-binding protein-dependent transcription initiator elements. Effect of helical spacing between polyomavirus enhancer a factor 3(PEA3)/Ets-binding sites on initiator activity. J. Biol. Chem. 272, 29060–29067 (1997)
  29. Gowher, H. & Jeltsch, A. Molecular enzymology of the catalytic domains of the Dnmt3a and Dnmt3b DNA methyltransferases. J. Biol. Chem. 277, 20409–20414 (2002)
  30. Tani, H. & Akimitsu, N. Genome-wide technology for determining RNA stability in mammalian cells: historical perspective and recent advantages based on modified nucleotide labeling. RNA Biol. 9, 1233–1238 (2012)
  31. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011)
  32. Maunakea, A. K., Chepelev, I., Cui, K. & Zhao, K. Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Res. 23, 1256–1269 (2013)
  33. Yearim, A. et al. HP1 is involved in regulating the global impact of DNA methylation on alternative splicing. Cell Reports 10, 1122–1134 (2015)
  34. Jones, P. A. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492 (2012)
  35. Jones, P. A. & Baylin, S. B. The fundamental role of epigenetic events in cancer. Nat. Rev. Genet. 3, 415–428 (2002)
  36. Gaudet, F. et al. Induction of tumors in mice by genomic hypomethylation. Science 300, 489–492 (2003)
  37. Feinberg, A. P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89–92 (1983)
  38. Kanu, N. et al. SETD2 loss-of-function promotes renal cancer branched evolution through replication stress and impaired DNA repair. Oncogene 34, 5699–5708 (2015)
  39. Fontebasso, A. M. et al. Mutations in SETD2 and genes affecting histone H3K36 methylation target hemispheric high-grade gliomas. Acta Neuropathol. 125, 659–669 (2013)
  40. Duns, G. et al. Histone methyltransferase gene SETD2 is a novel tumor suppressor gene in clear cell renal cell carcinoma. Cancer Res. 70, 4287–4291 (2010)
  41. Neri, F. et al. Genome-wide analysis identifies a functional association of Tet1 and Polycomb repressive complex 2 in mouse embryonic stem cells. Genome Biol. 14, R91 (2013)
  42. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
  43. Incarnato, D., Neri, F., Diamanti, D. & Oliviero, S. MREdictor: a two-step dynamic interaction model that accounts for mRNA accessibility and Pumilio binding accurately predicts microRNA targets. Nucleic Acids Res. 41, 8421–8433 (2013)
  44. Incarnato, D., Neri, F., Anselmi, F. & Oliviero, S. Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome. Genome Biol. 15, 491 (2014)
  45. Incarnato, D., Krepelova, A. & Neri, F. High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly. Genomics 104, 121–127 (2014)
  46. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013)
  47. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31, 46–53 (2013)
  48. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
  49. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009)
  50. Chen, C.-Y. A., Ezzeddine, N. & Shyu, A.-B. Messenger RNA half-life measurements in mammalian cells. Methods Enzymol. 448, 335–357 (2008)
  51. Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008)
  52. Li, J. Y. et al. Synergistic function of DNA methyltransferases Dnmt3a and Dnmt3b in the methylation of Oct4 and Nanog. Mol. Cell. Biol. 27, 8748–8759 (2007)
  53. Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008)
  54. Sharova, L. V. et al. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 16, 45–58 (2009)
  55. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011)


Acknowledgements

We thank T. Baubec and D. Schübeler for providing the Dnmt3b construct. We thank S. Yamanaka for anti-Dnmt3l antibody. We thank E. Guccione, R. Calogero and T. Bates for helpful suggestions and critical reading of the manuscript. This work was supported by the Associazione Italiana Ricerca sul Cancro (AIRC) IG 2014 Id15217.

Author information

Authors and Affiliations

  1. Human Genetics Foundation (HuGeF), via Nizza 52, Torino, 10126, Italy
    Francesco Neri, Anna Krepelova, Danny Incarnato, Caterina Parlato, Giulia Basile, Mara Maldotti, Francesca Anselmi & Salvatore Oliviero
  2. Leibniz Institute on Aging – Fritz Lipmann Institute (FLI), Beutenbergstrasse 11, Jena, 07745, Germany
    Francesco Neri
  3. Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, via Accademia Albertina 13, Torino, 10123, Italy
    Stefania Rapelli, Anna Krepelova, Mara Maldotti, Francesca Anselmi & Salvatore Oliviero

Authors

  1. Francesco Neri
  2. Stefania Rapelli
  3. Anna Krepelova
  4. Danny Incarnato
  5. Caterina Parlato
  6. Giulia Basile
  7. Mara Maldotti
  8. Francesca Anselmi
  9. Salvatore Oliviero

Contributions

F.N. and S.O. conceived the study; S.R. and A.K. performed genome-wide experiments, cloning and cell treatments; F.N. and D.I. performed genome-wide experiments and data analysis; M.M. performed cloning and cell treatments; C.P. and G.B. performed RNA-seq; F.A. performed CAPIP-seq experiments; F.N. and S.O. wrote the paper with input from all authors.

Corresponding authors

Correspondence to Francesco Neri or Salvatore Oliviero.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks P. Carninci and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Generation of _Dnmt3b_−/− and mapping of the endogenous Dnmt3b in ES cells.

_Dnmt3b_−/− ES cell clones (B126 and B77) showed normal cell growth and alkaline phosphatase (AP) staining, as well as impaired silencing of Nanog expression by promoter DNA methylation during differentiation into embryoid bodies (EBs) with respect to the wild-type cell line, indicating the bona fide nature of the transgenic cell lines. a, Schematic of the region of the Dnmt3b gene targeted by TALEN zinc-fingers, and representative sequences on the two alleles of two _Dnmt3b_−/− clones, compared to wild-type (KO #1 = B77; KO #2 = B126). b, Western blot analysis of Dnmt3b protein in the two _Dnmt3b_−/− clones compared to wild-type. Dnmt1 and Dnmt3a2 levels are not affected by loss of Dnmt3b. Actin is used as loading control. Notably, the mRNA level (data not shown) of the Dnmt3b gene is also almost completely lost in _Dnmt3b_−/− cells. c, Growth curve of wild-type and _Dnmt3b_−/− ES cells over 3 days. d, Alkaline phosphatase staining of wild-type and _Dnmt3b_−/− ES cell colonies. e, RT–qPCR of Nanog levels in embryoid bodies derived from _Dnmt3b_−/− clones, compared to wild-type ES cells, and embryoid bodies. Error bars represent the standard deviation of at least three independent experiments. f, Sanger sequencing of bisulphite-treated genomic DNA from wild-type ES cells and embryoid bodies, and _Dnmt3b_−/− ES-cell-derived embryoid bodies, at the region of the Nanog promoter previously shown to be a target of Dnmt3b-mediated methylation upon differentiation (ref. 52). g, Histogram showing the quantity of the DNA recovered in ChIP experiments performed with different antibodies directed against Dnmt3b protein. h, Western blot analysis of Dnmt3b protein in wild-type and _Dnmt3b_−/− ES cells. Actin was used as a loading control. i, Histogram showing the quantity (ng) of the DNA recovered in ChIP experiments performed with anti-Dnmt3b antibody (Ab122932) in wild-type and _Dnmt3b_−/− ES cells. j, Genomic views of the mapped reads from different ChIP-seq datasets in ES cells. 
IgG and Dnmt3b ChIP-seq and WGBS are from the present work, bio-Dnmt3b from GSE57413, MeDIP-seq from GSE44644, GLIB-seq from GSE44566, histone modifications from GSE12241. k, Left, heat map representations of Dnmt3b binding and relevant histone modifications on a window of ±3 kb centred on the TSS of RefSeq genes, sorted by their expression level, according to RNA-seq data. Right, plots of Dnmt3b binding and relevant histone modifications on a window of ±3 kb centred on the TSS of RefSeq genes, clustered in the four quartiles of expression (q4 = upper quartile, the most expressed genes). l, m, Binding enrichment of IgG (in wild-type ES cells) as well as IgG and Dnmt3b (in _Dnmt3b_−/− ES cells) on the exons or introns partitioned in quartiles on the basis of the expression of the related gene. These panels are control experiments for Fig. 1b. n, Hierarchical clustering of pairwise Pearson correlation of Dnmt3b, and third-party ChIP-seq datasets in ES cells, reveals a strong genome-wide association of Dnmt3b with H3K36me3 histone marks. o, Scatter plots comparing intragenic H3K36me3 and IgG/Dnmt3b enrichments (log2) in wild-type and _Dnmt3b_−/− cells; _r_ denotes the Pearson correlation coefficient. p, qPCR of ChIP analysis of Dnmt3b on the indicated regions. A specific enrichment can be observed on the gene bodies of active genes. Error bars represent the standard deviation of at least three independent experiments. Primers used are reported in Supplementary Table 1. q, Immunoprecipitation experiment (using a different antibody for Dnmt3b, Ab2851) in ES cells reveals the interaction of Dnmt3b with H3K36me3, but not H3K4me3, in agreement with ChIP-seq data. r, Violin plots of the methylation level of all CpGs in both wild-type and _Dnmt3b_−/− ES cells as determined by WGBS on the indicated genomic features. P values calculated with Wilcoxon rank-sum test.
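The partition of genes into expression quartiles used in panels k to m (q4 = most expressed) can be illustrated with a minimal sketch. The gene names and RPKM values below are hypothetical, and the cut points come from Python's `statistics.quantiles` with its default method, which is an assumption: the legend does not state how quartile boundaries were computed.

```python
from statistics import quantiles

def expression_quartiles(rpkm_by_gene):
    """Assign each gene to q1..q4 by expression (q4 = most expressed)."""
    # Three cut points split the expression values into four groups.
    cuts = quantiles(list(rpkm_by_gene.values()), n=4)
    labels = {}
    for gene, value in rpkm_by_gene.items():
        # Number of cut points exceeded: 0 -> q1, 3 -> q4.
        rank = sum(value > c for c in cuts)
        labels[gene] = f"q{rank + 1}"
    return labels

# Hypothetical RPKM values, for illustration only.
example = {"geneA": 0.5, "geneB": 2.0, "geneC": 8.0, "geneD": 30.0,
           "geneE": 1.0, "geneF": 4.0, "geneG": 15.0, "geneH": 60.0}
labels = expression_quartiles(example)
```

Binding enrichment over exons or introns would then be summarized separately for the genes in each quartile.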

Extended Data Figure 2 Dnmt3b loss increases intragenic RNA transcription initiation.

a, Scatter plots of the log2 RPKM gene values in the indicated samples. b, Genomic views of the RNA-seq mapped reads from the indicated samples. c, Box plots of the ratio between normalized RNA-seq read counts (RPKM) for the second and the first exon (top left), the third and the first exon (bottom left), the average of the intermediate exons (from the fourth to penultimate) and the first exon (top right), the last and the first exon (bottom right), in wild-type (rep #2) and _Dnmt3b_−/− (rep #2 and clone B77) ES cells. P values calculated with Wilcoxon rank-sum test. d, Pie charts showing the percentage of transcripts with log2 fold change ≥1, ≤ −1 or between −1 and 1. e, RT–qPCR analysis of Ints2, Nodal, Gabpa and XpoI transcripts by using primers targeting different exons to discriminate different isoforms in wild-type, _Dnmt3b_−/− (cl. B126) and _Dnmt3b_−/− (cl. B77) ES cells. All PCR results were normalized to β-actin and to the wild-type condition. Error bars represent the standard deviation of at least three independent experiments. P values calculated against the wild-type condition for each experiment using a _t_-test. **P < 0.001, *P < 0.01. Primers used are reported in Supplementary Table 1.
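The quantities behind panels c and d can be sketched as follows: RPKM normalization, the ratio of a downstream exon to the first exon, and the three log2 fold change classes (≥1, ≤ −1, in between). This is a minimal illustration rather than the authors' pipeline; the function names and the pseudocount added before taking the logarithm are assumptions.

```python
import math

def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

def exon_ratio(rpkm_exon_n, rpkm_first_exon):
    """Ratio of a downstream exon to the first exon (panel c)."""
    return rpkm_exon_n / rpkm_first_exon

def fold_change_class(rpkm_ko, rpkm_wt, pseudocount=1.0):
    """Classify a transcript by log2 fold change (panel d).

    The pseudocount avoids log(0) and is an assumption of this sketch.
    """
    lfc = math.log2((rpkm_ko + pseudocount) / (rpkm_wt + pseudocount))
    if lfc >= 1:
        return "up"
    if lfc <= -1:
        return "down"
    return "unchanged"
```

An exon ratio that rises in _Dnmt3b_−/− cells relative to wild type is what would be expected if some transcripts start downstream of the first exon.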

Extended Data Figure 3 Dnmt3b loss does not extensively affect alternative promoter activation.

We investigated the activation or repression of alternative promoters on the subset of genes showing at least two annotated alternative promoters. On these genes, we measured the RPKM value of the first exon of all the isoforms transcribed from each of the alternative promoters in wild-type or _Dnmt3b_−/− cells. We observed that the _Dnmt3b_−/− cells showed a general trend towards lower expression of the first exon, independently of the isoform, suggesting that there is no global activation of intragenic promoters. Analysis of the ratio of the expression between the first promoter and the second downstream promoter identified four genes (of a total of 2,563 genes) with a reactivation of the intragenic promoter in _Dnmt3b_−/− cells. a, Schematic of the gene dataset used for alternative promoter analysis. The dataset is composed of a total of 2,563 genes showing at least two annotated alternative promoters, including 713 genes having at least three, 195 genes at least four and another 189 genes with multiple alternative promoters (from at least 5 to a maximum of 12). b, RPKM value of the first exon of all the isoforms transcribed from the alternative promoters in wild-type or _Dnmt3b_−/− ES cells. _Dnmt3b_−/− cells showed a general trend towards lower expression of the first exon, independently of the isoform, and none of the putative intragenic promoters (from the second to the twelfth) showed general activation. c, Analysis of the ratio of the expression of the first promoter over the second downstream promoter displayed high correlation between replicates and wild-type or _Dnmt3b_−/− ES cells. Only four genes (of a total of 2,563 genes) showed a reactivation of the intragenic promoter in _Dnmt3b_−/− ES cells. 
Further analysis of the ratio between RPKM of the first exon and of the whole transcript for each class of alternative-promoter-transcribed genes did not show any evidence for possible reactivation of any class of transcript isoforms derived from intragenic promoters. d, e, Analysis of the ratio of the RPKM value of the first exon over the whole transcript for each class of alternative-promoter-transcribed genes showed high correlation between wild-type and _Dnmt3b_−/− ES cells and did not reveal evidence for possible reactivation of any class of transcript isoforms derived from intragenic promoters.

Extended Data Figure 4 Dnmt3b loss does not globally affect elongating Pol II or H3K36me3 deposition on the gene bodies, but increases intragenic Pol II spurious entry.

a, Genomic views of the mapped reads from the indicated different ChIP-seq data sets in wild-type and _Dnmt3b_−/− ES cells normally cultured or treated with the Pol II elongation inhibitor DRB. b, Hierarchical clustering of pairwise Pearson correlation of ChIP-seq experiments performed in this work, and third-party ChIP-seq datasets in ES cells. c, Heat map representations of the indicated ChIP-seq (in wild-type and _Dnmt3b_−/− ES cells) peaks with respect to annotated RefSeq genes, sorted by their expression level, according to RNA-seq data. Each gene was extended by 3 kb upstream of its TSS, and downstream of its TES. d, Plots of the H3K36me3 distribution in wild-type and _Dnmt3b_−/− ES cells. e, Binding enrichment of H3K36me3 on intermediate exons and introns in wild-type and _Dnmt3b_−/− ES cells. f, Binding enrichment of the indicated ChIP-seq experiments in wild-type and _Dnmt3b_−/− ES cells treated (or not) with DRB on the intermediate exons and introns subdivided into quartiles on the basis of expression level. This result demonstrated that only non-elongating Pol II is enriched on the bodies of the most expressed genes (q3 and q4) in _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test; **P < 2.2 × 10⁻¹⁶. g, Genomic views of the mapped reads from the ChIP-seq analyses for H3K4me3 and H3ac in wild-type and _Dnmt3b_−/− ES cells. h, Hierarchical clustering of pairwise Pearson correlation of ChIP-seq experiments performed in this work, compared with ENCODE ChIP-seq datasets. i, Heat map representations of the indicated ChIP-seq (in wild-type and _Dnmt3b_−/− ES cells) peaks with respect to annotated RefSeq genes, sorted by their expression level, according to RNA-seq data. Each gene was extended by 3 kb upstream of its TSS, and downstream of its TES. 
j, Binding enrichment of the indicated ChIP-seq experiments in wild-type and _Dnmt3b_−/− ES cells on the first exons, intermediate exons and introns subdivided into quartiles on the basis of expression level. This result demonstrates that H3K4me3 and H3ac distribution are enriched on the intermediate exons and introns of the most expressed genes of the _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test.
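Several panels report P values from the Wilcoxon rank-sum test. As a rough illustration of what that test computes, the sketch below ranks the pooled values, sums the ranks of the first sample, and converts the resulting U statistic to a two-sided P value using a tie-corrected normal approximation without continuity correction. The function name is hypothetical and the approximation is a choice made for this sketch; the legend does not say which software or variant was used.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u, p
```

For small samples an exact test is preferable; the normal approximation is adequate for the genome-wide sample sizes typical of such panels.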

Extended Data Figure 5 CAPIP-seq enrichment of the 5′ of the RNAs shows that Dnmt3b loss increases intragenic spurious transcription initiation.

a, Schematic view of the CAPIP-seq protocol used. Total RNA is chemically fragmented and then subjected to immunoprecipitation by using a specific anti-CAP antibody or a control anti-IgG antibody. Eluted RNA (as well as one-tenth of the starting material for input) is subjected to random primer reverse-transcription. The library is then completed, starting from second strand generation. b, Scatter plots of the log2 RPKM of CAPIP-seq data (anti-CAP antibody) and input in wild-type and _Dnmt3b_−/− ES cells. c, Hierarchical clustering of pairwise Pearson correlation of CAPIP-seq-related sequencings in wild-type and _Dnmt3b_−/− ES cells. d, Genomic views of the total mapped reads from the indicated CAPIP-seq related sequencings. Enrichment of the CAP signal is present on the 5′ of the RNA as a peak of about 150 bp broader with respect to the signal obtained by performing DECAP-seq. e, Plots of the CAPIP-seq mapped reads distribution in wild-type and _Dnmt3b_−/− ES cells with respect to annotated RefSeq genes, extended by 5 kb upstream of its TSS, and downstream of its TES. f, Box plots of the log2 enrichment of the CAPIP-seq signal rep #2 (CAP immunoprecipitation signal over input in wild-type and _Dnmt3b_−/− ES cells on the indicated genic features). P values calculated with Wilcoxon rank-sum test. g, Further analysis showing the increase of CAP localization from intragenic regions of the RNA. Intragenic ratio is calculated as the log2 ratio of cap signal gene-body enrichment in _Dnmt3b_−/− versus wild-type cells. The correlation between the two replicates is shown.
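The intragenic ratio defined in panel g can be written out as a short sketch: the gene-body CAP enrichment (immunoprecipitation signal over input, as in panel f) is computed for each genotype, and the ratio is the difference of the two log2 enrichments. The function names and the pseudocount are assumptions made for illustration.

```python
import math

def log2_enrichment(ip_signal, input_signal, pseudocount=1.0):
    """log2 of CAP immunoprecipitation signal over input.

    The pseudocount guards against division by zero (an assumption).
    """
    return math.log2((ip_signal + pseudocount) / (input_signal + pseudocount))

def intragenic_ratio(ko_ip, ko_input, wt_ip, wt_input, pseudocount=1.0):
    """log2 ratio of gene-body cap enrichment, knockout versus wild type."""
    return (log2_enrichment(ko_ip, ko_input, pseudocount)
            - log2_enrichment(wt_ip, wt_input, pseudocount))
```

A positive value indicates more capped 5′ ends within the gene body in _Dnmt3b_−/− cells than in wild type.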

Extended Data Figure 6 DECAP-seq method maps, at single-base resolution, TSSs on the gene body in ES cells.

a, Schematic representation of the workflow of the DECAP-seq technique that is based on the RNA 5′ pyrophosphohydrolase (RppH) enzymatic activity that in Thermopol buffer is able to mediate decapping and pyrophosphate removal from the 5′ end of RNA to leave a 5′ monophosphate RNA (5′-P). 5′-P RNA is then used for selective adaptor ligation by T4 RNA ligase to the originally capped RNA fragments allowing single-base resolution mapping of the RNA capping sites. Treating the sample in the same way, but without the RppH enzyme, generates a negative control (to detect technical background). A positive control is generated by treating the sample with T4 polynucleotide kinase (PNK) for 5′ phosphorylation of all RNA fragments. This method represents an affordable alternative to the use of the tobacco acid pyrophosphatase (TAP) enzyme that has been used in several high-throughput techniques such as GRO-seq, CAP-seq, CIP-TAP (ref. 53) because EpiCentre Technologies (to our knowledge, the only company producing commercial TAP) has discontinued TAP and all kits containing it. b, Total RNA fragmentation was verified by using Fragment Analyzer (Advanced Analytical). c, Final DECAP-seq libraries were inspected on Fragment Analyzer before gel size selection. RppH-treated and untreated samples showed a double peak around 130 bp corresponding to the dimers of adaptor, but only the RppH-treated sample showed a higher enrichment (in the red box) corresponding to the decapped RNA fragments. The PNK-treated sample displayed a large peak around 200 bp. d, Final DECAP-seq libraries were quantified on Qubit (Invitrogen) after gel size selection and PCR enrichment. The library generated by treating RNA with RppH showed a fifty-fold higher concentration with respect to the library generated without RppH treatment (5 ng μl⁻¹ versus 0.1 ng μl⁻¹). 
e, Genomic views of the total DECAP-seq mapped reads from the indicated treatment on a gene (Actb) on the Crick DNA strand (− strand) and a gene (Rpl5) on the Watson DNA strand (+ strand). A pronounced sharp peak (red arrow) is present on the TSS only on the respective gene strand, thus reflecting both the cap- and strand-specificity of the method. Unstranded RNA-seq is shown as reference example. f, Plot of total TSSs (identified by using DECAP-seq rep #2) distribution along genes in _Dnmt3b_−/− (blue line) compared with wild-type (red line) ES cells. g, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and on gene body in wild-type and _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test. h, Histogram showing the average RPM of novel identified TSSs by DECAP-seq in both replicates of wild-type ES cells. i, j, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in both replicates of DECAP-seq samples in wild-type and _Dnmt3b_−/− ES cells. k, Venn diagrams of intragenic TSSs with a DECAP-seq signal RPM > 6 showing the single-base resolution overlap between the DECAP-seq experiment replicates. P values calculated with Hypergeometric Distribution test. l, Venn diagram of intragenic TSSs with a DECAP-seq signal RPM > 6 showing single-base resolution overlap between _Dnmt3b_−/− and wild-type ES cells (rep #2). m, Pie charts of the DECAP-seq read distribution on TSSs RPM > 6 in wild-type (left) and _Dnmt3b_−/− (right) cells (rep #2). In green are shown the novel TSSs that overlap with RefSeq-annotated TSSs. Yellow, all the common TSSs distributed on the gene body; pink, the sample-specific TSSs on the gene body. n, Box plot distribution of the enrichment of the CAPIP-seq and Pol II ChIP-seq signals calculated as the log2 ratio in _Dnmt3b_−/− versus wild-type cells on the novel identified TSSs and on an intragenic random dataset. 
Green, those overlapping with RefSeq-annotated TSSs; pink, those specifically found on the gene bodies of _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test. o, Box plot distribution of the ratio between downstream and upstream exon expression levels with respect to the novel identified intragenic TSSs or an intragenic random dataset in _Dnmt3b_−/− cells. The exon expression levels were calculated by counting the reads from the RNA-seq experiments in _Dnmt3b_−/− or wild-type cells. P values calculated with Wilcoxon rank-sum test.
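The RPM > 6 threshold and the hypergeometric test used for the Venn diagram overlaps in panel k can be sketched as follows. The P value is the probability of seeing at least the observed overlap if one TSS set were drawn at random from a fixed universe of candidate positions. The size of that universe is an assumption of this sketch, since the legend does not state which background was used, and the function names are hypothetical.

```python
from math import comb

def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads at a single TSS position."""
    return read_count * 1e6 / total_mapped_reads

def hypergeom_overlap_pvalue(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b positions are drawn at random.

    universe: number of candidate positions (assumed background size)
    set_a, set_b: sizes of the two TSS sets being compared
    overlap: number of positions shared by both sets
    """
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

Because the calculation uses exact integer binomial coefficients, it stays accurate for very small P values, although it becomes slow for very large universes.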

Extended Data Figure 7 DECAP-seq maps the internal TSSs in _Dnmt3b_−/− ES cells revealing their correlation with the binding of methylation-sensitive transcription factors.

a, Sequence binding motifs of the indicated transcription factors. b, Schematic representation of CpG localization and putative transcription factor binding elements on the regions (±50 bp) of some intragenic TSSs specific to _Dnmt3b_−/− ES cells. c, RT–qPCR analysis of CAPIP (top) and qPCR analysis of ChIP (middle) experiments on the indicated genomic regions in wild-type and _Dnmt3b_−/− ES cells. For CAPIP RT–qPCR the primers were designed downstream of the newly identified TSSs. Bottom panel represents the fold difference of the ratio between downstream and upstream exon expression levels with respect to the newly identified intragenic TSSs. For TSSs falling on exons, the downstream or upstream part of the same exon was considered as downstream or upstream exon if longer than 200 bp. P value was calculated against the wild-type condition using a _t_-test; **P < 0.01; *P < 0.05; n.s., not significant. d, Sanger bisulphite sequencing of intragenic TSSs previously described in wild-type and _Dnmt3b_−/− ES cells. e, qPCR analysis of ChIP experiments on the indicated genomic regions.

Extended Data Figure 8 SetD2 knockdown reduces H3K36me3 marks, Dnmt3b binding, intragenic DNA methylation, and spurious TSSs on the gene bodies.

a, RT–qPCR of SetD2 knockdown in wild-type and _Dnmt3b_−/− ES cells, using two independent shRNAs. Error bars represent the standard deviation of at least three independent experiments. b, Venn diagram showing the genome-wide number of H3K36me3 peaks in control and SetD2 knockdown ES cells. c, Plots of H3K36me3 distribution on genes in control and SetD2 knockdown cells show a decrease of H3K36me3 on the gene bodies of SetD2-silenced cells. d, Histograms of the percentage of Dnmt3b ChIP-seq peaks overlapping intronic and exonic regions of genes grouped into quartiles on the basis of expression in control or SetD2 knockdown cells. P value was calculated with a _χ_2 test; **P < 0.001. **e**, Genomic views of the mapped reads from H3K36me3 and Dnmt3b ChIP-seq datasets in control and two different SetD2 knockdowns ES cells. **f**, qPCR analysis of H3K36me3 and Dnmt3b ChIP experiments and MeDIP analysis in control and SetD2 knockdown cells for the indicated genomic regions. A specific loss of Dnmt3b and DNA methylation is observed only on the gene body of active genes. Error bars represent the standard deviation of at least three independent experiments. _P_ value was calculated against the wild-type condition for each experiment with a _t_-test; **_P_ < 0.001. Primers used are reported in Supplementary Table 1. **g**, Scatter plots of the log2 RPKM gene values in the indicated samples. **h**, Genomic views of the RNA-seq-mapped reads from the indicated samples. **i**, Plot of total TSSs (identified by using DECAP-seq) distribution along genes in SetD2 knockdown (yellow line) compared with control knockdown (red line) ES cells. **j**, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and on gene bodies in control and SetD2 knockdown ES cells. _P_ values calculated with Wilcoxon rank-sum test. **k**, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in control and SetD2 knockdown ES cells. 
l, Venn diagram of intragenic TSSs with a DECAP-seq signal RPM > 6 showing single-base resolution overlap between control and SetD2 knockdown ES cells. m, n, Venn diagrams of intragenic TSSs with a DECAP-seq signal RPM > 6 showing single-base resolution overlap between the indicated samples. P values calculated with a hypergeometric distribution test. o, Pie charts of the DECAP-seq read distribution on TSSs with RPM > 6 in control knockdown (top) and SetD2 knockdown (bottom) ES cells. In green are the novel TSSs that overlap with RefSeq-annotated TSSs; in yellow, all the common TSSs distributed on gene bodies; and in pink, the sample-specific TSSs on gene bodies.
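
As a rough illustration of the overlap statistic named in panels m and n, the sketch below computes an upper-tail hypergeometric P value for the overlap between two TSS sets. The function name, the argument names and the choice of background (`universe`, the number of candidate positions) are assumptions made for illustration; the legend does not state how the background was defined.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, size_a, size_b, overlap):
    """Return P(X >= overlap) for a hypergeometric overlap.

    Model (an assumption, not taken from the source): `size_b` positions are
    drawn without replacement from `universe` candidate positions, of which
    `size_a` are TSSs in the first sample; X counts the shared positions.
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    max_overlap = min(size_a, size_b)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap must lie between 0 and min(size_a, size_b)")

    total = comb(universe, size_b)
    # Sum the probability mass from the observed overlap up to the maximum.
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(hypergeom_overlap_pvalue(universe=10, size_a=5, size_b=5, overlap=5))
```

Exact integer arithmetic keeps the sketch dependency-free; for genome-scale counts a statistics library would be the more practical choice.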

Extended Data Figure 9 Internal transcription activation in _Dnmt3b_−/− ES cells shows the same intragenic TSSs as in SetD2 knockdown cells.

a, Genomic view of the indicated genes showing intragenic transcription initiation increase in _Dnmt3b_−/− and in shSetD2 wild-type cells. Below, Sanger bisulphite sequencing of shCTR (control) and shSetD2 wild-type ES cells on previously described intragenic TSSs. b, Genomic views of the mapped reads from H3K36me3 (in wild-type ES cells) and Dnmt3b ChIP-seq datasets (in mock or Dnmt3b-transfected _Dnmt3b_−/− ES cells). Both the wild-type and the catalytically inactive Dnmt3b(V725G) mutant showed intragenic binding enrichment. c, qPCR analysis of IgG and Dnmt3b ChIP experiments in mock or Dnmt3b (wild-type and V725G) transfected _Dnmt3b_−/− ES cells for the indicated intragenic regions. Error bars represent the standard deviation of at least three independent experiments. P value calculated against the mock condition using a _t_-test; **P < 0.001. Primers used are reported in Supplementary Table 1. d, Dot-blot analysis of genomic DNA isolated from mock or Dnmt3b (wild-type and V725G) transfected _Dnmt3b_−/− ES cells. Dot intensity quantification from three biological replicates revealed that wild-type Dnmt3b (but not the V725G mutant) significantly (P = 0.003) increased global DNA 5mC. P value calculated against the mock condition using a _t_-test. e, qPCR analysis of MeDIP experiments in mock or Dnmt3b (wild-type and V725G) transfected _Dnmt3b_−/− ES cells for the indicated intragenic regions. A significant intragenic increase of DNA methylation is evident in wild-type Dnmt3b (but not mutant) transfected _Dnmt3b_−/− ES cells. Error bars represent the standard deviation of at least three independent experiments. P value calculated against the mock condition using a _t_-test; **P < 0.001. Primers used are reported in Supplementary Table 1. f, Genomic views of the RNA-seq-mapped reads from the indicated samples. g, Scatter plots of the log2 RPKM gene values in the indicated samples. 
Of note, mock-treated ES cells showed higher correlation with Dnmt3b-mutant-transfected ES cells (r = 0.99) than with wild-type Dnmt3b-transfected ES cells (r = 0.95), suggesting that DNA methylation enzymatic activity is the major driver of the Dnmt3b-dependent transcriptome alterations. h, Western blot of _Dnmt3b_−/− ES cells transfected with mock, wild-type Dnmt3b, Dnmt3b(S277P) or Dnmt3b(VW-RR). β-Actin was used as protein loading control. i, j, qPCR analysis of ChIP and MeDIP experiments of the indicated regions in Dnmt3b mutant conditions. Specific impairment of Dnmt3b binding and DNA methylation is observed in both mutants compared with rescue using the wild-type Dnmt3b enzyme. Error bars represent the standard deviation of at least three independent experiments. Primers used are reported in Supplementary Table 1.

Extended Data Figure 10 Cryptic RNA transcripts are degraded in part by the RNA exosome complex.

a, RNA-seq profile of _Dnmt3b_−/− cells transfected with mock or Dnmt3b mutants. b, Scatter plots of the log2 RPKM gene values in the indicated samples. c, Box plot of the ratio of normalized RNA-seq read counts (RPKM) for the second, third, intermediate (average) and last exons to those for the first exon in _Dnmt3b_−/− ES cells transfected with mock, wild-type Dnmt3b or mutant Dnmt3b (S277P and VW-RR). P values calculated with Wilcoxon rank-sum test. d, Pie chart showing the percentage of transcripts with log2 fold change (FC) > 1, < −1 or between −1 and 1. e, f, Histogram and western blot showing mRNA and protein levels of _Dis3_ and _Rrp6_ genes in control or Dis3/Rrp6 double knockdown (dKD) in _Dnmt3b_−/− ES cells. β-Actin was used as protein loading control. g, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and gene bodies in control or Dis3/Rrp6 dKD _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test. h, Box plot of the normalized DECAP-seq read counts (RPM) on the intragenic TSSs in the indicated samples. P values calculated with Wilcoxon rank-sum test. i, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) of the indicated samples. j, Venn diagrams of intragenic TSSs with a DECAP-seq signal RPM > 6 showing the single-base resolution overlap between the DECAP-seq experiment replicates performed in _Dnmt3b_−/− ES cells. P values calculated with a hypergeometric distribution test. k, Pie charts of the DECAP-seq read distribution on TSSs with RPM > 6 in control (left) and Dis3/Rrp6 KD (right) _Dnmt3b_−/− ES cells. In green are shown the novel TSSs that overlap with RefSeq-annotated TSSs; in yellow, all the common TSSs distributed on gene bodies; and in pink, the sample-specific TSSs on gene bodies. l, Scatter plots of the log2 RPKM gene values in the indicated samples. m, Genomic views of the RNA-seq-mapped reads from the indicated samples.
n, Box plots of the ratio between normalized poly(A)+ RNA-seq read counts (RPKM) for the second and the first exon, the third and the first exon, the average of the intermediates (from the fourth to the penultimate exons) and the first exon, and the last and the first exon in wild-type (rep #2) and _Dnmt3b_−/− (rep #2 and clone B77) ES cells. P values calculated with Wilcoxon rank-sum test. o, Pie chart showing the percentage of transcripts with an intermediate to first exon ratio (in _Dnmt3b_−/− rep #2 and clone B77 poly(A)+ RNA-seq) versus an intermediate to first exon ratio (in wild-type rep #2 poly(A)+ RNA-seq) log2 fold change > 1, < −1 or between −1 and 1. p, Box plots showing the number of total TSSs per gene on RefSeq-annotated TSSs and gene bodies identified by DECAP-seq in the indicated RNA compartments. P values calculated with Wilcoxon rank-sum test. q, Scatter plots of the log2 RPM values on canonical annotated TSSs (±100 bp) in the indicated RNA compartments. r, Venn diagram of the common intragenic TSSs (defined as having RPM > 6 in both _Dnmt3b_−/− and wild-type ES cells) in the indicated RNA compartments. s, Box plot of the normalized DECAP-seq read counts (RPM) on the common intragenic TSSs (RPM > 6) in the indicated RNA compartments.
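
The exon-ratio comparison described in panels n and o can be sketched as follows: an intermediate-to-first-exon ratio is computed per transcript in each genotype, and the log2 fold change between the two ratios is binned at the ±1 thresholds given in the legend. The function name, the category labels and the small pseudocount that guards against division by zero are assumptions for illustration, not the authors' pipeline.

```python
from math import log2


def classify_exon_ratio_change(ko_first, ko_intermediate,
                               wt_first, wt_intermediate,
                               pseudocount=1e-3):
    """Bin the log2 fold change of the intermediate/first exon ratio.

    Inputs are RPKM values for one transcript in knockout (ko) and
    wild-type (wt) cells. The pseudocount is a hypothetical safeguard
    against zero counts; the source does not describe one.
    """
    ko_ratio = (ko_intermediate + pseudocount) / (ko_first + pseudocount)
    wt_ratio = (wt_intermediate + pseudocount) / (wt_first + pseudocount)
    log2_fc = log2(ko_ratio / wt_ratio)

    # Thresholds follow the legend: > 1, < -1, or between -1 and 1.
    if log2_fc > 1:
        return "increased"
    if log2_fc < -1:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical RPKM values, for illustration only.
    print(classify_exon_ratio_change(10.0, 30.0, 10.0, 10.0))
```

Counting the labels over all transcripts would give the three fractions that a pie chart of this kind displays.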

Extended Data Figure 11 Loss of Dnmt3b generates partial intragenic starting RNAs that are as stable as canonical mRNAs.

a, Genomic views of the RNA-seq-mapped reads from the indicated samples. Genes with slow, medium and fast decay are shown. b, Gene Ontology (GO) analysis of the subsets of the mRNAs with fast decay (half-life shorter than four hours) or slow decay (half-life longer than nine hours) in wild-type ES cells. The analysis revealed that fast-decay mRNAs are mainly involved in cell cycle and transcription biological processes, while slow-decay mRNAs are related to metabolism and translation. This result is in agreement with that previously observed in mouse ES cells (refs 54, 55), supporting the bona fide nature of the experiment. c, Scatter plot of mRNA half-life (in hours) in wild-type and _Dnmt3b_−/− ES cells. d, e, Box plots of intron half-life (in hours) in wild-type and _Dnmt3b_−/− ES cells. Intron half-life is estimated by considering only the reads mapped to intronic regions. Intron half-life is generally shorter than mRNA half-life, suggesting lower stability of the RNAs containing intronic parts. Intron half-life calculated in _Dnmt3b_−/− ES cells is significantly (P = 0.0016) longer than in wild-type ES cells. P values calculated with Wilcoxon rank-sum test. f, g, Frequency distribution of intron and mRNA half-life among all introns in wild-type and _Dnmt3b_−/− ES cells. h, Genomic views of the RNA-seq-mapped reads from the indicated samples. ART-seq reads derive only from the coding sequences (CDS) of the mRNAs. RNA-seq is shown as a reference example. i, Scatter plots of the log2 RPKM gene values in the indicated samples. j, Box plot of the normalized ART-seq rep #2 read counts (RPKM) on the indicated RNA regions in wild-type and _Dnmt3b_−/− ES cells. k, Box plot of the normalized ART-seq read counts (RPKM, for both the biological replicates) on the introns in wild-type and _Dnmt3b_−/− ES cells. P values calculated with Wilcoxon rank-sum test.

Extended Data Figure 12 Models derived from the obtained results.

a, Scheme of the functional role of the Dnmt3b-dependent intragenic DNA methylation in ES cells. In wild-type cells, Dnmt3b is able to methylate gene bodies to favour a repressive chromatin environment that inhibits spurious entries of Pol II. In the absence of Dnmt3b, gene bodies are hypomethylated, leading to Pol II intragenic entries that generate intragenic transcription initiation. b, Epigenetic crosstalk between Pol II, SetD2 and Dnmt3b and relative H3K36me3 and 5mC chromatin modifications unveils how Pol II, through the transcription elongation process, triggers a safety mechanism to ensure its transcription initiation fidelity.

About this article

Cite this article

Neri, F., Rapelli, S., Krepelova, A. et al. Intragenic DNA methylation prevents spurious transcription initiation. Nature 543, 72–77 (2017). https://doi.org/10.1038/nature21373
