Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan
doi: 10.1038/nmeth.1470. Epub 2010 Jun 13.
Charles Plessy, Nicolas Bertin, Hazuki Takahashi, Roberto Simone, Md Salimullah, Timo Lassmann, Morana Vitezic, Jessica Severin, Signe Olivarius, Dejan Lazarevic, Nadine Hornig, Valerio Orlando, Ian Bell, Hui Gao, Jacqueline Dumais, Philipp Kapranov, Huaien Wang, Carrie A Davis, Thomas R Gingeras, Jun Kawai, Carsten O Daub, Yoshihide Hayashizaki, Stefano Gustincich, Piero Carninci
- PMID: 20543846
- PMCID: PMC2906222
- DOI: 10.1038/nmeth.1470
Charles Plessy et al. Nat Methods. 2010 Jul.
Abstract
Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and noncoding RNAs with capped 5' ends that vary in size. Methods to identify the 5' ends of transcripts will facilitate the discovery of new promoters and 5' ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we developed nano-cap analysis of gene expression (nanoCAGE), a method that captures the 5' ends of transcripts from as little as 10 ng of total RNA, and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5' ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.
Conflict of interest statement
CP, PC and RS are inventors of the Japanese patent application held by RIKEN on the moderately suppressive PCR step of the nanoCAGE protocol.
Figures
Figure 1. Experimental outline of the nanoCAGE and CAGEscan protocols
(a) nanoCAGE captures the 5′ ends of molecules by template switching. When synthesizing the cDNA of a capped mRNA, the reverse transcriptase adds extra cytosines that are complementary to the cap. Each 5′-full-length cDNA is therefore extended once the riboguanosine-tailed “template-switching” oligonucleotide hybridizes to these extra cytosines. (b) In the semi-suppressive PCR, short templates fold intramolecularly and prevent primer binding, which precludes their amplification; longer molecules are less likely to fold and are therefore amplified. Templates derived from reaction artifacts form stable homoduplexes, which also precludes amplification. (c) Preparation of nanoCAGE tags. After template switching, semi-suppressive PCR and EcoP15I cleavage, the resulting 25-bp tags are ligated to oligonucleotide adapters that contain a sequence identifier (red box). After PCR amplification, the nanoCAGE tags are sequenced by synthesis. (d) Preparation of 5′-full-length cDNA libraries for paired-end sequencing with the CAGEscan protocol. Capped mRNAs are captured as in a. The ends of the amplified cDNA constructs are replaced by PCR with adapters for sequencing on the Illumina Genome Analyzer, which produces paired-end reads from single cDNAs.
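Panel (c) implies a demultiplexing step downstream: because the adapters carry a sequence identifier, reads from several libraries can be pooled and separated afterwards. The Python sketch below illustrates that idea only and is not the authors' pipeline. The read layout (identifier immediately followed by the tag), the 3-nt identifiers and the library names are all hypothetical; only the 25-bp tag length comes from the caption.

```python
from collections import defaultdict

# Hypothetical sequence identifiers (barcodes) and library names, for illustration only.
BARCODES = {"ACA": "cytoplasmic", "GTC": "nuclear"}
TAG_LENGTH = 25  # tag length after EcoP15I cleavage, as stated in the Figure 1 caption


def demultiplex(reads, barcodes=BARCODES, tag_length=TAG_LENGTH):
    """Assign reads to libraries by their leading sequence identifier.

    Assumes (hypothetically) that every read is laid out as identifier + tag
    and that all identifiers have the same length. Reads with an unknown
    identifier, or too short to contain a full tag, are returned separately.
    """
    bc_len = len(next(iter(barcodes)))
    libraries = defaultdict(list)
    unassigned = []
    for read in reads:
        identifier, rest = read[:bc_len], read[bc_len:]
        if identifier in barcodes and len(rest) >= tag_length:
            # Keep exactly tag_length bases; anything after them is trimmed.
            libraries[barcodes[identifier]].append(rest[:tag_length])
        else:
            unassigned.append(read)
    return dict(libraries), unassigned


if __name__ == "__main__":
    reads = ["ACA" + "T" * 25, "GTC" + "G" * 25 + "NN", "TTT" + "A" * 25, "ACA" + "C" * 10]
    libraries, unassigned = demultiplex(reads)
    print({name: len(tags) for name, tags in libraries.items()}, len(unassigned))
```

Exact-match lookup keeps the sketch short; a production pipeline would typically also tolerate sequencing errors in the identifier.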
Figure 2. nanoCAGE specifically captures capped 5′ ends
(a) nanoCAGE detects 5′ ends of capped RNA molecules. The relative frequency of CAGE tags over all RefSeq transcript models was plotted on a compound scale running from 500 bp upstream to the start of the RefSeq model (in gray) and then from 0% to 100% of the RefSeq model length (in black). Decapping a sample decreases the prevalence of tags representing the 5′ end. Combining decapping with fragmentation completely abolishes the detection of 5′ ends. (b) Three independent methods of 5′ end capture, based respectively on oligo-capping, CAP trapper and template switching, detect the same 5′ ends, as exemplified here for the histone gene HIST1H3C, represented on a horizontal axis. The RefSeq model starts with the coding sequence at position 26,153,618 of chromosome 6. TSSs are represented by vertical bars proportional to the number of tags they contain. The highest bar is drawn at the same height in all three experiments, and its expression value is written in gray to its left.
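The compound scale of panel (a) mixes two units: base pairs upstream of the transcript start, and percent of transcript length inside the model. A minimal Python sketch of that mapping follows, assuming 0-based half-open transcript coordinates and hypothetical bin widths (50 bp upstream, 5% inside the model). The function names are illustrative and do not come from the paper.

```python
from collections import Counter


def compound_position(tag_pos, tx_start, tx_end, strand, upstream=500):
    """Place a tag's 5' coordinate on the compound axis described for Figure 2a.

    Coordinates are assumed 0-based with tx_start < tx_end (transcript spans
    tx_start..tx_end-1). Returns ("upstream_bp", offset) with offset in
    [-upstream, -1] for tags upstream of the transcript start,
    ("percent", p) with p in [0, 100) for tags inside the model, or None.
    """
    length = tx_end - tx_start
    # Offset from the transcript start, measured in the direction of transcription.
    offset = tag_pos - tx_start if strand == "+" else (tx_end - 1) - tag_pos
    if -upstream <= offset < 0:
        return ("upstream_bp", offset)
    if 0 <= offset < length:
        return ("percent", 100.0 * offset / length)
    return None


def metagene_profile(tags, transcripts, upstream=500, bp_bin=50, pct_bin=5):
    """Relative frequency of tags per bin of the compound axis.

    tags: (chrom, pos, strand) tuples; transcripts: (chrom, start, end, strand).
    A tag is counted once per transcript it falls near. The all-pairs loop is
    quadratic and only meant for small illustrations.
    """
    counts = Counter()
    for chrom, pos, strand in tags:
        for t_chrom, start, end, t_strand in transcripts:
            if (chrom, strand) != (t_chrom, t_strand):
                continue
            hit = compound_position(pos, start, end, strand, upstream)
            if hit is None:
                continue
            unit, value = hit
            width = bp_bin if unit == "upstream_bp" else pct_bin
            counts[(unit, int(value // width))] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}
```

Measuring the offset in the direction of transcription lets plus- and minus-strand models share one axis, which is what makes pooling over all RefSeq models possible.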
Figure 3. Promotome-transcriptome analysis with CAGEscan
(a) Annotation matrix summarizing the connections made by CAGEscan mate pairs between genomic regions. We divided the genome into features that are intergenic, intron, promoter, 5′ UTR, coding sequence (CDS), 3′ UTR, antisense in introns, or antisense in exons according to RefSeq. The CAGEscan mate pairs were counted for each combination of features. For the libraries made from cytoplasmic, nuclear, cytoplasmic poly-A− and nuclear poly-A− RNAs, a matrix of 8 rows by 7 columns representing the indicated transcript features is plotted. The area of each cell is proportional to the number of pairs connecting a given combination of features. The percentages indicated below each column represent the fraction of mate pairs initiated from the same feature. Pairs starting in an intergenic feature were discarded to better visualize the differences between the other combinations. Notable combinations of features are colored. (b) The nuclear compartment contains more intron-intron pairs. Paired bars indicate technical replicates. For the experiment with six biological replicates, the percentages were averaged (error bars represent s.d., n = 6), and the difference is statistically significant (#; P = 0.019, paired Student’s t-test).
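The counting behind panel (a) and the replicate comparison in panel (b) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. Each mate pair is assumed to be already labeled with the RefSeq feature of its 5′ and 3′ read, the feature labels are hypothetical spellings, and the intron-intron percentage uses all retained (non-intergenic-start) pairs as its denominator, which the caption does not specify.

```python
import math
import statistics
from collections import Counter

# Hypothetical label spellings for the eight RefSeq-derived features.
FEATURES = ["intergenic", "intron", "promoter", "5'UTR", "CDS", "3'UTR",
            "antisense_intron", "antisense_exon"]


def annotation_matrix(pairs):
    """Count mate pairs per (3' feature row, 5' feature column).

    pairs: (five_prime_feature, three_prime_feature) labels, assumed to be
    precomputed from RefSeq. Pairs starting in an intergenic feature are
    discarded, giving 8 rows by 7 columns as in Figure 3a.
    """
    counts = Counter((start, end) for start, end in pairs if start != "intergenic")
    columns = FEATURES[1:]  # every feature except "intergenic" (listed first)
    return {row: {col: counts[(col, row)] for col in columns} for row in FEATURES}


def intron_intron_percent(matrix):
    """Percentage of retained pairs whose 5' and 3' reads both fall in introns."""
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * matrix["intron"]["intron"] / total if total else 0.0


def paired_t(a, b):
    """Paired Student's t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least 2 values each")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample s.d. (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With six biological replicates, `paired_t` would take the six nuclear and six matching cytoplasmic intron-intron percentages and return n − 1 = 5 degrees of freedom. The sketch stops at the t statistic; obtaining P requires the t distribution, for example through `scipy.stats.ttest_rel`, which returns both the statistic and the p-value.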
Figure 4. CAGEscan connects promoters and downstream sequences
(a) Schematic representation of CAGEscan paired-end tag clustering. Tags with overlapping 5′ ends are used as seeds to aggregate 3′ tags into a single cluster. Two overlapping but distinct resulting CAGEscan clusters are depicted in blue and red. (b) Genomic representation of the CAGEscan data. Horizontal bars indicate annotation features (chromosomal coordinates, Entrez Gene loci, RefSeq transcript models, CAGEscan clusters), and vertical bars represent the quantitative activity of the promoters detected by the 5′ reads of the CAGEscan libraries (CAGEscan expression) in tags per million. Features and expression from the plus and minus strands are colored green and purple, respectively. The FTL gene (thick green bar) has a strongly active promoter (asterisk), from which enough CAGEscan pairs originate to reconstitute the gene’s intron-exon structure in a single CAGEscan cluster (blue). Transcripts antisense to the FTL locus are more abundant in the nuclear libraries.
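The seeding idea in panel (a) can be illustrated with a simple interval sweep in Python. This is a minimal sketch of one way to implement "overlapping 5′ tags seed a cluster that collects their 3′ mates", not the published clustering procedure. The pair tuple layout and the 0-based half-open coordinates are assumptions, and `Cluster` and `cluster_pairs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    chrom: str
    strand: str
    seed_end: int  # right-most end of the 5' reads merged so far
    start: int     # left-most coordinate over all reads in the cluster
    end: int       # right-most coordinate over all reads in the cluster
    pairs: list = field(default_factory=list)


def cluster_pairs(pairs):
    """Group mate pairs whose 5' reads overlap; their 3' reads ride along.

    Each pair is (chrom, strand, five_start, five_end, three_start, three_end)
    with 0-based half-open intervals (an assumption). Overlap is tested on the
    5' reads only, so the 5' reads act as seeds and each 3' read is added to
    the cluster of its mate. Cluster spans may therefore overlap even when
    their seeds do not.
    """
    clusters = []
    for pair in sorted(pairs, key=lambda p: (p[0], p[1], p[2])):
        chrom, strand, f_start, f_end, t_start, t_end = pair
        current = clusters[-1] if clusters else None
        if (current is not None and current.chrom == chrom
                and current.strand == strand and f_start < current.seed_end):
            # This 5' read overlaps the current seed: extend the seed.
            current.seed_end = max(current.seed_end, f_end)
        else:
            current = Cluster(chrom, strand, f_end, f_start, f_end)
            clusters.append(current)
        current.start = min(current.start, f_start, t_start)
        current.end = max(current.end, f_end, t_end)
        current.pairs.append(pair)
    return clusters
```

Sorting by chromosome, strand and 5′ start lets a single pass merge overlapping seeds. In a small example with seeds at 100 to 135 and 300 to 325, whose mates reach 925 and 625 respectively, the two resulting clusters overlap on the genome while remaining distinct, as with the blue and red clusters in panel (a).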
Figure 5. Expressed repeat elements surveyed by CAGEscan
Expression, in tags per million, of LINE, LTR, SINE and srpRNA repeat elements in cytoplasmic (C) and nuclear (N) fractions of total and non-polyadenylated (poly-A−) RNA. Adjacent bars indicate technical replicates. Whisker plots summarize data from six additional biological replicates. The bar-plot and whisker-plot sub-panels use different scales. The nucleus appears strongly enriched for LINE, LTR and SINE transcripts, which are non-polyadenylated, while the cytoplasm appears strongly enriched in srpRNA, in both the total RNA and the non-polyadenylated fraction.
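Tags per million (TPM) normalization, used throughout Figures 4 and 5, rescales raw counts by library size so that libraries sequenced to different depths can be compared. A minimal Python sketch follows. The log2 enrichment with a pseudocount is an added illustration of how a nuclear versus cytoplasmic comparison could be expressed; it is not a statistic reported in the figure, and the pseudocount of 1 is an arbitrary assumption.

```python
import math


def tags_per_million(counts, library_size=None):
    """Convert raw tag counts to tags per million (TPM).

    counts maps a feature (e.g. a repeat class such as "LINE") to its raw tag
    count. library_size is the normalization denominator; by default the sum
    of counts is used, which is only appropriate if counts covers every mapped
    tag in the library (usually the total mapped-tag count is passed instead).
    """
    total = sum(counts.values()) if library_size is None else library_size
    if total <= 0:
        raise ValueError("library size must be positive")
    return {feature: n * 1_000_000 / total for feature, n in counts.items()}


def log2_enrichment(tpm_a, tpm_b, pseudocount=1.0):
    """log2 ratio of TPM values in library a over library b, per feature.

    The pseudocount (an arbitrary choice here) avoids division by zero and
    damps ratios between very small values; missing features count as 0.
    """
    features = set(tpm_a) | set(tpm_b)
    return {
        f: math.log2((tpm_a.get(f, 0.0) + pseudocount) / (tpm_b.get(f, 0.0) + pseudocount))
        for f in features
    }
```

A positive value from `log2_enrichment(nuclear_tpm, cytoplasmic_tpm)` would indicate nuclear enrichment for that repeat class; the variable names are placeholders for per-library TPM dictionaries.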
Similar articles
- NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Salimullah M, Sakai M, Plessy C, Carninci P. Cold Spring Harb Protoc. 2011 Jan 1;2011(1):pdb.prot5559. doi: 10.1101/pdb.prot5559. PMID: 21205859. Free PMC article.
- NanoCAGE: A Method for the Analysis of Coding and Noncoding 5'-Capped Transcriptomes. Poulain S, Kato S, Arnaud O, Morlighem JÉ, Suzuki M, Plessy C, Harbers M. Methods Mol Biol. 2017;1543:57-109. doi: 10.1007/978-1-4939-6716-2_4. PMID: 28349422.
- NanoCAGE-XL: An Approach to High-Confidence Transcription Start Site Sequencing. Ivanchenko MG, Megraw M. Methods Mol Biol. 2018;1830:225-237. doi: 10.1007/978-1-4939-8657-6_13. PMID: 30043373.
- The RNA continent. Yasuda J, Hayashizaki Y. Adv Cancer Res. 2008;99:77-112. doi: 10.1016/S0065-230X(07)99003-X. PMID: 18037407. Review.
- The complexity of the mammalian transcriptome. Gustincich S, Sandelin A, Plessy C, Katayama S, Simone R, Lazarevic D, Hayashizaki Y, Carninci P. J Physiol. 2006 Sep 1;575(Pt 2):321-32. doi: 10.1113/jphysiol.2006.115568. Epub 2006 Jul 20. PMID: 16857706. Free PMC article. Review.
Cited by
- Multiplex generation and single cell analysis of structural variants in a mammalian genome. Pinglay S, Lalanne JB, Daza RM, Koeppel J, Li X, Lee DS, Shendure J. bioRxiv [Preprint]. 2024 Feb 12:2024.01.22.576756. doi: 10.1101/2024.01.22.576756. PMID: 38405830. Free PMC article. Preprint.
- Nano-CUT&Tag for multimodal chromatin profiling at single-cell resolution. Bárcenas-Walls JR, Ansaloni F, Hervé B, Strandback E, Nyman T, Castelo-Branco G, Bartošovič M. Nat Protoc. 2024 Mar;19(3):791-830. doi: 10.1038/s41596-023-00932-6. Epub 2023 Dec 21. PMID: 38129675. Review.
- Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Maeng JH, Jang HJ, Du AY, Tzeng SC, Wang T. Genome Res. 2023 Dec 27;33(12):2143-2155. doi: 10.1101/gr.277061.122. PMID: 38065624. Free PMC article.
- An improved method for the highly specific detection of transcription start sites. Seki M, Kuze Y, Zhang X, Kurotani KI, Notaguchi M, Nishio H, Kudoh H, Suzaki T, Yoshida S, Sugano S, Matsushita T, Suzuki Y. Nucleic Acids Res. 2024 Jan 25;52(2):e7. doi: 10.1093/nar/gkad1116. PMID: 37994784. Free PMC article.
- ZBTB12 is a molecular barrier to dedifferentiation in human pluripotent stem cells. Han D, Liu G, Oh Y, Oh S, Yang S, Mandjikian L, Rani N, Almeida MC, Kosik KS, Jang J. Nat Commun. 2023 Feb 9;14(1):632. doi: 10.1038/s41467-023-36178-9. PMID: 36759523. Free PMC article.
Grants and funding
- U54 HG004557-04/HG/NHGRI NIH HHS/United States
- U54 HG004557-03/HG/NHGRI NIH HHS/United States
- U54 HG004557-02/HG/NHGRI NIH HHS/United States
- U54 HG004557/HG/NHGRI NIH HHS/United States
- TCR05001/TI_/Telethon/Italy
- U54 HG004557-02S1/HG/NHGRI NIH HHS/United States
- U54 HG004557-01/HG/NHGRI NIH HHS/United States
- U54 HG004557-05/HG/NHGRI NIH HHS/United States