Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan

doi: 10.1038/nmeth.1470. Epub 2010 Jun 13.

Charles Plessy, Nicolas Bertin, Hazuki Takahashi, Roberto Simone, Md Salimullah, Timo Lassmann, Morana Vitezic, Jessica Severin, Signe Olivarius, Dejan Lazarevic, Nadine Hornig, Valerio Orlando, Ian Bell, Hui Gao, Jacqueline Dumais, Philipp Kapranov, Huaien Wang, Carrie A Davis, Thomas R Gingeras, Jun Kawai, Carsten O Daub, Yoshihide Hayashizaki, Stefano Gustincich, Piero Carninci

Charles Plessy et al. Nat Methods. 2010 Jul.

Abstract

Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and noncoding RNAs with capped 5' ends that vary in size. Methods to identify the 5' ends of transcripts will facilitate the discovery of new promoters and 5' ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we developed nano-cap analysis of gene expression (nanoCAGE), a method that captures the 5' ends of transcripts from as little as 10 ng of total RNA, and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5' ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.

Conflict of interest statement

CP, PC and RS are inventors of the Japanese patent application held by RIKEN on the moderately suppressive PCR step of the nanoCAGE protocol.

Figures

Figure 1. Experimental outline of the nanoCAGE and CAGEscan protocols

(a) nanoCAGE captures the 5′ ends of molecules by template switching. When polymerizing the cDNA of a capped mRNA, the reverse transcriptase adds extra cytosines that are complementary to the cap. Thus each 5′ full-length cDNA is extended upon hybridization of the riboguanosine-tailed “template-switching” oligonucleotides to these extra cytosines. (b) In the semi-suppressive PCR, short templates fold intramolecularly and prevent the binding of primers, which precludes amplification; longer molecules are less likely to fold and are thus amplified. Templates derived from reaction artifacts form stable homo-duplexes, also precluding amplification. (c) Preparation of nanoCAGE tags. After template switching, semi-suppressive PCR and EcoP15I cleavage, the resulting 25-bp tags are ligated to oligonucleotide adapters that contain a sequence identifier (red box). After PCR amplification, the nanoCAGE tags are subjected to sequencing by synthesis. (d) Preparation of 5′ full-length cDNA libraries for paired-end sequencing with the CAGEscan protocol. Capture of capped mRNAs is similar to that in a. The ends of the amplified cDNA constructs are replaced by PCR with adapters for sequencing on the Illumina Genome Analyzer, which produces paired-end reads from single cDNAs.
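As a rough illustration of panel (c), the sketch below splits sequenced reads by their sequence identifier and keeps the 25-nt tag that follows it. The barcode sequences, their placement at the start of the read, and the function name `demultiplex` are assumptions made for illustration; none of them is taken from the nanoCAGE protocol.

```python
TAG_LENGTH = 25  # tag size after EcoP15I cleavage, as stated in the caption

# Hypothetical sample identifiers; real ones would come from the adapter design.
BARCODES = {"ACG": "sample_A", "GTC": "sample_B"}


def demultiplex(reads, barcodes=BARCODES, tag_length=TAG_LENGTH):
    """Group tags by sample, assuming each read is barcode + tag (+ trailing bases)."""
    tags_by_sample = {name: [] for name in barcodes.values()}
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                tag = read[len(barcode):len(barcode) + tag_length]
                if len(tag) == tag_length:  # skip reads too short to hold a full tag
                    tags_by_sample[sample].append(tag)
                break
    return tags_by_sample
```

Reads that carry no known identifier, or that are too short to contain a full tag, are simply dropped in this sketch; a real pipeline would likely also tolerate sequencing errors in the identifier.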

Figure 2. nanoCAGE specifically captures capped 5′ ends

(a) nanoCAGE detects 5′ ends of capped RNA molecules. The relative frequency of CAGE tags over all RefSeq transcript models was plotted on a compound scale going from 500 bp upstream to the start of the RefSeq (in gray) and then from 0% to 100% of the RefSeq (in black). Decapping a sample decreases the prevalence of tags representing the 5′ end. Combining decapping and fragmentation completely abolishes the detection of 5′ ends. (b) Three independent methods of 5′ end capture, based respectively on oligo-capping, CAP trapper and template switching, detect the same 5′ ends, as exemplified here for the histone gene HIST1H3C, represented on a horizontal axis. The RefSeq model starts with the coding sequence at position 26,153,618 of chromosome 6. TSSs are represented by vertical bars proportional to the number of tags they contain. The height of the highest bar is normalized across all three experiments, and its expression value is written in gray to its left.
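The compound scale in panel (a) can be sketched as a mapping from a genomic tag position to either a base-pair offset in the flanking window or a percentage of the transcript length. This is a minimal sketch, assuming the gray segment covers the 500 bp upstream of the RefSeq start and that coordinates are 0-based with an exclusive end; the function name `compound_position` is hypothetical.

```python
def compound_position(tag_pos, tx_start, tx_end, strand="+", upstream=500):
    """Map a genomic tag position onto the compound axis.

    Returns ("upstream", bp) with bp in [-upstream, 0) for tags in the flanking
    window, ("body", percent) for tags inside the transcript, or None otherwise.
    Coordinates are assumed 0-based, with tx_end exclusive.
    """
    length = tx_end - tx_start
    if strand == "+":
        offset = tag_pos - tx_start
    else:
        # On the minus strand the transcript starts at its highest coordinate.
        offset = (tx_end - 1) - tag_pos
    if -upstream <= offset < 0:
        return ("upstream", offset)
    if 0 <= offset < length:
        return ("body", 100.0 * offset / length)
    return None
```

Expressing the transcript body as a percentage lets transcripts of different lengths be pooled into one profile, which is what allows a single curve to summarize all RefSeq models.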

Figure 3. Promotome-transcriptome analysis with CAGEscan

(a) Annotation matrix summarizing the connections between genomic regions by CAGEscan mate pairs. We divided the genome into features that are intergenic, intron, promoter, 5′ UTR, coding sequence (CDS), 3′ UTR, antisense in introns, or antisense in exons according to RefSeq. The CAGEscan mate pairs were counted for each combination of features. For the libraries made from cytoplasmic, nuclear, cytoplasmic poly-A− and nuclear poly-A− RNAs, a matrix of 8 rows by 7 columns representing the indicated transcript features is plotted. The area of each cell is proportional to the number of pairs connecting a given combination of features. The percentages indicated below each column represent the fraction of mate pairs initiated from the same feature. The pairs starting in an intergenic feature were discarded to better visualize the differences between the other combinations. Notable combinations of features are colored. (b) The nuclear compartment contains more intron-intron pairs. Pairs of bars indicate that the libraries are technical replicates. For the experiment with six biological replicates, the percentages were averaged (error bars represent s.d., n = 6), and we observe a statistically significant difference (#) (P = 0.019, paired Student’s t-test).
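The counting behind the annotation matrix in panel (a) can be sketched as follows. This is an illustrative sketch only: it assumes each mate pair has already been annotated with the feature of its 5′ read and of its 3′ read, and it reads the per-column percentage as the share of retained pairs whose 5′ read falls in that column's feature, which is one interpretation of the caption. The function names are hypothetical.

```python
from collections import Counter


def annotation_matrix(pairs, discard_start="intergenic"):
    """Count mate pairs per (start feature, end feature) combination.

    `pairs` is an iterable of (feature_of_5prime_read, feature_of_3prime_read).
    Pairs whose 5' read falls in `discard_start` are dropped, as in the caption.
    """
    return Counter((start, end) for start, end in pairs if start != discard_start)


def start_feature_share(counts):
    """Share of retained pairs that initiate from each start feature (column)."""
    total = sum(counts.values())
    shares = Counter()
    for (start, _end), n in counts.items():
        shares[start] += n
    return {start: n / total for start, n in shares.items()} if total else {}
```

Dropping the intergenic column leaves seven possible start features against eight possible end features, matching the 8-by-7 layout described in the caption.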

Figure 4. CAGEscan connects promoters and downstream sequences

(a) Schematic representation of CAGEscan paired-end tag clustering. Tags arising from overlapping 5′ ends are used as seeds to aggregate 3′ tags into a unique cluster. Depicted in blue and red are two overlapping but distinct resulting CAGEscan clusters. (b) Genomic representation of the CAGEscan data. Horizontal bars indicate annotation features (chromosomal coordinates, Entrez Gene loci, RefSeq transcript models, CAGEscan clusters), and vertical bars represent the quantitative activity of the promoters detected by the 5′ reads of the CAGEscan libraries (CAGEscan expression) in tags per million. Features and expression arising from the plus or minus strand are colored in green and purple, respectively. The FTL gene (thick green bar) has a strongly active promoter (asterisk), from which enough CAGEscan pairs originate to reconstitute the gene’s intron-exon structure in a single CAGEscan cluster (blue). Antisense transcripts to the FTL locus are more abundant in the nuclear libraries.
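The seeding idea in panel (a) can be sketched as an interval merge on the 5′ reads, with each 3′ read attached to the seed of its mate. This is a simplified sketch, not the authors' clustering procedure: it assumes all pairs lie on one chromosome and strand, uses half-open coordinates, and the name `cluster_pairs` is hypothetical.

```python
def cluster_pairs(pairs):
    """Group paired-end tags whose 5' reads overlap.

    `pairs` is a list of ((start5, end5), (start3, end3)) tuples with half-open
    coordinates, all assumed to lie on the same chromosome and strand.
    Returns a list of clusters, each a dict holding the merged 5' seed interval
    and the list of 3' intervals attached to it.
    """
    clusters = []
    for five, three in sorted(pairs):
        last = clusters[-1] if clusters else None
        if last and five[0] < last["seed"][1]:  # 5' read overlaps the current seed
            last["seed"] = (last["seed"][0], max(last["seed"][1], five[1]))
            last["three_prime"].append(three)
        else:
            clusters.append({"seed": five, "three_prime": [three]})
    return clusters
```

Because membership is decided only by the 5′ read, two clusters can cover overlapping downstream sequence and still remain distinct, as with the blue and red clusters in the schematic.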

Figure 5. Expressed repeat elements surveyed by CAGEscan

Expression in tags per million of LINE, LTR, SINE, and srpRNA repeated elements in cytoplasmic (C) and nuclear (N) fractions from total and non-polyadenylated RNA (poly-A−). Adjacent bars indicate technical replicates. Whisker plots summarize data from six additional biological replicates. The bar plot and whisker plot sub-panels use different scales. The nucleus appears strongly enriched for LINE, LTR and SINE transcripts, which are non-polyadenylated, while the cytoplasm appears strongly enriched in srpRNA, both in the total RNA and the non-polyadenylated fraction.
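Tags per million is a simple library-size normalization, sketched below. The sketch assumes the denominator is the total number of tags counted in the library; the function name and the example counts in the usage note are made up for illustration and are not data from the paper.

```python
def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 1_000_000 * n / total for name, n in counts.items()}
```

For example, a library of 1,000 tags in which 50 fall in LINE elements would give LINE a value of 50,000 tags per million, which makes libraries of different sequencing depth comparable.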
