An in-depth map of polyadenylation sites in cancer

Nucleic Acids Res. 2012 Sep 1;40(17):8460-71.

doi: 10.1093/nar/gks637. Epub 2012 Jun 29.

Yuefeng Lin et al. Nucleic Acids Res. 2012.

Abstract

We present a comprehensive map of over 1 million polyadenylation sites and quantify their usage in major cancers and tumor cell lines using direct RNA sequencing. We built the Expression and Polyadenylation Database (xPAD) to enable the visualization of the polyadenylation maps in various cancers and to facilitate the discovery of novel genes and gene isoforms that are potentially important to tumorigenesis. Analyses of polyadenylation sites indicate that a large fraction (∼30%) of mRNAs contain alternative polyadenylation sites in their 3′ untranslated regions, independent of the cell type. The shortest 3′ untranslated region isoforms are preferentially upregulated in cancer tissues, genome-wide. Candidate targets of alternative polyadenylation-mediated upregulation of short isoforms include POLR2K, as well as signaling cascades of cell-cell and cell-extracellular matrix contact, particularly those involving regulators of Rho GTPases. Polyadenylation maps also helped to improve 3′ untranslated region annotations and to identify candidate regulatory marks, such as sequence motifs, H3K36Me3 and Pabpc1, that are isoform dependent and occur in a position-specific manner. In summary, these results highlight the need to go beyond monitoring only the cumulative transcript level of a gene and to analyse the expression of its RNA isoforms separately.

Figures

Figure 1.

Characteristics of polyadenylation sites. DRS reads from normal breast tissue are used for this illustration. (A) DRS reads predominantly map to the annotated 3′ ends of known genes (bin size = 10 nts). (B) Cleavage sites of short isoforms, in either normal (normal short) or tumor (tumor short) tissue, are generally (9/10) more variable than those of the corresponding long isoforms (P values on top). (C) In breast, the majority (∼90%) of DRS reads map to the sense strand of transcriptionally active regions, and the remaining reads map mainly to intergenic regions and introns. For illustration purposes, intergenic polyadenylation sites are assigned to the sense strand, because categorizing them into sense and antisense strands separately (see ‘Materials and Methods’) can be ambiguous. (D) Polyadenylation density (number of sites/Mb) in internal introns is higher than that in terminal (5′ and 3′) introns for both sense and antisense intronic transcripts. (E) A considerable fraction (∼30%) of genes contain tandem polyadenylation (pA) sites within the same 3′ UTR. (F) Distribution of distances between adjacent tandem 3′ UTR pA sites of genes expressed in normal breast (bin size = 50 nts). Because some 3′ UTRs are long, adjacent pA sites within the same UTR can occasionally be separated by more than 5 kb (1% of cases).
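Panels D and F reduce to simple arithmetic once pA sites have been called. The Python sketch below is illustrative only and is not the authors' pipeline: it assumes sites are already available as integer plus-strand genomic coordinates grouped per 3′ UTR, and the gene names, coordinates and intron length are hypothetical.

```python
from collections import Counter

def adjacent_site_distances(sites):
    """Distances (nts) between neighbouring pA sites within one 3' UTR."""
    ordered = sorted(sites)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def bin_distances(distances, bin_size=50):
    """Histogram keyed by the lower edge of each bin (50-nt bins, as in panel F)."""
    return Counter((d // bin_size) * bin_size for d in distances)

def fraction_above(distances, threshold=5000):
    """Fraction of adjacent-site distances larger than `threshold` nts."""
    if not distances:
        return 0.0
    return sum(d > threshold for d in distances) / len(distances)

def density_per_mb(n_sites, region_length_nt):
    """Polyadenylation density in sites per megabase (panel D)."""
    return n_sites * 1_000_000 / region_length_nt

# Hypothetical plus-strand pA site coordinates for two genes with tandem 3' UTR sites
utr_sites = {
    "GENE_A": [1000, 1030, 1410],
    "GENE_B": [5000, 11200],
}
all_distances = [d for sites in utr_sites.values() for d in adjacent_site_distances(sites)]
print(all_distances)                            # [30, 380, 6200]
print(dict(bin_distances(all_distances)))       # {0: 1, 350: 1, 6200: 1}
print(round(fraction_above(all_distances), 3))  # 0.333
print(density_per_mb(12, 400_000))              # 30.0 (12 sites in 400 kb of intron)
```

Applied separately to internal and terminal introns, `density_per_mb` gives the kind of comparison shown in panel D.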

Figure 2.

Genomic views, generated with xPAD, of polyadenylated non-coding RNAs and novel gene loci that are aberrantly expressed in cancer. (A–D) All polyadenylation sites detected by DRS reads (green, normal; red, tumor) are indicated for all four gene regions. (A, B) GAS5 (A) and TMEM191A (B) represent lncRNAs that are upregulated in the majority of tumor samples, as indicated. In contrast to the polycistronic GAS5, which hosts multiple polyadenylated snoRNAs, polyadenylation of TMEM191A is limited to its 3′ UTR. (C and D) End locations and expression levels of two potentially differentially regulated novel genes located far from known genes. Real-time PCR reveals similar expression patterns (fold change, P < 0.001); error bars represent standard deviation (n = 3).

Figure 3.

Polyadenylation site usage reveals up-regulation of short isoforms in tumors. (A) Illustration of how various 3′ UTR isoform-specific quantities (blue box) are determined using DRS. The total number of normalized DRS reads for each isoform, abbreviated as Short (red) and Long (blue), is used to measure changes between tumor and normal samples. (B and C) Median (thick line), 75% quantile (upper border), 25% quantile (lower border) and interquartile range (whiskers) are shown for each distribution (P values on top). (B) Short isoforms tend to be up-regulated in all tumor samples, as indicated by the median (median >0). (C) In contrast to short isoforms, which show a consistent pattern, the median change in long-isoform expression flips between up-regulation, no change and down-regulation across tumor samples.
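The tumor-versus-normal comparison in panels B and C can be sketched as a per-gene log2 fold change of normalized isoform reads, summarized by its median. This is a minimal sketch under stated assumptions: reads-per-million scaling and a pseudocount of 1 are choices made here for illustration (the caption says only that reads are normalized), and the gene names, counts and library sizes are hypothetical.

```python
import math
from statistics import median

def normalize(counts, library_size, scale=1_000_000):
    """Scale raw read counts to reads per million (an assumed normalization)."""
    return {gene: n * scale / library_size for gene, n in counts.items()}

def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Per-gene log2((tumor + pc) / (normal + pc)) for genes seen in both samples.

    The pseudocount is an illustrative choice that avoids division by zero.
    """
    shared = tumor.keys() & normal.keys()
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in shared
    }

# Hypothetical short-isoform read counts and library sizes
normal_short = {"G1": 99, "G2": 49, "G3": 199}
tumor_short = {"G1": 798, "G2": 198, "G3": 198}

fc = log2_fold_changes(
    normalize(tumor_short, library_size=2_000_000),
    normalize(normal_short, library_size=1_000_000),
)
print(sorted(fc.items()))   # [('G1', 2.0), ('G2', 1.0), ('G3', -1.0)]
print(median(fc.values()))  # 1.0 (> 0: short isoforms tend to be up-regulated)
```

The same functions apply unchanged to long-isoform counts; a median above 0 corresponds to the pattern described for panel B.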

Figure 4.

Polyadenylation maps enable the identification of isoform-dependent regulatory marks. (A–C) The analysis uses a total of 3270 genes that contain both long and short isoforms whose 3′ ends are separated in the genome by at least 100 nts. (A) Motifs that are preferentially located near polyadenylation sites (position 0) and are used more prevalently by either the short (red) or the long (blue) isoform. Locational preferences of the consensus motif, visible as peaks (arrows), generally fall within ±20 nts of the polyadenylation sites. (B and C) ChIP-Seq/RIP-Seq data comparing H3K36Me3 and Pabpc1 with their functional analogs (Pol2 and Elav1) in identical cell lines (B: HepG2, C: GM12878) suggest that H3K36Me3 and Pabpc1 preferentially mark the polyadenylation sites of short isoforms. The curves show the distribution of distances between each polyadenylation site and the nearest regulatory mark (H3K36Me3, Pol2, Pabpc1 or Elav1), as inferred from ChIP-Seq/RIP-Seq.
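The distance curves in panels B and C rest on one operation: for every polyadenylation site, find the nearest mark position. A minimal Python sketch using binary search over sorted mark positions; it assumes plus-strand coordinates on a single chromosome and reports signed distances (mark minus site), and the coordinates below are hypothetical. It is not the authors' implementation.

```python
from bisect import bisect_left

def nearest_mark_distances(pa_sites, mark_positions):
    """Signed distance (mark - site, in nts) from each pA site to its closest mark.

    Assumes both inputs are plus-strand coordinates on one chromosome.
    Ties are resolved in favour of the downstream mark.
    """
    marks = sorted(mark_positions)
    if not marks:
        raise ValueError("at least one mark position is required")
    distances = []
    for site in pa_sites:
        i = bisect_left(marks, site)  # index of the first mark at or after the site
        candidates = []
        if i < len(marks):
            candidates.append(marks[i])      # closest mark at or after the site
        if i > 0:
            candidates.append(marks[i - 1])  # closest mark before the site
        nearest = min(candidates, key=lambda m: abs(m - site))
        distances.append(nearest - site)
    return distances

# Hypothetical coordinates: pA sites of short isoforms and peak centres of one mark
short_sites = [1000, 5000, 9000]
mark_peaks = [950, 1100, 5020, 12000]
print(nearest_mark_distances(short_sites, mark_peaks))  # [-50, 20, 3000]
```

Running it once for short-isoform sites and once for long-isoform sites, for each mark, yields the paired distance distributions that the panels compare; each lookup costs O(log m) for m marks.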

Figure 5.

Illustration of xPAD. xPAD integrates the UCSC Genome Browser to provide a web interface for visualizing both the precise polyadenylation locations of different isoforms and their expression levels across tissues of interest. The complete gene structure of POLR2K highlights the utility of DRS: in both normal and tumor tissues, all polyadenylation sites occur exclusively in the 3′ UTR of the gene and on the sense strand, and the 3′ ends of the reads (green/red bars) mapping to the long isoform fall within 2 nts of the 3′ UTR polyadenylation site. Normal breast contains 120 reads of the long isoform and 109 reads of the short isoform, whereas breast tumor contains 257 reads of the short isoform, which is upregulated, and 134 reads of the long isoform, which is almost unchanged. For brevity, many additional features available via the UCSC browser panel (right), such as evolutionary conservation (bottom track) and methylation marks (not shown), are not illustrated.
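The POLR2K read counts quoted in this caption can be turned into simple fold changes and short-isoform fractions. The arithmetic below uses the raw counts exactly as given and ignores any library-size normalization, so it illustrates the direction and rough size of the shift rather than reproducing the paper's statistics.

```latex
\begin{aligned}
\text{short: } & \tfrac{257}{109} \approx 2.36, && \log_2 \tfrac{257}{109} \approx 1.24 \\
\text{long: } & \tfrac{134}{120} \approx 1.12, && \log_2 \tfrac{134}{120} \approx 0.16 \\
\text{short fraction, normal: } & \tfrac{109}{109 + 120} \approx 0.48 \\
\text{short fraction, tumor: } & \tfrac{257}{257 + 134} \approx 0.66
\end{aligned}
```

The rise in the short-isoform fraction from about 0.48 to about 0.66 is another way of stating that the tumor shifts towards the short 3′ UTR isoform while long-isoform output stays roughly constant.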
