Extensive alternative polyadenylation during zebrafish development

Igor Ulitsky et al. Genome Res. 2012 Oct.

Abstract

The post-transcriptional fate of messenger RNAs (mRNAs) is largely dictated by their 3' untranslated regions (3' UTRs), which are defined by cleavage and polyadenylation (CPA) of pre-mRNAs. We used poly(A)-position profiling by sequencing (3P-seq) to map poly(A) sites at eight developmental stages and tissues in the zebrafish. Analysis of over 60 million 3P-seq reads substantially increased and improved existing 3' UTR annotations, resulting in confidently identified 3' UTRs for >79% of the annotated protein-coding genes in zebrafish. mRNAs from most zebrafish genes undergo alternative CPA, with those from more than a thousand genes using different dominant 3' UTRs at different stages. These included one of the poly(A) polymerase genes, for which alternative CPA reinforces its repression in the ovary. 3' UTRs tend to be shortest in the ovaries and longest in the brain. Isoforms with some of the shortest 3' UTRs are highly expressed in the ovary, yet absent in the maternally contributed RNAs of the embryo, perhaps because their 3' UTRs are too short to accommodate a uridine-rich motif required for stability of the maternal mRNA. At 2 h post-fertilization, thousands of unique poly(A) sites appear at locations lacking a typical polyadenylation signal, which suggests a wave of widespread cytoplasmic polyadenylation of mRNA degradation intermediates. Our insights into the identities, formation, and evolution of zebrafish 3' UTRs provide a resource for studying gene regulation during vertebrate development.

Figures

Figure 1.

Reannotation of zebrafish 3′ UTRs. (A) Nucleotide sequence composition around all 197,350 3P-seq–identified poly(A) sites. (Black arrow) Cleavage position. As previously noted (Jan et al. 2011), the sharp adenosine peak at position +1, the depletion of A at position –1, and blurring of sequence composition at other positions were partly due to cases of cleavage after an A, for which the templated A was assigned to the poly(A) tail, resulting in a –1-nt offset from the cleavage-site register. (Inset) Sequence composition around poly(A) sites in C. elegans (Jan et al. 2011), redrawn for comparison. (B) Frequencies of sites containing the canonical PAS motif AAUAAA or one of its ten common variants in the region from –40 to –10 relative to the poly(A) site. Known distal 3′ ends are the distal-most poly(A) sites annotated in Ensembl, and known proximal 3′ ends are all other annotated poly(A) sites. Novel distal 3′ ends are poly(A) sites more distal than the distal-most annotated 3′ end. All other novel 3′ ends were designated as proximal. (C) Classification of poly(A) sites as fractions of sites or as fractions of the 3P tags. The poly(A) site classification scheme is described in Supplemental Figure S1. (D) Genes with alternative 3′ UTR isoforms in Ensembl v66 and following 3P-seq-based annotation. (E) Maximal 3′ UTR lengths in Ensembl v66 and following the 3P-seq-based annotations. For the new models, the longest 3′ UTR was supported by at least 10% of the 3P tags in at least one sample. (F) 3′ UTRs annotated for the magi2 gene. 3P-seq and RNA-seq tracks indicate all tags mapping to this locus. No 3′ UTR was annotated for this gene in Ensembl v66.
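The PAS frequency in panel B can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each site is given as the upstream sequence ending at position –1, and that a motif counts only if it lies wholly within –40 to –10. The function names are hypothetical, and the ten variant hexamers are not listed in the caption, so they are left as a caller-supplied parameter.

```python
def pas_in_window(upstream, motifs=("AATAAA",), start=-40, end=-10):
    """Return True if any motif lies wholly within [start, end] relative to the site.

    `upstream` is the sequence ending at position -1 (its last character).
    RNA input (U) is converted to DNA (T) so both alphabets are accepted.
    """
    seq = upstream.upper().replace("U", "T")
    n = len(seq)
    if n < -start:
        raise ValueError("upstream sequence is shorter than the search window")
    # Position p (negative) sits at index n + p; the slice end is exclusive.
    region = seq[n + start : n + end + 1]
    return any(m.upper().replace("U", "T") in region for m in motifs)


def pas_frequency(upstreams, motifs=("AATAAA",)):
    """Fraction of sites with at least one motif in the window."""
    hits = sum(pas_in_window(u, motifs) for u in upstreams)
    return hits / len(upstreams)


# Toy sequences: the motif is inside the window in the first, too close in the second.
with_pas = "C" * 20 + "AATAAA" + "C" * 24
too_close = "C" * 44 + "AAUAAA"
freq = pas_frequency([with_pas, too_close])
```

Passing the variant hexamers through `motifs` would extend the same check to the full set used in the figure.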

Figure 2.

Changes in 3′ UTR lengths in different developmental stages. (A) Distribution of 3′ UTR lengths in different stages and tissues. In each sample, for each gene with a single annotated or predicted stop codon and 3P-seq data, the mean 3′ UTR length was computed by averaging the lengths of all the 3′ UTRs, weighted by the number of 3P tags supporting each of them. Box plots show the median length, flanked by 25th and 75th percentiles. The whiskers are drawn to the 5th and 95th percentiles. (B) Negative correlation between 3′ UTR length and transcript levels at 24 hpf. For each gene, the mean 3′ UTR length was computed as in A, and the RPKM was computed using available RNA-seq data from the same developmental stage (SRA accession ERP000016), considering only protein-coding regions. (C) Lack of correlation between 3′ UTR length and transcript levels in the pre-MZT embryo. As in B, except RNA-seq RPKM was computed using available RNA-seq data from the two-cell embryo (SRA accession ERX008924). (D) 3′ UTR lengths of genes expressed in the ovary and in the brain. Lengths were computed as in A. (E) Lengths of 3′ UTRs resulting from proximal and distal poly(A) sites in analysis of genes with substantial differences in isoform fractions (>0.3) when comparing ovary and brain samples. (F) Poly(A) sites of rilpl2. The gene model shown is as annotated in Ensembl v66. 3P-seq tracks show tags from clusters containing at least 10% of the tags in the indicated samples. (Red and black arrows) Position of the qRT-PCR primers for the constant and the alternative regions of the transcript, respectively. (G) qRT-PCR analysis of changes in 3′ UTR usage during early embryogenesis. RT was performed with random primers and expression levels were computed using probes located in the constant and alternative regions of the transcript (cUTR and aUTR, respectively) (Supplemental Fig. S4) and normalized to expression at 6 hpf. (n.d.) aUTR could not be detected at that time point.
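The tag-weighted mean 3′ UTR length described for panel A can be written as a short sketch. The function name and the input layout, a list of (length, tag count) pairs for one gene in one sample, are assumptions made for illustration.

```python
def weighted_mean_utr_length(isoforms):
    """Mean 3' UTR length for one gene in one sample.

    `isoforms` is a list of (utr_length_nt, tag_count) pairs; each length is
    weighted by the number of 3P tags supporting that isoform.
    Returns None when the gene has no supporting tags in the sample.
    """
    total_tags = sum(count for _, count in isoforms)
    if total_tags == 0:
        return None
    return sum(length * count for length, count in isoforms) / total_tags


# Toy gene: a 200-nt isoform with 30 tags and a 1000-nt isoform with 10 tags.
mean_length = weighted_mean_utr_length([(200, 30), (1000, 10)])  # 400.0
```

In this toy case the short isoform dominates, so the weighted mean (400 nt) is well below the unweighted mean of the two lengths (600 nt).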

Figure 3.

Changes in expression and polyadenylation of CPA factors during zebrafish development. (A) Relative expression of CPA-related genes (listed in Liu et al. 2007) in ovary and brain. Mammalian homologs of CstF factors are in parentheses. (B) Expression of mRNA for CstF factors in different stages and tissues. (C) Differential alternative polyadenylation of papolb. The transcript models shown are as annotated in Ensembl v66. The height of each plot indicates the number of 3P tags ending at each position, normalized to the maximum value, which is indicated at the top of each axis. (D) Expression of mRNA for poly(A) polymerases in different stages and tissues.

Figure 4.

Differential accumulation of transcripts with short 3′ UTRs in the ovary and pre-MZT embryo. (A) Comparison of 3′ UTR lengths of the short and the long isoforms in genes with exactly two isoforms in the ovary. (B) Poly(A) sites of rnaset2 in the indicated samples. The 3′ UTR structure shown is as annotated in Ensembl v66. The height of each plot indicates the number of 3P tags ending at each position, normalized to the maximum value, which is indicated at the top of each axis. (C) Relationship between length of the shorter isoform and relative abundance of the shorter isoform in the pre-MZT embryo, as inferred from 3P tags for genes with two alternative poly(A) sites in the ovary. (D) Relationship between the length of the 3′ UTR in the ovary and the change in mRNA observed in the pre-MZT embryo relative to that in the ovary, as inferred by 3P tags. Analysis was for genes with a single 3′ UTR supported by at least 20 3P tags in the ovary. (E) Frequency of the indicated motifs or nucleotides flanking poly(A) sites of isoforms that were not reduced in the pre-MZT embryo compared to the ovary (stable) and those that were reduced at least twofold (unstable). For the hexamer motifs, the frequencies shown at each position are averages of a window of 11 consecutive nucleotides centered at that position. (F) The motif identified by Amadeus (Linhart et al. 2008) as significantly enriched in regions upstream of the stable sites. (G) Destabilization in the pre-MZT embryo of shorter isoforms lacking the U-rich motif. Genes with two UTR isoforms in the ovary and a U-rich motif 20–90 nt upstream of only one of the poly(A) sites were stratified based on the location of the motif: upstream of the distal site (red line) or upstream of the proximal site (blue line). A U-rich motif was defined as present if a decamer with up to two mismatches from the UK(U)8 consensus appeared 20–90 nt upstream of the poly(A) site.
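The U-rich motif definition in panel G (a decamer with up to two mismatches from UK(U)8, located 20–90 nt upstream) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. K is read as the IUPAC code for G or U. The upstream sequence is assumed to end at position –1, and the decamer is assumed to lie wholly within positions –90 to –20. All names are hypothetical.

```python
CONSENSUS = "UKUUUUUUUU"           # UK(U)8 written out as a decamer
ALLOWED = {"U": "U", "K": "GU"}    # IUPAC: K stands for G or U


def mismatches(decamer):
    """Number of positions in `decamer` that disagree with the consensus."""
    return sum(base not in ALLOWED[code] for base, code in zip(decamer, CONSENSUS))


def has_u_rich_motif(upstream, max_mismatches=2, near=20, far=90):
    """True if a qualifying decamer lies within positions -far..-near."""
    seq = upstream.upper().replace("T", "U")
    n = len(seq)
    end = n - near + 1             # exclusive slice end, so position -near is included
    if end <= 0:
        return False
    region = seq[max(0, n - far):end]
    k = len(CONSENSUS)
    return any(
        mismatches(region[i:i + k]) <= max_mismatches
        for i in range(len(region) - k + 1)
    )


# Toy sequences padded with C so that only the central decamer can match.
exact = "C" * 30 + "UGUUUUUUUU" + "C" * 30
two_off = "C" * 30 + "UGUUCCUUUU" + "C" * 30
three_off = "C" * 30 + "UGUCCCUUUU" + "C" * 30
too_close = "C" * 60 + "UUUUUUUUUU"
```

Scanning every decamer in the window is quadratic only in the motif length, so it stays cheap for a 71-nt region.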

Figure 5.

Many noncanonical poly(A) sites in the pre-MZT embryo. (A) Abundance of sample-specific poly(A) sites. Site classification was as described in Supplemental Figure S1. Downstream sites are those appearing up to 8 kb downstream from the annotated 3′ ends but without the support for connectivity with the stop codon required for assignment as a novel 3′ UTR. (B) Sequence composition near poly(A) sites specific to the testis. (C) Poly(A) sites of edc3. The 3′ UTRs shown are as annotated in Ensembl v66. (D) Density of poly(A) sites occurring within 3′ UTRs from the indicated samples. Poly(A)-site density was defined as the ratio between the number of poly(A) sites and the length of the longest 3′ UTR. (E) Densities of poly(A) sites in the coding sequence and 3′ UTRs plotted for genes expressed in the pre-MZT embryo. (F) Sequence composition near poly(A) sites specific to the pre-MZT embryo.
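The poly(A)-site density defined for panel D is a simple ratio, sketched here with hypothetical names. Reporting the value per kilobase is an added convenience for readability and is not stated in the caption.

```python
def polya_site_density(n_sites, longest_utr_length_nt, per_kb=False):
    """Number of poly(A) sites divided by the length of the longest 3' UTR.

    With per_kb=True the density is scaled to sites per 1000 nt
    (an illustrative choice, not taken from the caption).
    """
    if longest_utr_length_nt <= 0:
        raise ValueError("3' UTR length must be positive")
    density = n_sites / longest_utr_length_nt
    return density * 1000 if per_kb else density


# Toy gene: 4 poly(A) sites within a longest 3' UTR of 2000 nt.
density = polya_site_density(4, 2000)                  # 0.002 sites per nt
density_kb = polya_site_density(4, 2000, per_kb=True)  # 2.0 sites per kb
```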

Figure 6.

Genome-wide estimation of poly(A)-tail lengths. (A) Outline of 3P-PEseq. See text for description. (B) Poly(A) sites and poly(A)-tail lengths pre-MZT and post-MZT at the acsl4 3′ UTR. The 3′ UTR shown is as annotated in Ensembl v66. The height of the 3P-seq plots shows the number of 3P tags at each position, normalized to the maximum value, which is indicated at the top of each axis. The height of the 3P-PEseq plots shows the average poly(A)-tail length measured at each position. The length is zero at positions for which no 3P-PEseq tags were obtained. (Arrow) The poly(A) site corresponding to the paired reads shown below. (Bold) Untemplated nucleotides. In this example, the cleavage position is within three genomically encoded A's, and thus, at least 77 of the 80 T's of read #1 correspond to untemplated A's of the poly(A) tail. The adapter sequence in read #2 is in green italics. (C) Distribution of pre-MZT poly(A)-tail lengths at poly(A) sites mapping to the indicated loci and RNA classes. Canonical 3′ UTR ends are tallied as 3P-PE tags that map within 20 nt of a 3′ UTR end annotated using samples from other stages. Noncanonical 3′ UTR ends are tallied as 3P-PE tags that map within a 3′ UTR annotated in other stages but not within 20 nt of a poly(A) site defined at any other stage. rRNA and mitochondrial ends are tallied as tags that map within rRNA repeats and the mitochondrial chromosome, respectively. (D) Abundance of noncanonical isoforms. Plotted is the ratio of 3P or 3P-PE tags mapping to noncanonical 3′ UTR ends (defined as in C) relative to those mapping to canonical 3′ UTR ends.
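The canonical versus noncanonical tally in panels C and D can be sketched as below. This is a simplified illustration with hypothetical names. It assumes all tags already map within an annotated 3′ UTR on one chromosome and strand, and that "within 20 nt" is inclusive.

```python
from bisect import bisect_left


def is_canonical(pos, sorted_ends, max_dist=20):
    """True if `pos` is within max_dist nt of any annotated end.

    `sorted_ends` must be sorted; only the two ends flanking `pos`
    need checking, since one of them is always the nearest.
    """
    i = bisect_left(sorted_ends, pos)
    neighbors = sorted_ends[max(0, i - 1):i + 1]
    return any(abs(pos - end) <= max_dist for end in neighbors)


def noncanonical_ratio(tag_positions, annotated_ends, max_dist=20):
    """Ratio of noncanonical to canonical tags (inf if none are canonical)."""
    ends = sorted(annotated_ends)
    canonical = sum(is_canonical(p, ends, max_dist) for p in tag_positions)
    noncanonical = len(tag_positions) - canonical
    return noncanonical / canonical if canonical else float("inf")


# Toy locus: annotated ends at 1000 and 2000; three of five tags fall within 20 nt.
ratio = noncanonical_ratio([990, 1020, 1021, 1500, 2000], [2000, 1000])
```

The binary search keeps each lookup logarithmic in the number of annotated ends, which matters when millions of tags are classified.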

Figure 7.

Analysis of paralogous gene pairs generated in the teleost whole-genome duplication. (A) Relationship of coding-sequence lengths for pairs of paralogous genes. Genes of each pair were arbitrarily assigned to each axis. (B) Relationship of 3′ UTR lengths for pairs of paralogous genes. 3′ UTR lengths are the weighted average across all the samples in which the transcript had at least one 3P tag.
