A large-scale analysis of mRNA polyadenylation of human and mouse genes

Bin Tian et al. Nucleic Acids Res. 2005.

Abstract

mRNA polyadenylation is a critical cellular process in eukaryotes. It involves 3' end cleavage of nascent mRNAs and addition of the poly(A) tail, which plays important roles in many aspects of the cellular metabolism of mRNA. The process is controlled by various cis-acting elements surrounding the cleavage site, and their binding factors. In this study, we surveyed genome regions containing cleavage sites [herein called poly(A) sites] for 13,942 human and 11,155 mouse genes. We found that a large proportion of human and mouse genes have alternative polyadenylation (approximately 54% and 32%, respectively). The conservation of alternative polyadenylation type or polyadenylation configuration between human and mouse orthologs is statistically significant, indicating that alternative polyadenylation is widely employed by these two species to produce alternative gene transcripts. Genes belonging to several functional groups, indicated by their Gene Ontology annotations, are biased with respect to polyadenylation configuration. Many poly(A) sites harbor multiple cleavage sites (51.25% of human and 46.97% of mouse sites), leading to heterogeneous 3' end formation for transcripts. This implies that the cleavage process of polyadenylation is largely imprecise. Different types of poly(A) sites, with regard to their relative locations in a gene, are found to have distinct nucleotide composition in surrounding genomic regions. This large-scale study provides important insights into the mechanism of polyadenylation in mammalian species and represents a genomic view of the regulation of gene expression by alternative polyadenylation.

Figures

Figure 1

(A) Schematic representation of a poly(A) site and polyadenylation configuration. In this study, a poly(A) site is a region containing cleavage site(s) (arrowed lines). The 5′-most cleavage site is the reference point (position 0) for the poly(A) site. Thus, the genomic location of a poly(A) site is represented by the location of the 5′-most cleavage site it contains. The sequence −300 to +300 is defined as a terminal sequence. The sites for CPSF and CstF are also depicted. (B) Three types of polyadenylation configuration. A type I gene has a single poly(A) site; a type II gene has alternative poly(A) sites all located in the 3′-most exon; and a type III gene has alternative poly(A) sites located in different exons. Types of poly(A) sites are also marked. 1S, a single poly(A) site; 2F, the 5′-most poly(A) site in a type II gene; 2L, the 3′-most poly(A) site in a type II gene; 2M, a middle poly(A) site between 2F and 2L in a type II gene; 3U, a poly(A) site located upstream of the 3′-most exon; and 3S, a single site in the 3′-most exon of a type III gene. Not shown in the graph are 3F, 3M and 3L, which are similar to 2F, 2M and 2L, respectively, except that the former are located in the 3′-most exon of a type III gene. Exons are represented as boxes; pA, poly(A) site.
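The configuration and site-type definitions in (B) can be sketched as a small classifier. This is a minimal illustration, not the authors' pipeline: the `PolyASite` record, its fields, and the function names are hypothetical, positions are assumed to be in transcript orientation (smaller means more 5′), and a gene with several sites that are not all in the 3′-most exon is treated as type III.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolyASite:
    """Hypothetical record for one poly(A) site of a gene."""
    position: int        # location of the 5'-most cleavage site (transcript orientation)
    in_last_exon: bool   # True if the site lies in the 3'-most exon


def gene_type(sites):
    """Return the polyadenylation configuration: 'I', 'II' or 'III'."""
    if not sites:
        raise ValueError("a gene needs at least one poly(A) site")
    if len(sites) == 1:
        return "I"
    if all(site.in_last_exon for site in sites):
        return "II"
    return "III"  # assumption: any other multi-site gene counts as type III


def _last_exon_labels(count, prefix):
    """Labels for `count` sites in the 3'-most exon, ordered 5' to 3'."""
    if count == 0:
        return []
    if count == 1:
        return [prefix + "S"]
    return [prefix + "F"] + [prefix + "M"] * (count - 2) + [prefix + "L"]


def label_sites(sites):
    """Return site-type labels (1S, 2F, 2M, 2L, 3U, 3S, 3F, 3M, 3L), ordered 5' to 3'."""
    ordered = sorted(sites, key=lambda site: site.position)
    kind = gene_type(ordered)
    if kind == "I":
        return ["1S"]
    if kind == "II":
        return _last_exon_labels(len(ordered), "2")
    n_last = sum(site.in_last_exon for site in ordered)
    last_labels = iter(_last_exon_labels(n_last, "3"))
    return [next(last_labels) if site.in_last_exon else "3U" for site in ordered]
```

For example, a gene with one upstream site and three sites in its 3′-most exon would be labeled 3U, 3F, 3M, 3L.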

Figure 2

Poly(A) sites of human genes. (A) Histogram of the genomic distance between adjacent poly(A) sites in a gene. (B) Histogram of the distance between adjacent poly(A) sites, both located in the 3′-most exon of a gene (median = 288 nt). (C) Histogram of the distance between the stop codon and its closest downstream poly(A) site (median = 324 nt). The _x_-axes in all graphs are in base-2 logarithmic scale. For each histogram, a Gaussian smoothing kernel method was used to generate a density line.
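The base-2 log scaling and Gaussian kernel smoothing mentioned in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bandwidth is a free parameter (the caption does not state the one used), and positions are plain integer coordinates.

```python
import math


def log2_distances(positions):
    """Base-2 log of the distance between adjacent positions (sorted 5' to 3')."""
    ordered = sorted(positions)
    return [math.log2(b - a) for a, b in zip(ordered, ordered[1:]) if b > a]


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on each sample.

    `bandwidth` is an assumed smoothing parameter, not a value from the study.
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density
```

Evaluating `density` on a grid of log2 distances gives a curve that can be overlaid on the histogram.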

Figure 3

Multiple cleavage sites in a poly(A) site. (A) Histogram of the genomic distance between adjacent cleavage sites in genes. (B) Histogram of the distance between the 5′-most cleavage site and other downstream cleavage sites when multiple cleavage sites are present in a poly(A) site (mean = 7.9 nt, median = 5 nt). (C) The relationship between the number of PAS hexamers (AAUAAA and 11 other variants) associated with a poly(A) site and the number of cleavage sites in the poly(A) site. Error bars are standard error of the mean (SEM). (D) Correlation between the number of cleavage sites and the number of supporting cDNA/EST sequences for poly(A) sites (Pearson correlation coefficient R = 0.83). (E) Histogram of the distance between a poly(A) site and the associated PAS when only one PAS is present. Only human poly(A) sites are used in (A–E).
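One way to picture how nearby cleavage sites collapse into a single poly(A) site, and how the offsets in (B) arise, is a simple gap-based grouping. This is a sketch under assumptions rather than the study's procedure: the grouping rule, the function names, and the `max_gap` threshold are all hypothetical, and coordinates are assumed to increase 5′ to 3′.

```python
def group_cleavage_sites(cleavage_positions, max_gap):
    """Group sorted cleavage positions; a gap larger than `max_gap` starts a new group.

    `max_gap` is a placeholder threshold, not a value taken from the text above.
    """
    groups = []
    for pos in sorted(set(cleavage_positions)):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


def summarize_site(group):
    """Describe one poly(A) site by its 5'-most cleavage site (position 0)."""
    reference = group[0]
    return {
        "site": reference,
        "offsets": [pos - reference for pos in group[1:]],
        "n_cleavage_sites": len(group),
    }
```

With cleavage positions 1000, 1005, 1012 and 1400 and an illustrative `max_gap` of 24, this yields two poly(A) sites, the first with downstream offsets of 5 and 12 nt.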

Figure 4

Conservation of polyadenylation configuration between human and mouse orthologs. (A) Conservation of polyadenylation configuration between human (rows) and mouse (columns) orthologs (χ²-test, _P_-value = 2.0 × 10⁻¹³²). Expected values, based on the null hypothesis that there is no correlation, are shown in parentheses. Observed values in (A) are plotted in (B), with the closed bars corresponding to conserved configurations, i.e. human type I versus mouse type I, etc.
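The expected values under the null hypothesis, and the χ² statistic built from them, follow the standard test of independence for a contingency table. The sketch below uses hypothetical function names and assumes every row and column total is non-zero; it does not reproduce the counts from the figure.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof
```

For a 3 × 3 table of human versus mouse configuration types, the statistic has 4 degrees of freedom; the P-value would then be read from the χ² distribution with that many degrees of freedom.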

Figure 5

Characteristics of different types of poly(A) sites. (A) Association of various PAS hexamers with different types of poly(A) sites [for detailed definition of nine types of poly(A) sites see Figure 1B and Results]. (B) Cluster analysis of PAS hexamers and poly(A) types. The grayscale heat map represents the percentages of usage of PAS hexamers in different poly(A) types, with the sum of all values for each poly(A) type set to 100%. The shade of a cell indicates its value, with darker ones corresponding to higher values. Two-way hierarchical clustering was conducted using Euclidean distance as the metric. (C) Percentage of the number of supporting cDNA/EST sequences for different types of poly(A) sites. The total number of supporting cDNA/EST sequences for a gene is set to 100%. (D) Distribution of the number of cleavage sites per poly(A) site for different types of poly(A) site. Different shades are used to represent the number of cleavage sites per poly(A) site.
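The normalization described in (B), where hexamer usage sums to 100% within each poly(A) type, and the Euclidean distance used as the clustering metric can be sketched as below. The function names and any counts passed in are hypothetical, and the hierarchical clustering step itself is not implemented here.

```python
import math


def to_percentages(counts):
    """Scale hexamer counts for one poly(A) type so that they sum to 100%."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no hexamer counts to normalize")
    return {hexamer: 100.0 * n / total for hexamer, n in counts.items()}


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two usage profiles; missing hexamers count as 0%."""
    hexamers = set(profile_a) | set(profile_b)
    return math.sqrt(
        sum((profile_a.get(h, 0.0) - profile_b.get(h, 0.0)) ** 2 for h in hexamers)
    )
```

A pairwise distance matrix over all poly(A) types, computed this way, is the kind of input a hierarchical clustering routine would take.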

Figure 6

Nucleotide composition of human terminal sequences. Human terminal sequences containing nine types of poly(A) sites are plotted. The poly(A) site type is marked in each graph, and the number of sequences used for each graph is shown in parentheses. The _y_-axis for each graph is the percentage of a nucleotide (%) and the _x_-axis is the genomic location (nt) relative to the poly(A) site. See Figure 1B and Results for detailed definitions of nine poly(A) site types.
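A per-position nucleotide composition of the kind plotted here can be computed from a set of equal-length terminal sequences aligned on the poly(A) site. The following is a minimal sketch with a hypothetical function name; it assumes DNA letters and sequences that all span the same window (for example −300 to +300).

```python
from collections import Counter


def composition_profile(sequences, alphabet="ACGT"):
    """Percentage of each nucleotide at every aligned position.

    All sequences must have the same length; index 0 is the 5'-most position
    of the window, so the caller maps indices back to relative coordinates.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    profile = []
    for column in zip(*sequences):
        counts = Counter(base.upper() for base in column)
        profile.append(
            {base: 100.0 * counts[base] / len(sequences) for base in alphabet}
        )
    return profile
```

Plotting each nucleotide's percentage against position, one graph per poly(A) site type, gives curves comparable in form to those described in the caption.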
