Detection and evaluation of intron retention events in the human transcriptome

Pedro Alexandre Favoretto Galante et al. RNA. 2004 May.

Abstract

Alternative splicing is a very frequent phenomenon in the human transcriptome. There are four major types of alternative splicing: exon skipping, alternative 3' splice site, alternative 5' splice site, and intron retention. Here we present a large-scale analysis of intron retention in a set of 21,106 known human genes. We observed that 14.8% of these genes showed evidence of at least one intron retention event. Most of the events are located within the untranslated regions (UTRs) of human transcripts. For those retained introns interrupting the coding region, the GC content, codon usage, and the frequency of stop codons suggest that these sequences are under selection for coding potential. Furthermore, 26% of the introns within the coding region participate in the coding of a protein domain. A comparison with mouse shows that at least 22% of all informative examples of retained introns in human are also present in the mouse transcriptome. Taken together, these data suggest that a significant fraction of the observed events is not spurious and may be biologically significant. The analyses also allowed us to generate a reliable set of intron retention events that can be used for the identification of splicing regulatory elements.

Figures

FIGURE 1.

Possible cases of intron retention. The prototype sequence and the sequence containing the retained intron can be either a full-insert cDNA or an EST. Cases in which both sequences correspond to ESTs were excluded from our data set. Exons are represented as open bars, introns as lines, and retained introns as black bars (not to scale).
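The comparison this figure depicts can be illustrated with a minimal sketch. It assumes each transcript is already aligned to the genome and reduced to a list of exon intervals; the interval representation, the function names, and the example coordinates are all hypothetical and are not taken from the study's pipeline. An intron of the prototype is called retained when a single exon of the other sequence spans it completely.

```python
from typing import List, Tuple

# (start, end) in 1-based inclusive genomic coordinates (an assumed convention)
Interval = Tuple[int, int]


def introns_of(exons: List[Interval]) -> List[Interval]:
    """Return the introns implied by consecutive exons of one transcript."""
    ordered = sorted(exons)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start - a_end > 1  # skip adjacent exons with no gap between them
    ]


def retained_introns(prototype: List[Interval],
                     candidate: List[Interval]) -> List[Interval]:
    """Prototype introns that lie strictly inside a single candidate exon."""
    return [
        (i_start, i_end)
        for i_start, i_end in introns_of(prototype)
        if any(e_start < i_start and i_end < e_end
               for e_start, e_end in candidate)
    ]


# Hypothetical coordinates: the candidate reads through the first intron.
prototype = [(100, 200), (301, 400), (501, 600)]
candidate = [(100, 400), (501, 600)]
print(retained_introns(prototype, candidate))
```

A real analysis would also need to apply the exclusion described in the caption (pairs in which both sequences are ESTs) and to handle alignment noise at exon boundaries, neither of which this sketch attempts.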

FIGURE 2.

(A) RT-PCR validation of a representative cDNA (BC004239) showing intron retention. When primers flanking the retained intron are used, two bands are visible corresponding to both variants (even numbers). If a primer specific to the intronic sequence is used, a single band corresponding to the variant with the retention is observed (odd numbers). cDNAs from colon, brain, prostate, and lung were used in the reactions. Template was excluded in the control reactions. (B) Experimental validation for all remaining genes in all tissues. (+) The retained intron was observed in the corresponding tissue. A gel containing all validation data is presented as Supplemental Figure 2, which can be found at http://www.compbio.ludwig.org.br/~pgalante/IR/

FIGURE 3.

Contribution of retained introns to the coding of protein domains. A protein domain can be either entirely encoded by a retained intron (AF267861) or partially encoded by a retained intron (AJ005891 and AF230393). In the sequence AF267861, the retained intron encodes the entire “Elongation factor Tu” domain. In the sequence AJ005891, the entire retained intron codes for part of the “Fork head” domain. In the sequence AF230393, part of the retained intron codes for part of the SPRY domain.

FIGURE 4.

Nucleotide composition of retained and nonretained introns and exons bordering nonretained introns, all of them located within CDS. Because nucleotide composition varies in introns of different length in humans, all comparisons were performed within specific length classes. Retained introns have a higher GC content in all length categories when compared to nonretained introns (p < 10^-50). There is no difference in the distribution of retained introns and their bordering exons (p = 0.23).
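A minimal sketch of the length-stratified comparison described in this caption, under stated assumptions: the length boundaries below are placeholders (the actual length classes are not given here), and the function names are hypothetical. The code only computes the mean GC content per length class; the statistical test behind the reported p values is not specified in the caption and is not reproduced.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical class boundaries in nucleotides; the study's own classes may differ.
ASSUMED_BOUNDS = (100, 500, 2000)


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def length_class(n: int, bounds=ASSUMED_BOUNDS) -> str:
    """Label the length class that a sequence of length n falls into."""
    for bound in bounds:
        if n <= bound:
            return f"<={bound}"
    return f">{bounds[-1]}"


def mean_gc_by_class(seqs, bounds=ASSUMED_BOUNDS) -> dict:
    """Group sequences by length class and average GC content within each."""
    groups = defaultdict(list)
    for seq in seqs:
        groups[length_class(len(seq), bounds)].append(gc_content(seq))
    return {label: mean(values) for label, values in groups.items()}
```

Running `mean_gc_by_class` separately on retained and nonretained introns would give per-class means that can then be compared with whichever test is appropriate for the data.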

FIGURE 5.

Similarity of codon usage in introns (I), exons (E), and retained introns (RI). Codon usage of retained introns is considerably more similar to exons than to that of nonretained introns as evaluated by two different measurements: (A) the mean χ² value for the comparison of all 61 codons at once in 1000 simulations; (B) the average number of amino acids with similar codon usage frequency. Curved arrows represent internal variations within each subset.
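The χ² comparison over the 61 sense codons in panel A can be sketched as follows. This is an illustration only: it assumes in-frame sequences, computes a standard Pearson χ² statistic on a 2 × 61 table of codon counts, and uses hypothetical function names. The resampling scheme behind the 1000 simulations is not described in the caption and is left out.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(bases) for bases in product("ACGT", repeat=3)
    if "".join(bases) not in STOP_CODONS
]  # the 61 codons that encode an amino acid


def codon_counts(seq: str) -> Counter:
    """Count sense codons in a sequence assumed to start in frame."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    sense = set(SENSE_CODONS)
    return Counter(codon for codon in codons if codon in sense)


def chi_square(a: Counter, b: Counter) -> float:
    """Pearson chi-square statistic for two codon count tables."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both codon tables must contain at least one codon")
    grand = total_a + total_b
    stat = 0.0
    for codon in SENSE_CODONS:
        column = a[codon] + b[codon]
        if column == 0:
            continue  # codon absent from both sets carries no information
        for observed, row_total in ((a[codon], total_a), (b[codon], total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

A smaller statistic indicates more similar codon usage between the two sets, which is the sense in which the caption describes retained introns as closer to exons than to nonretained introns.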
