A comprehensive computational characterization of conserved mammalian intronic sequences reveals conserved motifs associated with constitutive and alternative splicing

Rodger B Voelker et al. Genome Res. 2007 Jul.

Abstract

Orthologous mammalian introns contain many highly conserved sequences. Of these sequences, many are likely to represent protein binding sites that are under strong positive selection. In order to identify conserved protein binding sites that are important for splicing, we analyzed the composition of intronic sequences that are conserved between human and six eutherian mammals. We focused on all completely conserved sequences of seven or more nucleotides located in the regions adjacent to splice-junctions. We found that these conserved intronic sequences are enriched in specific motifs, and that many of these motifs are statistically associated with either alternative or constitutive splicing. In validation of our methods, we identified several motifs that are known to play important roles in alternative splicing. In addition, we identified several novel motifs containing GCT that are abundant and are associated with alternative splicing. Furthermore, we demonstrate that, for some of these motifs, conservation is a strong indicator of potential functionality since conserved instances are associated with alternative splicing while nonconserved instances are not. A surprising outcome of this analysis was the identification of a large number of AT-rich motifs that are strongly associated with constitutive splicing. Many of these appear to be novel and may represent conserved intronic splicing enhancers (ISEs). Together these data show that conservation provides important insights into the identification and possible roles of cis-acting intronic sequences important for alternative and constitutive splicing.
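The selection criterion described in the abstract (completely conserved stretches of seven or more nucleotides) can be illustrated with a minimal Python sketch. It assumes the orthologous sequences are already aligned into equal-length rows with `-` for gaps; the function name `conserved_runs` and its parameters are hypothetical and are not taken from the paper's own pipeline.

```python
from typing import List, Tuple


def conserved_runs(aligned: List[str], min_len: int = 7) -> List[Tuple[int, str]]:
    """Find maximal runs of fully identical alignment columns.

    `aligned` holds equal-length rows of one multiple alignment (assumed
    input format; the first row is treated as the reference, e.g. human).
    Returns (start_column, sequence) for each run of at least `min_len`
    columns in which every row carries the same A/C/G/T base.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned rows must have the same length")

    rows = [row.upper() for row in aligned]
    ref = rows[0]
    runs: List[Tuple[int, str]] = []
    start = None
    # Iterate one step past the end so a run touching the last column is flushed.
    for col in range(width + 1):
        identical = (
            col < width
            and ref[col] in "ACGT"  # gaps and ambiguity codes break a run
            and all(row[col] == ref[col] for row in rows)
        )
        if identical:
            if start is None:
                start = col
        else:
            if start is not None and col - start >= min_len:
                runs.append((start, ref[start:col]))
            start = None
    return runs
```

The default threshold of seven follows the abstract; restricting the search to the regions adjacent to splice-junctions would be a separate step that this sketch does not cover.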

Figures

Figure 1.

Example of mammalian genomic alignment showing conserved exonic and intronic sequences. Shown is a small portion of the human gene LMO7 aligned against the orthologous region of the six mammalian genomes used in this study. (A) Four exons (represented as boxes) and their corresponding introns. The central graphic indicates the observed splicing events. Below that, the conservation is represented as a histogram where the height is proportional to the degree of conservation. (B) An expanded view of the sequence flanking the 5′ splice-junction of the second, alternatively spliced, exon. The actual sequence is displayed below the conservation histogram. Boxes are placed around conserved sequences (CSs). The original graphics were drawn using the UCSC Genome Browser (http://genome.ucsc.edu; Kent et al. 2002).

Figure 2.

Schematic representation of mammalian introns detailing the regions used in this study. The positions of the 5′ and 3′ splice-junctions are indicated as 5′ SJ and 3′ SJ. Sequence logos, composed from 5000 randomly sampled human introns, are used to show the frequency composition of the splice-junctions. The intronic regions that are the basis of this study are indicated as DI (donor intronic) and AI (acceptor intronic).

Figure 3.

Distribution of total lengths of CSs found in donor and acceptor intronic regions and associations between CS length and alternative splicing. (A,B) Distributions of the lengths of the CSs found in the donor or acceptor intronic regions. In cases where more than one CS was found in a particular intron, the lengths were combined (total length). For each data set, the bin width is equivalent to two bases. The number of CSs in each bin is indicated along the _Y_-axis. The inset plots are the same data displayed using a log scaled _Y_-axis to better visualize the longer CSs. (C) The relationship between the percentage of introns that are alternatively spliced and the total length of CS found within the intron. The horizontal bar indicates the average percentage of splice-junctions that are alternatively spliced (3%). (D) The number (and corresponding percentages) of all alternatively spliced (Alt+) or constitutively spliced (Alt−) splice-junctions that contain a CS in the donor (DI) or acceptor (AI) intronic flanking sequence.
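The binning described for panels A and B can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes the input is a list of hypothetical `(intron_id, cs_length)` pairs, sums the CS lengths per intron, and places bin edges at multiples of the bin width.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, Tuple


def total_cs_length_histogram(
    cs_records: Iterable[Tuple[Hashable, int]], bin_width: int = 2
) -> Counter:
    """Histogram of per-intron total CS length.

    Each record is (intron_id, cs_length). Lengths sharing an intron_id are
    summed first, mirroring the "total length" in the caption. The returned
    Counter maps the lower edge of each bin to the number of introns in it.
    Bin alignment at multiples of `bin_width` is an assumption.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    totals = defaultdict(int)
    for intron_id, length in cs_records:
        totals[intron_id] += length

    histogram: Counter = Counter()
    for total in totals.values():
        histogram[(total // bin_width) * bin_width] += 1
    return histogram
```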

Figure 4.

Scatter-plots for the counts of all _n_-mers (4–7 nt) in the CS samples (NCS) vs. the counts in the corresponding random samples (NRS). Overlaid on plots A and C are the _n_-mers that were significantly enriched (according to the confidence intervals described in the Supplemental Materials and Methods) in the donor intronic (DI) and acceptor intronic (AI) regions. Overlaid on plots B and D are all _n_-mers containing the substrings indicated. These substrings are examples of substrings that are enriched in the corresponding regions.
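The paired counts plotted here (NCS vs. NRS) can be approximated with a short sketch. It assumes overlapping occurrences are counted and that the CS and random samples are plain lists of sequences; the helper names `count_nmers` and `paired_counts` are hypothetical. The significance test itself is described only in the Supplemental Materials and Methods and is not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_nmers(seqs: Iterable[str], n_min: int = 4, n_max: int = 7) -> Counter:
    """Count every n-mer with n_min <= n <= n_max across all sequences.

    Overlapping occurrences are counted separately (an assumption).
    """
    counts: Counter = Counter()
    for seq in seqs:
        s = seq.upper()
        for n in range(n_min, n_max + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts


def paired_counts(
    cs_seqs: Iterable[str],
    rs_seqs: Iterable[str],
    n_min: int = 4,
    n_max: int = 7,
) -> Dict[str, Tuple[int, int]]:
    """Map each n-mer to (count in CS sample, count in random sample)."""
    ncs = count_nmers(cs_seqs, n_min, n_max)
    nrs = count_nmers(rs_seqs, n_min, n_max)
    # Counter returns 0 for n-mers missing from one of the two samples.
    return {nmer: (ncs[nmer], nrs[nmer]) for nmer in set(ncs) | set(nrs)}
```

Each `(NCS, NRS)` pair corresponds to one point in a scatter-plot of this kind; n-mers far above the diagonal would be candidates for enrichment, subject to a proper statistical test.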

Figure 5.

Samples of GCCS clusters derived from the donor intronic (DI) region. Shown are the graph clusters representing the clustered _n_-mers used to construct the CSMs for the putative Fox and QKI protein binding sites (Fig. 7, DI-1 and DI-2, respectively). Vertices are colored according to their conservation Z-score (see color key). The graphs were drawn using GraphViz (http://www.graphviz.org/).

Figure 6.

Box-plots showing the distributions of TA-scores observed for several representative CSMs. The greatest common substrings (GCS) for each CSM are shown to the left. CSMs that were significantly enriched in _n_-mers associated with alternative splicing are shown in red, those significantly associated with constitutive splicing are in blue, and those with no significant association are in yellow.

Figure 7.

Intronic conserved sequence motifs (CSMs) showing significant associations with alternative (above line) or constitutive splicing (below line) are shown against a schematic representation of an intron to indicate the region within which they are located. Motifs marked with an asterisk are compositionally similar to the equivalently numbered motif in the other region.

Figure 8.

Association between motif conservation and alternative splicing for several 5-mers. Vertical bars represent the percentage of occurrences of each 5-mer that was observed in the DI region of an alternatively spliced intron. The lower horizontal bar indicates the average for all non-CS 5-mers. The upper horizontal bar indicates the average for all CS 5-mers.
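The quantity shown by the vertical bars can be expressed as a small calculation. The sketch below assumes each occurrence of a 5-mer in a DI region has already been annotated with whether it lies inside a CS and whether its intron is alternatively spliced; the `Occurrence` record and `percent_alt` function are hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Occurrence(NamedTuple):
    kmer: str          # the 5-mer observed in a DI region
    conserved: bool    # True if the occurrence lies within a CS
    alt_spliced: bool  # True if the intron is alternatively spliced


def percent_alt(occurrences: Iterable[Occurrence], kmer: str) -> Dict[bool, float]:
    """Percentage of occurrences of `kmer` found in alternatively spliced introns.

    Returns {True: pct for conserved occurrences,
             False: pct for nonconserved occurrences}.
    A class with no occurrences yields NaN rather than a misleading 0.
    """
    totals = {True: 0, False: 0}
    alts = {True: 0, False: 0}
    for occ in occurrences:
        if occ.kmer != kmer:
            continue
        totals[occ.conserved] += 1
        if occ.alt_spliced:
            alts[occ.conserved] += 1
    return {
        flag: (100.0 * alts[flag] / totals[flag]) if totals[flag] else float("nan")
        for flag in (True, False)
    }
```

Comparing the two returned percentages for one 5-mer corresponds to the contrast the abstract draws between conserved and nonconserved instances of a motif.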
