Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes

Kian Huat Lim et al. Proc Natl Acad Sci U S A. 2011.

Abstract

We present an intuitive strategy for predicting the effect of sequence variation on splicing. In contrast to transcriptional elements, splicing elements appear to be strongly position dependent. We demonstrated that exonic binding of the normally intronic splicing factor, U2AF65, inhibits splicing. Reasoning that the positional distribution of a splicing element is a signature of its function, we developed a method for organizing all possible sequence motifs into clusters based on the genomic profile of their positional distribution around splice sites. Binding sites for serine/arginine rich (SR) proteins tended to be exonic whereas heterogeneous ribonucleoprotein (hnRNP) recognition elements were mostly intronic. In addition to the known elements, novel motifs were returned and validated. This method was also predictive of splicing mutations. A mutation in a motif creates a new motif that sometimes has a similar distribution shape to the original motif and sometimes has a different distribution. We created an intraallelic distance measure to capture this property and found that mutations that created large intraallelic distances disrupted splicing in vivo whereas mutations with small distances did not alter splicing. Analyzing the dataset of human disease alleles revealed known splicing mutants to have high intraallelic distances and suggested that 22% of disease alleles that were originally classified as missense mutations may also affect splicing. This category, together with mutations in the canonical splicing signals, suggests that approximately one third of all disease-causing mutations alter pre-mRNA splicing.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Exonic binding of the intronic activator, U2AF65, inhibits splicing. (A) SELEX motifs were mapped to a dataset of 312,275 human splice site regions and plotted on an amalgamated exon. (B) The synthetic polypyrimidine tract (PPT) returned by the SELEX consensus U2AF65 motifs and a genomic polypyrimidine tract were ligated into an exon and tested for U2AF65 binding by UV cross-linking in extract without antibody (lanes 1, 3, and 5) or in extract that was blocked by an anti-U2AF65 antibody (lanes 2 and 4). The radiolabel transferred to several products of differing mobility; a 65 kD interaction that was sensitive to preincubation with anti-U2AF65 antibody is indicated with an arrow. (C) The sizes of RT-PCR products reflecting varying degrees of splicing are shown by the arrows. The disruptive effects of ligating the synthetic and natural PPT into the test exon of pZW4 are shown by RT-PCR in lanes 7 and 8.

Fig. 2.

Clustering motifs according to their positional distribution around splice sites. The positional distributions of all 4,096 possible hexamers were plotted around a database of human splice sites. (A) Several comparisons of two hypothetical hexamers (word 1 and word 2) are drawn to illustrate three different scenarios. L1 distance (shaded blue area) is used to compare normalized frequency distributions. A low L1 distance indicates that the two positional distributions differ only slightly and that the two hexamers have little or no difference in splicing function. A high L1 distance denotes that the two positional distributions are vastly different and that the hexamers likely differ in their role in splicing. (B) L1 distance was used to cluster the hexamers into 51 distinct groups based on the shape of their positional distributions around splice sites. Motifs and positional distributions of all 51 clusters can be found in the supplement. The clusters that correspond to the canonical splicing elements are indicated in red. (C) The arrangement of these elements on a prototypical pre-mRNA is annotated on the exon diagram. Hexamers within these clusters were aligned into motifs. Average occurrence frequencies of all of the cluster’s hexamers were calculated at each position around the splice site database.
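The L1 comparison described in panel A can be illustrated with a minimal sketch. It assumes each hexamer's positional distribution is a list of raw counts per position bin and that "normalized" means scaled to sum to 1; the normalization actually used is not stated in this text, and the function names and toy counts are hypothetical.

```python
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Scale a positional count profile so that it sums to 1 (assumed normalization)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("profile has no observations")
    return [c / total for c in counts]


def l1_distance(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Sum of absolute differences between two normalized positional profiles."""
    if len(counts_a) != len(counts_b):
        raise ValueError("profiles must cover the same position bins")
    p, q = normalize(counts_a), normalize(counts_b)
    return sum(abs(x - y) for x, y in zip(p, q))


# Toy profiles over five position bins (hypothetical numbers, not from the paper).
exon_enriched = [1, 1, 2, 8, 8]
intron_enriched = [8, 8, 2, 1, 1]
same_shape_more_hits = [2, 2, 4, 16, 16]

print(l1_distance(exon_enriched, intron_enriched))       # large: different shapes
print(l1_distance(exon_enriched, same_shape_more_hits))  # ~0: same shape, different abundance
```

Under this normalization the distance ranges from 0 (identical shapes) to 2 (non-overlapping profiles), so abundance alone does not separate two hexamers; only the shape of the distribution does.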

Fig. 3.

Minigene assay of element function confirms splicing differences between wild-type cluster exemplars and predicted mutants. (A) The clusters selected for functional analysis are indicated in red. (B) Exemplars drawn from each cluster were tested with their variants and no-insert controls in several splicing reporter constructs. Total RNA from transfection into 293 cells was analyzed by RT-PCR. Arrows indicate the nature of the splicing product. M2 denotes the point mutant with the highest intraallelic L1 distance, which is predicted to be most deleterious to the splicing function of the wild-type insert. (C) Additional exemplars for clusters 30 and 35, along with exemplars for clusters 8 and 17, were used to contrast the effect of predicted neutral mutations (M1) or the effect of predicted change-of-function mutations (M2) with wild-type splicing. As before, the M2 mutation is the variation with the highest intraallelic L1 distance, and the negative control, the M1 mutation, has the lowest intraallelic L1 distance.
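The choice of M1 and M2 variants can be sketched as a ranking of point mutants by their L1 distance to the wild type. This is a simplified illustration, not the authors' pipeline: it assumes already-normalized positional profiles stored in a dictionary keyed by word, treats the insert as a single word rather than scoring every overlapping hexamer that a mutation changes, and uses toy two-letter words and made-up profiles.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def point_mutants(word: str, alphabet: str = "ACGT") -> Iterator[str]:
    """Yield every single-base substitution of `word`."""
    for i, base in enumerate(word):
        for alt in alphabet:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def l1(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between two already-normalized profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_mutants(wild_type: str,
                 profiles: Dict[str, Sequence[float]]) -> List[Tuple[float, str]]:
    """Return (distance, mutant) pairs sorted from smallest to largest distance."""
    wt_profile = profiles[wild_type]
    scored = [(l1(wt_profile, profiles[m]), m)
              for m in point_mutants(wild_type) if m in profiles]
    return sorted(scored)


# Hypothetical normalized profiles; mutants without a profile are skipped.
toy_profiles = {
    "AC": [0.5, 0.5, 0.0, 0.0],
    "CC": [0.5, 0.4, 0.1, 0.0],   # nearly the same shape as the wild type
    "AT": [0.0, 0.0, 0.5, 0.5],   # opposite shape
}
ranked = rank_mutants("AC", toy_profiles)
m1, m2 = ranked[0][1], ranked[-1][1]
print("M1 (predicted neutral):", m1)
print("M2 (predicted change of function):", m2)
```

A hexamer has 18 single-base substitutions, so with real hexamer profiles the ranking would cover up to 18 candidates per insert.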

Fig. 4.

Human disease alleles are predicted to disrupt splicing. (A) Average intraallelic L1 distances for each category of mutation (HGMD splicing and HGMD missense/nonsense) and their corresponding background models of simulated mutations, divided by location with respect to the splice sites. Error bars denote 95% confidence intervals. (B) Receiver operating characteristic (ROC) curve analysis using HGMD splicing mutants in regions around the 3′ss and 5′ss as “true positives” and simulated mutations as “true negatives.” ROC curve analysis classifies these mutations at decreasing thresholds of L1 stringency, plotting the false positive rate against the true positive rate. The exonic region is shown in red; upstream and downstream intronic regions are shown in green and blue, respectively. (C) Exemplars were selected from the HGMD missense mutants with the highest intraallelic L1 distance. Total RNA from transfection into 293 cells was analyzed by RT-PCR. The HGMD ID, gene name, and the mutational position are shown for each experiment. Quantification of exon inclusion products is also shown. Arrows indicate the identity of the splicing product.
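The ROC analysis in panel B can be sketched as a sweep over decreasing L1 thresholds. This is a minimal illustration, assuming each mutation has a single intraallelic L1 score and that a mutation is called positive when its score meets the threshold; the scores below are invented, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(pos_scores: Sequence[float],
               neg_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs for decreasing thresholds."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score calls nothing positive
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented scores: "true positives" stand in for known splicing mutants,
# "true negatives" for simulated mutations.
known_splicing = [0.9, 0.8, 0.4]
simulated = [0.7, 0.3, 0.2]

curve = roc_points(known_splicing, simulated)
print(curve)
print(auc(curve))
```

An area near 0.5 would mean the L1 score does not separate the two sets, while an area near 1 would mean nearly every known splicing mutant scores above nearly every simulated mutation.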


